How DNA sleuths identified the ‘Boy in the Box’ after six decades

The boy's body was exhumed on two occasions to extract DNA, in 1998 and 2019.

Colleen Fitzpatrick of Identifinders International speaks during a news conference in Philadelphia on Thursday. (Matt Rourke / AP)

Genetic databases can hold the clues to solving decades-old cold cases such as the “Boy in the Box,” the 4-year-old identified by Philadelphia police on Thursday as Joseph Augustus Zarelli.

But it takes more than a few computer keystrokes to trace the family tree of a child who died in 1957, genealogist Colleen M. Fitzpatrick said at the police news conference.

“We’ve solved cases in two hours,” she said. “And we’ve solved cases in three years.”

Investigators were stymied by a series of obstacles in identifying the boy, who died when DNA science was in its infancy.

Chief among the challenges: His DNA samples were incomplete and degraded. Yet even after researchers filled in the gaps as best they could, entering his DNA into a database was just the first step, said Fitzpatrick, president of Identifinders International, a Fountain Valley, Calif., forensic genealogy company enlisted by the police.

Fitzpatrick and police offered few additional details Thursday about the genetic sleuthing involved. But generally speaking, law enforcement agencies can identify crime victims and suspects by matching their DNA samples with those of surviving relatives — typically triangulating among multiple people who are partial matches.

It all depends on those four genetic building blocks that students learn about in high-school biology: G, C, A, and T.

Here’s a crash course on how it works, with an assist from Paul Woodbury, a genealogist at Legacy Tree Genealogists in Salt Lake City. Legacy Tree does not work for law enforcement; it conducts research for consumers trying to trace their ancestry. But the basic principles used by police are the same.

A first attempt at the boy’s DNA

When the boy’s remains were discovered in February 1957, no one had any inkling of identifying him by his DNA.

Scientists had famously discovered the structure of the genetic molecule just four years earlier, in 1953. Not until the late 1970s did geneticists develop the first methods to decipher the sequence of DNA, piecing together how its code could dictate our physical traits and the likelihood of developing disease.

Even by 1998, when the boy’s body was first exhumed to collect a DNA sample, forensic genealogy was in its infancy.

Investigators extracted mitochondrial DNA from his teeth because that type of DNA tends to be well preserved. It is passed down through the mother. But it represents just a small fraction of a person’s overall genetic makeup, and there were no immediate hits when the sample was entered into the few databases available at the time.

With the laboratory techniques available in the late 1990s, the rest of his DNA was too degraded to be of much use.

Filling in the gaps

By 2019, geneticists had developed a variety of statistical techniques for filling in the gaps in degraded DNA samples.

So the boy’s body was exhumed once again.

This time, Fitzpatrick and her team were able to begin piecing together a more complete picture. She offered few details at the news conference, other than to say it was a challenge.

“It took 2½ years to get the DNA in shape, it was so bad,” she said.

Then came the painstaking process of trying to match his genetic code with that of others.

The power of databases

Millions of people have uploaded their genetic samples to genealogy websites in recent years. Law enforcement agencies soon started using these tools to solve cold cases.

But when critics raised privacy concerns, many such sites changed their policies, allowing police to use data only when authorized by users.

At the news conference, investigators did not disclose which databases they used, commercial or otherwise.

In any such database, the key to identifying a person is determining how much DNA they share with others. That’s a tricky concept for many to grasp, as the genomes of all humans are 99.9% identical, said Woodbury, of Legacy Tree.

Obviously, we are not all near-identical. The code that makes one person different from the next consists of small snippets that are scattered throughout that mostly identical genome — represented by variations in those basic building blocks, G, C, A, and T.

How much DNA you share with your relatives

For every gene that constitutes our makeup, each person inherits one copy from the mother, and one copy from the father.

The copy from the mother can be either the one she inherited from her mother, or the one she got from her father.

Same thing on the paternal side. For any particular gene, the copy you inherit from your father can be the one he got from his father (your grandfather) or from his mother (your grandmother).

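That means a person shares, on average, 25% of their DNA with each grandparent, Woodbury said, because a parent passes along about half of the half they inherited from each of their own parents. (Half-siblings share that level of DNA with each other, too.)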

But there’s a range, because in each case, it’s like rolling the dice. For most people, the degree of relatedness between them and their grandparents ranges between 17% and 34%, according to 23andMe, a popular ancestry site.

First cousins share an average of 12.5% of their DNA, in most cases ranging from 4% to 23%.
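For readers who want to check the arithmetic behind those averages, here is a minimal sketch in Python. It assumes the simple rule that each parent-to-child step halves the expected share, and that shares add up across every common ancestor. The function name `expected_shared` and the path-counting approach are illustrative choices, not something the genealogists described.

```python
from fractions import Fraction


def expected_shared(path_lengths):
    """Average fraction of DNA two relatives share.

    Each entry in path_lengths is the number of parent-child steps
    linking the two people through one common ancestor (or, for a
    direct ancestor, the number of generations between them).
    Every step halves the expected share.
    """
    return sum(Fraction(1, 2) ** steps for steps in path_lengths)


# Grandchild -> parent -> grandparent: one path of two steps.
grandparent = expected_shared([2])
# Half-siblings: one shared parent, one step up and one step down.
half_sibling = expected_shared([2])
# First cousins: two shared grandparents, each reached by a four-step path.
first_cousin = expected_shared([4, 4])

for name, share in [
    ("grandparent", grandparent),
    ("half-sibling", half_sibling),
    ("first cousin", first_cousin),
]:
    print(f"{name}: {float(share):.1%}")
```

The sketch reproduces the 25% and 12.5% averages cited above. It says nothing about the spread around those averages, which comes from the chance element in which copies are passed down.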

So if any two people in a database are a 20% match, in theory that means they could be first cousins, half-siblings, or grandparent and grandchild. Additional legwork is required.
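The ambiguity can be shown with a small lookup, sketched here in Python. The grandparent and first-cousin ranges are the 23andMe figures quoted above. The half-sibling range is an assumption: the article gives only the 25% average for half-siblings, so the sketch reuses the grandparent range for them. The names `TYPICAL_RANGES` and `candidate_relationships` are hypothetical, and real forensic genealogy tools weigh far more evidence than a single percentage.

```python
# Typical ranges of shared DNA, in percent.
TYPICAL_RANGES = {
    "grandparent/grandchild": (17.0, 34.0),  # range quoted in the article
    "half-siblings": (17.0, 34.0),  # ASSUMPTION: same range as grandparents
    "first cousins": (4.0, 23.0),  # range quoted in the article
}


def candidate_relationships(shared_percent):
    """Return every relationship whose typical range contains the match."""
    return [
        name
        for name, (low, high) in TYPICAL_RANGES.items()
        if low <= shared_percent <= high
    ]


# A 20% match fits all three relationships, so it cannot settle the question.
print(candidate_relationships(20.0))
# A 10% match falls only within the first-cousin range in this table.
print(candidate_relationships(10.0))
```

A match that fits several ranges is why investigators compare multiple relatives and build out a family tree rather than relying on one number.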

In the case of the Boy in the Box, investigators said they identified cousins as well as half-siblings on his mother’s and father’s sides. Again, details were scant. But when investigators got in touch with possible relatives, Fitzpatrick said, they were able to assemble a family tree.

“There were cousins on both sides of the family that could only funnel to that one individual,” she said.

The process, she said, yielded one unequivocal identity for the Boy in the Box: Joseph Augustus Zarelli.