When a detective thriller by a man no one had ever heard of came out a few years ago, computer scientists made a surprising find. An analysis of the text suggested the author was not a man after all, but the famed creator of Harry Potter, J.K. Rowling.
The technique they used is called stylometry — identifying authors by their word choices and writing styles — and it is back in the news with allegations that 76ers general manager Bryan Colangelo operated five anonymous Twitter accounts.
This time, the writing analysis came with a 21st-century twist, according to the sports and entertainment website the Ringer, which posted the story Tuesday night. The case that the mysterious posts were the work of one person was strengthened by the fact that the accounts followed similar groups of other Twitter accounts, the Ringer said. The Sixers have launched an investigation because the author of the posts criticized players and disclosed sensitive team information.
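The Ringer did not say how the overlap among followed accounts was measured. One common way to put a number on such overlap is the Jaccard index: the share of followed accounts that two users have in common, out of all the accounts either one follows. The sketch below is only an illustration of that general idea, and the function name and account handles are made up for the example.

```python
def follow_overlap(follows_a, follows_b):
    """Jaccard index of two follow lists: shared accounts / all accounts.

    Returns a value from 0.0 (nothing in common) to 1.0 (identical lists).
    This is an illustrative measure, not the method used in the Ringer story.
    """
    a, b = set(follows_a), set(follows_b)
    if not a and not b:
        return 0.0  # two empty lists: treat as no measurable overlap
    return len(a & b) / len(a | b)


# Hypothetical handles, invented for this example.
account_1 = ["@team_news", "@hoops_writer", "@draft_guru"]
account_2 = ["@hoops_writer", "@draft_guru", "@cap_expert"]
print(follow_overlap(account_1, account_2))
```

A high score by itself proves little, since fans of the same team tend to follow many of the same accounts.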
The Ringer article did not disclose who performed the analysis that identified Colangelo, only that it was someone in the field of artificial intelligence — simulating human intelligence with computers. Nor did the article say what data-crunching methods were used. But computer scientists who read about the findings said that in general, determining if the same author is behind multiple writing samples is a straightforward problem — provided there is a big enough sample of writing to analyze.
"It wouldn't be that hard," said C. Lee Giles, a professor at the College of Information Sciences and Technology at Pennsylvania State University.
Proving that the author was Colangelo himself, on the other hand, would be tricky, as writing styles can be imitated, Giles said.
Various types of text-analysis software can be used to help identify authorship, relying on such characteristics as how often certain words are used, and in what order and position. The more sophisticated approaches rely on a technique called machine learning — meaning that the analyst "trains" a computer model to recognize hidden attributes in a given author's writings, then compares those results with the attributes of other writing samples.
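The word-frequency idea can be shown in a few lines of Python. The sketch below counts how often a handful of common "function words" appear in a text and then compares two such profiles with cosine similarity. It is a toy illustration only: the word list is a short, arbitrary assumption, the function names are invented for the example, and real tools such as those described here use far richer features and trained models.

```python
import math
import re
from collections import Counter

# Assumption: a deliberately short, arbitrary list of common function words.
# Real stylometric tools use hundreds of features, not just these.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that",
                  "is", "it", "but", "for", "with", "not", "on"]


def style_profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two profiles (1.0 = same proportions)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a text with none of the listed words has no profile
    return dot / (norm_u * norm_v)
```

Calling `cosine_similarity(style_profile(sample_a), style_profile(sample_b))` on two writing samples gives a rough similarity score. With only a few sentences per sample the score is unreliable, which is why analysts want a large body of text.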
One such method, called Doppelganger Finder, was developed by Drexel University computer scientist Rachel Greenstadt and colleagues. She said she has used the software to analyze online postings by cybercriminals who were discussing the theft of credit card numbers, and that the FBI has also used it.
As for the Twitter accounts said to be the work of Colangelo, Greenstadt said making the case would require more than just finding similarities among the five accounts. To validate the findings, the analyst would then want to show that the writing styles from those accounts were not only similar to one another but also were different from those in a sample of other posts on the same general topic of basketball, she said.
"You might want to compare a group of similarly followed and liked basketball personality-type people on Twitter," she said.
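The comparison Greenstadt describes can be sketched as a simple check: measure how alike the suspect accounts are to one another, measure how alike they are to a comparison group, and look at the gap. The Python below is a minimal illustration of that logic, not Doppelganger Finder or any published method. The function names are invented, and `similarity` stands in for whatever style-comparison score the analyst chooses.

```python
from itertools import combinations, product
from statistics import mean


def mean_within(samples, similarity):
    """Average similarity over every pair inside one group.

    Needs at least two samples; otherwise statistics.mean raises an error.
    """
    return mean(similarity(a, b) for a, b in combinations(samples, 2))


def mean_between(samples, baseline, similarity):
    """Average similarity between each sample and each baseline item."""
    return mean(similarity(a, b) for a, b in product(samples, baseline))


def separation(suspects, baseline, similarity):
    """How much more alike the suspects are to each other than to outsiders.

    A value near zero means the suspect accounts look no more alike than
    any other accounts writing about the same topic.
    """
    return (mean_within(suspects, similarity)
            - mean_between(suspects, baseline, similarity))
```

Here `suspects` would hold the writing from the five accounts and `baseline` the writing from other basketball personalities. A large positive gap supports the single-author theory, though how large is large enough is a judgment that would call for proper statistical testing.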
Even then, such findings fall short of proof. In the case of Rowling, an analysis in 2013 strongly suggested that The Cuckoo's Calling, said to be written by Robert Galbraith, was in fact written by Rowling, based on the writing style in her novel The Casual Vacancy. Rowling soon admitted this was the case.