Regional quirks: Datz phila on Twitter
A guy from Philadelphia may have to concentrate a bit to understand the speech of a California surfer dude, and vice versa. But they'd do fine with each other's written words, right?
Maybe not - if they're using Twitter.
A new study from Carnegie Mellon University, based on a sophisticated computer analysis of 380,000 tweets from 9,500 users, found that the famously brief electronic messages can come in distinct regional flavors.
You think a joke is hella funny (meaning very funny)? Good chance you're from Northern California.
Type suttin instead of something, and you might be from southern New England or New York.
And Philadelphia? Surprise, surprise. The region's cyber-vernacular includes a term that doesn't belong in a family newspaper. It's an abbreviation that means the sender is cracking the (expletive) up - basically a cruder version of LOL (laughing out loud).
The researchers, members of Carnegie Mellon's computer science department, caution that their results come from just a week's worth of data in a rapidly expanding, fluid medium. More analysis is needed to nail down the regional uniqueness of specific words or terms, the scientists wrote in a paper describing the results.
Still, when they presented their findings this month in Pittsburgh, at the annual meeting of the Linguistic Society of America, they caused a bit of a sensation.
"It's a lot of fun," said Sali Tagliamonte, a professor of linguistics at the University of Toronto, who heard the presentation. "We can actually see regional dialects."
The researchers analyzed tweets that were sent from mobile phones and had a "tag" specifying the sender's location. They eliminated users who follow more than 1,000 people or who had more than 1,000 followers, in order to filter out commercial and celebrity sources.
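For readers who want to see what that filter amounts to, here is a minimal sketch in Python. The 1,000 cutoff comes from the description above; the `User` record and the function names are hypothetical, invented for illustration, and are not the researchers' actual code.

```python
from dataclasses import dataclass

# Cutoff reported in the article; everything else here is illustrative.
MAX_CONNECTIONS = 1000


@dataclass
class User:
    """Hypothetical record for one Twitter account."""
    name: str
    following: int  # number of accounts this user follows
    followers: int  # number of accounts following this user


def is_ordinary_user(user: User, limit: int = MAX_CONNECTIONS) -> bool:
    """Return True unless the user follows, or is followed by, more than `limit` accounts."""
    return user.following <= limit and user.followers <= limit


def filter_users(users):
    """Drop likely commercial and celebrity accounts, keeping everyone else."""
    return [u for u in users if is_ordinary_user(u)]
```

A user sitting at exactly 1,000 on either count is kept here, since the article says only those with more than 1,000 were eliminated.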
They did not tell the computer how many regions to look for, said Noah A. Smith, one of the authors.
"You let the model discover how many regions best explain the data," said Smith, an assistant professor at CMU's Language Technologies Institute.
Each time the authors ran the model through its paces, it started by conducting what was essentially a random search of the data - applying statistics to determine the number, location, and usage patterns of regions. So it came up with a slightly different result each time, generally about a dozen regions, Smith said.
Interestingly, some of the regional variations in Twitter did not correspond with the regions that are defined by accents and other differences in the spoken word, said lead author Jacob Eisenstein, a postdoctoral fellow at CMU.
For example, though the model identified a Philadelphia-centric region, the "cracking up" abbreviation was also prominent in Pittsburgh and Cleveland - two cities where you'd get a blank stare if you asked for a wooder ice.
It's not clear why certain Twitterisms don't correspond to traditional dialect regions, but it could be related to the users' social networks, Eisenstein said. For example, a Pittsburgh-Philly link in word usage might have arisen with people who went to Pennsylvania State University, he said.
Other "words" that were common in Philadelphia, as well as the rest of the Northeast, included datz for that's and mayb for maybe. Not surprisingly, the analysis also discovered that certain abbreviated place names were common in Philadelphia-area tweets, such as philly, pa, and nj.
The regions were determined by analyzing messages from 80 percent of the users. The researchers then tested their model by using it to predict the locations for the remaining 20 percent.
They were able to predict a person's location with a median error of about 300 miles.
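The held-out test and the median-error figure can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the researchers' method: the article does not say how users were divided or how distance was measured, so the random shuffle, the great-circle (haversine) formula, and all function names below are assumptions.

```python
import math
import random
import statistics

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def split_users(user_ids, train_fraction=0.8, seed=0):
    """Shuffle the users and split them into a training set and a test set."""
    ids = list(user_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def haversine_miles(a, b):
    """Great-circle distance in miles between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def median_error_miles(predicted, actual):
    """Median distance between each predicted location and the true one."""
    return statistics.median(haversine_miles(p, a) for p, a in zip(predicted, actual))
```

A median is less sensitive than an average to a few wildly wrong guesses: half of the test users are placed closer than the median error, and half farther away.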
That's not quite the level of precision claimed by the fictional professor Henry Higgins, who boasts in the musical My Fair Lady that he can listen to a man's accent and tell where he lives within six miles.
Still, it's something, said Tagliamonte, the Toronto scholar.
"This is like the 21st-century version of Professor Higgins," Tagliamonte said of the computer model. "It's on a grand scale. We can actually see regional dialects."
Though Higgins would surely be appalled at - how might he put it? - the informal nature of the discourse on Twitter.
An actual sample from Philadelphia: "yooo datz crazy wit u boy."
Not mentioned in the paper was whether the authors find themselves using any regionalisms in their own tweets. Eisenstein is from central Jersey, but you'd never know it on Twitter.
"I don't post anything," Eisenstein said. "I just read stuff."