And all this time we thought our Internet speak was universal — connecting sea to shining sea for a big jolly Twittery universe.
Nope, say researchers at Carnegie Mellon University in Pennsylvania, who went totally 2011 on the linguistic community and created a program that would sort through 4 million words' worth of Tweets and organize them by location. (“Because no human is going to be able to plow through 380,000 Tweets,” says a spokesman.)
According to the all-knowing Oracle bot thingy, Northern Californians say “hella” too much (duh) and the freaks of Lake Erie make upside-down winky faces with their tongues sticking out (;d) while constantly discussing BIEBER and CHIPOTLE.
But that's not all. Wait until you see the top Tweets from Los Angeles:
Here are the examples published in the study, under five random categories the researchers chose (we don't really get why).
Basketball: #KOBE, #LAKERS, AUSTIN
Popular Music: #LAKERS, load, HOLLYWOOD, imm, MICKEY, TUPAC
Daily Life: omw, tacos, hr, HOLLYWOOD
Emoticons: af, papi, raining, th, bomb, coo, HOLLYWOOD
Chit Chat: wyd, coo, af, nada, tacos, messin, fasho, bomb
OK, so many hilarious things to talk about here. Omw is an obvious tribute to the fact that we're commuting 99 percent of the time, with nothing to do but Tweet from our smartphones under the steering wheel like the master cop-evaders we are. And of course Angelenos would take any sign of raining as worthy of a 140-character rant. And you gotta love Cholo shout-outs like papi, nada y tacos.
But uh, does anyone else think this feels suspiciously like Twitter if Twitter took a time machine to the '90s? Come on — fasho? Bomb? Coo? Who talks like that anymore? However, according to the data-collection stats in the intro to “A Latent Variable Model for Geographic Lexical Variation,” all the Tweets were taken from last spring. How embarrassing.
In case you don't believe us, direct from the study:
The main dataset in this research is gathered from the microblog website Twitter, via its official API. We use an archive of messages collected over the first week of March 2010 from the “Gardenhose” sample stream, which then consisted of 15% of all public messages, totaling millions per day. We aggressively filter this stream, using only messages that are tagged with physical (latitude, longitude) coordinate pairs from a mobile client, and whose authors wrote at least 20 messages over this period.
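For the curious, the filtering step described in that passage can be sketched in a few lines of Python. This is only our rough illustration, not the researchers' code: the function name filter_messages and the field names (author, lat, lon) are made up for the example, and we assume the 20-message minimum is counted over geotagged messages, which the quote doesn't spell out.

```python
from collections import Counter


def filter_messages(messages, min_per_author=20):
    """Keep geotagged messages from authors with enough geotagged messages.

    `messages` is a list of dicts with hypothetical keys "author", "lat"
    and "lon"; this layout is an assumption, not the study's data format.
    """
    # Step 1: drop anything without a (latitude, longitude) pair.
    geotagged = [
        m for m in messages
        if m.get("lat") is not None and m.get("lon") is not None
    ]

    # Step 2: count the remaining messages per author.
    # Assumption: the threshold applies to geotagged messages only.
    counts = Counter(m["author"] for m in geotagged)

    # Step 3: keep only messages from sufficiently prolific authors.
    return [m for m in geotagged if counts[m["author"]] >= min_per_author]
```

Feed it a week of messages and you get back only the ones with coordinates from people who Tweeted a lot, which is roughly the pool the quoted passage describes.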
Byron Spice, media guy for the university, assures us we could very well be an exception to the rule.
“It's not like, gee, everybody in L.A. uses these terms, but if you look at the map [pictured below], you can see, based on the colors — you can see outliers,” he says.
Spice explains: “These are all items, topics, terms that are more commonly seen in L.A., as opposed to New York or Boston or something. If you see this in a Tweet, you would start to think, this might be somebody in L.A. or from L.A.”
Apparently, when an Angeleno wants to say “very,” we use af (that's “as fuck” to you, grandma). When a New Yorker, on the other hand, wants to say “very,” they use deadass. Where we use u, they use youu. Other common New York 'net slang includes cab and oww.
Fascinating stuff, right? But what does it all mean?
According to Jacob Eisenstein, one of the researchers: “One thing I think that it shows is that people really have a need to communicate their identity — their cultural identity and their geographic identity in social media.”
Some more awesome linguistic hypotheses from the university's press release:
Studies of regional dialects traditionally have been based primarily on oral interviews, Eisenstein said, noting that written communication often is less reflective of regional influences because writing, even in blogs, tends to be formal and thus homogenized. But Twitter offers a new way of studying regional lexicon, he explained, because tweets are informal and conversational. Furthermore, people who tweet using mobile phones have the option of geotagging their messages with GPS coordinates.