As I’ve been working on my digital essay, one thing I’ve been struggling with is how to show what the #tweetku community, or any hashtag community, looks like without forcing readers to go experience it for themselves.
I could try to depict this hashtag public by saying it has X contributors, or Y tweets between Date A and Date B, but that doesn’t feel especially persuasive or meaningful. So a portion of my research has been spent trying to figure out the best ways to incorporate and present quantitative data from social media sites.
The digital essay I want to contribute this week is a report on Twitter data visualization by Marc A. Smith, Lee Rainie, Ben Shneiderman, and Itai Himelboim, entitled, “Mapping Twitter Topic Networks: From Polarized Crowds to Community Clusters.”
A quick summary: in an effort to better understand how political conversations happen on Twitter, the authors used the software NodeXL to analyze many different conversations visually and recognize patterns:
Our approach combines analysis of the size and structure of the network and its sub-groups with analysis of the words, hashtags and URLs people use. Each person who contributes to a Twitter conversation is located in a specific position in the web of relationships among all participants in the conversation. Some people occupy rare positions in the network that suggest that they have special importance and power in the conversation.
What they found was that, through visual analysis, they were able to recognize at least six different kinds of network crowds.
- The Polarized Crowd, which features “two big and dense groups that have little connection between them.”
- The Tight Crowd, which is “highly interconnected.”
- The Brand Cluster, which is made up of similar topic-driven commentary from many disconnected participants.
- The Community Cluster, in which a popular topic has devolved into several, separate hubs of communication.
- The Broadcast Cluster, in which “many people repeat what prominent news and media organizations tweet.”
- The Support Network, which has a similar premise to the Broadcast Cluster except that the organization at the center or hub of the conversation is also replying and responding to many of its disconnected users (think of big business Twitter accounts that try to solve issues for their clients via Twitter).
This report has been extremely helpful for me in at least two ways. First, it gives me a set list of community types to compare #tweetku and other hashtag publics against, as well as ways of discussing the implications of being one of these community types. The #tweetku hashtag public, tiny as it is, is definitely a “Tight Crowd”: its members are highly interconnected, use the same or similar hashtags, respond to one another, and include very few isolates.
Second, this report gives me a better sense of methodology and resources. Now that I know this kind of visualization is possible, I will be able to do it for myself. Unfortunately, NodeXL is only available on Windows, so until I drag myself to the library for a day, I won’t be able to use that software.
Luckily, I’ve been able to find similar, if not quite as intensive, resources online. Using ScraperWiki, I’ve been able to get a lot of data–information about all of @TheTweetku’s followers, and information about every single tweet that includes the hashtag #tweetku or #tweetkuchallenge since April 22nd (unfortunately, it won’t let me look back farther than that).
With that information downloaded as a spreadsheet, I can then use Google Refine to clean up the data: pull out each @mention and #hashtag individually, cluster the locations together as much as possible, and export a new spreadsheet with only the necessary data.
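For anyone who would rather script the @mention and #hashtag step than do it in a spreadsheet tool, here is a minimal sketch in Python. It is not how Google Refine does it; the regular expressions, the function name `extract_entities`, and the sample tweet are all my own assumptions for illustration, and real tweets have edge cases (email addresses, punctuation inside tags) that this ignores.

```python
import re

# Assumed patterns: a handle or tag is "@" or "#" followed by letters,
# digits, or underscores. Real tweets have edge cases this ignores.
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_entities(text):
    """Return (mentions, hashtags) found in one tweet, lowercased."""
    mentions = [m.lower() for m in MENTION_RE.findall(text)]
    hashtags = [h.lower() for h in HASHTAG_RE.findall(text)]
    return mentions, hashtags


# Hypothetical tweet text, made up for illustration.
sample = "Rain on the window / a #tweetku for @TheTweetku #TweetkuChallenge"
mentions, hashtags = extract_entities(sample)
print(mentions)
print(hashtags)
```

Lowercasing everything means that #Tweetku and #tweetku are counted as the same tag, which matches how Twitter treats hashtags and handles.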
Once I have that, I can use Google Fusion Tables, Raw, or Gephi to create a visual display of the data. Here’s what I’ve come up with so far:
- A map of @TheTweetku followers
- A look at the #tweetku community: the #hashtags (yellow) and @users (blue) involved, sized by frequency (this is the visualization most similar to the ones in the original report; see the “Tight Crowd” interconnectedness?)
- A list of #hashtag publics associated with #tweetku, arranged by frequency
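The numbers behind visualizations like these can be roughed out in a few lines of Python. The sketch below counts hashtag frequency and then computes two simple network measures, density and isolated users, as one possible way to put a number on “Tight Crowd” interconnectedness. Using density this way is my own assumption, not the report’s classification method, and the four sample rows are invented rather than real #tweetku data.

```python
from collections import Counter

# Hypothetical, already-cleaned rows: (author, mentions, hashtags).
tweets = [
    ("alice", ["bob"], ["tweetku", "haiku"]),
    ("bob", ["alice", "carol"], ["tweetku"]),
    ("carol", ["alice"], ["tweetku", "haiku"]),
    ("dave", [], ["tweetku"]),
]


def hashtag_frequencies(rows):
    """Return (hashtag, count) pairs, most frequent first."""
    counts = Counter()
    for _author, _mentions, hashtags in rows:
        counts.update(hashtags)
    return counts.most_common()


def mention_edges(rows):
    """Return the set of undirected author/mentioned-user pairs."""
    edges = set()
    for author, mentions, _hashtags in rows:
        for mentioned in mentions:
            if mentioned != author:  # skip self-mentions
                edges.add(frozenset((author, mentioned)))
    return edges


def density_and_isolates(rows):
    """Density = actual ties / possible ties; isolates have no ties."""
    edges = mention_edges(rows)
    users = {author for author, _m, _h in rows}
    users |= {m for _a, mentions, _h in rows for m in mentions}
    possible = len(users) * (len(users) - 1) / 2
    density = len(edges) / possible if possible else 0.0
    connected = set().union(*edges) if edges else set()
    return density, sorted(users - connected)


print(hashtag_frequencies(tweets))
print(density_and_isolates(tweets))
```

A density close to 1 with few isolates would point toward a tightly connected group; a low density with many isolates would look more like the disconnected participants of a Brand Cluster.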
Although my attempts at data visualization aren’t anywhere near as grand as those in the report I’ve shared, I wouldn’t have known how to do them, much less that they were possible, without having read it. That’s a reason worth sharing, if nothing else.