Can Twitter be a good data mining tool?

Twitter has come a long way from its humble beginnings as a simple microblogging service. As people’s needs grow, we often find innovative applications for existing tools and services, and Twitter is no exception. For instance, when I first saw my Twitter friends exchanging messages as if they were in a public chatroom, I was taken aback. But now it’s the norm, and the system even supports message exchanges outright, whether private or for public consumption.

Twitter still has a lot of potentially big uses. For instance, people have been treating the trending topics (popular mentions of keywords and #hashtags) as a gauge of which discussions are currently popular. And now there’s a proposal to use a new syntax to insert other metadata into your Tweets. Zembly founders Todd Fast and Jiri Kopsa have proposed this on

Twitter Data is a simple, open, semi-structured data representation format for embedding machine-readable, yet human-friendly, data in Twitter messages. This data can then be transmitted, received, and interpreted in real time to enable powerful new kinds of applications to be built on the Twitter platform.

Here is an example Twitter Data message:

I love the #twitterdata proposal! $vote +1

In this message, the variable $vote, which sits under the subject #twitterdata, gains one point. The idea is to use Twitter as a vehicle for data that other applications can then mine (via Search or the API).
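To make the idea concrete, here is a rough sketch of how an application might pull the data out of the example message above. It is not the official Twitter Data parser: the grammar it assumes (a $name token followed by one whitespace-delimited value) is a simplification of my own, and the function name parse_message is made up for illustration.

```python
import re

# Assumed grammar (a simplification, not taken from the Twitter Data spec):
# a "$name" token followed by a single whitespace-delimited value.
DATA_PAIR = re.compile(r"\$(\w+)\s+(\S+)")
HASHTAG = re.compile(r"#(\w+)")


def parse_message(text):
    """Return the hashtags and $name/value pairs found in one message."""
    return {
        "hashtags": HASHTAG.findall(text),
        "data": dict(DATA_PAIR.findall(text)),
    }


parsed = parse_message("I love the #twitterdata proposal! $vote +1")
print(parsed)
```

The rest of the message stays ordinary text, so a human reader still sees a normal tweet while a program only looks at the # and $ tokens.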

By proposing an embedded data format, our goal is not to turn Twitter into a mere transport layer for machine-readable data, but instead to allow semi-structured data to be mixed fluidly with normal message content. To these ends, we have chosen a syntax that conceptually resembles the use of Twitter hashtags, albeit with different syntax and semantics, and which allows humans to interact with data in a reasonably normal way.

To see an example of this at work, is displaying a widget that shows the “votes” cast for the idea, both affirmative and negative.

I think there is merit to the idea. Twitter itself doesn’t necessarily have to have this functionality built in. But if such a syntax becomes popular, third-party apps will be able to mine raw data from tweets more easily.
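As a hedged illustration of what such a third-party app could do, the sketch below tallies affirmative and negative votes per hashtag across a handful of messages. Only the first sample message comes from the proposal; the others are invented for the example. The tally_votes function and the rule that a vote counts toward every hashtag in the same message are my own assumptions, not part of the proposal.

```python
import re
from collections import defaultdict

# Assumption: a vote is written as "$vote" followed by a signed integer.
VOTE = re.compile(r"\$vote\s+([+-]\d+)")
HASHTAG = re.compile(r"#(\w+)")


def tally_votes(messages):
    """Count up and down votes per hashtag, ignoring messages with no vote."""
    totals = defaultdict(lambda: {"up": 0, "down": 0})
    for text in messages:
        vote = VOTE.search(text)
        if vote is None:
            continue  # an ordinary tweet carrying no vote data
        value = int(vote.group(1))
        key = "up" if value > 0 else "down"
        for tag in HASHTAG.findall(text):
            # Hashtags are treated as case-insensitive subjects.
            totals[tag.lower()][key] += abs(value)
    return dict(totals)


# Made-up sample data, apart from the first message.
sample = [
    "I love the #twitterdata proposal! $vote +1",
    "Not convinced by #twitterdata yet. $vote -1",
    "Count me in for #TwitterData $vote +1",
    "Just a normal tweet about #twitterdata",
]
result = tally_votes(sample)
print(result)
```

In practice the messages would come from Search or the API rather than a hard-coded list, but the counting step would look much the same.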

Do you think this is a good idea? Or is there already an existing mechanism for doing exactly this without the need for introducing a new syntax or format?

13 thoughts on “Can Twitter be a good data mining tool?”

  1. Twitter might become one of the best tools for finding out what is happening in the stream of consciousness conversations that are happening on twitter, with some powerful tools to help you work it all out.

  2. I would agree, because the information available is instant, and if used properly we can transform this into a real-time data mining tool. The great thing about Twitter is the accessibility of real-time information, available to the whole world at the moment it happens.

  3. I have not caught the Twitter Fever, but for those I know who have, it seems like a great networking and informational site. I plan on getting aboard the Twitter Train.
    Jd Webb

  4. Twitter is still very young. It’s really hard to say where they will be 5 years from now. Of course, I am hoping for the best, since I LOVE Twitter so much

  5. Yes, Twitter in fact has a lot of followers itself, and the latest big fish is Barack Obama. He has his own profile there. Check it out!
