January 20, 2011

Analytics with Twitter Data

Twitter is currently one of the largest "data producers" on the Web. I am not sure of the exact amount of storage that tweets require on a daily basis, but a few TBs would not surprise me; add to that the spikes in volume when there is a controversy or some big event happening. All of this produces interesting data that needs to be deciphered, and it also opens the door to some awesome research work that could help manage the data efficiently for users and engage them more with Twitter.

When I was looking for possible features that I might actively use, I could actually list quite a few. I am pretty sure that the Product Managers at Twitter have some of these features on their TODO list, but it would be interesting to see when they actually get implemented, or to learn the rationale behind not implementing them.

1. Users who follow you, but whom you don't follow.
2. Users whom you follow, but who don't follow you back.
3. Notification when a user stops following you. I need to research why Twitter does not have this. Was it by design?
4. Trend analysis of users who follow/unfollow you, based on the tweets that you post.
5. Show the most active users and lazy users. Active and lazy are defined by the number of tweets and also the popularity of the tweets. Popularity can also be measured by how much discussion a tweet generates, or how many retweets that tweet gets.
6. Automatic lists and follow suggestions: when we follow a user, Twitter can suggest which list would be the most likely fit for that user based on their tweet patterns. The present suggestion scheme is not very powerful and needs some tweaking.
7. Discover clusters/groups among your followers. Centrality of users: show a graph in which these relationships can be displayed.
8. Decipher moods/sentiments from the tweets, or apply other natural language processing techniques to the tweets to gather interesting patterns or insights.
9. Usage analysis
  a. Based on the hour of the day, we can find out whether people tweet more often during mornings or evenings.
  b. Do people prefer the web or mobile devices for tweeting? What % of people use other apps?
  c. Who retweets you often? Or what category of your tweets gets retweeted often or generates the most discussion?
10. Most famous tweets for the day/week/month - based on retweets, follow-up discussions, celebrity status of the tweeter, number of followers.
11. Duplicate detection of tweets. Also, automatic collapsing of tweets that fall in the same thread. This would help a lot in reducing the information clutter.
12. What is the similarity between two users, based on the nature of their tweets? A corollary would be: what topics/categories does a user often tweet on?
13. Better trend analysis.
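
Items 1 to 3 in the list above boil down to simple set operations once you have the follower and following lists in hand. Here is a minimal sketch in Python, assuming those lists have already been fetched somehow (for item 3, as two snapshots taken at different times). The function names and the sample handles are made up for illustration and are not part of any Twitter API.

```python
def follow_gaps(followers, following):
    """Compare who follows you with whom you follow.

    Returns a pair of sets:
      - users who follow you but whom you don't follow (item 1)
      - users whom you follow but who don't follow you back (item 2)
    """
    followers, following = set(followers), set(following)
    not_followed_back_by_you = followers - following
    not_following_you_back = following - followers
    return not_followed_back_by_you, not_following_you_back


def unfollowers(previous_followers, current_followers):
    """Users present in the earlier snapshot but missing from the later one (item 3)."""
    return set(previous_followers) - set(current_followers)


# Hypothetical sample data; real handles would come from wherever you store the lists.
followers_yesterday = {"alice", "bob", "carol"}
followers_today = {"alice", "carol", "dave"}
following = {"alice", "erin"}

fans, unreciprocated = follow_gaps(followers_today, following)
lost = unfollowers(followers_yesterday, followers_today)

print(sorted(fans))            # ['carol', 'dave']
print(sorted(unreciprocated))  # ['erin']
print(sorted(lost))            # ['bob']
```

The snapshot approach only notices an unfollow the next time the list is fetched, so how quickly a notification could go out would depend on how often the snapshots are taken.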