December 16, 2010

Mozilla Open Data Visualization Competition

The following are the results of my analysis of the data from the Mozilla Open Data Visualization Fall 2010 contest. For my analysis I downloaded witl_small.tar.gz (from "A Week in the Life of a Browser - Version 2"), which contains a sample of the data.

I did not download the full set, as my bandwidth has been poor for some time and I was also running out of time. (The queries can be run on the full data, though.) Mozilla has provided many attributes related to the various activities in Firefox (and there are SO MANY of them!). Since I stumbled on this contest pretty late in the game, I was unable to analyse all the attributes/dimensions. I preferred tackling a few questions in good detail over analysing many dimensions without much depth.

So my analysis consists of the following 4 visualizations which try to answer 4 different questions.

Tools used: Protovis, Highcharts, Python, SQLite3 (Excel was used for preliminary analysis/data cleansing)

1) What is the Web usage pattern of people of different age groups?
In other words, what is the average number of hours spent on the Web daily by someone who is, say, 30 years old?
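
A question like this comes down to a GROUP BY over the survey answers. Here is a minimal sketch in Python with SQLite3, assuming a hypothetical table `survey` with columns `age_group` and `hours_per_day`. These names and the rows are made up for illustration; the real data set uses its own schema.

```python
import sqlite3

# Hypothetical schema (an assumption, not the real data set layout):
# one row per user with an age bracket and daily hours on the Web.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE survey (user_id INTEGER, age_group TEXT, hours_per_day REAL)"
)

# Made-up rows, only there to exercise the query.
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?)",
    [
        (1, "18-25", 4.0),
        (2, "18-25", 6.0),
        (3, "26-35", 3.0),
        (4, "26-35", 5.0),
        (5, "36-45", 2.0),
    ],
)

# Average daily hours for each age group.
avg_hours = conn.execute(
    "SELECT age_group, AVG(hours_per_day) "
    "FROM survey GROUP BY age_group ORDER BY age_group"
).fetchall()

for age_group, hours in avg_hours:
    print(age_group, hours)
```

The same query should work unchanged on the full data set once the table and column names are swapped for the real ones.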

2) Is there a correlation between the number of years someone has been using Firefox and the number of hours they spend on the Web daily?
In other words, do people who have used Firefox for 3-5 years or more spend more hours on the Web daily?
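
One way to put a number on this is a Pearson correlation between a tenure rank and the daily hours. The sketch below assumes the tenure brackets are coded as ordinal ranks (0 = shortest bracket), and the values are made up. It is not necessarily how the chart here was produced.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up sample: tenure bracket coded as a rank, and daily hours on the Web.
tenure_rank = [0, 1, 2, 3, 4]
hours_per_day = [2.0, 3.0, 3.0, 5.0, 6.0]

r = pearson(tenure_rank, hours_per_day)
print(round(r, 3))
```

A value near +1 would mean longer-time Firefox users tend to report more hours; a value near 0 would mean no linear relationship.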

3) What kind of bookmark activity do people who have used Firefox for a number of years engage in? (We analyse *only* those who use any of the bookmark features.)
i.e., how is the activity spread among the bookmarking operations (creating/choosing/modifying)?

(In the above chart, you will find the 3-6m column empty. The reason is that there was no data for this group in the sample - I hope that it is present in the full data set.)
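
To compare groups of different sizes, the raw counts per bookmark operation can be turned into percentages of each group's total. A small sketch with made-up counts follows; the operation names and the "1-2y" bracket are assumptions for illustration. A group with no data, like 3-6m in the sample, comes out as all zeros.

```python
def operation_shares(counts):
    """Convert raw per-operation counts to percentages of the group's total."""
    total = sum(counts.values())
    if total == 0:
        # No events recorded for this group: report an empty column.
        return {op: 0.0 for op in counts}
    return {op: 100.0 * n / total for op, n in counts.items()}


# Made-up counts per tenure bracket (not taken from the contest data).
counts_by_tenure = {
    "3-6m": {"create": 0, "choose": 0, "modify": 0},
    "1-2y": {"create": 30, "choose": 60, "modify": 10},
}

shares = {
    tenure: operation_shares(counts)
    for tenure, counts in counts_by_tenure.items()
}

for tenure, share in shares.items():
    print(tenure, share)
```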

4) How do different age groups use the various features of Firefox?
Note: this chart is to be read vertically - i.e., for a given feature, let's say Private Mode, which age group uses this feature most often? On viewing the Privatemode column, you will find that the age group 18-25 has the darkest color, which means that this is the age group which uses the feature most often. Hence, the color gradient from lightest to darkest encodes least to most often used.
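
Reading the chart vertically works when each column is scaled on its own, so a rarely used feature still gets a full light-to-dark range. The sketch below uses min-max scaling per column, which is my assumption about a reasonable encoding rather than the exact formula behind the chart. The usage counts and feature keys are made up.

```python
def normalize_columns(rows, features):
    """Scale each feature column to the range 0..1 across age groups."""
    scaled = {group: {} for group in rows}
    for feature in features:
        column = [rows[group][feature] for group in rows]
        lowest, highest = min(column), max(column)
        span = highest - lowest
        for group in rows:
            # A constant column has no gradient, so map it to 0.0.
            value = (rows[group][feature] - lowest) / span if span else 0.0
            scaled[group][feature] = value
    return scaled


# Made-up usage counts per age group (hypothetical feature keys).
usage = {
    "18-25": {"private_mode": 40, "bookmarks": 10},
    "26-35": {"private_mode": 20, "bookmarks": 30},
    "36-45": {"private_mode": 10, "bookmarks": 20},
}

shade = normalize_columns(usage, ["private_mode", "bookmarks"])

# The darkest cell in a column is the age group with the highest value.
darkest = max(shade, key=lambda group: shade[group]["private_mode"])
print(darkest, shade[darkest]["private_mode"])
```

Here 1.0 maps to the darkest color and 0.0 to the lightest, separately for each feature.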

[I have deliberately avoided explaining the interpretations and understandings, as I believe that the numbers speak for themselves. However, I am happy to clarify any doubts about the charts.]

My 2nd entry to this competition can be found here.
My 3rd entry to this competition can be found here.