May 31, 2014

Visualizing - Executed Offenders in Texas

I happened to stumble on this thread on HN, which was trending. The thread is about offenders executed in Texas since 1982. The list has 515 people in total, and the sheer number of executions over the last few decades, in a single US state at that, is what surprised me. My personal interest in tracking crime in India (viz coming soon) led me to do a quick analysis of this data, and some of the numbers were interesting (not to mention the 515 itself!).

Note that this is purely an experimental visualization and I do not intend to play with the numbers of the dead. Also, I would highly recommend that readers of this post read the comments in the original HN thread for some very interesting viewpoints.

Now off to some charts... I hope all of them are self-explanatory and need no commentary.

Number of people executed in the Age Group

People Executed by Year

Top-10 Counties with maximum executions

Tag Cloud of the Last Names of the executed people

Executions by Race (Note: I do not know why the data contained this facet!)

May 21, 2014

Visualizing Funding of Companies in India

It all started with me trying to understand the funding scene in India and how companies are getting funded - at which stages, how much, from where, and who the primary investors are. With this, the hunt for data started and culminated in the Crunchbase Exports (as of 1-Apr-2014). The data was well structured, but OpenRefine was used to clean it up - city names with typos and different letter cases were clustered into simple buckets. Other than this, no manipulations were done. The data for India mainly starts from Jan-2005 (and ends at Mar-2014), and there are a total of 1150 records containing various details of the investments made. I do not think this is an exhaustive list, but it was a good start for looking into the subject and getting answers to some of my questions.
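The clean-up step can be illustrated with a minimal sketch of key-collision clustering, in the spirit of the fingerprint method that OpenRefine offers. This is not the exact procedure used for the post: the function names and the sample city values below are hypothetical, and this simple key only merges differences in case, spacing, punctuation and word order. Genuine typos would need a nearest-neighbour (edit distance) method instead.

```python
import string
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Build a normalised key: trim, lowercase, drop punctuation, sort unique tokens."""
    cleaned = name.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(names):
    """Group raw spellings that collapse to the same fingerprint key."""
    buckets = defaultdict(set)
    for name in names:
        buckets[fingerprint(name)].add(name)
    return dict(buckets)


# Hypothetical raw city values, not taken from the actual Crunchbase export.
raw_cities = ["Bangalore", "bangalore ", "BANGALORE", "New Delhi", "new delhi", "Delhi, New"]
clusters = cluster(raw_cities)

for key, variants in sorted(clusters.items()):
    print(key, "<-", sorted(variants))
```

Each bucket can then be mapped to one canonical spelling before counting companies per city.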

A lot of cool visualizations can be done to capture the various insights from the data, but I think histograms do a pretty good job and are readable by the vast majority. Let's proceed...

The first step was to understand the spread of companies across cities and, without a second thought, Bangalore simply wins with the maximum number of companies. A slightly distant second is Delhi (this includes Noida, Gurgaon etc. - if the details matter to the reader).

It is equally important to know the funding obtained across cities, and here too Bangalore wins with 4.6B$, with Delhi a close second at 4.4B$.

Fortunately, Crunchbase contains the funding type and other associated details. The spread of funding, that is, the number of rounds of each funding type, was interesting to see and followed the expected pattern of Angel rounds being in the top slots.

But it is important to know how much money these different funding types bring to the table, and here the pattern is reversed, with Angel rounds dropping out of the top slots. I think it would be extremely useful if Angel funding occupied the top slots - this would signify that the startup ecosystem has no dearth of money and that many startups are benefiting from angels. It should be understood, though, that the quantum of money involved in a single round of Series-B (and above) is substantially larger and is not to be compared with that of Angel rounds.

The following chart would be useful as it superimposes the number of companies with a particular funding and the sum of the money raised in that type.

Probably, the following chart would best show the point above. It shows the average and the maximum amount of money involved in each funding type. Series-C+ has an average of 45M$ and a maximum of 200M$ (Tokyo's SoftBank investing in InMobi), whereas Private-Equity has an average of 49M$ with a maximum of 300M$ (USA's Quadrangle Group in Tower Vision).
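The count, total, average and maximum per funding type discussed above can be computed with a small grouping routine. This is only a sketch under assumptions: the `rounds` records and their amounts are made up for illustration and are not from the Crunchbase data, and `summarise` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical rounds as (funding_type, amount_usd); illustrative values only.
rounds = [
    ("angel", 500_000),
    ("angel", 250_000),
    ("angel", 750_000),
    ("series-a", 5_000_000),
    ("series-c+", 40_000_000),
    ("series-c+", 200_000_000),
]


def summarise(rounds):
    """Return count, total, average and max amount for each funding type."""
    grouped = defaultdict(list)
    for funding_type, amount in rounds:
        grouped[funding_type].append(amount)
    return {
        funding_type: {
            "count": len(amounts),
            "total": sum(amounts),
            "average": sum(amounts) / len(amounts),
            "max": max(amounts),
        }
        for funding_type, amounts in grouped.items()
    }


summary = summarise(rounds)

# Ranking by number of rounds and ranking by money raised can disagree.
by_count = sorted(summary, key=lambda t: summary[t]["count"], reverse=True)
by_total = sorted(summary, key=lambda t: summary[t]["total"], reverse=True)
print("by count:", by_count)
print("by total:", by_total)
```

With these made-up numbers, the angel type leads when ranked by count but comes last when ranked by total, which is the same reversal the charts describe.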

The top investors are listed in the viz below, with Tiger Global Management (TGM) in the first slot with 718M$.

But the above number starts making more sense when the reader knows that TGM has invested in only 10 rounds, whereas IDG Ventures has invested in 48 rounds.

Just over half of the investments (7.5B$ out of the 14B$ invested since 2005) came from the USA, with India itself coming second and many other developed nations occupying the tail.

With the above, it is an added bonus if we know when the investments came in and whether this has any bearing. Though I have not yet correlated when the investments came in (i.e. which quarter) with the eventual success of the company, it is interesting to observe the pattern in the following chart. Q1 is clearly the winner, with the maximum funding and also the most companies getting it.
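Bucketing rounds by quarter can be sketched as follows. The post does not say whether the quarters are calendar or fiscal, so this sketch assumes calendar quarters (Q1 = Jan to Mar); the company names, dates and amounts are hypothetical.

```python
from collections import defaultdict
from datetime import date


def quarter(d: date) -> int:
    """Calendar quarter of a date: Jan-Mar -> 1, ..., Oct-Dec -> 4 (an assumption)."""
    return (d.month - 1) // 3 + 1


# Hypothetical rounds as (company, funded_on, amount_usd); illustrative values only.
rounds = [
    ("Company A", date(2013, 1, 15), 2_000_000),
    ("Company B", date(2013, 3, 2), 10_000_000),
    ("Company A", date(2013, 8, 20), 5_000_000),
    ("Company C", date(2012, 11, 5), 1_000_000),
]


def by_quarter(rounds):
    """Map each quarter to (total funding, number of distinct companies funded)."""
    totals = defaultdict(int)
    companies = defaultdict(set)
    for company, funded_on, amount in rounds:
        q = quarter(funded_on)
        totals[q] += amount
        companies[q].add(company)
    return {q: (totals[q], len(companies[q])) for q in sorted(totals)}


result = by_quarter(rounds)
for q, (total, n_companies) in result.items():
    print(f"Q{q}: {total} USD across {n_companies} companies")
```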

And finally, if that was a histogram (bar-chart) overdose, let's use a Sankey Diagram to visualize the money coming from different countries and flowing into companies situated in different cities in India. The graph is actually interactive: the width of each arc shows the amount of money involved in the funding round, and clicking on an arc takes you to the details. But for the sake of this blog post, a screenshot of it should probably end this analysis.
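The data behind such a diagram is just a list of weighted links. A minimal sketch of preparing it, assuming the rounds are summed per (country, city) pair; the original interactive graph may instead keep one arc per funding round. The records and the `sankey_links` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical records as (investor_country, company_city, amount_usd).
investments = [
    ("USA", "Bangalore", 20_000_000),
    ("USA", "Bangalore", 5_000_000),
    ("USA", "Delhi", 10_000_000),
    ("India", "Bangalore", 3_000_000),
    ("Japan", "Bangalore", 200_000_000),
]


def sankey_links(investments):
    """Sum amounts per (country, city) pair; each sum becomes one link's width."""
    flows = defaultdict(int)
    for country, city, amount in investments:
        flows[(country, city)] += amount
    return [
        {"source": country, "target": city, "value": value}
        for (country, city), value in sorted(flows.items())
    ]


links = sankey_links(investments)
for link in links:
    print(link["source"], "->", link["target"], link["value"])
```

A list of `source`, `target` and `value` entries like this is the usual shape of input for Sankey plotting tools, though the exact field names depend on the tool.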


May 15, 2014

Elections - Lok Sabha 2014 Analysis : Trivia

Word Cloud of Candidate's Family/Last Names

Word Cloud of Candidate's First Names

Word Cloud of the Political Party Names

Youngest Candidate:
Ravikant Yadav. IND. 21 years. JAUNPUR, UTTAR PRADESH.
Oldest Candidate:
Ram Sundar Das. JDU. 93 years. HAJIPUR, BIHAR.

Youngest Crorepati:
Farooq Khan. BSP. 25 years. JAIPUR, RAJASTHAN.
Oldest Crorepati:
Lal Krishna Advani. BJP. 86 years. GANDHINAGAR, GUJARAT.

Constituencies with Max Candidates  : 
Constituencies with Least Candidates : 
Candidate with the Longest Name:
Venkata Swetha Chalapathi Kumara Krishna Rangarao Ravu [ 54 Years old. YSRCP. Vizianagaram, Andhra Pradesh ]

Elections - Lok Sabha 2014 Analysis : Criminal Cases

Total Number of Candidates : 8234

Number of Candidates with Criminal Cases : 1398
Assets Held by Candidates with Criminal Cases 

Rs. 10,734 crores

Number of Convicted Candidates : 29

Assets Held by Convicted Candidates : 

Rs. 112 crores

Top-10 Candidates with cases against them (party and education mentioned along)

Cases vs Party
Clean and Accused Candidates in Parties

Percentage of Candidates who have cases Pending against them across Parties

Convicted Cases across Parties

Top-10 States with maximum number of Cases
Top Constituencies with Max Cases (Kanyakumari and Thuthukudi, the top 2 with 350+ cases, from candidates belonging to AAP, have been removed for easier readability)

Cases vs Education of Candidates

Gender and Age vs Cases

May 14, 2014

Elections - Lok Sabha 2014 Analysis: Money Power

Number of Candidates: 8,234

Total Assets Declared

Rs. 40,300 crores or Rs. 403 Billion

Total Liability

Rs. 3,255 crores or about Rs. 33 Billion

Number of Candidates by Age-Group

Assets Declared by Age Group

Top-10 Richest Candidates
Spread of Assets by Education
Assets Declared by Gender and Age-Group

Assets Declared by Gender

Assets by Party

Top-10 States by Assets Declared

May 12, 2014

Book Review : Everything I Ever Needed to Know about Economics I Learned from Online Dating by Paul Oyer

[This article was published in The Hindu Business Line on 12-May-2014]

When apples cannot be compared with oranges, is it pragmatic to compare and learn about economics from the world of online dating? Can the adventure of finding a prospective spouse online follow the trends of supply and demand? Does the process of buying and selling have any correlation with the way you choose your date? How do you think a book could be structured where there is no talk or special treatment of commodities, and economics is explained in a context where no money is being transferred?

"Everything I Ever Needed to Know about Economics I Learned from Online Dating" is the result of what happens when an economist and professor, who teaches and sees economics everywhere, sits down and correlates his experiences and observations with the world of online dating. Paul Oyer, the author of the book, is a Professor of Economics at Stanford Graduate School of Business and has around two decades of experience in teaching economics. The book, published by the Harvard Business Review Press, is divided into ten chapters in which the author reflects upon different facets of online dating and economics in simple terms. Key microeconomic concepts like search, signaling, adverse selection, cheap talk, statistical discrimination, thick markets and network externalities, along with apt storytelling, make it a mellifluous read.

At the very beginning of the book, the author is careful to assert that the partner market is one where both sides have to settle for each other, just as in the job market. It is different from grocery shopping, where the groceries don't have to love you back. Lying, the non-cooperative part of game theory, coupled with exaggeration, is often seen on dating websites. Details like appearance (especially height and weight) and income often deviate from the truth, as everyone tries to seem as attractive as possible and not fatter, poorer or uglier. When talk is cheap and profile inflators have incentives to tell lies, would you lie to beat your competition? Without getting into the morality of this practice or the questionable veracity of the information, the author points out a few examples of offline verification in China and Korea, which generates the right 'signal' for the prospect but is also an expensive process. Companies underpricing their shares at an IPO, to signal quality and make it easier to raise more cash in the future, is one such example of a signal that shows you really mean what you say.

Demand, the most crucial concept in economics, is sometimes driven not by the product but by other users; demand is driven by demand. This phenomenon of network externality is best illustrated by an online dating site, where no one wants to be the lone user. A product has a network externality if one added user makes the product more valuable to other users, very similar to malls and singles bars in the physical world. Hence the size of the market also matters, in what are termed 'thin' and 'thick' markets, where the options available in the market drive the buyers and sellers accordingly. A wider set of opportunities leads to shopping around, and the author introduces us to the sly technique of 'exploding offers', which can lead to some interesting dynamics.

Oyer calls the entire world of online dating a game of hidden information and 'statistical discrimination' - people invariably end up hiding information about themselves, which can lead the reader of their profile to be judgmental, but he is careful to point out that people can act in a manner that hurts members of a certain group even though they have no negative feelings toward that group. He cites real-world examples, from locking your car while driving through a poor neighborhood to racial profiling in airport security, where some sort of statistical discrimination continues to exist and can work in people's favour at times. This is primarily due to statistical relationships and correlations rather than the individuality of the person involved. But before you cringe at the examples, he cautions that the detrimental effects of stereotyping are pervasive and substantial.

The chapter on Positive Assortative Mating is probably one of the most interesting in the book. Citing various research studies, the author states that the 'best' always pair off with the 'best' when it comes to physical attractiveness, income, race, education etc., and that this is essentially a non-random scheme that can be easily ordered. Though we might end up being paired with people who are more like us in terms of characteristics, the author also gives us insights into negative assortative mating, wherein people at opposite ends of a spectrum can be productive for an organization when paired together, as guilt and shame can come into the frame and force high output from even a poorer performer.

The author talks about good looks, education and higher salaries in what could be a controversial penultimate chapter and states quite bluntly that 'it pays to be attractive' as studies show that attractive people end up getting paid better. There are brilliant insights in this chapter on how completing a year of education can actually lead to better pay and how education indeed has a big causal effect on the money that you bring back home.

The author's sense of humour is reflected throughout, especially in the last line of every chapter. His treatment of the subject is highly compassionate, and the light-hearted storytelling makes it a worthwhile read. The lack of a prescription for succeeding in the world of online dating, and anecdotal evidence ranging from online advertising bidding wars to the dynamics of homosexual couples, keep the reader engaged even if the reader is clueless about dating lore. The formatting of the book deserves credit, as it is done in such a way that understanding is not an afterthought while reading. This multi-paradigm style is often seen as a winner in contemporary books, and the author has furthered his analytical repertoire by cross-correlating economics with other societal practices and structures.

Selected Lines:  
When picking a life partner, I don’t get to pick the best one available. I get to pick the best one available who picks me back. In this way, the online dating search process is much more similar to the job search process than it is to the house hunting process.
My partner is truly wonderful. If I kept looking, I could probably do better. But I have to earn a living, make dinner, practice the piano, and do a bunch of other stuff. So I’m going to settle for this person and move on with life. It could certainly be a lot worse.