I came across a pretty interesting stats program in Japan called User Insight. It was recently released by User Local, who also created the Japanese cell phone analytics program ugokuhito.
At first glance, User Insight looked pretty normal. Heat maps, page views, gender analysis... wait, gender analysis? Yes, apparently this product analyzes what percentage of your users are male or female. It also tells you the age ranges of your users (for example: 20% in their teens, 30% in their twenties, etc.).
I don't know of any other stats program that does this (apart from nakanohito and ugokuhito), so the first questions that came to mind were: how are they getting this information, and is it reliable? If it's accurate, this kind of information would be extremely valuable. I would want to know age and gender demographics because they help me build more usable sites, and they are also useful to show potential advertisers so they know who their target audience is.
So how did they do this? After reading the article by ITMedia and a few other blogs, here is my theory.
1) User Local has a list of IP addresses for about 400 known corporations and organizations around Japan (possibly more). Other blogs suggested they used whois information or the Google Maps API, matching an IP's location with the business registered at that address. Either way, they somehow have a list of IPs for many organizations in Japan.
2) In Japan, you can easily buy thick books on corporate demographics at pretty much any bookstore (written either for stockholders or for job seekers).
3) Put the two together and you have a database of about 400 organizations' IPs and their sex-age demographics.
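To make step 3 concrete, here is a minimal sketch of the join, assuming the IP list from step 1 maps each organization to a CIDR block and the handbook data from step 2 is keyed by the same organization name. Everything here is hypothetical: the organization names, the IP blocks (taken from the RFC 5737 documentation ranges, not real networks), the demographic figures, and the `OrgRecord` / `build_database` names. This is just my guess at the shape of the data, not User Local's actual system.

```python
from dataclasses import dataclass
import ipaddress

# Hypothetical step 1 output: organization name -> IP block it owns.
# These are RFC 5737 documentation ranges, not real networks.
ORG_IP_BLOCKS = {
    "ABC Girls' School": "192.0.2.0/24",
    "XYZ Corp": "198.51.100.0/24",
}

# Hypothetical step 2 output: organization name -> demographics
# as transcribed from a corporate handbook (figures are made up).
ORG_DEMOGRAPHICS = {
    "ABC Girls' School": {"female_share": 1.0, "avg_age": 16},
    "XYZ Corp": {"female_share": 0.3, "avg_age": 35},
}


@dataclass(frozen=True)
class OrgRecord:
    name: str
    network: ipaddress.IPv4Network
    female_share: float
    avg_age: float


def build_database(ip_blocks, demographics):
    """Step 3: join the two sources on organization name.

    Organizations that appear in only one source are skipped.
    """
    records = []
    for name, cidr in ip_blocks.items():
        demo = demographics.get(name)
        if demo is None:
            continue  # no handbook data for this IP owner
        records.append(
            OrgRecord(name, ipaddress.ip_network(cidr),
                      demo["female_share"], demo["avg_age"])
        )
    return records


db = build_database(ORG_IP_BLOCKS, ORG_DEMOGRAPHICS)
for rec in db:
    print(rec.name, rec.network, rec.female_share, rec.avg_age)
```

The painful part in real life would not be this code but the data entry behind the two dictionaries.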
So if you get a visit from xx.xx.xxx.xxx and it matches ABC all-girls school, there is a pretty good chance that the visitor is female and a student. Or you might get a visit from company XYZ, where the average employee is 35 and 70% of employees are male. Using this method, you could probably also do industry analysis (e.g., 80% of your visitors come from the IT industry).
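Matching a single visit against that kind of table could look roughly like the sketch below. It uses Python's standard `ipaddress` module to test whether the visitor's IP falls inside an organization's block. The table contents (names, documentation-range IP blocks, shares, ages, industries) and the `match_visitor` helper are all hypothetical illustrations of the theory, and a linear scan is only reasonable for a few hundred entries.

```python
import ipaddress

# Hypothetical lookup table: (IP block, organization, female share, avg age, industry).
# Blocks are RFC 5737 documentation ranges; all figures are invented.
ORG_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "ABC Girls' School", 1.0, 16, "education"),
    (ipaddress.ip_network("198.51.100.0/24"), "XYZ Corp", 0.3, 35, "IT"),
]


def match_visitor(ip_str):
    """Return the demographics of the organization whose block contains ip_str, or None."""
    ip = ipaddress.ip_address(ip_str)
    for network, name, female_share, avg_age, industry in ORG_TABLE:
        if ip in network:
            return {
                "org": name,
                "female_share": female_share,
                "avg_age": avg_age,
                "industry": industry,
            }
    return None  # most visitors end up here: no known organization


print(match_visitor("192.0.2.45"))
print(match_visitor("203.0.113.7"))
```

Carrying an industry field alongside the demographics is what would make the industry breakdown possible with the same lookup.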
Obviously, only a small percentage of visitors would match the database, but this still gives you a way to estimate gender-age demographics based on probability. It's sort of like TV ratings or Alexa, which extrapolate from a small sample of the population.
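Here is a rough sketch of that extrapolation, under the big assumption that the matched visitors are representative of everyone else: average each matched organization's female share and age distribution over the matched visits, and report how many visits matched at all. The `estimate_demographics` function, the age brackets, and the sample numbers are hypothetical; I have no idea how User Local actually weights or corrects its sample.

```python
from collections import defaultdict


def estimate_demographics(visits):
    """Estimate site-wide demographics from the visits that matched a known org.

    Each entry in `visits` is None (no match) or a dict with a "female_share"
    float and an "age_dist" dict mapping age bracket -> share of that org.
    Unmatched visits are ignored, i.e. matched visitors are assumed to be
    representative of all visitors (the same leap a ratings panel makes).
    """
    matched = [v for v in visits if v is not None]
    if not matched:
        return None
    n = len(matched)
    female = sum(v["female_share"] for v in matched) / n
    ages = defaultdict(float)
    for v in matched:
        for bracket, share in v["age_dist"].items():
            ages[bracket] += share / n  # each matched visit carries equal weight
    return {
        "female_share": female,
        "male_share": 1 - female,
        "age_dist": dict(ages),
        "coverage": n / len(visits),  # fraction of visits that matched at all
    }


# Invented sample: 10 visits, 1 from the school, 2 from XYZ Corp, 7 unmatched.
school = {"female_share": 1.0, "age_dist": {"teens": 1.0}}
xyz = {"female_share": 0.3, "age_dist": {"20s": 0.3, "30s": 0.4, "40s": 0.3}}
visits = [school, xyz, xyz] + [None] * 7

result = estimate_demographics(visits)
print(round(result["female_share"], 3), round(result["coverage"], 2))
print({k: round(v, 3) for k, v in result["age_dist"].items()})
```

The coverage number is the honest part: with only 30% of visits matched, the other 70% are simply assumed to look the same.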
If this is their method, my sympathies go to the poor intern who did the data entry. Nonetheless, I thought it was pretty impressive and wanted to share. If anyone has other ideas on how they did this, I would love to hear them.
I know MSN runs something similar, but I think that is based on their massive user base.