TechHui

Hawaiʻi's Technology Community

Global Data Inflation and Sexy Statisticians

During an insanely bumpy plane flight today I read an excellent article in The Economist on data inflation and its implications. Apparently, the amount of data generated globally is "growing at a terrific rate (a compound annual 60%)." IDC completed a study last year that estimated the world generated 1.2 zettabytes of data in 2008. A zettabyte is a trillion gigabytes.
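
For a rough sense of what a compound annual 60% means, here is a minimal Python sketch. It starts from the 1.2 zettabyte figure for 2008 quoted above and assumes the 60% rate simply holds constant, which is a simplification for illustration and not something the article claims. The function names are made up for this example.

```python
import math

GB_PER_ZB = 10**12  # a zettabyte is a trillion gigabytes (decimal units)


def projected_zettabytes(base_zb, annual_rate, years):
    """Compound base_zb forward by annual_rate for the given number of years."""
    return base_zb * (1 + annual_rate) ** years


def doubling_time_years(annual_rate):
    """Years needed for the volume to double at a constant compound rate."""
    return math.log(2) / math.log(1 + annual_rate)


if __name__ == "__main__":
    base_zb = 1.2  # 2008 estimate quoted in the post
    rate = 0.60    # compound annual growth quoted in the post (assumed constant)
    for years in range(0, 6):
        zb = projected_zettabytes(base_zb, rate, years)
        print(f"{2008 + years}: {zb:.2f} ZB ({zb * GB_PER_ZB:.3e} GB)")
    print(f"Doubling time: {doubling_time_years(rate):.2f} years")
```

Under that constant-rate assumption the volume doubles roughly every year and a half.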


Some interesting data points:

  • Experiments at the Large Hadron Collider generate 40 terabytes of data per second
  • By 2013 the amount of traffic flowing over the internet annually will reach 667 exabytes
  • Facebook hosts over 40 billion photos
  • Wal-Mart's transaction database has reached 2.5 petabytes (it records about 16,700 transactions per minute)
  • Data management and analytics is a $100 billion industry growing by 10% a year

No wonder Google's chief economist predicts the job of statistician will soon become the "sexiest" around. :-)


Comment by Konstantin A Lukin on March 31, 2010 at 6:40am
Yea, it is difficult to argue with a book recommendation, though thank you for sharing it.
Comment by Ken Berkun on March 27, 2010 at 1:01pm
Nothing changes. Replace "personal information chef" with "flapper" from Gulliver's Travels. And if you haven't read it, drop Facebook for a day and read it.
Comment by Konstantin A Lukin on March 27, 2010 at 12:12pm
"my challenge every day is to be that personal information chef to each of my clients...they need very specialized information filtered, abstracted, and often analyzed... something the human is still best at..."
Regarding Isotov's comment, I really like how this is worded: 'personal information chef'. It is quite a bit like cooking, as there is really no 'definite' answer to how data can be mined. IMO humans have an advantage at this given our intrinsic ability to feel/visualize data on different levels. It's like a machine can 'cook' according to a 'recipe', but it is our 'humanness' that makes it imperfect and hence so unique.
Comment by Sonia N. Isotov on March 25, 2010 at 2:50pm
Regarding Lukin's comment, "How do we make sure that people can get to the information they need when they need it, and it is presented in a way they understand?" ... being an independent business researcher ... I found this article interesting as well. At first I was a little worried that I may be closed for business soon, and then I realized that my challenge every day is to be that personal information chef to each of my clients...they need very specialized information filtered, abstracted, and often analyzed... something the human is still best at ... fortunately for me ... but then again, it is a luxury purchase for most ...
Comment by Konstantin A Lukin on March 11, 2010 at 4:44pm
"I don't think we can really get a neutral organization or retrieval of information, but I think it is healthy to be aware of this when we work with it :)"
Agreed. I generally try to use common sense when digesting information. Usually every article contains some interesting points combined with opinions, poor wording, etc. I try to piece these together to make sense of the bigger picture.

It is also true that some organizations provide somewhat biased information. Like you suggested, I also try to be aware of that, observing the fundamental patterns of their bias.

It is quite interesting that the human brain has the capability to visualize information, even if the data does not have 100% quality/integrity. I try to use mental visualizations to digest data, without getting too involved in gory details. It's almost like starting with a bird's eye view, and then slowly descending into more detailed realms of information.

There is also another technique, where one grabs a certain interesting fact, and works one's way up to understand bigger influences that caused it to happen.

One's search path could be influenced by a lot of factors, but IMO it mostly depends on the agility of one's mind, the capacity of one's understanding, and a general desire to absorb new information.
Comment by Viil on March 11, 2010 at 4:03pm
"How do we deal with a mind-boggling amount of information and no authorities to guide us? Good question. IMO best authorities are us. Collectively we make up for an evolving authority. Otherwise we'll just head towards totalitarianism."

I agree with you that a true information democracy system would be ideal, and the Internet is definitely a promising platform for this as more of the global population gets access.

"As long as quality information is available at our fingertips via a simple search, I think it would be enough for people to digest it one bite at a time."

The catch is that there are several aspects of simple search that can be biased. What influences what kinds of data are going into the searchable data collection, what metadata structure is used to describe and represent the data, the techniques (vectors, algorithms, etc.) used to organize the data, the techniques (weighting?) used to organize the search results, and how the search results are to be presented?
I don't think we can really get a neutral organization or retrieval of information, but I think it is healthy to be aware of this when we work with it :)
Comment by Konstantin A Lukin on March 10, 2010 at 11:54am
"How do we deal with a mind-boggling amount of information and no authorities to guide us?"
Good question. IMO best authorities are us. Collectively we make up for an evolving authority. Otherwise we'll just head towards totalitarianism.

"How can we use technology to provide filtering, guidance, validation, and credibility to the information needed to move towards a sustainable future for everyone on this planet, in a way that despite the rapidly increasing amount of information it is still made easily accessible and usable?"
IMO Google is doing a great job providing simple, intuitive, and free tools for us to use. As was mentioned earlier, I think everyone needs to get involved, digesting their piece of the pie and presenting it in a simple and intuitive fashion, like for a 3 year old child (avoiding unnecessary complexity).

It's one of those old 'divide and conquer' principles, applied to the whole population, where specific tasks are assigned to specific individuals with responsibilities at hand. Basically every company, agency, etc. needs to work on their general transparency (without disclosing proprietary secrets :)

For example, take Monsanto Corp, which has been genetically engineering seeds and experimenting on local farmlands. IMO they should make their research more open to the public. Let people provide more input on their activities, vote on possible dangers, etc.

"One effect of the self-publishing nature of the Internet is the corrosion of hierarchies and authorities within information presentation (e.g. newspapers)"
I think newspapers are great in providing certain points of view. However, we should always remember that the media is influenced by economics; basically, they get paid for making a story juicier. Google News makes a nice organization of stories, etc. I no longer read printed paper news.

As long as quality information is available at our fingertips via a simple search, I think it would be enough for people to digest it one bite at a time.

Such a growing chest of tools should provide good armor to deal with the 'mind-boggling' complexity of the information age.
Comment by Viil on March 10, 2010 at 11:06am
"How do we take advantage of this increasing load of potential knowledge? IMO by applying it towards a sustainable future for everyone on this planet. Otherwise this knowledge is quite useless, potentially dangerous and harmful."

I agree with Konstantin that taking the growing amount of information we have about our planet (and beyond, for that matter) and turning it into knowledge that is applied towards a sustainable future should be the main goal. The bigger question is how do we do this?

Information technologists, like many of us here on TechHui, can use this goal to guide us in our work of making information available to people. Projects like Google's Public Data Explorer are a good example of technologists working to make some of the existing information more accessible to the general public. Others should follow to make sure that there are multiple ways these data are presented.

"Presentation is key. A big part of understanding/digesting information and making useful conclusions depends on presentation. I think there is no clear solution for this, which is why everyone needs to get involved and do their part to make sure we are heading in the right direction."

How information is presented can strongly influence how it is interpreted and internalized as knowledge within the individual receiving it. Information will always be presented selectively and subjectively as we humans cannot totally disengage our personal norms and values when making decisions about what information to include and how to present it. Even something as formal as statistics can be highly subjective depending on which statistical methods are being chosen and how the findings are being presented (e.g. scales, context, visualizations).

So, we need multiple "voices" presenting the same information in different ways, something the Internet is increasingly facilitating. One effect of the self-publishing nature of the Internet is the corrosion of hierarchies and authorities within information presentation (e.g. newspapers). On the positive side it undermines information monopolies and opens the way for more alternative information sources, providing them with access to a larger audience (e.g. Iran, China). On the negative side it undermines information authorities, putting charismatic extremist leaders side by side with peer reviewed scientific publications. This can increase the potential of some people to get confused by the credibility of simplistic anecdotes presented by charismatic religious leaders versus the often more complex scientific theories that rely on centuries of scrutinized research (creationists, racist organizations like KKK, etc.).

How do we deal with a mind-boggling amount of information and no authorities to guide us? How can we use technology to provide filtering, guidance, validation, and credibility to the information needed to move towards a sustainable future for everyone on this planet, in a way that despite the rapidly increasing amount of information it is still made easily accessible and usable?
Comment by Konstantin A Lukin on March 10, 2010 at 9:24am
How do we make sure that people can get to the information they need when they need it, and it is presented in a way they understand?
Presentation is key. A big part of understanding/digesting information and making useful conclusions depends on presentation. I think there is no clear solution for this, which is why everyone needs to get involved and do their part to make sure we are heading in the right direction.
Comment by Konstantin A Lukin on March 10, 2010 at 7:27am
"Gus & Kostya - Sorry for the bait and switch title. :-)"
@Dan - not really sure about the bait part (what are you trying to catch?), but 'sexy' sure did sound exciting :), though not so much in a way it was applied..

To me, 'Sexy Statisticians' almost sounds like the sound of inevitability. It's like we are heading into a world of 'statistics' at high speeds and with our eyes closed..

I understand the 'Economist' view on these matters, but in addition I would also like to see a 'greener' application of statistics, like how many cities are going green, how many communities have become sustainable, etc. Otherwise the 'dryness' of these numbers is causing global warming in my neighborhood :)

How do we take advantage of this increasing load of potential knowledge?
IMO by applying it towards a sustainable future for everyone on this planet. Otherwise this knowledge is quite useless, potentially dangerous and harmful.

© 2024   Created by Daniel Leuck.