Big Data Is In Need of a Big Brain

One of the big wide-open opportunities for innovation in the 21st century is around “big data”. Every day, approximately 2 billion gigabytes of data are created worldwide, and that volume is growing exponentially (doubling per capita every 40 months). In other words, we are flooded with information. The challenge is to process that information in a timely, intelligent way so that we may live longer, happier, more productive lives (assuming all those things are not contradictory).

I think the task at hand is similar to the one the human brain performs on a regular basis: it takes in several gigs of sensory data a second, and uses only a tiny fraction of that to arrive at facts about the past, present, and future state of its surroundings. As the universal stream of data grows (beyond comprehension), the collective machine brain has to grow with it, scrambling desperately to pick out the useful bits.

Companies like Google are leading the way in that effort. But of course, since big data is flooding every crevice of our lives, every company battling it out for our dollar will have to invest in some kind of “big data analytics”. There are lots of opportunities for incremental improvements, but there are even more opportunities for futurists and dreamers to write books about the inevitable rise of intelligent machines, which will surely be based on some mixture of neural networks and genetic programming ;-)

The following is a good Google Tech Talk discussion on the current trends in big data:

The Cat and Mouse Game of General Intelligence

Google researchers published a paper that at once fills my heart with hopeful joy and eternal sadness. Joy because people care and are investing resources into developing intelligent systems. Sadness because of how poorly such systems perform 40+ years after the release of Kubrick’s 2001: A Space Odyssey.

The authors of the paper put together a large neural network that ran on a thousand 16-core machines for three days “learning” from a dataset of 10 million 200×200 pixel images.
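To get a feel for that scale, here is a rough back-of-the-envelope sketch that uses only the figures quoted above. It assumes all cores were busy for the full three days and counts pixels without regard to color channels; both are my simplifications, not details taken from the paper.

```python
# Back-of-the-envelope scale of the training run, from the quoted figures only.
# Assumptions (mine, not the paper's): every core runs for the full three days,
# and each pixel is counted once, ignoring color channels.
MACHINES = 1_000
CORES_PER_MACHINE = 16
DAYS = 3
IMAGES = 10_000_000
WIDTH = HEIGHT = 200

total_cores = MACHINES * CORES_PER_MACHINE
core_hours = total_cores * DAYS * 24
pixels_per_image = WIDTH * HEIGHT
total_pixels = IMAGES * pixels_per_image

print(f"cores:            {total_cores:,}")
print(f"core-hours:       {core_hours:,}")
print(f"pixels per image: {pixels_per_image:,}")
print(f"total pixels:     {total_pixels:,}")
```

Under those assumptions, that is 16,000 cores, a bit over a million core-hours, and 400 billion pixels of input.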

The task is to train a face detector without ground truth, that is, without labels saying whether an image contains a face or not. This task is absurdly difficult, and I would even say just plain absurd. It’s like trying to teach a child algebra by giving him addition problems, but never telling him how to do addition or what the right answer is. It’s a fascinating and brave question to ask, because of how counter-intuitive it is.

Not surprisingly, the “breakthrough” that the paper touts is a 15.8% accuracy in classifying objects into one of 20,000 categories. This is apparently a good improvement over the previous state of the art. My question is, in what universe is 15.8% deserving of a New York Times article? Granted, it does exceed the approval rating of Congress, but that’s about it.
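For a bit of context on that number, here is a minimal sketch comparing it against blind guessing. It assumes a guesser that picks uniformly at random among the 20,000 categories and that every category is equally likely; that baseline is my own illustration, not something reported in the paper.

```python
# Compare the reported accuracy with uniform random guessing.
# Assumption (mine): all categories are equally likely, so a blind guess
# is right with probability 1 / CATEGORIES.
CATEGORIES = 20_000
REPORTED_ACCURACY = 0.158

chance_accuracy = 1 / CATEGORIES
ratio = REPORTED_ACCURACY / chance_accuracy

print(f"chance accuracy:   {chance_accuracy:.3%}")
print(f"reported accuracy: {REPORTED_ACCURACY:.1%}")
print(f"ratio:             about {ratio:,.0f}x")
```

So 15.8% is still wrong more than five times out of six, even though it is thousands of times better than a coin toss with 20,000 sides.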

I don’t mean to be so dismissive. This is an excellent paper that scratches at the surface of an immense mystery: the gap between the most powerful supercomputer and the most primitive human brain. What’s even more exciting is that Google is funding this research and, even more importantly, putting its immense computational resources behind it.