Wednesday, December 26, 2012

Big Data, Small Bets - Avoid Seeing Patterns Where None Exist



Guest Post by Darden Professor Robert Carraway
Big data and small experiments—what could appear more seemingly incongruous?  Yet the truth is: these two trends, one from the world of analytics, the other from the world of innovation and change, can beBig Data: water wordscapepowerfully combined to drive sustainable success in a highly uncertain world.
Big data is a product of the technology revolution that is now well into its third decade.  Thirty years ago, sophisticated analytical techniques promising extraordinary insights lacked but one thing: the data to inform them.  The promise was clear: if you simply start measuring and tracking everything, from minutely segmented sales and resource-usage metrics to every conceivable macroeconomic variable of remote interest, you will be able to identify all manner of relationships, correlations and insights, the net result of which will be the capacity to allocate resources far more effectively and efficiently, taking advantage of opportunity and driving results.
The message was received, loud and clear.  In fact, perhaps too loudly and clearly.  Today, huge volumes of data, touching almost every aspect of the world in which we live, are available at virtually the click of a mouse.  Want to know the relationship between sales of industrial cleaning systems in western Pennsylvania and capacity utilization in that region’s steel industry?  Not a problem.  Need to improve customer satisfaction?  Simply cluster customer complaint data by category, and while you’re at it, build a simulation model that accurately projects the impact of pulling any of several levers, all informed by the reams of data at your fingertips.
And therein lies the problem.  Two problems, actually.  First, analysts can no longer use the excuse, “MY analytical technique would work great if only we had the data on which to apply it” (emphasis on the “MY”).  There is now sufficient data to inform many analytical techniques and approaches.  So in place of the obstacle posed by lack of data, we now face an even more challenging set of issues: “Which technique(s) should I use?” and “What is this telling me?”  That segues into the second, even bigger problem created by big data: you can probably find a technique and set of data to support almost any story you wish to tell.  Never has the aphorism “There are lies, damned lies and statistics” been more apt.  Everywhere we look, we see something, and if it’s not what we like, we simply look elsewhere until we find it.
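To make that second problem concrete, consider a small sketch in Python (an illustration of my own, with entirely made-up data): create a purely random target metric and a thousand purely random candidate predictors, and the best-looking correlation among them will still appear impressively strong.

```python
import numpy as np

rng = np.random.default_rng(0)

# A purely random "target" metric and 1,000 purely random candidate
# predictors: by construction, none has any real relationship to the target.
n_obs, n_vars = 50, 1000
target = rng.normal(size=n_obs)
candidates = rng.normal(size=(n_vars, n_obs))

# Correlate every candidate with the target and keep the most impressive one.
correlations = np.array([np.corrcoef(c, target)[0, 1] for c in candidates])
best = np.argmax(np.abs(correlations))

print(f"best of {n_vars} noise variables: r = {correlations[best]:+.2f}")
# Typically prints a correlation near 0.5 -- a "pattern" born of nothing.
```

The lesson is not that correlation analysis is broken; it is that if you search widely enough, you are guaranteed to find something.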
One of our brain’s oldest and most valuable capabilities is the ability to recognize patterns, even when they aren’t there.  A colleague of mine at the University of Virginia used to conduct an experiment in class.  He would ask one student to use a random-number generator to produce a list of random numbers, all between 1 and 10.  He would ask another student to compose, on her own, using only her perception of what random means, a list of “random” numbers, also between 1 and 10.  He would then have a third student present the two lists to him, without telling him which was which.  My colleague says he could always identify which list had been truly randomly generated and which had come from the student’s mind: the one with all manner of apparent patterns and repetitions was the truly random list.  The student, on her own, would assiduously avoid any patterns or repetitions.  This is because we are wired to believe that any apparent pattern must have an identifiable cause, and thus cannot be random.  Hence, any time we see anything remotely resembling a pattern, we assume there must be something non-random causing it.
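To see why my colleague’s trick works, here is a small illustrative sketch in Python (mine, not his actual classroom procedure): truly random draws between 1 and 10 routinely contain the back-to-back repeats that a person composing a “random” list instinctively avoids.

```python
import numpy as np

rng = np.random.default_rng(42)

def count_adjacent_repeats(seq):
    """Count places where a number immediately repeats -- the kind of
    'pattern' people instinctively avoid when faking randomness."""
    return sum(a == b for a, b in zip(seq, seq[1:]))

# A truly random list of 20 numbers between 1 and 10.
truly_random = list(rng.integers(1, 11, size=20))

# In 20 random draws we expect roughly 19 * (1/10), or about two, adjacent
# repeats; a human-composed "random" list typically has none.
print(truly_random)
print("adjacent repeats:", count_adjacent_repeats(truly_random))
```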
This pattern-seeking capability was (and perhaps still is, in many contexts) a wonderful survival mechanism: when in doubt, assume the worst and run away.  Far better to think you see a pattern that isn’t there than to dismiss a pattern as random variation, only to find out (to your extreme detriment!) that it was actually caused by a predator.
So, you have a brain that is predisposed to “see” patterns that aren’t really there.  On top of that, this same brain has now developed sophisticated analytical techniques that enhance its ability to “see” such patterns.  Before you know it, if you’re not careful, you have embarked on a high-profile, resource-intensive organizational initiative in pursuit of a pattern you think you have “seen”, only to discover that it wasn’t really a pattern at all, merely an artifact of randomness.
How do you protect yourself from falling prey to this trap?  Herein lies the role of “small experiments” that my colleague, Jeanne Liedtka at the Darden School, and others have explored.  An “experiment” differs from big “data mining” (the term used for mucking around in big data for whatever you can find) in that it is deliberately constructed to test whether or not a perceived pattern is real or a figment of our overactive imaginations (and analytical tools).  A carefully constructed experiment can make you far more confident that what you have spotted is real and therefore actionable.  The key to a good experimental design is to “stack the deck” against your pattern being real.  If it emerges as still present in the results of your experiment, you are far more likely to have discovered something of potential value.
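What does “stacking the deck” look like in practice?  One standard form (shown here as a sketch with invented numbers, not data from any real engagement) is a permutation test: assume as your null hypothesis that the pattern is pure chance, then measure how often chance alone reproduces something as strong as what you observed.

```python
import numpy as np

rng = np.random.default_rng(1)

def permutation_p_value(x, y, n_perm=10_000):
    """Stack the deck against the pattern: assume it is pure chance (the
    null), shuffle y repeatedly, and count how often chance alone yields a
    correlation at least as strong as the one observed."""
    observed = abs(np.corrcoef(x, y)[0, 1])
    hits = sum(
        abs(np.corrcoef(x, rng.permutation(y))[0, 1]) >= observed
        for _ in range(n_perm)
    )
    return hits / n_perm

# Hypothetical example: weekly promotion spend vs. weekly sales lift.
spend = rng.normal(size=30)
sales = 0.5 * spend + rng.normal(size=30)  # a real, if noisy, relationship
print(f"p-value: {permutation_p_value(spend, sales):.3f}")
```

A small p-value says the deck was stacked against the pattern and it survived anyway; a large one says chance explains it just as well.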
Why “small”?  Given our enhanced proclivity to “see” patterns (thanks to our analytical tools), the existence of big data enables us to identify a myriad of potential opportunities.  If exploring each requires a significant investment of time and resources, we are far more constrained in the opportunities we can pursue.  Plus, we face far more pressure to “guess” correctly, i.e., to identify a priori which patterns are most likely to be real and lucrative.  By keeping experiments “small” in terms of time and resources, we can explore many more possibilities much more quickly.  The net result is an organization that is in constant innovation mode, rapidly discarding ephemeral leads and doubling down on ones whose fundamental assumptions persist.
How do we conduct small experiments?  The precise design of the experiment is highly context dependent, and often requires reaching out to the market.  But here we come full circle: we can often use the same big data we mined for our initial potential insight to test that insight in turn.  By generating insights on a subset of our data (possible because we have so much of it), we can test these insights on “holdout” samples (data we deliberately do not use—or “hold out”—from our original search).  If the pattern persists over the holdout sample, we can then move to test it further by designing an experiment to gather new “live” data, facilitated by precisely the same techniques used to collect the big data in the first place.
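Here is what that holdout discipline looks like in code (again an illustrative sketch, with simulated data standing in for a real database): mine one half of the data for the best-looking predictor, then check whether the relationship survives on the half the search never touched.

```python
import numpy as np

rng = np.random.default_rng(7)

# A simulated "big data" table: 2,000 observations of a target metric and
# 500 candidate predictors -- all pure noise here, to expose the trap.
n_obs, n_vars = 2000, 500
X = rng.normal(size=(n_obs, n_vars))
y = rng.normal(size=n_obs)

# Mine one subset of the data for the most promising variable ...
train, holdout = slice(0, 1000), slice(1000, 2000)
train_r = np.array(
    [np.corrcoef(X[train, j], y[train])[0, 1] for j in range(n_vars)]
)
best = int(np.argmax(np.abs(train_r)))
print(f"mined:   variable {best}, r = {train_r[best]:+.2f}")

# ... then "experiment" on the holdout data the search never saw.
holdout_r = np.corrcoef(X[holdout, best], y[holdout])[0, 1]
print(f"holdout: variable {best}, r = {holdout_r:+.2f}")  # collapses toward 0
```

If the mined correlation collapses on the holdout sample, it was an artifact of the search; if it persists, it has earned a live experiment.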
Companies like Progressive Insurance and Capital One, which have built successful businesses around their ability to mine big data, are well aware of this virtuous cycle.  Their current competitive advantage stems from an ability to move rapidly through this “big data, small experiments” cycle.  Their use of sophisticated analysis and search techniques is informed by their understanding of the danger posed by our inherent capacity to discover patterns that aren’t real.  Hence, they have deliberately developed an ability to effectively and efficiently use small experiments to sift and sort potential opportunities.
What does this mean for your company?  “Analytics” is a hot topic right now, spurred by the existence of big data to which its many and varied tools can be applied.  Many view analytics as promising to transform their business and innovation processes, enabling them to move quickly, nimbly and effectively.  But with every blessing comes a curse: if ye seek, ye shall find.  No matter how effectively you act on something that’s not real, you won’t get where you think you’re going.  The enhanced ability that exists today to spot patterns and identify potentially exploitable relationships must be accompanied by the ability to do some good, old-fashioned “fact checking” in the form of small experiments to confirm assumptions and hypotheses.  The million-dollar question is, “Is it real?”
