Steve Lohr has a write-up in today's NYTimes, "Mining of Raw Data May Bring a Surge of Innovation," about McKinsey & Company's report, "Big Data: The Next Frontier for Innovation, Competition and Productivity."
I think we need to challenge assumptions about the inputs... compare the inputs from "hoovered" personal data to what people assemble themselves in personal data stores operating in a Personal Data Ecosystem.
Execs from Rapleaf and Intelius have admitted publicly, recently, that they know half their data is bad, but they don't know which half. I also sat recently with the woman from Experian who is in charge of segregating and keeping separate data from the internet (versus financial data, which is regulated) for their offerings about users. When I posited that a lot of her data was likely wrong, she agreed.
Users obscure their data intentionally because they are scared.
For myself, I can tell you that in the last few years, I have obscured data online (birthdate, zip code, name, address, phone number, preferences, email addresses) as well as health info (not to my doctors, but to data collectors whom I do not trust even though they claim they never share the data). For example, you can't get a mammogram in SF / Children's Hosp without sharing a huge amount of very personal data... so I made it all fake because I don't trust the lab or who they sell the data to. And I fake it to the pharmacy when they ask for more than my basic info to fill a prescription. In fact my current insurance company has my name and birthdate a little wrong and I'm not correcting them... because it makes it harder to aggregate my data across systems. Oh... and my bank spells my name: Hoddler... and has a slightly incorrect address (don't you love how they key in the wrong data!) and I'm not correcting that either.
I fake all sorts of stuff on and offline... I fail to correct bad data... I know many others do too. I have been faking my data online since 1994. Somehow even then, without understanding the privacy issues or how the internet worked, I just didn't trust the system because I knew we had no privacy protection in this country (US). As I began working with online technology in 1997, and started really understanding it, I've felt more than ever the need to obscure my data and make it difficult to combine in a pivot about me.
I get that this security by obscurity and mistakes doesn't cut it, but it's the best I can do right now.
So my question for the McKinsey research people is: have they factored this in?
And have they factored in that users have obscured enough information that me at one site cannot be aggregated with me at another site?
Or have they factored in that the people at institutions who key in the data from our driver's licenses and forms get it wrong, like my bank (with my name and address) or my insurance co (which got my name and DOB wrong even though my application was correctly filled out) or whatever?
The answer is to give us proper protections for our data: Fourth Amendment protections and rights over the sharing of our data, so that we can make sure the data is right. We can aggregate our own data in Personal Data Stores. Then we can trade fairly for that data if we agree to being included in the big data systems McKinsey is saying will help us so much.
I agree big data analytics can help us as a society, but not without good data, and not without including users in the system as equitable players who deserve to have rights over our data, including the choice and autonomy to participate in big data systems.
But until then... big data is working with databases that are half right... because we don't have choice, autonomy, rights or protections as users, and that's the first problem with McKinsey's assumptions.
Posted by Mary Hodder at May 13, 2011 03:16 PM