May 29, 2011
Discussion: Building for a Personal Data Ecosystem - A Case Study
Just left the Quantified Self conference where I led a session in the last breakout on "building for a personal data ecosystem." Since we weren't on the official program, i was very happy to be holding something in an Infinity session. Fifteen or so people came, and I talked about Personal Data Ecosystem Consortium and our mission for a user centric data model where user's control their data through agents, or Personal Data Stores. I also mentioned what I was seeing at the event, which was lots of folks building apps, making new silos of data, and repeating the model where users' data is in question as to who owns it, and users don't really have access to their data except through the a service's website and possibly an API that might send a little data somewhere else (like twitter or facebook).
I suggested that in a Personal Data Ecosystem, apps makers could take data from their users and send it straight through to the users' Personal Data Stores (PDS). That way if the app or hardware changed or ceased to support their old systems, the user would have their old data to play with in their PDS. And I talked about open formats for the data (think.. what about an open format for Heart Monitor data, where you pulse is described and you can take that data anywhere). Services could think about just providing a great service, instead of trying to manage all the user data storage and security. Users would control their data in their Personal Data Stores/Lockers/Banks, and I said that a bunch of companies were building these PDSs, including Sing.ly which is building the Locker Project.
Sing.ly happened to have someone there, Jared Hansen, who is a developer in the open source project. And there was a guy from Basis, Bashir, who is building hardware (like a wristwatch) that you monitor things like your heartrate with.. though it does monitor many other things as well on your body. We also had a couple of health researchers there, plus other health and wellness companies looking at data, as well as Ian Li, of Carnegie Mellon who is researching data collection and normalization, and a woman from the EFF. And we had a couple of users who talked about what users need.
After a few minutes, Bashir from Basis explained their dilemma around the hardware which isn't all that profitable for them. So initially they were questioning what to do with the data and how to monitize the company. Should they sell the data, or give it to users, or charge uses for it, or give it away to developers who could create a great ecosystem by building lots of apps, thus driving more sales? And who's data is it?
WOW. WOW!!!!
So we were off an running, with the impromptu Basis use case of how to get the value of the data, include the user and let the user have choice and autonomy, and how to leverage what is being done out in the marketplace and with developers creativity with data. Oh.. and don't forget about participating in microformats and Activity Streams creation to make bottom up grass-roots standards for the data formats and exchanges.
We talked through what it would mean to give away the data, support users and ask them if they wanted their data included in studies, get additional revenue for Basis while maintaining the inclusion of the user in the process and what developers could and should do. We brainstormed a lot of things, and covered the good and bad points of how it would all work and how to support Basis' market model while still being good and fair to the users.
I have no idea what Basis will do, but I would love it if they would join the Personal Data Ecosystem Consortium in the Startup Circle, to help build out ways to make a user centric data system for user's wellness data collected with Basis hardware.
What an amazing opportunity Basis has for doing the right thing for users, and leading the wellness and personal data ecosystem by creating a win-win for themselves and users. They could create a new market for wellness data, that is user driven.
Frankly, we need more discussions like this. It's not about Do Not Track models where we kill all the data plus the value of it, and it's not about "business as usual" where the user isn't included and businesses do whatever they want with user data.
It's about creating markets that do right by users and have companies making money ethically and conversing with us in the market.
Thanks to everyone who came! We had many representatives of the relevant stakeholders and the discussion was enlightening and rare.. but one I hope to make more common in the near future!
May 28, 2011
Where is the Personal Data Awareness? And what are the Missed Opportunities at QS2011
I'm at the Quantified Self Conference in Mountain View today and tomorrow.
A few thoughts. There are lots of people here from various disciplines: health care, tech companies like 23andme.com that marry personal genomics and tech, apps makers and health and wellness hardware makers. And lots of folks just wanting to track themselves.
Sessions are preprogrammed (in other words, the conference is all done top down broadcast mode), and now and then in people's statements, a person will pass along the vibe of the old style medical industry (that is: we know more than you and we'll tell you what's true.. that mode was in the opening session where we were lectured to). Though I just walked through all the sessions in round 1 and the individual break out sessions are more discussion mode which is great to see.
There was a near complete lack of consciousness about protecting user's data as I walked in and spent a few minutes in each of the first 6 sessions. The impicit assumption was that "we" (builders, companies, etc) can take data and use it for whatever "we" want. Building systems that aren't just about more silos with data lock-in, or building for a Personal Data Ecosystem model where users keep their own archives and data, and then choose where their data goes, what purpose it's used for and control what is happening isn't on the radar. It is especially important that we look at issues of privacy, control, autonomy, choice and transparency for the highly personal, very sensitive data collected around personal wellness and health.
There is a single session, led by lawyers about privacy in round 2. But the rest of the sessions do not seem to be aware at all that they need to build from concept on for privacy, data control by the users, where users keep their data and the applications, devices and monitoring tools "use" the data with permission.
And there is no session about personal data control, where the QS apps would work on a Personal Data Store. I've asked to have one.. but we'll see if they decide to let me do it. The assumption is developers will just build more silos with more data collected, about you, crossed with other data about you, that after combined, creates yet another silo of data. There may be an API available, but effectively, the data is stuck in another silo, that a regular user can't really get at it, hold it, control it, share it, correct it or delete it.
It's dismal.. thinking about how all this highly personal data is just assumed to be owned by apps makers and companies and users are just cows in a big milking system. The participants of QS are just continuing the tradition started by the health industry and continued by tech company silos in making the users say "Moo." Pick your ecosystem and prepare to be milked.
Lastly, I'm really happy to report that the QS organizers decided to order a really healthy vegetable lunch salad (with either chicken or tofu on it).. Great work on that front!
May 13, 2011
McKinsey's Research Arm Claims Big Data Mining Will Save Us All

Steve Lohr has a write up in today's NYTimes: Mining of Raw Data May Bring a Surge of Innovation about McKinsey & Company's report on Big Data: The Next Frontier for Innovation, Competition and Productivity.
I think we need to challenge assumptions about the inputs... compare the inputs from "hoovered" personal data to that of what people assemble in personal data stores operating in a Personal Data Ecosystem.
Execs from Rapleaf and Intellius have admitted publicly, recently, that they know half their data is bad, they don't know which half. I also sat recently with the woman from Experian who is in charge of segregating and keeping separate data from the internet (verses financial data which is regulated) for their offerings about users. When I posited that a lot of her data was likely wrong, she agreed.
User's obscure their data intentionally because they are scared.
For myself, I can tell you that in the last few years, I have obscured data online (birthdate, zip code, name, address, phone number, preferences, email addresses) as well as health info (not to my doctors, but to data collectors whom I do not trust yet claim they never share the data. For example, you can't get a mammogram in SF / Children's Hosp without sharing a huge amount of very personal data.. so i made it all fake because I don't trust the lab and who they sell the data to...). And I fake it to the pharmacy when they ask for more than my basic info to fill a prescription. In fact my current insurance company has my name and birthdate a little wrong and i'm not correcting them.. because it makes it harder to aggregate my data across systems. Oh.. and my bank spells my name: Hoddler .. and has a slightly incorrect address (don't you love how they key in the wrong data!) and i'm not correcting that either.
I fake all sorts of stuff on and offline... I fail to correct bad data... I know many others do too.. I have since 1994 been faking my data online. Somehow even then, without understanding the privacy issues or how the internet worked then, I just didn't trust the system because I knew then we had no privacy protection in this country (US). As I began working with online technology in 1997, and started really understanding it, I've felt more than ever the need to obscure my data and make it difficult to combine in a pivot about me.
I get that this security by obscurity and mistakes doesn't cut it, but it's the best I can do right now.
So my question for the McKinsey research people is: have they factored this in?
And have they factored in that users have obscured enough information that me at one site cannot be aggregated with me at another site?
Or have they factored in that the people at institutions who key in the data from our driver's licenses get it wrong (my bank with my name and address) or the insurance co (my application correctly filled out.. with my name and DOB) or whatever?
The answer is to give us proper protections for our data. 4th amendment protections and rights over sharing of our data, so that we make sure the data is right. We can aggregate our own data in Personal Data Stores. Then we can trade fairly for that data if we agree to being included in the big data systems McKinsey is saying will help us so much.
I agree big data analytics can help us as a society, but not without good data, and not without including users into the system, as equitable players who deserve to have rights over our data, including choice and autonomy to participate in big data systems.
But until then.. big data is working with databases that are half right.. because we don't have choice, autonomy, rights or protections as users, and that's the first problem with McKinsey's assumptions.
