August 10, 2006

Thought Fashion: Are You In or Out?

It's true. I peeked.

Yes, I downloaded the AOL files. And I peeked. Why? Because I wanted to write this blog post and I wanted to see for myself what sort of gestures people were making as they searched for porn or socks or how to bury their pet birds or wives they'd just killed. I also needed to see the form the data was in. And I'm a voyeur just like everyone else in and around this story, and I wanted to rubberneck my way into others' private intellectual spaces.

But it's not right. What's not right is the part that I and every news outlet, blogger, reader and looky-loo have been engaging in: judging people by their searches, making assumptions, and behaving as if we ourselves have never made any searches or expressed any thoughts that would look funny to someone else.

It's also not right because the data is personally identifying. Reporters have been tracking down people based upon their searches. It's not that hard, if you yourself are a good searcher.

What was it Bob Blakley said? About how "dragging all human behavior into the public is literally totalitarian." He is the chief security and privacy scientist for IBM's Tivoli Systems. "If you erode privacy, you erode liberty, because people don't tolerate things going on in front of them that they don't approve of." I was struck by how succinctly he answered the question that is always asked of people who object to being watched by the government or some other entity that is large and powerful compared with them: What do you have to hide, if you're not doing anything wrong?

Every article on AOL's mess-up that says something like "AOL's Disturbing Glimpse Into Users' Lives" is buying into this intolerance, whether the writer knows it or not. Thank you, CNet, for reaffirming our intolerance.

Let's get clear on the definition of "aggregated" data. We geeks use this term often, as we reassure those whose data we work with that aggregation means we are removing anything personally identifying and placing it with other users' data, so that it's just a pile of anonymized data that could never be distinguished by person. An example might be the aggregation of all the searches on "dog," where who did them is removed but we know that 38 people searched on that term during a particular hour and day.
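To make the geek definition concrete, here is a minimal sketch of that kind of aggregation. The log rows, field layout and function name are all made up for illustration; this is not AOL's format or anyone's actual pipeline. The point is only that the user ID is thrown away and what survives is a count per term per hour.

```python
from collections import Counter
from datetime import datetime

# Hypothetical search log rows: (user_id, query, timestamp). Invented data.
raw_log = [
    ("u1", "dog", "2006-03-01 09:05:00"),
    ("u2", "dog", "2006-03-01 09:40:00"),
    ("u1", "socks", "2006-03-01 09:41:00"),
    ("u3", "dog", "2006-03-01 10:02:00"),
]

def aggregate_by_term_and_hour(rows):
    """Count searches per (term, hour), discarding the user ID entirely."""
    counts = Counter()
    for _user_id, query, ts in rows:
        # Truncate the timestamp to the hour so only coarse timing survives.
        hour = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d %H:00")
        counts[(query.lower(), hour)] += 1
    return dict(counts)

aggregated = aggregate_by_term_and_hour(raw_log)
for (term, hour), n in sorted(aggregated.items()):
    print(f"{hour}  {term}: {n} searches")
```

Nothing in the output says who searched for what. Even so, counts alone are not a guarantee of anonymity: a very rare search term with a count of one can still point at a person, so treat this as a picture of the idea rather than a safe recipe.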

But users don't think that way. They hear the word "aggregated" and they think the data handlers are aggregating everything the system may know about just them, specifically and personally, and lumping it all together. Talk about miscommunication. And it terrifies the non-geeks.

What we really should be saying is that the data is "anonymized" and therefore you are safe. AOL's data was not safe because it was not anonymized, and it fit the users' definition of "aggregated."

The AOL data, which lumped each user's searches together under a user ID over three months, made profiling and finding people easy. In some cases AOL provided enough data to indicate very specifically who the data related to. Leading to judgments by the rest of us. About the people who do or think things on the edges of society.
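Here is a rough sketch of why a per-user ID matters, even when the ID is just a number and not a name. The records, IDs and queries below are invented for illustration and the function name is hypothetical; the sketch only shows that grouping searches by ID rebuilds a profile of one person.

```python
from collections import defaultdict

# Hypothetical rows in the style described above: (user_id, query).
# The IDs and queries are invented; no name appears anywhere.
records = [
    ("user_001", "socks"),
    ("user_002", "dog"),
    ("user_001", "landscapers in smalltown"),
    ("user_001", "how to bury a pet bird"),
    ("user_002", "dog training"),
]

def profile_by_user(rows):
    """Group every query under its user ID, in the order it was made."""
    profiles = defaultdict(list)
    for user_id, query in rows:
        profiles[user_id].append(query)
    return dict(profiles)

profiles = profile_by_user(records)
for user_id, queries in profiles.items():
    print(user_id, "->", queries)
```

Each search on its own says little. Three months of them, strung together under one ID, start to describe a town, a household and a state of mind, which is what a good searcher needs to find the person behind the number.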

And why is this wrong? Because it hurts people. It makes them feel defensive about their own thoughts and ideas.

So, well, if you aren't doing anything wrong, what *do* you have to hide? Well, everyone has something they do or think about that would be an edge thought, or that in one context would be in the middle but in another must be defended because it resides on the edge. Something that would be disapproved of by someone. Something the rest of society might not tolerate.

Intolerance leads to the totalitarian. We, the human race, have been intolerant since the beginning of time. What we are intolerant of is a moving target, depending on the fashion of the day. In the 30's in some places it was fashionable to be intolerant of Jews and gays. In the 40's it was Germans and the Japanese, and in the 50's communists and socialists. In the 60's it was civil rights proponents and hippies, and in the 70's liberals. In the 80's we were back to communists, and in the 90's it was Hispanics (remember all the state propositions barring them from medical care?). And what is it today? Islam? Are the thoughts you think today, and the cultural references associated with them, that now sit in the middle going to fall to the edges in the next decade?

We have used the fear of all these intolerable people and their thoughts as excuses to hunt for more proof of their intolerableness by surveilling everyone in society and searching through all the detritus of our lives. With digital data more available, we think we can find the proof we need in these edge thoughts. And then we will persecute the people having them. And what better way to do it, now that the internet, ISPs and heavily used search systems can provide one level or another of very personal thought data? Search terms, or a database of intentions as John Battelle has talked about so much, are one slice of your data that tells a lot about you. And if we can get it in a neat little file, machine readable and searchable and quantifiable, then well, why not?

If you believe that sacrificing freedom to keep freedom is the way to go, then you probably don't see any problem with demonizing people who have thoughts you don't like. Especially if those thoughts are in the form of passing gestures such as search terms plugged into a browser.

But until we decide on (or default into) a Minority Report society (and change our constitution), we are not yet convicting people for thinking things. Everyone has had the thought that they'd like to kill someone once or twice in their lives. But people, the vast, vast majority, don't do it. To demonize someone for searching on this, a gesture I would put into the fleeting thought category for almost everyone, is to take an edge thought, which we all have from time to time, and put it firmly under the scrutiny of the middle. I believe we really only want to find people who make serious plans to hurt others, or actually carry them out. That is what our law is based on, and the premise of our society. But to track everyone, their searches, their every digital gesture, and expose it in one way or another is going to be troublesome. And it raises a question I've asked before: is your digital identity your personal intellectual property? Is your Google identity yours or someone else's? And by extension, is your clickstream a personal expression (carefully chosen and shaped by you)? In other words, can you copyright your clickstream and exert ownership?

There are at least two choices. One of them is to do what we are doing now: have ISPs and search services collect this data and, when asked by the government, turn it over. But that means the data is still in many ways secret. Of course the companies don't want the data getting out because it is proprietary. And neither does the government, because they don't want anyone to know quite how much is out there about you, in case you are trying to cover your tracks or you want to defend yourself. But having all the data, the government has the upper hand. And secrets are powerful. How do you show, if you are being accused of something based upon your searches, that everyone else searches on those same things too? That it's actually a social norm? If you can only ask for your own searches to defend a case against you, and not everyone else's to compare yours against, you won't be able to argue social norms, which judges rely heavily on when making decisions.

But there is another choice. And that brings up the Attention Trust premise (I'm a Board Member) which is that people own a copy of their own data, no matter where they do things: purchases, Google searches, or AOL clickstreams, or anywhere else you might land in a browser on your computer. As a co-owner of your data, you can take it anywhere and do what you wish. There could be many business models built upon this data controlled and shared by users. Google takes all the data they collect and plugs it into AdSense. If lots of users took their own data and made it available voluntarily, a new and more 'open source' style AdSense could be created.

But much more importantly, something like Steve Gillmor's Gesture Bank, where users opt in their clickstream information in an anonymous form, exists to open up this kind of data. The Bank will make the aggregation of anonymized data available to anyone for any purpose. While this may lead to businesses working from this pool of searches and clicks, it also means that a growing pool of data is there to show the edge thoughts and potentially unpopular ideas people may exhibit. The pool can be used to defend against totalitarian efforts to single out in secret those who are out of fashion politically. Which may turn out to be you. Or someone who uses your computer.

That, I think, is far more important than an open source AdSense, though a business built upon this data would likely help justify, and make a better case for, our having a Gesture Bank of ideas and thoughts that support political freedom.

Seth Goldstein and Steve Gillmor already offer Root.net users the opportunity to put their data into the Gesture Bank if they wish, though any person can contribute to this anonymous pool of user data. And for that matter, attention streams can be sent to multiple services.

And, at the October 4 Attention Conference, Steve and Seth will announce Attention Soft. Stay tuned.

Posted by Mary Hodder at August 10, 2006 12:21 PM