Monday, May 17, 2010

Google: Goof or Snoop?

Google’s extraordinary acknowledgment that they have been collecting private internet transmissions in the process of constructing their Street View database raises questions that have not yet been properly aired. The don’t-be-evil crew has had its specially equipped cars crawling urban streets around the world, creating the photographic record that enables map searchers to see street-level images when they click on a location. This already alarmed privacy hawks in Europe (where privacy laws are much stricter than here), since it means that everyone’s home, and maybe their car, pets, lawn furniture and messages to the postman, are on display before the whole world. But the new disclosure goes a lot further.

Google says its cars were using the locations and IP addresses of wi-fi hotspots along its route to verify its position, a plausible notion because the data collected from the different cars have to be concatenated, and discrepancies would cause headaches. How then to explain why Google collected not only this minimal information, but also “snippets” of email and the websites users were visiting at the time of the drive-by?

In its repeated public apologies, Google claims that collecting and storing these data was an accident, a coding error, a breakdown in communication between multiple teams collaborating on the project. This is not credible. Even if it were necessary to acquire all the information streaming from these users to identify their location (which seems strange to me, but I’m not a network geek), whatever technique was used to extract the few items they needed could have been used to expunge the rest. Can anyone really believe that vast quantities of personal data were not only captured but stored at significant expense simply because of an oversight?

And let’s take it one step further. Google’s business model is to sell your personal data to advertisers, nothing more or less. The ads you see on search pages and to the side of Gmail messages have value because they are tailored to the content of your internet activity. Google captures the content and sells the ads. They have every incentive to capture as much content as they can.

Distilling ad-relevant data from messy, real-life internet use is extremely difficult, which is one reason why Google employs lots of very smart people. Their struggle to constantly improve on this distillation is fed by the massive web data to which the company has access. This is where the Street View cars come in: as a byproduct of their positioning system, they were collecting random web activity which Google could not otherwise obtain.

A key question for investigators, then, should be whether the personal data Google admits to collecting and storing was used by any unit of the company for analytical purposes.

Beyond this, we should recognize the paradox at the heart of Googledom. In many ways this is an admirable company, driven by idealism. Their business model allows them to offer a range of very high-quality services for free. They have a commitment to making the world’s information more accessible to the ordinary user, which is mostly a good thing. But in the final analysis, the money comes from advertisers, and they pony up because Google’s individual-level data permits a degree of customization marketers could only have dreamed of until recently. This generates a powerful financial incentive for Google to undermine privacy whenever the opportunity arises, as it does every time you use one of their tools. This company, and others in the same line of work, needs to be tightly regulated.

6 comments:

cian said...

How then to explain why Google collected not only this minimal information, but also “snippets” of email and the websites users were visiting at the time of the drive-by?

This is probably one of those instances where if you don't understand the technology you should refrain from commenting. If the code to verify location had been implemented without any thought of privacy implications (highly likely) by a programmer against a deadline, then this is exactly what I would expect to happen. Yes, you can strip out the contents of the packets and anonymise this stuff, but it's extra work.

It's possible that this was deliberate, but it seems unlikely to me, as there are much easier and cheaper ways of getting the information if you really want it. If you're going to break the law, there are FAR FAR easier ways. But legally Google has this stuff anyway. If you use Google's search engine, Gmail, etc., then they have a pretty good idea as to where you're located. And the quality of that data will be much better than the tiny glimpses they would have got from the Street View cars.

Also, the amounts of data involved would probably be quite small by Google's standards.

Don't get me wrong. I don't think Google are particularly trustworthy, and they need to be watched. It's just that in this instance their story sounds plausible to me.

Peter Dorman said...

Cian,

My mind is entirely open on this issue, and I've been hoping that people with expertise, like you, would respond and shed more light.

I agree that Google has far more data on its users through "normal" channels than it could hope to acquire through these roaming cars, but the cars are acquiring a cross-section of all users, not just Googlers and Gmailers. As a data geek in other contexts, I'm inclined to think this could be valuable. That doesn't mean, of course, that Google extracted this value, much less that it deliberately tried to create it.

But this is an econ blog, so I wanted to foreground the issue of incentives.

TheTrucker said...

I think that the issue of advertising and public relations is at the forefront of social science and has been for a long time. I can see no way to "regulate" free speech outside of presentations posing as "NEWS" and actual political lies maliciously delivered by actors undermining the public good.

The protection against lies by political activists and others must be based on claims of damage to the public welfare and damages must be punitive. Faux Noise would not remain in business more than a few months if class action suits against lying were legally sustainable.

I say all of this because the collection of data regarding the general habits of large segments of people is not an invasion of individual privacy. The fact that "large segments" are composed of individuals is unfortunate, but any personal invasion depends on the use of the data as opposed to the existence of it. I don't see any way to "regulate" the gathering of the data.

cian said...

Peter,
but if you think for a moment about how the data would have been collected, you can see why it would be of fairly low quality. Even in a low-density suburban area, a car might pick up 20-30 seconds of activity (my guess is it would be much less). That's a tiny sample, and there's a pretty high chance that either no internet activity would be occurring at that moment or whatever was captured would be pretty meaningless to an onlooker.

They deserve to be censured for this, and they may well be charged in the EU as it was illegal here. I just have a tough time believing that this was intentional.

I agree entirely about Google, and there has been a general decline in this area (see the complaints about Google Buzz, for example), though Facebook are far, far worse.

Min said...

Ixquick:

http://www.ixquick.com/eng/protect-privacy.html

Min said...

The Trucker: "I say all of this because the collection of data regarding the general habits of large segments of people is not an invasion of individual privacy."

Indeed it is not. For instance, if I use my credit card to make a purchase, information about that purchase is available to the credit card company. If the company makes use of that information by pooling it with information about other users, and the pooled information does not identify any user, then my privacy has not been invaded.

The Trucker: "The fact that "large segments" are composed of individuals is unfortunate,"

Unfortunate????? I do not think that it is unfortunate per se.

The Trucker: "but any personal invasion depends on the use of the data as opposed to the existence of it. I don't see any way to "regulate" the gathering of the data."

I do not know all the ins and outs, but there is what is called the "expectation of privacy". A surveillance camera on a public street corner does not violate the expectation of privacy. One underneath a grate in the sidewalk surely does, even though the sidewalk is a public area. Email is a gray area, because it is sent through a network of computers, and can be read at various points by the owners of those computers and their agents. But intercepting email when you are not part of the delivery network is questionable. And it could be made illegal, if it isn't.

OC, the privacy of corporations is protected to a greater degree than the privacy of individuals. Daddy, can I be a corporation when I grow up? ;)