What would your biography look like if it were written by Google? Would Google emphasize the same aspects of your character and life story that you would?
If a set of court cases currently working their way through the Spanish justice system is any indication, Google would likely select the moments in your history when you were at your lowest and splash them across the book cover.
Imagine you run a childcare facility. A group of deeply religious parents spots a picture of Richard Dawkins on your desk and decides you are a Satanist who is likely sexually abusing their children. Questionable psychiatrists, practicing the respected art of hypnotism, find plenty of repressed memories in the kids, and an indictment is secured. Prosecutors, doing their job, train the children to cry on the stand. After a month of hyped-up media coverage of the trial, the jury finds in your favor, the case is dismissed, and no grounds for appeal are found.
How would Google see this set of events?
Likely, it would look something like this:
[Figure: How Google sees the world.]
This isn’t strictly Google’s fault. Rather, it reflects the priorities of the media and bloggers, both of whom share humanity’s well-understood cognitive biases: we prefer negative news, tend to believe the first thing we hear, and suffer from confirmation bias. These biases shape the priorities of the media and, more importantly in Google’s case, of the blogosphere and web forums, which churn out millions of hypertext links. Those links feed the main algorithm Google uses to rank the search results it presents to users. At a very high level, the more links point to a particular piece of information, the higher that information will appear in search results, and so in your Google biography.
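To make the "more links, higher ranking" rule concrete, here is a minimal sketch that ranks pages purely by how many distinct pages link to them. This is a deliberate simplification (Google combines link analysis with many other signals), and the page names and link graph are hypothetical, loosely modeled on the childcare case above.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Rank pages by how many distinct pages link to them.

    `links` is a list of (source, target) pairs; duplicate pairs are
    counted once. Only pages that receive at least one link are ranked,
    and ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter(target for _source, target in set(links))
    return sorted(counts, key=lambda page: (-counts[page], page))


# Hypothetical link graph: blogs and forums link to the accusation
# coverage far more often than to the one report of the verdict.
links = [
    ("blog-a", "accusation-story"),
    ("blog-b", "accusation-story"),
    ("forum-c", "accusation-story"),
    ("blog-d", "trial-coverage"),
    ("forum-e", "trial-coverage"),
    ("blog-f", "verdict-report"),
]

print(rank_by_inbound_links(links))
# ['accusation-story', 'trial-coverage', 'verdict-report']
```

Nothing in this rule asks whether a page is current or whether it tells the whole story; the verdict report sinks simply because fewer people linked to it.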
The problem with this scheme is obvious in our hypothetical case. Anyone running a search for the name of your childcare center, or, worse, your name, will be bombarded by links to news stories reporting the unsubstantiated claims of the children and parents, along with your desperate pleas of denial. The few web pages that bothered to report the verdict will likely be buried well past the top ten links, several pages into the results. In terms of a biography, the outcome of the case would be a minor footnote in the chapter detailing the allegations.
In most countries, this is all fine and dandy. After all, the news reports are more or less factual, and even if some bloggers stepped over the line into libel, you would have to go after them one by one. Not so in Spain. Spanish law contains a concept known as “the right to forget,” which I roughly understand as giving individuals the right to request that biased historical aggregations of information about them be corrected or removed. Metaphorically, a biography must present one’s full life, not just one side of the story. The concept is separate from libel law, as the information in question may be perfectly factual; instead, it focuses on privacy and fairness. While each individual news story is perfectly legal, an aggregated history must not present only out-of-date information.
Two interesting questions come out of the Spanish cases. First, do search engines and other aggregators of information have any responsibility to present a balanced picture in the results of a query? Second, and more fundamentally, how much should we change laws to reflect technological realities, and how much should we demand that technology bend to laws, even though such bending is sometimes technologically impossible?
In the U.S., the legislative and judicial history related to computer technology has been largely dominated by attempts to force reality to conform to legislative and judicial ideals. Classic examples include the Digital Millennium Copyright Act, various export regulations on cryptography, and attempted filtering schemes (i.e., opt-out censorship). Historically, I’ve been rather dubious of such efforts, doomed as they are in their goal of constraining information. Information may not literally want to be free, but the people who desire it haven’t shown themselves easily constrained by law. On the level of ideals, I find attempts to stop the free flow of information to be attacks on free speech and expression. Why can’t I describe a cryptography algorithm in code if I can print it in a book? Many of these regulations share this level of insanity.
However, the Spanish cases have made me partly reconsider. Google and other aggregators of information would certainly provide a better service if their results weren’t consistently one-sided for many types of search queries. Technologically, the problem of “balance” is unlikely to be completely solvable. Any process to determine balance is likely to be either overly bureaucratic or prone to censorship. But if I am ever so unlucky as to go through a smear court case, it would certainly be nice if the first 20 pages of Google search results didn’t just present the accusations.
In the end, it is probably better to roll out the typical liberal solution and call for better education. Making sure everyone is aware of their own cognitive biases and understands how to use a search engine properly would, in the long term (and in an ideal world), partially solve the problem without setting dangerous precedents that might lead to information censorship. But I’m not as convinced as I once was that there isn’t a little room for remedies involving the law. I would be interested to know what you think.
There is, of course, the option of revenge, which works only if carried off with finesse. A recent news item reported such a turning of the tables by a parent woken up at 4 AM by an automated phone call from the school system. More appropriate to this post are the many celebrations of revenge in book and film, an exemplar of which is Sydney Pollack's Absence of Malice.
Posted by: narayan | January 24, 2011 at 11:29 AM
The temptation to begin this comment "Before I stopped robbing banks..." is strong, but ultimately I'm a wimp. I haven't stopped robbing banks. I mean, I never started robbing them! Oh, crap.
How can I disagree with a call for better education and for a clearer awareness of what makes search engines tick? There is also need for a complementary adjustment: the technology itself needs to work better. Lately, Google itself has taken a few hits (pun?) for producing results laden with spam, a problem purportedly addressed more thoughtfully by young upstart engines. But the fact of the matter is that these search engines have never been useful outside a fairly narrow range of simple research projects, such as shopping.
The call for "balance," however, is misplaced. It can't be produced technologically, at least not without also misrepresenting the data. The state of affairs in the world is what your diagram depicts. That is balance. Nobody is "bombarded" with anything. These are just the results. Compare shopping online. If I'm shopping for coffee makers and my searching produces dozens or hundreds of models--multiple color options, functions and features, price points--have I been bombarded with results? Or have I been presented with an opportunity for choice?
But a picture of Dawkins on your desk at a childcare facility. That's weird.
Posted by: Dean C. Rowan | January 24, 2011 at 12:13 PM
Here's a personal counterexample from the last two days, of bad news being superseded by good. Please excuse it if you find it tangential.
I did a search on 'Manu Joseph' merely to check on the man's credentials and was surprised to find a link to my adverse review of his book at the bottom of the first page of results. A few hours later I repeated the search and the link had risen to number two. A few minutes ago I found that it was now at the bottom of the second page.
What's going on, Cyrus? Does this mean that the rank of a link in Internet search results is to be modeled as a random walk process? Could the process of ranking be Markovian and, by implication, memoryless?
Perhaps an easier tactic than revenge is to flood the net with feel-good items in sufficient number. Would that be a workable remedy? If so, what are its parameters?
If you have a scientific explanation, I'll gladly have it off-line.
Posted by: narayan | January 24, 2011 at 12:42 PM
Since I have had no response from Cyrus, let me add that I don't know what the fuss is about. Google comes up with NOTHING pertinent to my searches, with every possible combination of keywords from my Sotomayor post of three years ago or less. In effect, it is lost to posterity.
Posted by: narayan | January 26, 2011 at 05:12 PM
Sorry not to get back to you until now, Narayan. To answer your question, the way Google handles requests can produce different results, even from one second to the next. Basically, queries are handled by various clusters of machines, each holding a slightly different copy of the index used to generate results. Each query is routed semi-deterministically to one of these clusters, so the same query can produce different results at different times. If a query doesn't have many hits, or if the hits are similarly ranked, small differences between these clusters can account for the changes in rank you report.
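The effect described above can be sketched in a few lines. The scores, document names, replica count, and the `route` function are all hypothetical; the only ideas taken from the comment are that clusters hold slightly different copies of the index and that routing is repeatable per request but not fixed per query. When scores are nearly tied, a tiny difference between replicas is enough to reorder the results.

```python
import zlib

# Hypothetical scores for the same three documents on three index replicas.
# Each replica's copy of the index differs slightly (e.g. one has not yet
# picked up the latest crawl), so near-tied documents can swap places.
REPLICAS = [
    {"review-blog": 0.512, "publisher-page": 0.515, "news-item": 0.498},
    {"review-blog": 0.519, "publisher-page": 0.514, "news-item": 0.497},
    {"review-blog": 0.490, "publisher-page": 0.516, "news-item": 0.509},
]


def route(query: str, request_id: int) -> int:
    """Pick a replica for one request (hypothetical routing rule).

    Hashing the query together with a per-request id makes the choice
    repeatable for a given request, while the same query can still land
    on different replicas from one request to the next.
    """
    key = f"{query}:{request_id}".encode("utf-8")
    return zlib.crc32(key) % len(REPLICAS)


def rank_on_replica(replica: int) -> list:
    """Order documents by score on one replica, highest first."""
    scores = REPLICAS[replica]
    return sorted(scores, key=scores.get, reverse=True)


def search(query: str, request_id: int) -> list:
    """Toy search: the query only affects routing, not the scores."""
    return rank_on_replica(route(query, request_id))


for request_id in range(3):
    print(request_id, search("manu joseph", request_id))
```

With these made-up numbers, `review-blog` ranks second on replica 0, first on replica 1, and last on replica 2, the same kind of jumping around reported in the earlier comment.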
It's interesting that you bring up random walks, as they play the central role in how Google's PageRank algorithm works. Indeed, the PageRank of a document is, more or less, the long-run probability that a never-ending random walk across the entire web is on that particular document. This is a memoryless process in the statistical sense, but that can give the misimpression that topology doesn't matter. On the contrary, topology plays a very important role: a document with many incoming links is much more likely to be hit by a random walk than one with only a few. And if a document is closely connected to a highly connected site, it too is more likely to be reached by the walk.
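The random-walk view corresponds to the textbook power-iteration form of PageRank. The sketch below is a minimal version under common assumptions not stated in the comment: a damping factor of 0.85 (with probability 0.15 the walker jumps to a uniformly random page, which keeps the walk from getting stuck) and a tiny hypothetical link graph. It is not Google's production implementation.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    `graph` maps each page to the list of pages it links to; every link
    target must also appear as a key. Pages with no outgoing links
    ("dangling" pages) spread their rank evenly over all pages.
    """
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start the walker anywhere, uniformly
    for _ in range(iterations):
        # Rank sitting on dangling pages is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not graph[p])
        # Random-jump share plus the dangling share, given to every page.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank in equal parts along its outgoing links.
        for page, outlinks in graph.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: "leaf" and "lonely" each have exactly one incoming link,
# but leaf's comes from "hub", the best-connected page.
web = {
    "a": ["hub", "lonely"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["leaf", "a"],
    "leaf": ["b"],
    "lonely": ["c"],
}

ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:7} {ranks[page]:.3f}")
```

Here `hub` comes out on top, and `leaf` ends up with clearly more rank than `lonely` even though both have a single incoming link, which is the topology effect described above.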
This provides a partial answer to your question: flooding is hard, because for it to be effective two things need to be true. First, the pages you add need to be linked to by others; if they sit there all by themselves, they'll have almost no effect on the final PageRank of the target document. Second, they need to be linked to by more than just the other flood documents, or they will still have very little effect. Indeed, Google filters for graph components that are internally well connected but receive almost no links from the outside. I'm not familiar with how that process works, but the idea is to protect against flooding, so that the tactic you suggest doesn't work.
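Since the comment itself notes that the details of Google's filter are unknown, the following is only a hypothetical illustration of the idea it describes: flag a group of pages that is densely linked internally but receives almost no links from outside. The function names, the thresholds (`min_density=0.5`, `max_external_share=0.1`), and the example graph are assumptions chosen for illustration, not Google's method.

```python
def inbound_link_profile(graph, group):
    """Return (internal, external_in) link counts for a group of pages.

    `graph` maps each page to the pages it links to. `internal` counts
    links between distinct members of `group`; `external_in` counts links
    from non-members into the group. Links leaving the group are ignored.
    """
    internal = external_in = 0
    for source, targets in graph.items():
        for target in set(targets):
            if target not in group or target == source:
                continue
            if source in group:
                internal += 1
            else:
                external_in += 1
    return internal, external_in


def looks_like_link_farm(graph, group, min_density=0.5, max_external_share=0.1):
    """Hypothetical heuristic: flag a densely interlinked group whose
    inbound links come almost entirely from its own members."""
    n = len(group)
    if n < 2:
        return False
    internal, external_in = inbound_link_profile(graph, group)
    density = internal / (n * (n - 1))  # share of possible member-to-member links present
    total_in = internal + external_in
    external_share = external_in / total_in if total_in else 0.0
    return density >= min_density and external_share <= max_external_share


# Hypothetical graph: ordinary pages about the case, plus readers linking in.
web = {
    "news": ["accusation"],
    "blog1": ["accusation", "news"],
    "blog2": ["accusation", "blog1"],
    "accusation": ["news"],
    "verdict": [],
    "forum": ["news", "blog2"],
    "reader": ["accusation", "blog1"],
}
# A flood: five new pages that link to each other and to the verdict page.
flood = {f"flood{i}" for i in range(5)}
for page in flood:
    web[page] = sorted(flood - {page}) + ["verdict"]

print(looks_like_link_farm(web, flood))  # True
print(looks_like_link_farm(web, {"news", "blog1", "blog2", "accusation"}))  # False
```

The flood pages are fully interlinked and nobody outside links to them, so they are flagged; the ordinary pages are also interlinked but draw 4 of their 10 inbound links from outside. A real filter would also have to find candidate groups in the first place, which is the hard part and is not attempted here.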
PageRank is only one of the many factors Google now uses to generate the listings you see, but I believe it remains one of the most important.
To summarize, a single negative post isn't going to smear someone by showing up as a top result on Google. But a media blitz of bad publicity over accusations would certainly dent their search results for a long time to come, particularly if the person affected isn't normally someone who produces news (i.e., a non-celebrity). I would guess it is a marginal situation, but life must be hell for those it affects.
Posted by: Cyrus Hall | February 01, 2011 at 10:55 AM
A bit more on the emerging European concept of the right to be forgotten:
http://www.theatlantic.com/technology/archive/2011/02/in-europe-a-right-to-be-forgotten-trumps-the-memory-of-the-internet/70643/
Posted by: Cyrus Hall | February 03, 2011 at 05:04 PM
Very interesting, Cyrus. So will this be achieved by explicitly requesting that certain data be removed from the internet? By individuals or by the courts? Will this apply to something as inconsequential (albeit embarrassing at a later date and in a new light) as a youthful photo or reckless speech? Or will it be something more life-altering, such as false criminal charges that later proved groundless? I can understand the "right to forget" being a legitimate quest in the latter scenario but not the former. Europeans may value their privacy more than others, but I doubt that it is necessary to scrub the 'net of all offending (but harmless) items.
Posted by: Ruchira | February 03, 2011 at 11:21 PM
I think that what you describe as currently lacking in the media, and by extension in Google, already exists: it's Wikipedia.
Sure, it also has its own biases, but given that the media favor negative news over positive follow-ups, people should just go to your Wikipedia page to check whether the accusations against your childcare facility are really true. Wikipedia is almost obsessive about recording the most up-to-date facts.
Of course, it's not the best example, since the handling of biographies of living people is quite a heated topic of debate within Wikipedia itself, but I'm sure that crowdsourcing would help balance the way these things are portrayed in the media.
Posted by: Giovanni | February 19, 2011 at 07:19 AM