Saturday 9 January 2010

How can I delete embarrassing stuff from the Internet Archive?

Have you ever tried to locate something on the internet you know you previously read, but can’t because it’s no longer there?

I’ve recently come across a website, the Internet Archive, that will be very useful when I try to recall stuff that had been posted, but was subsequently taken down or otherwise removed by the website owner. Is it a British site? Come on, you must be kidding. No, it’s based in an office somewhere around 300 Funston Ave, San Francisco, CA 94118, which is the address that appears in the Archive’s privacy policy. The funny thing is, however, that when you use the Google Maps “Street View” tool and ask to visit 300 Funston Ave, what you get is an image of a Christian Science church, not an office block.

So is the Internet Archive run by a charitable organisation, a church, or by some higher power?

Ok, so I have no idea who really runs this site. But I do know that its “Wayback Machine” can be used to locate and access archived versions of a web site. The public-facing version of the site does warn, however, that “we can't guarantee that your site has been or will be archived. We can no longer offer the service to pack up sites that have been lost. We recommend using the Warrick Tool.”

I wonder what would happen if any of the staff have done some moonlighting and archived other pages that have appeared on the web. Or if any of the staff have been given an order by the US Department of Homeland Security, perhaps citing the PATRIOT Act, requiring it to archive a bit more stuff. Dunno, and I shouldn’t ask, really. I don’t like asking questions if I haven’t already got a hunch about the answers.

The Archive assures people who want to have their site's pages excluded from the Wayback Machine by explaining that it “is not interested in preserving or offering access to Web sites or other Internet documents of persons who do not want their materials in the collection. By placing a simple robots.txt file on your Web server, you can exclude your site from being crawled as well as exclude any historical pages from the Wayback Machine.”
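
For anyone wondering what that “simple robots.txt file” might look like, here is a rough sketch rather than anything official. It assumes the Archive’s crawler identifies itself as ia_archiver (the name the Archive’s exclusion instructions have used, but not something quoted above), and the function name and example.com address are invented purely for illustration. The Python below holds a sample robots.txt as text and asks the standard library’s parser how a well-behaved crawler ought to read it.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: shut out the Archive's crawler entirely and
# leave every other crawler unrestricted. The user-agent name "ia_archiver"
# is an assumption, not something stated in the quoted policy text.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a rule-abiding crawler with this name may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical page address, used only for illustration.
    page = "http://www.example.com/embarrassing-page.html"
    print("ia_archiver allowed:", is_allowed("ia_archiver", page))
    print("SomeOtherBot allowed:", is_allowed("SomeOtherBot", page))
```

The thing to notice is that the file only works if the crawler chooses to consult it. Nothing in robots.txt actually prevents a program from fetching the page anyway.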

The Archive explains that it “collects Web pages that are publicly available, the same ones that you might find as you surfed around the Web. We do not archive pages that require a password to access, pages tagged for "robot exclusion" by their owners, pages that are only accessible when a person types into and sends a form, or pages on secure servers.”

So, if I were an Internet Archive employee and wanted to be a bit naughty and do some moonlighting, I suppose all I would need to do is rewrite the computer program to delete the bit that tells it to skip pages tagged with robot exclusions. That bit might be simple. Not sure about decrypting material submitted through forms or pages held on secure servers, though.

Why was I looking at this in the first place? Well, towards the end of last year I was (professionally) involved in an incident which, within 24 hours, had pushed the war in Afghanistan, the model Katie Price and the media personality Jordan completely off the front pages of all the serious newspapers and media outlets in the UK. (I now know what Gordon Brown must feel like on a bad day.) And today I’ve been surfing the net to locate some colourful images to supplement the inevitable set of PowerPoint presentations that I’ll be delivering about the incident. What surprised me was the amount of information still available about the bloody thing. And what shocked me was the coverage given to it in Wikipedia. I thought to myself, just how will anyone be able to rescue their reputation if this stuff is never to be allowed to die? I mean, I work for a large company, and yet almost one fifth of its Wikipedia entry has been taken up by information about that one single incident. Outrageous.

If anything, the internet has totally rewritten the rules about the dissemination of digital media, and the rights (or lack of rights) that people have to remove content which has been given undue prominence. If I were a criminal, perhaps I could rely on the provisions of the Rehabilitation of Offenders Act, which allows certain criminal convictions to be spent, or ignored, after a set period. I wonder whether the Internet Archive will adopt an equivalent policy?

On the one hand, I hope it won’t. Because try as hard as I can, I still don’t want anyone to forget people like Lord Jeffrey Archer, and what he got up to in the past. Perhaps if he were to renounce his peerage, I might be persuaded of the view that private people deserve a private life. But I don’t hold to the view that celebrities should automatically be entitled to airbrush out of their past material which is of a commercial disadvantage to them.

But on the other hand, where an individual (or a company) has been caused embarrassment or damage to its reputation in a wholly inappropriate way, then I fail to see why the internet should be allowed to make a permanent record of it. They may publish, but I may perish – and if I did, I might be very, very sore about that.