Web Keeps Growing, but ‘Link Rot’ Erodes Its Roots



That the Internet and its human-facing layer, the World Wide Web, are constantly evolving, expanding and renewing themselves is so obvious that it hardly seems worth pointing out. But all of that growth is accompanied by loss, a fact many people do not fully realize until they click on a long-bookmarked web page and get back the dreaded 404 error.

Eric Schmidt, then CEO of Google, famously said in 2010 that humanity was generating as much data every two days as it had from the dawn of time up through the year 2003. That claim later attracted skepticism but conveyed an essentially correct idea: we are churning out information at a rate barely conceivable even 10 years ago – more than two exabytes each day, according to an infographic from the New Jersey Institute of Technology.

How much information is that? Maybe you have purchased a portable backup hard drive in the last year or so. To store two exabytes, you would need a stack of those drives, laid flat, about four times the height of Mount Everest. Again, that is just one day of our collective information output.
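As a rough check on that comparison, the arithmetic can be sketched in a few lines of Python. The drive capacity (1 terabyte) and drive thickness (1.75 centimeters) are assumptions chosen for illustration, not figures taken from the infographic, and Everest's height is approximated as 8,849 meters.

```python
# Back-of-the-envelope check. Drive capacity and thickness are ASSUMED values.
EXABYTE = 10**18    # bytes (decimal definition)
TERABYTE = 10**12   # bytes (decimal definition)

DAILY_OUTPUT_BYTES = 2 * EXABYTE      # "more than two exabytes each day"
DRIVE_CAPACITY_BYTES = 1 * TERABYTE   # assumption: a 1 TB portable drive
DRIVE_THICKNESS_M = 0.0175            # assumption: each drive is 1.75 cm thick
EVEREST_HEIGHT_M = 8849               # approximate height of Mount Everest


def stack_height_m(total_bytes, drive_bytes, thickness_m):
    """Return the height in meters of a stack of drives holding total_bytes."""
    drives_needed = -(-total_bytes // drive_bytes)  # ceiling division
    return drives_needed * thickness_m


height = stack_height_m(DAILY_OUTPUT_BYTES, DRIVE_CAPACITY_BYTES, DRIVE_THICKNESS_M)
everests = height / EVEREST_HEIGHT_M
print(f"{height / 1000:.0f} km, about {everests:.1f} Everests")
```

Under those assumptions, two million drives stack to roughly 35 kilometers, or about four Everests. A larger or thinner drive would shrink the stack accordingly.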

All that furious growth, though, obscures an important fact about the web: it is also in a constant state of disintegration and decay.

Every day, some content is being lost. Some of that loss is by design. Information becomes outdated and is either replaced or removed from publication as part of a project plan. Even if the information itself is still valid, other factors may cause it to be deprecated. Personnel within a group responsible for maintaining a website may change, and material once deemed essential to the website’s mission may later be judged less so. Thoughtful web design leaves signposts to redirect visitors who click on a long-trusted browser bookmark months later, but that best practice is often ignored, exacerbating an Internet-wide problem known as “link rot.”
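For readers who maintain bookmarks or reference lists, a link's health can be checked programmatically. The following is a minimal sketch using only Python's standard library. The function names and the labels ("ok", "redirected", "rotted", "error", "unreachable") are hypothetical choices for this illustration, not an established tool or standard.

```python
import urllib.error
import urllib.request


def classify_status(code):
    """Map an HTTP status code to a rough link-health label (labels are illustrative)."""
    if code in (404, 410):      # Not Found / Gone: the classic signs of link rot
        return "rotted"
    if 400 <= code < 600:       # other client or server errors
        return "error"
    return "ok"


def check_link(url, timeout=10):
    """Request a URL and return (label, final_url)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            final_url = response.geturl()
            # urlopen follows redirects, so a changed URL means a signpost was left.
            label = "redirected" if final_url != url else classify_status(response.status)
            return label, final_url
    except urllib.error.HTTPError as err:
        return classify_status(err.code), url
    except OSError:
        # DNS failures, refused connections and timeouts all land here.
        return "unreachable", url
```

Calling `check_link` on a bookmarked address returns a label and the address the request finally reached. Some servers reject HEAD requests, so a real checker would likely fall back to GET before declaring a link dead.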

Some link rot is unintentional, and there are many reasons why it occurs. For example, a study tracing the online presence of the Occupy movement of 2011-12 found that only 41 percent of the 933 Occupy-related websites selected for study in December 2011 remained operational in April 2014. Some websites had simply vanished, while others had been co-opted by unrelated groups or interests.

Nearly 140,000 articles on the English-language Wikipedia have been flagged as containing web links that are no longer functional, and those are just the ones that have been identified by the online encyclopedia’s giant corps of volunteer editors.

Recognizing that the web was developing into a vital record of modern human civilization, the nonprofit Internet Archive launched in 1996 and began to save copies of every publicly accessible website it could reach. The result was the now-famous Wayback Machine, which enables visitors to reach into the web’s past and visit sites as they were up to 15 years ago.
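The Wayback Machine can also be queried by software. The sketch below assumes the Internet Archive's public availability endpoint at archive.org/wayback/available and the JSON shape it is documented to return (an "archived_snapshots" object containing a "closest" entry). The helper names are hypothetical, and the response format should be confirmed against the Archive's current documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Wayback Machine availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(payload):
    """Return the archived copy's URL from a decoded response, or None if absent."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url, timestamp=None, timeout=10):
    """Ask the service for the snapshot closest to timestamp (network required)."""
    with urllib.request.urlopen(build_query(url, timestamp), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Paired with a link checker, a lookup like `find_archived_copy("example.com", "20060101")` could suggest an archived replacement for a dead bookmark, returning `None` when no snapshot is reported.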

That same archiving technology is now also available through the Internet Archive’s web archiving service, Archive-It, which enables universities, state archives, government entities, museums and nongovernmental nonprofits to capture “snapshots” of websites as they exist at a given point in time on the World Wide Web.

It may be true that information can live forever on the web, but that doesn’t mean it will.

