Solving the Problem of Reference Rot with Web Archiving


Most people who work with online content have become familiar with the problem of link rot, but the online medium has also given rise to a lesser-known, more subtle threat to the integrity of information: reference rot.

Like link rot, the threat of reference rot can be mitigated through the appropriate use of web archiving services, such as Archive-It™. First, however, it is helpful to understand how reference rot is different, and how web archiving services can help solve the problem.

From the Library Stacks to a Click of the Mouse

Court rulings, scholarly works and many other publications throughout the modern era (even articles on the free online encyclopedia Wikipedia) typically cite primary sources to indicate the origin of information or to draw connections to an earlier work on the topic. When done well, a citation allows an interested reader to delve more deeply into the subject and engage in their own independent research.

Before the information age, this was most easily done in a library, ideally with the help of a reference librarian. As publications have moved online, however, the process of following citations has become simpler. A single click of the mouse largely has replaced the need for repeated queries of the library information system (or, in some places, the old reliable card catalog) followed by journeys into the stacks.

That may be good news for information seekers, but not necessarily for authors or other people charged with maintaining the integrity of online publications. Creating good citations in the online medium is arguably more difficult today than ever.

That is because the web is constantly changing; the information an author cites today may be very different six months, six years or 60 years from now. Confusion can reign if the source material is not preserved as it stood at the moment it was cited.

Online Content Changes Constantly

Part of the problem is link rot. A hyperlink-based citation in one web publication is only useful as long as the resource it points to remains at the web address, or URL, specified by that hyperlink. If the original resource is taken down or relocated to another web address, the citation in the document that referenced it will be broken.

On the English-language Wikipedia alone, more than 137,000 articles (as of March 2015) include citations that have been broken by link rot and flagged by the site’s readers and editors; the actual number of link-rotted citations across the site’s 4.7 million articles is likely much larger. This is not a Wikipedia-specific issue, however; link rot is endemic across the web.

Reference rot is even more pervasive. Unlike a citation broken by link rot, a hyperlinked citation afflicted by reference rot does not lose its destination resource. Instead, the URL remains valid, but the content of the cited resource has changed since the citation link was created. A cited table of figures, for example, may be updated over time, leading to confusing discrepancies between a passage written, say, five years ago and the newer source material the link continues to reference.
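
The distinction between the two failure modes can be made concrete with a short Python sketch. It assumes that a fingerprint (here, a SHA-256 digest) of the source was recorded when the citation was made; the Citation record and the classify function are hypothetical names used for illustration only, not part of any archiving service.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """Hypothetical record of a hyperlinked citation."""
    url: str
    cited_digest: str  # SHA-256 of the content as it stood when cited


def digest(content: bytes) -> str:
    """Return a hex SHA-256 fingerprint of the given content."""
    return hashlib.sha256(content).hexdigest()


def classify(citation: Citation, status: int, body: Optional[bytes]) -> str:
    """Classify a citation given the HTTP status and body seen today."""
    # No response, or an error status such as 404: the resource is gone.
    if body is None or status >= 400:
        return "link rot"
    # The URL still resolves, but the content no longer matches what was cited.
    if digest(body) != citation.cited_digest:
        return "reference rot"
    return "intact"


# Example: a cited table of figures that is later updated in place.
original = b"Population: 1,000"
citation = Citation("https://example.org/figures", digest(original))

print(classify(citation, 200, original))              # same content
print(classify(citation, 200, b"Population: 1,250"))  # content changed
print(classify(citation, 404, None))                  # resource removed
```

A byte-for-byte comparison is deliberately crude: a real page may change in ways that do not affect the cited passage (a new banner or timestamp, for instance), so a practical check would likely compare only the relevant portion of the content.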

In the print medium, reference rot rarely is an issue for properly formatted references. Most reference styles specify a particular edition, revision or issue of the source in question, which must then be individually retrieved. Updates to such static, printed sources, such as a correction label from a publisher, occur relatively infrequently. For the purposes of reference integrity, this compares very favorably with the fragility of online references.

Web archiving services such as Archive-It solve the problem of reference rot by bridging the gap between the stability and permanence of print resources and the speed and dynamism of the online world. A properly configured public web archive enables electronic document versions to be cataloged and accessed in a manner similar to that of books and periodicals in a physical library. Authors and online document editors simply point their citations to the permanent, archived copy of the resource rather than a live link that could be moved, disabled or updated by a third party after the piece is published.
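
As a rough illustration of pointing a citation at an archived copy, the Python sketch below builds an archive address from a live URL and a capture time. It assumes the archive follows the timestamp-plus-original-URL pattern common to Wayback-style replay tools; the ARCHIVE_BASE address is a placeholder, and the archived_url and format_citation helpers are hypothetical, so the exact address format of any particular service should be checked against its own documentation.

```python
from datetime import datetime, timezone

# Placeholder base address; a real archive's address will differ.
ARCHIVE_BASE = "https://archive.example.org/wayback"


def archived_url(live_url: str, captured_at: datetime) -> str:
    """Build a Wayback-style address: base / 14-digit UTC timestamp / live URL.

    captured_at is expected to be a timezone-aware datetime.
    """
    stamp = captured_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{ARCHIVE_BASE}/{stamp}/{live_url}"


def format_citation(title: str, live_url: str, captured_at: datetime) -> str:
    """Format a citation that points to the archived copy, not the live page."""
    return (
        f"{title}. Archived {captured_at.date().isoformat()} at "
        f"{archived_url(live_url, captured_at)} (original: {live_url})"
    )


captured = datetime(2015, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
print(format_citation("Table of Figures", "https://example.org/figures", captured))
```

Keeping the original URL alongside the archived one lets a reader see both what the author cited and where the live resource was at the time.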

With Archive-It web archiving in place, an institution such as a research library could offer its clients an online citation service that substantially reduces (if not eliminates) the problems of link rot and reference rot.

