Tuesday 2 September 2008

2008 End-of-Term Website Crawl to Preserve Gov't. Info.

The Library of Congress, the California Digital Library, the University of North Texas Libraries, the Internet Archive and the U.S. Government Printing Office are joining together in a collaborative project to preserve public U.S. government web sites at the end of the current presidential administration, which ends Jan. 19, 2009. The project will document federal agencies' presence on the internet during the transition of government and enhance the existing collections of the five partner institutions.
The project has two parts:
1. Comprehensive crawl: The Internet Archive will crawl the .gov domain (all of the URLs identified for this project) beginning in late August 2008, and again in early 2009, after the inauguration.
2. Prioritized crawls: Selected URLs will be crawled more frequently and to a greater depth than in the comprehensive crawl, in order to capture websites that are at risk of changing rapidly during this time or of disappearing altogether.
The project team is now calling upon government information specialists -- including librarians and law researchers -- to help select and prioritize the web sites to be included in the prioritized crawl, and to identify how often and how deeply those sites should be collected. Those who sign up to participate will receive a link to a web-based tool that supports the collaborative work of this project. Participants will be asked to review URLs to determine whether they are in scope or out of scope for the project, and may also add in-scope URLs that do not appear in the comprehensive crawl URL list.
