December 2006
Library and Archives Canada (LAC) is pleased to announce that the Government of Canada Web Archive is now available for consultation on-site in our Reference Room at 395 Wellington Street, Ottawa.
The Library and Archives of Canada Act, which received Royal Assent on April 22, 2004, allows LAC to collect a representative sample of Canadian websites for the purposes of preservation.
Websites are now officially recognized as part of our documentary heritage, and the modernized definition of what constitutes a 'publication' has broadened our scope and understanding of what we need to collect for future generations of Canadians.
Our aim is to ensure that all acquired web sites are preserved in our permanent collection where they can be made available to Canadians over the long term for consultation and research. Client access is now being provided on-site; in the future, we will provide broader access to Canadians via the Internet.
LAC staff who require access to the Archive to facilitate their daily work will initially use a password which can be obtained from the Legal Deposit Internet Unit, Published Heritage Branch.
The Archive includes 1,489 Federal Government web sites, which were harvested using the Heritrix web crawling software between December 22nd, 2005 and March 24th, 2006. Approximately 1.8 TB of data was collected, comprising over 40,000,000 digital objects. To create the Archive, LAC used a suite of software tools (Heritrix, NUTCH/WAX and WERA), all of which are open-source software and were developed by the International Internet Preservation Consortium (http://netpreserve.org/about/index.php), of which LAC is a member.
The web sites contained in the Archive encompass subject matter produced and presented online by the Federal Government of Canada at the time that the harvest took place.
A second harvest of the Federal Government web domain (.gc.ca) was started on October 25th, 2006. LAC also began harvesting the web sites of Canada's provincial and territorial governments during that same week. Further study of the archived websites will help to determine the frequency of future harvests. Analysis performed on the results of the first .gc.ca domain harvest has greatly aided in the configuration of the web crawling software and the development of other software tools utilized for the second harvest.
The Government of Canada Web Archive was developed and implemented through the collaborative work of staff in several areas within LAC, and their efforts are much appreciated.
If you have questions about LAC's web harvesting activity, please email: web-archives-web@lac-bac.gc.ca.