About Our Web Site Archives & Access Program: History and Procedures
The North Carolina State Government Web Site Archives & Access Program (WSAAP) contains web sites created by North Carolina’s state agencies. These web sites are collected using an automated web crawler operated by the Internet Archive, a 501(c)(3) non-profit founded in 1996 with the purpose of offering permanent access to collections that exist in digital format. Through the Internet Archive’s subscription service, Archive-It, the State Library and State Archives are able to build, manage and provide long-term access to this unique web archive. To find out more about what web sites are included in the WSAAP, please see "What will you find in the Web Site Archives?"
Archiving Tool Evaluation
The State Archives and State Library began looking for an automated web archiving solution in 2004. Between 2004 and 2006 three options were evaluated: a tool known as Capturing E-Publications (CEP), the Web Archives Workbench tool, and Archive-It. The three tools offered very different functionality, and a summary of the pros and cons of each tool was prepared from these evaluations. In the end, Archive-It was chosen as the solution because it is hosted, reliably captures specified content and stores it in a preservation-friendly format, offers easy end-user access to archived content, and is reasonably priced. An additional advantage was that in 2008, the Library and Archives were able to have the Internet Archive add to the WSAAP all of the content from NC state agency websites that it had captured, dating back to 1996. So when you search the WSAAP today, you are searching all of the content collected since 1996.
Archiving Process and Documentation
In order to implement an automated web archiving solution, a list of North Carolina state agency domain names had to be created. In North Carolina there is no structure that all state agencies follow in creating domain names (for example, not all state agency domain names end in a specific way, like "state.nc.gov") and there is no place where all agency domain names are tracked. So the Library and Archives used the domain discovery tool from the Web Archives Workbench to create an initial list of state agency domain names.
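The short Python sketch below illustrates why a single suffix rule cannot produce a complete list of agency domains. It is only an illustration: the function name `split_by_suffix` and the placeholder domain names are assumptions made for this example, and it does not represent how the Web Archives Workbench domain discovery tool works.

```python
def split_by_suffix(domains, suffix):
    """Partition domains into those under `suffix` and those that are not."""
    suffix = suffix.lower().strip(".")
    matched, unmatched = [], []
    for domain in domains:
        name = domain.strip().lower().rstrip(".")
        # Match the suffix itself or any host beneath it, on a label boundary.
        if name == suffix or name.endswith("." + suffix):
            matched.append(name)
        else:
            unmatched.append(name)
    return matched, unmatched


# Hypothetical placeholder names, not real agency domains.
candidates = [
    "health.state.example",
    "www.transport.state.example",
    "parks-dept.example.org",
]

matched, unmatched = split_by_suffix(candidates, "state.example")
print("covered by the suffix rule:", matched)
print("missed by the suffix rule:", unmatched)
```

Any domain that lands in the second list would be left out by a suffix-only approach, which is why a discovery step was needed to build the initial list.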
Once an initial list existed, scoping rules needed to be defined. State Archives and Library staff drafted documents outlining the scope of the Website Archiving Program, including a justification for the program, guidelines for manual website archiving, standards for general scoping and frequency of capture using automated tools, and procedures specific to the technical constraints of Archive-It. To maximize the success of the archiving program, staff also notified state agency Chief Information Officers and Public Information Officers about the program and what their webmasters could do to support it. As a result, there was an open line of communication to discuss any issues encountered by the agencies or by the Archives and Library in implementing this program.
Because there is a limit on the amount of content that can be collected under the annual Archive-It subscription, sites that are updated less often are collected less frequently. To determine which sites are collected at what frequency, staff evaluate each domain against certain criteria. Also, each time websites are collected, an analysis is performed to determine how to refine the next crawl so that it best matches the scoping criteria and to identify whether any webmasters need to be contacted to address access issues. Lastly, changes in agency domain names or new agency domain names are sought out for inclusion in the next capture cycle. If you would like to suggest a domain name for inclusion in the WSAAP, please e-mail the domain name and it will be evaluated against the current scoping criteria.
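As a rough illustration of the trade-off described above, the Python sketch below assigns a crawl frequency to each site and checks the resulting total against an annual collection limit. Everything in it is hypothetical: the `Site` fields, the frequency thresholds, the sizes, and the budget figure are assumptions chosen for the example, not the criteria staff actually apply or the terms of the Archive-It subscription.

```python
from dataclasses import dataclass


@dataclass
class Site:
    domain: str
    updates_per_year: int  # hypothetical estimate of how often content changes
    gb_per_crawl: float    # hypothetical estimate of data captured per crawl


def crawls_per_year(site):
    """Pick a capture frequency from an assumed update-rate threshold."""
    if site.updates_per_year >= 12:
        return 4  # frequently updated: capture quarterly
    if site.updates_per_year >= 2:
        return 2  # occasionally updated: capture twice a year
    return 1      # rarely updated: capture once a year


def plan(sites, annual_budget_gb):
    """Return the schedule, the projected annual volume, and whether it fits."""
    schedule = {s.domain: crawls_per_year(s) for s in sites}
    total_gb = sum(schedule[s.domain] * s.gb_per_crawl for s in sites)
    return schedule, total_gb, total_gb <= annual_budget_gb


# Placeholder sites and budget, for illustration only.
sites = [
    Site("news.agency.example", updates_per_year=52, gb_per_crawl=10.0),
    Site("reports.agency.example", updates_per_year=1, gb_per_crawl=5.0),
]
schedule, total_gb, fits = plan(sites, annual_budget_gb=50.0)
print(schedule, total_gb, fits)
```

If the projected total exceeded the limit, the thresholds or the list of in-scope sites would have to be adjusted before the next capture cycle.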
Of course, the scope of the WSAAP will change over time. For instance, social media sites have recently become popular communication tools for state agencies, and many communications using these sites are considered public records. Consequently, the State Library and Archives staff have decided to expand the scope of the WSAAP to include state agency social media sites. To make it clear how best to configure such sites for capture, State Archives and Library staff (working in conjunction with staff from the Governor's Office and the Office of Information Technology Services) published best-practice guidelines and an accompanying online tutorial for state agency employees who use social media for business purposes. The Library and Archives will continue to refine the Web Site Archiving program to ensure that state government information disseminated through the web is available for access into the future.
Social Media Archive BETA
In 2012, the Web Site Archives & Access Program expanded its social media archiving through a partnership with ArchiveSocial, a North Carolina startup providing dynamic capture of social media from Facebook, Twitter, and LinkedIn. Currently in BETA, this project has already garnered positive attention and should be expanding to include additional state agencies soon.