#CRL_Leviathan Session 3: New Models of Stewardship: An Agenda for CRL and North American Research Libraries

Government Records and Information: An Inventory of the Major Threats and Challenges
Bernard Reilly, President, CRL

  • Major threats and challenges:
    • Scale of the challenge > the enormous and constantly growing volume of information and government records being produced, as well as data being collected and disseminated > now must use computer-assisted research and applications to analyze this content for decision-making, preservation, context-generation, etc.
    • The unknown unknowns > we don’t know what we’re missing and we don’t know the size of what we’re dealing with > this makes prioritization, decision-making, and the framing of a preservation strategy very difficult
    • There is no longer a distinction between what’s a government record and what’s a government document > in the digital realm these terms are almost interchangeable > a lot of material falls in a grey area (is an agency website a government record or document?)
    • We’re facing reductions in capacity and funding in the big organizations that have traditionally been trusted with the long-term preservation of government information > at the same time, these organizations are being asked to do much more and the reality is that they will only be able to do less
      • These institutions will play different roles in the future > NARA could be a collaborator with government agencies in decisions on systems used and adopted to produce and manage government documents and records – there has been a start to this process with the development of NARA guidelines on this topic
      • Different priorities of higher education institutions > used to be stewards of information and content needed by researchers, but there are so many pressures on these institutions and the return on investment is farther in the future, which makes it harder for institutions to provide support
    • Larger role of tech companies and cloud > in the short term these companies support the maintenance of digital content and provide services for use and discoverability of this content > social media companies are sometimes considered to be de facto repositories > must use caution, as content could easily disappear
    • Bigger role of private companies like ProQuest mediating use and access to government content > increasingly these companies are in possession of data about how the content is being used, which could be very valuable for libraries
    • Must get a better understanding of civic-minded organizations > these organizations are in the government information supply chain > there are many things we don’t know about the systems of private organizations that are storing government information and how the finances of these organizations work > there is a lack of economic transparency

Panel Discussion: Prospective Roles and Actions for Libraries and CRL
Mary Case, University Librarian, University of Illinois at Chicago

  • Who gets to determine the value of the information? Governments, scholars? > there is tension between the desire of scholars to have everything available and how we manage all of this information with the laws and rights that govern it
  • The technologies do exist to improve metadata creation and document tracking, but they are not yet being applied in the domain of government information
  • We need to rebuild the connection between records managers and content creators > re-education on importance of documents for institutional memory and our ability to move forward
  • A big challenge lies with the fact that everything has become data > there is a push to ensure that all information can be mashed, taken apart, and recontextualized in some manner
  • When we talk data, we’re being encouraged to get closer and closer to the point when the data is being created > moving upstream will be critical for us
  • When trying to address the challenges of born digital information we need to think outside the FDLP box and find ways to do more, collaboratively
  • What can research libraries do?
    • Tell the story and explain why preserving government information is important, why any government data is critical for our present and our future
    • There are several local and regional consortia > can we model and do pilots with local and regional governments and researchers to see how we might manage information more locally
    • Urge more movement upstream (encourage GPO and NARA to put out best practices – don’t wait)
    • Have more conversations > can we figure out how to agree with ProQuest and others to create a trusted digital repository where if the company goes out of business we can maintain access > we can’t keep replicating

Brent Roe, Executive Director, Canadian Association of Research Libraries

  • Government information needs to be held independently of government > what role do research libraries have in this?
  • Some realities in Canada:
    • At the federal level, we need to work with government information in both languages
    • Crown copyright at the federal level (still 50 years)
    • Potential constraints on what researchers can do with information, but policy allows quite liberal use of content
  • Some current developments:
  • Many existing initiatives bring together research libraries and government information:
  • Maybe we don’t need to keep everything? We should consider this perspective
  • It may be most logical for us to simply buy what’s provided by third parties because we don’t really have the capacity to preserve everything ourselves
  • Is there a way that research libraries can collaborate on web crawls? > each institution could be responsible for different domains or agencies
  • Research libraries have an important advocacy role to play
  • How to find money for these new opportunities / projects? > we need to come around to the idea that we may have to start and hope that the projects will get supported in the future

Ingrid Parent, University Librarian, University of British Columbia and former Assistant Deputy Minister, Library and Archives Canada

  • The situations in Canada and the United States have many similarities and differences, but what is clear is that a lot needs to be done
  • Support at the international level is needed to ensure long-term accessibility and integrity > generally not enough is happening > in European libraries the focus is on copyright exceptions for libraries and on e-lending, and digital government information is not top of mind
  • National archives and libraries are addressing some challenges, but there is little collaboration
  • Many questions about who does what, how do we do this at the international level, and who pays for what?
  • PERSIST: UNESCO Digital Strategy for Information Sustainability > helping secure mechanisms of good governance and access to information, government documents are part of this scope
  • There is no one organization that can tackle the issue of born digital government information > collaboration with a variety of stakeholders will be key
  • There is a group in IFLA that deals specifically with government information and this may be a way for us to bring these issues to the international stage > perhaps an advocacy statement can be drafted to take to the international level
  • We are all struggling with digital big data, but the technology is there to be used creatively to meet our objectives and we should embrace and explore this > let’s be ambitious and pragmatic

Conclusions: a New Strategic Framework for Collective Action by North American Academic Libraries and New Multi-year Priorities for CRL
Bernard Reilly, President, CRL

  • Analysis > we need to know more about what we don’t know > how gov info is produced, managed, and distributed and how gov records are declassified > more about systems and software involved inside governments > more about the organizations tasked with long-term management of government information > more about commercial actors, NGOs, and civic organizations > more about the consumption and the uses of government data and records
    • Need to locate and exploit expertise within the CRL community > collections development communities, libraries, people that know gov info and data, etc.
    • Need to know more about what’s at risk > the problems with declassification and things going missing > many problems seem to be in the access pipeline
    • In danger of losing material from foreign governments with unstable regimes and corrupt governments
  • Communicate > we need to tell the story better > articulate what we want and what we need
  • Audit > We need to audit commercial repositories and preservation repositories
  • We need to stop talking about e-government and paperless government documents as if they are manageable within the framework of the FDLP
  • Do we really want to put so much money in web archiving? What are we getting out of it? Is what we’re archiving going to be used? Should we focus more on at risk data that will matter if it disappears?
    • We should look more closely at the research being done by social scientists, economists, historians, etc. who are using web content and have a stake in the near-term preservation of content

#CRL_Leviathan Session 2: Libraries and the Information of Governments

Keynote: Approaching Leviathan: The Dangers and Opportunities of “Big Data” 
John S. Bracken, Director, Journalism and Media Innovation, Knight Foundation

  • How to deal with big data is only half the story > we must also focus on organizational culture and adaptation or we lose track of the importance of culture and people
  • There is so much data now > what’s important is the process, what you do with it, and the talent you build around it > we must adapt and create a bridge between traditional skills and new quantitative approaches
  • There is skepticism about technology and our reliance on it, and this is colliding with the emerging culture of breaking things and focusing on the future and the next challenges
  • How does the civic sector do a better job of adaptability to build the tools that people want and need?
  • “Make something people want or move on” outlook is much harder to accomplish in civil society
  • The biggest cognitive switch we need to make is enabling ourselves to make mistakes
  • The Knight Foundation works in the space of news and journalism, but links it to the community > learn more about the Knight Foundation here: http://www.knightfoundation.org/

Government Records and Information: Real Risks and Potential Losses 
James A. Jacobs, Data Services Librarian Emeritus, UC San Diego, and technical advisor for CRL Certification Advisory Panel

  • There are many gaps in what we know: no list of born-digital government information, no list of all government websites, no list of preserved born-digital gov info
  • What we do know: FDLP libraries have preserved millions of volumes of non-digital government information and most born digital information is not held, managed, organized, or preserved by libraries
    • Preservation is at the mercy of budgets and social priorities > risk increases if the preserving agency is the creator and doesn’t have preservation as its mission, or if the preserving agency is governed by politicians
  • The production of digital documents is far outpacing what’s being done to preserve these documents
  • Key issues:
    • Versioning
    • The need for persistent URLs
    • The need for temporal context (ex: link to version of document or site that author linked to at time of publication and not updated version)
    • E-government issues (e-gov often hides information behind services > how do we preserve this information?)
    • Relying on government for preservation and free access (most agencies do not have the mandate to preserve indefinitely – this is even the case for GPO)
    • Collections need services to provide important context for interpretation
  • When we create dark archives we’re not creating a value for our community > we need to create immediate value for our users
  • Who should preserve?
    • Option one: the government alone
    • Option two: the government with non-governmental partners (ex: GPO + LOCKSS-USDOCS)
    • Option three: non-governmental organizations without government cooperation (ex: Internet Archive)
  • There are different methods for selecting what needs to be preserved (the solutions should be mixed and the issue should be tackled collaboratively)
    • Broad web harvesting (ex: Internet Archive)
    • Focused selection (ex: by agency or title by title)
    • Digital deposit (ex: deposit by creators to memory institutions)
  • When planning for preservation focus on different user-communities: don’t look at the web and decide what to preserve, look at the web and preserve based on what users will need
  • Every library should participate in digital preservation > it’s about building the value of libraries > collections and services should be reliable and useful > shared collections and services can be built with different contributions – not all libraries have to be data centres
  • Summary of key points:
    • Preserve born digital government information – the technology exists
    • Every library can and should participate
    • We can add value to the information by building collections of use to our user communities

The Digital Future of FDsys and the Federal Depository Library Program: A Public Policy Analysis 
R. Eric Petersen, Specialist in American National Government, Congressional Research Service

  • Challenges
    • Access and service (tangible, digital, or both?)
    • Costs (less print distribution, but still costs libraries to maintain)
    • FDsys – there is no good model for permanent digital retention > we will have to update software and touch digital assets to make sure access continues > ongoing investment and responsibility required > every 8-10 years will require entire overhauls and updates
    • Born digital materials > identification, retention, preservation, service
    • Tangibles > retention, digitization, consolidation, service
  • Lack of consensus around:
    • What is to be captured > how to count – websites / documents vs. records
    • How to capture and by whom > GPO / FDsys, originating agencies, third parties
  • Legislative change is slow without clear agreement regarding the solutions among stakeholders
  • Before Congress will engage, we need clear proposals that are broadly supported and offered by stakeholders and interested parties > they must cover issues such as enduring standards for digital retention, who collects and retains born digital content and tangible content, and how the costs will be managed

Panel Discussion: New Models of Access: The Role of Third Party Aggregators and Publishers

Susan Bokern, VP, Information Solutions, ProQuest

  • We all have different roles to play and there’s enough content to go around
  • ProQuest’s essential role is to add value to content
  • ProQuest is focusing on researchers and the improvement of workflow processes to create new research output > enabling researchers to access content more efficiently, providing tools to improve workflow, visualization and analysis tools, not just about content but also about context
  • The process of adding value begins with market research (surveys, advisory boards, focus groups to identify known and unknown needs) > creating acquisitions strategy to develop collection > preserving content or data > keeping the technology up to date > identifying where and how to obtain the content
  • ProQuest takes preservation seriously > content is stored on their own servers > currently exploring a longer-term storage and preservation solution (ex: Iron Mountain)

Robert Lee, Director of Online Publishing and Strategic Partnerships, East View Information Services

  • East View is an aggregator for academic institutions and a variety of international governments
  • Some example projects: GIS, big data, political rallies ephemera
  • Big focus on content from Russia and China > not usually seeking or producing translations, but going after the information and data that’s not always available elsewhere or not the same as what’s provided in English
  • There is an operational risk that the information received could later be reclassified
  • In China, content can be made available and digitized very quickly but it can also disappear or be blocked quickly, too
  • Interested in exploring cross-platform solutions for content

Robert Dessau, CEO, voxgov

  • Voxgov harvests materials from over 10,000 web destinations each day > every 6 minutes the system looks for new URLs > 49 different types of documents (fact sheet, social media, congressional, federal register, speeches, etc.)
  • The collection process has evolved rapidly > learned to identify when a website’s format has changed to maintain quality intake of data > 18-22%, depending on the group, falls into the broken link category
  • Interested in tracking conversations from beginning to end to allow a much deeper and more comprehensive level of research
  • The involvement of third parties in the preservation and access process is inevitable
  • The value of mining the text we have has not yet been realized
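
The harvesting routine in the Voxgov notes above (checking for new URLs on a fixed interval and tracking the share of broken links) can be sketched roughly as follows. This is only an illustration, not Voxgov's actual system: the function names, the in-memory `seen` set, and the rule that a missing status or any HTTP status of 400 or above counts as "broken" are all assumptions made for the example.

```python
from typing import Optional

# The notes mention a check for new URLs every 6 minutes.
POLL_INTERVAL_SECONDS = 6 * 60


def find_new_urls(seen: set[str], listing: list[str]) -> list[str]:
    """Return URLs from the current listing that were not seen before.

    Order is preserved, duplicates within the listing are dropped,
    and every returned URL is added to `seen` (hypothetical in-memory store).
    """
    fresh = []
    for url in listing:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


def broken_link_share(status_by_url: dict[str, Optional[int]]) -> float:
    """Return the fraction of checked URLs that count as broken.

    Assumption for this sketch: a URL is broken when no response was
    received (status is None) or the HTTP status is 400 or above.
    """
    if not status_by_url:
        return 0.0
    broken = sum(
        1 for status in status_by_url.values()
        if status is None or status >= 400
    )
    return broken / len(status_by_url)
```

In practice a scheduler would call `find_new_urls` once per `POLL_INTERVAL_SECONDS` with a freshly fetched listing, and `broken_link_share` would be computed per source group, since the notes report the rate varying from 18% to 22% depending on the group.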

#CRL_Leviathan Session 1: Libraries and the Records of Governments

The CRL Leviathan conference, Libraries and government information in the age of big data, took place in Chicago on April 24 and 25, 2014. CLA-GIN’s Co-moderator, Catherine McGoveran, attended the conference and has compiled notes of the key points from each presentation. The following are notes from the first session, Libraries and the records of governments. Notes from session two, Libraries and the information of governments, will be posted in the coming days. 


Welcome and Keynote: Information, Transparency and Open Government: A Public Policy Perspective
Thomas S. Blanton, Executive Director of the National Security Archive at George Washington University

  • Born-digital document production is far outpacing the physical documents we have in our government archives from the past two centuries
  • There are many barriers that limit an accurate understanding and adoption of open government
  • The Electronic Records Archives (ERA) – a flagship legacy system – doesn’t come close to being comprehensive
  • Documents from the Clinton administration ordered declassified have not yet reached the shelves of the national archives – a lot more support is needed (financial and personnel) to speed up the declassification process and make these documents available
  • Research libraries are on a trajectory from collecting and preserving special collections to acting as data curators > our future is in the interactive collaboration with others, crowd-sourcing to make sure the data is available
  • The opening of government data can help make incredibly important revelations, which can lead to better government and consumer decisions
  • The National Archives and Records Administration (NARA) will fail to meet its mission unless it becomes an offsite backup for electronic government records
  • Only 2% of what the government creates gets saved at NARA
  • We know of the huge power of the National Security Agency (NSA) to retrieve, store, and link records > NSA does records management well and this expertise should be used by the National Archives for offsite backup of government information > this would fit well with the NSA’s national security mandate and could be a way for the agency to restore trust and engage in the civic duty of preserving and making available government information

Historical Research and Government Records in the Era of Big Data: a Historian’s Perspective
Matthew J. Connelly, Professor of History, Columbia University

  • The government info available is the function of a political process > it is the relationship between knowledge and power > where do electronic records fit into this?
  • There is a crisis in democratic accountability and national security > it is a national security issue when departments and agencies don’t have functioning / accurate archives; this opens the government to foreign threats and is an issue about which every citizen should care
  • It is doubtful that, as historians, we will ever be able to render a complete account of government documents, records, and decisions
  • We don’t know what we don’t know
  • The government should move more aggressively to use data mining to manage records
  • Archives are also sites of expectation, not just memory > they’re about the future

“To bring together the records of the past and to house them in buildings where they will be preserved for the use of men and women in the future, a Nation must believe in three things. It must believe in the past. It must believe in the future. It must, above all, believe in the capacity of its own people so to learn from the past that they can gain in judgment in creating their own future.” (Roosevelt)

  • The system is so overloaded that the info that should be protected has suffered because of over protection
  • Historically, secrecy has been in the eye of the beholder, which makes it difficult to set a classification standard that satisfies everyone
  • With the transition to electronic, hundreds of thousands of paper records were lost, because they were not migrated to digital and not kept
  • The budget for declassification has diminished and the budget for keeping secrets has skyrocketed > though there has been a huge growth in the amount of information created, there has been a steep decline in the amount of records declassified
  • The amount the government is currently spending on declassification is 15% of what was spent in the late 90s
  • Data mining can help us identify gaps in the documents released and withheld by government and text analysis can help us identify the trends of declassified words and issues
  • By comparing redacted and later released unredacted documents, we can see the patterns of official secrecy > this could help government find which topics are more sensitive and aid the classification and declassification process
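
The comparison of redacted and later released documents described above can be illustrated with a small sketch. It assumes both versions are available as plain text and that redactions are marked with a placeholder token (`[REDACTED]` here, a hypothetical convention). The function names are illustrative, and this is not presented as the Declassification Engine's actual method, only as one simple way to align two versions using Python's standard `difflib` module.

```python
import difflib
from collections import Counter

# Assumed placeholder that marks a redaction in the earlier release.
REDACTION_MARK = "[REDACTED]"


def recover_redacted_spans(redacted: str, released: str) -> list[str]:
    """Return the passages in `released` that replaced a redaction mark.

    Both texts are split on whitespace and aligned token by token;
    any aligned region where the redacted side contains the mark is
    reported using the words from the released side.
    """
    red_tokens = redacted.split()
    rel_tokens = released.split()
    matcher = difflib.SequenceMatcher(a=red_tokens, b=rel_tokens, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and REDACTION_MARK in red_tokens[i1:i2]:
            spans.append(" ".join(rel_tokens[j1:j2]))
    return spans


def term_frequencies(spans: list[str]) -> Counter:
    """Count lowercase words across all recovered spans."""
    counts = Counter()
    for span in spans:
        counts.update(word.lower() for word in span.split())
    return counts
```

Run over a large set of document pairs, the word counts from `term_frequencies` would give a rough view of which names and topics were most often withheld. Real documents would need more careful alignment, since OCR noise and reflowed text break simple token matching.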

Read Matthew Connelly’s recent article on declassification policies, “The Ghost Files”, in a recent issue of Columbia magazine. Visit Connelly’s Declassification Engine: http://www.declassification-engine.org/

Panel Discussion: Preserving the Electronic Records of Governments: Issues and Challenges
Paul Wester, Jr., Chief Records Officer for the United States Government, National Archives and Records Administration

  • There is high level administrative support for records management (presidential memo) > we need to change how we manage records and make them available to the public and we cannot do things on an individual level any more
  • Directive developed with deadlines and guidelines to transform how records management is done across the government
  • Directive goals: transform the entire record keeping function from an analog to a digital, automated approach
    • Federal agencies must manage all permanent electronic records in an electronic format by December 31, 2019
    • All agencies must manage both permanent and temporary email records in an accessible electronic format by December 31, 2016
  • Agencies must manage documents in automated ways to be effective
  • Training, awareness, and accountability are the main focus of the directive
  • How we manage email will transfer to how we manage other types of electronic records
  • We need to focus on records of relevance and work with universities and archives to do research to set records free and build new connections to records (collaborate to build exhibits, showcase research projects, provide context, etc.) > value could be short term but visibility will be long term

William A. Mayer, Executive for Research Services for the National Archives and Records Administration

  • Focusing on building a national framework for archives research services to reach more people (now limited to physical archives locations)
  • NARA is changing the way staff access and interact with the web and with records > as learning is human-to-human we still need humans to be involved with dealing with records
  • Archives are seen as the end of the records management pipeline, but involvement needs to start farther upstream
  • NARA still has 30 years of paper records to come to the Archives
  • While we may need to consider how to bring in more records to do data analysis, we also need to figure out how to get rid of those records that really don’t have value because we don’t have the capacity to preserve everything
  • In terms of web harvesting, one issue for consideration is the capture of content-rich intranet sites
  • NARA would like to engage in more small partnerships around building context-rich interfaces for resources

Cecilia Muir, Chief Operating Officer, Library and Archives Canada

  • Our government is planning to ensure that over 98% of Canadians will have access to high-speed internet, even in remote parts of the country, by 2017 (Digital Canada 150, p. 7)
  • Library and Archives Canada (LAC) has a link to at least three initiatives in the government’s Action Plan on Open Government
  • Shared Services Canada is consolidating the government’s digital services
  • LAC mandate: ensure documentary heritage of Canada is preserved, be the source of enduring knowledge accessible to all, facilitate cooperation among library and archive communities, serve as the continuing memory of the Government of Canada
  • Departments and agencies focus on managing information of “business value” and LAC receives records of “enduring value”
  • There has been a shift in thinking about the separation between government records and publications > people are less and less concerned about format and now focus on content and value of content
  • A risk-based approach needs to be implemented to manage the increasing amount of information
  • LAC is interested in collaborating with various partners to support research, access, and context building

Paul Wagner, Director General and Chief Information Officer, Information Technology Branch, Library and Archives Canada

  • Documents must be accessible, or at least discoverable, for us to meet our mission
  • We are moving towards a digital curation model > the goal is to become a trusted digital repository > this used to simply be a tech-based solution (buy the right system), now it’s about capacity and ability to work in a digital world
  • We need to have the same rigour for digital assets that we have for physical documents and we’re not there yet
  • Not all digital assets need to be in Government of Canada data centres, as not everything is private > we are going to work with the private sector to see how we can manage and preserve these records
  • Context is key, as many clients don’t know how to interact with the data / information > we need to provide the context to them so they can understand > LAC can create a user experience to provide context and access while the data may be held somewhere else
  • The data and information we have is valuable, but only in context > user contributed content / analysis that takes massive amounts of data and finds trends and stories are what create the value in that information


Updates on NARA records activities can be found on the NARA Records Express Blog: blogs.archives.gov/records-express

Canadian government documents digitization registry

At Government Information Day at the University of Toronto on November 1st, 2013, there was significant interest from participants in the development of a “registry” of completed, ongoing, and proposed digitization projects for Canadian government documents.

To get this project off the ground, a group at uOttawa has created a bilingual Google form to collect and store digitization project information. This platform was chosen because it can act as a simple starting point – data can easily be updated and exported to another tool in the future.

The registry is now open and you can contribute content in English or French.

If you have content to contribute, please enter the details directly in the form at the following link: http://bit.ly/gov-docs or send an email to gsg@uottawa.ca.

Results of the form can be viewed here: http://bit.ly/gov-docs-results

** Note: Not all fields in the form are mandatory, so don’t worry if you cannot fill in every one. Every bit of information helps, so if you are short on time or have heard of a project but are not sure of all the details, send an email to gsg@uottawa.ca so they can begin to investigate.

The success of this project will rely heavily on the participation of members in our community. Please take a few minutes to contribute to and share this important project. Again, this is a starting point. Some of our colleagues have expressed an interest in further developing this project in the future, so let’s get a solid foundation established!