#CRL_Leviathan Session 1: Libraries and the Records of Governments

The CRL Leviathan conference, Libraries and government information in the age of big data, took place in Chicago on April 24 and 25, 2014. CLA-GIN’s Co-moderator, Catherine McGoveran, attended the conference and has compiled notes of the key points from each presentation. The following are notes from the first session, Libraries and the records of governments. Notes from session two, Libraries and the information of governments, will be posted in the coming days. 


Welcome and Keynote: Information, Transparency and Open Government: A Public Policy Perspective
Thomas S. Blanton, Executive Director of the National Security Archive at George Washington University

  • Born-digital document production is far outpacing the physical documents we have in our government archives from the past two centuries
  • There are many barriers that limit an accurate understanding and adoption of open government
  • The Electronic Records Archive – a flagship legacy system – doesn’t come close to being comprehensive
  • Documents from the Clinton administration ordered declassified have not yet reached the shelves of the national archives – a lot more support is needed (financial and personnel) to speed up the declassification process and make these documents available
  • Research libraries are on a trajectory from collecting and preserving special collections to curating data > our future is in interactive collaboration with others and in crowd-sourcing to make sure the data is available
  • The opening of government data can help make incredibly important revelations, which can lead to better government and consumer decisions
  • The National Archives and Records Administration (NARA) will fail to meet its mission unless it becomes an off-site backup for electronic government records
  • Only 2% of what the government creates gets saved at NARA
  • We know of the huge power of the National Security Agency (NSA) to retrieve, store, and link records > NSA does records management well and this expertise should be used by the National Archives for off-site backup of government information > this would fit well with the NSA's national security mandate and could be a way for the agency to restore trust and engage in the civic duty of preserving and making available government information

Historical Research and Government Records in the Era of Big Data: A Historian's Perspective
Matthew J. Connelly, Professor of History, Columbia University

  • The government info available is a function of a political process > it is the relationship between knowledge and power > where do electronic records fit into this?
  • There is a crisis in democratic accountability and national security > it is a national security issue when departments and agencies don’t have functioning / accurate archives; this opens the government to foreign threats and is an issue about which every citizen should care
  • It is doubtful that, as historians, we will ever be able to render a complete account of government documents, records, and decisions
  • We don’t know what we don’t know
  • The government should move more aggressively to use data mining to manage records
  • Archives are also sites of expectation, not just memory > they’re about the future

“To bring together the records of the past and to house them in buildings where they will be preserved for the use of men and women in the future, a Nation must believe in three things. It must believe in the past. It must believe in the future. It must, above all, believe in the capacity of its own people so to learn from the past that they can gain in judgment in creating their own future.” (Franklin D. Roosevelt)

  • The system is so overloaded that the info that should be protected has suffered because of overprotection
  • Historically, secrecy has been in the eye of the beholder, which makes it difficult to set a classification standard that satisfies everyone
  • With the transition to electronic records, hundreds of thousands of paper records were lost because they were neither migrated to digital nor kept
  • The budget for declassification has diminished and the budget for keeping secrets has skyrocketed > though there has been a huge growth in the amount of information created, there has been a steep decline in the amount of records declassified
  • The amount the government is currently spending on declassification is 15% of what was spent in the late 90s
  • Data mining can help us identify gaps in the documents released and withheld by government and text analysis can help us identify the trends of declassified words and issues
  • By comparing redacted and later released unredacted documents, we can see the patterns of official secrecy > this could help government identify which topics are more sensitive and aid the classification and declassification process
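
As a rough illustration of the comparison described in the last point, the following Python sketch aligns a redacted document with its later released version and lists the passages that had been withheld. It is only a minimal sketch and not the method used by Connelly's project: the `[REDACTED]` marker, the function name `recovered_passages`, and the sample sentences are all hypothetical, and real declassified documents are usually scanned images that would first need text extraction.

```python
from difflib import SequenceMatcher

# Hypothetical marker; real documents show redactions in many different ways.
REDACTION_MARK = "[REDACTED]"


def recovered_passages(redacted: str, released: str) -> list[str]:
    """Return the text in `released` that stands where `redacted` has a marker."""
    redacted_words = redacted.split()
    released_words = released.split()
    # Align the two versions word by word (autojunk off so no words are skipped).
    matcher = SequenceMatcher(None, redacted_words, released_words, autojunk=False)
    passages = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" span containing a marker is a redaction that was later lifted.
        if tag == "replace" and REDACTION_MARK in redacted_words[i1:i2]:
            passages.append(" ".join(released_words[j1:j2]))
    return passages


# Invented sample sentences, for illustration only.
redacted = "The ambassador met [REDACTED] to discuss the [REDACTED] program."
released = "The ambassador met the defense minister to discuss the nuclear energy program."

for passage in recovered_passages(redacted, released):
    print(passage)
```

Counting the words in the recovered passages across many document pairs would be one simple way to start looking for the kinds of topics that tend to be withheld.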

Read Matthew Connelly’s article on declassification policies, “The Ghost Files”, in a recent issue of Columbia magazine. Visit Connelly’s Declassification Engine:

Panel Discussion: Preserving the Electronic Records of Governments: Issues and Challenges
Paul Wester, Jr., Chief Records Officer for the United States Government, National Archives and Records Administration

  • There is high level administrative support for records management (presidential memo) > we need to change how we manage records and make them available to the public and we cannot do things on an individual level any more
  • Directive developed with deadlines and guidelines to transform how records management is done across the government
  • Directive goals: transform the entire record keeping function from an analog approach to a digital, automated one
    • Federal agencies must manage all permanent electronic records in an electronic format by December 31, 2019
    • All agencies must manage both permanent and temporary email records in an accessible electronic format by December 31, 2016
  • Agencies must manage documents in automated ways to be effective
  • Training, awareness, and accountability are the main focus of the directive
  • How we manage email will transfer to how we manage other types of electronic records
  • We need to focus on records of relevance and work with universities and archives to do research to set records free and build new connections to records (collaborate to build exhibits, showcase research projects, provide context, etc.) > value could be short term but visibility will be long term

William A. Mayer, Executive for Research Services for the National Archives and Records Administration

  • Focusing on building a national framework for archives research services to reach more people (now limited to physical archives locations)
  • NARA is changing the way staff access and interact with the web and with records > as learning is human-to-human we still need humans to be involved with dealing with records
  • Archives are seen as the end of the records management pipeline, but involvement needs to start farther upstream
  • NARA still has 30 years of paper records yet to come to the Archives
  • While we may need to consider how to bring in more records to do data analysis, we also need to figure out how to get rid of those records that really don’t have value because we don’t have the capacity to preserve everything
  • In terms of web harvesting, one issue for consideration is the capture of content-rich intranet sites
  • NARA would like to engage in more small partnerships around building context-rich interfaces for resources

Cecilia Muir, Chief Operating Officer, Library and Archives Canada

  • Our government is planning to ensure that over 98% of Canadians will have access to high-speed internet, even in remote parts of the country, by 2017 (Digital Canada 150, p. 7)
  • Library and Archives Canada (LAC) has a link to at least three initiatives in the government’s Action Plan on Open Government
  • Shared Services Canada is consolidating the government’s digital services
  • LAC mandate:  ensure documentary heritage of Canada is preserved, be the source of enduring knowledge accessible to all, facilitate cooperation among library and archive communities, serve as the continuing memory of government of Canada
  • Departments and agencies focus on managing information of “business value” and LAC receives records of “enduring value”
  • There has been a shift in thinking about the separation between government records and publications > people are less and less concerned about format and now focus on content and value of content
  • A risk-based approach needs to be implemented to manage the increasing amount of information
  • LAC is interested in collaborating with various partners to support research, access, and context building

Paul Wagner, Director General and Chief Information Officer, Information Technology Branch, Library and Archives Canada

  • Documents must be accessible, or at least discoverable, for us to meet our mission
  • We are moving towards a digital curation model > the goal is to become a trusted digital repository > this used to simply be a tech-based solution (buy the right system), now it’s about capacity and ability to work in a digital world
  • We need to have the same rigour for digital assets that we have for physical documents and we’re not there yet
  • Not all digital assets need to be in Government of Canada data centres, as not everything is private > we are going to work with the private sector to see how we can manage and preserve these records
  • Context is key, as many clients don’t know how to interact with the data / information > we need to provide the context to them so they can understand > LAC can create user experience to provide context and access while the data may be held somewhere else
  • The data and information we have is valuable, but only in context > user-contributed content and analysis that take massive amounts of data and find trends and stories are what create the value in that information


Updates on NARA records activities can be found on the NARA Records Express Blog: