Keynote: Approaching Leviathan: The Dangers and Opportunities of “Big Data”
John S. Bracken, Director, Journalism and Media Innovation, Knight Foundation
- How to deal with big data is only half the story > we must also focus on organizational culture and adaptation or we lose track of the importance of culture and people
- There is so much data now > what’s important is the process, what you do with it, and the talent you build around it > we must adapt and create a bridge between traditional skills and new quantitative approaches
- There is skepticism about technology and our reliance on it, and this is colliding with the emerging culture of breaking things and focusing on the future and the next challenges
- How does the civic sector do a better job of adaptability to build the tools that people want and need?
- The “make something people want or move on” outlook is much harder to accomplish in civil society
- The biggest cognitive switch we need to make is enabling ourselves to make mistakes
- The Knight Foundation works in the space of news and journalism, but links it to the community > learn more about the Knight Foundation here: http://www.knightfoundation.org/
Government Records and Information: Real Risks and Potential Losses
James A. Jacobs, Data Services Librarian Emeritus, UC San Diego, and technical advisor for CRL Certification Advisory Panel
- There are many gaps in what we know: no list of born-digital government information, no list of all government websites, no list of preserved born-digital gov info
- What we do know: FDLP libraries have preserved millions of volumes of non-digital government information and most born digital information is not held, managed, organized, or preserved by libraries
- Preservation is at the mercy of budgets and social priorities > risk increases if the preserving agency is the creator and doesn’t have preservation as its mission, or if the preserving agency is governed by politicians
- The production of digital documents is far outpacing what’s being done to preserve these documents
- Key issues:
- Versioning
- The need for persistent URLs
- The need for temporal context (ex: link to version of document or site that author linked to at time of publication and not updated version)
- E-government issues (e-gov often hides information behind services > how do we preserve this information?)
- Relying on government for preservation and free access (most agencies do not have the mandate to preserve indefinitely – this is even the case for GPO)
- Collections need services to provide important context for interpretation
- When we create dark archives we’re not creating a value for our community > we need to create immediate value for our users
- Who should preserve?
- Option one: the government alone
- Option two: the government with non-governmental partners (ex: GPO + LOCKSS-USDOCS)
- Option three: non-governmental organizations without government cooperation (ex: Internet Archive)
- There are different methods for selecting what needs to be preserved (the solutions should be mixed and the issue should be tackled collaboratively)
- Broad web harvesting (ex: Internet Archive)
- Focused selection (ex: by agency or title by title)
- Digital deposit (ex: deposit by creators to memory institutions)
- When planning for preservation focus on different user-communities: don’t look at the web and decide what to preserve, look at the web and preserve based on what users will need
- Every library should participate in digital preservation > it’s about building the value of libraries > collections and services should be reliable and useful > shared collections and services can be built with different contributions – not all libraries have to be data centres
- Summary of key points:
- Preserve born digital government information – the technology exists
- Every library can and should participate
- We can add value to the information by building collections of use to our user communities
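The temporal-context issue listed among the key issues above (linking to the version of a document that the author saw at the time of publication, not the updated one) can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function name `snapshot_as_cited`, the `(capture_date, url)` tuple layout, and the `archive.example` URLs are all hypothetical and do not describe any particular archive's API.

```python
from bisect import bisect_right
from datetime import date


def snapshot_as_cited(snapshots, cited_on):
    """Return the latest (capture_date, url) captured on or before cited_on.

    Returns None when no snapshot is that old. Hypothetical helper, not a
    real archive API.
    """
    ordered = sorted(snapshots)            # tuples sort by capture date first
    dates = [captured for captured, _ in ordered]
    i = bisect_right(dates, cited_on)      # count of snapshots dated <= cited_on
    return ordered[i - 1] if i else None


# Made-up capture history for one government report.
snapshots = [
    (date(2013, 1, 10), "https://archive.example/20130110/report"),
    (date(2013, 6, 2), "https://archive.example/20130602/report"),
    (date(2014, 3, 15), "https://archive.example/20140315/report"),
]

# An article published on 1 September 2013 should resolve to the June 2013
# capture, not to the later March 2014 revision.
print(snapshot_as_cited(snapshots, date(2013, 9, 1)))
```

Choosing the most recent capture that is not later than the citation date is one possible rule; an archive could just as reasonably prefer the nearest capture in either direction.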
The Digital Future of FDsys and the Federal Depository Library Program: A Public Policy Analysis
R. Eric Petersen, Specialist in American National Government, Congressional Research Service
- Challenges
- Access and service (tangible, digital, or both?)
- Costs (less print distribution, but still costs libraries to maintain)
- FDsys: there is no good model for permanent digital retention > we will have to update software and touch digital assets to make sure access continues > ongoing investment and responsibility required > entire overhauls and updates will be required every 8-10 years
- Born digital materials > identification, retention, preservation, service
- Tangibles > retention, digitization, consolidation, service
- Lack of consensus around:
- What is to be captured > how to count – websites / documents vs. records
- How to capture and by whom > GPO / FDsys, originating agencies, third parties
- Legislative change is slow without clear agreement regarding the solutions among stakeholders
- Before Congress will engage, we need clear proposals that are broadly supported and offered by stakeholders and interested parties > they must cover issues such as enduring standards for digital retention, who collects and retains born digital content and tangible content, and how the costs will be managed
Panel Discussion: New Models of Access: The Role of Third Party Aggregators and Publishers
Susan Bokern, VP, Information Solutions, ProQuest
- We all have different roles to play and there’s enough content to go around
- ProQuest’s essential role is to add value to content
- ProQuest is focusing on researchers and the improvement of workflow processes to create new research output > enabling researchers to access content more efficiently, providing tools to improve workflow, visualization and analysis tools, not just about content but also about context
- The process of adding value begins with market research (surveys, advisory boards, focus groups to identify known and unknown needs) > creating acquisitions strategy to develop collection > preserving content or data > keeping the technology up to date > identifying where and how to obtain the content
- ProQuest takes preservation seriously > content is stored on their own servers > currently exploring a longer-term storage and preservation solution (ex: Iron Mountain)
Robert Lee, Director of Online Publishing and Strategic Partnerships, East View Information Services
- East View is an aggregator for academic institutions and a variety of international governments
- Some example projects: GIS, big data, political rallies ephemera
- Big focus on content from Russia and China > not usually seeking or producing translations, but going after the information and data that’s not always available elsewhere or not the same as what’s provided in English
- There is an operational risk that the information received could later be reclassified
- In China, content can be made available and digitized very quickly but it can also disappear or be blocked quickly, too
- Interested in exploring cross-platform solutions for content
Robert Dessau, CEO, voxgov
- Voxgov harvests materials from over 10,000 web destinations each day > every 6 minutes the system looks for new URLs > 49 different types of documents (fact sheet, social media, congressional, federal register, speeches, etc.)
- The collection process has evolved rapidly > learned to identify when a website’s format has changed to maintain quality intake of data > 18-22%, depending on the group, falls into the broken link category
- Interested in tracking conversations from beginning to end to allow a much deeper and more comprehensive level of research
- The involvement of third parties in the preservation and access process is inevitable
- The potential of mining the text we have to bring value has not yet been realized
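The harvesting routine described above (periodically looking for new URLs and tracking the share of broken links) can be sketched roughly as follows. This is not voxgov's system: the function names, the input shapes, and the rule that a link counts as broken when it returns HTTP status 400 or above, or no response at all, are assumptions made for illustration.

```python
def new_urls(previously_seen, current_listing):
    """Return URLs from the current pass not seen in earlier passes, in order.

    Duplicates within the current listing are reported once.
    """
    fresh = []
    for url in current_listing:
        if url not in previously_seen and url not in fresh:
            fresh.append(url)
    return fresh


def broken_share(status_by_url):
    """Return the fraction of checked URLs treated as broken.

    Assumption: a URL is broken if its status is None (no response) or >= 400.
    """
    if not status_by_url:
        return 0.0
    broken = sum(
        1 for status in status_by_url.values() if status is None or status >= 400
    )
    return broken / len(status_by_url)


# Made-up data for one polling pass.
seen = {"https://agency.example/a"}
listing = [
    "https://agency.example/a",
    "https://agency.example/b",
    "https://agency.example/b",
    "https://agency.example/c",
]
print(new_urls(seen, listing))

statuses = {"a": 200, "b": 404, "c": None, "d": 200, "e": 301}
print(broken_share(statuses))
```

A real harvester would also need to handle the format changes mentioned above, which this sketch does not attempt.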