Forever Data: A road to digital sustainability
Author: Elena Toffalori
Inspiration for this post came from conversations with our President Ruth Tringham, and from chats with communities we are lucky to work with.
How to ensure long-term digital sustainability and preservation of your research data
Working with digital cultural heritage, we are often presented with the challenge of empowering communities and teams to make the best decisions for the future of the archives, collections and cultural knowledge they are entrusted to preserve. Digitizing resources is often perceived and presented as a must to keep up with today’s world, and it undoubtedly makes it easier to share, analyze, access and manage data. But can we trust digital technologies to take good care of our precious memories when everything about them (storage media, devices, data formats, visual styles, user interfaces) evolves at such a high speed?
Your precious photos or videos from a few years ago, stored on CDs and DVDs, are at risk: these discs have an average life expectancy of less than 10 years, for reasons ranging from oxidation, to the de-bonding of adhesives between their layers, to (ahem) newer computers not even having optical drives. While it’s harder to estimate the average lifespan of a web page, anyone who works with websites knows that anything beyond static, pure HTML will become unreadable by browsers within a couple of years, and will likely look visually outdated even sooner.
Lots of bad news here! So here are a few things we have found helpful when guiding our clients through our highly ephemeral digital world.
1. Understand the difference between your archival data and its narrative(s)
Too often, funding granted to cultural projects is tied to a derivative product (one of many possible narratives derived from the data) rather than to the curation of digital originals and their long-term sustainability. (Read more about what “digital original” means for digitized and born-digital objects.)
For example, if you are producing a video from laser-scanned 3D data or photogrammetry, the audience for your video might never have access to, or be concerned with, your 3D point cloud, much less the original photogrammetry picture series. But guess what: a new video can be produced from those originals at any time, while the reverse is not possible. The same goes for complex Virtual Reality applications, where funding can be heavily skewed toward the application itself because of the high costs of development and user experience design.
If you are creating an online gallery from a photo collection, there will probably be a process of selection and resizing/resampling, and whole categories of data may even be left out depending on the platform and format you choose for publication. Your chosen publishing medium might not support video or display image metadata; some objects ought not to be shared publicly because of the culturally sensitive content they depict or reference; or you might not consider it safe to make geolocation information public. This is no different from traditional publication, such as preparing a monograph.
In addition, with digital publication, any technology you adopt will likely become obsolete within a few years.
Ask yourself which part of your data is the irreplaceable, valuable result of your research and work, and which part is a digital narrative, one of many possible ways of providing access to it. Then make sure you differentiate your approach and investment accordingly.
At CoDA we are big fans of 3D from photogrammetry, because image-based digital originals have a longer history as digital files than 3D data; they rely on established standards for both image formats and metadata (mainly EXIF for technical metadata and IPTC for descriptive metadata); and they are extremely accessible thanks to the wide adoption and proliferation of dedicated software. An image series is a solid core that allows optimal sustainability.
2. Plan for routine data maintenance, as there's no way around it
First of all, just like physical materials or equipment, no digital format is exempt from some level of maintenance (read more about this topic on the page “The Project Archive: Storage and Dissemination” in the ADS guides to good practice).
What careful planning and informed choices can do for you is stretch the interval between revisions by up to several years, and make it possible for you and future curators to preserve your data and keep it accessible and complete.
In many cases, these revisions will still be more frequent than those required by analog media (think film or paper). At the same time, because of features specific to digital, such as redundancy, ubiquitous access and ease of management and analysis, a well-planned digital strategy can increase your data’s overall lifespan (not to mention its relevance to its audience) while minimizing the risk of accidental loss.
To preserve digital content and provide service to users and designated communities decades hence, custodians must be able to replicate the content on new media, migrate and normalize it in the face of changing technology.
3. Focus on your "core" archive preservation
As a direct consequence, for the purpose of sustainability, it is far more important to research and optimize the format, storage and accession of your “core” data than those of the digital narrative. It’s fine, and effective, to experiment with advanced technologies and engaging presentations for your digital content, but not at the cost of losing track of the archive that made it all possible.
Picking your file formats carefully (check out these seven criteria suggested by the Library of Congress as well as the chart below) is very important, and so is choosing how to archive, including whether to trust an official repository or a cloud-based service for data storage. Make sure you can always get your data back, keep local backups, and make sure that migration will be an option when the current format becomes obsolete.
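One practical way to confirm that you can really get your data back is a fixity check: compute a checksum for every file in the original and in the backup copy, then compare the two sets. The sketch below uses SHA-256 from the Python standard library; the function names are hypothetical, and it assumes both copies share the same folder layout.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums(root: Path) -> Dict[str, str]:
    """Map every file under root (as a relative POSIX path) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(original: Path, backup: Path) -> Dict[str, List[str]]:
    """Compare two copies of an archive that are assumed to share the same layout."""
    src, dst = checksums(original), checksums(backup)
    return {
        # Files present in the original but absent from the backup.
        "missing": sorted(src.keys() - dst.keys()),
        # Files present only in the backup.
        "extra": sorted(dst.keys() - src.keys()),
        # Files present in both copies whose content differs.
        "mismatched": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

For example, `compare_copies(Path("archive"), Path("/mnt/backup/archive"))` (both paths hypothetical) returns three lists that should all be empty for a complete, intact backup. Saving the digests next to the archive also lets future curators repeat the check after migrating to new media.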
4. Document your project for long-term access and reuse
Some amount of documentation is usually stored with any digital archive, for instance the project background or contacts. When planning your project’s documentation, aim for more rather than less, and focus on explaining usage, access and workflows across the actual data: provenance, the tools used to capture and process it, the file naming convention adopted in the project and the reasons behind it, dates (it’s incredible how often digital content doesn’t have a date associated with it!), and methodology.
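A documented naming convention is most useful when it is precise enough to check automatically. As an illustration, assume a hypothetical convention of project code, capture date and a four-digit sequence number, such as CATAL_20230415_0012.tif. The Python sketch below validates names against that pattern and recovers the date, so each file name also carries a date with it.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical convention, for illustration only:
#   PROJECTCODE_YYYYMMDD_NNNN.ext   e.g. "CATAL_20230415_0012.tif"
# (upper-case project code, capture date, four-digit sequence number).
NAME_PATTERN = re.compile(
    r"(?P<project>[A-Z0-9]{2,10})_(?P<date>\d{8})_(?P<seq>\d{4})\.(?P<ext>[a-z0-9]+)"
)


def parse_name(filename: str) -> Optional[dict]:
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.fullmatch(filename)  # the whole name must match
    if match is None:
        return None
    raw = match["date"]
    try:
        captured = date(int(raw[:4]), int(raw[4:6]), int(raw[6:]))
    except ValueError:
        # Eight digits that are not a real calendar date, e.g. 20231341.
        return None
    return {
        "project": match["project"],
        "date": captured.isoformat(),
        "sequence": int(match["seq"]),
        "extension": match["ext"],
    }
```

Running `parse_name` over a folder would flag files that have drifted from the convention. Whatever convention you actually choose, the point is to write it down together with the reasons behind it.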
Don’t leave information out just because it’s implicit in the archive or in the individual files: redundant and basic information (such as a list of the content and structure of folders and files) can sometimes save the day when an archive is damaged or partially lost, or when its file formats have become unreadable.
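A list of the content and structure of folders and files is easy to generate and store as plain text next to the archive. This minimal Python sketch (the function name is hypothetical) produces an indented outline with file sizes; it assumes a local folder of ordinary files and does not guard against symbolic links that loop back on themselves.

```python
from pathlib import Path
from typing import List


def folder_listing(root: Path) -> str:
    """Return an indented plain-text outline of every folder and file under root."""
    lines: List[str] = [root.resolve().name + "/"]

    def walk(folder: Path, depth: int) -> None:
        # Folders first, then files; each group sorted case-insensitively by name.
        entries = sorted(folder.iterdir(), key=lambda p: (p.is_file(), p.name.lower()))
        for entry in entries:
            indent = "    " * depth
            if entry.is_dir():
                lines.append(f"{indent}{entry.name}/")
                walk(entry, depth + 1)
            else:
                lines.append(f"{indent}{entry.name} ({entry.stat().st_size} bytes)")

    walk(root, 1)
    return "\n".join(lines) + "\n"
```

Saving the result, for instance with `Path("folder_listing.txt").write_text(folder_listing(Path("archive")), encoding="utf-8")` (file names hypothetical), gives a record of the archive’s structure that any text editor can open.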
Document your work with simple, durable formats like images and plain text files. For files that rely on proprietary technology or specialized software, such as 3D, video and databases, storing the software along with the data is always an option. But sometimes preserving the content of your data also means providing a copy of it in a “simpler” format: think still image renderings of your 3D file, a spreadsheet export of your database tables, a CSV inventory of your media and related metadata, and so on.
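A basic CSV inventory can be produced with nothing more than the Python standard library. The sketch below records each file’s relative path, extension, size and last-modified time (in UTC), plus an empty description column for curators to fill in; the function and column names are our own assumptions, not a standard. Note that a file’s modification time is not necessarily its creation or capture date, so dates from EXIF or project records are more reliable where available.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Column names are our own choice for this sketch, not a standard.
FIELDS = ["path", "extension", "size_bytes", "modified_utc", "description"]


def write_inventory(root: Path, out_csv: Path) -> int:
    """Write one CSV row per file under root and return the number of rows written."""
    out_resolved = out_csv.resolve()
    count = 0
    with out_csv.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for p in sorted(root.rglob("*")):
            # Skip folders, and skip the inventory itself if it lives inside root.
            if not p.is_file() or p.resolve() == out_resolved:
                continue
            info = p.stat()
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            writer.writerow({
                "path": p.relative_to(root).as_posix(),
                "extension": p.suffix.lower().lstrip("."),
                "size_bytes": info.st_size,
                "modified_utc": modified.isoformat(timespec="seconds"),
                "description": "",  # left empty for curators to fill in
            })
            count += 1
    return count
```

Because CSV is plain text, the inventory stays readable in a spreadsheet or text editor even if the software that produced the media is no longer available.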
Once again, think of your core archive as the source material not just for your digital narrative, but potentially for countless narratives and interpretations. The ultimate goal of digital preservation is to allow data to be accessed and used in the far future: account for this from the beginning, and make sure your documentation allows someone completely new to your data to understand it and use it for a new narrative.
“Documentation should be disseminated using the most effective available media, including graphical, textual, video, audio, numerical or combinations of the above.”