In this article, Jeff Sengpiehl, CTO at Key Code Media, breaks down definitions and best practices for data management in media production. If you need to formulate a strategy, don’t hesitate to book a consultation with Key Code Media. When it comes to collaborating on content for television, film, or the internet, there is plenty of information available on how to write better, shoot better, and edit better. Data management gets far less attention, yet it is one of the most important parts of running a smooth production. Like the bulk of an iceberg, most of it happens below the surface.
Why is Data Management Important?
The typical production will have a stack of hard drives or camera cards with media from the field, and, with luck, a shared storage server for video collaboration within the facility. Relying only on hard drives causes plenty of problems. Hard drives fail, media can be corrupted, ransomware attacks can lock up entire projects or the whole NAS environment, and disorganized data can make it impossible to re-monetize footage in future productions because its organization only made sense during the first run-through. Data Management & Data Protection can be as simple as having a backup hard drive, LTO tape, or cloud backup. There are good, better, and “enterprise” options for addressing ransomware protection, disaster recovery, and business continuity. Let’s break down the terms, then get into best practices and recommendations.
Archive Tiers
A media archive is simply a place to store production assets for later use. The type of archive you choose depends on when you will need the files, how the production team will use them, and the overall cost.
- Active Archive – This is where you store media and files that are used often, such as active B-roll, establishing shots, and recaps of previous shows or seasons. Depending on the use case, this could also be known as ‘evergreen’ material. Because the team uses this media frequently, it needs to be easy to access, but it should not be as expensive as storing it on your top-of-the-line shared storage.
- Inactive Archive – This is not an exact science, but say you need a set of media files every 6-9 months, and you cannot wait 18+ hours to retrieve them when you do. An inactive archive is a good fit for this. Inactive archives typically take the form of inexpensive spinning disks, LTO tape (without a rapid-retrieval setup), or a deep cloud archive solution like Amazon S3 Glacier.
- Deep Archive – This is the copy that keeps the lawyers and compliance managers happy. Just as Disney re-monetizes past film assets with “The Vault,” your production may need its original media later. Deep archives tend to hold original camera formats or fully mastered materials. Back in the day, the deep archive was physical film reels or boxes of videotapes, stored cheaply and often shipped to far-off vaults for safety. Today’s deep archives tend to live in deep-freeze cloud archives or on LTO tapes sitting on library shelves. The ongoing cost is higher: the cloud isn’t free, and LTO tapes need occasional re-packing (migrating the data to a newer LTO generation) to stay current on LTO formats.
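To make the tier definitions above more concrete, here is a minimal Python sketch of how a team might decide which tiers should hold a given asset. The thresholds (30 days, 270 days, 18 hours) and the `AssetProfile` and `choose_tiers` names are hypothetical, loosely based on the rules of thumb above; a real decision would also weigh cost and capacity, which this sketch ignores.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; tune them to your own
# access patterns, budget, and tolerance for retrieval delays.
ACTIVE_MAX_DAYS = 30        # material touched roughly monthly or more often
INACTIVE_MAX_DAYS = 270     # roughly the "every 6-9 months" case
SLOW_RETRIEVAL_HOURS = 18   # restores from a deep tier may take this long or longer


@dataclass
class AssetProfile:
    name: str
    days_between_uses: int        # typical gap between requests for this material
    max_wait_hours: float         # longest retrieval delay the team can tolerate
    is_original_or_master: bool   # camera originals or fully mastered deliverables


def choose_tiers(asset: AssetProfile) -> list[str]:
    """Return the archive tiers that should hold a copy of this asset."""
    tiers = []
    if asset.days_between_uses <= ACTIVE_MAX_DAYS:
        tiers.append("active")
    elif (asset.days_between_uses <= INACTIVE_MAX_DAYS
          or asset.max_wait_hours < SLOW_RETRIEVAL_HOURS):
        # Used occasionally, or used rarely but needed faster than a deep tier can deliver.
        tiers.append("inactive")
    if asset.is_original_or_master or not tiers:
        # Originals and masters always get a deep-archive copy, as does anything
        # used rarely that can tolerate a slow restore.
        tiers.append("deep")
    return tiers


b_roll = AssetProfile("establishing shots", 7, 1, False)
print(choose_tiers(b_roll))  # ['active']
```

Note that an asset can land in more than one tier: camera originals that are still in active use would get both an active copy and a deep-archive copy.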
Backups
A backup is typically a snapshot of work in progress: project data, mezzanine files, editorial media, or VFX. It is a quick recovery spot to grab something from if it is accidentally deleted. Some teams move backups into archives, depending on their workflow. Other teams have no need for these snapshots after delivery, and that’s okay too. Depending on how deep the backup goes, what material it holds, and where it is stored, it could also serve as a small-scale disaster recovery area.
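As an illustration of a work-in-progress snapshot, the following Python sketch copies a project folder into a timestamped snapshot folder and restores a single accidentally deleted file from it. The folder layout (`<backup root>/<project>/<timestamp>`) and the function names are assumptions for illustration; dedicated backup tools add incremental copying, retention policies, and verification on top of this basic idea.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def snapshot_project(project_dir: Path, backup_root: Path,
                     now: Optional[datetime] = None) -> Path:
    """Copy project_dir into backup_root/<project name>/<timestamp>/ and return that path."""
    now = now or datetime.now()
    dest = backup_root / project_dir.name / now.strftime("%Y%m%d_%H%M%S")
    # copytree creates missing parent folders and raises FileExistsError if dest
    # already exists, so an existing snapshot is never overwritten.
    shutil.copytree(project_dir, dest)
    return dest


def latest_snapshot(backup_root: Path, project_name: str) -> Optional[Path]:
    """Return the newest snapshot folder for a project, or None if there is none."""
    folder = backup_root / project_name
    if not folder.is_dir():
        return None
    # Timestamp folder names sort chronologically as plain strings.
    snapshots = sorted(p for p in folder.iterdir() if p.is_dir())
    return snapshots[-1] if snapshots else None


def restore_file(snapshot: Path, relative_path: Path, project_dir: Path) -> Path:
    """Copy one file from a snapshot back into the project, recreating folders as needed."""
    target = project_dir / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(snapshot / relative_path, target)  # copy2 also preserves timestamps
    return target
```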
Business Continuity
In some ways, Disaster Recovery and Business Continuity are cousins. If Disaster Recovery is the insurance plan, Business Continuity is the rental car coverage. Business Continuity recognizes that deadlines do not stop when disaster strikes; it provides a level of redundancy that lets you keep delivering despite the roadblocks. For smaller productions, backups can be the business continuity plan: if the primary storage dies, someone can open the backups and keep working from them directly, with very little lost time. For others, a nearline storage solution may make sense, keeping a copy of primary storage so staff can continue working if anything goes wrong. Sometimes this is a triage process: what material do you need access to in order to keep working toward your deadlines?
Other Business Continuity approaches are more robust. They address scenarios like an entire building losing all its data, or a large-scale disaster making data or systems inaccessible. This could be a copy of all work in progress (a large-scale backup) living in a geographically separate facility, or in the cloud, where alternate creatives can access it rapidly and continue work from anywhere. The concept is simple: your business keeps working regardless of circumstance, sometimes while the larger effort, the recovery from the disaster, goes on in the background.
Disaster Recovery
Disaster Recovery, for better or worse, has become the catch-all phrase for many of the concepts discussed above. But what is it? It’s a simple concept: when something terrible happens, Disaster Recovery is the process and strategy for recovering from the point of disaster. From a physical perspective, if a building suffers a fire, flood, or earthquake, you salvage what you can, rebuild, and eventually reopen and pick up where you left off. This is a long-term process that involves insurance companies, contractors, and painters. While it is happening, you sometimes cannot use the workspace at all. Depending on the damage, recovery could take months or even years.
How are Business Continuity and Disaster Recovery different? Business Continuity lets people keep working, while Disaster Recovery rebuilds the ability to work the way you did before. From a data perspective, having your media and files backed up “somewhere else” is the first step of the Disaster Recovery process. Your business will need to decide how far away the backup must be: a different building, or a different state. This is known as the “blast radius”: how large a disaster can you recover from? Once the damaged facility is operational again, the data can all be repaired and restored. Disaster Recovery takes as long as it needs to take. What matters most for both disaster recovery and business continuity is how current and accurate the backup files are compared to your primary storage.
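Because a backup is only as useful as it is current and accurate, one simple check is to compare checksums between primary storage and the backup. This Python sketch assumes both are mounted as ordinary folders and uses SHA-256; the function names are hypothetical, and media workflows often use other hash algorithms (such as MD5 or xxHash) for the same purpose.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hash a file in 8 MiB chunks so large media files are never loaded whole into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(primary: Path, backup: Path) -> dict[str, list[str]]:
    """List files on primary that are missing from the backup or whose contents differ.

    Files that exist only in the backup are ignored.
    """
    report: dict[str, list[str]] = {"missing": [], "mismatched": []}
    for src in sorted(p for p in primary.rglob("*") if p.is_file()):
        rel = src.relative_to(primary)
        dst = backup / rel
        if not dst.is_file():
            report["missing"].append(rel.as_posix())
        elif file_digest(src) != file_digest(dst):
            report["mismatched"].append(rel.as_posix())
    return report
```

An empty report means every file on primary storage has an identical copy in the backup; anything listed is a gap you would discover during a recovery rather than before it.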
The “3,2,1” Media Insurance Recommendation
Now that you know the terms, let’s look at some basic recommendations for media management on a production. It comes down to one simple phrase that’s easy to memorize. Say it with me:
“3 copies of the data, stored on 2 different types of storage media, with 1 copy off-site.”
The primary copy counts toward the three. Here are 3 examples of the “3,2,1” architecture:
- Primary, Nearline + LTO Offsite – Primary spinning disk holds the material the team is actively working on. A backup copy sits on nearby nearline storage. That data is also written to LTO tape, and the tape is taken to a location across town.
- Mezzanine Media + LTO Camera Originals, LTO Offsite – Primary spinning disk holds the mezzanine media. The camera originals are on LTO tape, with a duplicate tape stored on the other side of the country. Camera originals are pulled back to primary spinning disk for finishing before delivery.
- Nearline + Cloud Spinning Mezzanine, Cloud DR – All camera originals are ingested to nearline spinning disk. A copy of everything is pushed to a cloud deep archive, such as Amazon S3 Glacier. After the upload completes, a transcode process creates mezzanine media on primary storage for the creative team to access. The primary storage is backed up to a more readily available tier of cloud storage, with incremental changes pushed every hour.
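The rule itself works well as a checklist. This minimal Python sketch, using hypothetical `DataCopy` and `check_3_2_1` names, tests a storage plan against the three conditions. It assumes every listed copy holds the same complete data, so it only counts copies; it does not verify that each copy is current.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    name: str      # e.g. "primary", "nearline", "LTO across town"
    medium: str    # storage medium type, e.g. "disk", "lto", "cloud"
    offsite: bool  # stored away from the main facility


def check_3_2_1(copies: list[DataCopy]) -> list[str]:
    """Return a list of 3-2-1 rule violations; an empty list means the plan passes."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; need at least 3")
    media = {c.medium for c in copies}
    if len(media) < 2:
        problems.append(f"only {len(media)} medium type(s); need at least 2")
    if not any(c.offsite for c in copies):
        problems.append("no off-site copy")
    return problems


# The first example architecture: primary disk, nearline disk, LTO taken across town.
plan = [
    DataCopy("primary", "disk", offsite=False),
    DataCopy("nearline", "disk", offsite=False),
    DataCopy("LTO across town", "lto", offsite=True),
]
print(check_3_2_1(plan))  # []
```

Returning a list of problems rather than a single yes/no makes it easier to see which leg of the rule a plan is missing, for example a production that has three disk copies but nothing off-site.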