NetApp Crash-Consistent Snapshots

Monday, February 27, 2012

We performed a disaster recovery (DR) test at our DR site in October 2011 that involved recovering Oracle database VMs with storage on a SAN.  Once a database VM was recovered from a NetApp volume snapshot and started, Oracle went through its usual crash recovery and the database ran as expected.  I did not need to recover using the online (hot) backup files.

This prompted me to investigate whether this is a valid way to recover the Oracle database VMs for disaster recovery. I found a NetApp Technical Report called “Using Crash-Consistent Snapshot Copies as Valid Oracle Backups” which appears to support my discovery during the disaster recovery test.  I highly recommend reading the report as it contains valuable information on Oracle backup and recovery functionality.

Just a little background on our environment.  The Oracle databases are used for development, so some data loss is acceptable.  The database VMs are backed up nightly (volume snapshot) using VSC/SMVI.  The NetApp volume snapshots are copied to tape, and those tapes are used to restore the data at the DR site.

The document describes a crash-consistent image in section 6.

“Some customers would like to have the ability to capture a backup of an online database without requiring the database to be in backup mode. In such a circumstance, the backup copy (if implemented correctly) will be technically equivalent to an image of a crashed database. That is, it is in a state similar to an image of an Oracle Database that has experienced an abrupt power loss, a server crash, or a shutdown abort. Such images are described as crash consistent.

When an instance fails or crashes, the data in the buffers of the SGA are not written to the data files, and a special flag in the control file is not set to indicate that it was a normal shutdown. Therefore, when a crash image is started, it triggers an automatic crash recovery to be performed by Oracle where the necessary redo contents are applied. This process brings the data files to a transaction-consistent state by rolling back uncommitted transactions and preserving all committed changes up to the point when the instance failed. A crash-consistent image without physical corruption can be reliably started and recovered by Oracle’s automatic crash recovery operation.”
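
To make the roll-forward and roll-back steps in the quote concrete, here is a toy sketch in Python. It is only a conceptual model, not how Oracle implements crash recovery: the `Redo` record, the `crash_recover` function, and the sample data are all hypothetical names I made up for illustration, and real redo and undo handling is far more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redo:
    """One logged change (hypothetical, heavily simplified)."""
    txn: str   # transaction id
    key: str   # stand-in for a row or block
    old: int   # value before the change, used to roll back
    new: int   # value after the change, used to roll forward


def crash_recover(datafile, redo_log, committed):
    """Return the data file contents after a toy crash recovery."""
    state = dict(datafile)
    # Roll forward: reapply every logged change in log order,
    # because buffered changes never reached the data file.
    for rec in redo_log:
        state[rec.key] = rec.new
    # Roll back: undo changes of transactions that never committed,
    # newest change first.
    for rec in reversed(redo_log):
        if rec.txn not in committed:
            state[rec.key] = rec.old
    return state


# The snapshot captured a stale data file plus the redo written so far.
datafile = {"a": 1, "b": 10}
redo_log = [
    Redo("T1", "a", 1, 2),
    Redo("T2", "b", 10, 20),
    Redo("T1", "a", 2, 3),
]
committed = {"T1"}  # T2 was still in flight when the "crash" happened

recovered = crash_recover(datafile, redo_log, committed)
print(recovered)  # {'a': 3, 'b': 10}
```

The committed transaction T1 survives in full, while the in-flight change from T2 is discarded, which is the transaction-consistent state the report describes.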

Section 6.1 references My Oracle Support Note ID 604683.1 regarding Oracle’s guidelines for supporting third-party snapshot technology as a backup/restore solution.  Oracle will officially support backup, restore and recovery operations using third-party snapshot technologies, provided that the third-party snapshot technology meets the prerequisites.  Oracle states:

“The third party vendor needs to guarantee and held accountable that their snapshots conform to all the following requirements:

  • Integrated with Oracle’s recommended restore and recovery operations
  • Database crash consistent at the point of the snapshot
  • Write ordering is preserved for each file within a snapshot”

Our NetApp snapshots satisfy the last two requirements above for our disaster recovery purposes.  Both points are supported by section 6.4 of the report, quoted below.  Our database files all reside on the same volume, so write ordering is preserved.

“A crash-consistent image is equivalent to or consistent with a copy of a database image after the database instance, server, or storage system has crashed.

When a checkpoint or snapshot is created by the storage system, OS utility, or user without any coordination with the source application (such as when a snapshot is created of an active database without it being in hot backup mode), the resulting snapshot is usually not crash consistent unless the copy utility has the capability to capture a consistent image across multiple volumes or storage systems.

If a database has all of its files (control files, data files, online redo logs, and archived logs) contained within a single NetApp volume, then the task is straightforward. A Snapshot copy of that single volume will provide a crash-consistent copy.”
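
The single-volume condition in the quote is easy to check by hand, but a small script can help when there are many files. The following is a minimal sketch, assuming you already have a list of database file paths and a table mapping mount points to volume names. The mount points, volume names, and file paths here are hypothetical examples, not our actual layout, and `volume_of` and `volumes_used` are helper names I invented.

```python
from pathlib import PurePosixPath


def volume_of(path, mounts):
    """Return the volume whose mount point is the longest prefix of path."""
    p = PurePosixPath(path)
    best = None
    for mount_point, volume in mounts.items():
        m = PurePosixPath(mount_point)
        if p == m or m in p.parents:
            if best is None or len(m.parts) > len(best[0].parts):
                best = (m, volume)
    if best is None:
        raise ValueError(f"no mount point covers {path}")
    return best[1]


def volumes_used(db_files, mounts):
    """Return the set of volumes that hold the given database files."""
    return {volume_of(f, mounts) for f in db_files}


# Hypothetical mount table: mount point -> storage volume name.
mounts = {
    "/": "vol_root",
    "/u01/oradata": "vol_oradata",
    "/u01/arch": "vol_arch",
}

# Control file, data file, online redo log and archived log together.
files_ok = [
    "/u01/oradata/DEV/control01.ctl",
    "/u01/oradata/DEV/system01.dbf",
    "/u01/oradata/DEV/redo01.log",
    "/u01/oradata/DEV/arch/1_100.arc",
]
# Same database, but archived logs moved to a second volume.
files_split = files_ok[:3] + ["/u01/arch/1_100.arc"]

print(len(volumes_used(files_ok, mounts)) == 1)     # True
print(len(volumes_used(files_split, mounts)) == 1)  # False
```

If the result contains more than one volume, a plain per-volume snapshot is not enough by itself, and the multi-volume consistency caveat from the report applies.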

We will continue to perform traditional online backups (ALTER TABLESPACE ... BEGIN BACKUP / END BACKUP), which may be necessary in some situations.  For disaster recovery, however, restoring from the volume snapshot is a more efficient way to restore the Oracle database VM.

Additional References
Using Crash-Consistent Snapshot Copies as Valid Oracle Backups

My Oracle Support – Supported Backup, Restore and Recovery Operations using Third Party Snapshot Technologies [ID 604683.1]