RTO and RPO

Monday, October 14, 2013

What is the difference between Recovery Time Objective (RTO) and Recovery Point Objective (RPO)? That question came up recently in a disaster recovery discussion. Dejan Kosutic’s blog entry provides a good explanation.

In summary:

  • Recovery Time Objective (RTO) is the “target time set for resumption of product, service or activity delivery after an incident”.
  • Recovery Point Objective (RPO) is “the maximum tolerable period in which data might be lost”.
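
To make the distinction concrete, here is a minimal Python sketch that checks one incident against both objectives. The function name, the timestamps, and the 4-hour RTO and 24-hour RPO are all made up for illustration; they do not come from any real environment. RTO is compared against the downtime after the incident, while RPO is compared against the age of the last usable backup at the moment of the incident.

```python
from datetime import datetime, timedelta

def meets_objectives(last_backup, incident, service_restored, rto, rpo):
    """Return (rto_met, rpo_met) for a single incident."""
    downtime = service_restored - incident      # measured against the RTO
    data_loss_window = incident - last_backup   # measured against the RPO
    return downtime <= rto, data_loss_window <= rpo

# Hypothetical incident: backup at 01:00, failure at 14:00, service back at 17:30.
result = meets_objectives(
    last_backup=datetime(2013, 10, 14, 1, 0),
    incident=datetime(2013, 10, 14, 14, 0),
    service_restored=datetime(2013, 10, 14, 17, 30),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=24),
)
print(result)  # (True, True): 3.5 hours of downtime, 13 hours of data at risk
```

With a nightly backup, the data at risk can approach 24 hours, so an RPO shorter than that would call for more frequent backups or snapshots.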

NetApp Crash-Consistent Snapshots

Monday, February 27, 2012

We performed a disaster recovery (DR) test at our DR site in October 2011, which involved recovering Oracle database VMs with storage on a SAN. Once the database VM was recovered from a NetApp volume snapshot and started, Oracle went through its usual crash recovery and the database ran as expected. I did not need to recover using the online (hot) backup files.

This prompted me to investigate whether this is a valid way to recover the Oracle database VMs for disaster recovery. I found a NetApp Technical Report called “Using Crash-Consistent Snapshot Copies as Valid Oracle Backups” which appears to support my discovery during the disaster recovery test.  I highly recommend reading the report as it contains valuable information on Oracle backup and recovery functionality.

Some background on our environment: the Oracle databases are used for development, so some data loss is acceptable. The database VMs are backed up (snapshot/volume) nightly using VSC/SMVI. The NetApp volume snapshots are copied to tape, and the tapes are used to restore the data at the DR site.

The document describes a crash-consistent image in section 6.

“Some customers would like to have the ability to capture a backup of an online database without requiring the database to be in backup mode. In such a circumstance, the backup copy (if implemented correctly) will be technically equivalent to an image of a crashed database. That is, it is in a state similar to an image of an Oracle Database that has experienced an abrupt power loss, a server crash, or a shutdown abort. Such images are described as crash consistent.

When an instance fails or crashes, the data in the buffers of the SGA are not written to the data files, and a special flag in the control file is not set to indicate that it was a normal shutdown. Therefore, when a crash image is started, it triggers an automatic crash recovery to be performed by Oracle where the necessary redo contents are applied. This process brings the data files to a transaction-consistent state by rolling back uncommitted transactions and preserving all committed changes up to the point when the instance failed. A crash-consistent image without physical corruption can be reliably started and recovered by Oracle’s automatic crash recovery operation.”

Section 6.1 references My Oracle Support Note ID 604683.1 regarding Oracle’s guidelines for supporting third-party snapshot technology as a backup/restore solution. Oracle will officially support backup, restore and recovery operations using third-party snapshot technologies, provided that the third-party snapshot technology meets the prerequisites. Oracle states:

“The third party vendor needs to guarantee and held accountable that their snapshots conform to all the following requirements:

  • Integrated with Oracle’s recommended restore and recovery operations
  • Database crash consistent at the point of the snapshot
  • Write ordering is preserved for each file within a snapshot”

Our NetApp snapshots satisfy the last two bullet points above for our disaster recovery purposes. Both points are supported by section 6.4 of the report, quoted below. All of our database files reside on the same volume, so write ordering is preserved.

“A crash-consistent image is equivalent to or consistent with a copy of a database image after the database instance, server, or storage system has crashed.

When a checkpoint or snapshot is created by the storage system, OS utility, or user without any coordination with the source application (such as when a snapshot is created of an active database without it being in hot backup mode), the resulting snapshot is usually not crash consistent unless the copy utility has the capability to capture a consistent image across multiple volumes or storage systems.

If a database has all of its files (control files, data files, online redo logs, and archived logs) contained within a single NetApp volume, then the task is straightforward. A Snapshot copy of that single volume will provide a crash-consistent copy.”
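
The single-volume condition can be checked mechanically. The Python sketch below is an illustration only: the mount points and file paths are hypothetical, and it assumes each mount point corresponds to exactly one NetApp volume. In practice the file list could be gathered from views such as V$DATAFILE, V$CONTROLFILE, V$LOGFILE and V$ARCHIVED_LOG.

```python
from pathlib import PurePosixPath

def volume_of(path, mount_points):
    """Return the longest mount point that is an ancestor of path, or None."""
    parents = PurePosixPath(path).parents
    matches = [m for m in mount_points if PurePosixPath(m) in parents]
    return max(matches, key=len, default=None)

def volumes_used(db_files, mount_points):
    """Return the set of volumes (mount points) holding the given files."""
    return {volume_of(f, mount_points) for f in db_files}

# Hypothetical layout: one mount point per NetApp volume.
mounts = ["/oradata", "/oraarch"]
files = [
    "/oradata/dev1/control01.ctl",
    "/oradata/dev1/system01.dbf",
    "/oradata/dev1/redo01.log",
    "/oradata/dev1/arch/1_100_1.arc",
]

single_volume = len(volumes_used(files, mounts)) == 1
print(single_volume)  # True: every file sits under /oradata
```

A file outside every listed mount point maps to None and counts as its own location, so it causes the check to fail rather than pass silently.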

We will continue to take traditional online (hot) backups (ALTER TABLESPACE BEGIN BACKUP/END BACKUP), which may be necessary in some situations. For disaster recovery, however, restoring the volume snapshot is a more efficient way to bring back the Oracle database VM.

Additional References
Using Crash-Consistent Snapshot Copies as Valid Oracle Backups

My Oracle Support – Supported Backup, Restore and Recovery Operations using Third Party Snapshot Technologies [ID 604683.1]


Database Crash with Tablespace in Hot Backup

Monday, June 6, 2011

I encountered this problem twice a few years ago and again last week. I am adding this to my notes for when it happens again a few years from now.

Snippet from the alert log showing the database crash:

Fri May 27 19:07:27 Eastern Daylight Time 2011
Errors in file i:\oradba\admin\pd102\bdump\pd102_pmon_1464.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [kksCursorFreeCallBack+112] [PC:0x71E70C] [ADDR:0x68] [UNABLE_TO_READ] []

Fri May 27 19:07:45 Eastern Daylight Time 2011
LGWR: terminating instance due to error 472
Fri May 27 19:07:59 Eastern Daylight Time 2011
Instance terminated by LGWR, pid = 1508

Log in to restart the database.

> sqlplus "sys as sysdba"

SQL*Plus: Release 10.2.0.5.0 - Production on Fri May 27 20:52:14 2011

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

Enter password:
Connected to an idle instance.

SQL> startup;
ORACLE instance started.

Total System Global Area 1157627904 bytes
Fixed Size                  2077040 bytes
Variable Size             201330320 bytes
Database Buffers          939524096 bytes
Redo Buffers               14696448 bytes
Database mounted.

ORA-01113: file 8 needs media recovery if it was restored from backup, or END BACKUP if it was not
ORA-01110: data file 8: 'I:\ORADATA\PD102\TMF43WT.DBF'

I reviewed the alert log and was able to determine the data file specified was in backup mode when the database terminated.  Following the instructions in Oracle Support Doc ID 183367.1, I executed the commands below to resolve the problem.

SQL> alter database datafile 'I:\ORADATA\PD102\TMF43WT.DBF' end backup;

Database altered.

SQL> alter database open;

Database altered.

References
Common Causes and Solutions on ORA-1113 Error Found in Backup & Recovery (Doc ID 183367.1)

AT STARTUP AFTER CRASH WITH TABLESPACE(S) IN HOT BACKUP
*******************************************************
1. Mount the database.

   SQL> STARTUP MOUNT;

2. Find out which datafiles were in hot backup mode when the database crashed or was shutdown abort or the machine was rebooted by running the query:

   SQL> SELECT V1.FILE#, NAME
        FROM V$BACKUP V1, V$DATAFILE V2
        WHERE V1.STATUS = 'ACTIVE' AND V1.FILE# = V2.FILE#;

3. For each of the files returned by the above query, issue the command:

   SQL> ALTER DATABASE DATAFILE '<full path name>' END BACKUP;

4. Open the database.

   SQL> ALTER DATABASE OPEN;
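
When several data files were in backup mode, typing step 3 once per file gets tedious. The following Python sketch (not part of the Oracle note) turns the rows returned by the step 2 query into the corresponding END BACKUP statements. The function name is hypothetical, and the rows are assumed to be (file#, name) pairs already fetched by whatever client is in use; the output is meant to be reviewed and then pasted into SQL*Plus while the database is mounted.

```python
def end_backup_statements(active_files):
    """Build one END BACKUP statement per (file#, name) row from step 2."""
    statements = []
    for _file_no, name in active_files:
        escaped = name.replace("'", "''")  # double any single quote for SQL
        statements.append(f"ALTER DATABASE DATAFILE '{escaped}' END BACKUP;")
    return statements

# Example row taken from the ORA-01110 message earlier in this post.
rows = [(8, r"I:\ORADATA\PD102\TMF43WT.DBF")]
statements = end_backup_statements(rows)
for stmt in statements:
    print(stmt)
```

Generating the statements as text, rather than executing them directly, keeps a manual review step before anything is run against the database.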