NetApp Crash-Consistent Snapshots

Monday, February 27, 2012

We performed a disaster recovery (DR) test at our DR site in October 2011 which involved recovering Oracle database VMs with storage on a  SAN.  Once the database VM was recovered from a NetApp volume snapshot, it was started and we discovered Oracle went through its usual crash recovery and the database was running as expected.  No need for me to recover using the online (hot) backup files.

This prompted me to investigate whether this is a valid way to recover the Oracle database VMs for disaster recovery. I found a NetApp Technical Report called “Using Crash-Consistent Snapshot Copies as Valid Oracle Backups” which appears to support my discovery during the disaster recovery test.  I highly recommend reading the report as it contains valuable information on Oracle backup and recovery functionality.

Just a little background on our environment.  The Oracle databases are used for development so some data discard is acceptable.  The database VMs are backed up (snapshot/volume) nightly using VSC/SMVI.  The NetApp volume snapshots are copied to tape which are used to restore the data at DR.

The document describes a crash-consistent image in section 6.

“Some customers would like to have the ability to capture a backup of an online database without requiring the database to be in backup mode. In such a circumstance, the backup copy (if implemented correctly) will be technically equivalent to an image of a crashed database. That is, it is in a state similar to an image of an Oracle Database that has experienced an abrupt power loss, a server crash, or a shutdown abort. Such images are described as crash consistent.

When an instance fails or crashes, the data in the buffers of the SGA are not written to the data files, and a special flag in the control file is not set to indicate that it was a normal shutdown. Therefore, when a crash image is started, it triggers an automatic crash recovery to be performed by Oracle where the necessary redo contents are applied. This process brings the data files to a transaction-consistent state by rolling back uncommitted transactions and preserving all committed changes up to the point when the instance failed. A crash-consistent image without physical corruption can be reliably started and recovered by Oracle’s automatic crash recovery operation.”

Section 6.1 references My Oracle Support Note ID 604683.1 regarding Oracle’s guidelines for supporting third-party snapshot technology as a backup/restore solution.  Oracle will officially support backup, restore and recovery operations using third-party snapshot technologies assuming that the third party snapshot technology can meet the prerequisites.  Oracle states:

“The third party vendor needs to guarantee and held accountable that their snapshots conform to all the following requirements:

  • Integrated with Oracle’s recommended restore and recovery operations
  • Database crash consistent at the point of the snapshot
  • Write ordering is preserved for each file within a snapshot”

Our NetApp snapshot covers the last two bullet points above for our disaster recovery purposes.  Those points are supported in section 6.4 of the report shown below.   Our database files all reside on the same volume thereby preserving write ordering.

“A crash-consistent image is equivalent to or consistent with a copy of a database image after the database instance, server, or storage system has crashed.

When a checkpoint or snapshot is created by the storage system, OS utility, or user without any coordination with the source application (such as when a snapshot is created of an active database without it being in hot backup mode), the resulting snapshot is usually not crash consistent unless the copy utility has the capability to capture a consistent image across multiple volumes or storage systems.

If a database has all of its files (control files, data files, online redo logs, and archived logs) contained within a single NetApp volume, then the task is straightforward. A Snapshot copy of that single volume will provide a crash-consistent copy.”

We will continue to perform the traditional online backup mode (ALTER TABLESPACE BEGIN BACKUP/END BACKUP) which may be necessary in some situations, however for disaster recovery the volume snapshot restore will provide a more efficient restoration of the Oracle database VM.

Additional References
Using Crash-Consistent Snapshot Copies as Valid Oracle Backups

My Oracle Support – Supported Backup, Restore and Recovery Operations using Third Party Snapshot Technologies [ID 604683.1]

Advertisements

SYSAUX tablespace growing rapidly

Saturday, December 3, 2011

I have an Oracle 11g R2 (11.2.0.1) database where I noticed the SYSAUX tablespace was growing larger every day.  Searched Oracle My Support and found Doc ID 1292724.1 and Doc ID 552880.1 which were helpful.

After running awrinfo.sql, I found the largest consumer to be SM/OPTSTAT at 2.8 GB which is larger and not typical when compared to my other Oracle 10g R2 and Oracle 11g R1 databases.

********************************************************
(1b) SYSAUX occupants space usage (v$sysaux_occupants)  
********************************************************
|                                                       
| Occupant Name        Schema Name               Space Usage
| -------------------- -------------------- ----------------
| SM/OPTSTAT           SYS                        2,726.8 MB
| SM/AWR               SYS                           76.6 MB
| SM/ADVISOR           SYS                            8.2 MB
| LOGMNR               SYSTEM                         7.9 MB
| SM/OTHER             SYS                            6.8 MB
| JOB_SCHEDULER        SYS                            3.6 MB

**********************************
(4) Space usage by non-AWR components (> 500K)
**********************************

COMPONENT        MB SEGMENT_NAME                            SEGMENT_TYPE  
--------- --------- --------------------------------------- --------------
NON_AWR       800.0 SYS.I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST      INDEX         
NON_AWR       752.0 SYS.WRI$_OPTSTAT_HISTGRM_HISTORY        TABLE         
NON_AWR       424.0 SYS.I_WRI$_OPTSTAT_H_ST                 INDEX         
NON_AWR       240.0 SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY       TABLE         
NON_AWR       192.0 SYS.I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST       INDEX         
NON_AWR       104.0 SYS.I_WRI$_OPTSTAT_HH_ST                INDEX

Searching Google, I came upon John Hallas’ “Purging statistics from the SYSAUX tablespace” blog entry quite informative and detailed on managing occupants in the SYSAUX tablespace.

As John mentions:

“Whenever statistics in the dictionary are modified, old versions of statistics are saved automatically for future restoring. The old statistics are purged automatically at regular intervals based on the statistics history retention setting and the time of recent statistics gathering performed in the system. Retention is configurable using the ALTER_STATS_HISTORY_RETENTION procedure. The default value is 31 days.”

To help reduce the amount of space caused by historical data, I set the retention of old statistics to 14 days.

exec dbms_stats.alter_stats_history_retention(14);

To verify the new setting, I executed the statement below.

select dbms_stats.get_stats_history_retention from dual;

I proceeded to incrementally purge statistics from 31 days to 14 days.  Started with 28 days, then moved to 21 days.

exec dbms_stats.purge_stats(sysdate-28);

Then I tried 14 days, however I encountered an UNDO space error requiring me to change my increments using 18, 16, and finally 14 to complete the purge task.

Now to reclaim the space used by the objects shown in section 4 of the AWRINFO report, I will eventually follow John Hallas’ steps using the ALTER TABLE MOVE command.


Database Crash with Tablespace in Hot Backup

Monday, June 6, 2011

I encountered this problem twice before a few years ago and most recently last week.  Adding this to my notes when it happens again a few years from now.

Snippet from alert log showing database crashed.

Fri May 27 19:07:27 Eastern Daylight Time 2011
Errors in file i:\oradba\admin\pd102\bdump\pd102_pmon_1464.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [kksCursorFreeCallBack+112] [PC:0x71E70C] [ADDR:0x68] [UNABLE_TO_READ] []

Fri May 27 19:07:45 Eastern Daylight Time 2011
LGWR: terminating instance due to error 472
Fri May 27 19:07:59 Eastern Daylight Time 2011
Instance terminated by LGWR, pid = 1508

Login to restart the database.

> sqlplus "sys as sysdba"

SQL*Plus: Release 10.2.0.5.0 - Production on Fri May 27 20:52:14 2011

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

Enter password:
Connected to an idle instance.

SQL> startup;
ORACLE instance started.

Total System Global Area 1157627904 bytes
Fixed Size                  2077040 bytes
Variable Size             201330320 bytes
Database Buffers          939524096 bytes
Redo Buffers               14696448 bytes
Database mounted.

ORA-01113: file 8 needs media recovery if it was restored from backup, or END BACKUP if it was not
ORA-01110: data file 8: 'I:\ORADATA\PD102\TMF43WT.DBF'

I reviewed the alert log and was able to determine the data file specified was in backup mode when the database terminated.  Following the instructions in Oracle Support Doc ID 183367.1, I executed the commands below to resolve the problem.

SQL> alter database datafile 'I:\ORADATA\PD102\TMF43WT.DBF' end backup;

Database altered.

SQL> alter database open;

Database altered.

References
Common Causes and Solutions on ORA-1113 Error Found in Backup & Recovery (Doc ID 183367.1)

AT STARTUP AFTER CRASH WITH TABLESPACE(S) IN HOT BACKUP *******************************************************
1. Mount the database.

   SQL> STARTUP MOUNT;

2. Find out which datafiles were in hot backup mode when the database crashed or was shutdown abort or the machine was rebooted by running the query:

   SQL> SELECT V1.FILE#, NAME
FROM V$BACKUP V1, V$DATAFILE V2
WHERE V1.STATUS = 'ACTIVE' AND V1.FILE# = V2.FILE#;

3. For each of the files returned by the above query, issue the command:

   SQL> ALTER DATABASE DATAFILE '<full path name>' END BACKUP;

4. Open the database.

   SQL> ALTER DATABASE OPEN;


Expired (Grace) Oracle Account

Wednesday, May 18, 2011

A user indicated he was getting the error below, but could still connect.

ORA-28002: the password will expire within 7 days

In Oracle 11g, the default expiration date for the “DEFAULT” profile has changed from UNLIMITED to 180 days.  Changed the setting with the SQL below.

ALTER PROFILE default LIMIT PASSWORD_LIFE_TIME unlimited;

Note:  This change only affects accounts who have not entered their grace period.  Accounts that have entered their grace period will have to change their passwords.

Must make a note in my database creation scripts to adjust this setting to UNLIMITED.  Not saying this is a good decision for a production database, but for my development and lab databases this will eliminate the problem with an expiring account.


Installing Oracle10gR2 Database on Windows 2008 R2

Tuesday, April 26, 2011

Oracle Database 10g R2 is certified on Windows 2008 R2 with the stipulation that you run 10.2.0.5.  However, the 64-bit Oracle base version (10.2.0.4) encounters an error when running setup.exe.

Click image to see larger view

Modified the “Certified Versions” section of the oraparam.ini located in I:\_oratemp\database\install.  Added “,6.1” to the end of the Windows parameter.

[Certified Versions]
#You can customise error message shown for failure, provide value for CERTIFIED_VERSION_FAILURE_MESSAGE
Windows=5.0,5.1,5.2,6.0,6.1

Executed the Oracle Installer again allowing me to proceed with the installation.

Click image for larger view

Two more errors are encountered in OUI Product-Specific Prerequisite Checks, however you can check off in the checkbox and the status is changed to “User Verified” and OUI allows you to continue with the installation.

Click image for larger view

Reference
Installing Oracle Database Client 10g Release 2 (10.2) on a Windows 2008 R2 x64


SQL*Plus Permission Denied

Sunday, February 6, 2011

After installing Oracle 11g Client on RHEL 5.4, I encountered a “permission denied” error while attempting to invoke SQL*Plus.  This occurred on two separate installations one 11gR1 and the other 11gR2.

First, the 11gR1 installation with the following symptoms:

Non-oracle user is trying to login to database via SQL*Plus, but attempts to access sqlplus result in a permission denied error.  Execution of sqlplus as oracle user works fine.

This is explained in Oracle Support Doc ID 1223449.1.

Cause
There is a lack of ‘read “r”‘ and ‘execute “x”‘ permissions on an upper directory which was not allowing access to sqlplus file.

Without execute permission on a directory, a user cannot access files in that directory even if they own them and have all permissions on them.

Solution
Add “r-x” permissions to “others” on upper directory (ex. /u01/app).

Following the recommendation above, I executed the commands below to successfully remediate the problem.

% cd /u01/app
% chmod 755 oracle

For the 11gR2 installation, I changed the permission as suggested, however the error below arises:

sqlplus: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory

An explanation is provided in Oracle Support Doc ID 1206898.1.

Cause
The environment variable LD_LIBRARY_PATH was not set to include the 11g Client directory or the environment variable was set incorrectly.

Solution
Set the environment variable LD_LIBRARY_PATH to include the 11g Client directory.

Following the instructions mentioned above resolved my problems and all is functioning well.


32-bit Oracle Client on Linux x86-64 is *not* supported

Sunday, February 6, 2011

I mentioned in a previous post “32-bit Oracle Database server is *not* certified on Windows x64” the problem I encountered with attempting to install 32-bit Oracle on Windows.  Now I have encountered sort of the same problem only with 32-bit Oracle 11gR1 Client on Linux x86-64 (RHEL 5.4).  So again I post a personal reminder to myself.

When I installed the 32-bit Oracle 11gR1 Client, the installer raised an error in the linking phase.  Reviewing the installation log, I saw errors referring to “ld: cannot find -lxml11”.  Searched My Oracle Support and found Oracle Support Doc ID 549555.1.

Cause
The issue is caused by installation of 11g 32bit on a 64bit Linux Operating System.

Solution
Download and install 11g 64bit for Linux 64bit Operating System.

Also, upon reviewing Oracle Support Doc ID 1075717.1, it states:

  • The only release of 32-bit RDBMS Client software that will properly install on an x86_64 Linux OS system is 11.2.
  • 32-bit client software from any earlier Oracle Database release (9.2, 10.1, 10.2 or 11.1) is NOT supported and will NOT install on any x86_64 Linux OS system

However, according to the same support document, it appears 32-bit Oracle Instant Client could be installed.

In some specific situations, a customer may be directed to place 32-bit client libraries without “installing” 32-bit Client software.  The following method will  “place” (not “install”) only the 32-bit client libraries on x86_64 Linux.

  • The Oracle “Instant Client” product could be installed. This will have no executable binary files, and is accomplished by beginning an OUI (aka runInstaller) installation of the 32-bit Client software on x86_64 Linux. The key step is to choose the “Instant Client” installation “type”.

What started me down this road to discovery is we thought it was necessary to install the 32-bit Oracle Client for Documentum eContent Server 6.5 SP3.  We had seen forums indicating this was required on Windows 2008 environments.  However, for Linux x86-64, Oracle Client 64-bit was installed and eContent 6.5 SP3 worked without incident.