Oracle: DST struck my RMAN

On monday november 1st I discovered this error in one of my RMAN-logs:

using target database controlfile instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=11 devtype=DISK
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of crosscheck command at 11/01/2010 02:46:36
ORA-01455: converting column overflows integer datatype

The Sunday before we had a switch from daylight saving time (DST) back to normal time (minus 1 hour between 2 a.m. and 3 a.m.). Unfortunately my backup run exactly in that very hour and so I had a backupset whose completion time was before its starting time. My Oracle 9.2 RMAN couldn’t cope with that “paradox” and refused to execute commands like “LIST BACKUP” or “CROSSCHECK BACKUP”.

Metalink, or MOS (My Oracle Support), told me that this is fixed in 10.1.0.2. In 9.2 I should query “rc_backup_piece” for all backups “where start_time > completion_time”:

select piece#, bp_key, start_time, completion_time from rc_backup_piece
 where start_time > completion_time;

And then delete these “wrong” backups.

Pity me I still run that DB in NOCATALOG-mode and so I had no view to point that query to. So first I had to set up a RMAN-Catalog on another DB and synchronised my DB’s “RMAN” with it.

After I got the number of the incriminated backupset I deleted it. But when I run RMAN in NOCATALOG-mode afterwards I still got that error – using a catalog all was fine.

Seems as if RMAN didn’t logged the DELETE in my controlfile. The synchronization only goes from controlfile to catalog. So I had a wrong record in my CTL.

Knowing that the space for RMAN-data in CTLs is limited, I gave a try to reduce the control_file_record_keep_time:

alter system set control_file_record_keep_time=1;

Hoping that the wrong data would age out the next day. Well it took a few days longer than that but finally it worked as expected and my CTL is clean now.

More notes on MOS:
Bug 2665255
ID 222392.1

Edit 2012-03-05:
Here’s an action-log of a recent case in that matter:

15$ sqlplus rmancat@mbfas

SQL*Plus: Release 9.2.0.4.0 - Production on Mon Mar 5 14:54:38 2012

Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.

Enter password:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Data Mining and Real Application Testing options

SQL> alter session set NLS_DATE_FORMAT='dd.mm.yyyy hh24:mi:ss';

Session altered.

SQL> select piece#, bp_key, start_time, completion_time from rc_backup_piece where start_time > completion_time;

    PIECE#     BP_KEY START_TIME          COMPLETION_TIME
---------- ---------- ------------------- -------------------
         1      32126 30.10.2011 02:55:18 30.10.2011 02:07:52

SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Data Mining and Real Application Testing options



22$ rman catalog rmancat/pwd@mbfas target sys/pwd@ws95

Recovery Manager: Release 9.2.0.4.0 - 64bit Production

Copyright (c) 1995, 2002, Oracle Corporation.  All rights reserved.

connected to target database: WS95 (DBID=1646722213)
connected to recovery catalog database

RMAN> delete backuppiece 32126;

allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=11 devtype=DISK

List of Backup Pieces
BP Key  BS Key  Pc# Cp# Status      Device Type Piece Name
------- ------- --- --- ----------- ----------- ----------
32126   32124   1   1   AVAILABLE   DISK        /usr2/oracle/mnt2/oradata/ws95/backup/WS95_61mqc68k_1_1_20111030.rmn

Do you really want to delete the above objects (enter YES or NO)? YES
deleted backup piece
backup piece handle=/usr2/oracle/mnt2/oradata/ws95/backup/WS95_61mqc68k_1_1_20111030.rmn recid=8335 stamp=765860118
Deleted 1 objects

RMAN>
Advertisements

2 Responses to Oracle: DST struck my RMAN

  1. Viqash says:

    I am also facing the same issue and have minimized control_file_record_keep_time paramter to 2 but still the issue is same. In your case how many days it took to clear the control file and what was the value set by you.

    • logbuffer says:

      Sorry, I can’t recall the number of days exactly but I think it lasted roughly a week – although I set control_file_record_keep_time=1. But if you can manage to use a catalog as long, you should have no problem.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: