Saturday, 18 January 2014

How to simulate block corruption and do RMAN block recovery


RMAN offers a blockrecover command that recovers individual corrupt blocks. Without it, we would have to restore the entire datafile and then recover it.

Let's simulate a block corruption and repair it with block media recovery.

1) Corrupt a datafile, for instance users.dbf, by overwriting block 20 with zeros (only do this on a test database):

$ dd if=/dev/zero of=/u/oracle/oradata/test/users.dbf bs=8k conv=notrunc seek=20 count=1
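
Before pointing dd at a real datafile, it may help to rehearse the same options on a scratch file. The sketch below is only an illustration: make_scratch, zero_block and block_state are names made up here, and the scratch file is an ordinary temp file, not an Oracle datafile. It shows that seek=20 with bs=8k lands on byte offset 163840 and that conv=notrunc leaves the neighbouring blocks and the file size untouched.

```shell
#!/bin/sh
# Rehearse the dd overwrite on a scratch file, never on a real datafile.
# make_scratch, zero_block and block_state are names invented for this sketch.

# Create file $1 with $2 blocks of $3 bytes, filled with the byte 'x'.
make_scratch() {
    dd if=/dev/zero bs="$3" count="$2" 2>/dev/null | tr '\0' 'x' > "$1"
}

# Overwrite block $2 of file $1 (block size $3) with zeros.
# seek=N skips N output blocks; conv=notrunc keeps the rest of the file.
zero_block() {
    dd if=/dev/zero of="$1" bs="$3" conv=notrunc seek="$2" count=1 2>/dev/null
}

# Print "zero" if block $2 of file $1 contains only zero bytes, else "data".
block_state() {
    nonzero=$(( $(dd if="$1" bs="$3" skip="$2" count=1 2>/dev/null | tr -d '\0' | wc -c) ))
    if [ "$nonzero" -eq 0 ]; then echo zero; else echo data; fi
}

scratch=$(mktemp)
make_scratch "$scratch" 32 8192
zero_block "$scratch" 20 8192
echo "byte offset of block 20: $(( 20 * 8192 ))"
for b in 19 20 21; do
    echo "block $b: $(block_state "$scratch" "$b" 8192)"
done
rm -f "$scratch"
```

Only block 20 should be reported as zero; blocks 19 and 21 keep their data.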

2) Verify the block corruption using the dbverify utility:

$ dbv file=/u/oracle/oradata/test/users.dbf blocksize=8192

DBVERIFY: Release 10.2.0.1.0 - Production on Tue Jul 20 16:21:06 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - Verification starting : FILE = /u/oracle/oradata/test/users.dbf
Page 20 is marked corrupt
Corrupt block relative dba: 0x01000014 (file 4, block 20)
Completely zero block found during dbv:

DBVERIFY - Verification complete

Total Pages Examined         : 131072
Total Pages Processed (Data) : 87614
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 5449
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1449
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 36559
Total Pages Marked Corrupt   : 1
Total Pages Influx           : 0
Highest block SCN            : 301640 (0.301640)

Block 20 of datafile 4 is corrupted.
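
The "relative dba" that dbv prints can be decoded by hand. The sketch below assumes the usual layout for a smallfile tablespace, where the upper 10 bits hold the relative file number and the lower 22 bits hold the block number; decode_rdba is a name made up for this example.

```shell
#!/bin/sh
# Split a relative data block address (rdba) into file and block number.
# Assumed layout: upper 10 bits = relative file number,
# lower 22 bits = block number. decode_rdba is a name invented for this sketch.
decode_rdba() {
    rdba=$(( $1 ))
    echo "file $(( rdba >> 22 )), block $(( rdba & 0x3FFFFF ))"
}

# The value reported by dbv in the output above.
decode_rdba 0x01000014
```

For 0x01000014 this gives file 4, block 20, which matches what dbv reports in parentheses.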

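3) Let's do the RMAN block recovery using the blockrecover command. A backup of the datafile and the archived logs created since that backup must be available.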

RMAN> blockrecover datafile 4 block 20;

Starting blockrecover at 20-JUL-10
using channel ORA_DISK_1

channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00004
channel ORA_DISK_1: reading from backup piece /u1/flash_recovery_area/test/backupset/2010_07_16/o1_mf_nnndf_TAG20100716T181457_640o2b16_.bkp
channel ORA_DISK_1: restored block(s) from backup piece 1
piece handle=/u1/flash_recovery_area/test/backupset/2010_07_16/o1_mf_nnndf_TAG20100716T181457_640o2b16_.bkp tag=TAG20100716T181457
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:45

starting media recovery

archive log thread 1 sequence 17 is already on disk as file /u1/flash_recovery_area/test/archivelog/2010_07_17/o1_mf_1_17_642q4zbv_.arc
archive log thread 1 sequence 18 is already on disk as file /u1/flash_recovery_area/test/archivelog/2010_07_19/o1_mf_1_18_647vt1ps_.arc
archive log thread 1 sequence 19 is already on disk as file /u1/flash_recovery_area/test/archivelog/2010_07_19/o1_mf_1_19_647vvqrz_.arc
archive log thread 1 sequence 20 is already on disk as file /u1/flash_recovery_area/test/archivelog/2010_07_20/o1_mf_1_20_64byfgff_.arc
archive log thread 1 sequence 21 is already on disk as file /u1/flash_recovery_area/test/archivelog/2010_07_20/o1_mf_1_21_64byfgjf_.arc
archive log thread 1 sequence 22 is already on disk as file /u1/flash_recovery_area/test/archivelog/2010_07_20/o1_mf_1_22_64byfg8f_.arc
media recovery complete, elapsed time: 00:00:05
Finished blockrecover at 20-JUL-10
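
When dbv reports more than one corrupt block, the command can be assembled from the file number and the list of blocks. This is only a sketch: build_blockrecover is a helper name made up here, and it repeats the "datafile N block M" clause once per block. As far as I know, from 11g onwards the same clause is used with the recover command (recover datafile 4 block 20;) and blockrecover is deprecated, so check the RMAN reference for your release.

```shell
#!/bin/sh
# Build a blockrecover command for one datafile and a list of blocks.
# build_blockrecover is a helper name invented for this sketch.
build_blockrecover() {
    file_no=$1
    shift
    cmd="blockrecover"
    for block in "$@"; do
        # One "datafile N block M" clause per corrupt block.
        cmd="$cmd datafile $file_no block $block"
    done
    echo "$cmd;"
}

# The case from this article: datafile 4, block 20.
build_blockrecover 4 20
# The output could then be piped to RMAN, for example:
#   build_blockrecover 4 20 | rman target /
```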

Thursday, 16 January 2014

Recover a standby database with missing archivelogs in a Data Guard setup



After the Data Guard service was stopped for maintenance, some of the archive logs produced on the primary database in the meantime were lost. An RMAN incremental backup was used to recover the standby and resynchronize it. The primary and standby databases don't share an RMAN catalog, so the backup sets taken on the primary had to be transferred to the standby side and registered there. In addition, some datafiles had been created on the primary that were never applied on the standby. Because of this, we needed to re-create the control file on the standby and transfer the newly created datafiles from the primary. Here is a detailed write-up of this recovery process.

1-Determine the current SCN on the primary and on the standby db
PRIMARY
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
3360225821

STANDBY
SQL> SELECT CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
-----------
3215410716
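
The difference between the two SCNs shows how far the standby is behind, and the standby value becomes the FROM SCN of the incremental backup. The sketch below only does the arithmetic; scn_gap and lower_scn are names made up here. As far as I know, Oracle's roll-forward procedure also looks at min(checkpoint_change#) from v$datafile_header on the standby and takes the lower of the two values. The header_scn value below is hypothetical and only illustrates that comparison.

```shell
#!/bin/sh
# scn_gap and lower_scn are helper names invented for this sketch.

# Print how many SCNs the standby ($2) is behind the primary ($1).
scn_gap() {
    echo $(( $1 - $2 ))
}

# Print the lower of two SCNs, the safer starting point for the backup.
lower_scn() {
    if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi
}

primary_scn=3360225821     # CURRENT_SCN reported on the primary
standby_scn=3215410716     # CURRENT_SCN reported on the standby
header_scn=3215410000      # HYPOTHETICAL datafile header checkpoint SCN

echo "standby is behind by $(scn_gap "$primary_scn" "$standby_scn") SCNs"
echo "backup from SCN $(lower_scn "$standby_scn" "$header_scn")"
```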

2-Stop log apply and transport services.

2.1 stop redo transport on primary
alter system set log_archive_dest_state_2 ='defer' scope=both ;
2.2 stop redo apply on standby
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

3-Take an incremental backup of the primary database, starting from the last SCN applied on the standby db.
-- for a faster backup, use multiple channels
run {
allocate channel ch1 device type disk;
allocate channel ch2 device type disk;
allocate channel ch3 device type disk;
allocate channel ch4 device type disk;
allocate channel ch5 device type disk;
allocate channel ch6 device type disk;
BACKUP INCREMENTAL FROM SCN 3215410716 DATABASE FORMAT '/intl_migration/cdrdb/backup/tmpForStandby_%U' tag 'FORSTANDBY';
release channel ch1;
release channel ch2;
release channel ch3;
release channel ch4;
release channel ch5;
release channel ch6;
}
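
Typing out the channel lines by hand gets tedious when the channel count changes. A small generator can print the run block for any number of channels; gen_run_block is a helper name made up for this sketch. It leaves out the release lines, because channels allocated inside a run block are released automatically when the block ends.

```shell
#!/bin/sh
# Print an RMAN run block with N disk channels around one command.
# gen_run_block is a helper name invented for this sketch.
gen_run_block() {
    channels=$1
    rman_cmd=$2
    echo "run {"
    i=1
    while [ "$i" -le "$channels" ]; do
        echo "allocate channel ch$i device type disk;"
        i=$(( i + 1 ))
    done
    echo "$rman_cmd"
    echo "}"
}

# Same backup command and channel count as in the run block above.
gen_run_block 6 "BACKUP INCREMENTAL FROM SCN 3215410716 DATABASE FORMAT '/intl_migration/cdrdb/backup/tmpForStandby_%U' tag 'FORSTANDBY';"
```

The output can be saved to a file and passed to RMAN as a command file, or pasted at the RMAN prompt.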

4-Transfer backup sets to standby side.
Because the incremental backup was 1 TB in size, I needed to split it across different mount points.
Don't worry about keeping the pieces in different folders. We will register them all.
SOURCE FOLDER
/intl_migration/cdrdb/backup/

DEST FOLDERS
/medftp/backupDG
/app3/backupDG/

bin
prompt
lcd /intl_migration/cdrdb/backup/
cd /medftp/backupDG
mput tmpForStandby_rkk2lbe4_1_1 tmpForStandby_rlk2lbe5_1_1 tmpForStandby_rmk2lbe7_1_1 tmpForStandby_rnk2lbe9_1_1 tmpForStandby_rok2lbeb_1_1 tmpForStandby_rpk2lbed_1_1 tmpForStandby_rqk2m14c_1_1 tmpForStandby_rrk2m19j_1_1 tmpForStandby_rsk2m2bt_1_1 tmpForStandby_rtk2m2eu_1_1 tmpForStandby_ruk2m2km_1_1 tmpForStandby_rvk2m3m2_1_1

bin
prompt
lcd /intl_migration/cdrdb/backup/
cd /app3/backupDG/
mput tmpForStandby_s0k2mmu4_1_1 tmpForStandby_s1k2mn2d_1_1 tmpForStandby_s2k2mnlu_1_1 tmpForStandby_s3k2mnut_1_1 tmpForStandby_s4k2mob3_1_1 tmpForStandby_s5k2moee_1_1 tmpForStandby_s6k2nc22_1_1 tmpForStandby_s7k2ncda_1_1 tmpForStandby_s8k2nd6s_1_1 tmpForStandby_s9k2ne6n_1_1 tmpForStandby_sak2ne8i_1_1 tmpForStandby_sbk2nf3c_1_1 tmpForStandby_ssk2o28c_1_1
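
With 1 TB of pieces going over ftp, it is worth confirming that every piece arrived intact before registering anything. The sketch below is one possible way, not part of the original procedure: make_manifest runs cksum over the pieces on the source side, and check_manifest compares that list against the destination folders on the standby. Both function names are made up here, and the demo at the bottom uses throwaway temp directories and dummy piece names instead of the real paths.

```shell
#!/bin/sh
# Verify transferred backup pieces with cksum.
# make_manifest and check_manifest are names invented for this sketch.

# Print "crc size name" for every tmpForStandby_* piece in directory $1.
make_manifest() {
    ( cd "$1" && cksum tmpForStandby_* )
}

# Check every line of manifest $1 against the directories given after it.
# Prints one line per piece and returns non-zero if any piece is bad.
check_manifest() {
    manifest=$1
    shift
    status=0
    while read -r crc size name; do
        found=""
        for dir in "$@"; do
            [ -f "$dir/$name" ] && found="$dir/$name"
        done
        if [ -n "$found" ] && [ "$(cksum < "$found")" = "$crc $size" ]; then
            echo "OK  $name"
        else
            echo "BAD $name"
            status=1
        fi
    done < "$manifest"
    return $status
}

# Demo with dummy pieces split across two destination folders.
src=$(mktemp -d); dst1=$(mktemp -d); dst2=$(mktemp -d)
echo "piece one" > "$src/tmpForStandby_demo1_1_1"
echo "piece two" > "$src/tmpForStandby_demo2_1_1"
cp "$src/tmpForStandby_demo1_1_1" "$dst1/"
cp "$src/tmpForStandby_demo2_1_1" "$dst2/"
make_manifest "$src" > "$src/manifest.txt"
check_manifest "$src/manifest.txt" "$dst1" "$dst2"
rm -rf "$src" "$dst1" "$dst2"
```

In practice the manifest would be created on the primary, copied over along with the pieces, and checked on the standby against /medftp/backupDG and /app3/backupDG/.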

5-Register backup sets on standby db
On standby db
rman target /
RMAN> CATALOG START WITH '/app3/backupDG/tmpForStandby';
RMAN> CATALOG START WITH '/medftp/backupDG/tmpForStandby';

6-Recover standby db
One important note:
because this backup was taken only to sync the physical standby db, the NOREDO keyword is required.
See : http://download.oracle.com/docs/cd/B19306_01/backup.102/b14191/rcmdupdb.htm#sthref955

RMAN>
run {
allocate channel ch1 device type disk;
allocate channel ch2 device type disk;
allocate channel ch3 device type disk;
allocate channel ch4 device type disk;
allocate channel ch5 device type disk;
allocate channel ch6 device type disk;
allocate channel ch7 device type disk;
allocate channel ch8 device type disk;

RECOVER DATABASE NOREDO;

release channel ch1;
release channel ch2;
release channel ch3;
release channel ch4;
release channel ch5;
release channel ch6;
release channel ch7;
release channel ch8;
}

7-Create new standby control file
Before re-starting the log apply service on the standby db, create a new standby controlfile on the primary db and copy it to the standby. I suggest creating a new controlfile because, while logs were not being transferred and applied, changes affecting the controlfile may have been made on the primary, such as adding redo members, datafiles or new tablespaces.

7-1 shutdown standby db instance
7-2 create a new standby control file on the primary and copy it to each control file destination on the standby side (generally 3).
SQL> alter database create standby controlfile as '/tmp/stby.ctl'; --on primary db
scp /tmp/stby.ctl oracle@stdbyserver:/oradata/ctl<1>/ctl.dbf

7-3 start standby db in mount mode; the log apply service (Managed Recovery Process) is started in step 9
SQL> startup mount;

8-OPTIONAL - Transfer newly created files.
If new datafiles were added while Data Guard was stopped, as happened to me, you need to copy the newly created files. They were not included in the incremental backup set,
and were not created on the standby because MRP was stopped.
8-1 list all datafiles from the database and check which ones exist on the standby host (remember we have just created a new controlfile, so both primary and standby have the same information)
SQL> set heading off feedback off pagesize 0 trimspool on
SQL> spool /tmp/hede.txt
SQL> select 'file ' || name from v$datafile;
SQL> spool off
# sh /tmp/hede.txt > fileStatus.txt
# grep cannot fileStatus.txt
/oradata/file004.dbf : cannot open
/oradata/file005.dbf : cannot open
This means we have to copy these 2 files to the standby side.
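
Running the spooled list through the file command works, but a plain existence test gives the same answer with less noise. The sketch below is an alternative under a few assumptions: the list file holds one datafile path per line (for example spooled from "select name from v$datafile" with headings off), and list_missing is a helper name made up here. The demo uses a temp directory with dummy files instead of /oradata.

```shell
#!/bin/sh
# Print every path from a list file that does not exist on this host.
# list_missing is a helper name invented for this sketch; the list is
# assumed to hold one datafile path per line.
list_missing() {
    # read without IFS= also trims padding that a spool may leave behind.
    while read -r path; do
        [ -n "$path" ] || continue      # skip blank lines
        [ -e "$path" ] || echo "$path"  # report paths that are absent
    done < "$1"
}

# Demo: three listed files, only one of them present.
demo=$(mktemp -d)
: > "$demo/file001.dbf"
printf '%s\n' "$demo/file001.dbf" "$demo/file004.dbf" "$demo/file005.dbf" > "$demo/datafiles.txt"
list_missing "$demo/datafiles.txt"
rm -rf "$demo"
```

Run on the standby host, the output is the list of datafiles that still have to be copied from the primary.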

8-2 After determining the missing datafiles, back them up as image copies on the primary db and copy them to the standby side.
BACKUP AS COPY DATAFILE '/oradata/file004.dbf' FORMAT '/tmp/file004.dbf' TAG stdbyImgCopy;
BACKUP AS COPY DATAFILE '/oradata/file005.dbf' FORMAT '/tmp/file005.dbf' TAG stdbyImgCopy;

scp /tmp/file004.dbf oracle@stdbyserver:/oradata/file004.dbf
scp /tmp/file005.dbf oracle@stdbyserver:/oradata/file005.dbf

9-Re-start log apply and transport services.
9.1 start redo transport on primary
alter system set log_archive_dest_state_2 ='enable' scope=both ;

9.2 start redo apply on standby (skip the startup if the instance is still mounted from step 7-3)
SQL> startup mount;
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

9-3 check for any problems; you may encounter some. Check alert.log and the status of the processes
SQL> SELECT PROCESS, STATUS, THREAD#, SEQUENCE#, BLOCK#, BLOCKS FROM V$MANAGED_STANDBY;

Because this operation succeeded, we were spared the time and space needed to rebuild the entire 30 TB standby database.
A similar workaround is documented on Metalink for Oracle 9i: Doc ID 290817.1