Sunday, August 3, 2014

Block Corruption in Oracle Database

During backup and recovery practice sessions, we often struggle to perform block recovery scenario. This is because we find it difficult to corrupt an Oracle block.

I performed this test on Oracle 10g Release 2 (10.2.0.1) on Windows XP and for the purpose of this test we need to keep our database in archivelog mode. In this article, I will discuss how to corrupt an Oracle data block, but before beginning this discussion, I would like to answer:

Why to corrupt an Oracle Block?

We will be corrupting an Oracle block in order to practice recovery procedures involved when one encounters a Block Corruption in a production environment. If a block gets corrupted in any of our production databases we will be in a position to rectify and correct the error instead of wandering for help.

This is purely for educational purpose and please do not practice this on any of your production/development/testing databases, rather create a new database for this purpose and practice it there.

For the purpose of this test, I have created a separate tablespace and a new schema.

SQL> create tablespace corrupt_ts datafile '/u02/mydb/data/corrupt01.dbf' size 10m;  

Tablespace created.  
 
SQL> create user test identified by test default tablespace corrupt_ts;  

User created.  
 
SQL> grant connect, resource to test; 

Grant succeeded.  
 
Create and populate test table with dummy data as shown:

SQL> conn test/test 

Connected. 
 
SQL> create table t1 as select rownum rno, object_name from all_objects 
  2  where object_name like 'AQ%'; 
 
Table created. 
 
SQL> select count(*) from t1; 
 
  COUNT(*) 
---------- 
        42 

Insert a record into this table which we will be corrupting:

SQL> insert into t1 values (99, 'LET ME CORRUPT'); 
 
1 row created. 
 
SQL> commit; 
 
Commit complete. 

Let us take RMAN full database backup before we corrupt the block.

RMAN> backup format '/u02/rman/fulldb_%U' database plus archivelog; 
 
Starting backup at 01-FEB-08 
current log archived 
piece handle=/u02/rman/fulldb_0LJ7LIML_1_1 tag=TAG20080201T234641 comment=NONE 
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:28 
Finished backup at 02-FEB-08 
 
Starting backup at 02-FEB-08 
current log archived 
using channel ORA_DISK_1 
channel ORA_DISK_1: starting archive log backupset 
channel ORA_DISK_1: specifying archive log(s) in backup set 
input archive log thread=1 sequence=105 recid=105 stamp=645581560 
channel ORA_DISK_1: starting piece 1 at 02-FEB-08 
channel ORA_DISK_1: finished piece 1 at 02-FEB-08 
piece handle==/u02/rman/fulldb_0MJ7LINS_1_1 tag=TAG20080202T001242 comment=NONE 
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:04 
Finished backup at 02-FEB-08 
 
Take the tablespace offline so that we can make changes to the datafile. There are many freeware and shareware Hex Editors available in the market. I am using UltraEdit editor to make changes in our datafile.

SQL> alter tablespace corrupt_ts offline; 
 
Tablespace altered. 

In Windows

Open datafile “'c:\mydb\data\corrupt01.dbf” using UltraEdit (press “Ctrl+h” to toggle between Hex Mode). Search for our record entry “LET ME CORRUPT” in the file and changed “CORRUPT” to “NORRUPT” and save the file and close UltraEdit. I just changed “C” to “N”.

In Linux

For example I want to corrupt 13th block of mytbs01.dbf:

$ dd if=/dev/zero of=/u02/mydb/data/corrupt01.dbf bs=8k conv=notrunc seek=13 count=1;

1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.0186921 seconds, 438 kB/s

Bring back the tablespace to online mode.

SQL> alter tablespace corrupt_ts online; 
 
Tablespace altered. 

You may notice that Oracle doesn’t complain when it brings the datafile online because the file header wasn’t modified. Oracle will complain only when it tries to access the corrupt blocks. Let’s see what happens when we try to query table “T1”.

SQL> conn test/test 
Connected. 

SQL> select * from t1; 
 
       RNO OBJECT_NAME 
---------- ------------------------------ 
         1 AQ$_AGENT 
         2 AQ$_DEQUEUE_HISTORY 
          : 
          : 
 
        30 AQ$_JMS_NAMEARRAY 
ERROR: 
ORA-01578: ORACLE data block corrupted (file # 6, block # 13) 
ORA-01110: data file 6: '/u02/mydb/data/corrupt01.dbf ' 
 
30 rows selected. 

Whenever we get these kind of error messages, we need to check all the blocks

RMAN> validate check logical database;

Starting validate at 16-NOV-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=107 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00001 name=/u02/mydb/data/system01.dbf
input datafile file number=00002 name=/u02/mydb/data/sysaux01.dbf
input datafile file number=00003 name=/u02/mydb/data/undotbs01.dbf
input datafile file number=00004 name=/u02/mydb/data/users01.dbf
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              17594        38400           277491
 File Name: /u02/mydb/data/system01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              13854
 Index      0              4487
 Other      0              2465

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     0              20381        25600           277631
 File Name: /u02/mydb/data/sysaux01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              869
 Index      0              957
 Other      0              3393

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              541          22784           277631
 File Name: /u02/mydb/data/undotbs01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              0
 Index      0              0
 Other      0              22243

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    FAILED 0              1133         1280            271968
 File Name: /u02/mydb/data/corrupt01.dbf
 Block Type Blocks Failing Blocks Processed
 ---------- -------------- ----------------
 Data       0              10
 Index      0              0
 Other      1              137

validate found one or more corrupt blocks
See trace file /home/oracle/prima/diag/rdbms/prima/prima/trace/prima_ora_18316.trc for details
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
including current control file for validation
including current SPFILE in backup set
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2
Control File OK     0              612
Finished validate at 16-NOV-10

Query returns 30 records and then complains of block corruption in file 6. Block numbered 13 is being reported as corrupt. Let us see what all blocks are corrupt in “corruption01.dbf” datafile by running dbv utility.

Let’s verify corrupted block:

$ dbv file=/u02/mydb/data/corrupt01.dbf blocksize=8192
    
DBVERIFY: Release 10.2.0.1.0 - Production on Mon Feb 4 00:00:11 2008 
 
Copyright (c) 1982, 2005, Oracle.  All rights reserved. 
 
DBVERIFY - Verification starting : FILE = /u02/mydb/data/corrupt01.dbf 
Page 13 is marked corrupt 
Corrupt block relative dba: 0x0180000d (file 6, block 13) 
Bad check value found during dbv: 
Data in bad block: 
 type: 6 format: 2 rdba: 0x0180000d 
 last change scn: 0x0000.0039aa9f seq: 0x3 flg: 0x06 
 spare1: 0x0 spare2: 0x0 spare3: 0x0 
 consistency value in tail: 0xaa9f0603 
 check value in block header: 0x85b0 
 computed block checksum: 0x1b00 
 
DBVERIFY - Verification complete 
 
Total Pages Examined         : 1280 
Total Pages Processed (Data) : 4 
Total Pages Failing   (Data) : 0 
Total Pages Processed (Index): 0 
Total Pages Failing   (Index): 0 
Total Pages Processed (Other): 11 
Total Pages Processed (Seg)  : 0 
Total Pages Failing   (Seg)  : 0 
Total Pages Empty            : 1264 
Total Pages Marked Corrupt   : 1 
Total Pages Influx           : 0 
Highest block SCN            : 3779231 (0.3779231) 
C:\ora10g\BIN> 

This utility scans all the blocks in a given datafile and outputs the corrupt ones. In my case, I have one block marked as corrupt. Make a note of all the corrupt blocks as we need to recover them to previous state.

Start RMAN session and recover all the corrupt blocks. The beauty of RMAN is that it leaves the entire datafile online except the corrupted blocks and we need to recover only those corrupt blocks instead of entire datafile.

Method 1:
  
RMAN> blockrecover datafile 6 block 13; 
 
Starting blockrecover at 04-FEB-08 
using target database control file instead of recovery catalog 
allocated channel: ORA_DISK_1 
channel ORA_DISK_1: sid=44 devtype=DISK 
 
channel ORA_DISK_1: restoring block(s) 
channel ORA_DISK_1: specifying block(s) to restore from backup set 
restoring blocks of datafile 00006 
channel ORA_DISK_1: reading from backup piece C:\MYDB\RMAN\FULLDB_0KJ7LH72_1_1 
channel ORA_DISK_1: restored block(s) from backup piece 1 
piece handle=C:\MYDB\RMAN\FULLDB_0KJ7LH72_1_1 tag=TAG20080201T234641 
channel ORA_DISK_1: block restore complete, elapsed time: 00:00:36 
 
starting media recovery 
 
archive log thread 1 sequence 105 is already on disk as file C:\MYDB\FRA\MYDB\ARCHIVELOG\2008_02_02\O1_MF_1_10 
5_3T72T48S_.ARC 
archive log thread 1 sequence 106 is already on disk as file C:\MYDB\FRA\MYDB\ARCHIVELOG\2008_02_03\O1_MF_1_10 
6_3TD97K0Z_.ARC 
media recovery complete, elapsed time: 00:00:35 
Finished blockrecover at 04-FEB-08 
  
RMAN reports success of block recovery command. Let us query the table again by logging in to SQL*Plus:
  
SQL> select * from t1; 
 
       RNO OBJECT_NAME 
---------- ------------------------------ 
         1 AQ$_AGENT 
         2 AQ$_DEQUEUE_HISTORY 
          : 
          : 
 
        41 AQ$_JMS_ARRAY_ERROR_INFO 
        42 AQ$_JMS_ARRAY_ERRORS 
        99 LET ME CORRUPT 
 
43 rows selected. 

the query runs successfully and our original record is restored.

Method 2:

The next 11g New Feature here is: It will take the block out of the Flashback Logs, if present there.

RMAN> blockrecover corruption list;

Starting recover at 16-NOV-10
using channel ORA_DISK_1
searching flashback logs for block images
finished flashback log search, restored 1 blocks

starting media recovery
media recovery complete, elapsed time: 00:00:01

Finished recover at 16-NOV-10

During the whole process, i.e validate check logical database & blockrecover corruption list, the corrupt tablespace remains online and usable, except the table

No comments:

Post a Comment