Oracle Database Instance Recovery Tutorial

Is It Possible to Corrupt an Oracle Database?

No, You Cannot – Not If It Is Properly Administered

Why? Because the mechanism of redo and undo is the best yet developed on this planet.

In this tutorial, Oracle Certified Master John Watson will explain redo, undo, commit, instance recovery and demonstrate why your database can never lose one row of committed data.

Learn what actually happens, in memory and on disc, when DML is issued. John will demonstrate that instance recovery is automatic, and unstoppable.

This is critical and essential knowledge for all Oracle Administrators and DBA students.

Presenter: John Watson, Oracle Certified Master

This free tutorial is segmented into fourteen separate lessons:

  1. Introduction to Oracle Certified Master John Watson and SkillBuilders (1:44)
  2. Tutorial Agenda (2:40)
  3. The Database ACID Test (3:09)
  4. Physical and Logical Corruption (1:37)
  5. Demonstration – What Does Oracle do when DML is Executed? (9:47)
  6. Commit and the Log File Sync Wait Event (1:18)
  7. When Why and How Does Oracle Write to Disk (1:58)
  8. What About Checkpoints? (1:16)
  9. Discussion about COMMIT NOWAIT (3:46)
  10. Lecture – What Oracle Database does when DML is Executed (7:22)
  11. The Oracle Instant Crash Recovery Mechanism (6:24) (click on video below)
  12. Principles to Remember (0:52)
  13. Oracle – The Incorruptible Database (1:54)
  14. Next Steps (0:47)

Date: Jun 4, 2013


NOTE: Some corporate firewalls will not allow videos hosted by YouTube.

Transcript

The Oracle Instant Crash Recovery Mechanism

Oracle Instance Recovery Tutorial

 

Module 11 – The Oracle Instant Crash Recovery Mechanism

 

>> John:  What if we lose memory?

 

[pause]

 

We have a database crash. Bang. The entire instance is destroyed. Why would that happen? Shut down/abort, someone pulls the plug out the back of the machine, ORA-600 causes everything to die. Whatever happens, an instance crash, instance failure.

 

[pause]

 

There we are with our corrupted database. It’s got all sorts of inaccurate information in it right now.

 

[pause]

 

What do we do? What we know what we do – you issue a startup command. Issue the startup command, build the instance in memory. Startup no mounts. No problem. Build the memory structures. Start the background processes.

 

[pause]

 

But then we alter database mounts. There’s also database mount my time, the SMON process is going to look at the online log files and the datafiles. It will be immediately apparent that the instance had crashed because of course all it had is a lot of sync, the SCN’s all wrong. That will trigger the instance recovery mechanism.

 

[pause]

 

How does instance recovery work? The SMON (which I didn’t actually include on the slide) goes to the online log and starts replaying change vectors. It finds this first change vector that refers to a block of table data. It copies that table data that says John’s salary is 10. It copies that block into memory and applies the change.

 

[pause]

 

So we end up with this situation.

 

[pause]

 

First change vector, copy John 10 into memory, apply the change. Next change vector refers to undo block. Copy the empty block of undo memory, block into memory, apply the change.

 

Next change vector as we process this numerically is another update to the table. We actually don’t need to do that. We can copy into memory and have a look if we want, but that change vector was already on disc. This change vector, however, was not. It’s applied to the undo segment. Nothing there, so copy the block into memory. Apply the change.

 

This is what we call the instance recovery mechanism. We are rebuilding the SGA, the state it was in before the crash. But it gets better because the instance recovery process doesn’t just rebuild the SGA, it also corrects the situation on the database.

 

[pause]

 

As it rebuilds the SGA, it checkpoints the relevant blocks. So we do through recovery mechanism not only rebuild the instance, but we also synchronize the instance with the database. The recovery mechanism done by SMON means that we now have John’s committed change and his undo on disc together of course with Susan’s uncommitted change and the undo that refers to the red transaction.

 

That is the instance recovery mechanism. Perhaps it’s that question said earlier – it’s a bit badly named. We’re recovering the instance the state it was in before the crash and at the same time synchronizing the instance with the database by simulating a checkpoint. That is before the recovery that occurs after instance failure.

 

I’ve gone ahead of my slides at this point I’m afraid. The crash recovery mechanism following the crash the database has corrupted, the forward recovery then occurs between mount and open modes. We rebuild the buffer cache to the state it was in before the crash. Both committed and uncommitted work is replayed including the undo blocks.

 

[pause]

 

Then the final stage rollback. Having completed this work we can open the database. After opening the database, we can proceed to the next stage which is where we roll back the uncommitted transaction.

 

Previous stage, we have recovered the committed blue transaction and the uncommitted red transaction. We then open the database and proceed to this state. We’ll return Susan’s row to the original level, the original point it was at, the 40. At that point, the red transaction has been reversed and is gone forever, which is correct. It was never committed. The definition of uncommitted transaction, transaction isolation has got to be reversed and backed out.

 

That is your instance recovery mechanism. Forward recovery, totally automatic. Rollback, also totally automatic. You wouldn’t want to stop it, but it is absolutely impossible to stop it even if you want it to.

 

>> Dave:  How long does this process take?

 

>> John:  You’ll see it in the alert log. You’ll see in the alert log between the alter database mounts, the alter database open, you’ll see the timings for the pass through the redo log.

 

There are in fact two passes through the redo log. I’ve simplified things a bit. One pass is to identify which change vectors need to be applied and the second pass actually applies them.

 

You’ll see the exact timings. The delay can be significant. If you have a database with 100 gigabyte SGA or whatever, and there were an extraordinary number of dirty buffers at the time of the crash, forward recovery could take a significant number of minutes.

 

It is possible to tune it to a certain extent. But look at the alert log. Just go to any database you want, the one you run on your own PC. Do some work, shut down/abort, start up, and the alert log will show you how long it takes.

 

You can also predict how long it takes by looking at the view v$ instance recovery. This will give you an idea of if the database crashed right now, how much redo would we have to replay? How many IOs on datafiles would be necessary? And about how long would it take to recover?

 

[pause]

 

In this case, I imagine it’s going to be instantaneous.

 

[pause]

 

Not quite. It would take 9 seconds.

 

You can get an estimate. But I would advise you to test it to be sure.

 

>> Dave:  Is there a way to start up without going through the instance recovery process?

 

>> John:  Absolutely impossible. Absolutely impossible. That takes us on to this point.

 

 

Copyright SkillBuilders.com 2017

×
Free Online Registration Required

The tutorial session you want to view requires your registering with us.

It’s fast and easy, and totally FREE.

And best of all, once you are registered, you’ll also have access to all the other 100’s of FREE Video Tutorials we offer!

 

×
Podcast
×