Back

Oracle Database Instance Recovery Tutorial

Is It Possible to Corrupt an Oracle Database?

No, You Cannot – Not If It Is Properly Administered

Why? Because the mechanism of redo and undo is the best yet developed on this planet.

In this tutorial, Oracle Certified Master John Watson will explain redo, undo, commit, instance recovery and demonstrate why your database can never lose one row of committed data.

Learn what actually happens, in memory and on disc, when DML is issued. John will demonstrate that instance recovery is automatic, and unstoppable.

This is critical and essential knowledge for all Oracle Administrators and DBA students.

Presenter: John Watson, Oracle Certified Master

This free tutorial is segmented into fourteen separate lessons:

  1. Introduction to Oracle Certified Master John Watson and SkillBuilders (1:44) (click on video below) click to read the Transcript(click to read the Transcript)
  2. Tutorial Agenda (2:40) click to read the Transcript
  3. The Database ACID Test (3:09) click to read the Transcript
  4. Physical and Logical Corruption (1:37) click to read the Transcript
  5. Demonstration – What Does Oracle do when DML is Executed? (9:47) click to read the Transcript
  6. Commit and the Log File Sync Wait Event (1:18) click to read the Transcript
  7. When Why and How Does Oracle Write to Disk (1:58) click to read the Transcript
  8. What About Checkpoints? (1:16) click to read the Transcript
  9. Discussion about COMMIT NOWAIT (3:46) click to read the Transcript
  10. Lecture – What Oracle Database does when DML is Executed (7:22) click to read the Transcript
  11. The Oracle Instant Crash Recovery Mechanism (6:24) click to read the Transcript
  12. Principles to Remember (0:52) click to read the Transcript
  13. Oracle – The Incorruptible Database (1:54) click to read the Transcript
  14. Next Steps (0:47) click to read the Transcript

Date: Jun 4, 2013


NOTE: Some corporate firewalls will not allow videos hosted by YouTube.

Transcript

Introduction to Oracle Certified Master John Watson and SkillBuilders

Oracle Instance Recovery Tutorial

 

Module 1 – Introduction to OCM John Watson

 

>> Dave:  Welcome to today’s webinar. Today we ask the question, is it possible to corrupt an Oracle database? 

 

We will learn why the answer is no, not if it’s properly administered. 

 

[pause]

 

We will learn about the redo and undo mechanisms and a fantastic process called instance recovery. We believe this knowledge is essential for every Oracle database administrator. 

 

[pause]

 

My name is Dave Anderson. I’ll moderate today’s session. Please use your chat window to forward your questions. 

 

[pause]

 

Our instructor today is Oracle Certified Master John Watson, SkillBuilders Director of Oracle Database Services. 

 

John is the author of the Oracle Press Exam guide and holds eight Oracle certification including Exadata Implementation Specialist. John is in the 12c beta program and is now writing the Oracle Press 12c Exam Guide. 

 

[pause]

 

Besides authoring and teaching, John provides database administration support to SkillBuilders customers both our project works such as RAC and Data Guard implementations and helps cover issues for customers enrolled in our 24/7 support program. John is a very, very busy person. 

 

[pause]

 

Now I’d like to turn it over to John for the agenda and today’s lesson. Welcome, John.

 

 

Copyright SkillBuilders.com 2017

×
Transcript

Tutorial Agenda

Oracle Instance Recovery Tutorial

 

Module 2 – Tutorial Agenda

 

>> John:  Thank you, Dave. Good morning or good afternoon – depending on your time zone – to all of you. 

 

What I want to cover in our short lecture today is, in response to popular demand, we’re going back to basics once again. I want to cover some of the database fundamentals that I find is sometimes skipped over when people first study Oracle. Of course, if you study Oracle with us we don’t skip over anything. 

 

It’s possible for some of you this material – because this is a bit of a back to basics – is largely revision, but I hope that nonetheless you will get something out of it. I have found from years consulting in the field and a bit of time in the classroom that many DBAs, perfectly competent DBAs, fully competent production DBAs have perhaps never fully investigated what actually happens in memory and on disc when you execute DML. 

 

[pause]

 

So what I intend to do is run through firstly the requirements of a relational database management system as regards to data corruption. Basically, that’s what we call the ACID test and it’s quite simply the relational database is not allowed to corrupt any data under any circumstances. We’ll explain what’s going on and of course demonstrate as well how Oracle has actually implemented this. 

 

Understanding this mechanism is crucial for any DBA. But in particular I want to present it in a manner that perhaps will help you explain it to others. Why? So that you can motivate the investments in Oracle technology that your organization needs. 

 

Oracle is expensive. I’m in love with the technology, no question about that, but we can’t deny that it is not cheap and I find that I often need to explain to users why it’s essential, what they’re getting for their money. There is, above all, this one reason. In a properly configured database, it is absolutely impossible to lose or corrupt data provided the database is properly configured and managed. 

 

[pause]

 

What I want to run through today – I’ll prove – what is going on in the background to implement this and prove that the recovery mechanism in the event of any corruption is simply unavoidable.

 

Copyright SkillBuilders.com 2017

×
Transcript

The Database ACID Test

Oracle Instance Recovery Tutorial

 

Module 3 – The Database ACID Test

 

>> John:  First, the rules to which a relational database must conform. There’s one overreaching rule, the database must not permit any damage to your data. No matter what happens it has to pass what they call the ACID test. 

 

Very briefly, I don’t want to spend more than a couple of minutes on this, because I’m sure all of you would’ve been in college much more recently than an old man like me and can possibly describe these things better. But just to make sure we are all on the same page of the manual as it were, I want to quickly to run through the ACID test. 

 

ACID. A stands for atomicity. 

 

[pause]

 

One definition – a transaction must complete or fail in its entirety. It doesn’t matter whether the transaction is updating, one column and one row or updating a thousand rows and a hundred tables, the entire transaction should complete or fail. 

 

[pause]

 

What is the transaction? The business [01:07 inaudible] will have defined the transaction structure and the database must guarantee that it cannot be broken no matter what happens. What could happen? Power cut, hardware failure, anything that can cause an instance crash. The database must guarantee that incomplete transactions are never seen and complete transactions are always seen. 

 

Second point, C for consistency. The database must guarantee that the data within it is always valid, valid in terms of business rules. At no point can there be anything visible in the database that breaks the integrity rules. Those rules and the integrity checks define declaratively by the DBA is a way they’re implementing the business rules within your organization. 

 

[pause]

 

Then I for isolation. Any number of definitions of isolation, I’ll just pick up one or two of them here. The easiest – a transaction must be hidden until it’s complete. At which point, the transaction must be published instantaneously to all of the sessions. But until it’s complete, the changes must be isolated to the one session that’s seeing them. 

 

[pause]

 

A follow on from this, a query must return the result set consistent with the committed state of the database when the query started. There are variations on this, but that’s Oracle’s definition. Recommitted and freeze the database as of the time the query started. 

 

[pause]

 

Finally, D for durability. The database is not allowed to lose even one row of committed data. If it does, if your database can lose a row of committed data then you’re using a wrong database, perhaps even the wrong DBA. But that’s the rules of relational database. It must never lose data, it must never damage data. 

 

[pause]

 

How do we implement these rules? Constraints of course are relational integrity rules. But the overall consideration, if it doesn’t conform to the ACID test then it’s simply isn’t a database.

 

Copyright SkillBuilders.com 2017

×
Transcript

Physical and Logical Corruption

Oracle Instance Recovery Tutorial

 

Module 4 – Physical and Logical Corruption

 

>> Dave:  What about corruption caused by physical damage?

 

>> John:  Corruption caused by physical damage, that’s out of the DBA’s and Oracle’s control. That physical damage, we can guarantee zero loss of data provided the database is properly configured in terms of archivelog mode backup, Data Guard even, but the recovery from physical damage is not going to be automatic. So we guarantee it, but it isn’t automatic.

 

[pause]

 

>> Dave:  Guaranteed as long as we’ve done proper administration with backups?

 

>> John:  And if we haven’t, you fire your DBA. That’s usually the management’s immediate reaction to an Oracle database losing data.

 

[pause]

 

>> Dave:  Very good. One more question. What about logical corruption or bad block formats?

 

>> John:  Logical corruption again is out of Oracle’s control. But from release 11g we can in fact repair logical corruption automatically but only if you are in the Data Guard’s environment with active Data Guard configured. That’s really a bit out of scope for the database. It’s damaged beyond the database’s control but we can in fact repair that automatically through the active Data Guard mechanism. If a logically corrupted block is detected in the primary database, it will be restored and recovered using a block from the standby database and the other way around as well.

 

Copyright SkillBuilders.com 2017

×
Transcript

Demonstration – What Does Oracle do when DML is Executed?

Oracle Instance Recovery Tutorial

 

Module 5 – Demo – What Oracle Database does when DML is Executed

 

>> John:  I want to run through what actually happens when we execute DML, what happens on disc, what happens in memory?

 

There’s some assumed knowledge here by the way. I assume you are all database administrators. This is some assumed knowledge. What I think I’ll do at this stage is demonstrate what happens first and then return to the slide.

 

[pause]

 

This is actually standard 11g 11203 database by the way. Nothing special about it at all. I’ll connect to my database as user Scott and I’m going to create a little tablespace. Create tablespace. Now I have a small tablespace. 

 

[pause]

 

Datafile.

 

[pause]

 

I’ll call the datafile

 

[pause]

 

Small1.dbf*size 128k. It can’t get much smaller than that.

 

You’ll see why I’m creating a tablespace specifically for this demonstration in a second. Then within that tablespace, I’ll create a very simple table.

 

[pause]

 

A very simple table t1 with one column c1 of varchar2 type.

 

[pause]

 

Now, insert a row into the table. Insert into t1 values and I’ll insert myself.

 

[pause]

 

I’ll even commit myself. So I am now committed. But what’s actually happened on disc? Let’s take a look.

 

[pause]

 

There’s my datafile small1.dbf. I’ve inserted myself and I’ve committed myself – is there anything on disc at this stage? I’ll use the strings utility to extract just the text strings from a binary file and I’ll string small1.dbf and grep out john. 

 

[pause]

 

It’s not there. That’s the first example of data corruption. I’ve got a committed transaction and it does not exist in the database. It simply isn’t there.

 

[pause]

 

Right now my database is missing a committed transaction. That’s straightforward corruption. But it gets worse. I’ll insert another row. I shall insert a row. In this case, I’ll put in Susan. Susan can join my database. Nice.

 

[pause]

 

Uncommitted transaction. I’ve got a committed transaction – apparently it’s not in the database – and uncommitted transaction as well. Is that in the database? It certainly shouldn’t be. You see if Susan exists in the datafile and sure enough she doesn’t. But what if Oracle decides to write some data to disc? I’ll go through the algorithm with you shortly.

 

Let’s just assume – I’m going to force Oracle to write to disc at this point. Alter system, checkpoint. Force a writer buffers to disc. What’s happened now? Susan is there.

 

[pause]

 

She exists in the datafile. This is bad a corruption as you can get. You think, “What’s happening?” My committed transaction that inserted john was not on disc. My uncommitted transaction that inserted Susan is in the database. This is a pretty serious state of affairs. Is it a problem? SQL*Plus disconnects from another session.    

 

[pause]

 

If I select * from t1, everything is fine. I’m getting proper transaction isolation. My transaction has inserted John. We don’t know how durable it is, but apparently just pretty durable. It’s there. It’s visible even though it is not in the datafiles and my uncommitted transaction is insert Susan is totally isolated so nobody else can see her. Of course, from the session that did the insert that session is of course allowed to see the uncommitted transaction.

 

[pause]

 

Take a look at the next stage. What has happened within the redo log? I’ll go through the redo log contents and then we’ll see how this is being implemented.

 

Select * from v$log

 

I want to see which is my current log file group. My current log file group is group 1. Group 1 consists of that file there, redo01.log. Let us see what there is in redo01.log. 

 

[pause]

 

I’ll grep out John. Does he exist?

 

[pause]

 

Yes, he does. So the change vector that inserted John does appear to be in there.

 

What about the change vector that inserted Susan?

 

[pause]

 

She’s in there too.

 

Maybe you can make things a bit more complicated. Build up a bit more of an idea of what’s happening on disc and then run through the theory. 

 

[pause]

 

Let’s say in this session, I shall update t1 set c1 =. I’ll put Dave in my database instead of me. C1 = john. I do that update.

 

[pause]

 

Uncommitted.

 

[pause]

 

What do we now see in the redo logs, things get a bit more complicated. 

 

[pause]

 

John has two entries now in the redo string. One will be the entry when he was inserted into the table. The other will come from the undo segment when I updated the row, which we’ll come to in just a second. Of course, Dave is in there too.

 

[pause]

 

What’s on disc? Right now, if we look at John as the person who exists in the datafile.

 

[pause]

 

That’s a checkpoint to force a disc write. Alter system checkpoints.

 

[pause]

 

What’s in the datafile now?

 

[pause]

 

John’s gone. He’s no longer in the datafile, but Dave is. Now I have my uncommitted update that’s apparently wiped John out from the database. 

 

[pause]

 

What’s going to happen at this point if we have a problem and we grep the database? 

 

[pause]

 

What I’ve demonstrated, we have the log writer, we have the database writer. The log writer writes on a very, very aggressive algorithm. Change vectors are written to the online log in very nearly real time. When you say “commit,” it is real time. As you’ve seen, pretty much as soon as I did any change, the change vector immediately went to the online log – immediately.

 

So I’ll just commit that transaction, stop any more confusion. Do another update.

 

[pause]

 

I’ll change “Dave” to “Lynn.”

 

[pause]

 

That’s uncommitted.

 

[pause]

 

We get this ready in advance.

 

[pause]

 

Grep Lynn. No mention of course. Do my change. Set c1 = lynn where c1 = dave.

 

[pause]

 

In virtually real time, the change vector of the update is there. Alter the change vector to Dave.

 

[pause]

 

He was there as well. Right. What’s going on?

 

Log writer, as demonstrated, writes in a very aggressive algorithm. Your change vectors go to the log file in nearly real time. When you say commit, believe me, it is real time. The session hangs when you say commit until the vectors have gone to the online log.

 

[pause]

 

Note the online log change vectors are both committed and uncommitted changes.

 

[pause]

 

Database writer by contrast, also as demonstrated, writes in a very lazy algorithm. I’ve changed a row to Lynn. But those Lynn exist in the database, not at all.

 

[pause]

 

Database writer writes it on a lazy algorithm as little as possible, as rarely as possible. Why? Because disc are always back to performance. Blocks are written only if they conform to two criteria. They’ve been changed – there’s no point in writing data that hasn’t been changed – and they’re idle, i.e. no one is using them. 

 

This choice of which blocks to write is not related to commit. End result? Your database is always corrupted as demonstrated. The datafiles on disc may be storing uncommitted changes. They’ve been missing committed changes.

 

[pause]

 

Does this matter? No. It doesn’t, unless the instance crashes.     

 

Copyright SkillBuilders.com 2017

×
Transcript

Commit and the Log File Sync Wait Event

Oracle Instance Recovery Tutorial

 

Module 6 – Commit and the Log File Sync Wait Event

 

>> Dave:  Isn’t the commit the log file sync wait event and what can we do about that? Apparently one of our students just had a problem with that.

 

>> John:  Yes. The commit is exactly the log file sync wait event. When we issue the commit – it’s actually on the slide right here. You’re a bit ahead of where you are, about three minutes, 30 seconds. 

 

When does log writer write? On the commit and other things, when you issue a commit your session hangs until the log writer has written all changed vectors in the buffer to disc. That is reported as the log file sync wait event as well as the ultimate bottlenecks in the Oracle system. If your database is hanging on that event, what can you do? Very, very little except perhaps there may be a couple of parameters you can fiddle around with. But in general, if that really is what’s bottlenecking your system, your only real option is to go to RAC because in the RAC environment multiple log writers multiple log buffers multiple threads of redo.

 

 

Copyright SkillBuilders.com 2017

×
Transcript

When Why and How Does Oracle Write to Disk

Oracle Instance Recovery Tutorial

 

Module 7 – When Why and How Does Oracle Write to Disk

 

>> John:  When the log writer write, as discussed, on commit. That’s to enforce transaction durability and that’s a straightforwardly Oracle’s definition of a committed transaction. The change vectors needed to recreate the transaction or to roll it back are in the online redo log. 

 

As we’ve seen, we get both change vectors for undo and redo do go to the online log. But apart from that, the log writer will also write until the log buffer is 1/3 full and that’s for performance reasons. That will force the log writer to string change vectors to disc in very nearly real time anyway. As we’ve seen, it’s virtually is real time but that would be another trigger to force it to write to disc. 

 

Most interestingly, it will also write the entire log buffer committed on uncommitted changes just before database writer writes anything. Why is that? It’s so that we can guarantee that if database writer decides to write an uncommitted transaction, log writer will always have the change vectors needed to rebuild the undo segments that can roll back that transaction. I will demonstrate and show in the next slide. 

 

[pause]

 

So when does database writer writes? When we need to free up space in the buffer cache every 3 seconds. But remember for these reasons, it won’t write much. You might have 200,000 changed buffers in memory and database writer might write a few hundred of them. It writes as little as possible as rarely as possible. Why? There’s no point in writing data that’s really busy. 

 

[pause]

 

And a checkpoint is the way I forcing it of course. All the system checkpoint and that writes all buffers to disc. There are some other reasons of course, but that will do for now.

 

Copyright SkillBuilders.com 2017

×
Transcript

What About Checkpoints?

Oracle Instance Recovery Tutorial

 

Module 8 – What About Checkpoints

 

>> Dave:  What happens when automatic checkpoint occurs? Automatic checkpoint happens every 3 seconds I think?

 

>> John:  No, no, no, no, no, no, no. The full checkpoint – I should’ve been a lot tighter in my terminology here. Full checkpoints never occur unless you ask for them with alter system checkpoints or clean shutdown. When you do a shutdown immediate or shutdown normal, it will get a full checkpoint. My terminology was loose here. 

 

[pause]

 

Partial checkpoints occur automatically on demand. For instance, you’ll get datafile checkpoints whenever you put a tablespace in the backup mode or whenever RMAN begins to back up a data file. 

 

There were a number of reasons for partial checkpoints but the other reasons of 3 seconds or space needed in the buffer cache, that’s only going to be writing a very few buffers. Some people refers – this is the incremental checkpoint. It pushes the checkpoint position forward slowly.

 

Copyright SkillBuilders.com 2017

×
Transcript

Discussion about COMMIT NOWAIT

Oracle Instance Recovery Tutorial

 

Module 9 – Commit Nowait

 

>> Dave:  What is this business about commit nowait?

 

>> John:  Alright. That’s one of the few bits of tuning you can do to tune your redo generation. There are three parameters in fact, two of which are now deprecated. We have commit logging, commit wait and commit write. 

 

The default alter system – I’ll do it at the session level actually. Alter session set commit_logging.  I’m working from memory here, but the default would be immediate,wait, if I remember correctly. 

 

[pause]

 

Nope. Let me try commit_wait or commit_write. That’s the one. Right. 

 

[pause]

 

This is the default situation. Commit_write = ‘immediate,wait’. What this means is that when you issue a commit, log writer immediately flushes the buffer to disc and your session waits. That’s what I was describing, the log file sync wait event. That’s the default situation. 

 

[pause]

 

If you are feeling very brave, you can go to the other extreme. We can set that to ‘batch,nowait’. 

 

[pause]

 

This may improve your performance of redo generation dramatically. What does that mean “batch”? Rather than forcing a write to disc every commit, if you’re committing thousands of transactions a second, that’s an awful lot of triggers on log writer, we batch them up and write them when we get around to it. 

 

Nowait means your session doesn’t hang. You issue a commit and just carry on and hope the log writer will get it to disc, get the change vectors to disc appropriately. If you go to this, you may find that your log file sync disappear, you hit the database performance flies but you’re taking a risk because it is now possible for you to commit transactions and then if something goes wrong before the log file write is completed, you’re going to have corruptions – well, it’s a corrupted database – you would have lost your work which is breaking all the rules. 

 

The other two parameters commit_write has in fact now being replaced with commit_wait and commit_logging, and this divide them up into two. We can commit_write having batch nowait, we set commit_wait to wait or nowait and commit_logging to batch or immediate. Dangerous parameters. Now the default is the guarantee of no corruption, no lost data.

 

[pause]

 

>> Dave:  I used that with great success at a customer side who had a repeatable load process that was doing too many commits. So if it failed, if there was a problem, we simply could rerun the job but that was a good scenario I think for that.

 

>> John:  That will be one of the few places where we would definitely recommend the situation. You got a batch process that’s generating vast amounts of redo and the redo generation is slowing things down, but if anything were to go wrong you can repeat the work. That would be a good case for using it. You’ll be a brave man who would set it up in standard say OLTP environment and it would have to be agreed and the implications discussed with the customer first.

 

Copyright SkillBuilders.com 2017

×
Transcript

Lecture – What Oracle Database does when DML is Executed

Oracle Instance Recovery Tutorial

 

Module 10 – What Really Happens when DML is Executed

 

>> John:  What I’ve been describing, the difference in the instance and the database. The instance has real time data but the datafiles may be either ahead of or behind of reality. Remember. Why is that? Because the database writer will not write data to disc and this has got a good reason to. Commit is not considered a good reason. It is a good reason for the log writer for the change vectors, but not for the database update.

 

The datafiles may well be missing incomplete transactions, be out of date. They may be storing incomplete transactions being ahead of reality. Remember you’ve got the case of rollback and that means the rollback will have to update things on disc as well. 

 

[pause]

 

The online log has the change vectors – committed or not – including change vectors applied to undo blocks. 

 

[pause]

 

I’ve been attempting to draw animations. I’ll try to give a graphical description of how this actually works. In this series of slides, it’s not really a proper animation here. What I’m trying to draw at this point – up here I’ve got a system global area. This is my instance. There are two major structures – database buffer cache, redo log buffer. I’m not bothered with the shared pool at the moment.

 

[pause]

 

Down on disc, I’ve got the database. The database consists of the online redo log files and my data files. The database, the online log. I put in just two background processes – the database writer (of which you might have several) and the log writer. These are the two background processes that connect the instance and memory up here with the database down here on disc.

 

Current situation, everything is pretty static. I’ve got a datafile with tables and datafile with undo segments. I’m interested in just two blocks of the datafile. 

 

As a block containing a record for Susan – it says salary is 40 and a block containing a record for John. It says his salary is 10.

 

[pause]

 

In my undo tablespace datafile, I’ve got a couple of blocks that are available for use. I put them down here as being completely empty.

 

[pause]

 

What’s in the buffer cache? Nothing at all.

 

[pause]

 

That’s my starting point for talking about what happens on disc and in memory. I demonstrated it, now we cover it in theory.

 

First step, Mr. Blue issues a DML statement “Update emp set sal=20 where ename=john.” That’s the statement his user process generates and turns to his server process for execution. Remember, grant server splits use a process such as SQL*Plus generates, SQL server process executes it.

 

[pause]

 

That’s the statement we send to the server process. How do we then execute it?

 

[pause]

 

I’m not bothered with parting. The first stage of execution, we write to the log buffer the change vectors that we are going to apply to the buffer cache. We are going to apply two change vectors.

 

[pause]

 

We are going to update John’s row to 20. We’re going to write the old version of his salary to the undo segment.

 

How do we do that? We put the change vectors in the log buffer and we copy the blocks into memory. We copy the block that says John’s salary is 10. It’s got John’s right up there. We find an available block, I’ll undo. Let me copy that up.

 

[pause]

 

What have we achieved so far? We’ve written the change vectors to the log buffer and we’ve got the data in memory. Next stage, we do the updates. 

 

[pause]

 

We write the change to John’s row. He’s now on salary at 20. And we write the old version of his salary to the undo segment.

 

[pause]

 

What’s on disc? Nothing at this point. Uncommitted transaction. Of course, if Mr. Blue queries the emp table, he’ll see John’s salary is 20. If anybody else queries the emp table, Oracle constructs a read-consistent image using the redo saying that John’s salary is 10. Transaction isolation.

 

[pause]

 

Make things a bit more complicated now and we’ll start a second transaction.

 

[pause]

 

Mr. Red logs on. Mr. Red says, “Update emp set sal=50 where ename=susan.” 

 

[pause]

 

Same process. We find Susan’s block on disc. Susan’s salary is 40. Copy it up. We find a block of undo on disc. We copy it up.

 

[pause]

 

We write the change vectors to the log buffer. We’re going to update the table to 50. We’re going to update the undo segment to say Susan’s salary=40. Then we make the changes in memory.

 

Right now, I have two uncommitted transactions. What have I got on disc? Nothing. Remember, log writer does in fact write change vectors in very nearly real time. Almost inevitably, these four change vectors will by now be in the online log almost inevitably. But let’s take it as slowly as we can.

 

[pause]

 

What will happen next? What if Mr. Blue commits his transaction? 

 

[pause]

 

He issues the commit. At that point, his session is going to hang. This happens.

 

[pause]

 

We write the commit record to the log buffer. We fill the log buffer sequentially. It’s actually a circular buffer but I couldn’t work out how to draw a circular buffer so I’ve drawn just filling from the bottom up.

 

We put the commit record in the log buffer. My session is hanging while log writer wakes up and wham the entire log buffer to disc. At this point in the online log, we have the change vector applied to John’s row. The change vector applied to undo segment that lets us roll that back should we need to. We’ve got the change vector applied to Susan’s row and the change vector applied to the undo segment that would let us roll that back to. And we’ve got the fact that the blue transaction is committed. So the online log has both committed and uncommitted transactions as I demonstrated.

 

What’s in the database? The datafiles, no change.

 

[pause]

 

Does that matter? No. Because in memory, we’ve got the real time information. We can construct read-consistent images necessary and we know what’s committed and what isn’t. But maybe we then have a disaster.   

 

[pause]

 

Before we have the disaster, take it a step further. Sorry, I was at this stage. I’ve just committed.

 

[pause]

 

Next step, database writer for whatever reason, remember every three seconds you need to free some space in the buffer cache, whatever. Database writer wakes up and decides to copy a block to disc. Which block does it copy to disc?

 

[pause]

 

It has decided to copy Susan’s block to disc. In the previous slide, that was the situation. This slide, boom. Database writer has written the uncommitted change to disc. This is my ultimately corrupted database. It is not storing John’s committed change. It is storing Susan’s uncommitted change.

 

[pause]

 

No problem because in memory everything is fine.

 

 

Copyright SkillBuilders.com 2017

×
Transcript

The Oracle Instant Crash Recovery Mechanism

Oracle Instance Recovery Tutorial

 

Module 11 – The Oracle Instant Crash Recovery Mechanism

 

>> John:  What if we lose memory? 

 

[pause]

 

We have a database crash. Bang. The entire instance is destroyed. Why would that happen? Shut down/abort, someone pulls the plug out the back of the machine, ORA-600 causes everything to die. Whatever happens, an instance crash, instance failure.

 

[pause]

 

There we are with our corrupted database. It’s got all sorts of inaccurate information in it right now.

 

[pause]

 

What do we do? What we know what we do – you issue a startup command. Issue the startup command, build the instance in memory. Startup no mounts. No problem. Build the memory structures. Start the background processes.

 

[pause]

 

But then we alter database mounts. There’s also database mount my time, the SMON process is going to look at the online log files and the datafiles. It will be immediately apparent that the instance had crashed because of course all it had is a lot of sync, the SCN’s all wrong. That will trigger the instance recovery mechanism.

 

[pause]

 

How does instance recovery work? The SMON (which I didn’t actually include on the slide) goes to the online log and starts replaying change vectors. It finds this first change vector that refers to a block of table data. It copies that table data that says John’s salary is 10. It copies that block into memory and applies the change.

 

[pause]

 

So we end up with this situation.

 

[pause]

 

First change vector, copy John 10 into memory, apply the change. Next change vector refers to undo block. Copy the empty block of undo memory, block into memory, apply the change.

 

Next change vector as we process this numerically is another update to the table. We actually don’t need to do that. We can copy into memory and have a look if we want, but that change vector was already on disc. This change vector, however, was not. It’s applied to the undo segment. Nothing there, so copy the block into memory. Apply the change.

 

This is what we call the instance recovery mechanism. We are rebuilding the SGA, the state it was in before the crash. But it gets better because the instance recovery process doesn’t just rebuild the SGA, it also corrects the situation on the database.

 

[pause]

 

As it rebuilds the SGA, it checkpoints the relevant blocks. So we do through recovery mechanism not only rebuild the instance, but we also synchronize the instance with the database. The recovery mechanism done by SMON means that we now have John’s committed change and his undo on disc together of course with Susan’s uncommitted change and the undo that refers to the red transaction.

 

That is the instance recovery mechanism. Perhaps it’s that question said earlier – it’s a bit badly named. We’re recovering the instance the state it was in before the crash and at the same time synchronizing the instance with the database by simulating a checkpoint. That is before the recovery that occurs after instance failure.

 

I’ve gone ahead of my slides at this point I’m afraid. The crash recovery mechanism following the crash the database has corrupted, the forward recovery then occurs between mount and open modes. We rebuild the buffer cache to the state it was in before the crash. Both committed and uncommitted work is replayed including the undo blocks.

 

[pause]

 

Then the final stage rollback. Having completed this work we can open the database. After opening the database, we can proceed to the next stage which is where we roll back the uncommitted transaction. 

 

Previous stage, we have recovered the committed blue transaction and the uncommitted red transaction. We then open the database and proceed to this state. We’ll return Susan’s row to the original level, the original point it was at, the 40. At that point, the red transaction has been reversed and is gone forever, which is correct. It was never committed. The definition of uncommitted transaction, transaction isolation has got to be reversed and backed out. 

 

That is your instance recovery mechanism. Forward recovery, totally automatic. Rollback, also totally automatic. You wouldn’t want to stop it, but it is absolutely impossible to stop it even if you want it to.

 

>> Dave:  How long does this process take?

 

>> John:  You’ll see it in the alert log. You’ll see in the alert log between the alter database mounts, the alter database open, you’ll see the timings for the pass through the redo log. 

 

There are in fact two passes through the redo log. I’ve simplified things a bit. One pass is to identify which change vectors need to be applied and the second pass actually applies them. 

 

You’ll see the exact timings. The delay can be significant. If you have a database with 100 gigabyte SGA or whatever, and there were an extraordinary number of dirty buffers at the time of the crash, forward recovery could take a significant number of minutes.

 

It is possible to tune it to a certain extent. But look at the alert log. Just go to any database you want, the one you run on your own PC. Do some work, shut down/abort, start up, and the alert log will show you how long it takes.

 

You can also predict how long it takes by looking at the view v$ instance recovery. This will give you an idea of if the database crashed right now, how much redo would we have to replay? How many IOs on datafiles would be necessary? And about how long would it take to recover?

 

[pause]

 

In this case, I imagine it’s going to be instantaneous. 

 

[pause]

 

Not quite. It would take 9 seconds.

 

You can get an estimate. But I would advise you to test it to be sure.

 

>> Dave:  Is there a way to start up without going through the instance recovery process?

 

>> John:  Absolutely impossible. Absolutely impossible. That takes us on to this point.    

 

 

Copyright SkillBuilders.com 2017

×
Transcript

Principles to Remember

Oracle Instance Recovery Tutorial

 

Module 12 – Principles to Remember

 

>> John:  Undo generates redo, theyíre not opposites. The redo protect all block changes. 

 

[pause]

 

Block changes to table blocks, index blocks, undo blocks, all of them. 

 

Log writer writing algorithm, remember, itís optimized for safety. Unless you play games with that parameter, a transaction is not committed until the change vectors needed to rebuild it and the change vectors to reverse it in the undo and the redo log. Database writer algorithm optimize the performance – totally unrelated to commit processing. Thatís how we guarantee the ACID test.

 

Copyright SkillBuilders.com 2017

×
Transcript

Oracle – The Incorruptible Database

Oracle Instance Recovery Tutorial

 

Module 13 – Oracle – The Incorruptible Database

 

>> John:  The result, incorruptible database. What I’ve tried to explain is how the undo-redo mechanism is unbreakable and you can have absolute competence in it. Go through the examples such as I’ve done, sketch it out in your notebook, test what’s actually visible in the various physical files the way you saw me doing it, and you’ll demonstrate to yourself that no matter what combination of circumstances you will never end up with a committed transaction that is lost on uncommitted transaction that survives provided your redo log is available. 

 

[pause]

 

What if it isn’t available? That takes us back to that question earlier. If the instance recovery  fails, the reason it fails is that your redo log is damaged. In that circumstance, you can’t open the database. What does that mean? You got to protect your redo log and that is through the multiplexing mechanism that you must enable to follow your company standards. 

 

These standards are vital. If your company standards say the redo log must consist of two copies of each log file group on two separate devices and both those devices are damaged, you’ve lost your redo log and you lost data. But you as DBA can then point to the standards document and say, “Hey, that’s what was agreed I would do.” 

 

[pause]

 

You must have standards. Does your standard say one copy, two copy, four copies? I’m not going to say how many you should have. But you need documented standards for how the redo log should be protected and that will protect not only the database but it will also protect you.

 

[pause]

 

 The end result of that is that provided you meet that service level agreement, you and your users can have absolute confidence in Oracle.

 

Copyright SkillBuilders.com 2017

×
Transcript

Next Steps

Oracle Instance Recovery Tutorial

 

Module 14 – Next Steps

 

>> John:  What do you need to do next? Assess what data loss is acceptable, assess what downtime is acceptable, and discuss it with the users. How important is this data? Does downtime matter? Does data loss matter? 

 

Get it all documented. Get all documented as to how you’re protecting the redo log in particular. And if absolutely zero data loss and zero downtime are the requirements then we move on to Data Guard or RAC, which I think we’ve dealt with in other seminars.

 

Copyright SkillBuilders.com 2017

×
Free Online Registration Required

The tutorial session you want to view requires your registering with us.

It’s fast and easy, and totally FREE.

And best of all, once you are registered, you’ll also have access to all the other 100’s of FREE Video Tutorials we offer!

 

×
Podcast
×