10/16/2008

Readings:
* Same as last lecture and lecture 10.

=Recovery=

From last time - most DBs in practice are:
- NO FORCE - don't force data pages to disk at commit, just force the log to disk. (might have to redo changes)
- STEAL - possibly write uncommitted dirty data pages to disk. (might have to undo uncommitted transactions)

Timeline of events:

 T1 |---------wA-------------| commit
 T2      |--------------wB-------------| abort
 T3                |--------------wC------...crash

This is written to the log:

 START T1
 START T2
 wA T1
 START T3
 wB T2
 COMMIT T1
 wC T3
 ABORT T2

So when the system crashes before T3 finishes, we have the log fully written as above, no memory, and an indeterminate on-disk state for the data pages (with NO FORCE and STEAL, any of wA/wB/wC might be only partially written).

What to do:
- walk through log to see which transactions started
- see which transactions committed, aborted, and just stopped without commit/abort
  - winners: T1
  - losers: T2, T3
- redo all winners to make sure data pages are up-to-date
- undo all losers, in case some of their dirty data pages made it to disk

=ARIES - recovery protocol from System R=

The ARIES paper got a lot of flak for being hard to read, and for not inventing the idea of recovery. The reason it's so fundamental is that it spells out, in detail, all corner cases of implementing such a system.

Also discusses:
- cheap checkpointing - don't have to read the log from the very beginning of time, just from the last system checkpoint. Previous systems just wrote the entire system out to disk, but this was REALLY expensive.
- recovering from a crash during recovery - it is possible for the machine to crash again while it is still recovering - they handle this
- clearly spells out the difference between logical and physical REDO/UNDO

==3 Passes over Log==

Analysis - determines winners and losers by running forward through log from last checkpoint
REDO - redo everything by running forward through log again. Even redo losers. This makes undo much easier, since you know exactly the state of the system, and can reuse undo/abort logic.
UNDO - undo losers by running backward through log.

==Log Records==

A log record has:
- Record type (described in next section)
- LSN (log sequence number)
- TID (transaction ID)
- UNDO image (image of data before this step)
- REDO image (image of data after this step)
- prevLSN - LSN of the previous log record written by this Xaction

We'll discuss later why this is, but:
- UNDO image is logical
- REDO image is physical

Each page keeps a pageLSN - the LSN of the log record that last touched it.

==Four Types of Log Records==

BEGIN (SOT - start of transaction) - has no undo/redo
END (EOT - end of transaction) - has no undo/redo, stores abort/commit
UPDATE (UP) - has undo/redo
CHECKPOINT (CP) - makes recovery faster

==Normal Operation of ARIES==

When ARIES is just running in normal system state, it keeps in memory:
- Xaction (transaction) table - list of active Xactions, plus lastLSN (lastLSN = most recent log record to be put into log for each Xaction)
- Dirty page table - list of dirty pages, each page with a recoveryLSN (recoveryLSN = first LSN to have dirtied each page)

==Example==

 T1 |---wA---wB-----------wC----------------| commit
 T2                            |---wD----------------wA---| commit
 T3                   |-------------------------wB------------wE-...crash
                  ^                     ^
                  `---Checkpoint        `----Flush A,B,C

Log records:

 LSN  type  tid  prevLSN  data
 1    SOT   1    -        -
 2    UP    1    1        A
 3    UP    1    2        B
 4    CP    -    -        -
 5    SOT   3    -        -
 6    UP    1    3        C
 7    SOT   2    -        -
 8    UP    2    7        D
 9    EOT   1    6        -
 10   UP    3    5        B
 11   UP    2    8        A
 12   EOT   2    11       -
 13   UP    3    10       E

Xaction table at the end of 13 steps:

 TID  lastLSN
 1    1    (deleted due to commit)
 1    2    (deleted due to commit)
 1    3    (deleted due to commit)
 ...
 3    13

Xaction table as of CP:

 TID  lastLSN
 1    3

Dirty page table at end of 13 steps:

 pg  recoveryLSN
 A   2    (deleted by the flush of A,B,C)
 B   3    (deleted by the flush of A,B,C)
 C   6    (deleted by the flush of A,B,C)
 A   11
 B   10
 D   8
 E   13

Dirty page table as of CP:

 pg  recoveryLSN
 A   2
 B   3

After a crash, read the location of the last CP from some well-known place on disk. Read in the Xaction table and dirty page table as of that CP from disk.
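As a rough illustration of the normal-operation bookkeeping in the example above, the Python sketch below replays the log and maintains the Xaction table and the dirty page table. The names (LogRecord, replay, flush_after, flushed_pages) are made up for this sketch. Records 12 and 13 are inferred from the timeline (T2's commit and T3's wE), and the flush of A,B,C is assumed to happen right after LSN 8, which is one placement consistent with the tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int
    type: str                   # "SOT", "UP", "EOT", or "CP"
    tid: Optional[int]          # None for CP
    prev_lsn: Optional[int]     # previous record of the same Xaction
    page: Optional[str] = None  # page touched (UP only)


# Records 1-11 come from the example; 12 and 13 are inferred (assumption).
LOG = [
    LogRecord(1, "SOT", 1, None),
    LogRecord(2, "UP", 1, 1, "A"),
    LogRecord(3, "UP", 1, 2, "B"),
    LogRecord(4, "CP", None, None),
    LogRecord(5, "SOT", 3, None),
    LogRecord(6, "UP", 1, 3, "C"),
    LogRecord(7, "SOT", 2, None),
    LogRecord(8, "UP", 2, 7, "D"),
    LogRecord(9, "EOT", 1, 6),
    LogRecord(10, "UP", 3, 5, "B"),
    LogRecord(11, "UP", 2, 8, "A"),
    LogRecord(12, "EOT", 2, 11),
    LogRecord(13, "UP", 3, 10, "E"),
]


def replay(log, flush_after, flushed_pages):
    """Maintain both tables as the log is written.

    Returns (xact_table, dpt, tables_as_of_checkpoint).
    """
    xact_table = {}  # TID -> lastLSN
    dpt = {}         # page -> recoveryLSN
    at_cp = None
    for rec in log:
        if rec.type in ("SOT", "UP"):
            xact_table[rec.tid] = rec.lsn
        if rec.type == "UP":
            # Only the FIRST LSN to dirty a page is kept as its recoveryLSN.
            dpt.setdefault(rec.page, rec.lsn)
        elif rec.type == "EOT":
            del xact_table[rec.tid]
        elif rec.type == "CP":
            at_cp = (dict(xact_table), dict(dpt))
        if rec.lsn == flush_after:
            # A flushed page is clean again, so it leaves the DPT.
            for page in flushed_pages:
                dpt.pop(page, None)
    return xact_table, dpt, at_cp


xact_table, dpt, at_cp = replay(LOG, flush_after=8, flushed_pages=("A", "B", "C"))
print(at_cp)       # ({1: 3}, {'A': 2, 'B': 3})
print(xact_table)  # {3: 13}
print(dpt)         # {'D': 8, 'B': 10, 'A': 11, 'E': 13}
```

The flush itself never appears in the log, which is why it is passed in separately here rather than stored as a log record.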
First Pass...Analysis

Read the log forward from the checkpoint, and update the Xaction table and dirty page table (DPT) to understand the state of affairs. Don't undo or redo anything.

Xaction table after we rebuild it from step 5 (after CP):

 TID  lastLSN
 3    13

DPT after we rebuild it from step 5 (after CP):

 pg  recoveryLSN
 A   2
 B   3
 C   6
 D   8
 E   13

Now we're rebuilt! (The log does not record the flush of A,B,C, so the rebuilt DPT is conservative: those pages keep their older recoveryLSNs, and the pageLSN check in REDO sorts it out.)

Second Pass...REDO

Look at the minimum of the recoveryLSNs in the DPT. That is the first log record that we should try redoing from, since it's the earliest modification to any page that might not have made it to the appropriate data page.

Now go to the earliest recoveryLSN, and read forward through the log, reapplying all UP log records. Don't update all pages - check each page's pageLSN. If the LSN of the UP record is newer than the pageLSN, then the information in it should be applied to the page. If the LSN of the UP record is less than or equal to the pageLSN, then that change had already reached this page on disk before the crash!

Also, don't update pages not in the dirty page table. If they aren't there, then they were flushed to disk already, even before the commit/abort happened (remember, this system allows STEAL!).

Third Pass...UNDO!

Run backward through the log. For every Xaction still in the Xaction table (the losers), look at its lastLSN. Undo the record with the lastLSN closest to the end of the log. Then look at that log entry's prevLSN, which gives the next log record of this transaction that we have to undo. Keep following the prevLSNs, always undoing the latest remaining record across all losers, so that records are undone in the reverse of the order in which they appear in the log.

==Logical/Physical REDO/UNDO==

If you crash during recovery, then you might have to REDO an Xaction twice. If that were a logical REDO, you would say something like (insert Tuple X) twice, and would have no way of knowing that you inserted X twice. So all REDOs are physical - they physically modify parts of a page - and this type of REDO is idempotent (repeatable without causing problems).
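To make the idempotence point concrete, here is a minimal Python sketch of the DPT rebuild and of a physical REDO guarded by the pageLSN check, run twice over the example log. The function names, the after-image strings and the on-disk state at the crash (only A, B, C flushed, at LSNs 2, 3, 6) are assumptions for illustration. Record 13 (T3's write of E) is inferred from the timeline, and the Xaction table is left out for brevity.

```python
# (lsn, tid, page, after_image); the after-images are made-up placeholders.
UPDATES = [
    (2, 1, "A", "A@2"),
    (3, 1, "B", "B@3"),
    (6, 1, "C", "C@6"),
    (8, 2, "D", "D@8"),
    (10, 3, "B", "B@10"),
    (11, 2, "A", "A@11"),
    (13, 3, "E", "E@13"),
]


def analysis(cp_dpt, updates, cp_lsn):
    """Rebuild the DPT by scanning forward from the checkpoint."""
    dpt = dict(cp_dpt)
    for lsn, _tid, page, _image in updates:
        if lsn > cp_lsn:
            dpt.setdefault(page, lsn)  # keep the earliest recoveryLSN
    return dpt


def redo(disk, dpt, updates):
    """Physical REDO. disk maps page -> (contents, pageLSN).

    Returns the LSNs that were actually reapplied.
    """
    start = min(dpt.values())
    applied = []
    for lsn, _tid, page, image in updates:
        if lsn < start or page not in dpt:
            continue
        _contents, page_lsn = disk.get(page, (None, 0))
        if lsn > page_lsn:  # change is not on the page yet
            disk[page] = (image, lsn)  # overwrite and stamp the pageLSN
            applied.append(lsn)
    return applied


dpt = analysis({"A": 2, "B": 3}, UPDATES, cp_lsn=4)
disk = {"A": ("A@2", 2), "B": ("B@3", 3), "C": ("C@6", 6)}
first = redo(disk, dpt, UPDATES)
second = redo(disk, dpt, UPDATES)  # as if we crashed and ran REDO again
print(dpt)     # {'A': 2, 'B': 3, 'C': 6, 'D': 8, 'E': 13}
print(first)   # [8, 10, 11, 13]
print(second)  # []

# Contrast: a logical "insert Tuple X" replayed twice leaves a duplicate.
tuples = []
for _ in range(2):
    tuples.append("X")
print(tuples)  # ['X', 'X']
```

The second REDO run applies nothing because every page already carries a pageLSN at least as new as each record that touches it.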

If we had physical UNDO, each UNDO image would capture the physical state of the DB as it was when the operations ran, in INCREASING order of the log. But when we apply UNDOs, we apply the records in DECREASING order of the log, and only for losers, so the page may no longer physically look the way it did when the image was taken. So you must use logical UNDOs: a logical UNDO (e.g. delete Tuple X) can undo what it needs to even when the physical setting is different from the one in which the change was applied.

==Handling Crashes==

If you crash during UNDO, since you did a logical UNDO, you won't know how far through the UNDO pass you got. So write a compensation log record (CLR) to the log for each UNDO you perform. For example, if you UNDO step 13, write LSN 14 as a CLR with a prevLSN of 10 (the next record of that Xaction to undo). After another crash, REDO replays the CLR to rebuild state, and UNDO, on reaching the CLR, follows its prevLSN straight to 10, so it skips LSN 13 instead of undoing it a second time.
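
The CLR mechanism can be sketched in Python as below, using only T3's records from the example. This is a simplified illustration, not the paper's algorithm: the names (Rec, undo, crash_after) are invented, the actual logical undo of the page is omitted, and record 13 (T3's write of E, prevLSN 10) is inferred from the timeline and the CLR description above. As in these notes, the CLR's prevLSN field holds the next record to undo.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rec:
    lsn: int
    type: str                  # "SOT", "UP", or "CLR"
    tid: int
    prev_lsn: Optional[int]    # for a CLR: the next record still to undo
    page: Optional[str] = None


def undo(log, last_lsn, crash_after=None):
    """Undo one loser starting at its lastLSN, appending one CLR per undone UP.

    crash_after simulates a crash after that many undos.
    Returns the LSNs undone in this run.
    """
    by_lsn = {r.lsn: r for r in log}
    undone = []
    cur = last_lsn
    while cur is not None:
        rec = by_lsn[cur]
        if rec.type == "UP":
            # (the logical undo of rec would be applied to the page here)
            clr = Rec(log[-1].lsn + 1, "CLR", rec.tid, rec.prev_lsn, rec.page)
            log.append(clr)
            by_lsn[clr.lsn] = clr
            undone.append(rec.lsn)
            if crash_after is not None and len(undone) == crash_after:
                return undone  # simulated crash
        # UP: move to the previous record. CLR: jump past what is already
        # undone. SOT: prev_lsn is None, which ends the loop.
        cur = rec.prev_lsn
    return undone


log = [
    Rec(5, "SOT", 3, None),
    Rec(10, "UP", 3, 5, "B"),
    Rec(13, "UP", 3, 10, "E"),
]
print(undo(log, last_lsn=13, crash_after=1))  # [13]
# After restart, T3's lastLSN is the CLR at LSN 14.
print(undo(log, last_lsn=log[-1].lsn))        # [10]
print([(r.lsn, r.type, r.prev_lsn) for r in log[3:]])
# [(14, 'CLR', 10), (15, 'CLR', 5)]
```

The second run never touches LSN 13 again: it starts at the CLR, follows its prevLSN to 10, and continues from there.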