Failures and Recovery
A
computer system, like any other mechanical or electrical device, is subject to
failure. There is a variety of causes of such failure, including disk crash,
power failure, software error, a fire in the machine room, or even sabotage. In
each of these cases, information may be lost. Therefore, the database system
must take actions in advance to ensure the atomicity and durability
properties of transactions. An integral part of a database system is a recovery
scheme that is responsible for restoring the database to the consistent
state that existed prior to the occurrence of the failure.
Failure Classification
There are various types of failure that may occur in a
system, each of which needs to be dealt with in a different manner. The
simplest type of failure to deal with is one that does not result in the loss
of information in the system. The failures that are more difficult to deal with
are those that result in loss of information.
· Transaction failure (local failure): This failure is caused by bad input or by bugs in the application program.
· System failure: This failure is caused by bugs in the database management system code, operating system faults, or hardware failure. These failures occur less frequently, and they result in the loss of main memory.
· Media failure: This failure is caused by software errors, hardware errors, physical errors, head crashes, etc. These failures result in the loss of parts of secondary storage.
· Site failure: This failure occurs due to failure of the local CPU and results in the system crashing.
· Network failure: This failure may occur in the communication links.
· Sabotage failure: This failure occurs as a result of intentional corruption or destruction of data, hardware, or software facilities.
Database Recovery
Database recovery is the process of restoring the database to a correct state in the
event of a failure. The DBMS includes recovery facilities, such as a backup
mechanism, logging facilities, and checkpoint facilities, to enable in-progress
updates to the database to be made permanent.
System recovery
The system must be prepared to recover not only from
purely local failures, such as the occurrence of an overflow condition within an
individual transaction, but also from "global" failures such as a power outage.
A local failure, by definition, affects only the transaction in which the
failure has actually occurred. A global failure, by contrast, affects all of
the transactions in progress at the time of the failure, and hence has
significant system-wide implications. Such failures fall into two broad
categories:
· System failures (e.g., power outage), which affect all transactions currently in progress but do not physically damage the database. A system failure is sometimes called a soft crash.
· Media failures (e.g., head crash on the disk), which do cause damage to the database, or to some portion of it, and affect at least those transactions currently using that portion. A media failure is sometimes called a hard crash.
The key point regarding system failure is that the contents
of main memory are lost (in particular, the database buffers are lost). The
precise state of any transaction that was in progress at the time of the
failure is therefore no longer known; such a transaction can never be
successfully completed, and so must be undone (i.e., rolled back) when the system
restarts.
Furthermore, it might also be necessary to redo, at restart time, certain transactions
that did successfully complete prior to the crash but did not manage to get
their updates transferred from the database buffers to the physical database.
Media recovery
A media failure is a failure such as a disk head crash or
a disk controller failure, in which some portion of the database has been
physically destroyed. Recovery from such a failure basically involves reloading
(or restoring) the database from a backup copy (or dump), and then using the
log (both its active and archive portions, in general) to redo all transactions that
completed since that backup copy was taken. There is no need to undo transactions
that were still in progress at the time of the failure, since by definition all
updates of such transactions have been "undone" (actually lost) anyway.
The need to be able to perform media recovery implies the
need for a dump/restore (or unload/reload) utility. The dump portion of that
utility is used to make backup copies of the database on demand. Such copies
can be kept on tape or other archival storage; it is not necessary that they be
on direct access media. After a media failure, the restore portion of the
utility is used to recreate the database from a specified backup copy.
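As a rough illustration (not part of the original text), the following Python sketch models media recovery over a dictionary-based database and a simple tuple-based log; the record layout and the function name are assumptions made only for this example.

def media_recover(backup_copy, log):
    """backup_copy: dict mapping item -> value, taken at dump time.
    log: list of records such as ('start', 'T0'),
    ('update', 'T0', 'A', 1000, 950), ('commit', 'T0')."""
    db = dict(backup_copy)                         # reload the dump
    committed = {r[1] for r in log if r[0] == 'commit'}
    for r in log:                                  # active + archive portions
        if r[0] == 'update' and r[1] in committed:
            _, txn, item, old, new = r
            db[item] = new                         # redo committed writes only
    return db                                      # in-progress work is simply lost

Note that, exactly as described above, the sketch only redoes committed transactions; updates of transactions still in progress at the time of the failure are not reapplied.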
Recovery and Atomicity
Consider a simplified
banking system and a transaction Ti that transfers $50 from account A
to account B, with the initial values of A and B being $1000 and $2000,
respectively. Suppose that a system crash occurs during the execution of
Ti, after output(BA) has taken place, but before output(BB)
was executed, where BA and BB denote the buffer blocks on
which A and B reside. Since the memory contents were lost, we do not know the
fate of the transaction; thus, we could invoke one of two possible recovery
procedures.
· Reexecute Ti. This procedure will result in the value of A becoming $900, rather than $950. Thus, the system enters an inconsistent state.
· Do not reexecute Ti. The current system state has values of $950 and $2000 for A and B, respectively. Thus, the system enters an inconsistent state.
In either case, the database is left in an inconsistent
state, and thus this simple recovery scheme does not work. The reason for this
difficulty is that we have modified the database without having assurance that
the transaction will indeed commit. Our goal is to perform either all or no
database modifications made by Ti. However, if Ti
performed multiple database modifications, several output operations may be
required, and a failure may occur after some of these modifications have been
made, but before all of them are made.
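To make the dilemma concrete, here is a small Python sketch (purely illustrative, using the values from the example above) that simulates the crash between output(BA) and output(BB) and shows that both naive options leave the accounts inconsistent.

disk = {'A': 1000, 'B': 2000}   # values on disk before Ti runs

# Ti transfers $50 from A to B, but only output(BA) reaches disk.
disk['A'] = 950                 # output(BA) completed
# -- crash here: output(BB) never happens --

# Option 1: blindly reexecute Ti -> A is debited twice.
reexecuted = dict(disk)
reexecuted['A'] -= 50           # A becomes 900
reexecuted['B'] += 50           # B becomes 2050

# Option 2: do nothing -> A = 950, B = 2000, and $50 has vanished.
print(disk, reexecuted)         # both states are inconsistent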
Log-Based Recovery
The most widely used structure for recording database
modifications is the log. The log is a sequence of log records, and maintains a
record of all the update activities in the database. There are several types of
log records. An update log record describes a single database write, and has
the following fields:
· Transaction identifier is the unique identifier of the transaction that performed the write operation.
· Data-item identifier is the unique identifier of the data item written. Typically, it is the location on disk of the data item.
· Old value is the value of the data item prior to the write.
· New value is the value that the data item will have after the write.
Other special log records exist to record significant
events during transaction processing, such as the start of a transaction and
the commit or abort of a transaction. We denote the various types of log
records as follows:
· <Ti start>. Transaction Ti has started.
· <Ti, Xj, V1, V2>. Transaction Ti has performed a write on data item Xj. Xj had value V1 before the write, and will have value V2 after the write.
· <Ti commit>. Transaction Ti has committed.
· <Ti abort>. Transaction Ti has aborted.
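As a sketch only, the four record types above could be represented in memory as follows; the class and field names are illustrative assumptions, not a prescribed format.

from dataclasses import dataclass

@dataclass
class StartRecord:          # <Ti start>
    txn: str

@dataclass
class UpdateRecord:         # <Ti, Xj, V1, V2>
    txn: str                # transaction identifier
    item: str               # data-item identifier
    old_value: object       # V1, used by undo
    new_value: object       # V2, used by redo

@dataclass
class CommitRecord:         # <Ti commit>
    txn: str

@dataclass
class AbortRecord:          # <Ti abort>
    txn: str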
Whenever a transaction performs a write, it is essential
that the log record for that write be created before the database is modified.
Once a log record exists, we can output the modification to the database if
that is desirable. Also, we have the ability to undo a modification that has
already been output to the database. We undo it by using the old-value field in
log records.
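A minimal sketch of this write-ahead discipline, assuming an in-memory list stands in for the log and a dictionary stands in for a database buffer block:

log = []                    # stands in for the log on stable storage
buffer = {'A': 1000}        # stands in for a database buffer block

def logged_write(txn, item, new_value):
    old_value = buffer[item]
    log.append((txn, item, old_value, new_value))   # log record first ...
    buffer[item] = new_value                        # ... then the modification

logged_write('T0', 'A', 950)

Because the log record (with its old-value field) exists before the data item changes, the modification can later be undone even if it has already reached the database.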
For log records to be useful for recovery from system and
disk failures, the log must reside in stable storage. It is assumed that every
log record is written to the end of the log on stable storage as soon as it is
created. The log contains a complete record of all database activity.
Log                  | Database
<T0 start>           |
<T0, A, 1000, 950>   |
<T0, B, 2000, 2050>  |
                     | A = 950
                     | B = 2050
<T0 commit>          |
<T1 start>           |
<T1, C, 700, 600>    |
                     | C = 600
<T1 commit>          |

Fig. State of the system log and database corresponding to T0 and T1.
The recovery scheme uses two recovery procedures:
· undo(Ti) restores the value of all data items updated by transaction Ti to the old values.
· redo(Ti) sets the value of all data items updated by transaction Ti to the new values.
The set of data items updated by Ti and their respective old and new values can be found in the log.
The undo and redo operations must be idempotent to
guarantee correct behavior even if a failure occurs during the recovery
process.
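The following Python sketch is an illustration only, assuming a log that contains (txn, item, old_value, new_value) update records like the ones used in the earlier sketches. Because each pass simply assigns values taken from the log, repeating either operation leaves the database unchanged, which is the idempotence property just mentioned.

def undo(txn, log, db):
    """Restore the old value of every data item updated by txn."""
    for t, item, old, new in reversed(log):
        if t == txn:
            db[item] = old

def redo(txn, log, db):
    """Reapply the new value of every data item updated by txn."""
    for t, item, old, new in log:
        if t == txn:
            db[item] = new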
After a failure has occurred, the recovery scheme consults
the log to determine which transactions need to be redone, and which need to be
be undone. This classification of transactions is accomplished as follows:
· Transaction Ti needs to be undone if the log contains the record <Ti start>, but does not contain the record <Ti commit>.
· Transaction Ti needs to be redone if the log contains both the record <Ti start> and the record <Ti commit>.
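A hedged sketch of this classification and the resulting recovery pass follows; it is not the book's code, and it models records as ('start', T), ('commit', T) and ('update', T, item, old_value, new_value) tuples.

def recover(log, db):
    started = [r[1] for r in log if r[0] == 'start']
    committed = {r[1] for r in log if r[0] == 'commit'}
    updates = [r[1:] for r in log if r[0] == 'update']

    # Undo: <Ti start> present, <Ti commit> absent.
    for txn in reversed(started):
        if txn not in committed:
            for t, item, old, new in reversed(updates):
                if t == txn:
                    db[item] = old
    # Redo: both <Ti start> and <Ti commit> present.
    for txn in started:
        if txn in committed:
            for t, item, old, new in updates:
                if t == txn:
                    db[item] = new

# Example corresponding to case (b) of the figure shown further below:
log_b = [('start', 'T0'), ('update', 'T0', 'A', 1000, 950),
         ('update', 'T0', 'B', 2000, 2050), ('commit', 'T0'),
         ('start', 'T1'), ('update', 'T1', 'C', 700, 600)]
db = {'A': 950, 'B': 2000, 'C': 600}   # one possible disk state at the crash
recover(log_b, db)                      # -> {'A': 950, 'B': 2050, 'C': 700}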
As an illustration, let us return to our banking example,
with transactions T0 and T1 executed one after the other
in the order T0 followed by T1. Suppose that the
system crashes before the completion of the transactions. We shall consider
three cases. The state of the log for each of these cases is shown in the figure below.
First, let us assume that the crash occurs just after the
log record for the step
write(B)
of transaction T0 has been written to stable
storage (as shown in Fig. (a)). When the system comes back up, it finds the record <T0
start> in the log, but no corresponding <T0 commit> record.
Thus, transaction T0 must be undone, so undo(T0) is performed.
(a)                  | (b)                  | (c)
<T0 start>           | <T0 start>           | <T0 start>
<T0, A, 1000, 950>   | <T0, A, 1000, 950>   | <T0, A, 1000, 950>
<T0, B, 2000, 2050>  | <T0, B, 2000, 2050>  | <T0, B, 2000, 2050>
                     | <T0 commit>          | <T0 commit>
                     | <T1 start>           | <T1 start>
                     | <T1, C, 700, 600>    | <T1, C, 700, 600>
                     |                      | <T1 commit>

Figure: The same log, shown at three different times.
As a result, the values in accounts A and B (on
the disk) are restored to $1000 and $2000, respectively.
Next, let us assume that the crash comes just after
the log record for the step
write(C)
of transaction T1 has been written to stable
storage (as shown in Fig. (b)). When the system comes back up, two recovery
actions need to be taken. The operation undo(T1) must be performed,
since the record <T1 start> appears in the log, but there is
no record <T1 commit>. The operation redo(T0) must
be performed, since the log contains both the record <T0
start> and the record <T0 commit>. At the end of the entire
recovery procedure, the values of accounts A, B, and C are $950, $2050, and
$700, respectively. Note that the undo(T1) operation is performed
before the redo(T0). In this example, the same outcome would result
if the order were reversed.
Finally, let us assume that the crash occurs just after
the log record
<T1 commit>
has been written to stable storage (as shown in Fig. (c)). When the system
comes back up, both T0 and T1 need to be redone, since
the records <T0 start> and <T0 commit> appear
in the log, as do the records <T1 start> and <T1
commit>. After the recovery procedures redo(T0) and redo(T1)
are performed, the values in accounts A, B, and C are $950, $2050, and $600,
respectively.
Checkpoints
When a system failure occurs, we must consult the log to
determine those transactions that need to be redone and those that need to be
undone. In principle, we need to search the entire log to determine this
information. There are two major difficulties with this approach:
1. The search process is time-consuming.
2. Most of the transactions that, according to
our algorithm, need to be redone have already written their updates into the
database. Although redoing them will cause no harm, it will nevertheless cause
recovery to take longer.
Checkpoints are introduced to reduce this overhead. The system periodically performs checkpoints, which
require the following sequence of actions to take place:
1. Output onto stable storage all log records
currently residing in main memory.
2. Output to the disk all modified buffer
blocks.
3. Output onto stable storage a log record
<checkpoint>.
Transactions are not allowed to perform any update actions,
such as writing to a buffer block or writing a log record, while a checkpoint
is in progress.
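As an illustrative sketch only (the data structures and parameter names are assumptions), the three checkpoint actions could look like this:

def checkpoint(log_buffer, stable_log, dirty_buffers, disk):
    # No transaction may perform updates while this sequence runs.
    stable_log.extend(log_buffer)            # 1. force buffered log records out
    log_buffer.clear()
    for block, contents in dirty_buffers.items():
        disk[block] = contents               # 2. flush all modified buffer blocks
    dirty_buffers.clear()
    stable_log.append(('checkpoint',))       # 3. write the <checkpoint> record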
The presence of a <checkpoint> record in the log
allows the system to streamline its recovery procedure. Consider a transaction
Ti that committed prior to the <checkpoint>
record. Any database modifications made by Ti must have been written
to the database either prior to the checkpoint or as part of the checkpoint
itself. Thus, at recovery time, there is no need to perform a redo operation
on Ti.
This observation allows us to refine our previous recovery
schemes. (We continue to assume that transactions are run serially). After a
failure has occurred, the recovery scheme examines the log to determine the
most recent transaction Ti that started executing before the most
recent checkpoint took place. It can find such a transaction by searching the
log backward, from the end of the log, until it finds the first
<checkpoint> record (since we are searching backward, the record found is
the final <checkpoint> record in the log); then it continues the search
backward until it finds the next <Ti start> record. This
record identifies a transaction Ti.
Once transaction Ti has been identified, the
redo and undo operations need to be applied only to transaction Ti
and to all transactions Tj that started executing after transaction
Ti. Let us denote these transactions by the set T. The remainder
(earlier part) of the log can be ignored, and can be erased whenever desired.
The exact recovery operations to be performed depend on whether the
immediate-modification technique or the deferred-modification technique is
being used. The recovery operations that are required if the
immediate-modification technique is employed are as follows:
· For all transactions Tk in T that have no <Tk commit> record in the log, execute undo(Tk).
· For all transactions Tk in T such that the record <Tk commit> appears in the log, execute redo(Tk).
Obviously, the undo operation does not need to be applied
when the deferred-modification technique is being employed.
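Putting these pieces together, here is a hedged sketch of checkpoint-based recovery under the immediate-modification technique, again using the illustrative tuple-based log from the earlier sketches (it assumes serial execution, as the text does).

def checkpoint_recover(log, db):
    # Search backward for the final <checkpoint> record, then for the
    # <start> record of the last transaction that began before it; the
    # earlier part of the log is ignored.
    cut = 0
    for i in range(len(log) - 1, -1, -1):
        if log[i][0] == 'checkpoint':
            for j in range(i, -1, -1):
                if log[j][0] == 'start':
                    cut = j
                    break
            break
    tail = log[cut:]                                    # records of the set T
    started = [r[1] for r in tail if r[0] == 'start']
    committed = {r[1] for r in tail if r[0] == 'commit'}
    for txn in reversed(started):                       # undo incomplete ones
        if txn not in committed:
            for r in reversed(tail):
                if r[0] == 'update' and r[1] == txn:
                    db[r[2]] = r[3]                     # restore the old value
    for txn in started:                                 # redo committed ones
        if txn in committed:
            for r in tail:
                if r[0] == 'update' and r[1] == txn:
                    db[r[2]] = r[4]                     # reapply the new value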
As an illustration, consider the set of transactions {T0,
T1 … T100} executed in the order of the subscripts.
Suppose that the most recent checkpoint took place during the execution of
transaction T67. Thus, only transactions T67, T68,
…, T100 need to be considered during the recovery scheme. Each of
them needs to be redone if it has committed; otherwise, it needs to be undone.
Shadow Paging
An alternative to log-based crash-recovery techniques is
shadow paging. Under certain circumstances, shadow paging may require fewer
disk accesses than do the log-based methods. There are, however, disadvantages
to the shadow-paging approach. For example, it is hard to extend shadow paging
to allow multiple transactions to execute concurrently.
As before, the database is partitioned into some number of
fixed-length blocks, which are referred to as pages. The term page is borrowed
from operating systems, since we are using a paging scheme for memory
management. Let us assume that there are n pages, numbered 1 through n. (In
practice, n may be in the hundreds of thousands.) These pages do not need
to be stored in any particular order on disk. However, there must be a way to
find the ith page of the database for any given i; a page table with n entries
is used for this purpose. The first entry contains a pointer to the first page
of the database, the second entry points to the second page, and so on.
The key idea behind the shadow-paging technique is to
maintain two page tables during the life of a transaction: the current page
table and the shadow page table. When the transaction starts, both page tables
are identical. The shadow page table is never changed over the duration of the
transaction. The current page table may be changed when a transaction performs
a write operation. All input and output operations use the current page table
to locate database pages on disk.
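A minimal sketch of the two-page-table idea follows, with dictionaries standing in for the page tables and the disk; all names and values are illustrative assumptions.

disk_pages = {0: 'page 0 data', 1: 'page 1 data'}   # disk page number -> contents
next_free = 2                                       # next unused disk page

shadow_table = {1: 0, 2: 1}          # logical page i -> disk page number
current_table = dict(shadow_table)   # identical copy when the transaction starts

def write_page(i, new_contents):
    """Write logical page i through the current page table (copy-on-write)."""
    global next_free
    new_page = next_free             # place the new version on an unused page
    next_free += 1
    disk_pages[new_page] = new_contents
    current_table[i] = new_page      # shadow_table is never touched

write_page(1, 'updated data')
# On commit, the current table atomically becomes the new shadow table;
# on abort, the current table is discarded and the shadow table remains valid.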