Failures and Recovery
A
computer system, like any other mechanical or electrical device, is subject to
failure. There is a variety of causes of such failure, including disk crash,
power failure, software error, a fire in the machine room, or even sabotage. In
each of these cases, information may be lost. Therefore, the database system
must take actions in advance to ensure the atomicity and durability
properties of transactions. An integral part of a database system is a recovery
scheme that is responsible for restoring the database to the consistent
state that existed prior to the occurrence of the failure.
Failure Classification
There are various types of failure that may occur in a
system, each of which needs to be dealt with in a different manner. The
simplest type of failure to deal with is one that does not result in the loss
of information in the system. The failures that are more difficult to deal with
are those that result in loss of information.
· Transaction failure (local failure): This failure is caused by bad input or by bugs in the application program.
· System failure: This failure is caused by bugs in the database management system code, operating system faults, or hardware failure. These failures occur less frequently, and they result in the loss of main memory.
· Media failure: This failure is caused by software errors, hardware errors, physical errors, head crashes, etc. These failures result in the loss of parts of secondary storage.
· Site failure: This failure occurs due to failure of the local CPU and results in the system crashing.
· Network failure: This failure may occur in the communication links.
· Sabotage failure: This failure occurs as a result of intentional corruption or destruction of data, hardware, or software facilities.
Database Recovery
Database recovery is the process of restoring the database to a correct state in the
event of a failure. The DBMS includes recovery facilities, such as a backup
mechanism, logging facilities, and checkpoint facilities, to enable in-progress
updates to the database to be made permanent.
System recovery
The system must be prepared to recover not only from
purely local failures, such as the occurrence of an overflow condition within an
individual transaction, but also from "global" failures such as a power outage.
A local failure, by definition, affects only the transaction in which the
failure has actually occurred. A global failure, by contrast, affects all of
the transactions in progress at the time of the failure, and hence has
significant system-wide implications. Such failures fall into two broad
categories:
· System failures (e.g., power outage), which affect all transactions currently in progress but do not physically damage the database. A system failure is sometimes called a soft crash.
· Media failures (e.g., head crash on the disk), which do cause damage to the database, or to some portion of it, and affect at least those transactions currently using that portion. A media failure is sometimes called a hard crash.
The key point regarding system failure is that the contents
of main memory are lost (in particular, the database buffers are lost). The
precise state of any transaction that was in progress at the time of the
failure is therefore no longer known; such a transaction can never be
successfully completed, and so must be undone (i.e., rolled back) when the system
restarts.
Furthermore, it might also be necessary to redo, at restart time, certain transactions
that did successfully complete prior to the crash but did not manage to get
their updates transferred from the database buffers to the physical database.
Media recovery
A media failure is a failure such as a disk head crash or
a disk controller failure, in which some portion of the database has been
physically destroyed. Recovery from such a failure basically involves reloading
(or restoring) the database from a backup copy (or dump), and then using the
log (both its active and archive portions, in general) to redo all transactions that
completed since that backup copy was taken. There is no need to undo transactions
that were still in progress at the time of the failure, since by definition all
updates of such transactions have been "undone" (actually lost) anyway.
The need to be able to perform media recovery implies the
need for a dump/restore (or unload/reload) utility. The dump portion of that
utility is used to make backup copies of the database on demand. Such copies
can be kept on tape or other archival storage; it is not necessary that they be
on direct access media. After a media failure, the restore portion of the
utility is used to recreate the database from a specified backup copy.
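As a rough illustration (not part of the original text), the following Python sketch models media recovery over a dictionary-based database and a simple tuple-based log; the record layout and the function name are assumptions made only for this example.

def media_recover(backup_copy, log):
    """backup_copy: dict mapping item -> value, taken at dump time.
    log: list of records such as ('start', 'T0'),
    ('update', 'T0', 'A', 1000, 950), ('commit', 'T0')."""
    db = dict(backup_copy)                         # reload the dump
    committed = {r[1] for r in log if r[0] == 'commit'}
    for r in log:                                  # active + archive portions
        if r[0] == 'update' and r[1] in committed:
            _, txn, item, old, new = r
            db[item] = new                         # redo committed writes only
    return db                                      # in-progress work is simply lost

Note that, exactly as described above, the sketch only redoes committed transactions; updates of transactions still in progress at the time of the failure are not reapplied.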
Recovery and Atomicity
Consider a simplified
banking system and a transaction Ti that transfers $50 from account A
to account B, with the initial values of A and B being $1000 and $2000,
respectively. Suppose that a system crash occurs during the execution of
Ti, after output(BA) has taken place, but before output(BB)
was executed, where BA and BB denote the buffer blocks on
which A and B reside. Since the memory contents were lost, we do not know the
fate of the transaction; thus, we could invoke one of two possible recovery
procedures.
· Reexecute Ti. This procedure will result in the value of A becoming $900, rather than $950. Thus, the system enters an inconsistent state.
· Do not reexecute Ti. The current system state has values of $950 and $2000 for A and B, respectively. Thus, the system enters an inconsistent state.
In either case, the database is left in an inconsistent
state, and thus this simple recovery scheme does not work. The reason for this
difficulty is that we have modified the database without having assurance that
the transaction will indeed commit. Our goal is to perform either all or no
database modifications made by Ti. However, if Ti
performed multiple database modifications, several output operations may be
required, and a failure may occur after some of these modifications have been
made, but before all of them are made.
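To make the dilemma concrete, here is a small Python sketch (purely illustrative, using the values from the example above) that simulates the crash between output(BA) and output(BB) and shows that both naive options leave the accounts inconsistent.

disk = {'A': 1000, 'B': 2000}   # values on disk before Ti runs

# Ti transfers $50 from A to B, but only output(BA) reaches disk.
disk['A'] = 950                 # output(BA) completed
# -- crash here: output(BB) never happens --

# Option 1: blindly reexecute Ti -> A is debited twice.
reexecuted = dict(disk)
reexecuted['A'] -= 50           # A becomes 900
reexecuted['B'] += 50           # B becomes 2050

# Option 2: do nothing -> A = 950, B = 2000, and $50 has vanished.
print(disk, reexecuted)         # both states are inconsistent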
Log-Based Recovery
The most widely used structure for recording database
modifications is the log. The log is a sequence of log records, and maintains a
record of all the update activities in the database. There are several types of
log records. An update log record describes a single database write, and has
the following fields:
· Transaction identifier is the unique identifier of the transaction that performed the write operation.
· Data-item identifier is the unique identifier of the data item written. Typically, it is the location on disk of the data item.
· Old value is the value of the data item prior to the write.
· New value is the value that the data item will have after the write.
Other special log records exist to record significant
events during transaction processing, such as the start of a transaction and
the commit or abort of a transaction. We denote the various types of log
records as follows:
· <Ti start>. Transaction Ti has started.
· <Ti, Xj, V1, V2>. Transaction Ti has performed a write on data item Xj. Xj had value V1 before the write, and will have value V2 after the write.
· <Ti commit>. Transaction Ti has committed.
· <Ti abort>. Transaction Ti has aborted.
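As a sketch only, the four record types above could be represented in memory as follows; the class and field names are illustrative assumptions, not a prescribed format.

from dataclasses import dataclass

@dataclass
class StartRecord:          # <Ti start>
    txn: str

@dataclass
class UpdateRecord:         # <Ti, Xj, V1, V2>
    txn: str                # transaction identifier
    item: str               # data-item identifier
    old_value: object       # V1, used by undo
    new_value: object       # V2, used by redo

@dataclass
class CommitRecord:         # <Ti commit>
    txn: str

@dataclass
class AbortRecord:          # <Ti abort>
    txn: str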
Whenever a transaction performs a write, it is essential
that the log record for that write be created before the database is modified.
Once a log record exists, we can output the modification to the database if
that is desirable. Also, we have the ability to undo a modification that has
already been output to the database. We undo it by using the old-value field in
log records.
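A minimal sketch of this write-ahead discipline, assuming an in-memory list stands in for the log and a dictionary stands in for a database buffer block:

log = []                    # stands in for the log on stable storage
buffer = {'A': 1000}        # stands in for a database buffer block

def logged_write(txn, item, new_value):
    old_value = buffer[item]
    log.append((txn, item, old_value, new_value))   # log record first ...
    buffer[item] = new_value                        # ... then the modification

logged_write('T0', 'A', 950)

Because the log record (with its old-value field) exists before the data item changes, the modification can later be undone even if it has already reached the database.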
For log records to be useful for recovery from system and
disk failures, the log must reside in stable storage. It is assumed that every
log record is written to the end of the log on stable storage as soon as it is
created. The log contains a complete record of all database activity.
Log                  | Database
<T0 start>           |
<T0, A, 1000, 950>   |
<T0, B, 2000, 2050>  |
                     | A = 950
                     | B = 2050
<T0 commit>          |
<T1 start>           |
<T1, C, 700, 600>    |
                     | C = 600
<T1 commit>          |

Fig. State of the system log and database corresponding to T0 and T1.
The recovery scheme uses two recovery procedures:
· undo(Ti) restores the value of all data items updated by transaction Ti to the old values.
· redo(Ti) sets the value of all data items updated by transaction Ti to the new values.
The set of data items updated by Ti and their respective old and new values can be found in the log.
The undo and redo operations must be idempotent to
guarantee correct behavior even if a failure occurs during the recovery
process.
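The following Python sketch is an illustration only, assuming a log that contains (txn, item, old_value, new_value) update records like the ones used in the earlier sketches. Because each pass simply assigns values taken from the log, repeating either operation leaves the database unchanged, which is the idempotence property just mentioned.

def undo(txn, log, db):
    """Restore the old value of every data item updated by txn."""
    for t, item, old, new in reversed(log):
        if t == txn:
            db[item] = old

def redo(txn, log, db):
    """Reapply the new value of every data item updated by txn."""
    for t, item, old, new in log:
        if t == txn:
            db[item] = new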
After a failure has occurred, the recovery scheme consults
the log to determine which transactions need to be redone, and which need to be
be undone. This classification of transactions is accomplished as follows:
· Transaction Ti needs to be undone if the log contains the record <Ti start>, but does not contain the record <Ti commit>.
· Transaction Ti needs to be redone if the log contains both the record <Ti start> and the record <Ti commit>.
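A hedged sketch of this classification and the resulting recovery pass follows; it is not the book's code, and it models records as ('start', T), ('commit', T) and ('update', T, item, old_value, new_value) tuples.

def recover(log, db):
    started = [r[1] for r in log if r[0] == 'start']
    committed = {r[1] for r in log if r[0] == 'commit'}
    updates = [r[1:] for r in log if r[0] == 'update']

    # Undo: <Ti start> present, <Ti commit> absent.
    for txn in reversed(started):
        if txn not in committed:
            for t, item, old, new in reversed(updates):
                if t == txn:
                    db[item] = old
    # Redo: both <Ti start> and <Ti commit> present.
    for txn in started:
        if txn in committed:
            for t, item, old, new in updates:
                if t == txn:
                    db[item] = new

# Example corresponding to case (b) of the figure shown further below:
log_b = [('start', 'T0'), ('update', 'T0', 'A', 1000, 950),
         ('update', 'T0', 'B', 2000, 2050), ('commit', 'T0'),
         ('start', 'T1'), ('update', 'T1', 'C', 700, 600)]
db = {'A': 950, 'B': 2000, 'C': 600}   # one possible disk state at the crash
recover(log_b, db)                      # -> {'A': 950, 'B': 2050, 'C': 700}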
As an illustration, let us return to our banking example,
with transactions T0 and T1 executed one after the other
in the order T0 followed by T1. Suppose that the
system crashes before the completion of the transactions. We shall consider
three cases. The state of the log for each of these cases is shown in the figure below.
First, let us assume that the crash occurs just after the
log record for the step
write(B)
of transaction T0 has been written to stable
storage (as shown in Fig. (a)). When the system comes back up, it finds the record <T0
start> in the log, but no corresponding <T0 commit> record.
Thus, transaction T0 must be undone, so undo(T0) is performed.
(a)                  | (b)                  | (c)
<T0 start>           | <T0 start>           | <T0 start>
<T0, A, 1000, 950>   | <T0, A, 1000, 950>   | <T0, A, 1000, 950>
<T0, B, 2000, 2050>  | <T0, B, 2000, 2050>  | <T0, B, 2000, 2050>
                     | <T0 commit>          | <T0 commit>
                     | <T1 start>           | <T1 start>
                     | <T1, C, 700, 600>    | <T1, C, 700, 600>
                     |                      | <T1 commit>

Figure: The same log, shown at three different times.
As a result, the values in accounts A and B (on
the disk) are restored to $1000 and $2000, respectively.
Next, let us assume that the crash comes just after
the log record for the step
write(C)
of transaction T1 has been written to stable
storage (as shown in Fig. (b)). When the system comes back up, two recovery
actions need to be taken. The operation undo(T1) must be performed,
since the record <T1 start> appears in the log, but there is
no record <T1 commit>. The operation redo(T0) must
be performed, since the log contains both the record <T0
start> and the record <T0 commit>. At the end of the entire
recovery procedure, the values of accounts A, B, and C are $950, $2050, and
$700, respectively. Note that the undo(T1) operation is performed
before the redo(T0). In this example, the same outcome would result
if the order were reversed.
Finally, let us assume that the crash occurs just after
the log record
<T1 commit>
has been written to stable storage (as shown in Fig. (c)). When the system
comes back up, both T0 and T1 need to be redone, since
the records <T0 start> and <T0 commit> appear
in the log, as do the records <T1 start> and <T1
commit>. After the recovery procedures redo(T0) and redo(T1)
are performed, the values in accounts A, B, and C are $950, $2050, and $600,
respectively.
Checkpoints
When a system failure occurs, we must consult the log to
determine those transactions that need to be redone and those that need to be
undone. In principle, we need to search the entire log to determine this
information. There are two major difficulties with this approach:
1. The search process is time-consuming.
2. Most of the transactions that, according to
our algorithm, need to be redone have already written their updates into the
database. Although redoing them will cause no harm, it will nevertheless cause
recovery to take longer.
Checkpoints are introduced to reduce this overhead. The system periodically performs checkpoints, which
require the following sequence of actions to take place:
1. Output onto stable storage all log records
currently residing in main memory.
2. Output to the disk all modified buffer
blocks.
3. Output onto stable storage a log record
<checkpoint>.
Transactions are not allowed to perform any update actions,
such as writing to a buffer block or writing a log record, while a checkpoint
is in progress.
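As an illustrative sketch only (the data structures and parameter names are assumptions), the three checkpoint actions could look like this:

def checkpoint(log_buffer, stable_log, dirty_buffers, disk):
    # No transaction may perform updates while this sequence runs.
    stable_log.extend(log_buffer)            # 1. force buffered log records out
    log_buffer.clear()
    for block, contents in dirty_buffers.items():
        disk[block] = contents               # 2. flush all modified buffer blocks
    dirty_buffers.clear()
    stable_log.append(('checkpoint',))       # 3. write the <checkpoint> record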
The presence of a <checkpoint> record in the log
allows the system to streamline its recovery procedure. Consider a transaction
Ti that committed prior to the <checkpoint>
record. Any database modifications made by Ti must have been written
to the database either prior to the checkpoint or as part of the checkpoint
itself. Thus, at recovery time, there is no need to perform a redo operation
on Ti.
This observation allows us to refine our previous recovery
schemes. (We continue to assume that transactions are run serially). After a
failure has occurred, the recovery scheme examines the log to determine the
most recent transaction Ti that started executing before the most
recent checkpoint took place. It can find such a transaction by searching the
log backward, from the end of the log, until it finds the first
<checkpoint> record (since we are searching backward, the record found is
the final <checkpoint> record in the log); then it continues the search
backward until it finds the next <Ti start> record. This
record identifies a transaction Ti.
Once transaction Ti has been identified, the
redo and undo operations need to be applied only to transaction Ti
and to all transactions Tj that started executing after transaction
Ti. Let us denote these transactions by the set T. The remainder
(earlier part) of the log can be ignored, and can be erased whenever desired.
The exact recovery operations to be performed depend on whether the
immediate-modification technique or the deferred-modification technique is
being used. The recovery operations that are required if the
immediate-modification technique is employed are as follows:
· For all transactions Tk in T that have no <Tk commit> record in the log, execute undo(Tk).
· For all transactions Tk in T such that the record <Tk commit> appears in the log, execute redo(Tk).
Obviously, the undo operation does not need to be applied
when the deferred-modification technique is being employed.
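Putting these pieces together, here is a hedged sketch of checkpoint-based recovery under the immediate-modification technique, again using the illustrative tuple-based log from the earlier sketches (it assumes serial execution, as the text does).

def checkpoint_recover(log, db):
    # Search backward for the final <checkpoint> record, then for the
    # <start> record of the last transaction that began before it; the
    # earlier part of the log is ignored.
    cut = 0
    for i in range(len(log) - 1, -1, -1):
        if log[i][0] == 'checkpoint':
            for j in range(i, -1, -1):
                if log[j][0] == 'start':
                    cut = j
                    break
            break
    tail = log[cut:]                                    # records of the set T
    started = [r[1] for r in tail if r[0] == 'start']
    committed = {r[1] for r in tail if r[0] == 'commit'}
    for txn in reversed(started):                       # undo incomplete ones
        if txn not in committed:
            for r in reversed(tail):
                if r[0] == 'update' and r[1] == txn:
                    db[r[2]] = r[3]                     # restore the old value
    for txn in started:                                 # redo committed ones
        if txn in committed:
            for r in tail:
                if r[0] == 'update' and r[1] == txn:
                    db[r[2]] = r[4]                     # reapply the new value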
As an illustration, consider the set of transactions {T0,
T1 … T100} executed in the order of the subscripts.
Suppose that the most recent checkpoint took place during the execution of
transaction T67. Thus, only transactions T67, T68,
…, T100 need to be considered during the recovery scheme. Each of
them needs to be redone if it has committed; otherwise, it needs to be undone.
Shadow Paging
An alternative to log-based crash-recovery techniques is
shadow paging. Under certain circumstances, shadow paging may require fewer
disk accesses than do the log-based methods. There are, however, disadvantages
to the shadow-paging approach. For example, it is hard to extend shadow paging
to allow multiple transactions to execute concurrently.
As before, the database is partitioned into some number of
fixed-length blocks, which are referred to as pages. The term page is borrowed
from operating systems, since we are using a paging scheme for memory
management. Let us assume that there are n pages, numbered 1 through n. (In
practice, n may be in the hundreds of thousands.) These pages do not need
to be stored in any particular order on disk. However, there must be a way to
find the ith page of the database for any given i; a page table with n entries
is used for this purpose. The first entry contains a pointer to the first page
of the database, the second entry points to the second page, and so on.
The key idea behind the shadow-paging technique is to
maintain two page tables during the life of a transaction: the current page
table and the shadow page table. When the transaction starts, both page tables
are identical. The shadow page table is never changed over the duration of the
transaction. The current page table may be changed when a transaction performs
a write operation. All input and output operations use the current page table
to locate database pages on disk.
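A minimal sketch of the two-page-table idea follows, with dictionaries standing in for the page tables and the disk; all names and values are illustrative assumptions.

disk_pages = {0: 'page 0 data', 1: 'page 1 data'}   # disk page number -> contents
next_free = 2                                       # next unused disk page

shadow_table = {1: 0, 2: 1}          # logical page i -> disk page number
current_table = dict(shadow_table)   # identical copy when the transaction starts

def write_page(i, new_contents):
    """Write logical page i through the current page table (copy-on-write)."""
    global next_free
    new_page = next_free             # place the new version on an unused page
    next_free += 1
    disk_pages[new_page] = new_contents
    current_table[i] = new_page      # shadow_table is never touched

write_page(1, 'updated data')
# On commit, the current table atomically becomes the new shadow table;
# on abort, the current table is discarded and the shadow table remains valid.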