Transaction
Collections of operations that form a single logical unit
of work are called transactions. for example, a transfer of funds from a
checking account to a savings account is a single operation when viewed from
the customer's standpoint; within the database system, however, it comprises
several operation. Clearly, it is essential that all these operation occur, or
that, in case of a failure, none occur. It would be unacceptable if the
checking account were debited, but the savings account were not credited. A
database system must ensure proper execution of transactions despite failures-
either the entire transaction executes, or none of it does.
A transaction is a unit of program execution that accesses
and possible updates various data items. A collection of operations that should
be treated as a single logical operation. A transaction usually results from
the execution of a user program and the in the program the transaction is of
the form begin transaction and end transaction .
To ensure integrity of data, we require that the database
system maintains the following properties (called ACID properties)of the
transactions.
ACID (Atomicity, Consistency, Isolation, Durability)
The four states defined by the acronym ACID (atomicity,
consistency, isolation, and durability) relate to a successful transaction,
usually a database transaction. All of these characteristics must be present,
or the transaction must be completely reversed (undone or rolled back).
A: Atomicity: either the entire set of
operations happens or none of it does. Atomicity refers to the ability
of the DBMS to guarantee that either all of the tasks of a transaction are
performed or none of them are. The transfer of funds can be completed or it can
fail for a multitude of reasons, but atomicity guarantees that one account
won't be debited if the other is not credited as well.
C:
Consistency: the
set of operations taken together should move the system from one consistent
state to another. Consistency refers to the database being in a legal
state when the transaction begins and when it ends. This means that a
transaction can't break the rules, or integrity constraints,
of the database. If an integrity constraint states that all accounts must have
a positive balance, then any transaction violating this rule will be aborted.
I: Isolation: even though multiple transactions may
operate concurrently, there is a total order on all transactions. Stated
another way: each transaction perceives the system as if no other transactions
were running concurrently. Isolation
refers to the ability of the application to make operations in a transaction
appear isolated from all other operations. This means that no operation outside
the transaction can ever see the data in an intermediate state; a bank manager can
see the transferred funds on one account or the other, but never on both --
even if he ran his query while the transfer was still being processed.
D: Durability: even in the face of a crash, once
the system has said that a transaction completed, the results of the
transaction must be permanent. Durability
refers to the guarantee that once the user has been notified of success, the
transaction will persist, and not be undone. This means it will survive system
failure, and that the database system has checked the integrity constraints and won't need to abort the
transaction. Typically, all transactions are written into a log that can be played back to recreate the
system to the its state right before the failure. A transaction can only
be deemed committed after it is safely in the log. Implementing
the ACID properties correctly is not simple. Processing a transaction often
requires a number of small changes to be made, including updating indexes that
are used by the system to speed up searches. This sequence of operations is
subject to failure for a number of reasons; for instance, the system may have
no room left on its disk drives. ACID suggests that the database be able to
perform all of these operations at once. In fact this is difficult to arrange.
There are two popular families of techniques: Write ahead logging and Shadow
paging. In both cases, locks must be acquired on all information that is read
and updated. In write ahead logging, atomicity is guaranteed by ensuring that
all REDO and UNDO information is written to a log before it is written to the
database. In shadowing, updates are applied to a copy of the database, and the
new copy is activated when the transaction commits. The copy refers to
unchanged parts of the old version of the database, rather than being an entire
duplicate.
Terminology
Concurrency control: the method of ensuring that
simultaneously running transactions provide isolation.
Recovery: the act of taking a database after an
unexpected failure and returning it to a consistent state.
Abort: undo a transaction.
Commit: make
a transaction permanent.
Rollback: the act of "undoing" a
transaction.
Transaction manager or transaction monitor: the
component responsible for maintaining the ACID properties.
Compensating transaction: a transaction used to
undo (rollback) another transaction.
Unresolved transaction: a transaction that has
neither aborted nor committed.
Terminated transaction: a transaction that has
either aborted or committed.
The Life and Times of a Transaction
n
Transactions begin by some sort of
"transaction begin" operation.
n
Transactions end either by being committed or
aborted.
n
Abort: return the system to the state it was in
before the transaction began.
Ü System
state must be the same as if the transaction had never existed.
Ü Must
abort any transactions that depend on the outcome of the aborting transaction.
n
Commit: declare the transaction permanently
complete.
Ü If
you say you've committed, no actions should be able to move the system to a
state not containing the transaction.
Ü All
operations much be forever persistent in the database.
Transaction States
Active: initial state. Entered upon begin; stays in this
state while running.
n
Partially Committed: All statements in the
transaction have been completed.
n
Failed: The transaction has encountered an
abnormal or errant condition and cannot continue.
n
Committed: The transaction is complete and can
never be undone.
n
Aborted: The system has been returned to its
pre-transaction state.
Three concurrency problems
There are essentially three ways in which things can go
wrong three ways, that is, in which a transaction, through correct in itself,
can nevertheless produce the wrong answer if some other transaction interferes
with it in some way. It should be noted that the interfering transaction might
also be correct in itself; it is the uncontrolled interleaving of operations
from the two correct transactions that produces the overall incorrect result.
The three-concurrency problems are:
n
The lost update problem;
n
The uncommitted dependency problem and
n
The inconsistent analysis problem
The Lost Update Problem
Consider the situation illustrated in Fig. That figure is
meant to be read as follows: Transaction A retrieves some tuple to at time t1;
transaction B retrieves that same tuple t at
Transaction A
|
Time
|
Transaction B
|
-
|
|
|
-
|
-
|
|
|
-
|
RETRIEVE t
|
t1
|
-
|
-
|
|
|
-
|
-
|
t2
|
RETRIEVE t
|
-
|
|
|
-
|
UPDATE t
|
t3
|
-
|
-
|
|
|
-
|
-
|
t4
|
UPDATE t
|
-
|
¯
|
-
|
Transaction A loses an update at time t4
Time t2; transaction A updates the tuple (on the basis of
the values seen at time t1) at time t3; and transaction B updates the same
tuple (on the basis of the values seen at time t2 which are the same as those
seen at time t1) at time t4. Transaction A’s update is lost at time t4, because
transaction B overwrites it without even looking at it.
The Uncommitted Dependency Problem
The uncommitted dependency problem arises if one
transaction is allowed to retrieve or, worse, update – a tuple that has been updated
by another transaction but not yet committed by that other transaction. For if
it has not yet been committed, there is always a possibility that it never will
be committed but will be rolled back instead
In the first example (Fig), transaction A seen an
uncommitted update (also called an uncommitted change) at time t2. That update
is then undone at time t3. Transaction A is
Transaction A
|
Time
|
Transaction B
|
-
|
|
|
-
|
-
|
|
|
-
|
-
|
t1
|
UPDATE t
|
-
|
|
|
-
|
RETRIEVE t
|
t2
|
-
|
-
|
|
|
-
|
-
|
t3
|
ROLLBACK
|
-
|
¯
|
|
Transaction A becomes dependent on an uncommitted change at
time t2
Transaction A
|
Time
|
Transaction B
|
-
|
|
|
-
|
-
|
|
|
-
|
-
|
t1
|
UPDATE t
|
-
|
|
|
-
|
UPDATE t
|
t2
|
-
|
-
|
|
|
-
|
-
|
t3
|
ROLLBACK
|
-
|
¯
|
|
Transaction A updates an uncommitted change at time t2, and
loses that update at update at time t3
therefore
operating on a false assumption-namely, the assumption that tuple t has the
value seen at time t2, whereas in fact it has whatever value it had prior to
time t1. As a result, transaction A might well produce incorrect output. Now, by
the way that the rollback of transaction B might be due to no fault of B’s –it
might, for example be the result of a system crash. (And transaction A might
already have terminated by that time, in which case the crash would not cause a
rollback to be issued for A also.)
The second example
is even worse. Not only does transaction A become dependent on a
uncommitted change at time t2, but it actually loses an update at time
t3-because the rollback at time r3 cause tuple t to be restored to its value
prior to time t1. This is another version of the lost update problem.
The Inconsistent Analysis Problem
Consider Fig , which shows two transactions A and B
operating on account (ACC) tuples: Transaction A is summing account balance;
transaction B is transferring an amount 10 from account 3 to account 1. The
result produced by A 110, is obviously incorrect; if A were to go on the write
that result back into the database, it would actually
|
|
ACC
3
|
||||||||
Transaction A
|
Time
|
Transaction B
|
||||||||
-
|
|
|
-
|
||||||||
-
|
|
|
-
|
||||||||
RETRIEVE ACC 1 :
|
t1
|
-
|
||||||||
sum = 40
|
|
|
-
|
||||||||
-
|
|
|
-
|
||||||||
RETRIEVE ACC 2 :
|
t2
|
-
|
||||||||
sum = 90
|
|
|
-
|
||||||||
-
|
|
|
-
|
||||||||
-
|
t3
|
RETRIEVE ACC 3
|
||||||||
-
|
|
|
-
|
||||||||
-
|
t4
|
UPDATE ACC 3 :
|
||||||||
-
|
|
|
30 ®
20
|
||||||||
-
|
|
|
-
|
||||||||
-
|
t5
|
RETRIEVE ACC 1
|
||||||||
-
|
|
|
-
|
||||||||
-
|
t6
|
UPDATE ACC 1
|
||||||||
-
|
|
|
40 ®
50
|
||||||||
-
|
|
|
-
|
||||||||
-
|
t7
|
COMMIT
|
||||||||
-
|
|
|
|
||||||||
RETRIVE ACC 3 :
|
t8
|
|
||||||||
sum = 110, not 120
|
|
|
|
||||||||
-
|
¯
|
|
Transaction A performs as inconsistent analysis
leave the database in inconsistent state. We say that A
has seen an inconsistent state of the database and has therefore performed an
inconsistent analysis. Note the difference between this example and the
previous one: There is no question here of A being dependent on an uncommitted
change, since B commits all of its updates before A see ACC3.
Locking
Concurrency problems can all be solved by means of a
concurrency control technique called locking. The basic idea is simple: When a
transaction needs an assurance that some object it is interested in typically a
database tuple–will not change in some manner while its back is turned (as it
were), it acquires a lock on the object. The effect of the lock is to “lock
other transaction of the object in question, and thus in particular to prevent
them from changing it. The first transaction is therefore able to carry out its
processing in the certain knowledge that the object in question will remain in
a stable state for al long as the transaction wishes it to.
The working of locks can be explained as follows.
1. First, we assume the system supports
two kinds of locks, exclusive locks (X locks) and shared locks (S locks). Note:
X and S locks are sometimes called write locks and read locks, respectively.
2. If transaction A holds an exclusive (X)
lock on tuple t, then a request from some distinct transaction B for a lock of
either type on t will be denied.
3. If transaction A holds a shared (S)
lock on tuple t, then:
n
A request from some distinct transaction B for
an X lock on t will be denied;
n
A request from some distinct transaction B for
an S lock on t will be granted (that is, B will now also hold an S lock on t).
These rules can conveniently be summarized by means of a
lock type compatibility matrix (As shown in fig). That matrix is interpreted as
follows: Consider some tuple t; suppose transaction A currently holds a lock on
t as indicated by the entries in the column headings (dash=no lock); and
suppose some distinct transaction B issues a request for a lock on t as
indicated by the entries down the left-hand side (for completeness we again
include the “no lock” case). An “N” indicates a conflict (B's request cannot be
satisfied and B goes into a wait state), a “Y” indicates compatibility (B's
request is satisfied). The matrix is obviously symmetric.
|
X
|
S
|
-
|
X
|
N
|
N
|
Y
|
S
|
N
|
Y
|
Y
|
-
|
Y
|
Y
|
Y
|
Compatibility matrix for lock types X and S.
data access protocol (or locking protocol) that makes use
of X and S locks as just defined to guarantee that problems cannot occur.
1. A
transaction that wishes to retrieve a tuple must first acquire an S lock on
that tuple.
2. A transaction that wishes to update a
tuple must first acquire an X lock on that tuple. Alternatively, if it already
holds an S lock on the tuple (as it will in a RETRIEE-UPDATE sequence), then it
must promote that S lock to X level.
It should be noted that transaction requests for tuple
locks are normally implicit; a “tuple retrieve” request is an implicit request
for an S lock, and a “tuple update” request is an implicit request for an X
lock, on the relevant tuple. Also, of course (as always), we take the term
“update” to include INSERT and DELETEs as well as UPDATEs per se, but the rules
requires some minor refinement to take care of INSERTs and DELETEs.
To continue with the protocol:
3. If a lock request from transaction B is
denied because it conflicts with a lock already held by transaction A,
transaction B goes into a wait state. B will wait until A's lock is released.
Note: The system must guarantee that B does not wait forever (a possibility
sometimes referred to as livelock). A simple way to provide such a guarantee is
to service all lock request on a “first-come, first-served” basis.
4. X locks are held until
end-of-transaction (COMMIT or ROLLBACK).
The Three Concurrency Problems Protocols
The Lost Update Problem
As shown in fig, Transaction A's UPDATE at time t3 is not
accepted, because it is an implicit request for an X lock on t, and such a
request conflicts with the S lock already held by transaction B; so A goes into
a wait state. For analogous reasons, B goes into a wait state at time t4. Now
both transactions are unable to proceed, so there is no question of any update
being lost. Locking thus solves the lost update problem by reducing it to
another problem!-but at least it does solve the original problem.
Transaction A
|
Time
|
Transaction B
|
-
|
|
|
-
|
-
|
|
|
-
|
RETRIEVE ACC 1 :
|
t1
|
-
|
(acquire s lock on to)
|
|
|
-
|
-
|
|
|
-
|
-
|
t2
|
RETRIEVE t
|
-
|
|
|
(acquire S lock on t)
|
-
|
|
|
-
|
UPDATE t
|
t3
|
-
|
(request X lock on t)
|
|
|
-
|
wait
|
|
|
-
|
wait
|
t4
|
UPDATE t
|
wait
|
|
|
(request X lock on t)
|
wait
|
|
|
wait
|
wait
|
|
|
wait
|
wait
|
¯
|
wait
|
No update is lost, but deadlock occurs at time t4
The Uncommitted Dependency Problem
As shown in fig. Transaction A's operation at time t2
(RETRIEVE and UPDATE in Fig.) is not accepted in either case, because it is an
implicit request for a lock on t, and such a request conflicts with the X lock
already held by B; so A goes into a wait state. It remains in that wait state
until B reaches its termination (either COMMIT or ROLLBACK), when B's lock is
released and A is able to proceed; and at that point A sees a committed value
(either the pre-B value, if B terminates with rollback, or the post-B value
otherwise). Either way, A is no longer dependent on an uncommitted update.
Transaction A
|
Time
|
Transaction B
|
-
|
|
|
-
|
-
|
|
|
-
|
-
|
t1
|
UPDATE t
|
-
|
|
|
(acquire X lock on t)
|
-
|
|
|
-
|
RETRIEVE t
|
t2
|
-
|
(request S lock on t)
|
|
|
-
|
wait
|
|
|
-
|
wait
|
t3
|
COMMIT / ROLLBACK
|
wait
|
|
|
(release X lock on t)
|
resume : RETRIEVE t
|
t4
|
|
(acquire S lock on t)
|
|
|
|
-
|
¯
|
|
Transaction A is prevented from seeing an uncommitted
change at time t2
Transaction A
|
Time
|
Transaction B
|
-
|
|
|
-
|
-
|
|
|
-
|
-
|
t1
|
UPDATE t
|
-
|
|
|
(acquire X lock on t)
|
-
|
|
|
-
|
RETRIEVE t
|
t2
|
-
|
(request S lock on t)
|
|
|
-
|
wait
|
|
|
-
|
wait
|
t3
|
COMMIT / ROLLBACK
|
wait
|
|
|
(release X lock on t)
|
resume : RETRIEVE t
|
t4
|
|
(acquire S lock on t)
|
|
|
|
-
|
¯
|
|
Transaction A is prevented from updating an uncommitted
change at time t2
The Inconsistent Analysis Problem
As shown in fig., Transaction B's UPDATE at time t6 is not
accepted, because it is an implicit request for an X lock an ACC 1, and such a
request conflicts with the S lock already held by A; so B goes into a wait
state. Likewise, transaction A's RETREIVE at time t7 is also not accepted,
because it is an implicit request for an S lock an ACC 3, and such a request
conflicts with the X lock already by B; so A goes into a wait state also.
Again, therefore, locking solves the original problem (the inconsistent
analysis problem, in this case) by forcing a deadlock.
|
|
ACC
3
|
||||||||
Transaction A
|
Time
|
Transaction B
|
||||||||
-
|
|
|
-
|
||||||||
-
|
|
|
-
|
||||||||
RETRIEVE ACC 1 :
|
t1
|
-
|
||||||||
(acquire S lock on ACC 1)
|
|
|
-
|
||||||||
Sum = 40
|
|
|
-
|
||||||||
-
|
|
|
-
|
||||||||
RETRIEVE ACC 2 :
|
t2
|
-
|
||||||||
(acquire S lock on ACC 2)
|
|
|
-
|
||||||||
sum = 90
|
|
|
-
|
||||||||
-
|
|
|
-
|
||||||||
-
|
t3
|
RETRIEVE ACC 3
|
||||||||
-
|
|
|
(acquire X lock on ACC 3)
|
||||||||
-
|
|
|
-
|
||||||||
-
|
t4
|
UPDATE ACC 3
|
||||||||
|
|
|
(acquire X lock on ACC 3)
|
||||||||
-
|
|
|
30 ®
20
|
||||||||
-
|
|
|
-
|
||||||||
-
|
t5
|
RETRIEVE ACC 1
|
||||||||
-
|
|
|
(acquire S lock on ACC 1)
|
||||||||
-
|
|
|
-
|
||||||||
-
|
t6
|
UPDATE ACC 1
|
||||||||
-
|
|
|
(request X lock on ACC 3)
|
||||||||
-
|
|
|
wait
|
||||||||
RETRIEVE ACC 3 :
|
t7
|
wait
|
||||||||
(request S lock on ACC 3)
|
|
|
wait
|
||||||||
wait
|
|
|
wait
|
||||||||
wait
|
¯
|
wait
|
Inconsistent analysis is prevented, but deadlock occurs at
time t7
Deadlock Handling
A system is in a deadlock state if there exists a set of
transactions such that every transaction in the set is waiting for another
transaction in the set. More precisely, there exists a set of waiting
transaction {TO,T1,…,Tn} such that TO
is waiting for a data item that is held by T1, and T1 is
waiting for a data item that is held by T2 , and.., and Tn-1
is waiting for a data item that is held by Tn, and Tn is
waiting for a data item that is held by TO. None of the transactions
can make progress in such a situation.
n
Occurs when two or more transactions are blocked
waiting on each other in such a manner that none will make forward progress.
n
Two approaches
Ü Prevention:
do not let deadlocks occur.
Ü Detection
and Recovery: allow deadlocks to occur, but detect it and do something about
it.
Deadlock Prevention
n
Three categories
Ü Order
lock requests in such a way that deadlock can never occur.
Ü Abort
a transaction instead of waiting when there is the potential for deadlock.
Ü Use
time-outs.
n
Lock ordering
Ü Simple
scheme: request all locks atomically at once.
Difficult to know what to lock at the beginning of the
transaction.
May hold items locked for along time unnecessarily.
n
Preemptive schemes
Ü Assign
a timestamp to each transaction when it begins.
Ü At
the time a transaction is about to wait, decide if we allow the transaction to
wait, force it to abort, or force someone else to abort.
Ü If
a transaction aborts, we do not give it a new timestamp.
Ü wait-die:
wait only on younger transactions; if you encounter an older transaction, abort
self.
Ü wound-die:
if transaction would block on a younger transaction, abort the younger
transaction (i.e., wound it).
Ü Trade-offs
wait-die: older transactions wait more (but eventually
finish).
wait-die: a transaction might get aborted several times
before it finally completes.
1. wound-wait:
may be fewer rollbacks than wait-die.
Ü Both
schemes do a lot of rollbacks that may not be necessary.
n
Time-out based schemes
Ü Allow
a transaction to wait for at most a specified time.
Ü If
the time elapses, the transaction rolls back.
Deadlock Detection and Recovery
n
Construct/maintain a waits-for graph.
Ü Vertex
set consists of all transactions.
Ü An
edge from Vi to V j indicates that T
i is waiting on a lock held by T
j .
n
If there is a cycle in the graph, then a
deadlock exists.
n
Pick a participant in the deadlock and abort it.
Ü May
be possible to do partial rollback (i.e., only roll back far enough to break
the deadlock instead of to the beginning of the transaction).
Ü In
practice, far too complex to maintain the state necessary to do this.
Ü Avoid
starvation: do not want the same transaction continually getting rolled back.
Can select a policy that favors transactions previously
aborted.
Buffer Management
Programs in a database system make requests on the buffer
manager when they need a block from disk. If the block is already in the
buffer, the requester is passed the address of the block in main memory. If the
block is not in the buffer, the buffer manager first allocates space in the
buffer for the block, throwing out some other block, if required, to make space
for the new block. The block that is thrown out is written back to disk only if
it was modified since the most recent time that it was written to the disk.
Then, the buffer manager reads in the block from the disk to the buffer, and
passes the address of the block in main memory to the requester. The internal
actions of the buffer manager are transparent to the programs that issue disk-
block requests.
Buffer manager appears to be nothing more than a virtual
memory manager, like those found in most operating systems. One difference is
that the size of memory addresses are not sufficient to address all disks
blocks. Further, to serve the database system well, the buffer manager must use
techniques more sophisticated than typical virtual memory management schemes.
Replacement strategy: When there is no room left in the
buffer, a block must be removed from the buffer before a new one can read in.
Typically, operating systems use a least recently used(LRU) scheme, in which
the block that was referenced least recently is written back to disk and is
removed from the buffer. This simple approach can be improved on for database
applications.
Pined blocks: For the database system to be able to
recover from crashes it is necessary to restrict those times when a block may
be written back to disk. A block that is not allowed to be written back to disk
is said to be pined. Although many operating systems do not provide support for
pinned blocks, such a feature is essential for the implementation of a database
system that is resilient to crashes.
Forced output of blocks: There are situations in which it
is necessary to write back the block to disk, even thought the buffer space
that is occupies is not needed. This write is call the forced output of a
block Main memory contents and thus
buffer contents are lost in a crash, whereas data on disk usually survive a
crash.
No comments:
Post a Comment