Notes of IT: Concurrency control

Transaction

Collections of operations that form a single logical unit of work are called transactions. for example, a transfer of funds from a checking account to a savings account is a single operation when viewed from the customer's standpoint; within the database system, however, it comprises several operation. Clearly, it is essential that all these operation occur, or that, in case of a failure, none occur. It would be unacceptable if the checking account were debited, but the savings account were not credited. A database system must ensure proper execution of transactions despite failures- either the entire transaction executes, or none of it does.

A transaction is a unit of program execution that accesses and possible updates various data items. A collection of operations that should be treated as a single logical operation. A transaction usually results from the execution of a user program and the in the program the transaction is of the form begin transaction and end transaction .

To ensure integrity of data, we require that the database system maintains the following properties (called ACID properties)of the transactions.

ACID (Atomicity, Consistency, Isolation, Durability)

The four states defined by the acronym ACID (atomicity, consistency, isolation, and durability) relate to a successful transaction, usually a database transaction. All of these characteristics must be present, or the transaction must be completely reversed (undone or rolled back).

A: Atomicity: either the entire set of operations happens or none of it does. Atomicity refers to the ability of the DBMS to guarantee that either all of the tasks of a transaction are performed or none of them are. The transfer of funds can be completed or it can fail for a multitude of reasons, but atomicity guarantees that one account won't be debited if the other is not credited as well.

C: Consistency: the set of operations taken together should move the system from one consistent state to another. Consistency refers to the database being in a legal state when the transaction begins and when it ends. This means that a transaction can't break the rules, or integrity constraints, of the database. If an integrity constraint states that all accounts must have a positive balance, then any transaction violating this rule will be aborted.

I: Isolation: even though multiple transactions may operate concurrently, there is a total order on all transactions. Stated another way: each transaction perceives the system as if no other transactions were running concurrently. Isolation refers to the ability of the application to make operations in a transaction appear isolated from all other operations. This means that no operation outside the transaction can ever see the data in an intermediate state; a bank manager can see the transferred funds on one account or the other, but never on both -- even if he ran his query while the transfer was still being processed.

D: Durability: even in the face of a crash, once the system has said that a transaction completed, the results of the transaction must be permanent. Durability refers to the guarantee that once the user has been notified of success, the transaction will persist, and not be undone. This means it will survive system failure, and that the database system has checked the integrity constraints and won't need to abort the transaction. Typically, all transactions are written into a log that can be played back to recreate the system to the its state right before the failure. A transaction can only be deemed committed after it is safely in the log. Implementing the ACID properties correctly is not simple. Processing a transaction often requires a number of small changes to be made, including updating indexes that are used by the system to speed up searches. This sequence of operations is subject to failure for a number of reasons; for instance, the system may have no room left on its disk drives. ACID suggests that the database be able to perform all of these operations at once. In fact this is difficult to arrange. There are two popular families of techniques: Write ahead logging and Shadow paging. In both cases, locks must be acquired on all information that is read and updated. In write ahead logging, atomicity is guaranteed by ensuring that all REDO and UNDO information is written to a log before it is written to the database. In shadowing, updates are applied to a copy of the database, and the new copy is activated when the transaction commits. The copy refers to unchanged parts of the old version of the database, rather than being an entire duplicate.

Terminology

Concurrency control: the method of ensuring that simultaneously running transactions provide isolation.

Recovery: the act of taking a database after an unexpected failure and returning it to a consistent state.

Abort: undo a transaction.

Commit: make a transaction permanent.

Rollback: the act of "undoing" a transaction.

Transaction manager or transaction monitor: the component responsible for maintaining the ACID properties.

Compensating transaction: a transaction used to undo (rollback) another transaction.

Unresolved transaction: a transaction that has neither aborted nor committed.

Terminated transaction: a transaction that has either aborted or committed.

The Life and Times of a Transaction

n Transactions begin by some sort of "transaction begin" operation.

n Transactions end either by being committed or aborted.

n Abort: return the system to the state it was in before the transaction began.

Ü System state must be the same as if the transaction had never existed.

Ü Must abort any transactions that depend on the outcome of the aborting transaction.

n Commit: declare the transaction permanently complete.

Ü If you say you've committed, no actions should be able to move the system to a state not containing the transaction.

Ü All operations much be forever persistent in the database.

Transaction States

Active: initial state. Entered upon begin; stays in this state while running.

n Partially Committed: All statements in the transaction have been completed.

n Failed: The transaction has encountered an abnormal or errant condition and cannot continue.

n Committed: The transaction is complete and can never be undone.

n Aborted: The system has been returned to its pre-transaction state.

Three concurrency problems

There are essentially three ways in which things can go wrong three ways, that is, in which a transaction, through correct in itself, can nevertheless produce the wrong answer if some other transaction interferes with it in some way. It should be noted that the interfering transaction might also be correct in itself; it is the uncontrolled interleaving of operations from the two correct transactions that produces the overall incorrect result. The three-concurrency problems are:

n The lost update problem;

n The uncommitted dependency problem and

n The inconsistent analysis problem

The Lost Update Problem

Consider the situation illustrated in Fig. That figure is meant to be read as follows: Transaction A retrieves some tuple to at time t1; transaction B retrieves that same tuple t at

Transaction A	Time	Transaction B
-	\|	-
-	\|	-
RETRIEVE t	t1	-
-	\|	-
-	t2	RETRIEVE t
-	\|	-
UPDATE t	t3	-
-	\|	-
-	t4	UPDATE t
-	¯	-

Transaction A loses an update at time t4

Time t2; transaction A updates the tuple (on the basis of the values seen at time t1) at time t3; and transaction B updates the same tuple (on the basis of the values seen at time t2 which are the same as those seen at time t1) at time t4. Transaction A’s update is lost at time t4, because transaction B overwrites it without even looking at it.

The Uncommitted Dependency Problem

The uncommitted dependency problem arises if one transaction is allowed to retrieve or, worse, update – a tuple that has been updated by another transaction but not yet committed by that other transaction. For if it has not yet been committed, there is always a possibility that it never will be committed but will be rolled back instead

In the first example (Fig), transaction A seen an uncommitted update (also called an uncommitted change) at time t2. That update is then undone at time t3. Transaction A is

Transaction A	Time	Transaction B
-	\|	-
-	\|	-
-	t1	UPDATE t
-	\|	-
RETRIEVE t	t2	-
-	\|	-
-	t3	ROLLBACK
-	¯

Transaction A becomes dependent on an uncommitted change at time t2

Transaction A	Time	Transaction B
-	\|	-
-	\|	-
-	t1	UPDATE t
-	\|	-
UPDATE t	t2	-
-	\|	-
-	t3	ROLLBACK
-	¯

Transaction A updates an uncommitted change at time t2, and loses that update at update at time t3

therefore operating on a false assumption-namely, the assumption that tuple t has the value seen at time t2, whereas in fact it has whatever value it had prior to time t1. As a result, transaction A might well produce incorrect output. Now, by the way that the rollback of transaction B might be due to no fault of B’s –it might, for example be the result of a system crash. (And transaction A might already have terminated by that time, in which case the crash would not cause a rollback to be issued for A also.)

The second example is even worse. Not only does transaction A become dependent on a uncommitted change at time t2, but it actually loses an update at time t3-because the rollback at time r3 cause tuple t to be restored to its value prior to time t1. This is another version of the lost update problem.

The Inconsistent Analysis Problem

Consider Fig , which shows two transactions A and B operating on account (ACC) tuples: Transaction A is summing account balance; transaction B is transferring an amount 10 from account 3 to account 1. The result produced by A 110, is obviously incorrect; if A were to go on the write that result back into the database, it would actually

ACC 1

ACC 2

ACC 3

Transaction A

Time

Transaction B

RETRIEVE ACC 1 :

sum = 40

RETRIEVE ACC 2 :

sum = 90

RETRIEVE ACC 3

UPDATE ACC 3 :

30 ® 20

RETRIEVE ACC 1

UPDATE ACC 1

40 ® 50

COMMIT

RETRIVE ACC 3 :

sum = 110, not 120

Transaction A performs as inconsistent analysis

leave the database in inconsistent state. We say that A has seen an inconsistent state of the database and has therefore performed an inconsistent analysis. Note the difference between this example and the previous one: There is no question here of A being dependent on an uncommitted change, since B commits all of its updates before A see ACC3.

Locking

Concurrency problems can all be solved by means of a concurrency control technique called locking. The basic idea is simple: When a transaction needs an assurance that some object it is interested in typically a database tuple–will not change in some manner while its back is turned (as it were), it acquires a lock on the object. The effect of the lock is to “lock other transaction of the object in question, and thus in particular to prevent them from changing it. The first transaction is therefore able to carry out its processing in the certain knowledge that the object in question will remain in a stable state for al long as the transaction wishes it to.

The working of locks can be explained as follows.

1. First, we assume the system supports two kinds of locks, exclusive locks (X locks) and shared locks (S locks). Note: X and S locks are sometimes called write locks and read locks, respectively.

2. If transaction A holds an exclusive (X) lock on tuple t, then a request from some distinct transaction B for a lock of either type on t will be denied.

3. If transaction A holds a shared (S) lock on tuple t, then:

n A request from some distinct transaction B for an X lock on t will be denied;

n A request from some distinct transaction B for an S lock on t will be granted (that is, B will now also hold an S lock on t).

These rules can conveniently be summarized by means of a lock type compatibility matrix (As shown in fig). That matrix is interpreted as follows: Consider some tuple t; suppose transaction A currently holds a lock on t as indicated by the entries in the column headings (dash=no lock); and suppose some distinct transaction B issues a request for a lock on t as indicated by the entries down the left-hand side (for completeness we again include the “no lock” case). An “N” indicates a conflict (B's request cannot be satisfied and B goes into a wait state), a “Y” indicates compatibility (B's request is satisfied). The matrix is obviously symmetric.

	X	S	-
X	N	N	Y
S	N	Y	Y
-	Y	Y	Y

Compatibility matrix for lock types X and S.

data access protocol (or locking protocol) that makes use of X and S locks as just defined to guarantee that problems cannot occur.

1. A transaction that wishes to retrieve a tuple must first acquire an S lock on that tuple.

2. A transaction that wishes to update a tuple must first acquire an X lock on that tuple. Alternatively, if it already holds an S lock on the tuple (as it will in a RETRIEE-UPDATE sequence), then it must promote that S lock to X level.

It should be noted that transaction requests for tuple locks are normally implicit; a “tuple retrieve” request is an implicit request for an S lock, and a “tuple update” request is an implicit request for an X lock, on the relevant tuple. Also, of course (as always), we take the term “update” to include INSERT and DELETEs as well as UPDATEs per se, but the rules requires some minor refinement to take care of INSERTs and DELETEs.

To continue with the protocol:

3. If a lock request from transaction B is denied because it conflicts with a lock already held by transaction A, transaction B goes into a wait state. B will wait until A's lock is released. Note: The system must guarantee that B does not wait forever (a possibility sometimes referred to as livelock). A simple way to provide such a guarantee is to service all lock request on a “first-come, first-served” basis.

4. X locks are held until end-of-transaction (COMMIT or ROLLBACK).

The Three Concurrency Problems Protocols

The Lost Update Problem

As shown in fig, Transaction A's UPDATE at time t3 is not accepted, because it is an implicit request for an X lock on t, and such a request conflicts with the S lock already held by transaction B; so A goes into a wait state. For analogous reasons, B goes into a wait state at time t4. Now both transactions are unable to proceed, so there is no question of any update being lost. Locking thus solves the lost update problem by reducing it to another problem!-but at least it does solve the original problem.

Transaction A	Time	Transaction B
-	\|	-
-	\|	-
RETRIEVE ACC 1 :	t1	-
(acquire s lock on to)	\|	-
-	\|	-
-	t2	RETRIEVE t
-	\|	(acquire S lock on t)
-	\|	-
UPDATE t	t3	-
(request X lock on t)	\|	-
wait	\|	-
wait	t4	UPDATE t
wait	\|	(request X lock on t)
wait	\|	wait
wait	\|	wait
wait	¯	wait

No update is lost, but deadlock occurs at time t4

The Uncommitted Dependency Problem

As shown in fig. Transaction A's operation at time t2 (RETRIEVE and UPDATE in Fig.) is not accepted in either case, because it is an implicit request for a lock on t, and such a request conflicts with the X lock already held by B; so A goes into a wait state. It remains in that wait state until B reaches its termination (either COMMIT or ROLLBACK), when B's lock is released and A is able to proceed; and at that point A sees a committed value (either the pre-B value, if B terminates with rollback, or the post-B value otherwise). Either way, A is no longer dependent on an uncommitted update.

Transaction A	Time	Transaction B
-	\|	-
-	\|	-
-	t1	UPDATE t
-	\|	(acquire X lock on t)
-	\|	-
RETRIEVE t	t2	-
(request S lock on t)	\|	-
wait	\|	-
wait	t3	COMMIT / ROLLBACK
wait	\|	(release X lock on t)
resume : RETRIEVE t	t4
(acquire S lock on t)	\|
-	¯

Transaction A is prevented from seeing an uncommitted change at time t2

Transaction A	Time	Transaction B
-	\|	-
-	\|	-
-	t1	UPDATE t
-	\|	(acquire X lock on t)
-	\|	-
RETRIEVE t	t2	-
(request S lock on t)	\|	-
wait	\|	-
wait	t3	COMMIT / ROLLBACK
wait	\|	(release X lock on t)
resume : RETRIEVE t	t4
(acquire S lock on t)	\|
-	¯

Transaction A is prevented from updating an uncommitted change at time t2

The Inconsistent Analysis Problem

As shown in fig., Transaction B's UPDATE at time t6 is not accepted, because it is an implicit request for an X lock an ACC 1, and such a request conflicts with the S lock already held by A; so B goes into a wait state. Likewise, transaction A's RETREIVE at time t7 is also not accepted, because it is an implicit request for an S lock an ACC 3, and such a request conflicts with the X lock already by B; so A goes into a wait state also. Again, therefore, locking solves the original problem (the inconsistent analysis problem, in this case) by forcing a deadlock.

ACC 1

ACC 2

ACC 3

Transaction A

Time

Transaction B

RETRIEVE ACC 1 :

(acquire S lock on ACC 1)

Sum = 40

RETRIEVE ACC 2 :

(acquire S lock on ACC 2)

sum = 90

RETRIEVE ACC 3

(acquire X lock on ACC 3)

UPDATE ACC 3

(acquire X lock on ACC 3)

30 ® 20

RETRIEVE ACC 1

(acquire S lock on ACC 1)

UPDATE ACC 1

(request X lock on ACC 3)

wait

RETRIEVE ACC 3 :

wait

(request S lock on ACC 3)

wait

Inconsistent analysis is prevented, but deadlock occurs at time t7

Deadlock Handling

A system is in a deadlock state if there exists a set of transactions such that every transaction in the set is waiting for another transaction in the set. More precisely, there exists a set of waiting transaction {T_O,T₁,…,T_n} such that T_O is waiting for a data item that is held by T₁, and T₁is waiting for a data item that is held by T₂ , and.., and T_n-1 is waiting for a data item that is held by T_n, and T_n is waiting for a data item that is held by T_O. None of the transactions can make progress in such a situation.

n Occurs when two or more transactions are blocked waiting on each other in such a manner that none will make forward progress.

n Two approaches

Ü Prevention: do not let deadlocks occur.

Ü Detection and Recovery: allow deadlocks to occur, but detect it and do something about it.

Deadlock Prevention

n Three categories

Ü Order lock requests in such a way that deadlock can never occur.

Ü Abort a transaction instead of waiting when there is the potential for deadlock.

Ü Use time-outs.

n Lock ordering

Ü Simple scheme: request all locks atomically at once.

Difficult to know what to lock at the beginning of the transaction.

May hold items locked for along time unnecessarily.

n Preemptive schemes

Ü Assign a timestamp to each transaction when it begins.

Ü At the time a transaction is about to wait, decide if we allow the transaction to wait, force it to abort, or force someone else to abort.

Ü If a transaction aborts, we do not give it a new timestamp.

Ü wait-die: wait only on younger transactions; if you encounter an older transaction, abort self.

Ü wound-die: if transaction would block on a younger transaction, abort the younger transaction (i.e., wound it).

Ü Trade-offs

wait-die: older transactions wait more (but eventually finish).

wait-die: a transaction might get aborted several times before it finally completes.

1. wound-wait: may be fewer rollbacks than wait-die.

Ü Both schemes do a lot of rollbacks that may not be necessary.

n Time-out based schemes

Ü Allow a transaction to wait for at most a specified time.

Ü If the time elapses, the transaction rolls back.

Deadlock Detection and Recovery

n Construct/maintain a waits-for graph.

Ü Vertex set consists of all transactions.

Ü An edge from Vi to V_j indicates that T_i is waiting on a lock held by T_j .

n If there is a cycle in the graph, then a deadlock exists.

n Pick a participant in the deadlock and abort it.

Ü May be possible to do partial rollback (i.e., only roll back far enough to break the deadlock instead of to the beginning of the transaction).

Ü In practice, far too complex to maintain the state necessary to do this.

Ü Avoid starvation: do not want the same transaction continually getting rolled back.

Can select a policy that favors transactions previously aborted.

Buffer Management

Programs in a database system make requests on the buffer manager when they need a block from disk. If the block is already in the buffer, the requester is passed the address of the block in main memory. If the block is not in the buffer, the buffer manager first allocates space in the buffer for the block, throwing out some other block, if required, to make space for the new block. The block that is thrown out is written back to disk only if it was modified since the most recent time that it was written to the disk. Then, the buffer manager reads in the block from the disk to the buffer, and passes the address of the block in main memory to the requester. The internal actions of the buffer manager are transparent to the programs that issue disk- block requests.

Buffer manager appears to be nothing more than a virtual memory manager, like those found in most operating systems. One difference is that the size of memory addresses are not sufficient to address all disks blocks. Further, to serve the database system well, the buffer manager must use techniques more sophisticated than typical virtual memory management schemes.

Replacement strategy: When there is no room left in the buffer, a block must be removed from the buffer before a new one can read in. Typically, operating systems use a least recently used(LRU) scheme, in which the block that was referenced least recently is written back to disk and is removed from the buffer. This simple approach can be improved on for database applications.

Pined blocks: For the database system to be able to recover from crashes it is necessary to restrict those times when a block may be written back to disk. A block that is not allowed to be written back to disk is said to be pined. Although many operating systems do not provide support for pinned blocks, such a feature is essential for the implementation of a database system that is resilient to crashes.

Forced output of blocks: There are situations in which it is necessary to write back the block to disk, even thought the buffer space that is occupies is not needed. This write is call the forced output of a block Main memory contents and thus buffer contents are lost in a crash, whereas data on disk usually survive a crash.

Notes of IT

Tuesday, December 27, 2016

Concurrency control

Transaction

ACID (Atomicity, Consistency, Isolation, Durability)

Terminology

The Life and Times of a Transaction

Transaction States

Three concurrency problems

The Lost Update Problem

The Uncommitted Dependency Problem

The Inconsistent Analysis Problem

Locking

The Three Concurrency Problems Protocols

The Lost Update Problem

The Uncommitted Dependency Problem

The Inconsistent Analysis Problem

Deadlock Handling

Deadlock Prevention

Deadlock Detection and Recovery

Buffer Management

No comments:

Mobile Cloud Computing