数据库中的锁[lock]是用于并发控制[concurrency control]的重要机制,其本质是对特定数据的控制。一般来说,在应用了锁机制的数据库系统中,如果想要访问/修改任何一个数据,则一定是要先对该数据加锁,访问之后再释放锁。锁主要有两种形式,共享锁[shared lock]和排它锁[exclusive lock]。

1 Lock

1.1 Shared Lock

共享锁又称读锁,如果Txn T对数据A加上了共享锁,那么Txn T可以读A,但不能修改A,其他Txn只能在对数据A加共享锁而不能加排它锁。加上了共享所,任何Txn包括加锁的Txn本身都不能对数据进行修改(但是允许读取),以此来保证当前读取数据的准确性。

If a transaction T has obtained a shared-mode lock (denoted by S) on item Q, then T can read, but cannot write, Q. – Database System Concepts 6th Edition

1.2 Exclusive Lock

排它锁又称写锁,如果Txn T对于数据A加上了排它锁,那么只允许T对A进行读取和修改,其他任何事务都不能再对A加任何类型的锁,直到T释放该排它锁。这就防止其他Txn在A上进行任何的读取和修改,直到该排它锁被释放。

If a transaction T has obtained an exclusive-mode lock (denoted by X) on item Q, then T can both read and write Q.

1.3 Lock Compatibility

锁的相容矩阵,只有shared lock和shared lock可以相互兼容,剩下的都不可以。

To access a data item, transaction T must first lock that item. If the data item is already locked by another transaction in an incompatible mode, the concurrencycontrol manager will not grant the lock until all incompatible locks held by other transactions have been released. Thus, T is made to wait until all incompatible locks held by other transactions have been released.

X S -
X false false true
S false true true
- true true true

1.4 Lock vs Latch

Lock和Latch是两个在数据库和分布式系统中概念相近的两个词。

Locks on buffer blocks are unrelated to locks used for concurrency-control of transactions, and releasing them in a non-two-phase manner does not have any implications on transaction serializability. These locks, and other similar locks that are held for a short duration, are often referred to as latches. – Database System Concepts 6th Edition

对于lock和latch的区别,在于首先latch作用在buffer block也就是内存中,而lock应该是作用在数据库表中或者指定数据集上(当然如果是disk database那就是作用在disk上,如果是memory database那么也是作用在内存中)。第二个关键区别在于,latch的作用时间一般比较短,主要是互斥作用比如短时间pin住一个数据进行操作,而lock的作用时间比较长,一般都是要便随整个流程。

关于Oracle中lock和latch的区别可以参考这篇[http://www.dba-oracle.com/t_difference_latch_lock.html]

2 Deadlock

死锁一般指的是一组Txn之间的锁相互依赖,造成任何一个Txn都无法继续进行导致系统无限次询问锁释放而进入系统空转。对于死锁有两种解决方案,一种采取一定的方法预防死锁的方法[deadlock prevention],另一种是允许死锁的发生,但提供机制进行定期检测,如果有则用对应的方案处理[deadlock detection and recovery]

A system is in a deadlock state if there exists a set of transactions such that every transaction in the set is waiting for another transaction in the set. More precisely, there exists a set of waiting transactions {T0, T1, …, Tn} such that T0 is waiting for a data item that T1 holds, and T1 is waiting for a data item that T2 holds, and…, and Tn−1 is waiting for a data item that Tn holds, and Tn is waiting for a data item that T0 holds. None of the transactions can make progress in such a situation.

2.1 Deadlock Prevention

prevention的思路就是破坏产生死锁的条件,从图的角度看死锁,其实就是在一个由Txn代表点,依赖关系代表边的图中产生了闭环,那么只要防止闭环产生就可以预防死锁。Prevention主要有三种方法,一次封锁法,顺序封锁法,限时封锁法。

2.1.1 Locks all before execution

这种方法要求Txn在执行前必须一次性将所有需要使用的数据全部加锁,否则就不能执行下去。显而易见,这种做法进行了一次数据锁的提前检查,并且只在所有数据都可以加锁而不需要依赖其他Txn的锁解除的情况下才执行Txn,这就100%预防了死锁的发生。但是这么做的缺点也很明显,首先这么做会大大降低整个数据库系统的并行度和I/O性能。其次,很多时候很难准确的知道整个Txn会用上哪些数据,或者说知道这个事比较麻烦,所以只好扩大锁的范围,比如无法确定需要表中哪些数据,只好锁住整个表,这么做就进一步降低了并发度。所以,只在死锁条件经常可能出现并且严格防止死锁的系统中才会使用。

The simplest scheme under the first approach requires that each transaction locks all its data items before it begins execution. Moreover, either all are locked in one step or none are locked. There are two main disadvantages to this protocol: (1) it is often hard to predict, before the transaction begins, what data items need to be locked; (2) data-item utilization may be very low, since many of the data items may be locked but unused for a long time.

2.1.2 Make ordering

顺序封锁就是对于数据预先设置一个加锁顺序,所有Txn都按照这个顺序进行加锁,比如在B Tree的index中规定加锁必须是从root到leaf,或者从leaf到root进行加锁。但是这个方法也很难实现,首先Txn的执行很多时候都是动态决定的,无法实现预测,其次,数据库的变化也是常态,要维护一个固定的顺序成本很高。

Another approach for preventing deadlocks is to impose an ordering of all data items, and to require that a transaction lock data items only in a sequence consistent with the ordering. We have seen one such scheme in the tree protocol, which uses a partial ordering of data items.

这里介绍两种基于timestamp的死锁预防方法

  1. The wait–die scheme is a nonpreemptive technique. When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp smaller than that of Tj (that is, Ti is older than Tj). Otherwise, Ti is rolled back (dies). For example, suppose that transactions T1, T2, and T3 have timestamps 5, 10, and 15, respectively. If T1 requests a data item held by T2, then T1 will wait. If T3 requests a data item held by T2, then T3 will be rolled back.

一般来说在timestamp机制中,后到的Txn的timestamp会大于先来的Txn,所以wait-die方法的意思是,先来的等后到的,如果T1的timestamp较小而且申请访问的数据正在被一个timestamp较大的Txn加锁,那么T1就会等待,否则就会回滚T1。之所以说wait-die是nonpreemptive technique,就是因为timestamp大小只是决定申请对数据加锁的Txn的后续动作,并不会改变已经获得锁的Txn的状态。

  1. The wound–wait scheme is a preemptive technique. It is a counterpart to the wait–die scheme. When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp larger than that of Tj (that is, Ti is younger than Tj). Otherwise, Tj is rolled back (Tj is wounded by Ti). Returning to the example, with transactions T1, T2, and T3, if T1 requests a data item held by T2, then the data item will be preempted from T2, and T2 will be rolled back. If T3 requests a data item held by T2, then T3 will wait.

wound-wait机制是由timestamp决定请求数据的Txn是等待锁释放,还是强行占有已经被加锁的数据(此时加锁的Txn要被回滚)

2.1.3 Setup a timeout

还有一种简单的方法就是对每个锁设置一个时限,比如每个锁最多维持30s,30s之后锁就会被自动解锁,这样做也不会产生死锁,因为不会陷入无限制的锁状态检测,但是人为规定一个时间很难满足所有Txn的要求,一旦Txn执行过程中锁被自动解除,造成并发操作错误,危险性也是很高的。

2.2 Deadlock Detection & Recovery

死锁检测和恢复是另一种设计理念,与其预防不如让死锁发生只要定期检测并且清除就可以了,这样就将死锁的危害限制在一个固定的检测时间窗之内。

2.2.1 Deadlock detection####

死锁检测可以用wait-for graph图完成,死锁的大原则是不同的Txn之间相互依赖,也就是说在图中形成闭环,比如如果Txn1等待Txn2,就存在有向边从Txn1到Txn2,如果通过waitfor graph能发现闭环,那么就是存在deadlock。此时就是使用recovery处理

2.2.2 Deadlock recovery

解除死锁就是破坏wait-for图中存在的闭环。所以只要rollback其中一个或者几个Txn,那么这个闭环就不存在了。简单的rollback方式可以直接rollback闭环中所有Txn及其相关Txn,也可以简单的随机选择一个来进行rollback,更合理的方式是估计rollback代价最小的Txn进行rollback(代价的衡量方式有很多,比如涉及的数据量和有关的Txn),当然我们也要保证不能每次都rollback同一个Txn,因为这样这个Txn永远也不乎执行完毕。

3 Two-Phase Locking Protocol

两阶段锁协议是一种保证可串行化[serialibility]的加锁方式,将Txn分为两个阶段,第一阶段是加锁也称为扩展阶段[Growing Phase],在此阶段Txn申请获得任何数据的任何类型锁,但是不能释放任何锁。第二阶段是解锁也称为收缩阶段[Shrinking Phase],Txn可以释放任何数据上的任何类型锁但是不能申请添加任何锁。如果并发执行的所有Txn都遵守2PL,那么对于这些Txn的任何并发操作调度都是可串行化的,但是2PL不能保证没有死锁

  1. Growing Phase: A transaction may obtain locks, but may not release any lock.

  2. Shrinking Phase: A transaction may release locks, but may not obtain any new locks.

2PL的一个变种strict two-phase locking protocol可以解决Txn的级联问题

Cascading rollbacks can be avoided by a modification of two-phase locking called the strict two-phase locking protocol. This protocol requires not only that locking be two phase, but also that all exclusive-mode locks taken by a transaction be held until that transaction commits. This requirement ensures that any data written by an uncommitted transaction are locked in exclusive mode until the transaction commits, preventing any other transaction from reading the data.

4 Multiple Granularity and Intention Lock

数据库的Granularity hierarchy通常适用一棵树来表示,根节点一般表示整个数据库,叶子节点可能是单个数据。可以通过这个树对来进行不同层级的加锁达到不同的并发控制的目的。比如我想对整个数据库加锁,那么在这个树中root节点就会被锁住,但是这样是不够的,因为其子节点还没有被锁住,所以还需要遍历整个tree并将所有节点锁住,而且还要检查是否存在锁冲突。更高效的办法就是使用意向锁,如果对一个节点加意向锁,则说明该节点的下层节点正在被加锁,对任何节点解锁必须先对其上层节点添加意向锁

A more efficient way to gain this knowledge is to introduce a new class of lock modes, called intention lock modes. If a node is locked in an intention mode, explicit locking is done at a lower level of the tree (that is, at a finer granularity). Intention locks are put on all the ancestors of a node before that node is locked explicitly. Thus, a transaction does not need to search the entire tree to determine whether it can lock a node successfully.