While talking to various people during pgCon, I was reminded that the nbtree README does a poor job of explaining the actual practical advantages of L&Y from a high level. The "move right" trick is initially mentioned only as an adjunct to a discussion of the special-case handling of root page splits, even though it's of huge significance. We also say this within nbtinsert.c:
/* * If the page was split between the time that we surrendered our read * lock and acquired our write lock, then this page may no longer be the * right place for the key we want to insert. In this case, we need to * move right in the tree. See Lehman and Yao for an excruciatingly * precise description. */ (Why we need to say something about _bt_moveright() within _bt_doinsert() that is equally true of both _bt_moveright() callers isn't clear). I think that this isn't enough. Attached patch proposes to add a small paragraph at the top of the nbtree README, to clarify the advantages of L&Y from a high level. I don't think it's appropriate to state the advantages of an algorithm in a README file generally, but in this instance it makes it easier to understand the algorithm. -- Peter Geoghegan
*** a/src/backend/access/nbtree/README --- b/src/backend/access/nbtree/README *************** use a simplified version of the deletion *** 11,16 **** --- 11,36 ---- Shasha (V. Lanin and D. Shasha, A Symmetric Concurrent B-Tree Algorithm, Proceedings of 1986 Fall Joint Computer Conference, pp 380-389). + Lehman and Yao don't require read locks, but assume that in-memory + copies of tree pages are unshared. Postgres shares in-memory buffers + among backends. As a result, we do page-level read locking on btree + pages in order to guarantee that no record is modified while we are + examining it. This reduces concurrency but guarantees correct + behavior. An advantage is that when trading in a read lock for a + write lock, we need not re-read the page after getting the write lock. + Since we're also holding a pin on the shared buffer containing the + page, we know that buffer still contains the page and is up-to-date. + + Even with these read locks, Lehman and Yao's approach obviates the + need of earlier schemes to hold multiple read locks concurrently when + descending the tree as part of servicing index scans (pessimistic lock + coupling). The addition of right-links at all levels, as well as the + addition of a page "high key" allows detection of, and dynamic + recovery from concurrent page splits (that is, splits between + unlocking an internal page, and subsequently locking its child page + during a descent). L&Y Trees are sometimes referred to as "B-Link + trees" in the literature. + The Lehman and Yao Algorithm and Insertions ------------------------------------------- *************** to be inserted has a choice whether or n *** 42,57 **** key could go on either page. (Currently, we try to find a page where there is room for the new key without a split.) - Lehman and Yao don't require read locks, but assume that in-memory - copies of tree pages are unshared. Postgres shares in-memory buffers - among backends. As a result, we do page-level read locking on btree - pages in order to guarantee that no record is modified while we are - examining it. This reduces concurrency but guarantees correct - behavior. An advantage is that when trading in a read lock for a - write lock, we need not re-read the page after getting the write lock. - Since we're also holding a pin on the shared buffer containing the - page, we know that buffer still contains the page and is up-to-date. - We support the notion of an ordered "scan" of an index as well as insertions, deletions, and simple lookups. A scan in the forward direction is no problem, we just use the right-sibling pointers that --- 62,67 ----
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers