On Fri, Jul 7, 2017 at 12:45 AM Alik Khilazhev <a.khilaz...@postgrespro.ru> wrote:
> PostgreSQL shows very bad results in YCSB Workload A (50% SELECT and 50%
> UPDATE of random row by PK) on benchmarking with big number of clients using
> Zipfian distribution. MySQL also has decline but it is not significant as it
> is in PostgreSQL. MongoDB does not have decline at all. And if pgbench would
> have Zipfian distribution random number generator, everyone will be able to
> make research on this topic without using YCSB.
I've been thinking about this again, in light of my recent work to improve the B-Tree code by making all entries have fully unique keys, adding suffix truncation, and making better choices about split points [1].

Somebody posted a patch within the last year that fixed the issue by making the heavyweight lock manager lock a value rather than a heap TID. I can't find that thread right now -- perhaps someone can point it out. I suspect that my patch will have a similar effect to that other patch on the Zipfian workload, though it will do so without touching anything outside of the B-Tree code, and without changing the user-visible semantics of locking.

Can we possibly find a way to rerun this Zipfian experiment/test case? I bet Alexander's recent B-Tree patch will also help.

Here is why I suspect my patch will help with the Zipfian workload, particularly on a multi-socket machine:

The logic for following downlinks in Postgres is that downlinks are a lower bound, but we still cannot follow them when the key is equal to our insertion scankey -- we must go left, even when we have all keys filled out. This means that we sometimes end up holding an exclusive buffer lock on "the first leaf page the value could be on", even though that page doesn't have any such values (they're all in later pages, to the right). So we accidentally lock out insertions of the value 1 when we insert the value 2, and of the value 2 when we insert the value 3, and of the value 3 when we insert the value 4, and so on down the line. This is the worst possible thing for the Zipfian test case, I think, since most of the contention is artificial.

I think that my patch may ameliorate the problem, because:

* We make the tie-breaker heap TID attribute have DESC sort order, so generally speaking contention starts on the leftmost page among leaf pages full of duplicates -- for reads, and for writes.
* The patch works very hard to make sure that large groups of duplicates are not spread across pages. There would be no mixing of leaf pages containing the value 1 and the value 2 with the Zipfian test case, for example. Old duplicates would be packed into full pages.

* We can invent a fake lower-than-any-real-value heap TID in the insertion scankey in certain cases that don't have one. This way, the scankey as a whole is *not* equal to pivot tuples that are merely equal on the non-heap-TID attributes, so we *can* follow a downlink that is "equal" to our scankey, provided it has a truncated-away heap TID attribute value.

In short, the artificial "false sharing" that we see in the B-Tree for the Zipfian test case may go away. There will be *no* contention between an inserter (or really, UPDATE-er) that inserts the value 2 and concurrent inserters of the value 1. Each of the ways in which that could previously happen is fixed by the patch. There is still buffer lock contention on heap pages, and the contention of waiting for a row lock. Maybe the busy waiting will automatically be fixed, though, since you have one point of contention for each of the really hot values at the far left of the leaf level.

[1] https://commitfest.postgresql.org/20/1787/
--
Peter Geoghegan