On Wed, Sep 25, 2019 at 2:33 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > v27-0001-Index-skip-scan.patch
Some random thoughts on this: * Is _bt_scankey_within_page() really doing the right thing within empty pages? It looks like you're accidentally using the high key when the leaf page is empty with forward scans (assuming that the leaf page isn't rightmost). You'll need to think about empty pages for both forward and backward direction scans there. Actually, using the high key in some cases may be desirable, once the details are worked out -- the high key is actually very helpful with low cardinality indexes. If you populate an index with retail insertions (i.e. don't just do a CREATE INDEX after the table is populated), and use low cardinality data in the indexed columns then you'll see this effect. You can have a few hundred index entries for each distinct value, and the page split logic added to Postgres 12 (by commit fab25024) will still manage to "trap" each set of duplicates on their own distinct leaf page. Leaf pages will have a high key that looks like the values that appear on the page to the right. The goal is for forward index scans to access the minimum number of leaf pages, especially with low cardinality data and with multi-column indexes. (See also: commit 29b64d1d) A good way to see this for yourself is to get the Wisconsin Benchmark tables (the tenk1 table and related tables from the regression tests) populated using retail insertions. "CREATE TABLE __tenk1(like tenk1 including indexes); INSERT INTO __tenk1 SELECT * FROM tenk1;" is how I like to set this up. Then you can see that we only access one leaf page easily by forcing bitmap scans (i.e. "set enable* ..."), and using "EXPLAIN (analyze, buffers) SELECT ... FROM __tenk1 WHERE ...", where the SELECT query is a simple point lookup query (bitmap scans happen to instrument the index buffer accesses in a way that makes it absolutely clear how many index page buffers were touched). IIRC the existing tenk1 indexes have no more than a few hundred duplicates for each distinct value in all cases, so only one leaf page needs to be accessed by simple "key = val" queries in all cases. (I imagine that the "four" index you add in the regression test would generally need to visit more than one leaf page for simple point lookup queries, but in any case the high key is a useful way of detecting a "break" in the values when indexing low cardinality data -- these breaks are generally "aligned" to leaf page boundaries.) I also like to visualize the keyspace of indexes when poking around at that stuff, generally by using some of the queries that you can find on the Wiki [1]. * The extra scankeys that you are using in most of the new nbtsearch.c code is an insertion scankey -- not a search style scankey. I think that you should try to be a bit clearer on that distinction in comments. It is already confusing now, but at least there is only zero or one insertion scankeys per scan (for the initial positioning). * There are two _bt_skip() prototypes in nbtree.h (actually, there is a btskip() and a _bt_skip()). I understand that the former is a public wrapper of the latter, but I find the naming a little bit confusing. Maybe rename _bt_skip() to something that is a little bit more suggestive of its purpose. * Suggest running pgindent on the patch. [1] https://wiki.postgresql.org/wiki/Index_Maintenance#Summarize_keyspace_of_a_B-Tree_index -- Peter Geoghegan