On Thu, Jul 4, 2019 at 5:06 AM Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote: > i - number of distinct values in the index. > So i=1 means that all rows have the same key, > and i=10000000 means that all keys are different. > > i / old size (MB) / new size (MB) > 1 215 88 > 1000 215 90 > 100000 215 71 > 10000000 214 214 > > For more, see the attached diagram with test results.
I tried this on my own "UK land registry" test data [1], which was originally used for the v12 nbtree work. My test case has a low cardinality, multi-column text index. I chose this test case because it was convenient for me. On v12/master, the index is 1100MB. Whereas with your patch, it ends up being 196MB -- over 5.5x smaller! I also tried it out with the "Mouse genome informatics" database [2], which was already improved considerably by the v12 work on duplicates. This is helped tremendously by your patch. It's not quite 5.5x across the board, of course. There are 187 indexes (on 28 tables), and almost all of the indexes are smaller. Actually, *most* of the indexes are *much* smaller. Very often 50% smaller. I don't have time to do an in-depth analysis of these results today, but clearly the patch is very effective on real world data. I think that we tend to underestimate just how common indexes with a huge number of duplicates are. [1] https://https:/postgr.es/m/cah2-wzn_nayk4pr0hrwo0stwhmxjp5qyu+x8vppt030xpqr...@mail.gmail.com [2] http://www.informatics.jax.org/software.shtml -- Peter Geoghegan