On Fri, May 24, 2019 at 10:25 AM Jeremy Finzel <finz...@gmail.com> wrote:
> I have been hoping for clearer direction from the community about > specifically btree_gin indexes for low cardinality columns (as well as low > cardinality multi-column indexes). In general there is very little > discussion about this both online and in the docs. Rather, the emphasis > for GIN indexes discussed is always on full text search of JSON indexing, > not btree_gin indexes. > > However, I have never been happy with the options open to me for indexing > low cardinality columns and was hoping this could be a big win. Often I > use partial indexes as a solution, but I really want to know how many use > cases btree_gin could solve better than either a normal btree or a partial > index. > > Here are my main questions: > > 1. > > "The docs say regarding *index only scans*: The index type must support > index-only scans. B-tree indexes always do. GiST and SP-GiST indexes > support index-only scans for some operator classes but not others. Other > index types have no support. The underlying requirement is that the index > must physically store, or else be able to reconstruct, the original data > value for each index entry. As a counterexample, GIN indexes cannot support > index-only scans because each index entry typically holds only part of the > original data value." > > This is confusing to say "B-tree indexes always do" and "GIN indexes > cannot support index-only scans", when we have a btree_gin index type. > Explanation please ??? > > Is it true that for a btree_gin index on a regular column, "each index > entry typically holds only part of the original data value"? Do these > still not support index only scans? Could they? I can't see why they > shouldn't be able to for a single indexed non-expression field? > > 2. > > Lack of index only scans is definitely a downside. However, I see > basically identical performance, but way less memory and space usage, for > gin indexes. In terms of read-only performance, if index only scans are > not a factor, why not always recommend btree_gin indexes instead of regular > btree for low cardinality fields, which will yield similar performance but > use far, far less space and resources? > > 3. > > This relates to 2. I understand the write overhead can be much greater > for GIN indexes, which is why the fastupdate feature exists. But again, in > those discussions in the docs, it appears to me they are emphasizing that > penalty more for full text or json GIN indexes. Does the same overhead > apply to a btree_gin index on a regular column with no expressions? > > Those are my questions. > > FYI, I can see an earlier thread about this topic ( > https://www.postgresql.org/message-id/flat/E260AEE7-95B3-4142-9A4B-A4416F1701F0%40aol.com#5def5ce1864298a3c0ba2d4881a660c2), > but a few questions were left unanswered and unclear there. > > I first started seriously considering using btree_gin indexes for low > cardinality columns, for example some text field with 30 unique values > across 100 million rows, after reading a summary of index types from > Bruce's article: https://momjian.us/main/writings/pgsql/indexing.pdf > > This article was also helpful but yet again I wonder it's broader > viability: > http://hlinnaka.iki.fi/2014/03/28/gin-as-a-substitute-for-bitmap-indexes/ > > > Thank you! > Jeremy > Could anyone shed any light on these questions? I appreciate it. Thanks, Jeremy