On 10/26/21 21:39, Pavel Borisov wrote:
I've already answered OP but it seems in the wrong thread, so I copy it here:

I think now in many cases you can effectively use covering index to have fast index-only scans without index duplication. It will help if you don't have great selectivity on the last column (most probably you don't). E.g.:

CREATE INDEX ON table_name (`job`,`nlp`,`year`) INCLUDE (`scan_id`, `issue_flag`, `sequence`)

But I consider the feature can be useful when there is a very little selectivity in the first index columns. I.e. if (job`,`nlp`,`year') has many repeats and the most selection is done in the last column. I am not sure how often this can arise but in general, I see it as a useful b-tree generalization.

I'm not sure how it should be done. In my view, we need to add an ordered posting tree as a leaf element if b-tree and now we have index storage only for tuples. The change of on-disk format was previously not easy in nbtree and if we consider the change, we need an extra bit to mark posting trees among index tuples. Maybe it could be done in a way similar to deduplicated tuples if some bits in the tuple header are still could be freed.

Thoughts?

    If I get what you propose, you want to have a "top" tree for (job, nlp,
    year), which "splits" the data set into subsets of ~5000-7000 rows. And
    then for each subset you want a separate "small" trees on each of the
    other columns, so in this case three trees.

    Well, the problem with this is pretty obvious - each of the small trees
    requires separate copies of the leaf pages. And remember, in a btree the
    internal pages are usually less than 1% of the index, so this pretty
    much triples the size of the index. And if you insert a row into the
    index, it has to insert the item pointer into each of the small trees,
    likely requiring a separate I/O for each.

    So I'd bet this is not any different from just having three separate
    indexes - it doesn't save space, doesn't save I/O, nothing.

Tomas, I really think we should not try realizing this feature using existing index pages that contain only tuples. You are right, it will cause large overhead. If instead we decide and succeed in creating "posting trees" as a new on-disk page entry type we can have an index with space comparable to the abovementioned covering index but with sorting of values in these trees (i.e. all values are sorted, and "key" ones).


Well, there was no explanation about how it could/should be implemented, and maybe there is some elaborate way to handle the "posting trees" that I can't quite think of (at least not in the btree context).

I'm still rather skeptical about it - for such feature to be useful the prefix columns must not be very selective, i.e. the posting trees are expected to be fairly large (e.g. 5-7k rows). It pretty much has to to require multiple (many) index pages, in order for the "larger" btree index to be slower. And at that point I'd expect the extra overhead to be worse than simply defining multiple simple indexes.

A simple experiment would be to measure timing for queries with a condition on "sequence" using two indexes:

1) (job, nlp, year, sequence)
2) (job, nlp, year, scan_id, issue_flag, sequence)

The (1) index is "optimal" i.e. there's unlikely to be a better index for this query, at least no tree-like. (2) is something like the "worst" case index that we can use for this query.

For the new feature to be useful, two things would need to be true:

* query with (2) is much slower than (1)
* the new index would need to be close to (1)

Obviously, if the new index is slower than (2), it's mostly useless right off the bat. And it probably can't be faster than (1) in practice, as it still is basically a btree index (at least the top half).

So I'd expect the performance to be somewhere between (1) and (2), but if (2) is very close to (1) - which I'd bet it is - then the potential benefit is also pretty small.

Perhaps I'm entirely wrong and there's a new type of index, better suited for cases similar to this. The "posting tree" reference actually made me thinking that maybe btree_gin might be applicable here?


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Reply via email to