I agree with the above mentioned.
The only concern I have is that we compare little wrong things.
For read we should compare
 (job, nlp, year, sequence) AND (job, nlp, year, Scan_ID) and (job, nlp,
year,  issue_flag  ) VS  (job, nlp, year, sequence, Scan_ID, issue_flag)
OR  (job, nlp, year INCLUDE(sequence, Scan_ID, issue_flag) )
Because our proposed index for reading should be closer to a combination of
those 3 and we have current solutions like index on all or with Include
statement.
We should try to find a gap between these three cases.
For DML queries
 (job, nlp, year, sequence, Scan_ID, issue_flag) OR  (job, nlp, year
INCLUDE(sequence, Scan_ID, issue_flag) ) VS  (job, nlp, year, sequence) AND
(job, nlp, year, Scan_ID) and (job, nlp, year,  issue_flag  )
Because again the proposed index should be just one and cover all 3
separate ones.

If you agree with these cases I will try to find a bigger time frame to
compare these two cases deeper.
The issue is not high prio but I strongly believe it can help and can be
nice feature for even more complicated cases.

Best regards.




вс, 31 окт. 2021 г. в 21:33, Tomas Vondra <tomas.von...@enterprisedb.com>:

>
>
> On 10/31/21 16:48, Pavel Borisov wrote:
> >        4 columns: 106 ms
> >        6 columns: 109 ms
> >
> >     So there's like 3% difference between the two cases, and even that
> >     might
> >     be just noise. This is consistent with the two indexes being about
> the
> >     same size.
> >
> > I also don't think we can get great speedup in the mentioned case, so it
> > is not urgently needed of course. My point is that it is just nice to
> > have a multicolumn index constructed on stacked trees constructed on
> > separate columns, not on the index tuples as a whole thing.
>
> Well, I'd say "nice to have" features are pointless unless they actually
> give tangible benefits (like speedup) to users. I'd bet no one is going
> to implement and maintain something unless it has such benefit, because
> they have to weight it against other beneficial features.
>
> Maybe there are use cases where this would be beneficial, but so far we
> haven't seen one. Usually it's the OP who presents such a case, and a
> plausible way to improve it - but it seems this thread presents a
> solution and now we're looking for an issue it might solve.
>
> > At least there is a benefit of sparing shared memory if we don't need
> > to cache index tuples of several similar indexes, instead caching one
> > "compound index". So if someone wants to propose this thing I'd
> > support it provided problems with concurrency, which were mentioned
> > by Peter are solved.
> >
>
> The problem with this it assumes the new index would use (significantly)
> less space than three separate indexes. I find that rather unlikely, but
> maybe there is a smart way to achieve that (certainly not in detail).
>
> I don't want to sound overly pessimistic and if you have an idea how to
> do this, I'd like to hear it. But it seems pretty tricky, particularly
> if we assume the suffix columns are more variable (which limits the
> "compression" ratio etc.).
>
> > These problems could be appear easy though, as we have index tuples
> > constructed in a similar way as heap tuples. Maybe it could be easier if
> > we had another heap am, which stored separate attributes (if so it could
> > be useful as a better JSON storage method than we have today).
> >
>
> IMO this just moved the goalposts somewhere outside the solar system.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Reply via email to