I agree with the above mentioned. The only concern I have is that we compare little wrong things. For read we should compare (job, nlp, year, sequence) AND (job, nlp, year, Scan_ID) and (job, nlp, year, issue_flag ) VS (job, nlp, year, sequence, Scan_ID, issue_flag) OR (job, nlp, year INCLUDE(sequence, Scan_ID, issue_flag) ) Because our proposed index for reading should be closer to a combination of those 3 and we have current solutions like index on all or with Include statement. We should try to find a gap between these three cases. For DML queries (job, nlp, year, sequence, Scan_ID, issue_flag) OR (job, nlp, year INCLUDE(sequence, Scan_ID, issue_flag) ) VS (job, nlp, year, sequence) AND (job, nlp, year, Scan_ID) and (job, nlp, year, issue_flag ) Because again the proposed index should be just one and cover all 3 separate ones.
If you agree with these cases I will try to find a bigger time frame to compare these two cases deeper. The issue is not high prio but I strongly believe it can help and can be nice feature for even more complicated cases. Best regards. вс, 31 окт. 2021 г. в 21:33, Tomas Vondra <tomas.von...@enterprisedb.com>: > > > On 10/31/21 16:48, Pavel Borisov wrote: > > 4 columns: 106 ms > > 6 columns: 109 ms > > > > So there's like 3% difference between the two cases, and even that > > might > > be just noise. This is consistent with the two indexes being about > the > > same size. > > > > I also don't think we can get great speedup in the mentioned case, so it > > is not urgently needed of course. My point is that it is just nice to > > have a multicolumn index constructed on stacked trees constructed on > > separate columns, not on the index tuples as a whole thing. > > Well, I'd say "nice to have" features are pointless unless they actually > give tangible benefits (like speedup) to users. I'd bet no one is going > to implement and maintain something unless it has such benefit, because > they have to weight it against other beneficial features. > > Maybe there are use cases where this would be beneficial, but so far we > haven't seen one. Usually it's the OP who presents such a case, and a > plausible way to improve it - but it seems this thread presents a > solution and now we're looking for an issue it might solve. > > > At least there is a benefit of sparing shared memory if we don't need > > to cache index tuples of several similar indexes, instead caching one > > "compound index". So if someone wants to propose this thing I'd > > support it provided problems with concurrency, which were mentioned > > by Peter are solved. > > > > The problem with this it assumes the new index would use (significantly) > less space than three separate indexes. I find that rather unlikely, but > maybe there is a smart way to achieve that (certainly not in detail). > > I don't want to sound overly pessimistic and if you have an idea how to > do this, I'd like to hear it. But it seems pretty tricky, particularly > if we assume the suffix columns are more variable (which limits the > "compression" ratio etc.). > > > These problems could be appear easy though, as we have index tuples > > constructed in a similar way as heap tuples. Maybe it could be easier if > > we had another heap am, which stored separate attributes (if so it could > > be useful as a better JSON storage method than we have today). > > > > IMO this just moved the goalposts somewhere outside the solar system. > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >