Hi All! 1. Generally the difference of my patch in comparison to Tom's patch 0001 is that I tried to move previous logic of GIN's own TS_execute_ternary() to the general logic of TS_execute_recurse and in case we have index without positions to avoid diving into phrase operator replacing (only in this case) in by an AND operator. The reason for this I suppose is speed and I've done testing of some corner cases like phrase operator with big number of OR comparisons inside it.
----------------------------- BEFORE ANY PATCH: Bitmap Heap Scan on pglist (cost=1715.72..160233.31 rows=114545 width=1234) (actual time=652.294..2719.961 rows=4904 loops=1) Recheck Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' | ''index'' | ''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' | ''think'' | ''slow'' | ''articl'' | ''knowledg'' | ''join'' | ''need'' | ''experi'' | ''understand'' | ''free'' | ''say'' | ''comment'' | ''littl'' | ''move'' | ''function'' | ''new'' | ''never'' | ''general'' | ''get'' | ''java'' | ''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <-> ''start'''::tsquery) Rows Removed by Index Recheck: 108191 Heap Blocks: exact=73789 -> Bitmap Index Scan on pglist_fts_idx (cost=0.00..1687.09 rows=114545 width=0) (actual time=636.883..636.883 rows=113095 loops=1) Index Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' | ''index'' | ''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' | ''think'' | ''slow'' | ''articl'' | ''knowledg'' | ''join'' | ''need'' | ''experi'' | ''understand'' | ''free'' | ''say'' | ''comment'' | ''littl'' | ''move'' | ''function'' | ''new'' | ''never'' | ''general'' | ''get'' | ''java'' | ''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <-> ''start'''::tsquery) Planning Time: 3.016 ms Execution Time: *2721.002 ms* ------------------------------- AFTER TOM's PATCH (0001) Bitmap Heap Scan on pglist (cost=1715.72..160233.31 rows=114545 width=1234) (actual time=916.640..2960.571 rows=4904 loops=1) Recheck Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' | ''index'' | ''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' | ''think'' | ''slow'' | ''articl'' | ''knowledg'' | ''join'' | ''need'' | ''experi'' | ''understand'' | ''free'' | ''say'' | ''comment'' | ''littl'' | ''move'' | ''function'' | ''new'' | ''never'' | ''general'' | ''get'' | ''java'' | ''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <-> ''start'''::tsquery) Rows Removed by Index Recheck: 108191 Heap Blocks: exact=73789 -> Bitmap Index Scan on pglist_fts_idx (cost=0.00..1687.09 rows=114545 width=0) (actual time=900.472..900.472 rows=113095 loops=1) Index Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' | ''index'' | ''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' | ''think'' | ''slow'' | ''articl'' | ''knowledg'' | ''join'' | ''need'' | ''experi'' | ''understand'' | ''free'' | ''say'' | ''comment'' | ''littl'' | ''move'' | ''function'' | ''new'' | ''never'' | ''general'' | ''get'' | ''java'' | ''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <-> ''start'''::tsquery) Planning Time: 2.688 ms Execution Time: *2961.704 ms* ---------------------------- AFTER MY PATCH (gin-gist-weight-patch-v3) Bitmap Heap Scan on pglist (cost=1715.72..160233.31 rows=114545 width=1234) (actual time=616.982..2710.571 rows=4904 loops=1) Recheck Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' | ''index'' | ''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' | ''think'' | ''slow'' | ''articl'' | ''knowledg'' | ''join'' | ''need'' | ''experi'' | ''understand'' | ''free'' | ''say'' | ''comment'' | ''littl'' | ''move'' | ''function'' | ''new'' | ''never'' | ''general'' | ''get'' | ''java'' | ''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <-> ''start'''::tsquery) Rows Removed by Index Recheck: 108191 Heap Blocks: exact=73789 -> Bitmap Index Scan on pglist_fts_idx (cost=0.00..1687.09 rows=114545 width=0) (actual time=601.586..601.586 rows=113095 loops=1) Index Cond: (fts @@ '( ''worth'' | ''good'' | ''result'' | ''index'' | ''anoth'' | ''know'' | ''like'' | ''tool'' | ''job'' | ''think'' | ''slow'' | ''articl'' | ''knowledg'' | ''join'' | ''need'' | ''experi'' | ''understand'' | ''free'' | ''say'' | ''comment'' | ''littl'' | ''move'' | ''function'' | ''new'' | ''never'' | ''general'' | ''get'' | ''java'' | ''postgresql'' | ''notic'' | ''recent'' | ''serious'' ) <-> ''start'''::tsquery) Planning Time: 3.115 ms Execution Time: *2711.533 ms* I've done the test several times and seems that difference is real effect, though not very big (around 7%). So maybe there is some reason to save PHRASE_AS_AND behavior for GIN-GIST indexes despite migration from GIN's own TS_execute_ternary() to general TS_execute_recurse. 2. As for shifting from bool to ternary callback I am not quite sure whether it can be useful to save bool callbacks via bool-ternary wrapper. We can include this for compatibility with old callers and can drop. Any ideas? 3. As for patch 0002 which removes TS_EXEC_CALC_NOT flag I'd like to note that indexes which are written as extensions like RUM index ( https://github.com/postgrespro/rum) use this flag as default behavior of TS_execute was NOT doing TS_EXEC_CALC_NOT. If we's like to change this default it can break the callers. At least I propose to leave TS_EXEC_CALC_NOT definition in ts_utils.h but in general I'd like to save default behaviour of TS_execute and not apply patch 0002. Maybe it is only worth to leave notice in a comments in code that TS_EXEC_CALC_NOT left for compatibilty reasons etc. I'd appreciate any ideas and review of all aforementioned patches. Best regards, Pavel Borisov. ср, 20 мая 2020 г. в 18:04, Pavel Borisov <pashkin.e...@gmail.com>: > 1. Really if it's possible to avoid bool callbacks at all and shift > everywhere to ternary it makes code quite beautiful and even. But I also > think we are still not obliged to drop support for (legacy or otherwise) > bool callbacks and also consistent functions form some old extensions (I > don't know for sur, whether they exist) which expect old style bool result > from TS_execute. > > In my patch I used ternary logic from TS_execute_recurse on, which can be > called by "new" ternary consistent callers and leave bool TS_execute, which > works as earlier. It also makes callback function wrapping to allow some > hypothetical old extension enjoy binary behavior. I am not sure it is very > much necessary but as it is not hard I'd propose somewhat leave this > feature by combining patches. > > 2. Overall I see two reasons to consider when choosing ternary/boolean > calls in TS_execute: speed and compatibility. I'd like to make some > performance tests for different types of queries (plain without weights, > and containing weights in some or all operands) to evaluate first of these > effects in both cases. > > Then we'll have reasons to commit a certain type of patch or maybe some > combination of them. > > Best regards, > Pavel Borisov. > > вс, 17 мая 2020 г. в 23:53, Pavel Borisov <pashkin.e...@gmail.com>: > >> Hi, all! >> Below is my variant how to patch Gin-Gist weights issue: >> 1. First of all I propose to shift from previously Gin's own TS_execute >> variant and leave only two: TS_execute with bool result and bool type >> callback and ternary TS_execute_recurse with ternary callback. I suppose >> all legacy consistent callers can still use bool via provided wrapper. >> 2. I integrated logic for indexes which do not support weights and >> positions inside (which gives MAYBE in certain cases on negation) inside >> previous TS_execute_recurse function called with additional flag for this >> class of indexes. >> 3. Check function for GIST and GIN now gives ternary result and is called >> with ternary type callback. I think in future nothing prevents smoothly >> shifting callback functions, check functions and even TS_execute result to >> ternary. >> >> So I also send my variant patch for review and discussion. >> >> Regards, >> Pavel Borisov >> >> вс, 17 мая 2020 г. в 03:14, Tom Lane <t...@sss.pgh.pa.us>: >> >>> I wrote: >>> > I think the root of the problem is that if we have a query using >>> > weights, and we are testing tsvector data that lacks positions/weights, >>> > we can never say there's definitely a match. I don't see any decently >>> > clean way to fix this without redefining the TSExecuteCallback API >>> > to return a tri-state YES/NO/MAYBE result, because really we need to >>> > decide that it's MAYBE at the level of processing the QI_VAL node, >>> > not later on. I'd tried to avoid that in e81e5741a, but maybe we >>> > should just bite that bullet, and not worry about whether there's >>> > any third-party code providing its own TSExecuteCallback routine. >>> > codesearch.debian.net suggests that there are no external callers >>> > of TS_execute, so maybe we can get away with that. >>> >>> 0001 attached is a proposed patch that does it that way. Given the >>> API break involved, it's not quite clear what to do with this. >>> ISTM we have three options: >>> >>> 1. Ignore the API issue and back-patch. Given the apparent lack of >>> external callers of TS_execute, maybe we can get away with that; >>> but I wonder if we'd get pushback from distros that have automatic >>> ABI-break detectors in place. >>> >>> 2. Assume we can't backpatch, but it's still OK to slip this into >>> v13. (This option clearly has a limited shelf life, but I think >>> we could get away with it until late beta.) >>> >>> 3. Assume we'd better hold this till v14. >>> >>> I find #3 unduly conservative, seeing that this is clearly a bug >>> fix, but on the other hand #1 is a bit scary. Aside from the API >>> issue, it's not impossible that this has introduced some corner >>> case behavioral changes that we'd consider to be new bugs rather >>> than bug fixes. >>> >>> Anyway, some notes for reviewers: >>> >>> * The core idea of the patch is to make the TS_execute callbacks >>> have ternary results and to insist they return TS_MAYBE in any >>> case where the correct result is uncertain. >>> >>> * That fixes the bug at hand, and it also allows getting rid of >>> some kluges at higher levels. The GIN code no longer needs its >>> own TS_execute_ternary implementation, and the GIST code no longer >>> needs to suppose that it can't trust NOT results. >>> >>> * I put some effort into not leaking memory within tsvector_op.c's >>> checkclass_str and checkcondition_str. (The final output array >>> can still get leaked, I believe. Fixing that seems like material >>> for a different patch, and it might not be worth any trouble.) >>> >>> * The new test cases in tstypes.sql are to verify that we didn't >>> change behavior of the basic tsvector @@ tsquery code. There wasn't >>> any coverage of these cases before, and the logic for checkclass_str >>> without position info had to be tweaked to preserve this behavior. >>> >>> * The new cases in tsearch verify that the GIN and GIST code gives >>> the same results as the basic operator. >>> >>> Now, as for the 0002 patch attached: after 0001, the only TS_execute() >>> callers that are not specifying TS_EXEC_CALC_NOT are hlCover(), >>> which I'd already complained is probably a bug, and the first of >>> the two calls in tsrank.c's Cover(). It seems difficult to me to >>> argue that it's not a bug for Cover() to process NOT in one call >>> but not the other --- moreover, if there was any argument for that >>> once upon a time, it probably falls to the ground now that (a) we >>> have a less buggy implementation of NOT and (b) the presence of >>> phrase queries significantly raises the importance of not taking >>> short-cuts. Therefore, 0002 attached rips out the TS_EXEC_CALC_NOT >>> flag and has TS_execute compute NOT expressions accurately all the >>> time. >>> >>> As it stands, 0002 changes no regression test results, which I'm >>> afraid speaks more to our crummy test coverage than anything else; >>> tests that exercise those two functions with NOT-using queries >>> would easily show that there is a difference. >>> >>> Even if we decide to back-patch 0001, I would not suggest >>> back-patching 0002, as it's more nearly a definitional change >>> than a bug fix. But I think it's a good idea anyway. >>> >>> I'll stick this in the queue for the July commitfest, in case >>> anybody wants to review it. >>> >>> regards, tom lane >>> >>>