On Wed, Sep 18, 2024 at 10:00 PM Tatsuo Ishii <is...@postgresql.org> wrote: > > Attached are the v22 patches. Just rebased.
Thanks! With some bigger partitions, I hit an `ERROR: wrong pos: 1024`. A test that reproduces it is attached. While playing with the feature, I've been trying to identify runs of matched rows by eye. But it's pretty difficult -- the best I can do is manually count rows using a `COUNT(*) OVER ...`. So I'd like to suggest that MEASURES be part of the eventual v1 feature, if there's no other way to determine whether a row was skipped by a previous match. (That was less obvious to me before the fix in v17.) -- I've been working on an implementation [1] of SQL/RPR's "parenthesized language" and preferment order. (These are defined in SQL/Foundation 2023, section 9.41.) The tool gives you a way to figure out, for a given pattern, what matches are supposed to be attempted and in what order: $ ./src/test/modules/rpr/rpr_prefer "a b? a" ( ( a ( b ) ) a ) ( ( a ( ) ) a ) Many simple patterns result in an infinite set of possible matches. So if you use an unbounded quantifiers, you have to also use --max-rows to limit the size of the hypothetical window frame: $ ./src/test/modules/rpr/rpr_prefer --max-rows 2 "^ PERMUTE(a*, b+)? $" ( ( ^ ( ( ( ( ( ( a ) ( b ) ) ) - ) ) ) ) $ ) ( ( ^ ( ( ( ( ( ( ) ( b b ) ) ) - ) ) ) ) $ ) ( ( ^ ( ( ( ( ( ( ) ( b ) ) ) - ) ) ) ) $ ) ( ( ^ ( ( ( - ( ( ( b b ) ( ) ) ) ) ) ) ) $ ) ( ( ^ ( ( ( - ( ( ( b ) ( a ) ) ) ) ) ) ) $ ) ( ( ^ ( ( ( - ( ( ( b ) ( ) ) ) ) ) ) ) $ ) ( ( ^ ( ) ) $ ) I've found this useful to check my personal understanding of the spec and the match behavior, but it could also potentially be used to generate test cases, or to help users debug their own patterns. For example, a pattern that has a bunch of duplicate sequences in its PL is probably not very well optimized: $ ./src/test/modules/rpr/rpr_prefer --max-rows 4 "a+ a+" ( ( a a a ) ( a ) ) ( ( a a ) ( a a ) ) ( ( a a ) ( a ) ) ( ( a ) ( a a a ) ) ( ( a ) ( a a ) ) ( ( a ) ( a ) ) And patterns with catastrophic backtracking behavior tend to show a "sawtooth" pattern in the output, with a huge number of potential matches being generated relative to the number of rows in the frame. My implementation is really messy -- it leaks memory like a sieve, and I cannibalized the parser from ECPG, which just ended up as an exercise in teaching myself flex/bison. But if there's interest in having this kind of tool in the tree, I can work on making it reviewable. Either way, I should be able to use it to double-check more complicated test cases. A while back [2], you were wondering whether our Bison implementation would be able to parse the PATTERN grammar directly. I think this tool proves that the answer is "yes", but PERMUTE in particular causes a shift/reduce conflict. To fix it, I applied the same precedence workaround that we use for CUBE and ROLLUP. Thanks again! --Jacob [1] https://github.com/jchampio/postgres/tree/dev/rpr [2] https://www.postgresql.org/message-id/20230721.151648.412762379013769790.t-ishii%40sranhm.sra.co.jp
commit 819c1abba21c9b59cceedd55962c3fdcb982aad5 Author: Jacob Champion <jacob.champ...@enterprisedb.com> Date: Thu Sep 26 14:12:38 2024 -0700 RPR: test larger window partitions The second test case fails with ERROR: wrong pos: 1024 diff --git a/src/test/regress/expected/rpr.out b/src/test/regress/expected/rpr.out index 0789e09435..dcfd67f143 100644 --- a/src/test/regress/expected/rpr.out +++ b/src/test/regress/expected/rpr.out @@ -834,3 +834,40 @@ ERROR: SEEK is not supported LINE 8: SEEK ^ HINT: Use INITIAL. +-- Smoke test for larger partitions. +WITH s AS ( + SELECT v, count(*) OVER w AS c + FROM (SELECT generate_series(1, 5000) v) + WINDOW w AS ( + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN ( r+ ) + DEFINE r AS TRUE + ) +) +-- Should be exactly one long match across all rows. +SELECT * FROM s WHERE c > 0; + v | c +---+------ + 1 | 5000 +(1 row) + +WITH s AS ( + SELECT v, count(*) OVER w AS c + FROM (SELECT generate_series(1, 5000) v) + WINDOW w AS ( + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN ( r ) + DEFINE r AS TRUE + ) +) +-- Every row should be its own match. +SELECT count(*) FROM s WHERE c > 0; + count +------- + 5000 +(1 row) + diff --git a/src/test/regress/sql/rpr.sql b/src/test/regress/sql/rpr.sql index 302e2b86a5..6ff61e6d60 100644 --- a/src/test/regress/sql/rpr.sql +++ b/src/test/regress/sql/rpr.sql @@ -403,3 +403,32 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER UP AS price > PREV(price), DOWN AS price < PREV(price) ); + +-- Smoke test for larger partitions. +WITH s AS ( + SELECT v, count(*) OVER w AS c + FROM (SELECT generate_series(1, 5000) v) + WINDOW w AS ( + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN ( r+ ) + DEFINE r AS TRUE + ) +) +-- Should be exactly one long match across all rows. +SELECT * FROM s WHERE c > 0; + +WITH s AS ( + SELECT v, count(*) OVER w AS c + FROM (SELECT generate_series(1, 5000) v) + WINDOW w AS ( + ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING + AFTER MATCH SKIP PAST LAST ROW + INITIAL + PATTERN ( r ) + DEFINE r AS TRUE + ) +) +-- Every row should be its own match. +SELECT count(*) FROM s WHERE c > 0;