Hi Ishii-san,
I'm sending a patch that refactors the NFA context absorption logic
for Row Pattern Recognition (RPR).
== Overview ==
This patch fixes a flawed NFA execution model where interleaved
match/advance
logic prevented correct context absorption. The new design reorganizes
execution into a clear three-phase cycle: "match → absorb → advance",
enabling proper state synchronization for absorption.
== Major Changes ==
1. NFA Execution Flow Redesign (nodeWindowAgg.c)
Previously, match and advance logic were interleaved in
nfa_step_single().
This interleaved approach prevented proper state synchronization needed
for absorption - states would scatter to different positions before
absorption could compare them. Now they are clearly separated:
- nfa_match(): Convergence phase. Evaluates all waiting states against
the current row. Checks VAR elements' DEFINE conditions and removes
failed states.
- nfa_absorb_contexts(): Absorbs redundant contexts. Performs absorption
checks at synchronized points after matching completes.
- nfa_advance(): Divergence phase. Expands epsilon transitions.
Handles ALT (alternation), END (group termination), and optional
elements.
Advantages of three-phase design:
- Correct absorption: All states evaluate the same row first, ensuring
synchronized positions for meaningful absorption comparison.
- Clear semantics: Each phase has single responsibility - convergence,
optimization, divergence - making the logic easier to understand.
- Predictable state positions: After match phase, all states are at
known positions, enabling reliable absorption decisions.
2. Two-flag Absorption Design (rpr.c)
At planning time, analyzes absorption eligibility and sets two flags:
- RPR_ELEM_ABSORBABLE: Absorption judgment point (WHERE to compare).
Set on the VAR element itself for simple unbounded VAR (A+),
or on END only for groups ((A B)+).
- RPR_ELEM_ABSORBABLE_BRANCH: Marks the absorbable region.
Set on ALL elements within the region. Used at runtime to track
state.isAbsorbable.
Example - Pattern "(A B)+ C":
Element 0 (A): ABSORBABLE_BRANCH
Element 1 (B): ABSORBABLE_BRANCH
Element 2 (END): ABSORBABLE | ABSORBABLE_BRANCH <- judgment point
Element 3 (C): (none)
Advantages of two-flag design:
- Correctness: Ensures absorption comparison happens only at
synchronized
points (e.g., END for groups), not at arbitrary positions.
- Efficiency: All analysis done at planning time; runtime uses simple
O(1) flag checks instead of pattern structure analysis.
- Simplicity: Runtime logic is just monotonic flag propagation -
once state leaves absorbable region, it stays non-absorbable.
3. Context-level Absorption Flags (execnodes.h)
Added two flags to RPRNFAContext structure:
- hasAbsorbableState: True if at least one state is absorbable.
Indicates whether this context can absorb other contexts.
- allStatesAbsorbable: True if ALL states are absorbable.
Indicates whether this context can be absorbed.
Absorption condition: Ctx1.hasAbsorbableState && Ctx2.allStatesAbsorbable
Advantages of asymmetric two-flag design:
- Fast pre-filter: O(1) flag check eliminates most context pairs before
expensive state-by-state comparison.
- Flexible absorption: Ctx1 can have extra non-absorbable states and
still absorb Ctx2, as long as Ctx1 covers all of Ctx2's absorbable
states.
- Optimized updates: hasAbsorbableState is monotonic (true→false only),
skipping recalculation once false. allStatesAbsorbable can recover
(false→true) when non-absorbable states die.
4. State-level isAbsorbable Flag (execnodes.h)
Added isAbsorbable field to RPRNFAState structure:
- Computed at creation: sourceAbsorbable && (elem->flags &
ABSORBABLE_BRANCH)
- Monotonic property: once false, stays false forever
- Cannot re-enter the absorbable region once left
5. Advance Function Separation
Split the monolithic processing into per-element-type functions:
- nfa_advance_alt(): Expands ALT branches. Processes all branches in
DFS order to preserve lexical priority.
- nfa_advance_end(): END group repetition logic. Decides loop/exit
based on count vs min/max constraints.
- nfa_advance_var(): VAR repetition/exit transitions.
- nfa_route_to_elem(): Routes to next element. If VAR, adds to waiting
states; otherwise, recursively expands epsilon transitions.
6. VAR→END Inline Transition
When a simple VAR (min=max=1) matches and the next element is END,
we transition to END immediately in the match phase. This is necessary
for correct absorption: states must be at END to be marked absorbable
before the absorption check occurs.
7. EXPLAIN Output Improvements (explain.c)
Changed average length statistics from integer to float with 1 decimal:
- ExplainPropertyInteger → ExplainPropertyFloat
- "%.0f" → "%.1f"
== Performance Impact ==
For patterns like "A+ B", context absorption achieves O(n²) → O(n)
complexity improvement.
Example: Input A A A A A B
- Row 0: Ctx1 at A (count=1)
- Row 1: Ctx1 at A (count=2), Ctx2 at A (count=1)
→ Ctx1.count >= Ctx2.count, so Ctx2 is absorbed
- Row 2: Ctx1 at A (count=3), Ctx3 at A (count=1)
→ Ctx3 is absorbed
- ...
Only one context is maintained throughout, achieving O(n).
== Algorithm Details ==
The absorption algorithm works as follows:
For each pair (older Ctx1, newer Ctx2):
1. Pre-check: Ctx1.hasAbsorbableState && Ctx2.allStatesAbsorbable
→ If false, skip (fast filter)
2. Coverage check: For each Ctx2 state with isAbsorbable=true,
find Ctx1 state with same elemIdx and count >= Ctx2.count
3. If all Ctx2 absorbable states are covered, absorb Ctx2
The asymmetric design (Ctx1 needs hasAbsorbable, Ctx2 needs allAbsorbable)
allows absorption even when Ctx1 has extra non-absorbable states.
== Key Design Decisions ==
- VAR→END transition in match phase: When a simple VAR (max=1) matches
and the next element is END, we transition immediately in the match
phase rather than waiting for advance. This ensures states are at END
(marked absorbable) before the absorption check.
- Optional VAR skip paths: When advance lands on a VAR with min=0,
we create both a waiting state AND a skip state (like ALT branches).
This ensures patterns like "A B? C" work correctly.
- END→END count increment: When transitioning from one END to another
END within advance, we must increment the outer END's count. This
handles nested groups like "((A|B)+)+" correctly.
- initialAdvance flag: The first advance after context creation must
skip FIN recording. Reaching FIN without evaluating any VAR would
create a zero-length match, which is invalid.
== Future Work ==
- Reluctant quantifiers: Currently rejected at parse time with error
"reluctant quantifiers are not yet supported". The semantics for mixed
patterns (e.g., A+? B+) and the optimal implementation approach need
further clarification.
- State pruning: May be required for reluctant quantifiers (enables early
match finalization) and context absorption (increases absorption
eligibility by removing competing states).
- No more large-scale refactoring is planned. Future work will focus on
quality verification through additional test cases, comprehensive test
coverage, and memory leak checks.
Please review.
Best regards,
Henson
From 1bd8eb3ee0faf11b23689143f251de2704d3327c Mon Sep 17 00:00:00 2001
From: Henson Choi <[email protected]>
Date: Mon, 19 Jan 2026 21:22:14 +0900
Subject: [PATCH] Row pattern recognition: Refactor NFA absorption logic
---
src/backend/commands/explain.c | 24 +-
src/backend/executor/nodeWindowAgg.c | 1473 ++++++++++++++---------
src/backend/optimizer/plan/createplan.c | 2 +-
src/backend/optimizer/plan/rpr.c | 263 +++-
src/backend/parser/gram.y | 5 +
src/include/nodes/execnodes.h | 21 +
src/include/optimizer/rpr.h | 14 +-
src/test/regress/expected/rpr.out | 60 +-
src/test/regress/sql/rpr.sql | 32 +-
9 files changed, 1247 insertions(+), 647 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ba114bd491c..3cbf2ec235b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3675,32 +3675,32 @@ show_windowagg_info(WindowAggState *winstate,
ExplainState *es)
{
ExplainPropertyInteger("NFA Match Length Min",
NULL, winstate->nfaMatchLen.min, es);
ExplainPropertyInteger("NFA Match Length Max",
NULL, winstate->nfaMatchLen.max, es);
- ExplainPropertyInteger("NFA Match Length Avg",
NULL,
-
(int64) ((winstate->nfaMatchLen.total + winstate->nfaMatchesSucceeded / 2) /
winstate->nfaMatchesSucceeded),
+ ExplainPropertyFloat("NFA Match Length Avg",
NULL,
+
(double) winstate->nfaMatchLen.total / winstate->nfaMatchesSucceeded, 1,
es);
}
if (winstate->nfaMatchesFailed > 0)
{
ExplainPropertyInteger("NFA Mismatch Length
Min", NULL, winstate->nfaFailLen.min, es);
ExplainPropertyInteger("NFA Mismatch Length
Max", NULL, winstate->nfaFailLen.max, es);
- ExplainPropertyInteger("NFA Mismatch Length
Avg", NULL,
-
(int64) ((winstate->nfaFailLen.total + winstate->nfaMatchesFailed / 2) /
winstate->nfaMatchesFailed),
+ ExplainPropertyFloat("NFA Mismatch Length Avg",
NULL,
+
(double) winstate->nfaFailLen.total / winstate->nfaMatchesFailed, 1,
es);
}
if (winstate->nfaContextsAbsorbed > 0)
{
ExplainPropertyInteger("NFA Absorbed Length
Min", NULL, winstate->nfaAbsorbedLen.min, es);
ExplainPropertyInteger("NFA Absorbed Length
Max", NULL, winstate->nfaAbsorbedLen.max, es);
- ExplainPropertyInteger("NFA Absorbed Length
Avg", NULL,
-
(int64) ((winstate->nfaAbsorbedLen.total + winstate->nfaContextsAbsorbed / 2) /
winstate->nfaContextsAbsorbed),
+ ExplainPropertyFloat("NFA Absorbed Length Avg",
NULL,
+
(double) winstate->nfaAbsorbedLen.total / winstate->nfaContextsAbsorbed, 1,
es);
}
if (winstate->nfaContextsSkipped > 0)
{
ExplainPropertyInteger("NFA Skipped Length
Min", NULL, winstate->nfaSkippedLen.min, es);
ExplainPropertyInteger("NFA Skipped Length
Max", NULL, winstate->nfaSkippedLen.max, es);
- ExplainPropertyInteger("NFA Skipped Length
Avg", NULL,
-
(int64) ((winstate->nfaSkippedLen.total + winstate->nfaContextsSkipped / 2) /
winstate->nfaContextsSkipped),
+ ExplainPropertyFloat("NFA Skipped Length Avg",
NULL,
+
(double) winstate->nfaSkippedLen.total / winstate->nfaContextsSkipped, 1,
es);
}
}
@@ -3724,7 +3724,7 @@ show_windowagg_info(WindowAggState *winstate,
ExplainState *es)
{
double avgLen = (double)
winstate->nfaMatchLen.total / winstate->nfaMatchesSucceeded;
appendStringInfo(es->str,
- INT64_FORMAT "
matched (len " INT64_FORMAT "/" INT64_FORMAT "/%.0f)",
+ INT64_FORMAT "
matched (len " INT64_FORMAT "/" INT64_FORMAT "/%.1f)",
winstate->nfaMatchesSucceeded,
winstate->nfaMatchLen.min,
winstate->nfaMatchLen.max,
@@ -3738,7 +3738,7 @@ show_windowagg_info(WindowAggState *winstate,
ExplainState *es)
{
double avgFail = (double)
winstate->nfaFailLen.total / winstate->nfaMatchesFailed;
appendStringInfo(es->str,
- ", "
INT64_FORMAT " mismatched (len " INT64_FORMAT "/" INT64_FORMAT "/%.0f)",
+ ", "
INT64_FORMAT " mismatched (len " INT64_FORMAT "/" INT64_FORMAT "/%.1f)",
winstate->nfaMatchesFailed,
winstate->nfaFailLen.min,
winstate->nfaFailLen.max,
@@ -3760,7 +3760,7 @@ show_windowagg_info(WindowAggState *winstate,
ExplainState *es)
{
double avgAbsorbed = (double)
winstate->nfaAbsorbedLen.total / winstate->nfaContextsAbsorbed;
appendStringInfo(es->str,
-
INT64_FORMAT " absorbed (len " INT64_FORMAT "/" INT64_FORMAT "/%.0f)",
+
INT64_FORMAT " absorbed (len " INT64_FORMAT "/" INT64_FORMAT "/%.1f)",
winstate->nfaContextsAbsorbed,
winstate->nfaAbsorbedLen.min,
winstate->nfaAbsorbedLen.max,
@@ -3775,7 +3775,7 @@ show_windowagg_info(WindowAggState *winstate,
ExplainState *es)
{
double avgSkipped = (double)
winstate->nfaSkippedLen.total / winstate->nfaContextsSkipped;
appendStringInfo(es->str,
- ", "
INT64_FORMAT " skipped (len " INT64_FORMAT "/" INT64_FORMAT "/%.0f)",
+ ", "
INT64_FORMAT " skipped (len " INT64_FORMAT "/" INT64_FORMAT "/%.1f)",
winstate->nfaContextsSkipped,
winstate->nfaSkippedLen.min,
winstate->nfaSkippedLen.max,
diff --git a/src/backend/executor/nodeWindowAgg.c
b/src/backend/executor/nodeWindowAgg.c
index 6002ca05c90..5185ad40237 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -256,31 +256,70 @@ static void update_reduced_frame(WindowObject winobj,
int64 pos);
static void check_rpr_navigation(Node *node, bool is_prev);
static bool rpr_navigation_walker(Node *node, void *context);
-/* NFA-based pattern matching functions */
+/* Forward declarations - NFA state management */
static RPRNFAState *nfa_state_alloc(WindowAggState *winstate);
static void nfa_state_free(WindowAggState *winstate, RPRNFAState *state);
static void nfa_state_free_list(WindowAggState *winstate, RPRNFAState *list);
static RPRNFAState *nfa_state_clone(WindowAggState *winstate, int16 elemIdx,
- int16
altPriority, int32 *counts);
-static bool nfa_evaluate_row(WindowObject winobj, int64 pos, bool *varMatched);
+ int16
altPriority, int32 *counts,
+ bool
sourceAbsorbable);
+static bool nfa_states_equal(WindowAggState *winstate, RPRNFAState *s1,
+ RPRNFAState *s2);
+static bool nfa_add_state_unique(WindowAggState *winstate, RPRNFAContext *ctx,
+ RPRNFAState
*state);
+static void nfa_add_matched_state(WindowAggState *winstate, RPRNFAContext *ctx,
+ RPRNFAState
*state, int64 matchEndRow);
+
+/* Forward declarations - NFA context management */
static RPRNFAContext *nfa_context_alloc(WindowAggState *winstate);
static void nfa_unlink_context(WindowAggState *winstate, RPRNFAContext *ctx);
static void nfa_context_free(WindowAggState *winstate, RPRNFAContext *ctx);
static RPRNFAContext *nfa_start_context(WindowAggState *winstate, int64
startPos);
-static void nfa_step(WindowAggState *winstate, RPRNFAContext *ctx,
- bool *varMatched, int64 pos);
-static void nfa_finalize_all_contexts(WindowAggState *winstate, int64 lastPos);
-static void nfa_process_context(WindowAggState *winstate, RPRNFAContext *ctx,
- int64
currentPos, bool hasLimitedFrame, int64 frameOffset);
-static void nfa_step_single(WindowAggState *winstate, RPRNFAContext *ctx,
- RPRNFAState *state,
bool *varMatched, int64 currentPos);
static RPRNFAContext *nfa_find_context_for_pos(WindowAggState *winstate, int64
pos);
+
+/* Forward declarations - NFA statistics */
+static void nfa_update_length_stats(int64 count, NFALengthStats *stats, int64
newLen);
+static void nfa_record_context_failure(WindowAggState *winstate, int64
failedLen);
+
+/* Forward declarations - NFA row evaluation */
+static bool nfa_evaluate_row(WindowObject winobj, int64 pos, bool *varMatched);
+
+/* Forward declarations - NFA context lifecycle */
static void nfa_remove_contexts_up_to(WindowAggState *winstate, int64 endPos,
-
RPRNFAContext *excludeCtx);
+
RPRNFAContext *excludeCtx);
static void nfa_cleanup_dead_contexts(WindowAggState *winstate, RPRNFAContext
*excludeCtx);
-static void nfa_absorb_contexts(WindowAggState *winstate, RPRNFAContext
*excludeCtx, int64 currentPos);
-static void update_nfa_length_stats(int64 count, NFALengthStats *stats, int64
newLen);
-static void record_nfa_context_failure(WindowAggState *winstate, int64
failedLen);
+
+/* Forward declarations - NFA absorption */
+static void nfa_update_absorption_flags(RPRNFAContext *ctx, RPRPattern
*pattern);
+static bool nfa_states_covered(RPRPattern *pattern, RPRNFAContext *older,
+ RPRNFAContext
*newer);
+static bool nfa_try_absorb_context(WindowAggState *winstate, RPRNFAContext
*ctx);
+static void nfa_absorb_contexts(WindowAggState *winstate);
+
+/* Forward declarations - NFA execution */
+static inline bool nfa_eval_var_match(WindowAggState *winstate,
+
RPRPatternElement *elem, bool *varMatched);
+static void nfa_match(WindowAggState *winstate, RPRNFAContext *ctx,
+ bool *varMatched);
+static void nfa_advance_state(WindowAggState *winstate, RPRNFAContext *ctx,
+ RPRNFAState *state,
int64 currentPos, bool initialAdvance);
+static void nfa_route_to_elem(WindowAggState *winstate, RPRNFAContext *ctx,
+ RPRNFAState *state,
RPRPatternElement *nextElem,
+ int64 currentPos,
bool initialAdvance);
+static void nfa_advance_alt(WindowAggState *winstate, RPRNFAContext *ctx,
+ RPRNFAState *state,
RPRPatternElement *elem,
+ int64 currentPos, bool
initialAdvance);
+static void nfa_advance_end(WindowAggState *winstate, RPRNFAContext *ctx,
+ RPRNFAState *state,
RPRPatternElement *elem,
+ int64 currentPos, bool
initialAdvance);
+static void nfa_advance_var(WindowAggState *winstate, RPRNFAContext *ctx,
+ RPRNFAState *state,
RPRPatternElement *elem,
+ int64 currentPos, bool
initialAdvance);
+static void nfa_advance(WindowAggState *winstate, RPRNFAContext *ctx,
+ int64 currentPos, bool
initialAdvance);
+static void nfa_process_row(WindowAggState *winstate, int64 currentPos,
+ bool hasLimitedFrame,
int64 frameOffset);
+static void nfa_finalize_all_contexts(WindowAggState *winstate, int64 lastPos);
/*
* Not null info bit array consists of 2-bit items
@@ -4834,7 +4873,6 @@ update_reduced_frame(WindowObject winobj, int64 pos)
for (currentPos = startPos; targetCtx->states != NULL; currentPos++)
{
bool rowExists;
- RPRNFAContext *ctx;
/* Evaluate variables for this row - done only once, shared by
all contexts */
rowExists = nfa_evaluate_row(winobj, currentPos,
winstate->nfaVarMatched);
@@ -4846,18 +4884,23 @@ update_reduced_frame(WindowObject winobj, int64 pos)
/* Clean up dead contexts from finalization */
nfa_cleanup_dead_contexts(winstate, targetCtx);
/* Absorb contexts at partition boundary */
- if (winstate->rpSkipTo == ST_PAST_LAST_ROW &&
!hasLimitedFrame &&
- winstate->rpPattern->isAbsorbable)
- nfa_absorb_contexts(winstate, NULL, currentPos
- 1);
+ if (winstate->rpPattern->isAbsorbable)
+ {
+ nfa_absorb_contexts(winstate);
+ }
break;
}
/* Update last processed row */
winstate->nfaLastProcessedRow = currentPos;
- /* Process each active context with this row's evaluation
results */
- for (ctx = winstate->nfaContext; ctx != NULL; ctx = ctx->next)
- nfa_process_context(winstate, ctx, currentPos,
hasLimitedFrame, frameOffset);
+ /*
+ * Process all contexts for this row:
+ * 1. Match all (convergence)
+ * 2. Absorb redundant
+ * 3. Advance all (divergence)
+ */
+ nfa_process_row(winstate, currentPos, hasLimitedFrame,
frameOffset);
/*
* Create a new context for the next potential start position.
@@ -4872,14 +4915,6 @@ update_reduced_frame(WindowObject winobj, int64 pos)
*/
nfa_cleanup_dead_contexts(winstate, targetCtx);
- /*
- * Absorb redundant contexts using simplified algorithm.
- * Only compares absorbable element counts (e.g., A+ or (A B)+).
- */
- if (winstate->rpSkipTo == ST_PAST_LAST_ROW && !hasLimitedFrame
&&
- winstate->rpPattern->isAbsorbable)
- nfa_absorb_contexts(winstate, targetCtx, currentPos);
-
/* Check if target context is now complete */
if (targetCtx->states == NULL)
break;
@@ -4896,7 +4931,7 @@ register_result:
int64 mismatchLen = targetCtx->lastProcessedRow -
targetCtx->matchStartRow + 1;
register_reduced_frame_map(winstate, targetCtx->matchStartRow,
RF_UNMATCHED);
- record_nfa_context_failure(winstate, mismatchLen);
+ nfa_record_context_failure(winstate, mismatchLen);
}
else
{
@@ -4908,7 +4943,7 @@ register_result:
register_reduced_frame_map(winstate, i, RF_SKIPPED);
}
winstate->nfaMatchesSucceeded++;
- update_nfa_length_stats(winstate->nfaMatchesSucceeded,
+ nfa_update_length_stats(winstate->nfaMatchesSucceeded,
&winstate->nfaMatchLen,
matchLen);
}
@@ -4934,6 +4969,83 @@ register_result:
*
* These functions implement direct NFA execution using the compiled
* RPRPattern structure, avoiding regex compilation overhead.
+ *
+ * Execution Flow: match → absorb → advance
+ * -----------------------------------------
+ * The NFA execution follows a three-phase cycle for each row:
+ *
+ * 1. MATCH (convergence): Evaluate all waiting states against current row.
+ * States on VAR elements are checked against their defining conditions.
+ * Failed matches are removed, successful ones may transition forward.
+ * This is a "convergence" phase - the number of states tends to decrease.
+ *
+ * 2. ABSORB: After matching, check if any context can absorb another.
+ * Context absorption is an optimization that merges equivalent contexts.
+ * A context can only be absorbed if ALL its states are absorbable.
+ *
+ * 3. ADVANCE (divergence): Expand states through epsilon transitions.
+ * States advance through ALT (alternation), END (group end), and
+ * optional elements until reaching VAR or FIN elements where they wait.
+ * This is a "divergence" phase - ALT creates multiple branch states.
+ *
+ * Key Design Decisions:
+ * ---------------------
+ * - VAR→END transition in match phase: When a simple VAR (max=1) matches
+ * and the next element is END, we transition immediately in the match
+ * phase rather than waiting for advance. This is necessary for correct
+ * absorption: states must be at END to be marked absorbable before the
+ * absorption check occurs.
+ *
+ * - Optional VAR skip paths: When advance lands on a VAR with min=0,
+ * we create both a waiting state AND a skip state (like ALT branches).
+ * This ensures patterns like "A B? C" work correctly - we need a state
+ * waiting for B AND a state that has already skipped to C.
+ *
+ * - END→END count increment: When transitioning from one END to another
+ * END within advance, we must increment the outer END's count. This
+ * handles nested groups like "((A|B)+)+" correctly - exiting the inner
+ * group counts as one iteration of the outer group.
+ *
+ * - initialAdvance flag: The first advance after context creation must
+ * skip FIN recording. Reaching FIN without evaluating any VAR would
+ * create a zero-length match, which is invalid.
+ *
+ * Context Absorption Runtime:
+ * ---------------------------
+ * Absorption uses flags computed at planning time (in rpr.c) and two
+ * context-level flags maintained at runtime:
+ *
+ * State-level:
+ * state.isAbsorbable: true if state is in the absorbable region.
+ * - Set at creation: elem->flags & RPR_ELEM_ABSORBABLE_BRANCH
+ * - At transition: prevAbsorbable && (newElem->flags & ABSORBABLE_BRANCH)
+ * - Monotonic: once false, stays false forever
+ *
+ * Context-level:
+ * ctx.hasAbsorbableState: can this context absorb others?
+ * - True if at least one state has isAbsorbable=true
+ * - Monotonic: true->false only (optimization: skip recalc when false)
+ *
+ * ctx.allStatesAbsorbable: can this context be absorbed?
+ * - True if ALL states have isAbsorbable=true
+ * - Dynamic: can change false->true (when non-absorbable states die)
+ *
+ * Absorption Algorithm:
+ * For each pair (older Ctx1, newer Ctx2):
+ * 1. Pre-check: Ctx1.hasAbsorbableState && Ctx2.allStatesAbsorbable
+ * -> If false, skip (fast filter)
+ * 2. Coverage check: For each Ctx2 state with isAbsorbable=true,
+ * find Ctx1 state with same elemIdx and count >= Ctx2.count
+ * 3. If all Ctx2 absorbable states are covered, absorb Ctx2
+ *
+ * Example: Pattern A+ B
+ * Row 1: Ctx1 at A (count=1)
+ * Row 2: Ctx1 at A (count=2), Ctx2 at A (count=1)
+ * -> Both at same elemIdx (A), Ctx1.count >= Ctx2.count
+ * -> Ctx2 absorbed
+ *
+ * The asymmetric design (Ctx1 needs hasAbsorbable, Ctx2 needs allAbsorbable)
+ * allows absorption even when Ctx1 has extra non-absorbable states.
*/
/*
@@ -5003,6 +5115,39 @@ nfa_state_free_list(WindowAggState *winstate,
RPRNFAState *list)
}
}
+/*
+ * nfa_state_clone
+ *
+ * Clone a state with given elemIdx, altPriority and counts.
+ * isAbsorbable is computed immediately: inherited AND new element's flag.
+ * Monotonic property: once false, stays false through all transitions.
+ *
+ * Caller is responsible for linking the returned state.
+ */
+static RPRNFAState *
+nfa_state_clone(WindowAggState *winstate, int16 elemIdx, int16 altPriority,
+ int32 *counts, bool sourceAbsorbable)
+{
+ RPRPattern *pattern = winstate->rpPattern;
+ int maxDepth = pattern->maxDepth;
+ RPRNFAState *state = nfa_state_alloc(winstate);
+ RPRPatternElement *elem = &pattern->elements[elemIdx];
+
+ state->elemIdx = elemIdx;
+ state->altPriority = altPriority;
+ if (counts != NULL && maxDepth > 0)
+ memcpy(state->counts, counts, sizeof(int32) * maxDepth);
+
+ /*
+ * Compute isAbsorbable immediately at transition time.
+ * isAbsorbable = sourceAbsorbable && (elem->flags & ABSORBABLE_BRANCH)
+ * Monotonic: once false, stays false (can't re-enter absorbable
region).
+ */
+ state->isAbsorbable = sourceAbsorbable &&
RPRElemIsAbsorbableBranch(elem);
+
+ return state;
+}
+
/*
* nfa_states_equal
*
@@ -5059,28 +5204,6 @@ nfa_add_state_unique(WindowAggState *winstate,
RPRNFAContext *ctx, RPRNFAState *
return true;
}
-/*
- * nfa_state_clone
- *
- * Clone a state with given elemIdx, altPriority and counts.
- * Caller is responsible for linking the returned state.
- */
-static RPRNFAState *
-nfa_state_clone(WindowAggState *winstate, int16 elemIdx, int16 altPriority,
- int32 *counts)
-{
- RPRPattern *pattern = winstate->rpPattern;
- int maxDepth = pattern->maxDepth;
- RPRNFAState *state = nfa_state_alloc(winstate);
-
- state->elemIdx = elemIdx;
- state->altPriority = altPriority;
- if (counts != NULL && maxDepth > 0)
- memcpy(state->counts, counts, sizeof(int32) * maxDepth);
-
- return state;
-}
-
/*
* nfa_add_matched_state
*
@@ -5120,74 +5243,6 @@ nfa_add_matched_state(WindowAggState *winstate,
RPRNFAContext *ctx,
}
}
-/*
- * nfa_evaluate_row
- *
- * Evaluate all DEFINE variables for current row.
- * Returns true if the row exists, false if out of partition.
- * If row exists, fills varMatched array.
- * varMatched[i] = true if variable i matched at current row.
- */
-static bool
-nfa_evaluate_row(WindowObject winobj, int64 pos, bool *varMatched)
-{
- WindowAggState *winstate = winobj->winstate;
- ExprContext *econtext = winstate->ss.ps.ps_ExprContext;
- int numDefineVars =
list_length(winstate->defineVariableList);
- ListCell *lc;
- int varIdx = 0;
- TupleTableSlot *slot;
-
- /*
- * Set up slots for current, previous, and next rows.
- * We don't call get_slots() here to avoid recursion through
- * row_is_in_frame -> update_reduced_frame -> nfa_match_pattern.
- */
-
- /* Current row -> ecxt_outertuple */
- slot = winstate->temp_slot_1;
- if (!window_gettupleslot(winobj, pos, slot))
- return false; /* No row exists */
- econtext->ecxt_outertuple = slot;
-
- /* Previous row -> ecxt_scantuple (for PREV) */
- if (pos > 0)
- {
- slot = winstate->prev_slot;
- if (!window_gettupleslot(winobj, pos - 1, slot))
- econtext->ecxt_scantuple = winstate->null_slot;
- else
- econtext->ecxt_scantuple = slot;
- }
- else
- econtext->ecxt_scantuple = winstate->null_slot;
-
- /* Next row -> ecxt_innertuple (for NEXT) */
- slot = winstate->next_slot;
- if (!window_gettupleslot(winobj, pos + 1, slot))
- econtext->ecxt_innertuple = winstate->null_slot;
- else
- econtext->ecxt_innertuple = slot;
-
- foreach(lc, winstate->defineClauseList)
- {
- ExprState *exprState = (ExprState *) lfirst(lc);
- Datum result;
- bool isnull;
-
- /* Evaluate DEFINE expression */
- result = ExecEvalExpr(exprState, econtext, &isnull);
-
- varMatched[varIdx] = (!isnull && DatumGetBool(result));
-
- varIdx++;
- if (varIdx >= numDefineVars)
- break;
- }
-
- return true; /* Row exists */
-}
-
/*
* nfa_context_alloc
*
@@ -5216,6 +5271,11 @@ nfa_context_alloc(WindowAggState *winstate)
ctx->matchEndRow = -1;
ctx->lastProcessedRow = -1;
ctx->matchedState = NULL;
+ /* Initialize two-flag absorption design based on pattern */
+ ctx->hasAbsorbableState = (winstate->rpPattern != NULL &&
+
winstate->rpPattern->isAbsorbable);
+ ctx->allStatesAbsorbable = (winstate->rpPattern != NULL &&
+
winstate->rpPattern->isAbsorbable);
/* Update statistics */
winstate->nfaContextsActive++;
@@ -5279,17 +5339,50 @@ nfa_context_free(WindowAggState *winstate,
RPRNFAContext *ctx)
* nfa_start_context
*
* Start a new match context at given position.
+ * Initializes context and state absorption flags.
* Adds context to winstate->nfaContext list and returns the new context.
*/
static RPRNFAContext *
nfa_start_context(WindowAggState *winstate, int64 startPos)
{
RPRNFAContext *ctx;
+ RPRPattern *pattern = winstate->rpPattern;
ctx = nfa_context_alloc(winstate);
ctx->matchStartRow = startPos;
ctx->states = nfa_state_alloc(winstate); /* initial state at
elem 0 */
+ /*
+ * Initialize two-flag absorption design (see
CONTEXT_ABSORPTION_DESIGN.txt):
+ * hasAbsorbableState: can this context absorb others? (≥1 absorbable
state)
+ * allStatesAbsorbable: can this context be absorbed? (ALL states
absorbable)
+ * Both initialized from pattern->isAbsorbable at context start.
+ */
+ ctx->hasAbsorbableState = (pattern != NULL && pattern->isAbsorbable);
+ ctx->allStatesAbsorbable = (pattern != NULL && pattern->isAbsorbable);
+
+ if (ctx->states != NULL && pattern != NULL && pattern->numElements > 0)
+ {
+ RPRPatternElement *elem = &pattern->elements[0];
+
+ /*
+ * Initial state at element 0.
+ * Check if element 0 is in absorbable branch.
+ */
+ if (RPRElemIsAbsorbableBranch(elem))
+ {
+ /* Element 0 is in absorbable branch - flags stay true
*/
+ ctx->states->isAbsorbable = true;
+ }
+ else
+ {
+ /* Element 0 is NOT in absorbable branch - turn flags
OFF */
+ ctx->hasAbsorbableState = false;
+ ctx->allStatesAbsorbable = false;
+ ctx->states->isAbsorbable = false;
+ }
+ }
+
/* Add to tail of active context list (doubly-linked, oldest-first) */
ctx->prev = winstate->nfaContextTail;
ctx->next = NULL;
@@ -5299,6 +5392,16 @@ nfa_start_context(WindowAggState *winstate, int64
startPos)
winstate->nfaContext = ctx; /* first context becomes head */
winstate->nfaContextTail = ctx;
+ /*
+ * Initial advance (divergence): expand ALT branches and create exit
+ * states for VAR elements with min=0. This prepares the context for
+ * the first row's match phase.
+ *
+ * Pass initialAdvance=true to prevent recording zero-length matches
+ * when optional patterns can skip all VARs to reach FIN immediately.
+ */
+ nfa_advance(winstate, ctx, startPos, true);
+
return ctx;
}
@@ -5328,13 +5431,13 @@ nfa_find_context_for_pos(WindowAggState *winstate,
int64 pos)
}
/*
- * update_nfa_length_stats
+ * nfa_update_length_stats
*
* Helper function to update min/max/total length statistics.
* Called when tracking match/mismatch/absorbed/skipped lengths.
*/
static void
-update_nfa_length_stats(int64 count, NFALengthStats *stats, int64 newLen)
+nfa_update_length_stats(int64 count, NFALengthStats *stats, int64 newLen)
{
if (count == 1)
{
@@ -5352,14 +5455,14 @@ update_nfa_length_stats(int64 count, NFALengthStats
*stats, int64 newLen)
}
/*
- * record_nfa_context_failure
+ * nfa_record_context_failure
*
* Record a failed context in statistics.
* If failedLen == 1, count as pruned (failed on first row).
* If failedLen > 1, count as mismatched and update length stats.
*/
static void
-record_nfa_context_failure(WindowAggState *winstate, int64 failedLen)
+nfa_record_context_failure(WindowAggState *winstate, int64 failedLen)
{
if (failedLen == 1)
{
@@ -5368,12 +5471,80 @@ record_nfa_context_failure(WindowAggState *winstate,
int64 failedLen)
else
{
winstate->nfaMatchesFailed++;
- update_nfa_length_stats(winstate->nfaMatchesFailed,
+ nfa_update_length_stats(winstate->nfaMatchesFailed,
&winstate->nfaFailLen,
failedLen);
}
}
+/*
+ * nfa_evaluate_row
+ *
+ * Evaluate all DEFINE variables for current row.
+ * Returns true if the row exists, false if out of partition.
+ * If row exists, fills varMatched array.
+ * varMatched[i] = true if variable i matched at current row.
+ */
+static bool
+nfa_evaluate_row(WindowObject winobj, int64 pos, bool *varMatched)
+{
+ WindowAggState *winstate = winobj->winstate;
+ ExprContext *econtext = winstate->ss.ps.ps_ExprContext;
+ int numDefineVars =
list_length(winstate->defineVariableList);
+ ListCell *lc;
+ int varIdx = 0;
+ TupleTableSlot *slot;
+
+ /*
+ * Set up slots for current, previous, and next rows.
+ * We don't call get_slots() here to avoid recursion through
+ * row_is_in_frame -> update_reduced_frame -> nfa_process_row.
+ */
+
+ /* Current row -> ecxt_outertuple */
+ slot = winstate->temp_slot_1;
+ if (!window_gettupleslot(winobj, pos, slot))
+ return false; /* No row exists */
+ econtext->ecxt_outertuple = slot;
+
+ /* Previous row -> ecxt_scantuple (for PREV) */
+ if (pos > 0)
+ {
+ slot = winstate->prev_slot;
+ if (!window_gettupleslot(winobj, pos - 1, slot))
+ econtext->ecxt_scantuple = winstate->null_slot;
+ else
+ econtext->ecxt_scantuple = slot;
+ }
+ else
+ econtext->ecxt_scantuple = winstate->null_slot;
+
+ /* Next row -> ecxt_innertuple (for NEXT) */
+ slot = winstate->next_slot;
+ if (!window_gettupleslot(winobj, pos + 1, slot))
+ econtext->ecxt_innertuple = winstate->null_slot;
+ else
+ econtext->ecxt_innertuple = slot;
+
+ foreach(lc, winstate->defineClauseList)
+ {
+ ExprState *exprState = (ExprState *) lfirst(lc);
+ Datum result;
+ bool isnull;
+
+ /* Evaluate DEFINE expression */
+ result = ExecEvalExpr(exprState, econtext, &isnull);
+
+ varMatched[varIdx] = (!isnull && DatumGetBool(result));
+
+ varIdx++;
+ if (varIdx >= numDefineVars)
+ break;
+ }
+
+ return true; /* Row exists */
+}
+
/*
* nfa_remove_contexts_up_to
*
@@ -5403,7 +5574,7 @@ nfa_remove_contexts_up_to(WindowAggState *winstate, int64
endPos,
int64 skippedLen = ctx->lastProcessedRow -
ctx->matchStartRow + 1;
winstate->nfaContextsSkipped++;
- update_nfa_length_stats(winstate->nfaContextsSkipped,
+ nfa_update_length_stats(winstate->nfaContextsSkipped,
&winstate->nfaSkippedLen,
skippedLen);
}
@@ -5445,7 +5616,7 @@ nfa_cleanup_dead_contexts(WindowAggState *winstate,
RPRNFAContext *excludeCtx)
if (ctx->lastProcessedRow >= ctx->matchStartRow)
{
int64 failedLen = ctx->lastProcessedRow -
ctx->matchStartRow + 1;
- record_nfa_context_failure(winstate, failedLen);
+ nfa_record_context_failure(winstate, failedLen);
}
/* else: context was never processed (beyond-partition), just
remove */
@@ -5454,573 +5625,713 @@ nfa_cleanup_dead_contexts(WindowAggState *winstate,
RPRNFAContext *excludeCtx)
}
/*
- * nfa_absorb_contexts
+ * nfa_update_absorption_flags
*
- * Absorb newer contexts into older ones when states are fully covered.
- * For pattern like A+, if older context (started earlier) has count >= newer
- * context's count at the same element, the newer context produces subset
matches.
+ * Update context's absorption flags after state changes.
*
- * Absorption condition:
- * - For unbounded quantifiers (max=INT32_MAX): older.counts >= newer.counts
- * - For bounded quantifiers: older.counts == newer.counts
+ * Two flags control absorption behavior:
+ * hasAbsorbableState: true if context has at least one absorbable state.
+ * This flag is monotonic (true -> false only). Once all absorbable states
+ * die, no new absorbable states can be created through transitions.
+ * allStatesAbsorbable: true if ALL states in context are absorbable.
+ * This flag is dynamic and can change false -> true when non-absorbable
+ * states die off.
*
- * Note: List is oldest-first (head=oldest, tail=newest), so we start from
- * tail (newest) and check if older contexts (via prev) can absorb it.
+ * Optimization: Once hasAbsorbableState becomes false, both flags remain false
+ * permanently, so we skip recalculation.
*/
static void
-nfa_absorb_contexts(WindowAggState *winstate, RPRNFAContext *excludeCtx, int64
currentPos)
+nfa_update_absorption_flags(RPRNFAContext *ctx, RPRPattern *pattern)
{
- RPRPattern *pattern = winstate->rpPattern;
- RPRNFAContext *ctx;
+ RPRNFAState *state;
+ bool hasAbsorbable = false;
+ bool allAbsorbable = true;
- /* Need at least 2 contexts for absorption */
- if (winstate->nfaContext == NULL || winstate->nfaContext->next == NULL)
+ /*
+ * Optimization: Once hasAbsorbableState becomes false, it stays false.
+ * No need to recalculate - both flags remain false permanently.
+ */
+ if (!ctx->hasAbsorbableState)
+ {
+ ctx->allStatesAbsorbable = false;
return;
+ }
- if (pattern == NULL)
+ /* No states means no absorbable states */
+ if (ctx->states == NULL)
+ {
+ ctx->hasAbsorbableState = false;
+ ctx->allStatesAbsorbable = false;
return;
+ }
- /*
- * Only absorb for patterns marked as absorbable during planning.
- * See computeAbsorbability() in rpr.c for criteria.
- */
- if (!pattern->isAbsorbable)
+ if (pattern == NULL)
return;
/*
- * Context absorption: A later context (started at higher row) can be
- * absorbed by an earlier context if ALL states in the later context
- * are "covered" by states in the earlier context.
- *
- * A later state is covered by an earlier state if:
- * 1. They are at the same element
- * 2. For unbounded elements (max == INT32_MAX): earlier.counts[d] >=
later.counts[d]
- * for all depths d
- * 3. For bounded elements: counts must be exactly equal at all depths
- *
- * List is oldest-first (head = lowest matchStartRow, tail = highest).
- * We iterate from tail (newest) via prev and check if older contexts
can absorb it.
+ * Iterate through all states to check absorption status.
+ * Uses state->isAbsorbable which tracks if state is in absorbable
region.
+ * This is different from RPRElemIsAbsorbable(elem) which checks
judgment point.
*/
- ctx = winstate->nfaContextTail;
-
- while (ctx != NULL)
+ for (state = ctx->states; state != NULL; state = state->next)
{
- RPRNFAContext *nextCtx = ctx->prev; /* traverse toward
older (head) */
- RPRNFAContext *older;
- bool absorbed = false;
+ if (state->isAbsorbable)
+ hasAbsorbable = true;
+ else
+ allAbsorbable = false;
+ }
- /* Never absorb the excluded context (it's the target being
completed) */
- if (ctx == excludeCtx)
- {
- ctx = nextCtx;
- continue;
- }
+ ctx->hasAbsorbableState = hasAbsorbable;
+ ctx->allStatesAbsorbable = allAbsorbable;
+}
- /* Skip contexts that haven't started processing yet (just
created for future row) */
- if (ctx->matchStartRow > currentPos)
- {
- ctx = nextCtx;
- continue;
- }
-
- /*
- * Handle completed contexts (states == NULL) separately.
- * A completed context can be absorbed by an older completed
context
- * if the older one has the same or later matchEndRow.
- */
- if (ctx->states == NULL)
- {
- /* Only completed contexts with valid matchEndRow can
be absorbed */
- if (ctx->matchEndRow < 0)
- {
- ctx = nextCtx;
- continue;
- }
-
- /* Check if any older completed context can absorb this
one */
- for (older = ctx->prev; older != NULL && !absorbed;
older = older->prev)
- {
- /* Must have started earlier */
- if (older->matchStartRow >= ctx->matchStartRow)
- continue;
-
- /* Older must also be completed with valid
matchEndRow */
- if (older->states != NULL || older->matchEndRow
< 0)
- continue;
+/*
+ * nfa_states_covered
+ *
+ * Check if all states in newer context are "covered" by older context.
+ *
+ * A newer state is covered when older context has an absorbable state at the
+ * same pattern element (elemIdx) with count >= newer's count at that depth.
+ * The covering state must be absorbable because only absorbable states can
+ * guarantee to produce superset matches.
+ *
+ * If all newer states are covered, newer context's eventual matches will be
+ * a subset of older context's matches, making newer redundant.
+ */
+static bool
+nfa_states_covered(RPRPattern *pattern, RPRNFAContext *older, RPRNFAContext
*newer)
+{
+ RPRNFAState *newerState;
- /*
- * Older context absorbs newer if it has the
same or later
- * matchEndRow, meaning all matches from newer
are subsets.
- */
- if (older->matchEndRow >= ctx->matchEndRow)
- {
- /* Track absorbed context length
statistics */
- int64 absorbedLen = ctx->matchEndRow -
ctx->matchStartRow + 1;
-
- /* Absorb: remove ctx (newer) */
- nfa_context_free(winstate, ctx);
- winstate->nfaContextsAbsorbed++;
-
update_nfa_length_stats(winstate->nfaContextsAbsorbed,
-
&winstate->nfaAbsorbedLen,
-
absorbedLen);
- absorbed = true;
- }
- }
+ for (newerState = newer->states; newerState != NULL; newerState =
newerState->next)
+ {
+ RPRNFAState *olderState;
+ RPRPatternElement *elem;
+ int depth;
+ bool found = false;
- ctx = nextCtx;
- continue;
- }
+ /* All states are absorbable (caller checks
allStatesAbsorbable) */
+ elem = &pattern->elements[newerState->elemIdx];
+ depth = elem->depth;
- /*
- * SIMPLIFIED ABSORPTION: Compare counts at depth 0 only.
- *
- * For absorbable patterns (A+ or (A B)+), we use a simple rule:
- * If older context has count[0] >= newer context's count[0] for
- * ANY state, older can absorb newer.
- *
- * This works because:
- * - A+: count[0] tracks how many A's matched
- * - (A B)+: count[0] tracks how many groups matched
- *
- * If older is ahead or equal, newer's future matches are
subsets.
- */
- for (older = ctx->prev; older != NULL && !absorbed; older =
older->prev)
+ for (olderState = older->states; olderState != NULL; olderState
= olderState->next)
{
- int32 newerMaxCount = -1;
- int32 olderMaxCount = -1;
- RPRNFAState *s;
-
- /* Skip if not started earlier */
- if (older->matchStartRow >= ctx->matchStartRow)
- continue;
-
- /* Skip contexts that haven't started processing yet */
- if (older->matchStartRow > currentPos)
- continue;
-
- /* For in-progress ctx, older must also be in-progress
*/
- if (older->states == NULL)
- continue;
-
- /* Find maximum count[0] in newer context */
- for (s = ctx->states; s != NULL; s = s->next)
+ /* Covering state must also be absorbable */
+ if (olderState->isAbsorbable &&
+ olderState->elemIdx == newerState->elemIdx &&
+ olderState->counts[depth] >=
newerState->counts[depth])
{
- if (s->counts[0] > newerMaxCount)
- newerMaxCount = s->counts[0];
- }
-
- /* Find maximum count[0] in older context */
- for (s = older->states; s != NULL; s = s->next)
- {
- if (s->counts[0] > olderMaxCount)
- olderMaxCount = s->counts[0];
- }
-
- /* If older is ahead or equal, absorb newer */
- if (olderMaxCount >= 0 && newerMaxCount >= 0 &&
- olderMaxCount >= newerMaxCount)
- {
- int64 absorbedLen = ctx->lastProcessedRow -
ctx->matchStartRow + 1;
-
- nfa_context_free(winstate, ctx);
- winstate->nfaContextsAbsorbed++;
-
update_nfa_length_stats(winstate->nfaContextsAbsorbed,
-
&winstate->nfaAbsorbedLen,
-
absorbedLen);
- absorbed = true;
+ found = true;
+ break;
}
}
- ctx = nextCtx;
+ if (!found)
+ return false;
}
+
+ return true;
}
/*
- * nfa_step
+ * nfa_try_absorb_context
+ *
+ * Try to absorb ctx (newer) into an older in-progress context.
+ * Returns true if ctx was absorbed and freed.
*
- * Process all states in context through NFA for one row.
- * Calls nfa_step_single for each state.
+ * Absorption requires three conditions:
+ * 1. ctx must have all states absorbable (allStatesAbsorbable).
+ * If ctx has any non-absorbable state, it may produce unique matches.
+ * 2. older must have at least one absorbable state (hasAbsorbableState).
+ * Without absorbable states, older cannot cover newer's states.
+ * 3. All ctx states must be covered by older's absorbable states.
+ * This ensures older will produce all matches that ctx would produce.
+ *
+ * Context list is ordered by creation time (oldest first via prev chain).
+ * Each row creates at most one context, so earlier contexts have smaller
+ * matchStartRow values.
*/
-static void
-nfa_step(WindowAggState *winstate, RPRNFAContext *ctx, bool *varMatched, int64
pos)
+static bool
+nfa_try_absorb_context(WindowAggState *winstate, RPRNFAContext *ctx)
{
- RPRNFAState *states = ctx->states;
- RPRNFAState *state;
- RPRNFAState *nextState;
-
- ctx->states = NULL;
+ RPRPattern *pattern = winstate->rpPattern;
+ RPRNFAContext *older;
- /* Track last processed row for failed match statistics */
- if (pos > ctx->lastProcessedRow)
- ctx->lastProcessedRow = pos;
+ /* Early exit: ctx must have all states absorbable */
+ if (!ctx->allStatesAbsorbable)
+ return false;
- for (state = states; state != NULL; state = nextState)
+ for (older = ctx->prev; older != NULL; older = older->prev)
{
- nextState = state->next;
- state->next = NULL;
- nfa_step_single(winstate, ctx, state, varMatched, pos);
+ /*
+ * By invariant: ctx->prev chain is in creation order (oldest
first),
+ * and each row creates at most one context. So all contexts in
this
+ * chain have matchStartRow < ctx->matchStartRow.
+ */
+
+ /* Older must also be in-progress */
+ if (older->states == NULL)
+ continue;
+
+ /* Older must have at least one absorbable state */
+ if (!older->hasAbsorbableState)
+ continue;
+
+ /* Check if all newer states are covered by older */
+ if (nfa_states_covered(pattern, older, ctx))
+ {
+ int64 absorbedLen = ctx->lastProcessedRow -
ctx->matchStartRow + 1;
+
+ nfa_context_free(winstate, ctx);
+ winstate->nfaContextsAbsorbed++;
+ nfa_update_length_stats(winstate->nfaContextsAbsorbed,
+
&winstate->nfaAbsorbedLen,
+
absorbedLen);
+ return true;
+ }
}
+
+ return false;
}
/*
- * nfa_finalize_all_contexts
+ * nfa_absorb_contexts
*
- * Finalize all active contexts when partition ends.
- * Calls nfa_step with NULL varMatched to complete without new row data.
+ * Absorb redundant contexts to reduce memory usage and computation.
+ *
+ * For patterns like A+, newer contexts starting later will produce subset
+ * matches of older contexts with higher counts. By absorbing these redundant
+ * contexts early, we avoid duplicate work.
+ *
+ * Iterates from tail (newest) toward head (oldest) via prev chain.
+ * Only in-progress contexts (states != NULL) are candidates for absorption;
+ * completed contexts represent valid match results.
*/
static void
-nfa_finalize_all_contexts(WindowAggState *winstate, int64 lastPos)
+nfa_absorb_contexts(WindowAggState *winstate)
{
RPRNFAContext *ctx;
+ RPRNFAContext *nextCtx;
- for (ctx = winstate->nfaContext; ctx != NULL; ctx = ctx->next)
+ for (ctx = winstate->nfaContextTail; ctx != NULL; ctx = nextCtx)
{
+ nextCtx = ctx->prev;
+
+ /* Only absorb in-progress contexts; completed contexts are
valid results */
if (ctx->states != NULL)
- nfa_step(winstate, ctx, NULL, lastPos);
+ nfa_try_absorb_context(winstate, ctx);
}
}
/*
- * nfa_process_context
+ * nfa_eval_var_match
*
- * Process one context for the current row.
- * Handles frame boundary check and NFA step.
+ * Evaluate if a VAR element matches the current row.
+ * Undefined variables (varId >= defineVariableList length) default to TRUE.
*/
-static void
-nfa_process_context(WindowAggState *winstate, RPRNFAContext *ctx,
- int64 currentPos, bool hasLimitedFrame,
int64 frameOffset)
+static inline bool
+nfa_eval_var_match(WindowAggState *winstate, RPRPatternElement *elem,
+ bool *varMatched)
{
- /* Skip already-completed contexts */
- if (ctx->states == NULL)
- return;
-
- /* Check frame boundary */
- if (hasLimitedFrame)
- {
- int64 ctxFrameEnd = ctx->matchStartRow + frameOffset + 1;
- if (currentPos >= ctxFrameEnd)
- {
- nfa_step(winstate, ctx, NULL, ctxFrameEnd - 1);
- return;
- }
- }
-
- /* Process states for this context */
- nfa_step(winstate, ctx, winstate->nfaVarMatched, currentPos);
+ if (varMatched == NULL)
+ return false;
+ if (elem->varId >= list_length(winstate->defineVariableList))
+ return true;
+ return varMatched[elem->varId];
}
/*
- * nfa_step_single
+ * nfa_match
*
- * Process one state through NFA for one row.
- * New states are added to ctx->states.
- * Match (FIN) is recorded in ctx->matchedState.
- * When FIN is reached by matching (not skipping), matchEndRow is updated.
+ * Match phase (convergence): evaluate VAR elements against current row.
+ * Only updates counts and removes dead states. Minimal transitions.
+ *
+ * For VAR elements:
+ * - matched: count++, keep state (unless count > max)
+ * - not matched: remove state (exit alternatives already exist from
+ * previous advance when count >= min was satisfied)
+ *
+ * For simple VARs (min=max=1) followed by END:
+ * - Advance to END and update group count before absorb phase
+ * - This ensures absorption can compare states by group completion
+ *
+ * Non-VAR elements (ALT, END, FIN) are kept as-is for advance phase.
*/
static void
-nfa_step_single(WindowAggState *winstate, RPRNFAContext *ctx,
- RPRNFAState *state, bool *varMatched, int64
currentPos)
+nfa_match(WindowAggState *winstate, RPRNFAContext *ctx, bool *varMatched)
{
RPRPattern *pattern = winstate->rpPattern;
RPRPatternElement *elements = pattern->elements;
- RPRNFAState *pending = state; /* states to process in current
row */
- RPRNFAState *pending_tail = state; /* tail for FIFO append */
+ RPRNFAState **prevPtr = &ctx->states;
+ RPRNFAState *state;
- while (pending != NULL)
+ /*
+ * Evaluate VAR elements against current row.
+ * For simple VARs with END next, advance to END and update group count
+ * inline so absorb phase can compare states properly.
+ */
+ for (state = ctx->states; state != NULL; )
{
- RPRPatternElement *elem;
- bool matched;
- int32 count;
- int depth;
-
- /* Pop from head */
- state = pending;
- pending = pending->next;
- if (pending == NULL)
- pending_tail = NULL;
- state->next = NULL;
-
- Assert(state->elemIdx >= 0 && state->elemIdx <
pattern->numElements);
- elem = &elements[state->elemIdx];
- depth = elem->depth;
+ RPRNFAState *nextState = state->next;
+ RPRPatternElement *elem = &elements[state->elemIdx];
if (RPRElemIsVar(elem))
{
- /*
- * Variable: check if it matches current row.
- * If varMatched is NULL (boundary), all variables are
forced to false.
- * Otherwise, varId < numDefines uses DEFINE expr,
- * varId >= numDefines defaults to TRUE.
- */
- if (varMatched == NULL)
- matched = false;
- else
- {
- int numDefines =
list_length(winstate->defineVariableList);
-
- if (elem->varId >= numDefines)
- matched = true; /* Not defined
in DEFINE, defaults to TRUE */
- else
- matched = varMatched[elem->varId];
- }
+ bool matched;
+ int depth = elem->depth;
+ int32 count = state->counts[depth];
- count = state->counts[depth];
+ matched = nfa_eval_var_match(winstate, elem,
varMatched);
if (matched)
{
- /*
- * Clamp count to prevent overflow with very
large partitions.
- * Once count reaches RPR_COUNT_MAX, we stop
incrementing.
- * This is safe because for unbounded
quantifiers, only the
- * comparison with min matters, and for bounded
quantifiers,
- * max is always < RPR_COUNT_MAX.
- */
+ /* Increment count */
if (count < RPR_COUNT_MAX)
count++;
+ /* Check max constraint */
if (elem->max != RPR_QUANTITY_INF && count >
elem->max)
{
+ /* Exceeded max - remove state */
+ *prevPtr = nextState;
nfa_state_free(winstate, state);
+ state = nextState;
continue;
}
state->counts[depth] = count;
- /* Can repeat more? Clone for staying */
- if (elem->max == RPR_QUANTITY_INF || count <
elem->max)
- {
- RPRNFAState *clone =
nfa_state_clone(winstate, state->elemIdx,
-
state->altPriority,
-
state->counts);
- nfa_add_state_unique(winstate, ctx,
clone);
- }
-
- /* Satisfied? Advance to next element */
- if (count >= elem->min)
+ /*
+ * For simple VAR (min=max=1) with END next,
advance to END
+ * and update group count inline. This keeps
state in place,
+ * preserving lexical order.
+ */
+ if (elem->min == 1 && elem->max == 1 && count
== 1 &&
+ RPRElemIsEnd(&elements[elem->next]))
{
- RPRPatternElement *nextElem;
+ RPRPatternElement *endElem =
&elements[elem->next];
+ int endDepth =
endElem->depth;
+ int32 endCount =
state->counts[endDepth];
- state->counts[depth] = 0;
- state->elemIdx = elem->next;
- nextElem = &elements[state->elemIdx];
+ /* Increment group count with overflow
protection */
+ if (endCount < RPR_COUNT_MAX)
+ endCount++;
- if (RPRElemIsFin(nextElem))
- {
- /* Match ends at current row
since we matched */
- nfa_add_matched_state(winstate,
ctx, state, currentPos);
- }
- else if (RPRElemIsEnd(nextElem))
+ /* Check END's max constraint */
+ if (endElem->max != RPR_QUANTITY_INF &&
endCount > endElem->max)
{
- /*
- * END is epsilon transition -
process immediately on same row.
- * This ensures match end
position is recorded at the row where
- * the last VAR matched, not
the next row.
- */
- state->next = NULL;
- if (pending_tail)
- pending_tail->next =
state;
- else
- pending = state;
- pending_tail = state;
- }
- else
- {
- /*
- * VAR, ALT - wait for next
row. ALT dispatches to VARs that
- * need input, so it must wait
for the next row after VAR
- * consumed the current row.
- */
- nfa_add_state_unique(winstate,
ctx, state);
+ /* Exceeded END's max - remove
state */
+ *prevPtr = nextState;
+ nfa_state_free(winstate, state);
+ state = nextState;
+ continue;
}
+
+ state->elemIdx = elem->next;
+ state->counts[endDepth] = endCount;
+
+ /* Clear deeper counts */
+ for (int d = endDepth + 1; d <
pattern->maxDepth; d++)
+ state->counts[d] = 0;
}
- else
- {
- nfa_state_free(winstate, state);
- }
+ /* else: stay at VAR for advance phase */
}
else
{
- /* Not matched: can we skip? */
- if (count >= elem->min)
- {
- RPRPatternElement *nextElem;
+ /*
+ * Not matched - remove state.
+ * Exit alternatives were already created by
advance phase
+ * when count >= min was satisfied.
+ */
+ *prevPtr = nextState;
+ nfa_state_free(winstate, state);
+ state = nextState;
+ continue;
+ }
+ }
+ /* Non-VAR elements: keep as-is for advance phase */
- state->counts[depth] = 0;
- state->elemIdx = elem->next;
- nextElem = &elements[state->elemIdx];
+ prevPtr = &state->next;
+ state = nextState;
+ }
+}
- if (RPRElemIsFin(nextElem))
- {
- /* Match ends at previous row
since current didn't match */
- nfa_add_matched_state(winstate,
ctx, state, currentPos - 1);
- }
- else if (RPRElemIsVar(nextElem))
- {
- /*
- * Current row was NOT consumed
(skip case), so next VAR
- * must be tried on the SAME
row via pending list
- */
- state->next = NULL;
- if (pending_tail)
- pending_tail->next =
state;
- else
- pending = state;
- pending_tail = state;
- }
- else
- {
- state->next = NULL;
- if (pending_tail)
- pending_tail->next =
state;
- else
- pending = state;
- pending_tail = state;
- }
- }
- else
- {
- nfa_state_free(winstate, state);
- }
- }
+/*
+ * nfa_route_to_elem
+ *
+ * Route state to next element. If VAR, add to ctx->states and process
+ * skip path if optional. Otherwise, continue epsilon expansion via recursion.
+ */
+static void
+nfa_route_to_elem(WindowAggState *winstate, RPRNFAContext *ctx,
+ RPRNFAState *state, RPRPatternElement
*nextElem,
+ int64 currentPos, bool initialAdvance)
+{
+ if (RPRElemIsVar(nextElem))
+ {
+ nfa_add_state_unique(winstate, ctx, state);
+ if (RPRElemCanSkip(nextElem))
+ {
+ RPRNFAState *skipState;
+ skipState = nfa_state_clone(winstate, nextElem->next,
+
state->altPriority, state->counts,
+
state->isAbsorbable);
+ nfa_advance_state(winstate, ctx, skipState, currentPos,
initialAdvance);
}
- else if (RPRElemIsFin(elem))
+ }
+ else
+ {
+ nfa_advance_state(winstate, ctx, state, currentPos,
initialAdvance);
+ }
+}
+
+/*
+ * nfa_advance_alt
+ *
+ * Handle ALT element: expand all branches in lexical order (DFS).
+ * Sets altPriority to element index to preserve lexical order for match
selection.
+ */
+static void
+nfa_advance_alt(WindowAggState *winstate, RPRNFAContext *ctx,
+ RPRNFAState *state, RPRPatternElement *elem,
+ int64 currentPos, bool initialAdvance)
+{
+ RPRPattern *pattern = winstate->rpPattern;
+ RPRPatternElement *elements = pattern->elements;
+ RPRElemIdx altIdx = elem->next;
+ bool first = true;
+
+ while (altIdx >= 0 && altIdx < pattern->numElements)
+ {
+ RPRPatternElement *altElem = &elements[altIdx];
+ RPRNFAState *newState;
+
+ if (first)
{
- /* Already at FIN - match ends at current row */
- nfa_add_matched_state(winstate, ctx, state, currentPos);
+ state->elemIdx = altIdx;
+ state->altPriority = altIdx;
+ newState = state;
+ first = false;
}
- else if (RPRElemIsAlt(elem))
+ else
{
- RPRElemIdx altIdx = elem->next;
- bool first = true;
+ newState = nfa_state_clone(winstate, altIdx, altIdx,
+
state->counts, state->isAbsorbable);
+ }
- /*
- * ALT doesn't consume a row - it's just a dispatch
point.
- * All branches should be evaluated on the CURRENT row.
- * Set altPriority to branch's elemIdx for lexical
order tracking.
- * Append to pending_tail to maintain lexical order.
- */
- while (altIdx >= 0 && altIdx < pattern->numElements)
- {
- RPRPatternElement *altElem = &elements[altIdx];
- RPRNFAState *newState;
+ /* Recursively process this branch before next */
+ nfa_advance_state(winstate, ctx, newState, currentPos,
initialAdvance);
+ altIdx = altElem->jump;
+ }
- if (first)
- {
- state->elemIdx = altIdx;
- state->altPriority = altIdx;
- newState = state;
- first = false;
- }
- else
- {
- newState = nfa_state_clone(winstate,
altIdx, altIdx,
-
state->counts);
- }
+ if (first)
+ nfa_state_free(winstate, state);
+}
- /* Append to tail for lexical order */
- newState->next = NULL;
- if (pending_tail)
- pending_tail->next = newState;
- else
- pending = newState;
- pending_tail = newState;
+/*
+ * nfa_advance_end
+ *
+ * Handle END element: group repetition logic.
+ * Decides whether to loop back or exit based on count vs min/max.
+ */
+static void
+nfa_advance_end(WindowAggState *winstate, RPRNFAContext *ctx,
+ RPRNFAState *state, RPRPatternElement *elem,
+ int64 currentPos, bool initialAdvance)
+{
+ RPRPattern *pattern = winstate->rpPattern;
+ RPRPatternElement *elements = pattern->elements;
+ int depth = elem->depth;
+ int32 count = state->counts[depth];
- altIdx = altElem->jump;
- }
+ if (count < elem->min)
+ {
+ /* Must loop back */
+ RPRPatternElement *jumpElem;
- if (first)
- nfa_state_free(winstate, state);
- }
- else if (RPRElemIsEnd(elem))
- {
- count = state->counts[depth] + 1;
+ for (int d = depth + 1; d < pattern->maxDepth; d++)
+ state->counts[d] = 0;
+ state->elemIdx = elem->jump;
+ jumpElem = &elements[state->elemIdx];
- if (count < elem->min)
- {
- /*
- * Haven't reached minimum yet - must loop back.
- * Add to ctx->states (next row) not pending
(same row).
- */
- state->counts[depth] = count;
- for (int d = depth + 1; d < pattern->maxDepth;
d++)
- state->counts[d] = 0;
- state->elemIdx = elem->jump;
- nfa_add_state_unique(winstate, ctx, state);
- }
- else if (elem->max != RPR_QUANTITY_INF && count >=
elem->max)
- {
- /* Reached maximum - must exit, continue
processing */
- state->counts[depth] = 0;
- state->elemIdx = elem->next;
- state->next = NULL;
- if (pending_tail)
- pending_tail->next = state;
- else
- pending = state;
- pending_tail = state;
- }
+ nfa_route_to_elem(winstate, ctx, state, jumpElem, currentPos,
initialAdvance);
+ }
+ else if (elem->max != RPR_QUANTITY_INF && count >= elem->max)
+ {
+ /* Must exit */
+ RPRPatternElement *nextElem;
+
+ state->counts[depth] = 0;
+ state->elemIdx = elem->next;
+ nextElem = &elements[state->elemIdx];
+
+ /* END→END: increment outer END's count */
+ if (RPRElemIsEnd(nextElem) && state->counts[nextElem->depth] <
RPR_COUNT_MAX)
+ state->counts[nextElem->depth]++;
+
+ nfa_route_to_elem(winstate, ctx, state, nextElem, currentPos,
initialAdvance);
+ }
+ else
+ {
+ /* Between min and max - can exit or loop */
+ RPRElemIdx exitAltPriority;
+ RPRNFAState *exitState;
+ RPRPatternElement *jumpElem;
+ RPRPatternElement *nextElem;
+
+ /* Preserve altPriority for greedy extension */
+ exitAltPriority = state->altPriority;
+ if (ctx->matchedState != NULL)
+ exitAltPriority = ctx->matchedState->altPriority;
+
+ /* Create exit state first (need original counts before
modifying state) */
+ exitState = nfa_state_clone(winstate, elem->next,
exitAltPriority,
+
state->counts, state->isAbsorbable);
+ exitState->counts[depth] = 0;
+ nextElem = &elements[exitState->elemIdx];
+
+ /* END→END: increment outer END's count */
+ if (RPRElemIsEnd(nextElem) &&
exitState->counts[nextElem->depth] < RPR_COUNT_MAX)
+ exitState->counts[nextElem->depth]++;
+
+ /* Route loop state first (earlier in pattern = lexical order)
*/
+ state->counts[depth] = count;
+ for (int d = depth + 1; d < pattern->maxDepth; d++)
+ state->counts[d] = 0;
+ state->elemIdx = elem->jump;
+ jumpElem = &elements[state->elemIdx];
+
+ nfa_route_to_elem(winstate, ctx, state, jumpElem, currentPos,
initialAdvance);
+
+ /* Then route exit state */
+ nfa_route_to_elem(winstate, ctx, exitState, nextElem,
currentPos, initialAdvance);
+ }
+}
+
+/*
+ * nfa_advance_var
+ *
+ * Handle VAR element: loop/exit transitions.
+ * After match phase, all VAR states have matched - decide next action.
+ */
+static void
+nfa_advance_var(WindowAggState *winstate, RPRNFAContext *ctx,
+ RPRNFAState *state, RPRPatternElement *elem,
+ int64 currentPos, bool initialAdvance)
+{
+ RPRPattern *pattern = winstate->rpPattern;
+ RPRPatternElement *elements = pattern->elements;
+ int depth = elem->depth;
+ int32 count = state->counts[depth];
+ bool canLoop = (elem->max == RPR_QUANTITY_INF || count < elem->max);
+ bool canExit = (count >= elem->min);
+
+ if (canLoop && canExit)
+ {
+ /* Both: clone for loop, modify original for exit */
+ RPRNFAState *loopState;
+ RPRPatternElement *nextElem;
+
+ loopState = nfa_state_clone(winstate, state->elemIdx,
state->altPriority,
+
state->counts, state->isAbsorbable);
+ nfa_add_state_unique(winstate, ctx, loopState);
+
+ /* Exit: advance to next element */
+ state->counts[depth] = 0;
+ state->elemIdx = elem->next;
+ nextElem = &elements[state->elemIdx];
+
+ nfa_route_to_elem(winstate, ctx, state, nextElem, currentPos,
initialAdvance);
+ }
+ else if (canLoop)
+ {
+ /* Loop only: keep state as-is */
+ nfa_add_state_unique(winstate, ctx, state);
+ }
+ else if (canExit)
+ {
+ /* Exit only: modify state */
+ RPRPatternElement *nextElem;
+
+ state->counts[depth] = 0;
+ state->elemIdx = elem->next;
+ nextElem = &elements[state->elemIdx];
+
+ nfa_route_to_elem(winstate, ctx, state, nextElem, currentPos,
initialAdvance);
+ }
+ else
+ {
+ /* Neither - shouldn't happen, free state */
+ nfa_state_free(winstate, state);
+ }
+}
+
+/*
+ * nfa_advance_state
+ *
+ * Recursively process a single state through epsilon transitions.
+ * Uses DFS traversal to maintain lexical order: lower altPriority paths
+ * are fully processed before higher altPriority paths, ensuring states
+ * are added to ctx->states in lexical order.
+ */
+static void
+nfa_advance_state(WindowAggState *winstate, RPRNFAContext *ctx,
+ RPRNFAState *state, int64 currentPos, bool
initialAdvance)
+{
+ RPRPattern *pattern = winstate->rpPattern;
+ RPRPatternElement *elem;
+
+ Assert(state->elemIdx >= 0 && state->elemIdx < pattern->numElements);
+ elem = &pattern->elements[state->elemIdx];
+
+ switch (elem->varId)
+ {
+ case RPR_VARID_FIN:
+ /* FIN: record match (skip for initial advance) */
+ if (!initialAdvance)
+ nfa_add_matched_state(winstate, ctx, state,
currentPos);
else
- {
- /*
- * Between min and max - can exit or continue.
- * Exit state handling depends on what comes
next:
- * - If next is FIN: process on same row (match
ends here)
- * - If next is VAR/ALT: wait for next row
(needs new input)
- * Loop state always waits for next row.
- */
- RPRPatternElement *nextElem =
&elements[elem->next];
- RPRElemIdx exitAltPriority;
- RPRNFAState *exitState;
+ nfa_state_free(winstate, state);
+ break;
- /*
- * For greedy extension: if context already has
a match recorded,
- * preserve its altPriority. This ensures that
iterations of a
- * quantified group don't compete on lexical
order - the first
- * iteration's choice sets the preference, and
subsequent
- * iterations can extend it if they produce a
longer match.
- */
- exitAltPriority = state->altPriority;
- if (ctx->matchedState != NULL)
- exitAltPriority =
ctx->matchedState->altPriority;
+ case RPR_VARID_ALT:
+ nfa_advance_alt(winstate, ctx, state, elem, currentPos,
initialAdvance);
+ break;
- exitState = nfa_state_clone(winstate,
elem->next,
-
exitAltPriority,
-
state->counts);
- exitState->counts[depth] = 0;
+ case RPR_VARID_END:
+ nfa_advance_end(winstate, ctx, state, elem, currentPos,
initialAdvance);
+ break;
- if (RPRElemIsFin(nextElem))
- {
- /* Match ends at current row - append
to pending */
- exitState->next = NULL;
- if (pending_tail)
- pending_tail->next = exitState;
- else
- pending = exitState;
- pending_tail = exitState;
- }
- else
- {
- /* Next element (VAR/ALT) needs new row
input */
- nfa_add_state_unique(winstate, ctx,
exitState);
- }
+ default:
+ /* VAR element */
+ nfa_advance_var(winstate, ctx, state, elem, currentPos,
initialAdvance);
+ break;
+ }
+}
- state->counts[depth] = count;
- for (int d = depth + 1; d < pattern->maxDepth;
d++)
- state->counts[d] = 0;
- state->elemIdx = elem->jump;
- nfa_add_state_unique(winstate, ctx, state);
+/*
+ * nfa_advance
+ *
+ * Advance phase (divergence): transition from all surviving states.
+ * Called after match phase with matched VAR states, or at context creation
+ * for initial epsilon expansion (initialAdvance=true skips FIN matches).
+ *
+ * Processes states in order, using recursive DFS to maintain lexical order.
+ */
+static void
+nfa_advance(WindowAggState *winstate, RPRNFAContext *ctx, int64 currentPos,
+ bool initialAdvance)
+{
+ RPRNFAState *states = ctx->states;
+ RPRNFAState *state;
+
+ ctx->states = NULL; /* Will rebuild */
+
+ /* Process each state in order */
+ while (states != NULL)
+ {
+ state = states;
+ states = states->next;
+ state->next = NULL;
+
+ nfa_advance_state(winstate, ctx, state, currentPos,
initialAdvance);
+ }
+}
+
+/*
+ * nfa_process_row
+ *
+ * Process all contexts for one row using the new flow:
+ * 1. Match all contexts (convergence) - evaluate VARs, prune dead states
+ * 2. Absorb redundant contexts - ideal timing after convergence
+ * 3. Advance all contexts (divergence) - create new states for next row
+ */
+static void
+nfa_process_row(WindowAggState *winstate, int64 currentPos,
+ bool hasLimitedFrame, int64 frameOffset)
+{
+ RPRNFAContext *ctx;
+ bool *varMatched = winstate->nfaVarMatched;
+
+ /*
+ * Phase 1: Match all contexts (convergence)
+ * Evaluate VAR elements, update counts, remove dead states.
+ */
+ for (ctx = winstate->nfaContext; ctx != NULL; ctx = ctx->next)
+ {
+ if (ctx->states == NULL)
+ continue;
+
+ /* Check frame boundary - finalize if exceeded */
+ if (hasLimitedFrame)
+ {
+ int64 ctxFrameEnd = ctx->matchStartRow + frameOffset +
1;
+ if (currentPos >= ctxFrameEnd)
+ {
+ /* Frame boundary: match with NULL (force
mismatch), then advance */
+ nfa_match(winstate, ctx, NULL);
+ nfa_advance(winstate, ctx, ctxFrameEnd - 1,
false);
+ continue;
}
}
- else
+
+ nfa_match(winstate, ctx, varMatched);
+ ctx->lastProcessedRow = currentPos;
+ }
+
+ /*
+ * Phase 2: Absorb redundant contexts
+ * After match phase, states have converged - ideal for absorption.
+ * First update absorption flags that may have changed due to state
removal.
+ */
+ if (winstate->rpPattern->isAbsorbable)
+ {
+ RPRPattern *pattern = winstate->rpPattern;
+
+ for (ctx = winstate->nfaContext; ctx != NULL; ctx = ctx->next)
+ nfa_update_absorption_flags(ctx, pattern);
+
+ nfa_absorb_contexts(winstate);
+ }
+
+ /*
+ * Phase 3: Advance all contexts (divergence)
+ * Create new states (loop/exit) from surviving matched states.
+ */
+ for (ctx = winstate->nfaContext; ctx != NULL; ctx = ctx->next)
+ {
+ if (ctx->states == NULL)
+ continue;
+
+ /* Skip contexts already finalized in phase 1 */
+ if (hasLimitedFrame)
{
- state->elemIdx = elem->next;
- state->next = NULL;
- if (pending_tail)
- pending_tail->next = state;
- else
- pending = state;
- pending_tail = state;
+ int64 ctxFrameEnd = ctx->matchStartRow + frameOffset +
1;
+ if (currentPos >= ctxFrameEnd)
+ continue;
}
+
+ nfa_advance(winstate, ctx, currentPos, false);
}
}
+/*
+ * nfa_finalize_all_contexts
+ *
+ * Finalize all active contexts when partition ends.
+ * Match with NULL to force mismatch, then advance to process epsilon
transitions.
+ */
+static void
+nfa_finalize_all_contexts(WindowAggState *winstate, int64 lastPos)
+{
+ RPRNFAContext *ctx;
+
+ for (ctx = winstate->nfaContext; ctx != NULL; ctx = ctx->next)
+ {
+ if (ctx->states != NULL)
+ {
+ nfa_match(winstate, ctx, NULL);
+ nfa_advance(winstate, ctx, lastPos, false);
+ }
+ }
+}
diff --git a/src/backend/optimizer/plan/createplan.c
b/src/backend/optimizer/plan/createplan.c
index 99180ba339e..00dbcaed72c 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -2556,7 +2556,7 @@ create_windowagg_plan(PlannerInfo *root, WindowAggPath
*best_path)
ordCollations,
best_path->runCondition,
wc->rpSkipTo,
-
buildRPRPattern(wc->rpPattern, defineVariableList),
+
buildRPRPattern(wc->rpPattern, defineVariableList, wc->rpSkipTo,
wc->frameOptions),
wc->defineClause,
best_path->qual,
best_path->topwindow,
diff --git a/src/backend/optimizer/plan/rpr.c b/src/backend/optimizer/plan/rpr.c
index 9b847c5e8b1..6883bc5466d 100644
--- a/src/backend/optimizer/plan/rpr.c
+++ b/src/backend/optimizer/plan/rpr.c
@@ -6,6 +6,25 @@
* This file contains functions for optimizing RPR pattern AST and
* compiling it to bytecode for execution by WindowAgg.
*
+ * Key components:
+ * 1. Pattern Optimization: Simplifies patterns before compilation
+ * (e.g., flatten nested SEQ/ALT, merge consecutive vars)
+ * 2. Pattern Compilation: Converts AST to flat element array for NFA
+ * 3. Absorption Analysis: Computes flags for O(n^2)->O(n) optimization
+ *
+ * Context Absorption Optimization:
+ * When a pattern starts with an unbounded element (e.g., A+ or (A B)+),
+ * newer contexts cannot produce longer matches than older contexts.
+ * By absorbing (eliminating) redundant newer contexts, we reduce
+ * complexity from O(n^2) to O(n) for patterns like A+ B.
+ *
+ * The absorption analysis uses two element flags:
+ * - RPR_ELEM_ABSORBABLE: marks WHERE to compare (judgment point)
+ * - RPR_ELEM_ABSORBABLE_BRANCH: marks the absorbable region
+ *
+ * See computeAbsorbability() and the detailed comments before
+ * isUnboundedStart() for the full design explanation.
+ *
* Portions Copyright (c) 1996-2026, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
@@ -1124,14 +1143,13 @@ finalizeRPRPattern(RPRPattern *result)
int i;
RPRPatternElement *finElem;
- /* Set up next pointers for elements that don't have one yet */
+ /* Set up next pointers for elements that don't have one */
for (i = 0; i < finIdx; i++)
{
- if (result->elements[i].next == RPR_ELEMIDX_INVALID)
- {
- /* Point to next element, or to FIN if this is the last
*/
- result->elements[i].next = (i < finIdx - 1) ? i + 1 :
finIdx;
- }
+ RPRPatternElement *elem = &result->elements[i];
+
+ if (elem->next == RPR_ELEMIDX_INVALID)
+ elem->next = (i < finIdx - 1) ? i + 1 : finIdx;
}
/* Add FIN marker at the end */
@@ -1145,6 +1163,87 @@ finalizeRPRPattern(RPRPattern *result)
finElem->jump = RPR_ELEMIDX_INVALID;
}
+/*-------------------------------------------------------------------------
+ * CONTEXT ABSORPTION: TWO-FLAG DESIGN
+ *-------------------------------------------------------------------------
+ *
+ * Context absorption eliminates redundant match searches by absorbing newer
+ * contexts that cannot produce longer matches than older contexts. This
+ * achieves O(n^2) -> O(n) performance improvement for patterns like A+ B.
+ *
+ * Core Insight:
+ * For pattern A+ B, if Ctx1 starts at row 0 and Ctx2 starts at row 1,
+ * both matching A continuously, Ctx1 will always have more A matches.
+ * When B finally appears, Ctx1's match (0 to current) is always longer
+ * than Ctx2's match (1 to current). So Ctx2 can be safely eliminated.
+ *
+ * Two Flags:
+ * 1. RPR_ELEM_ABSORBABLE - "Absorption judgment point"
+ * WHERE contexts can be compared for absorption.
+ * - Simple unbounded VAR (A+): the VAR element itself
+ * - Unbounded GROUP ((A B)+): the END element only
+ *
+ * 2. RPR_ELEM_ABSORBABLE_BRANCH - "Absorbable region marker"
+ * ALL elements within the absorbable region.
+ * - Used for tracking state.isAbsorbable at runtime
+ * - States leaving this region become non-absorbable permanently
+ *
+ * Why Two Flags?
+ * For pattern "(A B)+", contexts at different positions (one at A,
+ * another at B) cannot be compared - they must synchronize at END.
+ *
+ * Example: "(A B)+" with input A B A B A B...
+ * Row 0 (A): Ctx1 starts, matches A
+ * Row 1 (B): Ctx1 matches B -> END (count=1)
+ * Row 2 (A): Ctx1 loops to A, Ctx2 starts at A
+ * Row 3 (B): Ctx1 at END (count=2), Ctx2 at END (count=1)
+ * -> Both at END, comparable! Ctx1 absorbs Ctx2.
+ *
+ * Contexts synchronize at END every group-length rows. Therefore:
+ * - ABSORBABLE marks END as judgment point (where to compare)
+ * - ABSORBABLE_BRANCH keeps state.isAbsorbable=true through A->B->END
+ *
+ * Pattern Examples:
+ *
+ * Pattern: A+ B
+ * Element 0 (A): ABSORBABLE | ABSORBABLE_BRANCH <- judgment point
+ * Element 1 (B): (none)
+ * -> Compare at A every row. When contexts move to B, absorption stops.
+ *
+ * Pattern: (A B)+ C
+ * Element 0 (A): ABSORBABLE_BRANCH
+ * Element 1 (B): ABSORBABLE_BRANCH
+ * Element 2 (END): ABSORBABLE | ABSORBABLE_BRANCH <- judgment point
+ * Element 3 (C): (none)
+ * -> Compare at END every 2 rows. When contexts move to C, absorption stops.
+ *
+ * Pattern: (A+ B+)+ C
+ * Element 0 (A): ABSORBABLE | ABSORBABLE_BRANCH <- only first A+ flagged
+ * Element 1 (B): (none)
+ * Element 2 (END): (none)
+ * Element 3 (C): (none)
+ * -> Only first unbounded portion (A+) gets flags. Absorption happens
+ * at A during first iteration. After moving to B+, absorption stops.
+ *
+ * First Unbounded Portion Strategy:
+ * The algorithm only flags the FIRST unbounded portion starting from
+ * element 0. This is sufficient because:
+ * - Absorption in first portion already achieves O(n) complexity
+ * - Later portions have different synchronization characteristics
+ * - Nested unbounded patterns are too complex for simple absorption
+ * - Complex patterns (nested groups, etc.) naturally die from mismatch
+ *
+ * Runtime Usage (in nodeWindowAgg.c):
+ * - state.isAbsorbable = (previous && elem.ABSORBABLE_BRANCH)
+ * - Monotonic: once false, stays false (cannot re-enter region)
+ * - context.hasAbsorbableState: can absorb others (>=1 absorbable state)
+ * - context.allStatesAbsorbable: can be absorbed (ALL states absorbable)
+ * - Absorption check: if Ctx1.hasAbsorbable && Ctx2.allAbsorbable,
+ * compare counts at same elemIdx, absorb if Ctx1.count >= Ctx2.count
+ *
+ *-------------------------------------------------------------------------
+ */
+
/*
* isUnboundedStart
* Check if the element at idx starts an unbounded sequence.
@@ -1152,6 +1251,11 @@ finalizeRPRPattern(RPRPattern *result)
* For context absorption to work, the sequence starting at idx must be
* unbounded (max = infinity) so that we can "shift" by decrementing count.
*
+ * Algorithm:
+ * - Descend through ALT/GROUP structures to find first actual VAR
+ * - For GROUP: must be unbounded AND contain only simple {1,1} VARs
+ * - Check if that VAR is unbounded
+ *
* Two cases are handled:
* 1. Simple var: A+ B C - first element A has max=INF
* 2. Group: (A B){2,} C - group END has max=INF, and all elements
@@ -1180,7 +1284,7 @@ isUnboundedStart(RPRPattern *pattern, RPRElemIdx idx)
e = &pattern->elements[i];
/* Exit when depth drops (left the group) or reached FIN */
- if (e->depth < startDepth || e->varId == RPR_VARID_FIN)
+ if (e->depth < startDepth || RPRElemIsFin(e))
{
lastElem = e;
break;
@@ -1202,7 +1306,7 @@ isUnboundedStart(RPRPattern *pattern, RPRElemIdx idx)
Assert(lastElem != NULL);
/* Case 2: Unbounded group - END element points back to idx */
- if (lastElem->varId == RPR_VARID_END &&
+ if (RPRElemIsEnd(lastElem) &&
lastElem->jump == idx &&
lastElem->max == RPR_QUANTITY_INF)
{
@@ -1219,9 +1323,9 @@ isUnboundedStart(RPRPattern *pattern, RPRElemIdx idx)
{
RPRPatternElement *e = &pattern->elements[j];
- if (e->varId == RPR_VARID_FIN)
+ if (RPRElemIsFin(e))
break; /* Reached end, no
outer nesting */
- if (e->varId == RPR_VARID_END && e->depth <
lastElem->depth)
+ if (RPRElemIsEnd(e) && e->depth < lastElem->depth)
return false; /* Found outer END, nested
structure */
}
@@ -1232,24 +1336,99 @@ isUnboundedStart(RPRPattern *pattern, RPRElemIdx idx)
return RPRElemIsVar(elem) && elem->max == RPR_QUANTITY_INF;
}
+/*
+ * setAbsorbabilityFlagsForBranch
+ * Set absorption flags for a single branch.
+ *
+ * For unbounded GROUP (e.g., (A B)+):
+ * - ABSORBABLE_BRANCH on all elements (A, B, END)
+ * - ABSORBABLE on END only (judgment point)
+ *
+ * For simple unbounded VAR (e.g., A+):
+ * - Both flags on first element only
+ */
+static void
+setAbsorbabilityFlagsForBranch(RPRPattern *pattern, RPRElemIdx branchIdx)
+{
+ RPRPatternElement *branchFirst = &pattern->elements[branchIdx];
+ RPRDepth startDepth = branchFirst->depth;
+ bool isUnboundedGroup = false;
+ RPRElemIdx endIdx = RPR_ELEMIDX_INVALID;
+ RPRElemIdx i;
+
+ /*
+ * First pass: detect if this is an unbounded GROUP pattern.
+ * Look for END element with unbounded max that jumps back to branch
start.
+ * Note: END is at parent depth (less than content depth), so check END
+ * before the depth-based break condition.
+ */
+ for (i = branchIdx; i < pattern->numElements; i++)
+ {
+ RPRPatternElement *e = &pattern->elements[i];
+
+ /* Check for unbounded END first (END is at lower depth than
content) */
+ if (RPRElemIsEnd(e) &&
+ e->jump == branchIdx &&
+ e->max == RPR_QUANTITY_INF)
+ {
+ isUnboundedGroup = true;
+ endIdx = i;
+ break;
+ }
+
+ if (e->depth < startDepth || RPRElemIsFin(e))
+ break;
+ }
+
+ /*
+ * Set flags based on pattern type.
+ */
+ if (isUnboundedGroup)
+ {
+ /*
+ * Unbounded GROUP: set ABSORBABLE_BRANCH on all elements from
+ * branchIdx to endIdx, and ABSORBABLE on END only.
+ */
+ for (i = branchIdx; i <= endIdx; i++)
+ {
+ RPRPatternElement *e = &pattern->elements[i];
+
+ e->flags |= RPR_ELEM_ABSORBABLE_BRANCH;
+ if (i == endIdx)
+ e->flags |= RPR_ELEM_ABSORBABLE;
+ }
+ }
+ else
+ {
+ /*
+ * Simple unbounded VAR: set both flags on first element only.
+ */
+ branchFirst->flags |= RPR_ELEM_ABSORBABLE_BRANCH |
RPR_ELEM_ABSORBABLE;
+ }
+}
+
/*
* computeAbsorbability
* Determine if pattern supports context absorption optimization.
*
- * Context absorption allows the executor to reuse NFA match states from
- * previous row positions. When a match at position P can be "shifted" to
- * position P+1 by decrementing the count of the first unbounded element,
- * we avoid re-running the NFA from scratch.
+ * Context absorption eliminates redundant match searches by absorbing
+ * newer contexts that cannot produce longer matches than older contexts.
+ * This achieves O(n²) → O(n) performance improvement.
*
- * This function sets RPR_ELEM_ABSORBABLE on elements that can be shifted,
- * and pattern->isAbsorbable if any element is absorbable.
+ * This function sets two flags:
+ * RPR_ELEM_ABSORBABLE: Absorption judgment point
+ * - Simple unbounded VAR: the VAR itself (e.g., A in A+)
+ * - Unbounded GROUP: the END element (e.g., END in (A B)+)
+ * RPR_ELEM_ABSORBABLE_BRANCH: All elements in absorbable region
+ * - All elements at same depth as unbounded start
*
* Examples:
- * A+ B C - absorbable (unbounded var at start)
- * (A B){2,} C - absorbable (unbounded group)
- * A B+ - NOT absorbable (unbounded not at start)
- * A+ | B+ C - both branches absorbable (checked independently)
- * A+ B | C D - first branch only
+ * A+ B C - absorbable (A gets both flags)
+ * (A B)+ C - absorbable (A,B,END get BRANCH, END gets ABSORBABLE)
+ * A B+ - NOT absorbable (unbounded not at start)
+ * (A+ B+)+ - only first A+ on first iteration (nested unbounded not
supported)
+ * A+ | B+ - both branches absorbable independently
+ * A+ | C D - only A+ branch absorbable (C D branch not absorbable)
*/
static void
computeAbsorbability(RPRPattern *pattern)
@@ -1261,10 +1440,11 @@ computeAbsorbability(RPRPattern *pattern)
if (pattern->numElements < 2) /* At minimum: one element + FIN */
return;
- if (first->varId == RPR_VARID_ALT)
+ if (RPRElemIsAlt(first))
{
/* ALT: check each branch */
RPRElemIdx branchIdx = first->next;
+ bool hasAbsorbableBranch = false;
while (branchIdx != RPR_ELEMIDX_INVALID &&
branchIdx < pattern->numElements - 1)
@@ -1273,20 +1453,28 @@ computeAbsorbability(RPRPattern *pattern)
if (isUnboundedStart(pattern, branchIdx))
{
- branchFirst->flags |= RPR_ELEM_ABSORBABLE;
pattern->isAbsorbable = true;
+ hasAbsorbableBranch = true;
+ setAbsorbabilityFlagsForBranch(pattern,
branchIdx);
}
branchIdx = branchFirst->jump;
}
+
+ /*
+ * If any branch is absorbable, mark ALT element too.
+ * This allows efficient branch-level flag management.
+ */
+ if (hasAbsorbableBranch)
+ first->flags |= RPR_ELEM_ABSORBABLE_BRANCH;
}
else
{
/* Non-ALT: check first element */
if (isUnboundedStart(pattern, 0))
{
- first->flags |= RPR_ELEM_ABSORBABLE;
pattern->isAbsorbable = true;
+ setAbsorbabilityFlagsForBranch(pattern, 0);
}
}
}
@@ -1307,7 +1495,8 @@ computeAbsorbability(RPRPattern *pattern)
* Returns NULL if pattern is NULL.
*/
RPRPattern *
-buildRPRPattern(RPRPatternNode *pattern, List *defineVariableList)
+buildRPRPattern(RPRPatternNode *pattern, List *defineVariableList,
+ RPSkipTo rpSkipTo, int frameOptions)
{
RPRPattern *result;
RPRPatternNode *optimized;
@@ -1316,6 +1505,7 @@ buildRPRPattern(RPRPatternNode *pattern, List
*defineVariableList)
int numElements;
RPRDepth maxDepth;
int idx;
+ bool hasLimitedFrame;
if (pattern == NULL)
return NULL;
@@ -1336,11 +1526,28 @@ buildRPRPattern(RPRPatternNode *pattern, List
*defineVariableList)
idx = 0;
fillRPRPattern(optimized, result, &idx, 0);
- /* Finalize: set up next pointers and add FIN marker */
+ /* Finalize: set up next pointers, flags, and add FIN marker */
finalizeRPRPattern(result);
- /* Compute context absorption eligibility */
- computeAbsorbability(result);
+ /*
+ * Compute context absorption eligibility.
+ * Absorption requires both structural absorbability and runtime
conditions.
+ * Check runtime conditions first to avoid unnecessary pattern analysis.
+ */
+ hasLimitedFrame = (frameOptions & FRAMEOPTION_ROWS) &&
+ !(frameOptions &
FRAMEOPTION_END_UNBOUNDED_FOLLOWING);
+
+ if (rpSkipTo == ST_PAST_LAST_ROW && !hasLimitedFrame)
+ {
+ /* Runtime conditions met - check structural absorbability */
+ computeAbsorbability(result);
+ }
+ else
+ {
+ /* Runtime conditions not met - skip pattern analysis */
+ result->isAbsorbable = false;
+ }
return result;
}
+
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 984316d9a61..a0fbdb196d4 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -20324,6 +20324,11 @@ makeRPRQuantifier(int min, int max, bool reluctant)
{
RPRPatternNode *n = makeNode(RPRPatternNode);
+ if (reluctant)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("reluctant quantifiers are not yet
supported")));
+
n->min = min;
n->max = max;
n->reluctant = reluctant;
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a25d7642d3a..8bc51e3750b 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -2523,17 +2523,34 @@ typedef enum WindowAggStatus
*
* counts[] tracks repetition counts at each nesting depth.
* altPriority tracks lexical order for alternation (lower = earlier
alternative).
+ *
+ * isAbsorbable tracks if state is in absorbable region (ABSORBABLE_BRANCH).
+ * Monotonic property: once false, stays false (can't re-enter region).
+ *
+ * Absorption comparison uses elemIdx and counts[depth] directly because:
+ * - VAR elements consume a row, forcing states to wait for next row
+ * - END loop puts states at group start with iteration count preserved
+ * - At row end, comparable states are at the same position (elemIdx)
*/
typedef struct RPRNFAState
{
struct RPRNFAState *next; /* next state in linked list */
int16 elemIdx; /* current pattern element
index */
int16 altPriority; /* lexical order priority (lower =
preferred) */
+ bool isAbsorbable; /* true if state is in absorbable
region */
int32 counts[FLEXIBLE_ARRAY_MEMBER]; /* repetition counts by
depth */
} RPRNFAState;
/*
* RPRNFAContext - context for NFA pattern matching execution
+ *
+ * Two-flag absorption design:
+ * hasAbsorbableState: can this context absorb others? (≥1 absorbable state)
+ * - Monotonic: true→false only, cannot recover once false
+ * - Used to skip absorption attempts once all absorbable states are gone
+ * allStatesAbsorbable: can this context be absorbed? (ALL states absorbable)
+ * - Dynamic: can change false→true (when non-absorbable states die)
+ * - Used to determine if this context is eligible for absorption
*/
typedef struct RPRNFAContext
{
@@ -2545,6 +2562,10 @@ typedef struct RPRNFAContext
int64 matchEndRow; /* row where match ended (-1 = no
match) */
int64 lastProcessedRow; /* last row processed (for fail
depth) */
RPRNFAState *matchedState; /* FIN state for greedy fallback
(cloned) */
+
+ /* Two-flag absorption optimization (see CONTEXT_ABSORPTION_DESIGN.txt)
*/
+ bool hasAbsorbableState; /* can absorb others
(≥1 absorbable state) */
+ bool allStatesAbsorbable; /* can be absorbed (ALL states
absorbable) */
} RPRNFAContext;
/*
diff --git a/src/include/optimizer/rpr.h b/src/include/optimizer/rpr.h
index 4846ad6f2b7..be2ebb02c33 100644
--- a/src/include/optimizer/rpr.h
+++ b/src/include/optimizer/rpr.h
@@ -18,7 +18,6 @@
/* Limits and special values */
#define RPR_VARID_MAX 252 /* max pattern
variables: 252 */
-#define RPR_DEPTH_MAX UINT8_MAX /* max nesting depth: 255 */
#define RPR_QUANTITY_INF INT32_MAX /* unbounded quantifier */
#define RPR_COUNT_MAX INT32_MAX /* max runtime count (NFA
state) */
#define RPR_ELEMIDX_MAX INT16_MAX /* max pattern elements
*/
@@ -30,18 +29,21 @@
#define RPR_VARID_FIN ((RPRVarId) 255) /* pattern finish */
/* Element flags */
-#define RPR_ELEM_RELUCTANT 0x01 /* reluctant (non-greedy) quantifier */
-#define RPR_ELEM_ABSORBABLE 0x02 /* branch supports context absorption */
+#define RPR_ELEM_RELUCTANT 0x01 /* reluctant
(non-greedy) quantifier */
+#define RPR_ELEM_ABSORBABLE_BRANCH 0x02 /* element in absorbable region
*/
+#define RPR_ELEM_ABSORBABLE 0x04 /* absorption judgment
point */
/* Accessor macros for RPRPatternElement */
-#define RPRElemIsReluctant(e) ((e)->flags & RPR_ELEM_RELUCTANT)
-#define RPRElemIsAbsorbable(e) ((e)->flags & RPR_ELEM_ABSORBABLE)
+#define RPRElemIsReluctant(e) ((e)->flags &
RPR_ELEM_RELUCTANT)
+#define RPRElemIsAbsorbableBranch(e) ((e)->flags &
RPR_ELEM_ABSORBABLE_BRANCH)
+#define RPRElemIsAbsorbable(e) ((e)->flags &
RPR_ELEM_ABSORBABLE)
#define RPRElemIsVar(e) ((e)->varId <= RPR_VARID_MAX)
#define RPRElemIsAlt(e) ((e)->varId == RPR_VARID_ALT)
#define RPRElemIsEnd(e) ((e)->varId == RPR_VARID_END)
#define RPRElemIsFin(e) ((e)->varId == RPR_VARID_FIN)
#define RPRElemCanSkip(e) ((e)->min == 0)
-extern RPRPattern *buildRPRPattern(RPRPatternNode *pattern, List
*defineVariableList);
+extern RPRPattern *buildRPRPattern(RPRPatternNode *pattern, List
*defineVariableList,
+ RPSkipTo
rpSkipTo, int frameOptions);
#endif /* OPTIMIZER_RPR_H */
diff --git a/src/test/regress/expected/rpr.out
b/src/test/regress/expected/rpr.out
index dad454b858b..5a96ced4b52 100644
--- a/src/test/regress/expected/rpr.out
+++ b/src/test/regress/expected/rpr.out
@@ -2623,7 +2623,7 @@ SELECT company, tdate, price, first_value(price) OVER w,
last_value(price) OVER
UP AS price > PREV(1),
DOWN AS price < PREV(1)
);
-ERROR: row pattern navigation operation's argument must include at least one
column reference
+ERROR: reluctant quantifiers are not yet supported
-- Maximum pattern variables is 252 (RPR_VARID_MAX)
-- Ok: 252 variables (maximum allowed)
DO $$
@@ -2776,6 +2776,44 @@ WINDOW w AS (
-- Row 1: A B C D E (1-5)
-- Row 2: B C D (2-4) <- ends first!
-- Row 3: C D E F (3-6) <- ends last!
+-- Test 1b: Longer pattern FAILS, shorter pattern should survive
+-- Pattern: A+ B C D | A+ B (greedy at front for absorption)
+-- Data: A B C X (D expected but X found)
+-- A+ B C D fails at row 4 (X instead of D)
+-- A+ B succeeds at row 2
+-- Result should be match 1-2 (A+ B)
+WITH test_overlap1b AS (
+ SELECT * FROM (VALUES
+ (1, ARRAY['A']),
+ (2, ARRAY['B']),
+ (3, ARRAY['C']),
+ (4, ARRAY['D']),
+ (5, ARRAY['X'])
+ ) AS t(id, flags)
+)
+SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w
AS match_end
+FROM test_overlap1b
+WINDOW w AS (
+ ORDER BY id
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP PAST LAST ROW
+ PATTERN (A+ B C D E | B+ C)
+ DEFINE
+ A AS 'A' = ANY(flags),
+ B AS 'B' = ANY(flags),
+ C AS 'C' = ANY(flags),
+ D AS 'D' = ANY(flags),
+ E AS 'E' = ANY(flags)
+);
+ id | flags | match_start | match_end
+----+-------+-------------+-----------
+ 1 | {A} | |
+ 2 | {B} | 2 | 3
+ 3 | {C} | |
+ 4 | {D} | |
+ 5 | {X} | |
+(5 rows)
+
-- Test 2: A B+ C | B+ D - long B sequence with different endings
WITH test_overlap2 AS (
SELECT * FROM (VALUES
@@ -3592,7 +3630,7 @@ WINDOW w AS (
-- Expected: 1-4 (A C B C)
-- ============================================
--- RELUCTANT quantifiers (parser only - executor TODO)
+-- RELUCTANT quantifiers (not yet supported)
-- ============================================
-- Test: A+? B (reluctant) - CURRENTLY ACTS LIKE A+ (GREEDY)
-- Parser accepts the syntax but executor doesn't differentiate
@@ -3615,14 +3653,7 @@ WINDOW w AS (
A AS 'A' = ANY(flags),
B AS 'B' = ANY(flags)
);
- id | flags | match_start | match_end
-----+-------+-------------+-----------
- 1 | {A} | 1 | 4
- 2 | {A} | |
- 3 | {A} | |
- 4 | {B} | |
-(4 rows)
-
+ERROR: reluctant quantifiers are not yet supported
-- Current: 1-4 (A A A B) - greedy behavior
-- Expected when RELUCTANT implemented: 3-4 (A B) - shortest A before B
-- Test: A{1,3}? B (reluctant bounded)
@@ -3645,13 +3676,6 @@ WINDOW w AS (
A AS 'A' = ANY(flags),
B AS 'B' = ANY(flags)
);
- id | flags | match_start | match_end
-----+-------+-------------+-----------
- 1 | {A} | 1 | 4
- 2 | {A} | |
- 3 | {A} | |
- 4 | {B} | |
-(4 rows)
-
+ERROR: reluctant quantifiers are not yet supported
-- Current: 1-4 (greedy, takes max A before B)
-- Expected when RELUCTANT implemented: 3-4 (takes min A before B)
diff --git a/src/test/regress/sql/rpr.sql b/src/test/regress/sql/rpr.sql
index 4586eb4b04e..f9c161a791a 100644
--- a/src/test/regress/sql/rpr.sql
+++ b/src/test/regress/sql/rpr.sql
@@ -1320,6 +1320,36 @@ WINDOW w AS (
-- Row 2: B C D (2-4) <- ends first!
-- Row 3: C D E F (3-6) <- ends last!
+-- Test 1b: Longer pattern FAILS, shorter pattern should survive
+-- Pattern: A+ B C D | A+ B (greedy at front for absorption)
+-- Data: A B C X (D expected but X found)
+-- A+ B C D fails at row 4 (X instead of D)
+-- A+ B succeeds at row 2
+-- Result should be match 1-2 (A+ B)
+WITH test_overlap1b AS (
+ SELECT * FROM (VALUES
+ (1, ARRAY['A']),
+ (2, ARRAY['B']),
+ (3, ARRAY['C']),
+ (4, ARRAY['D']),
+ (5, ARRAY['X'])
+ ) AS t(id, flags)
+)
+SELECT id, flags, first_value(id) OVER w AS match_start, last_value(id) OVER w
AS match_end
+FROM test_overlap1b
+WINDOW w AS (
+ ORDER BY id
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP PAST LAST ROW
+ PATTERN (A+ B C D E | B+ C)
+ DEFINE
+ A AS 'A' = ANY(flags),
+ B AS 'B' = ANY(flags),
+ C AS 'C' = ANY(flags),
+ D AS 'D' = ANY(flags),
+ E AS 'E' = ANY(flags)
+);
+
-- Test 2: A B+ C | B+ D - long B sequence with different endings
WITH test_overlap2 AS (
SELECT * FROM (VALUES
@@ -1945,7 +1975,7 @@ WINDOW w AS (
-- Expected: 1-4 (A C B C)
-- ============================================
--- RELUCTANT quantifiers (parser only - executor TODO)
+-- RELUCTANT quantifiers (not yet supported)
-- ============================================
-- Test: A+? B (reluctant) - CURRENTLY ACTS LIKE A+ (GREEDY)
--
2.50.1 (Apple Git-155)