Hi.
Disclaimer: I had some private off-list discussions with Henson Choi
regarding execRPR.c, and the NFA.
The following are more formal comments regarding execRPR.c and the NFA
algorithm, based on v47-0001 to v47-0009, I haven't applied the
incremental diff yet.
(Since the incremental diffs are scattered across several threads, a
unified v48 would be better).
(I'm still wrapping my head around the NFA in execRPR.c, so take the
comments below with a grain of salt.)
PATTERN (A B C D)
We can short-circuit and exit early if any of the evaluations (A, B,
or C) fail in nfa_match.
This is necessary since the chance of a pattern element evaluation
returning false is not rare, I think.
For src/backend/executor/README.rpr:
We should explicitly explain 'absorbable' and 'absorption' somewhere in
README.rpr, as the text currently just assumes the reader knows what they mean.
Using some example illustrate "absorption" meaning, put it on
README.rpr would be great.
We can also mention that 'DFS' refers to Depth-First Search".
``````
(4) Call nfa_advance(initialAdvance=true)
``````
In V47, the variable `initialAdvance` does not exist.
In nfa_advance_var, after the first Assert, we can add:
Assert(elem->next < pattern->numElements);
ExecRPRFinalizeAllContexts seems unnecessary; I commented it out,
rerun the regress tests
(TESTS='test_setup rpr_base rpr_nfa rpr_explain rpr_integration rpr'
meson test -C $BUILD3 --num-processes 20 --suite regress --verbose)
Only two SQL tests in rpr_explain.sql failed.
SELECT first_value(id) OVER w AS match_start FROM stock_ticks
WINDOW w AS ( ORDER BY id ROWS BETWEEN CURRENT ROW AND UNBOUNDED
FOLLOWING AFTER MATCH SKIP PAST LAST ROW PATTERN ((A B) {2}) DEFINE A
AS price < 100, B AS price < 100);
The query above invokes the following code. Since the PATTERN above is
not greedy, is the comment below incorrect?
``````
else
{
/* Greedy: enter first, skip second */
state->elemIdx = elem->next;
nfa_route_to_elem(winstate, ctx, state,
&elements[state->elemIdx], currentPos);
if (skipState != NULL)
{
nfa_route_to_elem(winstate, ctx, skipState,
&elements[elem->jump], currentPos);
}
}
``````
nfa_advance_var
```
else if (canExit)
{
state->counts[depth] = 0;
state->elemIdx = elem->next;
nextElem = &elements[state->elemIdx];
/*
* Update isAbsorbable for target element (monotonic: AND preserves
* false)
*/
state->isAbsorbable = state->isAbsorbable &&
RPRElemIsAbsorbableBranch(nextElem);
/* See comment above: increment outer END count for quantified VARs */
if (RPRElemIsEnd(nextElem))
{
if (state->counts[nextElem->depth] < RPR_COUNT_MAX)
state->counts[nextElem->depth]++;
}
nfa_route_to_elem(winstate, ctx, state, nextElem, currentPos);
}
```
The above ELSE IF overrides all RPRNFAState field values except
RPRNFAState->next.
Should we set RPRNFAState->next to NULL?
(If I add ``state->next = NULL;`` in the above ELSE IF branch, all the
regress tests still pass)
Some comments about the situation of (RPRNFAState) state->next would be great.
This also happens in `` if (canLoop && canExit)``, ``if (reluctant)`` branch.
For function nfa_advance_var, I don't understand the meaning of the
variable "count", after the first Assert I have added below:
if (count > 2 && !state->isAbsorbable && RPRElemIsReluctant(elem))
elog(INFO, "count is %d and elem is reluctant and isAbsorbable
is false XXXXX", count);
if (count > 2 && state->isAbsorbable && RPRElemIsReluctant(elem))
elog(INFO, "count is %d and elem is reluctant and isAbsorbable
is true XXXXX", count);
Rerunning the regress tests shows that count >= 3 occurs very infrequently.
I'm don't understand isAbsorbable, so I am not sure if the second
``elog(INFO`` is actually reachable or not.
Can we add more complex queries (more count >= 3) to check if the
"count" variable is working correctly?
In function nfa_add_state_unique:
/* Mark VAR in visited before duplicate check to prevent DFS loops */
winstate->nfaVisitedElems[WORDNUM(state->elemIdx)] |=
((bitmapword) 1 << BITNUM(state->elemIdx));
I honestly don't understand the purpose of the code block above. But it doesn't
seem to influence the subsequent FOR LOOP;
It only modifies its own variables without any other side effects within
nfa_add_state_unique. Could we add some comments explaining which external
functions rely on this code and why it belongs in nfa_add_state_unique?
nfa_states_equal
compareDepth = elem->depth + 1; /* depth 0 needs 1 count, etc. */
The comment above isn't helpful, IMHO, and I don't understand it.
We should focus on why compareDepth should be ```elem->depth + 1```.
function nfa_add_state_unique return value is not being used?
Do we need to do something with the return value, or is this expected?
(I don't have an opinion on it, I guess it would be better to raise this issue)
In nfa_advance_alt, during the main WHILE loop, I think altElem->depth
must be larger than elem->depth.
Therefore we can do
``````
if (altElem->depth == elem->depth)
elog(ERROR, "nfa_advance_alt altElem->depth should not be
the same as elem->depth reached");
if (altElem->depth < elem->depth)
break;
``````
--
jian
https://www.enterprisedb.com/