On Sat, Sep 9, 2023 at 4:21 AM Tatsuo Ishii <is...@sraoss.co.jp> wrote:
> Then we will get for str_set:
> r0: B
> r1: AB
>
> Because r0 only has classifier B, r1 can have A and B. Problem is,
> r2. If we choose A at r1, then r2 = B. But if we choose B at r1, then
> r2 = AB. I guess this is the issue you pointed out.

Right.

> Yeah, probably we have delay evaluation of such pattern variables like
> A, then reevaluate A after the first scan.
>
> What about leaving this (reevaluation) for now? Because:
>
> 1) we don't have CLASSIFIER
> 2) we don't allow to give CLASSIFIER to PREV as its argument
>
> so I think we don't need to worry about this for now.

Sure. I'm all for deferring features to make it easier to iterate; I
just want to make sure the architecture doesn't hit a dead end. Or at
least, not without being aware of it.

Also: is CLASSIFIER the only way to run into this issue?

> What if we don't follow the standard, instead we follow POSIX EREs? I
> think this is better for users unless RPR's REs has significant merit
> for users.

Piggybacking off of what Vik wrote upthread, I think we would not be
doing ourselves any favors by introducing a non-compliant
implementation that performs worse than a traditional NFA. Those would
be some awful bug reports.

> > - I think we have to implement a parallel parser regardless (RPR PATTERN
> > syntax looks incompatible with POSIX)
>
> I am not sure if we need to worry about this because of the reason I
> mentioned above.

Even if we adopted POSIX NFA semantics, we'd still have to implement
our own parser for the PATTERN part of the query. I don't think
there's a good way for us to reuse the parser in src/backend/regex.

> > Does that seem like a workable approach? (Worst-case, my code is just
> > horrible, and we throw it in the trash.)
>
> Yes, it seems workable. I think for the first cut of RPR needs at
> least the +quantifier with reasonable performance. The current naive
> implementation seems to have issue because of exhaustive search.

+1

Thanks!
--Jacob
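
For illustration only, here is a minimal Python sketch of the
exhaustive-search concern mentioned above. It is not code from the
patch. Everything in it is an assumption made for the example: the
pattern encoding (a list of (variable, has_plus) pairs), the function
names, single-letter variable names, and the simplification that each
row's candidate set is fixed in advance (so it deliberately ignores
the CLASSIFIER/PREV reevaluation problem discussed at the top). The
first function expands every possible classifier string and tests each
one; the second tracks a set of reachable pattern positions row by
row, in the spirit of a traditional NFA simulation.

```python
from itertools import product
import re

# Hypothetical encoding: [("A", True), ("B", True)] stands for PATTERN (A+ B+).
# Only "exactly one" and "+" elements are handled in this sketch.


def longest_match_exhaustive(row_sets, pattern):
    """Try every classifier string; the work grows with the product of set sizes."""
    regex = re.compile("".join(v + ("+" if plus else "") for v, plus in pattern))
    tried, longest = 0, 0
    for combo in product(*(sorted(s) for s in row_sets)):
        tried += 1
        text = "".join(combo)
        # Look only for prefixes longer than the best match found so far.
        for k in range(len(text), longest, -1):
            if regex.fullmatch(text[:k]):
                longest = k
                break
    return longest, tried


def longest_match_nfa(row_sets, pattern):
    """Track reachable pattern positions; work is O(rows * pattern length)."""
    # State s means "the previous row matched pattern[s]"; -1 is the start.
    states = {-1}
    longest = 0
    for n, candidates in enumerate(row_sets, start=1):
        nxt = set()
        for s in states:
            # Stay on the current element if it carries a '+' quantifier.
            if s >= 0 and pattern[s][1] and pattern[s][0] in candidates:
                nxt.add(s)
            # Advance to the next pattern element.
            if s + 1 < len(pattern) and pattern[s + 1][0] in candidates:
                nxt.add(s + 1)
        if not nxt:
            break
        states = nxt
        if len(pattern) - 1 in states:
            longest = n  # the first n rows form a complete match
    return longest


pattern = [("A", True), ("B", True)]      # PATTERN (A+ B+)
rows = [{"A"}, {"A", "B"}, {"A", "B"}, {"B"}]
print(longest_match_exhaustive(rows, pattern))
print(longest_match_nfa(rows, pattern))
```

With n rows that each admit two variables, the first function has 2^n
strings to try, while the second does a bounded amount of work per
row. That is the gap a "+" quantifier with reasonable performance
would need to close.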