Re: AbstractMultiTermQueryConstantScoreWrapper cost estimates (https://github.com/apache/lucene/issues/13029)

2024-08-06 Thread Patrick Zhai
No, I don't think NFA will help this case? It potentially avoids the cost of converting the NFA to a DFA (determinization), but if the DFA needs to scan the whole term dictionary then the NFA needs to do the same, I think.
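For context, a minimal sketch (not from the thread) of the distinction being made: building the NFA for a pathological pattern is cheap, it is determinization that blows up, and that is the only cost NFA execution sidesteps. The pattern below is an assumed worst-case example; whether the automaton is run as an NFA or a DFA, matching it against the index still means intersecting it with the terms dictionary.

import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.Operations;
import org.apache.lucene.util.automaton.RegExp;
import org.apache.lucene.util.automaton.TooComplexToDeterminizeException;

public class DeterminizeCostSketch {
  public static void main(String[] args) {
    // Parsing the regexp and building the (possibly nondeterministic) automaton is cheap.
    Automaton nfa = new RegExp("(.*a){2000}").toAutomaton();
    System.out.println("NFA states: " + nfa.getNumStates());

    // Determinizing it is what exceeds the work limit; this is the cost that
    // NFA-based execution can avoid paying up front.
    try {
      Operations.determinize(nfa, Operations.DEFAULT_DETERMINIZE_WORK_LIMIT);
    } catch (TooComplexToDeterminizeException e) {
      System.out.println("determinization exceeded the work limit");
    }
    // Either way, matching terms still requires walking the terms dictionary,
    // which is the cost the GitHub issue is about.
  }
}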

Re: AbstractMultiTermQueryConstantScoreWrapper cost estimates (https://github.com/apache/lucene/issues/13029)

2024-08-06 Thread Michael Sokolov
But actually Patrick Zhai added support for nondeterministic regexes that might help with cases like that? There is this in TestRegexpQuery: /** Test worst-case for getCommonSuffix optimization */ public void testSlowCommonSuffix() throws Exception { expectThrows( TooComplexToDeter
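For reference, the test being quoted looks roughly like this (reconstructed; the exact field name and pattern in TestRegexpQuery are assumed). It is a method of TestRegexpQuery, which extends LuceneTestCase (that is where expectThrows comes from); imports of Term, RegexpQuery and TooComplexToDeterminizeException are omitted here.

/** Test worst-case for getCommonSuffix optimization */
public void testSlowCommonSuffix() throws Exception {
  // The default RegexpQuery constructor determinizes the automaton up front,
  // so a pathological pattern trips the determinization work limit.
  expectThrows(
      TooComplexToDeterminizeException.class,
      () -> new RegexpQuery(new Term("stringvalue", "(.*a){2000}")));
}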

Re: AbstractMultiTermQueryConstantScoreWrapper cost estimates (https://github.com/apache/lucene/issues/13029)

2024-08-06 Thread Michael Sokolov
Yes, I think degenerate regexes like *a* are potentially costly. Actually something like *Ⱗ* is probably worse, since it would still need to scan the entire FST (which probably has some a's in it, but likely no Ⱗ's). I don't see any way around that aside from: (1) telling the user "don't do that", or (2) putting some accoun
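To make the "scan the entire FST" point concrete, here is a sketch (not from the thread, assuming recent Lucene APIs such as ByteBuffersDirectory, MultiTerms and CompiledAutomaton; the field name and sample terms are made up). A pattern like .*Ⱗ.* matches nothing in this tiny index, yet the intersected TermsEnum can only report that after walking essentially the whole terms dictionary, which is why the wrapper's cost estimate is hard to get right for such queries.

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.MultiTerms;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.CompiledAutomaton;
import org.apache.lucene.util.automaton.Operations;
import org.apache.lucene.util.automaton.RegExp;

public class FullScanSketch {
  public static void main(String[] args) throws IOException {
    Directory dir = new ByteBuffersDirectory();
    try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig())) {
      for (String term : new String[] {"alpha", "beta", "gamma", "delta"}) {
        Document doc = new Document();
        doc.add(new StringField("body", term, Field.Store.NO)); // not analyzed
        w.addDocument(doc);
      }
    }
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      // A regexp whose only literal character does not occur in the index.
      Automaton a =
          Operations.determinize(
              new RegExp(".*Ⱗ.*").toAutomaton(), Operations.DEFAULT_DETERMINIZE_WORK_LIMIT);
      Terms terms = MultiTerms.getTerms(reader, "body");
      TermsEnum te = terms.intersect(new CompiledAutomaton(a), null);
      int matches = 0;
      while (te.next() != null) {
        matches++;
      }
      // Zero matches, but producing that answer still meant visiting the terms:
      // with a leading .* no prefix of the dictionary can be skipped.
      System.out.println("matching terms: " + matches);
    }
  }
}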