The test case is "only" parsing this query, not trying to run it,
right? So it doesn't involve automaton/FST ... just the flexible
query parser code?
It seems bad that flexible QP would take so long, even if the query is
"strange".
Can you open an issue, and maybe attach a thread dump so we can
I suspect you're getting leading wildcard searches as well, which must
do entire term scans unless you're doing the reverse trick.
Replacing each run of successive whitespace with a wildcard gives you:
Lorem*ipsum*dolor*sit*amet,*consetetur*sadipscing*elitr,*sed*diam*nonumy*eirmod*tempor*invidunt*ut*labore*et*dolore*magn
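For reference, a term like the one above can be produced by a substitution along these lines. This is only a sketch of what I assume the application is doing before it hands the string to the query parser; the class and method names are made up for illustration and are not part of Lucene.

```java
public final class WildcardCollapse {
    private WildcardCollapse() {}

    // Hypothetical helper (not a Lucene API): trims the input, then replaces
    // each run of whitespace with a single '*', turning a phrase into one
    // long wildcard term.
    static String collapseWhitespace(String input) {
        return input.trim().replaceAll("\\s+", "*");
    }

    public static void main(String[] args) {
        // prints "Lorem*ipsum*dolor*sit*amet,"
        System.out.println(collapseWhitespace("Lorem ipsum  dolor sit amet,"));
    }
}
```

Every whitespace run in the input becomes one more wildcard in the resulting term, so a full sentence ends up with dozens of them.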
I'll defer to the hard-core Lucene committers for the technical details,
but I would suggest that a very large term with dozens of wildcards is a
"known limitation" (albeit not well-documented.) IOW, to use wildcards in
Lucene in a performant manner, they need to be "brief".
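One way to keep wildcard terms "brief" is to check them in the application before they ever reach the query parser. The following is a rough sketch, not anything Lucene provides: the class name, the methods, and the limit of 3 are all assumptions picked for illustration, and the right threshold would depend on your index.

```java
public final class WildcardGuard {
    // Hypothetical limit; tune for your own data.
    static final int MAX_WILDCARDS = 3;

    private WildcardGuard() {}

    // Counts unescaped '*' and '?' characters in a term. A backslash
    // escapes the character that follows it, as in Lucene query syntax.
    static int countWildcards(String term) {
        int count = 0;
        boolean escaped = false;
        for (char c : term.toCharArray()) {
            if (escaped) {
                escaped = false;      // this character is a literal
            } else if (c == '\\') {
                escaped = true;       // next character is escaped
            } else if (c == '*' || c == '?') {
                count++;
            }
        }
        return count;
    }

    // True if the term has few enough wildcards to pass to the parser.
    static boolean isBrief(String term) {
        return countWildcards(term) <= MAX_WILDCARDS;
    }
}
```

With this limit, `isBrief("lor*m")` accepts the term, while `isBrief("Lorem*ipsum*dolor*sit*amet,")` rejects it because it contains four wildcards. A rejected term could be reported back to the user or rewritten as a phrase query instead.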
-- Jack Krupansky