I suspect you're also getting leading-wildcard searches, which must scan the entire term dictionary unless you're using the reverse trick (indexing each term reversed, so that a leading wildcard becomes a trailing one).
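Here is a minimal plain-Java sketch of why the reverse trick helps. It assumes a toy in-memory sorted term set (a `TreeSet`) standing in for Lucene's term dictionary; the class and method names are hypothetical and not Lucene API. Lucene's `ReverseStringFilter` and Solr's `ReversedWildcardFilterFactory` apply roughly the same idea at index time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of a sorted term dictionary; names are hypothetical, not Lucene API.
public class ReverseTrickDemo {

    // Terms as indexed, kept in sorted (lexicographic) order.
    private final TreeSet<String> terms = new TreeSet<>();
    // The same terms stored reversed: the "reverse trick".
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    void add(String term) {
        terms.add(term);
        reversedTerms.add(reverse(term));
    }

    // "*suffix" without the trick: there is no fixed prefix to seek to,
    // so every term in the dictionary has to be visited.
    List<String> suffixByFullScan(String suffix) {
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            if (t.endsWith(suffix)) {
                hits.add(t);
            }
        }
        return hits;
    }

    // "*suffix" with the trick: it becomes the prefix query "reverse(suffix)*"
    // over the reversed terms, i.e. one seek plus a short contiguous walk.
    List<String> suffixByReversedPrefix(String suffix) {
        String prefix = reverse(suffix);
        List<String> hits = new ArrayList<>();
        // tailSet seeks to the first reversed term >= prefix; all terms sharing
        // the prefix are contiguous from there, so stop at the first mismatch.
        for (String r : reversedTerms.tailSet(prefix, true)) {
            if (!r.startsWith(prefix)) {
                break;
            }
            hits.add(reverse(r));
        }
        return hits;
    }

    public static void main(String[] args) {
        ReverseTrickDemo index = new ReverseTrickDemo();
        for (String t : new String[] {"lorem", "ipsum", "dolor", "sum", "rebum", "amet"}) {
            index.add(t);
        }
        System.out.println(index.suffixByFullScan("um"));       // [ipsum, rebum, sum]
        System.out.println(index.suffixByReversedPrefix("um")); // [rebum, sum, ipsum]
    }
}
```

Both calls find the same three terms (in different orders), but the second only walks the reversed terms that start with "mu" instead of the whole dictionary. The price is storing every term a second time.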
Replacing every run of whitespace with "*" gives you:

Lorem*ipsum*dolor*sit*amet,*consetetur*sadipscing*elitr,*sed*diam*nonumy*eirmod*tempor*invidunt*ut*labore*et*dolore*magna*aliquyam*erat,*sed*diam*voluptua.*At*vero*eos*et*accusam*et*justo*duo*dolores*et*ea*rebum.*Stet*clita*kasd*gubergren,*no*sea*takimata*sanctus*est*Lorem*ipsum*dolor*sit*amet.*Lorem*ipsum*dolor*sit*amet,*consetetur*sadipscing*elitr,*sed*diam*nonumy*eirmod*tempor*invidunt*ut*labore*et*dolore*magna*aliquyam*erat,*sed*diam*voluptua.*At*vero*eos*et*accusam*et*justo*duo*dolores*et*ea*rebum.*Stet*clita*kasd*gubergren,*no*sea*takimata*sanctus*est*Lorem*ipsum*dolor*sit*amet

Note: no spaces. Then you're pushing it through the KeywordTokenizer, which does essentially nothing, so the whole string stays one single term. What a term!

Your point is valid, however; why this is taking so long, I don't quite know. But I tend to agree that it's such an edge case that the hard-core FST guys would look at it only for curiosity's sake...

Best,
Erick

On Thu, Jun 26, 2014 at 5:34 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> I'll defer to the hard-core Lucene committers for the technical details,
> but I would suggest that a very large term with dozens of wildcards is a
> "known limitation" (albeit not well documented). IOW, to use wildcards in
> Lucene in a performant manner, they need to be "brief".
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Clemens Wyss DEV
> Sent: Thursday, June 26, 2014 3:17 AM
> To: java-user@lucene.apache.org
> Subject: QueryParserUtil, big query with wildcards -> runs endlessly and
> produces heavy load
>
> The following "testcase" runs endlessly and produces VERY heavy load:
> ...
> String query = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut "
>     + "labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et "
>     + "ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. "
>     + "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt "
>     + "ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores "
>     + "et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet";
> // Turn every run of whitespace into a "*" wildcard (reassign; query is already declared)
> query = query.replaceAll( "\\s+", "*" );
> try
> {
>     QueryParserUtil.parse( query, new String[] { "test" }, new Occur[] { Occur.MUST }, new KeywordAnalyzer() );
> }
> catch ( Exception e )
> {
>     Assert.fail( e.getMessage() );
> }
> ...
> I'm not saying this testcase makes "sense"; nevertheless, the question remains
> whether this is a bug or a "feature".
>
> Context: Lucene 4.7.2, Java 6
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org