I thought about it, and I think this is the best fix. RegExp.toString() is just for debugging; it is not really guaranteed to be reparseable.
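
To make the parse-tostring-parse idea concrete outside of Lucene, here is a minimal, self-contained sketch. ToyPattern and firstRoundTrippable are hypothetical names invented for this illustration; ToyPattern is only a stand-in and is not Lucene's RegExp. Its toString() drops escapes, so some inputs that parse fine produce debug output that does not parse again, and the helper simply skips those candidates.

```java
import java.util.Arrays;
import java.util.List;

public class RoundTripSketch {

    /** Hypothetical stand-in for a regexp class; NOT Lucene's RegExp. */
    static final class ToyPattern {
        private final String literal;

        /** Parses the source; a backslash escapes the next character. */
        ToyPattern(String source) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < source.length(); i++) {
                char c = source.charAt(i);
                if (c == '\\') {
                    if (i + 1 == source.length()) {
                        throw new IllegalArgumentException("dangling backslash in: " + source);
                    }
                    c = source.charAt(++i); // keep the escaped character itself
                }
                sb.append(c);
            }
            literal = sb.toString();
        }

        /** Debug output only: escapes are not restored, so this may not reparse. */
        @Override
        public String toString() {
            return literal;
        }
    }

    /** Returns the first candidate that survives parse -> toString -> parse, or null. */
    static ToyPattern firstRoundTrippable(List<String> candidates) {
        for (String candidate : candidates) {
            try {
                return new ToyPattern(new ToyPattern(candidate).toString());
            } catch (IllegalArgumentException e) {
                // toString() output was not reparseable; try the next candidate
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "a\\\\" is the three characters a, backslash, backslash: it parses,
        // but its toString() ends in a lone backslash and fails to reparse.
        ToyPattern p = firstRoundTrippable(Arrays.asList("a\\\\", "ab"));
        System.out.println(p);
    }
}
```

The point of the design is that anything the helper returns has already been through toString() once, so a later toString-then-compile step in a test sees text that is known to parse.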

On Mon, Jul 26, 2010 at 1:49 PM, Michael McCandless <[email protected]> wrote:
> OK lemme try...
>
> Mike
>
> On Mon, Jul 26, 2010 at 1:47 PM, Robert Muir <[email protected]> wrote:
> > maybe you can try this on your beast computer for a while? I think its
> > better.
> > Index: lucene/src/test/org/apache/lucene/search/TestRegexpRandom2.java
> > ===================================================================
> > --- lucene/src/test/org/apache/lucene/search/TestRegexpRandom2.java (revision 979377)
> > +++ lucene/src/test/org/apache/lucene/search/TestRegexpRandom2.java (working copy)
> > @@ -86,9 +86,9 @@
> >    private class DumbRegexpQuery extends MultiTermQuery {
> >      private final Automaton automaton;
> >
> > -    DumbRegexpQuery(Term term) {
> > +    DumbRegexpQuery(Term term, int flags) {
> >        super(term.field());
> > -      RegExp re = new RegExp(term.text());
> > +      RegExp re = new RegExp(term.text(), flags);
> >        automaton = re.toAutomaton();
> >      }
> >
> > @@ -130,8 +130,8 @@
> >     * simple regexpquery implementation.
> >     */
> >    private void assertSame(String regexp) throws IOException {
> > -    RegexpQuery smart = new RegexpQuery(new Term("field", regexp));
> > -    DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field", regexp));
> > +    RegexpQuery smart = new RegexpQuery(new Term("field", regexp), RegExp.NONE);
> > +    DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field", regexp), RegExp.NONE);
> >
> >      // we can't compare the two if automaton rewrites to a simpler enum.
> >      // for example: "a\uda07\udcc7?.*?" gets rewritten to a simpler query:
> > Index: lucene/src/test/org/apache/lucene/util/automaton/AutomatonTestUtil.java
> > ===================================================================
> > --- lucene/src/test/org/apache/lucene/util/automaton/AutomatonTestUtil.java (revision 979377)
> > +++ lucene/src/test/org/apache/lucene/util/automaton/AutomatonTestUtil.java (working copy)
> > @@ -40,7 +40,9 @@
> >        if (!UnicodeUtil.validUTF16String(regexp))
> >          continue;
> >        try {
> > -        return new RegExp(regexp, RegExp.NONE);
> > +        // NOTE: we parse-tostring-parse again, because we are
> > +        // really abusing RegExp.toString() here (its just for debugging)
> > +        return new RegExp(new RegExp(regexp, RegExp.NONE).toString(), RegExp.NONE);
> >        } catch (Exception e) {}
> >      }
> >    }
> >
> > On Mon, Jul 26, 2010 at 1:32 PM, Michael McCandless
> > <[email protected]> wrote:
> >>
> >> Hmm that doesn't fix it.
> >>
> >> The magical seed is: 50686536365145364L (for TRR2).
> >>
> >> I still see the IAE even if I remove that RegExp.NONE.
> >>
> >> Mike
> >>
> >> On Mon, Jul 26, 2010 at 1:28 PM, Robert Muir <[email protected]> wrote:
> >> > I don't think we should do that. I think i found the problem:
> >> > AutomatonTestUtil's randomRegexp does this:
> >> > try {
> >> >   return new RegExp(regexp, RegExp.NONE);
> >> > } catch (Exception e) {}
> >> > I think the RegExp.NONE is the problem, and we should remove it.
> >> > because the test then toString's it and compiles it again, but without
> >> > this option.
> >> > On Mon, Jul 26, 2010 at 1:16 PM, Michael McCandless
> >> > <[email protected]> wrote:
> >> >>
> >> >> My random stress testing hit an IllegalArgExc because the random
> >> >> regexp was malformed.
> >> >>
> >> >> Does this patch look OK to fix?
> >> >>
> >> >> Index: src/test/org/apache/lucene/search/TestRegexpRandom2.java
> >> >> ===================================================================
> >> >> --- src/test/org/apache/lucene/search/TestRegexpRandom2.java (revision 979227)
> >> >> +++ src/test/org/apache/lucene/search/TestRegexpRandom2.java (working copy)
> >> >> @@ -130,6 +130,14 @@
> >> >>     * simple regexpquery implementation.
> >> >>     */
> >> >>    private void assertSame(String regexp) throws IOException {
> >> >> +    try {
> >> >> +      new RegExp(regexp);
> >> >> +    } catch (IllegalArgumentException iae) {
> >> >> +      // the random regexp could be malformed, eg "foo"bar",
> >> >> +      // so we ignore this
> >> >> +      return;
> >> >> +    }
> >> >> +
> >> >>      RegexpQuery smart = new RegexpQuery(new Term("field", regexp));
> >> >>      DumbRegexpQuery dumb = new DumbRegexpQuery(new Term("field", regexp));
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: [email protected]
> >> >> For additional commands, e-mail: [email protected]
> >> >>
> >> >
> >> > --
> >> > Robert Muir
> >> > [email protected]
> >> >
> >>
> >
> > --
> > Robert Muir
> > [email protected]
> >
>

--
Robert Muir
[email protected]
