Hi Steve,
I dug some more, and I think this small change fixes the buggy test:
diff --git
a/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java
b/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java
index d97d8d4..f4ead23 100644
--- a/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java
+++ b/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java
@@ -165,7 +165,7 @@ public class TestBoolean2 extends LuceneTestCase {
"w1 w2 w3 w4 w5",
"w1 w3 w2 w3",
"w1 xx w2 yy w3",
- "w1 w3 xx w2 yy w3"
+ "w1 w3 xx w2 yy mm"
};
public void queriesTest(Query query, int[] expDocNrs) throws Exception {
Those strings are the documents that the test indexes, and the root cause
of the failure is that w3 appears twice in the last document (tf=2), and
the last document is longer. The test assumed (illegally) that the last
document would always get a worst score than the one before it, but that's
up to the similarity and something changed here between 5.x and 6.x.
I'll push this shortly to 5.x, 6.x, master...
Mike McCandless
http://blog.mikemccandless.com
On Fri, Jun 17, 2016 at 11:08 AM, Steve Rowe <[email protected]> wrote:
> Thanks for digging, Mike.
>
> These tests aren’t failing on 6.x (including the backport to branch_6_0:
> 0/100 TestBoolean2 beasting failures just nnw) - in your digging did you
> come across anything that would explain that?
>
> I’d rather not revert this bugfix backport just because it exposed other,
> possibly test-only?, bugs, but I understand that spending a bunch of time
> on an old patch release is non-optimal :).
>
> --
> Steve
> www.lucidworks.com
>
> > On Jun 17, 2016, at 9:45 AM, Michael McCandless <
> [email protected]> wrote:
> >
> > OK I dug a bit, specifically on this test failure:
> >
> > ant test -Dtestcase=TestBoolean2 -Dtests.method=testQueries01
> -Dtests.seed=5787EE10A58E0A9C -Dtests.multiplier=3 -Dtests.slow=true
> -Dtests.locale=nn-NO -Dtests.timezone=America/St_Vincent
> -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
> >
> > and something else is at play: this particular test case uses
> ConjunctionScorer, not BooleanScorer (where the original bug was).
> >
> > What happens for this failing seed is the correct 2 documents match, but
> the 2nd one unexpectedly gets a better score, possibly only when enough
> filler docs were added. I think it's a poor test because it seems to rely
> on the ClassicSimilarity valuing shorter document (5 vs 6 tokens) more than
> a higher tf for term w3 (2 vs 1), which is bad. Really our tests should
> not rely on specific scoring factors.
> >
> > Net/net this seems like a test bug, but I'm not sure how to fix it.
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> > On Fri, Jun 17, 2016 at 6:05 AM, Michael McCandless <
> [email protected]> wrote:
> > I'll dig.
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> > On Thu, Jun 16, 2016 at 10:55 PM, Steve Rowe <[email protected]> wrote:
> > Thanks for looking Hoss.
> >
> > I compared files changed by the commits on branch_6x and on branch_5_5,
> and I don’t see anything consequential, so I don’t think this is a case of
> a misapplied backport.
> >
> > --
> > Steve
> > www.lucidworks.com
> >
> > > On Jun 16, 2016, at 6:25 PM, Chris Hostetter <[email protected]>
> wrote:
> > >
> > >
> > > : : I ran this test before I committed the backport, but it succeeded
> then.
> > > : : I beasted it on current branch_5_5 and 49/100 seeds succeeded.
> > > :
> > > : one of the things that cahnged as part of LUCENE-7132 was that i
> made all
> > > : the BQ related tests start randomizing setDisableCoord() ... so you
> might
> > > : be seeing some previously unidentified coord related bug that is
> only in
> > > : the 5.x line of code?
> > > :
> > > : that could probably jive with the roughtly 50% failure ratio you're
> > > : seeing?
> > >
> > > Hmmm .... nope. Even with the setDisableCoord commented out (but still
> > > consuming random().nextBoolean() consistently) the same seeds reliably
> > > fail on branch_5_5
> > >
> > > Looks like the "~50%" comes from the "use filler docs or not?" bit of
> the
> > > test? with the patch below i can't find any seeds to fail -- which
> makes
> > > it seem like the crux of the original bug (results incorrect when docs
> are
> > > in diff blocks) is still relevant even after the backport to
> branch_5_5.
> > >
> > > Mike -- any idea what might still be the problem here?
> > >
> > >
> > >
> > > -Hoss
> > > http://www.lucidworks.com/
> > >
> > >
> > > diff --git
> > > a/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java
> > > b/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java
> > > index d97d8d4..596eb64 100644
> > > --- a/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java
> > > +++ b/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java
> > > @@ -67,6 +67,7 @@ public class TestBoolean2 extends LuceneTestCase {
> > > public static void beforeClass() throws Exception {
> > > // in some runs, test immediate adjacency of matches - in others,
> force a full bucket gap betwen docs
> > > NUM_FILLER_DOCS = random().nextBoolean() ? 0 : BooleanScorer.SIZE;
> > > + NUM_FILLER_DOCS = 0; // nocommit
> > > PRE_FILLER_DOCS = TestUtil.nextInt(random(), 0, (NUM_FILLER_DOCS /
> 2));
> > >
> > > directory = newDirectory();
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>