Ugh sorry I spaced out! Can you cherry-pick? Thanks. Mike McCandless
http://blog.mikemccandless.com On Fri, Jun 17, 2016 at 4:44 PM, Steve Rowe <[email protected]> wrote: > Thanks Mike! > > I noticed you didn’t push to branch_5_5 - was that just an oversight, or > am I being too impatient? (I can cherry-pick it if you’d like me to.) > > -- > Steve > www.lucidworks.com > > > On Jun 17, 2016, at 2:56 PM, Michael McCandless < > [email protected]> wrote: > > > > Hi Steve, > > > > I dug some more, and I think this small change fixes the buggy test: > > > > diff --git > a/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java > b/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java > > index d97d8d4..f4ead23 100644 > > --- a/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java > > +++ b/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java > > @@ -165,7 +165,7 @@ public class TestBoolean2 extends LuceneTestCase { > > "w1 w2 w3 w4 w5", > > "w1 w3 w2 w3", > > "w1 xx w2 yy w3", > > - "w1 w3 xx w2 yy w3" > > + "w1 w3 xx w2 yy mm" > > }; > > > > public void queriesTest(Query query, int[] expDocNrs) throws > Exception { > > > > > > Those strings are the documents that the test indexes, and the root > cause of the failure is that w3 appears twice in the last document (tf=2), > and the last document is longer. The test assumed (illegally) that the > last document would always get a worst score than the one before it, but > that's up to the similarity and something changed here between 5.x and 6.x. > > > > I'll push this shortly to 5.x, 6.x, master... > > > > Mike McCandless > > > > http://blog.mikemccandless.com > > > > On Fri, Jun 17, 2016 at 11:08 AM, Steve Rowe <[email protected]> wrote: > > Thanks for digging, Mike. > > > > These tests aren’t failing on 6.x (including the backport to branch_6_0: > 0/100 TestBoolean2 beasting failures just nnw) - in your digging did you > come across anything that would explain that? > > > > I’d rather not revert this bugfix backport just because it exposed > other, possibly test-only?, bugs, but I understand that spending a bunch of > time on an old patch release is non-optimal :). > > > > -- > > Steve > > www.lucidworks.com > > > > > On Jun 17, 2016, at 9:45 AM, Michael McCandless < > [email protected]> wrote: > > > > > > OK I dug a bit, specifically on this test failure: > > > > > > ant test -Dtestcase=TestBoolean2 -Dtests.method=testQueries01 > -Dtests.seed=5787EE10A58E0A9C -Dtests.multiplier=3 -Dtests.slow=true > -Dtests.locale=nn-NO -Dtests.timezone=America/St_Vincent > -Dtests.asserts=true -Dtests.file.encoding=US-ASCII > > > > > > and something else is at play: this particular test case uses > ConjunctionScorer, not BooleanScorer (where the original bug was). > > > > > > What happens for this failing seed is the correct 2 documents match, > but the 2nd one unexpectedly gets a better score, possibly only when enough > filler docs were added. I think it's a poor test because it seems to rely > on the ClassicSimilarity valuing shorter document (5 vs 6 tokens) more than > a higher tf for term w3 (2 vs 1), which is bad. Really our tests should > not rely on specific scoring factors. > > > > > > Net/net this seems like a test bug, but I'm not sure how to fix it. > > > > > > Mike McCandless > > > > > > http://blog.mikemccandless.com > > > > > > On Fri, Jun 17, 2016 at 6:05 AM, Michael McCandless < > [email protected]> wrote: > > > I'll dig. > > > > > > Mike McCandless > > > > > > http://blog.mikemccandless.com > > > > > > On Thu, Jun 16, 2016 at 10:55 PM, Steve Rowe <[email protected]> wrote: > > > Thanks for looking Hoss. > > > > > > I compared files changed by the commits on branch_6x and on > branch_5_5, and I don’t see anything consequential, so I don’t think this > is a case of a misapplied backport. > > > > > > -- > > > Steve > > > www.lucidworks.com > > > > > > > On Jun 16, 2016, at 6:25 PM, Chris Hostetter < > [email protected]> wrote: > > > > > > > > > > > > : : I ran this test before I committed the backport, but it > succeeded then. > > > > : : I beasted it on current branch_5_5 and 49/100 seeds succeeded. > > > > : > > > > : one of the things that cahnged as part of LUCENE-7132 was that i > made all > > > > : the BQ related tests start randomizing setDisableCoord() ... so > you might > > > > : be seeing some previously unidentified coord related bug that is > only in > > > > : the 5.x line of code? > > > > : > > > > : that could probably jive with the roughtly 50% failure ratio you're > > > > : seeing? > > > > > > > > Hmmm .... nope. Even with the setDisableCoord commented out (but > still > > > > consuming random().nextBoolean() consistently) the same seeds > reliably > > > > fail on branch_5_5 > > > > > > > > Looks like the "~50%" comes from the "use filler docs or not?" bit > of the > > > > test? with the patch below i can't find any seeds to fail -- which > makes > > > > it seem like the crux of the original bug (results incorrect when > docs are > > > > in diff blocks) is still relevant even after the backport to > branch_5_5. > > > > > > > > Mike -- any idea what might still be the problem here? > > > > > > > > > > > > > > > > -Hoss > > > > http://www.lucidworks.com/ > > > > > > > > > > > > diff --git > > > > a/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java > > > > b/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java > > > > index d97d8d4..596eb64 100644 > > > > --- a/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java > > > > +++ b/lucene/core/src/test/org/apache/lucene/search/TestBoolean2.java > > > > @@ -67,6 +67,7 @@ public class TestBoolean2 extends LuceneTestCase { > > > > public static void beforeClass() throws Exception { > > > > // in some runs, test immediate adjacency of matches - in > others, force a full bucket gap betwen docs > > > > NUM_FILLER_DOCS = random().nextBoolean() ? 0 : > BooleanScorer.SIZE; > > > > + NUM_FILLER_DOCS = 0; // nocommit > > > > PRE_FILLER_DOCS = TestUtil.nextInt(random(), 0, (NUM_FILLER_DOCS > / 2)); > > > > > > > > directory = newDirectory(); > > > > > > > > > > > > --------------------------------------------------------------------- > > > > To unsubscribe, e-mail: [email protected] > > > > For additional commands, e-mail: [email protected] > > > > > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [email protected] > > > For additional commands, e-mail: [email protected] > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [email protected] > > For additional commands, e-mail: [email protected] > > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > >
