[jira] Updated: (LUCENE-2690) Do MultiTermQuery boolean rewrites per segment

Uwe Schindler (JIRA) Thu, 14 Oct 2010 02:24:00 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Uwe Schindler updated LUCENE-2690:
----------------------------------

    Attachment: LUCENE-2690.patch

Thanks for the improvements. Some comments, and changes I made locally:

- The code in BooleanQueryRewrite uses += for the boost and docFreq in the case 
of (>=0, no entry in BytesRefHash), but this should be a plain assignment. The 
update and the comparison in the assert should be done only when an entry is 
already in the hash. Boosts should never be summed up.
- The parts to update once LUCENE-2702 is in are marked. They currently wrap 
with new BytesRef(#get(i)) and should be replaced with code like it was before, 
using PagedBytes.
- Creating the BytesStartArray takes a lot of work. Maybe we can make 
DirectBytesStartArray non-final and reuse its code. This would make it easier 
to extend it and simply add more parallel arrays. Client code should not need 
to replicate the code (this is maybe another issue).
- But there is also a problem with the current code in TermFreqBoostByteStart: 
the arrays may not end up with exactly the same size as expected (depending on 
how oversize/grow works). As they are parallel arrays, all should be of equal 
size, so we should use grow/oversize only for the base array and resize the 
others to the same size. Do we have an ArrayUtil method for that? Currently it 
may be broken. Any comments?
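
To make the first and last points concrete, here is a minimal sketch in plain 
Java. It is not the code from the patch and does not use Lucene classes: 
ParallelTermStats, its methods, and the growth formula are hypothetical, and a 
HashMap stands in for BytesRefHash. It only shows the two rules: assign on the 
first insert and accumulate docFreq only for an existing entry, and compute the 
new length once so all parallel arrays are resized to the same size.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for per-term statistics kept in parallel arrays. */
class ParallelTermStats {
    // Maps a term to its ordinal; plays the role of BytesRefHash in this sketch.
    private final Map<String, Integer> ords = new HashMap<>();
    // Parallel arrays indexed by ordinal; they must always have equal length.
    private int[] docFreq;
    private float[] boost;

    ParallelTermStats(int initialSize) {
        docFreq = new int[initialSize];
        boost = new float[initialSize];
    }

    /** Records one (term, docFreq, boost) observation, e.g. from one segment. */
    void add(String term, int df, float termBoost) {
        Integer existing = ords.get(term);
        if (existing == null) {
            // New entry: plain assignment, nothing to accumulate yet.
            int ord = ords.size();
            ensureCapacity(ord + 1);
            ords.put(term, ord);
            docFreq[ord] = df;
            boost[ord] = termBoost;
        } else {
            // Existing entry: sum docFreq across segments, never the boost.
            int ord = existing;
            docFreq[ord] += df;
            assert boost[ord] == termBoost : "boost must not change for a term";
        }
    }

    /** Grows all arrays together, using one length derived from the base array. */
    private void ensureCapacity(int minSize) {
        if (minSize <= docFreq.length) {
            return;
        }
        // Oversize once (assumed growth policy: roughly 1.5x), then reuse the result.
        int newSize = Math.max(minSize, docFreq.length + (docFreq.length >> 1) + 1);
        docFreq = Arrays.copyOf(docFreq, newSize);
        boost = Arrays.copyOf(boost, newSize);
    }

    private int ordOf(String term) {
        Integer ord = ords.get(term);
        if (ord == null) {
            throw new IllegalArgumentException("unknown term: " + term);
        }
        return ord;
    }

    int docFreq(String term) {
        return docFreq[ordOf(term)];
    }

    float boost(String term) {
        return boost[ordOf(term)];
    }

    /** True while the parallel arrays have identical lengths. */
    boolean arraysInSync() {
        return docFreq.length == boost.length;
    }
}
```

Calling the oversize logic separately for each array would only be safe if it 
returned the same length for every element type, which is exactly the concern 
raised above. Deriving the length once from the base array avoids relying on 
that.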

> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690-hack.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch
>
>
> MultiTermQuery currently rewrites FuzzyQuery (using 
> TopTermsBooleanQueryRewrite), the auto constant rewrite method and the 
> ScoringBQ rewrite methods using a MultiFields wrapper on the top-level 
> reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses 
> some additional data structures (hashed sets/maps) to exclude duplicate terms. 
> All tests currently pass, but FuzzyQuery's tests should not, because its 
> minimum score handling depends on the terms being collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

