[jira] Issue Comment Edited: (LUCENE-2690) Do MultiTermQuery boolean rewrites per segment

Simon Willnauer (JIRA) Thu, 14 Oct 2010 21:26:03 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921015#action_12921015
 ]


Simon Willnauer edited comment on LUCENE-2690 at 10/15/10 12:25 AM:
--------------------------------------------------------------------

As a first result, here are the numbers I see on my workstation with a 10 M 
Wikipedia index (5 segments):

||Query||QPS trunk||QPS LUCENE-2690||Pct diff||
|unit state|3.74|3.81|{color:green}1.8%{color}|
|united~0.6|10.07|10.26|{color:green}1.9%{color}|
|unit*|11.89|12.65|{color:green}6.5%{color}|
|united~0.7|39.29|45.52|{color:green}15.9%{color}|
|un*d|15.17|27.86|{color:green}83.7%{color}|


These numbers were measured using the latest patch.

The runs used -Xmx2G on an Intel Core 2 at 3 GHz.
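
The comment does not say how the "Pct diff" column was computed. A minimal sketch of the arithmetic, assuming the usual relative change against trunk, (patched - trunk) / trunk * 100, is below. The class and method names are made up for illustration. Because the sketch starts from the rounded QPS values in the table, the last digit can differ slightly from the listed percentages for some rows.

```java
import java.util.Locale;

public class PctDiff {

    /** Relative change of patched QPS against trunk QPS, in percent (assumed definition). */
    static double pctDiff(double qpsTrunk, double qpsPatched) {
        return (qpsPatched - qpsTrunk) / qpsTrunk * 100.0;
    }

    public static void main(String[] args) {
        // QPS pairs copied from the table rows for un*d and united~0.7.
        System.out.printf(Locale.ROOT, "un*d: %.1f%%%n", pctDiff(15.17, 27.86));
        System.out.printf(Locale.ROOT, "united~0.7: %.1f%%%n", pctDiff(39.29, 45.52));
    }
}
```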


      was (Author: simonw):
    As a first result, here are the numbers I see on my workstation with a 
10 M Wikipedia index:

||Query||QPS trunk||QPS LUCENE-2690||Pct diff||
|unit state|3.74|3.81|{color:green}1.8%{color}|
|united~0.6|10.07|10.26|{color:green}1.9%{color}|
|unit*|11.89|12.65|{color:green}6.5%{color}|
|united~0.7|39.29|45.52|{color:green}15.9%{color}|
|un*d|15.17|27.86|{color:green}83.7%{color}|


These numbers were measured using the latest patch.

The runs used -Xmx2G on an Intel Core 2 at 3 GHz.

  
> Do MultiTermQuery boolean rewrites per segment
> ----------------------------------------------
>
>                 Key: LUCENE-2690
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2690
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 4.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.0
>
>         Attachments: LUCENE-2690-attributes.patch, 
> LUCENE-2690-attributes.patch, LUCENE-2690-attributes.patch, 
> LUCENE-2690-hack.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, LUCENE-2690.patch, 
> LUCENE-2690.patch, LUCENE-2690.patch
>
>
> MultiTermQuery currently rewrites FuzzyQuery (using 
> TopTermsBooleanQueryRewrite), the auto constant rewrite method and the 
> ScoringBQ rewrite methods using a MultiFields wrapper on the top-level 
> reader. This is inefficient.
> This patch changes the rewrite modes to do the rewrites per segment and uses 
> some additional data structures (hashed sets/maps) to exclude duplicate terms. 
> All tests currently pass, but FuzzyQuery's tests should not, because its 
> minimum-score handling depends on the terms being collected in order.
> Robert will fix FuzzyQuery in this issue, too. This patch is just a start.
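
To illustrate the idea in the description, here is a minimal sketch of collecting matching terms segment by segment and using a hash map to drop duplicates. It does not use Lucene's API and is not the code from the patch: `Segment`, `collectTerms`, and the sample terms are hypothetical stand-ins, and summing the per-segment document frequencies is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class PerSegmentRewriteSketch {

    /** Hypothetical stand-in for one segment's term dictionary: term -> docFreq in that segment. */
    static final class Segment {
        final TreeMap<String, Integer> terms = new TreeMap<>();

        Segment add(String term, int docFreq) {
            terms.put(term, docFreq);
            return this;
        }
    }

    /**
     * Visits each segment separately and collects matching terms into a hash map,
     * so a term that occurs in several segments is recorded only once.
     * Per-segment document frequencies are summed (an assumption of this sketch).
     */
    static Map<String, Integer> collectTerms(List<Segment> segments, Predicate<String> matches) {
        Map<String, Integer> collected = new LinkedHashMap<>();
        for (Segment segment : segments) {
            // Within a single segment the terms are visited in sorted order.
            for (Map.Entry<String, Integer> e : segment.terms.entrySet()) {
                if (matches.test(e.getKey())) {
                    collected.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment().add("unit", 3).add("united", 5).add("state", 7));
        segments.add(new Segment().add("unified", 2).add("united", 4));

        // Same shape as the "un*d" wildcard used in the benchmark.
        Predicate<String> unStarD = t -> t.length() >= 3 && t.startsWith("un") && t.endsWith("d");

        // prints {united=9, unified=2}
        System.out.println(collectTerms(segments, unStarD));
    }
}
```

In this sketch "united" is collected before "unified": terms are sorted within each segment but not across segments. That is the ordering concern the description raises for FuzzyQuery's minimum-score handling.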
