[jira] [Commented] (LUCENE-1421) Ability to group search results by field

Michael McCandless (JIRA) Tue, 17 May 2011 03:38:32 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034691#comment-13034691
 ]


Michael McCandless commented on LUCENE-1421:
--------------------------------------------

I added grouping queries to the nightly benchmarks
(http://people.apache.org/~mikemccand/lucenebench) -- see
TermGroup100/10K/1M.  The "F" annotation marks the day grouping queries
first ran.

Those queries are the same queries that run as TermQuery, just with
grouping turned on for 3 randomly generated fields, with 100, 10,000
and 1 million unique values respectively.  So we can gauge the perf hit
by comparing to TermQuery each night.

I use the CachingCollector.

First off, I'm impressed that the perf hit for grouping is not too
bad:

||Query||QPS||Slowdown||
|TermQuery (baseline)|30.72|0|
|TermGroup100|13.59|2.26|
|TermGroup10K|13.2|2.34|
|TermGroup1M|12.15|2.53|

I had expected we'd pay a bigger perf hit!
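
For reference, the Slowdown column appears to be the baseline QPS divided
by the grouped QPS. That reading is an assumption, but it reproduces the
100 and 1M rows. A minimal Python sketch of the arithmetic follows; the
function and variable names are illustrative only and not part of the
benchmark code:

```python
# QPS figures copied from the table above.
BASELINE_QPS = 30.72  # TermQuery, no grouping

grouped_qps = {
    "TermGroup100": 13.59,
    "TermGroup10K": 13.2,
    "TermGroup1M": 12.15,
}


def slowdown(baseline_qps, qps):
    """Return how many times slower a query is than the baseline."""
    return baseline_qps / qps


slowdowns = {name: slowdown(BASELINE_QPS, qps) for name, qps in grouped_qps.items()}

for name, factor in slowdowns.items():
    print(f"{name}: {factor:.2f}x")
```

Recomputed from the rounded QPS values, the 10K row comes out at about
2.33 rather than 2.34, presumably because the published QPS is itself
rounded.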

Second, the more unique groups you have, the slower grouping gets,
but that multiplier really isn't so bad -- the 1M unique groups case
is only 10.6% slower (in QPS) than the 100 unique groups case.
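
The 10.6% figure matches the relative drop in QPS between the two rows of
the table. A small sketch of that calculation, assuming this is how the
number was derived (the helper name is hypothetical):

```python
def pct_qps_drop(faster_qps, slower_qps):
    """Percentage by which slower_qps falls below faster_qps."""
    return 100.0 * (faster_qps - slower_qps) / faster_qps


# TermGroup100 (13.59 QPS) versus TermGroup1M (12.15 QPS)
drop = pct_qps_drop(13.59, 12.15)
print(f"{drop:.1f}%")
```

Measured the other way, as extra time per query, the gap would be a bit
larger, so it matters which of the two conventions is being quoted.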

Remember, though, that these groups are randomly generated
full-unicode strings, so real data could very well produce different
results...

Third, and this is insanity, the addition of grouping caused other
unexpected changes.  Most horribly, SpanNearQuery slowed down
by ~12.2%
(http://people.apache.org/~mikemccand/lucenebench/SpanNear.html),
while other queries seem to get a bit faster.  I think this is
[frustratingly!] due to hotspot making different decisions about which
code to optimize/inline.

Similarly strange, when I added sorting (TermQuery sorting by title
and date/time, "E" annotation in all graphs), I saw the variance in
the unsorted TermQuery performance drop substantially.  I'm pretty
sure this wide variance was due to hotspot's erratic decision making,
but somehow the addition of sorting, while not changing TermQuery's mean
QPS, caused hotspot to at least be somewhat more consistent in how it
compiled the code.  Maybe as we add more and more diverse queries to
the benchmark we'll see hotspot behave more "reasonably"....
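
To make "same mean, lower variance" concrete, here is a minimal sketch
using Python's statistics module. The QPS samples are invented purely for
illustration and are NOT taken from the nightly benchmark; only the shape
of the comparison is the point:

```python
import statistics

# Hypothetical nightly QPS samples (made-up numbers, not benchmark data).
before_sorting = [28.0, 33.0, 30.0, 35.0, 29.0]
after_sorting = [30.5, 31.5, 31.0, 30.8, 31.2]


def summarize(samples):
    """Return (mean, population standard deviation) of QPS samples."""
    return statistics.mean(samples), statistics.pstdev(samples)


mean_before, sd_before = summarize(before_sorting)
mean_after, sd_after = summarize(after_sorting)

print(f"before: mean={mean_before:.2f} stdev={sd_before:.2f}")
print(f"after:  mean={mean_after:.2f} stdev={sd_after:.2f}")
```

Both series have essentially the same mean, but the second has a much
smaller standard deviation, which is the pattern described above.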


> Ability to group search results by field
> ----------------------------------------
>
>                 Key: LUCENE-1421
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1421
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: core/search
>            Reporter: Artyom Sokolov
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-1421.patch, LUCENE-1421.patch, 
> lucene-grouping.patch
>
>
> It would be awesome to group search results by specified field. Some 
> functionality was provided for Apache Solr but I think it should be done in 
> Core Lucene. There could be some useful information like total hits about 
> collapsed data like total count and so on.
> Thanks,
> Artyom

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
