[ 
https://issues.apache.org/jira/browse/SOLR-11769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-11769:
--------------------------------
    Attachment: SOLR-11769_Optimize_MatchAllDocsQuery_more.patch

I couldn't help but dig deeper here today and figure out why there would 
be a slow-down even without the above change.  The result is a broader 
improvement for match-all-docs (*:*) scenarios that is not specific to the 
particular useFilterForSortedQuery situation above.  Essentially 
SolrIndexSearcher.getProcessedFilter would return an empty pf if there are no 
queries or filters (semantically, that means match-all-docs).  But pf has an 
"answer" field, which the patch now populates with getLiveDocs in that case; 
pf.answer is examined by getDocSet so it can return early.  I also optimized 
DocSetUtil.createDocSet to check whether the query arg is a MatchAllDocsQuery 
and the live docs is "instantiated", allowing us to return that directly.
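
To illustrate the early-return idea, here is a minimal sketch.  It is not 
the patch itself: MatchAllSketch, its nested ProcessedFilter, the BitSet 
doc sets, and the collectCalls counter are all simplified stand-ins assumed 
for illustration, not the real Solr classes or signatures.

```java
import java.util.BitSet;
import java.util.List;

// Simplified stand-ins for illustration; these are NOT the real Solr classes.
final class MatchAllSketch {
    /** Hypothetical analogue of pf: a non-null 'answer' short-circuits the caller. */
    static final class ProcessedFilter {
        BitSet answer; // when non-null, the final doc set is already known
    }

    private final BitSet liveDocs;
    int collectCalls = 0; // counts how often the slow path runs

    MatchAllSketch(BitSet liveDocs) {
        this.liveDocs = liveDocs;
    }

    ProcessedFilter getProcessedFilter(List<String> queries) {
        ProcessedFilter pf = new ProcessedFilter();
        if (queries == null || queries.isEmpty()) {
            // No queries or filters means match-all-docs: the answer is the live docs.
            pf.answer = liveDocs;
        }
        return pf;
    }

    BitSet getDocSet(List<String> queries) {
        ProcessedFilter pf = getProcessedFilter(queries);
        if (pf.answer != null) {
            return pf.answer; // early return, nothing to collect
        }
        return collect(queries);
    }

    private BitSet collect(List<String> queries) {
        collectCalls++;
        // Placeholder for the slow path: real code would execute the queries.
        // Here we only copy the live docs so the sketch stays self-contained.
        return (BitSet) liveDocs.clone();
    }
}
```

In this sketch, a null or empty query list never reaches collect(), which 
is the behaviour the getProcessedFilter change is after.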

The only quirky thing about this was a test failure I fixed in 
TestSolrQueryParser, which checks the filter cache insert delta after 
executing a query.  With this patch, getProcessedFilter makes an additional 
call to getLiveDocs, whose result went into the cache and incremented the 
counter one more time.  An assumption I make in the getProcessedFilter change 
is that calling getLiveDocs is either cheap or a foregone conclusion: it will 
ultimately be instantiated at some point anyway, so we might as well get on 
with it.  Alternatively, its caller could check for this case instead (e.g. 
if filter == null && postFilter == null then return getLiveDocs)?  But that 
seems less clean, since "answer" is there for a reason, so why avoid it?
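
As a rough illustration of why the insert counter moved, here is a toy model 
of a cache with an insert counter.  Everything in it (LiveDocsCacheSketch, 
the "*:*" cache key, the string filters, the pretend single-doc match) is 
assumed for the example and does not mirror the real Solr cache or test.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy model only; the names below are illustrative, not Solr's API.
final class LiveDocsCacheSketch {
    private final Map<String, BitSet> filterCache = new HashMap<>();
    private final int maxDoc;
    long inserts = 0; // analogue of the cache's insert counter

    LiveDocsCacheSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Builds the match-all set on first use, then serves it from the cache. */
    BitSet getLiveDocs() {
        BitSet hit = filterCache.get("*:*");
        if (hit != null) {
            return hit; // already instantiated, so this call is cheap
        }
        BitSet all = new BitSet(maxDoc);
        all.set(0, maxDoc);
        filterCache.put("*:*", all);
        inserts++; // the extra insert a delta-checking test would notice
        return all;
    }

    /** A null filter means match-all-docs and is answered with the live docs. */
    BitSet getDocSet(String filter) {
        if (filter == null) {
            return getLiveDocs();
        }
        BitSet hit = filterCache.get(filter);
        if (hit != null) {
            return hit;
        }
        BitSet one = new BitSet(maxDoc);
        one.set(0); // pretend the filter matches only doc 0
        filterCache.put(filter, one);
        inserts++;
        return one;
    }
}
```

In this model the first match-all call costs one extra insert and every later 
call is a cache hit, which is the assumption described above.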

[[email protected]] can you please review this?  It intersects with 
modifications you've done in the past.

As an aside, I think it would be good if more DocSet methods in 
SolrIndexSearcher moved over to DocSetUtil so that we can keep the unwieldy 
SolrIndexSearcher tamed.  Essentially I suggest a change similar to what I did 
for the SolrDocumentFetcher refactoring, but for DocSets.  With such a change, 
it would have access to the searcher and would not need it passed into each 
method.

> Sorting performance degrades when useFilterForSortedQuery is enabled and 
> there is no filter query specified
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11769
>                 URL: https://issues.apache.org/jira/browse/SOLR-11769
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: search
>    Affects Versions: 4.10.4
>         Environment: OS: macOS Sierra (version 10.12.4)
> Memory: 16GB
> CPU: 2.9 GHz Intel Core i7
> Java Version: 1.8
>            Reporter: Betim Deva
>            Assignee: David Smiley
>              Labels: performance
>         Attachments: SOLR-11769_Optimize_MatchAllDocsQuery_more.patch
>
>
> The performance of sorting degrades significantly when 
> {{useFilterForSortedQuery}} is enabled and there's no filter query specified.
> *Steps to Reproduce:*
> 1. Set {{useFilterForSortedQuery=true}} in {{solrconfig.xml}}
> 2. Run a query that matches and returns a single document, and add sorting
> - Example {{/select?q=foo:123&sort=bar+desc}}
> With a large index (> 10 million documents), this yields a slow response 
> (a few hundred milliseconds on average) even when the result set 
> consists of a single document.
> *Observation 1:*
> - Disabling {{useFilterForSortedQuery}} improves the performance to < 1ms
> *Observation 2:*
> - Removing the {{sort}} improves the performance to < 1ms
> *Observation 3:*
> - Keeping the {{sort}}, and adding any filter query (such as {{fq=\*:\*}}) 
> improves the performance to < 1 ms.
> After profiling 
> [SolrIndexSearcher.java|https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob;f=solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java;h=9ee5199bdf7511c70f2cc616c123292c97d36b5b;hb=HEAD#l1400]
> I found that the bottleneck is on 
> {{DocSet bigFilt = getDocSet(cmd.getFilterList());}} 
> when {{cmd.getFilterList()}} is passed in as {{null}}. This makes 
> {{getDocSet()}} collect document ids every single time it is called, 
> without any caching.
> {code:java}
> 1394     if (useFilterCache) {
> 1395       // now actually use the filter cache.
> 1396       // for large filters that match few documents, this may be
> 1397       // slower than simply re-executing the query.
> 1398       if (out.docSet == null) {
> 1399         out.docSet = getDocSet(cmd.getQuery(), cmd.getFilter());
> 1400         DocSet bigFilt = getDocSet(cmd.getFilterList());
> 1401         if (bigFilt != null) out.docSet = 
> out.docSet.intersection(bigFilt);
> 1402       }
> 1403       // todo: there could be a sortDocSet that could take a list of
> 1404       // the filters instead of anding them first...
> 1405       // perhaps there should be a multi-docset-iterator
> 1406       sortDocSet(qr, cmd);
> 1407     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
