[ 
https://issues.apache.org/jira/browse/SOLR-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306291#comment-17306291
 ] 

Jan Høydahl commented on SOLR-15210:
------------------------------------

Please consider deleting the remote branch 
https://github.com/apache/solr/tree/jira/solr-15210 and using a private 
fork instead for new PRs against the new solr.git. See SOLR-15253 for the 
central branch cleanup effort. I'm planning to clean up the remaining 
inactive branches in a week or so.

> ParallelStream should execute hashing & filtering directly in ExportWriter
> --------------------------------------------------------------------------
>
>                 Key: SOLR-15210
>                 URL: https://issues.apache.org/jira/browse/SOLR-15210
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>         Attachments: SOLR-15210.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently ParallelStream uses {{HashQParserPlugin}} to partition the work 
> based on a hashed value of {{partitionKeys}}. Unfortunately, this filter has 
> a high initial runtime cost because it has to materialize all values of 
> {{partitionKeys}} on each worker in order to calculate their hash and decide 
> whether a particular doc belongs to the worker's partition.
> The alternative approach would be for the worker to collect and sort all 
> documents and only then keep the ones that belong to the current 
> partition, just before they are written out by {{ExportWriter}}. At this 
> point we have to materialize the fields anyway, and we can also benefit from 
> the (minimal) BytesRef caching that the FieldWriters use. On the other hand, 
> we pay the price of sorting all documents, and we also lose the query filter 
> caching that the {{HashQParserPlugin}} uses.
> This tradeoff is not obvious but should be investigated to see if it offers 
> better performance.
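
To illustrate the hash partitioning described in the issue, here is a minimal Java sketch. It is not Solr code: the class HashPartitionSketch, the methods belongsToWorker and filterForWorker, and the use of String.hashCode() are assumptions made for illustration only, and the real HashQParserPlugin and ExportWriter may hash and store values differently. The sketch only shows the idea of assigning each document to exactly one worker by hashing its partition key, and of applying that check late, on an already sorted list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; none of these names exist in Solr.
class HashPartitionSketch {

    // True if a document with this partition key value is assigned to workerId.
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    static boolean belongsToWorker(String partitionKeyValue, int workerId, int numWorkers) {
        return Math.floorMod(partitionKeyValue.hashCode(), numWorkers) == workerId;
    }

    // Late filtering: the input is assumed to be already sorted, and we keep only
    // the values for this worker's partition, preserving the sort order.
    static List<String> filterForWorker(List<String> sortedKeyValues, int workerId, int numWorkers) {
        List<String> kept = new ArrayList<>();
        for (String value : sortedKeyValues) {
            if (belongsToWorker(value, workerId, numWorkers)) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("a", "b", "c", "d");
        int numWorkers = 3;
        for (int worker = 0; worker < numWorkers; worker++) {
            System.out.println("worker " + worker + " -> " + filterForWorker(sorted, worker, numWorkers));
        }
    }
}
```

Because every worker applies the same hash and the same modulus, each key value lands in exactly one partition, whether the check runs as an up-front query filter or just before the documents are written out.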



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
