[ https://issues.apache.org/jira/browse/SOLR-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306291#comment-17306291 ]
Jan Høydahl commented on SOLR-15210:
------------------------------------

Please consider deleting the remote branch https://github.com/apache/solr/tree/jira/solr-15210 and instead use a private fork for new PRs against new solr.git. See SOLR-15253 for the central branch cleanup effort - I'm planning to clean up remaining non-active branches in a week or so...

> ParallelStream should execute hashing & filtering directly in ExportWriter
> --------------------------------------------------------------------------
>
>                 Key: SOLR-15210
>                 URL: https://issues.apache.org/jira/browse/SOLR-15210
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>         Attachments: SOLR-15210.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently ParallelStream uses {{HashQParserPlugin}} to partition the work
> based on a hashed value of {{partitionKeys}}. Unfortunately, this filter has
> a high initial runtime cost because it has to materialize all values of
> {{partitionKeys}} on each worker in order to calculate their hash and decide
> whether a particular doc belongs to the worker's partition.
> The alternative approach would be for the worker to collect and sort all
> documents and only then filter out the ones that belong to the current
> partition just before they are written out by {{ExportWriter}} - at this
> point we have to materialize the fields anyway but also we can benefit from a
> (minimal) BytesRef caching that the FieldWriters use. On the other hand we
> pay the price of sorting all documents, and we also lose the query filter
> caching that the {{HashQParserPlugin}} uses.
> This tradeoff is not obvious but should be investigated to see if it offers
> better performance.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
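
For readers skimming the quoted issue, the two orderings it compares can be sketched roughly as below. This is only an illustration, not Solr code: the PartitionSketch class, the Doc type, the workerFor helper, and the use of String.hashCode are all assumptions made for the example, and the real hashing in HashQParserPlugin differs. It shows only that both orderings hand each worker the same documents, so the tradeoff lies in cost rather than in results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PartitionSketch {

    /** Hypothetical stand-in for a document: one partition key and one sort value. */
    public static final class Doc {
        public final String partitionKey;
        public final int sortValue;

        public Doc(String partitionKey, int sortValue) {
            this.partitionKey = partitionKey;
            this.sortValue = sortValue;
        }
    }

    /** Assumption: String.hashCode modulo the worker count stands in for the real hash. */
    public static int workerFor(String partitionKey, int numWorkers) {
        return Math.floorMod(partitionKey.hashCode(), numWorkers);
    }

    /** Current shape: drop other workers' docs up front, then sort what is left. */
    public static List<Doc> filterThenSort(List<Doc> docs, int workerId, int numWorkers) {
        List<Doc> mine = new ArrayList<>();
        for (Doc d : docs) {
            if (workerFor(d.partitionKey, numWorkers) == workerId) {
                mine.add(d);
            }
        }
        mine.sort(Comparator.comparingInt((Doc d) -> d.sortValue));
        return mine;
    }

    /** Proposed shape: sort every doc, then apply the hash check while writing out. */
    public static List<Doc> sortThenFilter(List<Doc> docs, int workerId, int numWorkers) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingInt((Doc d) -> d.sortValue));
        List<Doc> written = new ArrayList<>();
        for (Doc d : sorted) {
            if (workerFor(d.partitionKey, numWorkers) == workerId) {
                written.add(d);
            }
        }
        return written;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("a", 3), new Doc("b", 1), new Doc("c", 2), new Doc("a", 0));
        for (int worker = 0; worker < 3; worker++) {
            List<Doc> early = filterThenSort(docs, worker, 3);
            List<Doc> late = sortThenFilter(docs, worker, 3);
            System.out.println("worker " + worker + ": " + early.size()
                + " docs, same output = " + early.equals(late));
        }
    }
}
```

In the sketch, filterThenSort sorts only the worker's own share but must read every partition key first, while sortThenFilter sorts everything and defers the hash check to write time, which mirrors the cost tradeoff the issue proposes to measure.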