[
https://issues.apache.org/jira/browse/SOLR-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025241#comment-15025241
]
Joel Bernstein edited comment on SOLR-8337 at 12/11/15 2:30 AM:
----------------------------------------------------------------
Patch adds a single *reduce()* method for ReduceOperation that returns a single
Tuple, which is the final reduction.
The *operate(Tuple)* method will be called for each Tuple that is read by the
*ReducerStream*.
The reduce() method will be called each time the group-by key changes. This
gives the ReduceOperation a chance to finish the reduce algorithm and
return a single Tuple. The ReduceOperation will also clear its internal state
after each call to reduce() to prepare for the next Tuple grouping.
was (Author: joel.bernstein):
Patch adds a single *reduce()* method that returns a single Tuple, which is the
final reduction.
The *operate(Tuple)* method will be called for each Tuple that is read by the
*ReducerStream*.
The reduce() method will be called each time the group-by key changes. This
will give the ReduceOperation a chance to finish the reduce algorithm and
return a single Tuple. The ReduceOperation will also clear its internal memory
after each call to reduce() to prepare for the next Tuple grouping.
> Add ReduceOperation and wire it into the ReducerStream
> ------------------------------------------------------
>
> Key: SOLR-8337
> URL: https://issues.apache.org/jira/browse/SOLR-8337
> Project: Solr
> Issue Type: Bug
> Reporter: Joel Bernstein
> Attachments: SOLR-8337.patch, SOLR-8337.patch, SOLR-8337.patch,
> SOLR-8337.patch, SOLR-8337.patch
>
>
> The current ReducerStream groups all documents that share the same key(s)
> into a list and emits a single Tuple that contains this list. There is no way
> to tell the ReducerStream to do something more interesting with groups, for
> example summing a column within a group, or joining tuples.
> This ticket adds a new type of operation called a ReduceOperation which is
> passed to the ReducerStream so that the reduce behavior can be specialized.
> The ReduceOperation has two methods:
> 1) operate(Tuple) : This is called once for each Tuple in a group. This
> method can be used to aggregate Tuples as they are added to a group.
> 2) reduce() : This is called when the group keys change. This method returns
> a single Tuple which is output by the ReducerStream. The ReduceOperation must
> also clear its internal structures when reduce() is called, to prepare for
> the next group.
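
To make the operate()/reduce() contract above concrete, here is a minimal, self-contained sketch of a summing operation and the loop that drives it. It is not the code in the attached patch: a plain Map stands in for Tuple, and the names ReduceSketch, ReduceOp, SumOp and reduceSorted are hypothetical. It also assumes the incoming tuples are already sorted by the group key, so that a change in key marks the end of a group.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-ins: a Map plays the role of Tuple, and these names are
// illustrative only, not the classes in the attached patch.
class ReduceSketch {

    /** Mirrors the two methods described in this ticket. */
    interface ReduceOp {
        void operate(Map<String, Object> tuple); // called once per tuple in a group
        Map<String, Object> reduce();            // called when the group key changes
    }

    /** Sums a numeric field within each group. */
    static class SumOp implements ReduceOp {
        private final String keyField;
        private final String sumField;
        private Object currentKey;
        private double sum;

        SumOp(String keyField, String sumField) {
            this.keyField = keyField;
            this.sumField = sumField;
        }

        @Override
        public void operate(Map<String, Object> tuple) {
            currentKey = tuple.get(keyField);
            sum += ((Number) tuple.get(sumField)).doubleValue();
        }

        @Override
        public Map<String, Object> reduce() {
            Map<String, Object> out = new HashMap<>();
            out.put(keyField, currentKey);
            out.put("sum(" + sumField + ")", sum);
            // Clear internal state so the next group starts fresh.
            currentKey = null;
            sum = 0;
            return out;
        }
    }

    /** Drives the operation over tuples that are already sorted by keyField. */
    static List<Map<String, Object>> reduceSorted(
            List<Map<String, Object>> sorted, String keyField, ReduceOp op) {
        List<Map<String, Object>> results = new ArrayList<>();
        Object previousKey = null;
        boolean groupOpen = false;
        for (Map<String, Object> tuple : sorted) {
            Object key = tuple.get(keyField);
            if (groupOpen && !Objects.equals(key, previousKey)) {
                results.add(op.reduce()); // key changed: emit the finished group
            }
            op.operate(tuple);
            previousKey = key;
            groupOpen = true;
        }
        if (groupOpen) {
            results.add(op.reduce()); // end of stream: flush the last group
        }
        return results;
    }
}
```

In this sketch the driver, not the operation, decides when a group ends; the operation only accumulates and resets. The final reduce() call after the loop is needed because the last group never sees a key change.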