[ https://issues.apache.org/jira/browse/FLINK-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-4854: ---------------------------------- Labels: auto-unassigned features (was: features stale-assigned) > Efficient Batch Operator in Streaming > ------------------------------------- > > Key: FLINK-4854 > URL: https://issues.apache.org/jira/browse/FLINK-4854 > Project: Flink > Issue Type: Improvement > Components: API / DataStream > Reporter: Xiaowei Jiang > Assignee: Guowei Ma > Priority: Major > Labels: auto-unassigned, features > Original Estimate: 168h > Remaining Estimate: 168h > > Very often, it's more efficient to process a batch of records at once instead > of processing them one by one. We can use window to achieve this > functionality. However, window will store all records in states, which can be > costly. It's desirable to have an efficient implementation of batch operator. > The batch operator works per task and behave similarly to aligned windows. > Here is an example of how the interface looks like to a user. > {code} > interface BatchFunction { > // add the record to the buffer > // returns if the batch is ready to be flushed > boolean addRecord(T record); > // process all pending records in the buffer > void flush(Collector collector) ; > } > DataStream ds = ... > BatchFunction func = ... > ds.batch(func); > {code} > The operator calls addRecord for each record. The batch function saves the > record in its own buffer. The addRecord returns if the pending buffer should > be flushed. In that case, the operator invokes flush. -- This message was sent by Atlassian Jira (v8.3.4#803005)