Hi Luke,

A batch has a beginning and an end. Although a stream has a beginning, it has 
no end.

Thus, you have three choices:

1. You can emit your aggregated sum at a periodic interval.
2. You can aggregate the sum in a time window. If your window is large enough 
to cover the whole input, you'll get the same result as the batch job.
3. You can also run the stream in batch mode.

Remember, a stream does not end (unless it is run in batch mode).
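
To make the difference concrete, here is a minimal sketch in plain Java (no 
Flink dependency) of the two behaviours, using your input.txt. The class and 
method names (WordCountSketch, runningCounts, finalCounts) are made up for 
illustration and are not part of any Flink API. The first method emits an 
update per word, the way an unbounded stream has to; the second waits for the 
end of the input and emits each word once, which is only possible when the 
input is bounded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain Java, not the Flink API.
class WordCountSketch {

    // Streaming-style: emit the updated (word,count) after every word,
    // because an unbounded input never signals that a count is final.
    static List<String> runningCounts(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip blank lines
                }
                long updated = counts.merge(word, 1L, Long::sum);
                emitted.add("(" + word + "," + updated + ")");
            }
        }
        return emitted;
    }

    // Bounded-style: consume the whole input first, then emit one
    // final count per word. This requires the input to have an end.
    static Map<String, Long> finalCounts(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip blank lines
                }
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "hello hello world hello world",
                "hello world world world hello world");
        System.out.println(runningCounts(lines)); // 11 incremental updates
        System.out.println(finalCounts(lines));   // {hello=5, world=6}
    }
}
```

In Flink itself, as far as I know, option 3 corresponds to calling 
env.setRuntimeMode(RuntimeExecutionMode.BATCH) on the 
StreamExecutionEnvironment (or passing -Dexecution.runtime-mode=BATCH when 
submitting the job) with a bounded source such as a file. Please check the 
documentation for your Flink version before relying on that.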

> On Apr 29, 2023, at 9:11 AM, Marco Villalobos <mvillalo...@kineteque.com> 
> wrote:
> 
>> On Apr 28, 2023, at 10:56 PM, Luke Xiong <leix...@gmail.com> wrote:
>> 
>> Dear experts,
>> 
>> Is it possible to write a WordCount job that uses the DataStream API, but 
>> make it behave like the batch version WordCount example?
>> 
>> More specifically, I hope the job can get a DataStream of the final (word, 
>> count) records when fed a text file.
>> 
>> For example, given a text file:
>> ```input.txt
>> hello hello world hello world
>> hello world world world hello world
>> ```
>> 
>> In the Flink WordCount examples, the batch version outputs:
>> ```batch.version.output
>> hello 5
>> world 6
>> ```
>> 
>> while the stream version outputs:
>> ```stream.version.output
>> (hello,1)
>> (hello,2)
>> (world,1)
>> (hello,3)
>> (world,2)
>> (hello,4)
>> (world,3)
>> (world,4)
>> (world,5)
>> (hello,5)
>> (world,6)
>> ```
>> Is it possible to have a DataStream that only has two elements: (hello, 5) 
>> and (world, 6)?
>> 
>> Regards,
>> Luke
> 
