Hi Luke,

A batch has a beginning and an end. A stream has a beginning but no end.

Thus, you'll have three choices:

1. You can collect your aggregated sum at a periodic interval.
2. You can aggregate the sum in a time window. If your window is large enough to cover the whole input, you'll get the same result as the batch version.
3. You can also run the stream in batch mode.

Remember, a stream does not end (unless it is run in batch mode).

> On Apr 29, 2023, at 9:11 AM, Marco Villalobos <mvillalo...@kineteque.com> wrote:
>
>> On Apr 28, 2023, at 10:56 PM, Luke Xiong <leix...@gmail.com> wrote:
>>
>> Dear experts,
>>
>> Is it possible to write a WordCount job that uses the DataStream API, but
>> make it behave like the batch version WordCount example?
>>
>> More specifically, I hope the job can get a DataStream of the final (word,
>> count) records when fed a text file.
>>
>> For example, given a text file:
>> ```input.txt
>> hello hello world hello world
>> hello world world world hello world
>> ```
>>
>> In the Flink WordCount examples, the batch version outputs:
>> ```batch.version.output
>> hello 5
>> world 6
>> ```
>>
>> while the stream version outputs:
>> ```stream.version.output
>> (hello,1)
>> (hello,2)
>> (world,1)
>> (hello,3)
>> (world,2)
>> (hello,4)
>> (world,3)
>> (world,4)
>> (world,5)
>> (hello,5)
>> (world,6)
>> ```
>> Is it possible to have a DataStream that only has two elements: (hello, 5)
>> and (world, 6)?
>>
>> Regards,
>> Luke
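
To make the difference between the two quoted outputs concrete, here is a minimal sketch in plain Java. It does not use Flink; it only imitates the two emission styles. The class and method names (`WordCountSemantics`, `runningCounts`, `finalCounts`) are made up for illustration. In Flink itself, to the best of my knowledge, batch execution of a DataStream job is selected with `env.setRuntimeMode(RuntimeExecutionMode.BATCH)` (available since Flink 1.12), which is what option 3 refers to.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not Flink code, just the two emission styles.
final class WordCountSemantics {

    // Streaming style: emit an updated (word,count) record after every input word,
    // because the operator cannot know whether more input will arrive.
    static List<String> runningCounts(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip blank lines
                }
                int updated = counts.merge(word, 1, Integer::sum);
                emitted.add("(" + word + "," + updated + ")");
            }
        }
        return emitted;
    }

    // Batch style: the input is bounded, so consume all of it first
    // and emit exactly one final count per word.
    static Map<String, Integer> finalCounts(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "hello hello world hello world",
                "hello world world world hello world");
        // One record per input word, like the stream version in the question.
        runningCounts(lines).forEach(System.out::println);
        // One record per distinct word, like the batch version in the question.
        finalCounts(lines).forEach((word, count) -> System.out.println(word + " " + count));
    }
}
```

The only difference between the two methods is when results are emitted: per record, or once after the bounded input is exhausted. A window that covers the whole input (option 2) gives the second behaviour for the same reason, since it fires once when it closes.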