Re: A WordCount job using DataStream API but behave like the batch WordCount example

2023-04-30 Thread Luke Xiong
Hi Ken, Never mind. I did a map of Tuple2.of(C, 1) -> Tuple2.of(F, 1) before keyBy, then replaced the map after sum with Tuple2.of(F, ) -> d . I got the expected result. So it's very likely something is wrong with C in determining equalness. Thank you all the same. - Luke On Sun, Apr 30, 2023

Re: A WordCount job using DataStream API but behave like the batch WordCount example

2023-04-30 Thread Ken Krugler
Hi Luke, Without more details, I don’t have a good explanation for why you’re seeing duplicate results. — Ken > On Apr 29, 2023, at 7:38 PM, Luke Xiong wrote: > > Hi Ken, > > My workflow looks like this: > > dataStream > .map(A -> B) > .flatMap(B -> some Tuple2.of(C, 1)) > .keyB

Re: A WordCount job using DataStream API but behave like the batch WordCount example

2023-04-29 Thread Ken Krugler
Hi Luke, What’s your workflow look like? The one I included in my email generates what I’d expect. — Ken > On Apr 29, 2023, at 7:22 PM, Luke Xiong wrote: > > Hi Marco and Ken, > > Thanks for the tips. I tried setting runtime mode to BATCH and it seems to > work. However, I notice there are

Re: A WordCount job using DataStream API but behave like the batch WordCount example

2023-04-29 Thread Marco Villalobos
Hi Luke, A batch has a beginning and an end. Although a stream has a beginning, it has no end. Thus, you'll have three choices: 1. You can collect your aggregated sum at a period interval 2. You can aggregate the sum in a time window. If your window is large enough, you'll get the same result.