Hi Ken,
Never mind. I added a map of Tuple2.of(C, 1) -> Tuple2.of(F, 1) before
keyBy, then replaced the map after sum with Tuple2.of(F, ) -> d, and I
got the expected result. So it's very likely something is wrong with how C
determines equality. Thank you all the same.
- Luke
On Sun, Apr 30, 2023, Ken wrote:
Hi Luke,
Without more details, I don’t have a good explanation for why you’re seeing
duplicate results.
— Ken
> On Apr 29, 2023, at 7:38 PM, Luke Xiong wrote:
>
> Hi Ken,
>
> My workflow looks like this:
>
> dataStream
> .map(A -> B)
> .flatMap(B -> some Tuple2.of(C, 1))
> .keyBy(...)
Hi Luke,
What’s your workflow look like?
The one I included in my email generates what I’d expect.
— Ken
> On Apr 29, 2023, at 7:22 PM, Luke Xiong wrote:
>
> Hi Marco and Ken,
>
> Thanks for the tips. I tried setting runtime mode to BATCH and it seems to
> work. However, I notice there are duplicate results in the output.
Hi Luke,
A batch has a beginning and an end. Although a stream has a beginning, it has
no end.
Thus, you have three choices:
1. You can collect your aggregated sum at a periodic interval.
2. You can aggregate the sum in a time window; if your window is large enough,
you'll get the same result (see the sketch after this list).
3. You can run the job in BATCH execution mode, since a text file is a bounded input.
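A minimal sketch of option 2, assuming a tumbling 10-second processing-time window over an unbounded text source; the socket source, window size, and tokenization are placeholders rather than anything from the original job:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class WindowedWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.socketTextStream("localhost", 9999)        // placeholder unbounded text source
            .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
                for (String word : line.toLowerCase().split("\\W+")) {
                    if (!word.isEmpty()) {
                        out.collect(Tuple2.of(word, 1));
                    }
                }
            })
            .returns(Types.TUPLE(Types.STRING, Types.INT))
            .keyBy(t -> t.f0)
            // Emit one (word, count) per key for every 10-second slice of the stream.
            .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
            .sum(1)
            .print();

        env.execute("windowed word count");
    }
}
```

Each window emits its own partial counts, so this only matches the batch result if a single window covers all of the input.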
Dear experts,
Is it possible to write a WordCount job that uses the DataStream API, but
have it behave like the batch version of the WordCount example?
More specifically, I hope the job can produce a DataStream of the final
(word, count) records when fed a text file.
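For reference, a minimal sketch of the approach Luke reports trying in his reply further up the thread: run the DataStream job with the runtime mode set to BATCH, so the bounded file input yields exactly one final (word, count) record per word. The file path is a placeholder, and readTextFile is used for brevity (newer Flink releases point to the FileSource connector instead):

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class BatchStyleWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // A file is a bounded input, so the job can run in BATCH execution mode;
        // each key's sum is then emitted exactly once, after the input is exhausted.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        env.readTextFile("/path/to/input.txt")         // placeholder path
            .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
                for (String word : line.toLowerCase().split("\\W+")) {
                    if (!word.isEmpty()) {
                        out.collect(Tuple2.of(word, 1));
                    }
                }
            })
            .returns(Types.TUPLE(Types.STRING, Types.INT))
            .keyBy(t -> t.f0)
            .sum(1)                                    // final (word, count) per word
            .print();

        env.execute("batch-style word count");
    }
}
```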
For example, given a text file:
```inpu