Hi,

I'm looking for the right way to do the following scheme:

1. Read data
2. Split it into partitions for parallel processing
3. In every partition group data in N elements batches
4. Process these batches

My first attempt was:
*dataStream.keyBy(_.key).countWindow(..)*
But countWindow groups by a key. I however want to group all elements in
partition.

Then I tried:
*dataStream.keyBy(_.key).countWindowAll(..)*
But apparently countWindowAll doesn't work on partitioned data.

So, my last version is:

*dataStream.keyBy(_.key.hashCode % 4).countWindow(..)*
But is looks kida hacky with hardcoded partitions number.

So, what's the right way of doing it?

Best regards,
Dmitry

Reply via email to