You could first transform each stream to a common format (in the worst
case, an ugly Either-like capturing all possible types), union those
streams, and then do a keyBy + window function.
This is how coGroup is implemented internally.
On 14/02/2022 16:08, Will Lauer wrote:
OK, here's what I hope is a stupid question: what's the most efficient
way to co-group more than 2 DataStreams together? I'm looking at
porting a pipeline from pig to flink, and in a couple of places I use
Pig's COGROUP functionality to simultaneously group 3 or 4 and
sometimes even more datasets on the same key simultaneously. Looking
at the Datastream API, I see how to group 2 datastreams, but I don't
see anything obvious for processing more than two simultaneously.
Obviously I could cogroup two, then cogroup the result with the next
one, etc adding each stream serially to the result, but that seems
inefficient. Is there a better way?
Will
*
*
Will Lauer
*
*
Senior Principal Architect, Audience & Advertising Reporting
Data Platforms & Systems Engineering
*
*
M 508 561 6427
Champaign Office
1908 S. First St
Champaign, IL 61822