Hi Vijay,

Option 1 is the right answer. `keyBy1` and `keyBy2` contain all data in
`inputStream`.
While option 2 replicate all data to each task and option 3 split data into
smaller groups without duplication.

Best, Hequn

On Fri, Oct 26, 2018 at 7:01 AM Vijay Balakrishnan <bvija...@gmail.com>
wrote:

> Hi,
> I need to broadcast/parallelize an incoming stream(inputStream) into 5
> streams with the same data. Each stream is keyed by different keys to do
> various grouping operations on the set.
>
> Do I just use inputStream.keyBy(5 diff keys) and then just use the
> DataStream to perform windowing/grouping operations ?
>
> *DataStream<Long> inputStream= ...*
> *DataStream<Long>  keyBy1 = inputStream.keyBy((d) -> d._1);*
> *DataStream<Long>  keyBy2 = inputStream.keyBy((d) -> d._2);*
>
> *DataStream<Long> out1Stream = keyBy1.flatMap(new Key1Function());// do
> windowing/grouping operations in this function*
> *DataStream<Long> out2Stream = keyBy2.flatMap(new Key2Function());// do
> windowing/grouping operations in this function*
>
> out1Stream.print();
> out2Stream.addSink(new Out2Sink());
>
> Will this work ?
>
> Or do I use the keyBy Stream with a broadcast function like this:
>
> *BroadcastStream<Long> broadCastStream = inputStream.broadcast(..);*
> *DataSTream out1Stream = keyBy1.connect(broadCastStream)*
> * .process(new KeyedBroadcastProcessFunction...)*
>
> *DataSTream out2Stream = keyBy2.connect(broadCastStream)*
> * .process(new KeyedBroadcastProcessFunction...)*
>
> Or do I need to use split:
>
> *SplitStream<Long> source = inputStream.split(new MyOutputSelector());*
> *source.select("").flatMap(new Key1Function()).addSink(out1Sink);*
> source.select("").flatMap(new Key2Function()).addSink(out2Sink);
>
>
> static final class MyOutputSelector implements OutputSelector<Long> {
> List<String> outputs = new ArrayList<String>();
> public Iterable<String> select(Long value) {
> outputs.add("");
> return outputs;
> }
> }
> TIA,
>

Reply via email to