Hi Averell, I am also in favor of option 2. Besides, you could use CoProcessFunction instead of CoFlatMapFunction and try to wrap elements of stream_A and stream_B using the `Either` class.
Best, Xingcan > On Aug 15, 2018, at 2:24 PM, vino yang <yanghua1...@gmail.com> wrote: > > Hi Averell, > > As far as these two solutions are concerned, I think you can only choose > option 2, because as you have stated, the current Flink DataStream API does > not support the replacement of one of the input stream types of > CoFlatMapFunction. Another choice: > > 1. Split it into two separate jobs. But in comparison, I still think that > Option 2 is better. > 2. Since you said that stream_c is slower and has fewer updates, if it is not > very large, you can store it in the RDBMS and then join it with stream_a and > stream_b respectively (using CoFlatMapFunction as well). > > I think you should give priority to your option 2. > > Thanks, vino. > > Averell <lvhu...@gmail.com <mailto:lvhu...@gmail.com>> 于2018年8月15日周三 下午1:51写道: > Hi, > > I have stream_A of type "Dog", which needs to be transformed using data from > stream_C of type "Name_Mapping". As stream_C is a slow one (data is not > being updated frequently), to do the transformation I connect two streams, > do a keyBy, and then use a RichCoFlatMapFunction in which mapping data from > stream_C is saved into a State (flatMap1 generates 1 output, while flatMap2 > is just to update State table, not generating any output). > > Now I have another stream B of type "Cat", which also needs to be > transformed using data from stream_C. After that transformation, > transformed_B will go through a completely different pipeline from > transformed A. > > I can see two approaches for this: > 1. duplicate stream_C and the RichCoFlatMapFunction and apply on stream_B > 2. create a new stream D of type "Animal", transform it with C, then split > the result into two streams using split/select using case class pattern > matching. > > My question is which option should I choose? > With option 1, at least I need to maintain two State tables, let alone the > cost for duplicating stream (I am not sure how expensive this is in term of > resource), and the requirement on duplicating the CoFlatMapFunction (*). > With option 2, there's additional cost coming from unioning, > splitting/selecting, and type-casting at the final streams. > Is there any better option for me? > > Thank you very much for your support. > Regards, > Averell > > (*) I am using Scala, and I tried to create a RichCoFlatMapFunction of type > [Animal, Name_Mapping] but it cannot be used for a stream of [Dog, > Name_Mapping] or [Cat, Name_Mapping]. Thus I needed to duplicate the > Function as well. > > > > -- > Sent from: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ > <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/>