Hi Gyula, I have a couple of operator on the pipeline. Filter, mapper, flatMap, and each of these operator contains some cache data.
So i think that means for every other operator on the pipeline, i will need to add a new stream to update each cache data. Cheers On Thu, Aug 20, 2015 at 5:33 PM, Gyula Fóra <gyula.f...@gmail.com> wrote: > Hi, > > I don't think I fully understand your question, could you please try to be > a little more specific? > > I assume by caching you mean that you keep the current model as an > operator state. Why would you need to add new streams in this case? > > I might be slow to answer as I am currently on vacation without stable > internet connection. > > Cheers, > Gyula > > On Thu, Aug 20, 2015 at 5:36 AM Welly Tambunan <if05...@gmail.com> wrote: > >> Hi Gyula, >> >> I have another question. So if i cache something on the operator, to keep >> it up to date, i will always need to add and connect another stream of >> changes to the operator ? >> >> Is this right for every case ? >> >> Cheers >> >> On Wed, Aug 19, 2015 at 3:21 PM, Welly Tambunan <if05...@gmail.com> >> wrote: >> >>> Hi Gyula, >>> >>> That's really helpful. The docs is improving so much since the last time >>> (0.9). >>> >>> Thanks a lot ! >>> >>> Cheers >>> >>> On Wed, Aug 19, 2015 at 3:07 PM, Gyula Fóra <gyula.f...@gmail.com> >>> wrote: >>> >>>> Hey, >>>> >>>> If it is always better to check the events against a more up-to-date >>>> model (even if the events we are checking arrived before the update) then >>>> it is fine to keep the model outside of the system. >>>> >>>> In this case we need to make sure that we can push the updates to the >>>> external system consistently. If you are using the PersistentKafkaSource >>>> for instance it can happen that some messages are replayed in case of >>>> failure. In this case you need to make sure that you remove duplicate >>>> updates or have idempotent updates. >>>> >>>> You can read about the checkpoint mechanism in the Flink website: >>>> https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html >>>> >>>> Cheers, >>>> Gyula >>>> >>>> On Wed, Aug 19, 2015 at 9:56 AM Welly Tambunan <if05...@gmail.com> >>>> wrote: >>>> >>>>> Thanks Gyula, >>>>> >>>>> Another question i have.. >>>>> >>>>> > ... while external model updates would be *tricky *to keep >>>>> consistent. >>>>> Is that still the case if the Operator treat the external model as >>>>> read-only ? We create another stream that will update the external model >>>>> separately. >>>>> >>>>> Could you please elaborate more about this one ? >>>>> >>>>> Cheers >>>>> >>>>> On Wed, Aug 19, 2015 at 2:52 PM, Gyula Fóra <gyula.f...@gmail.com> >>>>> wrote: >>>>> >>>>>> In that case I would apply a map to wrap in some common type, like a >>>>>> n Either<t1,t2> before the union. >>>>>> >>>>>> And then in the coflatmap you can unwrap it. >>>>>> On Wed, Aug 19, 2015 at 9:50 AM Welly Tambunan <if05...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Hi Gyula, >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> However update1 and update2 have a different type. Based on my >>>>>>> understanding, i don't think we can use union. How can we handle this >>>>>>> one ? >>>>>>> >>>>>>> We like to create our event strongly type to get the domain language >>>>>>> captured. >>>>>>> >>>>>>> >>>>>>> Cheers >>>>>>> >>>>>>> On Wed, Aug 19, 2015 at 2:37 PM, Gyula Fóra <gyula.f...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hey, >>>>>>>> >>>>>>>> One input of your co-flatmap would be model updates and the other >>>>>>>> input would be events to check against the model if I understand >>>>>>>> correctly. >>>>>>>> >>>>>>>> This means that if your model updates come from more than one >>>>>>>> stream you need to union them into a single stream before connecting >>>>>>>> them >>>>>>>> with the event stream and applying the coatmap. >>>>>>>> >>>>>>>> DataStream updates1 = .... >>>>>>>> DataStream updates2 = .... >>>>>>>> DataStream events = .... >>>>>>>> >>>>>>>> events.connect(updates1.union(updates2).broadcast()).flatMap(...) >>>>>>>> >>>>>>>> Does this answer your question? >>>>>>>> >>>>>>>> Gyula >>>>>>>> >>>>>>>> >>>>>>>> On Wednesday, August 19, 2015, Welly Tambunan <if05...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Gyula, >>>>>>>>> >>>>>>>>> Thanks for your response. >>>>>>>>> >>>>>>>>> However the model can received multiple event for update. How can >>>>>>>>> we do that with co-flatmap as i can see the connect API only received >>>>>>>>> single datastream ? >>>>>>>>> >>>>>>>>> >>>>>>>>> > ... while external model updates would be tricky to keep >>>>>>>>> consistent. >>>>>>>>> Is that still the case if the Operator treat the external model as >>>>>>>>> read-only ? We create another stream that will update the external >>>>>>>>> model >>>>>>>>> separately. >>>>>>>>> >>>>>>>>> Cheers >>>>>>>>> >>>>>>>>> On Wed, Aug 19, 2015 at 2:05 PM, Gyula Fóra <gyf...@apache.org> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hey! >>>>>>>>>> >>>>>>>>>> I think it is safe to say that the best approach in this case is >>>>>>>>>> creating a co-flatmap that will receive updates on one input. The >>>>>>>>>> events >>>>>>>>>> should probably be broadcasted in this case so you can check in >>>>>>>>>> parallel. >>>>>>>>>> >>>>>>>>>> This approach can be used effectively with Flink's checkpoint >>>>>>>>>> mechanism, while external model updates would be tricky to keep >>>>>>>>>> consistent. >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Gyula >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Aug 19, 2015 at 8:44 AM Welly Tambunan <if05...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi All, >>>>>>>>>>> >>>>>>>>>>> We have a streaming computation that required to validate the >>>>>>>>>>> data stream against the model provided by the user. >>>>>>>>>>> >>>>>>>>>>> Right now what I have done is to load the model into flink >>>>>>>>>>> operator and then validate against it. However the model can be >>>>>>>>>>> updated and >>>>>>>>>>> changed frequently. Fortunately we always publish this event to >>>>>>>>>>> RabbitMQ. >>>>>>>>>>> >>>>>>>>>>> I think we can >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 1. Create RabbitMq listener for model changed event from >>>>>>>>>>> inside the operator, then update the model if event arrived. >>>>>>>>>>> >>>>>>>>>>> But i think this will create race condition if not handle >>>>>>>>>>> correctly and it seems odd to keep this >>>>>>>>>>> >>>>>>>>>>> 2. We can move the model into external in external memory >>>>>>>>>>> cache storage and keep the model up to date using flink. So the >>>>>>>>>>> operator >>>>>>>>>>> will retrieve that from memory cache >>>>>>>>>>> >>>>>>>>>>> 3. Create two stream and using co operator for managing the >>>>>>>>>>> shared state. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> What is your suggestion on keeping the state up to date from >>>>>>>>>>> external event ? Is there some kind of best practice for >>>>>>>>>>> maintaining model >>>>>>>>>>> up to date on streaming operator ? >>>>>>>>>>> >>>>>>>>>>> Thanks a lot >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Cheers >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Welly Tambunan >>>>>>>>>>> Triplelands >>>>>>>>>>> >>>>>>>>>>> http://weltam.wordpress.com >>>>>>>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Welly Tambunan >>>>>>>>> Triplelands >>>>>>>>> >>>>>>>>> http://weltam.wordpress.com >>>>>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Welly Tambunan >>>>>>> Triplelands >>>>>>> >>>>>>> http://weltam.wordpress.com >>>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/> >>>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Welly Tambunan >>>>> Triplelands >>>>> >>>>> http://weltam.wordpress.com >>>>> http://www.triplelands.com <http://www.triplelands.com/blog/> >>>>> >>>> >>> >>> >>> -- >>> Welly Tambunan >>> Triplelands >>> >>> http://weltam.wordpress.com >>> http://www.triplelands.com <http://www.triplelands.com/blog/> >>> >> >> >> >> -- >> Welly Tambunan >> Triplelands >> >> http://weltam.wordpress.com >> http://www.triplelands.com <http://www.triplelands.com/blog/> >> > -- Welly Tambunan Triplelands http://weltam.wordpress.com http://www.triplelands.com <http://www.triplelands.com/blog/>