Thanks Gyula
Cheers

On Fri, Jul 3, 2015 at 6:19 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:

> Yes, you can think of it that way. Each operator has parallel instances,
> and each parallel instance receives input from multiple channels (FIFO from
> each) and produces output.
>
> Welly Tambunan <if05...@gmail.com> wrote (on Fri, Jul 3, 2015, 13:02):
>
>> Hi Gyula,
>>
>> Thanks a lot. That's enough for my case.
>>
>> I really love the Flink Streaming model compared to Spark Streaming.
>>
>> So is it true that I can think of an operator as an actor (as in the
>> actor model) in this system? Is that the right way to put it?
>>
>> Cheers
>>
>> On Fri, Jul 3, 2015 at 5:29 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>>> Hey,
>>>
>>> 1. Yes, if you use partitionBy, the same key will always go to the same
>>> downstream operator instance.
>>>
>>> 2. There is only a partial ordering guarantee, meaning that data received
>>> from one input is FIFO. This means that if the same key is coming from
>>> multiple inputs, then there is no ordering guarantee across them, only
>>> within one input.
>>>
>>> Gyula
>>>
>>> Welly Tambunan <if05...@gmail.com> wrote (on Fri, Jul 3, 2015, 11:51):
>>>
>>>> Hi Gyula,
>>>>
>>>> Thanks for your response.
>>>>
>>>> So if I use partitionBy, data points with the same key will be received
>>>> by exactly the same instance of the operator?
>>>>
>>>> Another question: if I execute a reduce() operator after partitionBy,
>>>> will that reduce operator guarantee ordering within the same key?
>>>>
>>>> Cheers
>>>>
>>>> On Fri, Jul 3, 2015 at 4:14 PM, Gyula Fóra <gyula.f...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey!
>>>>>
>>>>> Both groupBy and partitionBy will trigger a shuffle over the network
>>>>> based on some key, ensuring that elements with the same key end up on
>>>>> the same downstream processing operator.
>>>>>
>>>>> The difference between the two is that groupBy, in addition to this,
>>>>> returns a GroupedDataStream, which lets you execute some special
>>>>> operations, such as key-based rolling aggregates.
>>>>>
>>>>> PartitionBy is useful when you are using simple operators but still
>>>>> want to control the messages received by parallel instances (in a
>>>>> mapper, for example).
>>>>>
>>>>> Cheers,
>>>>> Gyula
>>>>>
>>>>> tambunanw <if05...@gmail.com> wrote (on Fri, Jul 3, 2015, 10:32):
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'm trying to understand the difference between these two. In my
>>>>>> experience with Spark, GroupBy causes shuffling over the network. Is
>>>>>> that also the case in Flink?
>>>>>>
>>>>>> I've watched videos and read a couple of docs about Flink which say
>>>>>> that Flink compiles the user code into its own optimized graph
>>>>>> structure, so I think the Flink engine will take care of this?
>>>>>>
>>>>>> From the docs on partitioning:
>>>>>>
>>>>>> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#partitioning
>>>>>>
>>>>>> Is it true that GroupBy is more advanced than PartitionBy? Can someone
>>>>>> elaborate?
>>>>>>
>>>>>> This is really confusing for me, coming from the Spark world. Any
>>>>>> help would be really appreciated.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-Streaming-PartitionBy-vs-GroupBy-differences-tp1927.html
>>>>>> Sent from the Apache Flink User Mailing List archive at Nabble.com.
--
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>
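
The routing behaviour discussed in this thread can be sketched in plain Java, without Flink on the classpath. This is only an illustration under assumptions: the class KeyRoutingSketch, its Event type, and the methods instanceFor, partitionByKey, and rollingSumPerKey are hypothetical names, and hashing the key modulo the parallelism is an assumed stand-in for whatever partitioner Flink actually uses. It shows that records with the same key from a single input land on the same downstream instance in arrival order, and what a key-based rolling aggregate (the kind of operation groupBy is said to enable) computes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of key-based routing. This is NOT Flink code;
// every name below is hypothetical and chosen for illustration only.
final class KeyRoutingSketch {

    // A keyed record travelling through the simulated stream.
    static final class Event {
        final String key;
        final int value;

        Event(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Assumed partitioner: hash of the key modulo the parallelism.
    // floorMod keeps the index non-negative for negative hash codes.
    static int instanceFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Routes the records of ONE input channel to downstream instances.
    // Each instance sees its records in the order they arrived (FIFO).
    static List<List<Event>> partitionByKey(List<Event> input, int parallelism) {
        List<List<Event>> instances = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            instances.add(new ArrayList<>());
        }
        for (Event e : input) {
            instances.get(instanceFor(e.key, parallelism)).add(e);
        }
        return instances;
    }

    // Key-based rolling sum: after each record, emit the running total
    // for that record's key.
    static List<Integer> rollingSumPerKey(List<Event> input) {
        Map<String, Integer> totals = new HashMap<>();
        List<Integer> out = new ArrayList<>();
        for (Event e : input) {
            out.add(totals.merge(e.key, e.value, Integer::sum));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> input = new ArrayList<>();
        input.add(new Event("a", 1));
        input.add(new Event("b", 10));
        input.add(new Event("a", 2));

        List<List<Event>> instances = partitionByKey(input, 4);
        for (int i = 0; i < instances.size(); i++) {
            for (Event e : instances.get(i)) {
                System.out.println("instance " + i + " <- " + e.key + "=" + e.value);
            }
        }
        System.out.println("rolling sums: " + rollingSumPerKey(input));
    }
}
```

The sketch models a single input channel only. With several upstream channels feeding the same instance, the interleaving between channels is not fixed, which matches the partial ordering guarantee described in the thread.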