Re: Flink Streaming : PartitionBy vs GroupBy differences

Welly Tambunan Fri, 03 Jul 2015 04:28:15 -0700

Thanks Gyula


Cheers

On Fri, Jul 3, 2015 at 6:19 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:

> Yes, you can think of it that way. Each Operator has parallel instances
> and each parallel instance receives input from multiple channels (FIFO from
> each) and produces output.
>
> On Fri, Jul 3, 2015 at 1:02 PM, Welly Tambunan <if05...@gmail.com> wrote:
>
>> Hi Gyula,
>>
>> Thanks a lot. That's enough for my case.
>>
>> I really love the Flink Streaming model compared to Spark Streaming.
>>
>> So is it true that I can think of an Operator as an Actor in this
>> system? Is that the right way to put it?
>>
>>
>>
>> Cheers
>>
>> On Fri, Jul 3, 2015 at 5:29 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>>> Hey,
>>>
>>> 1.
>>> Yes, if you use partitionBy the same key will always go to the same
>>> downstream operator instance.
>>>
>>> 2.
>>> There is only a partial ordering guarantee, meaning that data received
>>> from one input is FIFO. This means that if the same key is coming from
>>> multiple inputs, then there is no ordering guarantee across them, only
>>> inside one input.
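
The partial ordering described in point 2 can be illustrated with a small simulation. This is only a sketch in plain Python, not Flink code: `merge_channels` and `is_subsequence` are hypothetical helpers, and the random choice of channel merely stands in for whatever arrival order a real downstream instance happens to see.

```python
import random

def merge_channels(channels, seed):
    """Interleave several FIFO channels into one arrival sequence.

    At each step a random non-empty channel is picked and its head
    element is taken, so order inside a channel is never changed.
    """
    rng = random.Random(seed)
    queues = [list(channel) for channel in channels]
    arrivals = []
    while any(queues):
        nonempty = [q for q in queues if q]
        chosen = rng.choice(nonempty)
        arrivals.append(chosen.pop(0))
    return arrivals

def is_subsequence(sub, seq):
    """True if the items of `sub` appear in `seq` in the same relative order."""
    it = iter(seq)
    return all(item in it for item in sub)

# Two upstream inputs that both emit records for the same key "k".
channel_a = [("k", "A1"), ("k", "A2"), ("k", "A3")]
channel_b = [("k", "B1"), ("k", "B2")]

arrivals = merge_channels([channel_a, channel_b], seed=1)
print(arrivals)
print(is_subsequence(channel_a, arrivals), is_subsequence(channel_b, arrivals))
```

Whatever the seed, each channel's records keep their relative order in `arrivals`, while the position of the A records relative to the B records varies from run to run.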
>>>
>>> Gyula
>>>
>>> On Fri, Jul 3, 2015 at 11:51 AM, Welly Tambunan <if05...@gmail.com> wrote:
>>>
>>>> Hi Gyula,
>>>>
>>>> Thanks for your response.
>>>>
>>>> So if I use partitionBy, then data points with the same key will be
>>>> received by exactly the same operator instance?
>>>>
>>>>
>>>> Another question: if I execute a reduce() operator after
>>>> partitionBy, will that reduce operator guarantee ordering within the same
>>>> key?
>>>>
>>>>
>>>> Cheers
>>>>
>>>> On Fri, Jul 3, 2015 at 4:14 PM, Gyula Fóra <gyula.f...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey!
>>>>>
>>>>> Both groupBy and partitionBy will trigger a shuffle over the network
>>>>> based on some key, assuring that elements with the same keys end up on the
>>>>> same downstream processing operator.
>>>>>
>>>>> The difference between the two is that groupBy, in addition to this,
>>>>> returns a GroupedDataStream, which lets you execute some special
>>>>> operations, such as key-based rolling aggregates.
>>>>>
>>>>> PartitionBy is useful when you are using simple operators but still
>>>>> want to control the messages received by parallel instances (in a mapper
>>>>> for example).
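
The distinction drawn above can be sketched in plain Python. This is not the Flink API: `partition_by` and `rolling_sum_by_key` are hypothetical names, and the CRC32-modulo routing is an assumption chosen only to show that equal keys land on the same parallel instance. How Flink actually maps keys to instances is not described in this thread.

```python
import zlib
from collections import defaultdict

def partition_by(records, key_fn, parallelism):
    """Route each record to one of `parallelism` instances by a hash of its key."""
    instances = defaultdict(list)
    for record in records:
        key_bytes = str(key_fn(record)).encode("utf-8")
        # Assumption: a simple stable hash modulo the parallelism.
        index = zlib.crc32(key_bytes) % parallelism
        instances[index].append(record)
    return dict(instances)

def rolling_sum_by_key(records, key_fn, value_fn):
    """Keyed rolling aggregate: emit the running sum for a key after each record."""
    totals = {}
    output = []
    for record in records:
        key = key_fn(record)
        totals[key] = totals.get(key, 0) + value_fn(record)
        output.append((key, totals[key]))
    return output

events = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]

# Partitioning only decides which instance sees which record.
routed = partition_by(events, key_fn=lambda e: e[0], parallelism=2)

# Grouping additionally enables per-key operations such as a rolling sum.
sums = rolling_sum_by_key(events, key_fn=lambda e: e[0], value_fn=lambda e: e[1])
print(routed)
print(sums)  # [('a', 1), ('b', 2), ('a', 4), ('c', 4), ('b', 7)]
```

In the first call every record with a given key ends up in exactly one bucket, which is all that partitioning promises. The second function is the kind of key-aware operation that a grouped stream makes available on top of that routing.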
>>>>>
>>>>> Cheers,
>>>>> Gyula
>>>>>
>>>>> On Fri, Jul 3, 2015 at 10:32 AM, tambunanw <if05...@gmail.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I'm trying to digest the difference between these two. From my
>>>>>> experience in Spark, GroupBy will cause shuffling on the network. Is
>>>>>> that also the case in Flink?
>>>>>>
>>>>>> I've watched videos and read a couple of docs about Flink saying that
>>>>>> Flink compiles the user code into its own optimized graph structure,
>>>>>> so I think the Flink engine will take care of this?
>>>>>>
>>>>>> From the docs for Partitioning
>>>>>>
>>>>>>
>>>>>> http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#partitioning
>>>>>>
>>>>>> Is it true that GroupBy is more advanced than PartitionBy? Can
>>>>>> someone elaborate?
>>>>>>
>>>>>> This one is really confusing for me, coming from the Spark world.
>>>>>> Any help would be really appreciated.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Welly Tambunan
>>>> Triplelands
>>>>
>>>> http://weltam.wordpress.com
>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>
>>>
>>
>>
>>
>


