Hi all,
Thanks for all the feedback and comments.
Based on @Aljoscha Krettek 's suggestion, I have
created a new FLIP: "FLIP-44: Support Local Aggregation in Flink" and
started a new mail thread for it in the dev mailing list.
So any further feedback and discussion can be moved to the new threa
Hi Aljoscha,
I am happy to create a FLIP and have a voting process for this feature. I
have already sent a mail to apply for the wiki permissions.
Once I get the permission, will start the next step. When it is ready, I
will ping you again.
Best,
Vino
Aljoscha Krettek 于2019年6月11日周二 下午10:57写道:
Hi,
I think this proposed change is big enough to warrant a FLIP [1], which should
have a voting process as described in that link before the FLIP is accepted.
I’m writing this because such a bigger change has the possibility of
languishing for a long time due to lack of PMC/committer bandwidth
Hi all,
Thanks for all the feedback and comments.
Since the thread of this feature has been presented about one week in the
dev mailing list and has got much support from the community, I have
created a new JIRA feature issue[1] to track it and I will split subtasks
soon.
We can move further dis
+ 1 for this feature which is great helpful in product situations and look
forward to see it as soon as possible.
> 在 2019年6月6日,下午4:58,qianjin Xu 写道:
>
>>
>> hi
>
> +1 nice work
>
> best
>
> forwardxu
>
> hi
+1 nice work
best
forwardxu
hi
+1,nice work
best
forwardxu
boshu Zheng 于2019年6月6日周四 下午4:30写道:
> Hi,
>
>
> +1 from my side. Looking forward to this must-have feature :)
>
>
> Best,
> boshu
>
> At 2019-06-05 16:33:13, "vino yang" wrote:
> >Hi Aljoscha,
> >
> >What do you think about this feature and design document?
> >
Hi,
+1 from my side. Looking forward to this must-have feature :)
Best,
boshu
At 2019-06-05 16:33:13, "vino yang" wrote:
>Hi Aljoscha,
>
>What do you think about this feature and design document?
>
>Best,
>Vino
>
>vino yang 于2019年6月5日周三 下午4:18写道:
>
>> Hi Dian,
>>
>> I still think your imp
Hi Aljoscha,
What do you think about this feature and design document?
Best,
Vino
vino yang 于2019年6月5日周三 下午4:18写道:
> Hi Dian,
>
> I still think your implementation is similar to the window operator, you
> mentioned the scalable trigger mechanism, the window API also can customize
> trigger.
>
Hi Dian,
I still think your implementation is similar to the window operator, you
mentioned the scalable trigger mechanism, the window API also can customize
trigger.
Moreover, IMO, the design should guarantee a deterministic semantics, I
think based on memory availability is a non-deterministic
Hi Vino,
Thanks a lot for your reply.
> 1) When, Why and How to judge the memory is exhausted?
My point here is that the local aggregate operator can buffer the inputs in
memory and send out the results AT ANY TIME. i.e. element count or the time
interval reached a pre-configured value, the me
Hi Vino,
+1 for this feature. It's useful for data skew. And it could also reduce
shuffled datum.
I have some concerns about the API part. From my side, this feature should
be more like an improvement. I'm afraid the proposal is an overkill about
the API part. Many other systems support pre-aggre
Hi Litree,
>From an implementation level, the localKeyBy API returns a general
KeyedStream, you can call all the APIs which KeyedStream provides, we did
not restrict its usage, although we can do this (for example returns a new
stream object named LocalKeyedStream).
However, to achieve the goal o
Hi Dian,
The different opinion is fine for me, If there is a better solution or
there are obvious deficiencies in our design, we are very happy to accept
and improve it.
I agree with you that customized local aggregate operator is more scalable
in the way of the trigger mechanism. However, I have
Hi Vino,
I have read your design,something I want to know is the usage of these new
APIs.It looks like when I use localByKey,i must then use a window operator to
return a datastream,and then use keyby and another window operator to get the
final result?
thanks,
Litree
On 06/04/2019 17:22,
Hi Vino,
It may seem similar to window operator but there are also a few key
differences. For example, the local aggregate operator can send out the
results at any time and the window operator can only send out the results
at the end of window (without early fire). This means that the local
aggreg
Hi Dian,
Thanks for your reply.
I know what you mean. However, if you think deeply, you will find your
implementation need to provide an operator which looks like a window
operator. You need to use state and receive aggregation function and
specify the trigger time. It looks like a lightweight wi
Hi Vino,
> So if users want to use local aggregation, they should call the window API
> to build a local window that means users should (or say "can") specify the
> window length and other information based on their needs.
It sounds ok for me. It would have to be run against some API guys from th
Hi Vino,
Thanks a lot for starting this discussion. +1 to this feature as I think it
will be very useful.
Regarding to using window to buffer the input elements, personally I don't
think it's a good solution for the following reasons:
1) As we know that WindowOperator will store the accumulated
Hi Ken,
Thanks for your reply.
As I said before, we try to reuse Flink's state concept (fault tolerance
and guarantee "Exactly-Once" semantics). So we did not consider cache.
In addition, if we use Flink's state, the OOM related issue is not a key
problem we need to consider.
Best,
Vino
Ken Kr
Hi all,
Cascading implemented this “map-side reduce” functionality with an LLR cache.
That worked well, as then the skewed keys would always be in the cache.
The API let you decide the size of the cache, in terms of number of entries.
Having a memory limit would have been better for many of our
Hi Piotr,
The localKeyBy API returns an instance of KeyedStream (we just added an
inner flag to identify the local mode) which is Flink has provided before.
Users can call all the APIs(especially *window* APIs) which KeyedStream
provided.
So if users want to use local aggregation, they should cal
Hi,
+1 for the idea from my side. I’ve even attempted to add similar feature quite
some time ago, but didn’t get enough traction [1].
I’ve read through your document and I couldn’t find it mentioning anywhere,
when the pre aggregated result should be emitted down the stream? I think
that’s one
Excited and Big +1 for this feature.
SHI Xiaogang 于2019年6月3日周一 下午3:37写道:
> Nice feature.
> Looking forward to having it in Flink.
>
> Regards,
> Xiaogang
>
> vino yang 于2019年6月3日周一 下午3:31写道:
>
> > Hi all,
> >
> > As we mentioned in some conference, such as Flink Forward SF 2019 and
> QCon
> >
Nice feature.
Looking forward to having it in Flink.
Regards,
Xiaogang
vino yang 于2019年6月3日周一 下午3:31写道:
> Hi all,
>
> As we mentioned in some conference, such as Flink Forward SF 2019 and QCon
> Beijing 2019, our team has implemented "Local aggregation" in our inner
> Flink fork. This feature c
Hi all,
As we mentioned in some conference, such as Flink Forward SF 2019 and QCon
Beijing 2019, our team has implemented "Local aggregation" in our inner
Flink fork. This feature can effectively alleviate data skew.
Currently, keyed streams are widely used to perform aggregating operations
(e.g.
26 matches
Mail list logo