Re: Support minibatch for TopNFunction

Roman Boyko Sat, 23 Mar 2024 10:13:39 -0700

Hi Flink Community,

I tried to describe my idea about minibatch for TopNFunction in this doc -
https://docs.google.com/document/d/1YPHwxKfiGSUOUOa6bc68fIJHO_UojTwZEC29VVEa-Uk/edit?usp=sharing


Looking forward to your feedback, thank you

On Tue, 19 Mar 2024 at 12:24, Roman Boyko <ro.v.bo...@gmail.com> wrote:

> Hello Flink Community,
>
> The same problem with record amplification as described in FLIP-415: Introduce
> a new join operator to support minibatch[1] exists for most of
> implementations of AbstractTopNFunction. Especially when the rank is
> provided to output. For example, when calculating Top100 with rank output,
> every input record might produce 100 -U records and 100 +U records.
>
> According to my POC (which is similar to FLIP-415) the record
> amplification could be significantly reduced by using input or output
> buffer.
>
> What do you think if we implement such optimization for TopNFunctions?
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-415
> %3A+Introduce+a+new+join+operator+to+support+minibatch
>
> --
> Best regards,
> Roman Boyko
> e.: ro.v.bo...@gmail.com
> m.: +79059592443
> telegram: @rboyko
>


-- 
Best regards,
Roman Boyko
e.: ro.v.bo...@gmail.com
m.: +79059592443
telegram: @rboyko

Re: Support minibatch for TopNFunction

Reply via email to