Support minibatch for TopNFunction

Roman Boyko Mon, 18 Mar 2024 22:24:45 -0700

Hello Flink Community,

The same problem with record amplification as described in FLIP-415: Introduce
a new join operator to support minibatch[1] exists for most of
implementations of AbstractTopNFunction. Especially when the rank is
provided to output. For example, when calculating Top100 with rank output,
every input record might produce 100 -U records and 100 +U records.


According to my POC (which is similar to FLIP-415) the record amplification
could be significantly reduced by using input or output buffer.

What do you think if we implement such optimization for TopNFunctions?

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-415
%3A+Introduce+a+new+join+operator+to+support+minibatch

-- 
Best regards,
Roman Boyko
e.: ro.v.bo...@gmail.com
m.: +79059592443
telegram: @rboyko

Support minibatch for TopNFunction

Reply via email to