Hi Flink Community, I tried to describe my idea about minibatch for TopNFunction in this doc - https://docs.google.com/document/d/1YPHwxKfiGSUOUOa6bc68fIJHO_UojTwZEC29VVEa-Uk/edit?usp=sharing
Looking forward to your feedback, thank you On Tue, 19 Mar 2024 at 12:24, Roman Boyko <ro.v.bo...@gmail.com> wrote: > Hello Flink Community, > > The same problem with record amplification as described in FLIP-415: Introduce > a new join operator to support minibatch[1] exists for most of > implementations of AbstractTopNFunction. Especially when the rank is > provided to output. For example, when calculating Top100 with rank output, > every input record might produce 100 -U records and 100 +U records. > > According to my POC (which is similar to FLIP-415) the record > amplification could be significantly reduced by using input or output > buffer. > > What do you think if we implement such optimization for TopNFunctions? > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-415 > %3A+Introduce+a+new+join+operator+to+support+minibatch > > -- > Best regards, > Roman Boyko > e.: ro.v.bo...@gmail.com > m.: +79059592443 > telegram: @rboyko > -- Best regards, Roman Boyko e.: ro.v.bo...@gmail.com m.: +79059592443 telegram: @rboyko