Hi all, We would like to revive the discussion and provide a quick update on the recent work of the FLIP. We have implemented a POC[1], run cases in the flink-benchmarks[2] against the POC, and verified that many of the operators in the benchmark will enable object-reuse without code changes, while the global object-reuse is disabled.
Please let me know if you have any further comments on the FLIP. If there are no more comments, we will open the voting in 3 days. Best regards, Xuannan [1] https://github.com/apache/flink/pull/22897 [2] https://github.com/apache/flink-benchmarks On Fri, Jul 7, 2023 at 9:18 AM Dong Lin <lindon...@gmail.com> wrote: > > Hi Jing, > > Thank you for the suggestion. Yes, we can extend it to support null if in > the future we find any use-case for this flexibility. > > Best, > Dong > > On Thu, Jul 6, 2023 at 7:55 PM Jing Ge <j...@ververica.com> wrote: > > > Hi Dong, > > > > one scenario I could imagine is that users could enable global object > > reuse features but force deep copy for some user defined specific functions > > because of any limitations. But that is only my gut feeling. And agree, we > > could keep the solution simple for now as FLIP described and upgrade to 3VL > > once there are such real requirements that are rising. > > > > Best regards, > > Jing > > > > On Thu, Jul 6, 2023 at 12:30 PM Dong Lin <lindon...@gmail.com> wrote: > > > >> Hi Jing, > >> > >> Thank you for the detailed explanation. Please see my reply inline. > >> > >> On Thu, Jul 6, 2023 at 3:17 AM Jing Ge <j...@ververica.com> wrote: > >> > >>> Hi Xuannan, Hi Dong, > >>> > >>> Thanks for your clarification. > >>> > >>> @Xuannan > >>> > >>> A Jira ticket has been created for the doc update: > >>> https://issues.apache.org/jira/browse/FLINK-32546 > >>> > >>> @Dong > >>> > >>> I don't have a concrete example. I just thought about it from a > >>> conceptual or pattern's perspective. Since we have 1. coarse-grained > >>> global > >>> switch(CGS as abbreviation), i.e. the pipeline.object-reuse and 2. > >>> fine-grained local switch(FGS as abbreviation), i.e. the > >>> objectReuseCompliant variable for specific operators/functions, there will > >>> be the following patterns with appropriate combinations: > >>> > >>> pattern 1: coarse-grained switch only. Local object reuse will be > >>> controlled by the coarse-grained switch: > >>> 1.1 cgs == true -> local object reused enabled > >>> 1.2 cgs == true -> local object reused enabled > >>> 1.3 cgs == false -> local object reused disabled, i.e. deep copy enabled > >>> 1.4 cgs == false -> local object reused disabled, i.e. deep copy enabled > >>> > >>> afaiu, this is the starting point. I wrote 4 on purpose to make the > >>> regression check easier. We can consider it as the combinations with > >>> cgs(true/false) and fgs(true/false) while fgs is ignored. > >>> > >>> Now we introduce fine-grained switch. There will be two patterns: > >>> > >>> pattern 2: fine-grained switch over coarse-grained switch. > >>> Coarse-grained switch will be ignored when the local fine-grained switch > >>> has different value: > >>> 2.1 cgs == true and fgs == true -> local object reused enabled > >>> 2.2 cgs == true and fgs == false -> local object reused disabled, i.e. > >>> deep copy enabled > >>> 2.3 cgs == false and fgs == true -> local object reused enabled > >>> 2.4 cgs == false and fgs == false -> local object reused disabled, i.e. > >>> deep copy enabled > >>> > >>> cgs is actually ignored. > >>> > >>> Current FLIP is using a slightly different pattern: > >>> > >>> pattern 3: fine-grained switch over coarse-grained switch only when > >>> coarse-grained switch is off, i..e cgs OR fgs: > >>> 3.1 cgs == true and fgs == true -> local object reused enabled > >>> 3.2 cgs == true and fgs == false -> local object reused enabled > >>> 3.3 cgs == false and fgs == true -> local object reused enabled > >>> 3.4 cgs == false and fgs == false -> local object reused disabled, i.e. > >>> deep copy enabled > >>> > >>> All of those patterns are rational and each has different focus. It > >>> depends on the real requirement to choose one of them. > >>> > >>> As we can see, if fgs is using 2VL, there is a regression between > >>> pattern 1 and pattern 2. You are absolutely right in this case. That's why > >>> I suggested 3VL, i.e. fgs will have triple values: true, false, > >>> unknown(e.g. null) > >>> > >>> pattern 4: 3VL fgs with the null as init value (again, there are just > >>> two combination, I made it 4 on purpose): > >>> 4.1 cgs == true and fgs == null -> local object reused enabled > >>> 4.2 cgs == true and fgs == null -> local object reused enabled > >>> 4.3 cgs == false and fgs == null -> local object reused disabled, i.e. > >>> deep copy enabled > >>> 4.4 cgs == false and fgs == null -> local object reused disabled, i.e. > >>> deep copy enabled > >>> > >>> Since the default value of fgs is null, pattern 4 is backward compatible > >>> with pattern 1, which means no regression. > >>> > >>> Now we will set value to fgs and follow the pattern 2: > >>> 4.5 cgs == true and fgs == true -> local object reused enabled > >>> 4.6 cgs == true and fgs == false -> local object reused disabled, i.e. > >>> deep copy enabled > >>> 4.7 cgs == false and fgs == true -> local object reused enabled > >>> 4.8 cgs == false and fgs == false -> local object reused disabled, i.e. > >>> deep copy enabled > >>> > >>> Pattern 4 contains pattern 3 with the following combinations(force > >>> enabling local object reuse): > >>> 4.5 cgs == true and fgs == true -> local object reused enabled > >>> 4.2 cgs == true and fgs == null -> local object reused enabled > >>> 4.7 cgs == false and fgs == true -> local object reused enabled > >>> 4.4 cgs == false and fgs == null -> local object reused disabled, i.e. > >>> deep copy enabled > >>> > >>> Comparing pattern 4 to pattern 3, user will have one additional > >>> flexibility to control(force disabling) the local object reuse capability > >>> because of 3VL, i.e. 4.2+4.6 vs. 3.2. > >>> > >> > >> I think you are suggesting to allow the user setting fgs to null so that > >> "user will have one additional flexibility to control(force disabling) the > >> local object reuse capability". > >> > >> In general, an API that only allows false/true is a bit more complex to > >> implement than an API that allows false/true/null. All things being equal, > >> I believe it is preferred to have a simpler public API. > >> > >> I understand you are coming from a conceptual perspective and trying to > >> make it similar to hierarchical RBAC. However, after thinking through this, > >> I still could not find a use-case where we actually need this flexibility. > >> In particular, for cases where a user has explicitly configured > >> pipeline.object-reuse to true, that means the user already knows (or takes > >> the responsibility of ensuring) that correctness can be achieved, why would > >> the user want to explicitly disable the object-reuse for an operator? > >> > >> Regards, > >> Dong > >> > >> > >>> > >>> It is commonly used in the hierarchical RBAC to enable more fine-grained > >>> access control of sub role. > >>> > >>> I hope I have been able to explain myself clearly. Looking forward to > >>> your feedback. > >>> > >>> Best regards, > >>> Jing > >>> > >>> > >>> > >>> On Wed, Jul 5, 2023 at 12:47 PM Dong Lin <lindon...@gmail.com> wrote: > >>> > >>>> Hi Jing, > >>>> > >>>> Thanks for the comments! Please find below my comments, which are based > >>>> on the offline discussion with Xuannan. > >>>> > >>>> On Wed, Jul 5, 2023 at 1:36 AM Jing Ge <j...@ververica.com> wrote: > >>>> > >>>>> Hi Xuannan, Hi Dong > >>>>> > >>>>> Thanks for the Proposal! After reading the FLIP, I'd like to ask some > >>>>> questions: > >>>>> > >>>>> 1. Naming convention for boolean variables. It is recommended to > >>>>> follow JavaBean [1], i.e. objectReuseCompliant as the variable name with > >>>>> isObjectReuseCompliant() and setObjectReuseCompliant() as the methods' > >>>>> name. > >>>>> > >>>>> > >>>> Good point. We have updated the FLIP as suggested. > >>>> > >>>> > >>>>> > >>>>> 2. > >>>>> > >>>>> - > >>>>> > >>>>> *If pipeline.object-reuse is set to true, records emitted by this > >>>>> operator will be re-used.* > >>>>> - > >>>>> > >>>>> *Otherwise, if getIsObjectReuseCompliant() returns true, records > >>>>> emitted by this operator will be re-used.* > >>>>> - > >>>>> > >>>>> *Otherwise, records emitted by this operator will be deep-copied > >>>>> before being given to the next operator in the chain.* > >>>>> > >>>>> > >>>>> If I understand you correctly, the hard coding objectReusedCompliant > >>>>> should have higher priority over the configuration, the checking logic > >>>>> should be: > >>>>> > >>>>> - > >>>>> > >>>>> *If getIsObjectReuseCompliant() returns true, records emitted by > >>>>> this operator will be re-used.* > >>>>> - > >>>>> > >>>>> *Otherwise, if pipeline.object-reuse is set to true, records > >>>>> emitted by this operator will be re-used.* > >>>>> - > >>>>> > >>>>> *Otherwise, records emitted by this operator will be deep-copied > >>>>> before being given to the next operator in the chain.* > >>>>> > >>>>> > >>>>> The results are the same but the checking logics are different, but > >>>>> there are some additional thoughts, which lead us to the next question. > >>>>> > >>>> > >>>>> > >>>> > >>>>> 3. Current design lets specific operators enable object reuse and > >>>>> ignore the global config. There could be another thought, on the > >>>>> contrary: > >>>>> if an operator has hard coded the objectReuseCompliant as false, i.e. > >>>>> disable object reuse on purpose, records should not be reused even if > >>>>> the > >>>>> global config pipeline.object-reused is set to true, which turns out > >>>>> that > >>>>> the objectReuseCompliant could be a triple value logic: ture: force > >>>>> object > >>>>> reusing; false: force deep-copying; unknown: depends on > >>>>> pipeline.object-reuse config. > >>>>> > >>>> > >>>> With the current proposal, if pipeline.object-reused == true and > >>>> operatorA's objectReuseCompliant == false, Flink will not deep-copy > >>>> operatorA's output. I think you are suggesting changing the behavior such > >>>> that Flink should deep-copy the operatorA's output. > >>>> > >>>> Could you explain what is the advantage of this approach compared to > >>>> the approach described in the FLIP? > >>>> > >>>> My concern with this approach is that it can cause performance > >>>> regression. This is an operator's objectReuseCompliant will be false by > >>>> default unless it is explicitly overridden. For those jobs which are > >>>> currently configured with pipeline.object-reused = true, these jobs will > >>>> likely start to have lower performance (due to object deep-copy) after > >>>> upgrading to the newer Flink version. > >>>> > >>>> Best, > >>>> Dong > >>>> > >>>> > >>>>> > >>>>> Best regards, > >>>>> Jing > >>>>> > >>>>> > >>>>> [1] https://en.wikipedia.org/wiki/JavaBeans > >>>>> > >>>>> On Mon, Jul 3, 2023 at 4:25 AM Xuannan Su <suxuanna...@gmail.com> > >>>>> wrote: > >>>>> > >>>>>> Hi all, > >>>>>> > >>>>>> Dong(cc'ed) and I are opening this thread to discuss our proposal to > >>>>>> add operator attribute to allow operator to specify support for > >>>>>> object-reuse [1]. > >>>>>> > >>>>>> Currently, the default configuration for pipeline.object-reuse is set > >>>>>> to false to avoid data corruption, which can result in suboptimal > >>>>>> performance. We propose adding APIs that operators can utilize to > >>>>>> inform the Flink runtime whether it is safe to reuse the emitted > >>>>>> records. This enhancement would enable Flink to maximize its > >>>>>> performance using the default configuration. > >>>>>> > >>>>>> Please refer to the FLIP document for more details about the proposed > >>>>>> design and implementation. We welcome any feedback and opinions on > >>>>>> this proposal. > >>>>>> > >>>>>> Best regards, > >>>>>> > >>>>>> Dong and Xuannan > >>>>>> > >>>>>> [1] > >>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=255073749 > >>>>>> > >>>>>