Hi Baodi, Thanks for the reply and update of the PIP.
1. Pulsar Functions currently isn't integrated with the Transaction feature yet, so there's no EXACTLY_ONCE support. 2. And Yes, "EFFECTIVELY_ONCE = ATLEAST_ONCE + Message Deduplication" On Tue, May 31, 2022 at 9:16 AM 石宝迪 <wudixiaolong...@icloud.com.invalid> wrote: > >> If you fail to start the function, you immediately break people's > > functions when they upgrade to this version. How about notifying them > once > > via logger (WARN)? > > > I tend to fail. Although this breaks the current logic. but the current > implementation can be considered is a bug. > > > It will flood their logs if they used it wrong. Maybe write to log once? > > > Agree, I changed PIP. > > Thanks, > Baodi Shi > > > 2022年5月31日 23:5720,Asaf Mesika <asaf.mes...@gmail.com> 写道: > > > > Hi Baodi, > > > > Regarding > > > >> > >> 1. When the delivery semantic is ATMOST_ONCE, add verify autoAck must > >> be true. If the validation fails, let the function fail to start (This > >> temporarily resolves the configuration ambiguity). When autoAck is > >> subsequently removed, the message will be acked immediately after > receiving > >> the message. > >> > >> > >> If you fail to start the function, you immediately break people's > > functions when they upgrade to this version. How about notifying them > once > > via logger (WARN)? > > > > Regarding > > > >> > >> 1. > >> > >> > >> When user call record.ack() in function, just ProcessingGuarantees == > >> MANUAL can be work. In turn, when ProcessingGuarantees != MANUAL, user > >> call record.ack() is invalid(print warn log). > >> > >> It will flood their logs if they used it wrong. Maybe write to log once? > > > > On Tue, May 31, 2022 at 12:24 PM Baozi <wudixiaolong...@icloud.com > .invalid> > > wrote: > > > >> Hi, Asaf. > >> > >> Thanks review. > >> > >>>> I'm not entirely sure that is accurate. The Effectively-Once as I > >>> understand it is achieved using transactions, thus the consumption of > >> that > >>> message and production of any messages, as a result, are considered one > >>> atomic unit - either message acknowledged and messages produced or > >> neither. > >> > >> > >> Not using transactions now, I understand: EFFECTIVELY_ONCE = > ATLEAST_ONCE > >> + Message Deduplication. > >> > >> @Neng Lu @Rui Fu Can help make sure? > >> > >>>> I would issue a WARN when reading configuring the function (thus > emitted > >>> once) when the user actively configured autoAck=false and warn them > that > >>> this configuration is deprecated and they should switch to the MANUAL > >>> ProcessingGuarantee configuration option. > >> > >> > >> Added to API Change(2) > >> > >>>> suggest you clarify what existing behavior remains for backward > >>> compatibility with the appropriate comment that this is deprecated and > >> will > >>> be removed. > >> > >> Yes, I have rewritten it, please see Implementation(1) > >> > >>> 5. Regarding Test Plan > >>> * I would add: Validate the test of autoAck=false still works (you > >> haven't > >>> broken anything) > >>> * I would add: Validate existing ProcessingGuarantee test for > AtMostOnce, > >>> AtLeastOnce, ExactlyOnce still works (when autoAck=true) > >> > >> > >> Nice, I added to PIP. > >> > >> > >> Thanks, > >> Baodi Shi > >> > >>> 2022年5月30日 22:0011,Asaf Mesika <asaf.mes...@gmail.com> 写道: > >>> > >>> Thanks for applying the fixes. > >>> > >>> 1. Regarding > >>> > >>>> > >>>> - EFFECTIVELY_ONCE: The message is acknowledged *after* the function > >>>> finished execution. Depends on pulsar deduplication, and provides > >>>> end-to-end effectively once processing. > >>>> > >>>> I'm not entirely sure that is accurate. The Effectively-Once as I > >>> understand it is achieved using transactions, thus the consumption of > >> that > >>> message and production of any messages, as a result, are considered one > >>> atomic unit - either message acknowledged and messages produced or > >> neither. > >>> > >>> 2. Regarding > >>> > >>>> > >>>> 1. Indication of autoAck is deprecated, and not use it in the code. > >>>> (and also Function.proto) > >>>> > >>>> * I would issue a WARN when reading configuring the function (thus > >> emitted > >>> once) when the user actively configured autoAck=false and warn them > that > >>> this configuration is deprecated and they should switch to the MANUAL > >>> ProcessingGuarantee configuration option. > >>> > >>> 3. Regarding > >>> > >>>> > >>>> 1. When the delivery semantic is ATMOST_ONCE, the message will be > >>>> acked immediately after receiving the message, no longer affected by > >> the > >>>> autoAck configuration. > >>>> > >>>> I suggest you clarify what existing behavior remains for backward > >>> compatibility with the appropriate comment that this is deprecated and > >> will > >>> be removed. > >>> > >>> 4. Regarding > >>> > >>>> > >>>> 1. > >>>> > >>>> When user call record.ack() in function, just ProcessingGuarantees == > >>>> MANUAL can be work. In turn, when ProcessingGuarantees != MANUAL, > user > >>>> call record.ack() is invalid(print warn log). > >>>> > >>>> That might blast WARN messages to the user. Perhaps save the fact that > >> you > >>> have printed a WARN message once and only print when the variable is > not > >>> set? > >>> > >>> 5. Regarding Test Plan > >>> * I would add: Validate the test of autoAck=false still works (you > >> haven't > >>> broken anything) > >>> * I would add: Validate existing ProcessingGuarantee test for > AtMostOnce, > >>> AtLeastOnce, ExactlyOnce still works (when autoAck=true) > >>> > >>> > >>> > >>> On Mon, May 30, 2022 at 4:09 PM Baozi <wudixiaolong...@icloud.com > >> .invalid> > >>> wrote: > >>> > >>>> Hi, Mesika. > >>>> > >>>> Thanks review. > >>>> > >>>>>> 2. I suggest calling it `MANUAL` `ProcessingGuarantee` instead of > >> NONE. > >>>> As > >>>>>> you carefully explained, ProcessingGuarantee comes does to the fact > >> that > >>>>>> the function executor calls acknowledge, in specific timing: > >>>> > >>>> > >>>> Added, Refer to the latest pip. > >>>> https://github.com/apache/pulsar/issues/15560 > >>>> > >>>>>> 3. Removing autoAck from Function Config breaks backward > >> compatibility, > >>>>>> thus shouldn't this be strongly reflected in the PIP - how does > Pulsar > >>>>>> release handle breaking change? > >>>> > >>>> As suggested by @neng, They will be marked as deprecated first and > >> clearly > >>>> stated in the documentation. Remove it after 2~3 release. > >>>> > >>>>>> 4. Regarding Implementation (1), isn't the class itself > >>>>>> PulsarSinkAtLeastOnceProcessor encodes what to do? I'm not sure I > >>>>>> understand how you use that enum value *inside* the class/ > >>>> > >>>> I changed PIP, add new PulsarSinkManualProcessor. Refer to the latest > >> PIP > >>>> API Changes(3) > >>>> > >>>> Thanks, > >>>> Baodi Shi > >>>> > >>>>> 2022年5月30日 12:5128,Rui Fu <r...@apache.org> 写道: > >>>>> > >>>>> Hi Baodi, > >>>>> > >>>>> Nice work. Put some suggestions below, ptal. > >>>>> > >>>>> 1. API changes should also contain the changes of `Function.proto`, > >>>> including new `ProcessingGuarantees` option and `autoAck`. > >>>>> 2. Please be sure the other language runtimes (like Python, Golang) > do > >>>> support similar `record.ack()` function from the context, if no, it > >> might > >>>> have some API changes for different runtime we well. > >>>>> > >>>>> > >>>>> Best, > >>>>> > >>>>> Rui Fu > >>>>> 在 2022年5月29日 +0800 22:18,Asaf Mesika <asaf.mes...@gmail.com>,写道: > >>>>>> 1. "Added NONE delivery semantics and delete autoAck config." > >>>>>> - Added --> add > >>>>>> > >>>>>> 2. I suggest calling it `MANUAL` `ProcessingGuarantee` instead of > >> NONE. > >>>> As > >>>>>> you carefully explained, ProcessingGuarantee comes does to the fact > >> that > >>>>>> the function executor calls acknowledge, in specific timing: > >>>>>> - `AT_MOST_ONCE` - When the message is read by the client, it is > >>>>>> immediately acknowledged and only then the function is executed, > thus > >>>>>> guaranteeing it will not run more than once > >>>>>> - `AT_LEAST_ONCE` - Message is acknowledged *after* the function > >>>> finished > >>>>>> execution, thus it will be run at least once. > >>>>>> - `MANUAL` - Signals to the user that it is up to them to > acknowledge > >>>> the > >>>>>> message, inside the function. > >>>>>> > >>>>>> I think if you couple that change with adding the explanation I > wrote > >>>>>> above to the documentation it will become crystal clear (hopefully) > >>>> what is > >>>>>> a Processing Guarantee exactly and what each value signifies. > >>>>>> > >>>>>> 3. Removing autoAck from Function Config breaks backward > >> compatibility, > >>>>>> thus shouldn't this be strongly reflected in the PIP - how does > Pulsar > >>>>>> release handle breaking change? > >>>>>> > >>>>>> 4. Regarding Implementation (1), isn't the class itself > >>>>>> PulsarSinkAtLeastOnceProcessor encodes what to do? I'm not sure I > >>>>>> understand how you use that enum value *inside* the class/ > >>>>>> > >>>>>> > >>>>>> On Fri, May 27, 2022 at 1:08 AM Neng Lu <freen...@gmail.com> wrote: > >>>>>> > >>>>>>> Some suggestions: > >>>>>>> > >>>>>>> 1. Instead of deleting the `autoAck`, keep it but not use it in the > >>>> code. > >>>>>>> And documented clearly it's deprecated for the following 2~3 > release. > >>>> And > >>>>>>> then delete it. > >>>>>>> 2. For `PulsarSinkAtLeastOnceProcessor` and > >>>>>>> `PulsarSinkEffectivelyOnceProcessor`, if `NONE` is configured, it > >>>> defaults > >>>>>>> to ATLEAST_ONCE. > >>>>>>> 3. Need to let users know the behavior when they call > `record.ack()` > >>>> inside > >>>>>>> the function implementation. > >>>>>>> > >>>>>>> On Thu, May 12, 2022 at 1:52 AM Baozi <wudixiaolong...@icloud.com > >>>> .invalid> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Hi Pulsar community, > >>>>>>>> > >>>>>>>> I open a https://github.com/apache/pulsar/issues/15560 for > Function > >>>> add > >>>>>>>> NONE delivery semantics > >>>>>>>> > >>>>>>>> Let me know what you think. > >>>>>>>> > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Baodi Shi > >>>>>>>> > >>>>>>>> > >>>>>>>> ## Motivation > >>>>>>>> > >>>>>>>> Currently Function supports three delivery semantics, and also > >>>> provides > >>>>>>>> autoAck to control whether to automatically ack. > >>>>>>>> Because autoAck affects the delivery semantics of Function, it can > >> be > >>>>>>>> confusing for users to understand the relationship between these > two > >>>>>>>> parameters. > >>>>>>>> > >>>>>>>> For example, when the user configures `Guarantees == ATMOST_ONCE` > >> and > >>>>>>>> `autoAck == false`, then the framework will not help the user to > ack > >>>>>>>> messages, and the processing semantics may become `ATLEAST_ONCE`. > >>>>>>>> > >>>>>>>> The delivery semantics provided by Function should be clear. When > >> the > >>>>>>> user > >>>>>>>> sets the guarantees, the framework should ensure point-to-point > >>>> semantic > >>>>>>>> processing and cannot be affected by other parameters. > >>>>>>>> > >>>>>>>> ## Goal > >>>>>>>> > >>>>>>>> Added `NONE` delivery semantics and delete `autoAck` config. > >>>>>>>> > >>>>>>>> The original intention of `autoAck` semantics is that users want > to > >>>>>>>> control the timing of ack by themselves. When autoAck == false, > the > >>>>>>>> processing semantics provided by the framework should be invalid. > >>>> Then we > >>>>>>>> can add `NONE` processing semantics to replace the autoAck == > false > >>>>>>>> scenario. > >>>>>>>> > >>>>>>>> When the user configuration `ProcessingGuarantees == NONE`, the > >>>> framework > >>>>>>>> does not help the user to do any ack operations, and the ack is > left > >>>> to > >>>>>>> the > >>>>>>>> user to handle. In other cases, the framework guarantees > processing > >>>>>>>> semantics. > >>>>>>>> > >>>>>>>> ## API Changes > >>>>>>>> 1. Add `NONE` type to ProcessingGuarantees > >>>>>>>> ``` java > >>>>>>>> public enum ProcessingGuarantees { > >>>>>>>> ATLEAST_ONCE, > >>>>>>>> ATMOST_ONCE, > >>>>>>>> EFFECTIVELY_ONCE, > >>>>>>>> NONE > >>>>>>>> } > >>>>>>>> ``` > >>>>>>>> > >>>>>>>> 2. Delete autoAck config in FunctionConfig > >>>>>>>> ``` java > >>>>>>>> public class FunctionConfig { > >>>>>>>> - private Boolean autoAck; > >>>>>>>> } > >>>>>>>> ``` > >>>>>>>> > >>>>>>>> ## Implementation > >>>>>>>> > >>>>>>>> 1. In `PulsarSinkAtLeastOnceProcessor` and > >>>>>>>> `PulsarSinkEffectivelyOnceProcessor`, when `ProcessingGuarantees > != > >>>> NONE` > >>>>>>>> can be ack. > >>>>>>>> > >>>>>>>> < > >>>>>>>> > >>>>>>> > >>>> > >> > https://github.com/apache/pulsar/blob/c49a977de4b0b525ec80e5070bc90eddcc7cddad/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/sink/PulsarSink.java#L274-L276 > >>>>>>>>> > >>>>>>>> > >>>>>>>> 2. When the delivery semantic is `ATMOST_ONCE`, the message will > be > >>>> acked > >>>>>>>> immediately after receiving the message, no longer affected by the > >>>>>>> autoAck > >>>>>>>> configuration. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>> > >> > https://github.com/apache/pulsar/blob/c49a977de4b0b525ec80e5070bc90eddcc7cddad/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceRunnable.java#L271-L276 > >>>>>>>> > >>>>>>>> 3. When user call `record.ack()` in function, just > >>>> `ProcessingGuarantees > >>>>>>>> == NONE` can be work. > >>>>>>>> > >>>>>>>> ## Plan test > >>>>>>>> The main test and assert is that when ProcessingGuarantees == > NONE, > >>>> the > >>>>>>>> function framework will not do any ack operations for the user. > >>>>>>>> > >>>>>>>> ## Compatibility > >>>>>>>> 1. This change will invalidate the user's setting of autoAck, > which > >>>>>>> should > >>>>>>>> be explained in the documentation and provide parameter > verification > >>>> to > >>>>>>>> remind the user. > >>>>>>>> 2. Runtimes of other languages need to maintain consistent > >> processing > >>>>>>>> logic (python, go). > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> Best Regards, > >>>>>>> Neng > >>>>>>> > >>>> > >>>> > >> > >> > > -- Best Regards, Neng