Re: [PROPOSAL] Support level increment delay for ReconsumerLater interface

2021-09-28 Thread PengHui Li
Hi Xiaolong,

Currently, in the Pulsar client, we have the ack timeout, negative ack, and
reconsume-later methods to achieve diverse message redelivery requirements.
I agree with flexible client-side control of message redelivery, but I
have a few concerns with the new API.

1. The new API looks very similar to the existing delay-queue based
implementation, but it's very different in nature, which might confuse users.
2. Can the redelivery level be specified by users? In my opinion, we
can provide a default exponential backoff policy for users, and we'd
better support customizing it.
3. I think making some enhancements to the ack timeout is more
reasonable; the ack timeout handling would not use the delay queue. For
example, we could have an AckTimeoutPolicy there,
and by default we can support an ExponentialBackoffAckTimeoutPolicy,
while XXXAckTimeoutPolicy and YYYAckTimeoutPolicy can be implemented by users.

Thanks,
Penghui.

On Fri, Sep 10, 2021 at 4:33 PM r...@apache.org 
wrote:

> Hello everyone:
>
> I wrote a proposal to enhance the functionality of ReconsumeLater, the
> specific content is as follows:
>
> ---
>
> # PIP 94: Support level increment delay for ReconsumerLater interface
>
> - Status: Draft
> - Author: Xiaolong Ran
> - Pull request:
> - Mailing list discussion:
> - Release:
>
> The main purpose of this proposal is to add a reconsumeLater variant on the
> consumer side that retries with incrementally increasing delay levels.
>
> ## Motivation
>
> At present, reconsumeLater only supports specifying a concrete delay time
> for redelivery. The usage is as follows:
>
> ```java
> while (true) {
>     Message msg = consumer.receive();
> 
>     try {
>         // Process message...
> 
>         consumer.acknowledge(msg);
>     } catch (Throwable t) {
>         log.warn("Failed to process message");
>         consumer.reconsumeLater(msg, 1000, TimeUnit.MILLISECONDS);
>     }
> }
> ```
>
> Its implementation relies on Pulsar's built-in delayed messages: the
> specified time is passed as the parameter of deliverAfter(), and the
> message is pushed to the consumer again once that time arrives.
>
> This is a good idea, as it allows users to flexibly define their own delay
> time for a specific scenario. But suppose the message is still not
> processed correctly within the time specified by the user; at that point
> the behavior of reconsumeLater has ended. We could consider adding a retry
> scheme based on delay levels: when the message is not processed correctly
> within the first specified interval, reconsumeLater() automatically retries
> according to the next delay level, until the user correctly processes the
> message.
>
> ## Implementation
>
> As mentioned above, letting retries automatically follow a sequence of
> delay levels from low to high would be more user-friendly. For example, we
> can define the following delay levels:
>
> ```
> MESSAGE_DELAYLEVEL = "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m
> 1h 2h"
> ```
>
>
> In this PIP, we mainly introduce two new API interfaces to users:
>
> 1. Specify the delay level
>
> ```
> reconsumeLater(Message message, int delayLevel)
> ```
>
> This method is consistent with the current reconsumeLater interface,
> except that instead of specifying a concrete delay time, you specify a
> delay level. For example, level `1` corresponds to `1s`, and level `3`
> corresponds to `10s`.
>
>
> 2. Retry with increasing level
>
> ```
> reconsumeLater(Message message)
> ```
>
> Different from the above, this is a back-off retry: the retry interval
> after the first failure is 1 second, the retry interval after the second
> failure is 5 seconds, and so on; the more retries there have been, the
> longer the interval.
>
> This kind of retry mechanism has many practical applications in business
> scenarios: if consumption fails, the service generally will not recover
> immediately, so this gradual retry method is more reasonable.
>
>
> ## Compatibility
>
> This proposal does not break compatibility. It exposes two new API
> interfaces alongside the original API for users to use, so there will be
> no compatibility-related issues.
>
> --
> Thanks
> Xiaolong Ran
>


Re: Correct semantics of producer close

2021-09-28 Thread Joe F
Clients should not depend on any of this behaviour, since the broker is at
the other end of an unreliable network connection. The semantic differences
are kind of meaningless from a usability point of view, since flushing on
close =/= published. What exactly does "graceful" convey here? Flush the
buffer on the client end and hope it makes it to the server.

Is there a difference whether you flush (or process) pending messages or
not? There is no guarantee that either case will ensure the message is
published.

The only way to ensure that messages are published is to wait for the ack.
The correct model should be to wait for return on the blocking API, or wait
for future completion of the async API, then handle any publish errors and
then only close the producer.
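The model described above can be sketched as follows. `MiniProducer` is a hypothetical stand-in for the relevant slice of Pulsar's `Producer` API (the real interface is `org.apache.pulsar.client.api.Producer`), used only to keep the sketch self-contained:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for the slice of Pulsar's Producer API used here.
interface MiniProducer {
    CompletableFuture<Long> sendAsync(String msg); // completes when the broker acks
    CompletableFuture<Void> closeAsync();
}

class SafeClose {
    // Wait for every send future (i.e. the broker ack), handle publish
    // errors, and only then close the producer.
    static CompletableFuture<Void> publishAllThenClose(MiniProducer producer, String... msgs) {
        CompletableFuture<?>[] acks = new CompletableFuture<?>[msgs.length];
        for (int i = 0; i < msgs.length; i++) {
            acks[i] = producer.sendAsync(msgs[i])
                    .exceptionally(t -> {
                        // Handle the publish error here (retry, log, dead-letter, ...).
                        System.err.println("publish failed: " + t);
                        return -1L;
                    });
        }
        // Close only after every ack (or failure) has been observed.
        return CompletableFuture.allOf(acks).thenCompose(v -> producer.closeAsync());
    }
}
```

With this ordering, close never races the outstanding sends, so no extra "graceful close" semantics are needed.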


On Mon, Sep 27, 2021 at 8:50 PM Yunze Xu 
wrote:

> Hi all,
>
> Recently I found a PR (https://github.com/apache/pulsar/pull/12195) that
> modifies the existing semantics of producer close. There has already been
> some discussion in that PR, but I think it's better to start a discussion
> here to let more people know.
>
> The existing implementation of producer close is:
> 1. Cancel all timers, including send and batch container
> (`batchMessageContainer`).
> 2. Complete all pending messages (`pendingMessages`) with
> `AlreadyCloseException`.
>
> See `ProducerImpl#closeAsync` for details.
>
> But the JavaDoc of `Producer#closeAsync` is:
>
> > No more writes will be accepted from this producer. Waits until all
> pending write request are persisted.
>
> Anyway, the document and implementation are inconsistent. Specifically, we
> need to define how to process `pendingMessages` and
> `batchMessageContainer` when the producer calls `closeAsync`.
>
> 1. batchMessageContainer: contains the buffered single messages
> (`Message`).
> 2. pendingMessages: all in-flight messages (`OpSendMsg`) on the network.
>
> IMO, from the JavaDoc, only `pendingMessages` should be processed and the
> messages in `batchMessageContainer` should be discarded.
>
> Since other clients might have already implemented semantics similar to
> the Java client's, if we changed the semantics now, the behaviors among
> different clients might be inconsistent.
>
> Should we add a configuration to support graceful close to follow the
> docs? Or
> just change the current behavior?
>
> Thanks,
> Yunze


Re: Correct semantics of producer close

2021-09-28 Thread Yunze Xu
I can’t agree more, just like what I said in PR 12195:

> In any case, when you choose `sendAsync`, you should always make use of the
> returned future to confirm the result of all messages. In Kafka, it's the
> send callback.

But I found many users are confused by the current behavior, especially
those who are used to Kafka’s close semantics. They might expect a simple
attempt to flush existing messages, which works in a simple test
environment, even though there's no guarantee in exception cases.



> 2021年9月28日 下午4:37,Joe F  写道:
> 
> Clients should not depend on any of this behaviour, since the broker is at
> the other end of an unreliable  network connection. The
> semantic differences are kind of meaningless from a usability point, since
> flushing on close =/= published.  What exactly does "graceful" convey
> here?  Flush the  buffer on the client end and hope it makes it to the
> server.
> 
> Is there a  difference whether you flush(or process) pending messages  or
> not? There is no guarantee that either case will ensure the message is
> published.
> 
> The only way to ensure that messages are published is to wait for the ack.
> The correct model should be to wait for return on the blocking API, or wait
> for future completion of the async API, then handle any publish errors and
> then only close the producer.
> 
> 
> On Mon, Sep 27, 2021 at 8:50 PM Yunze Xu 
> wrote:
> 
>> Hi all,
>> 
>> Recently I found a PR (https://github.com/apache/pulsar/pull/12195 <
>> https://github.com/apache/pulsar/pull/12195>) that
>> modifies the existing semantics of producer close. There're already some
>> communications in this PR, but I think it's better to start a discussion
>> here
>> to let more know.
>> 
>> The existing implementation of producer close is:
>> 1. Cancel all timers, including send and batch container
>> (`batchMessageContainer`).
>> 2. Complete all pending messages (`pendingMessages`) with
>> `AlreadyCloseException`.
>> 
>> See `ProducerImpl#closeAsync` for details.
>> 
>> But the JavaDoc of `Producer#closeAsync` is:
>> 
>>> No more writes will be accepted from this producer. Waits until all
>> pending write request are persisted.
>> 
>> Anyway, the document and implementation are inconsistent. But specifically,
>> we need to define the behavior for how to process `pendingMessages` and
>> `batchMessageContainer` when producer call `closeAsync`.
>> 
>> 1. batchMessageContainer: contains the buffered single messages
>> (`Message`).
>> 2. pendingMessages: all inflight messages (`OpSendMsg`) in network.
>> 
>> IMO, from the JavaDoc, only `pendingMessages` should be processed and the
>> messages in `batchMessageContainer` should be discarded.
>> 
>> Since other clients might have already implemented the similar semantics of
>> Java clients. If we changed the semantics now, the behaviors among
>> different
>> clients might be inconsistent.
>> 
>> Should we add a configuration to support graceful close to follow the
>> docs? Or
>> just change the current behavior?
>> 
>> Thanks,
>> Yunze



[GitHub] [pulsar-client-node] massakam commented on issue #122: npm install fails on win10

2021-09-28 Thread GitBox


massakam commented on issue #122:
URL: 
https://github.com/apache/pulsar-client-node/issues/122#issuecomment-929050029


   > IMHO, the solution is to have this project publish the c++ binaries. You 
can see from the following output that npm/node-gyp is trying to fetch the 
prebuilt.
   
   Unfortunately, there are no plans to publish pre-built C++ binaries at this 
time. Building and publishing binaries for different operating systems, 
NODE_MODULE_VERSIONs, and instruction set architectures can be costly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Re: Adding a new token and GH secret for the doc bot

2021-09-28 Thread Enrico Olivelli
I have created an INFRA ticket:
https://issues.apache.org/jira/browse/INFRA-22368

I will provide a personal token

If anyone has a better idea please chime in

Any help is really appreciated

Enrico

Il giorno gio 23 set 2021 alle ore 08:59 Enrico Olivelli <
eolive...@gmail.com> ha scritto:

> Hello,
> it looks like we need to add a new GH token as GH secret in order to make
> the documentation bot work.
> the current token does not have write privileges (is this statement correct?)
> and cannot add labels and comments.
>
> My plan is to:
> 1) create a new token
> 2) add it as GH secret
> 3) Allow the Doc bot to be enabled
>
> for 1), is it okay if I use one personal token ? I am not sure I can
> restrict it only to Pulsar and related repositories (we will have to add
> this to every active Pulsar repo, like adapters, nodejs, go client),
> otherwise, shall I ask ASF INFRA to generate one ?
>
> for 2) If I remember correctly I will have to create a JIRA ticket to ASF
> INFRA and they will add the secret
>
> for 3) we already have a PR for the Pulsar repo, we will update it in
> order to use the new token, then we can merge it and create new PRs for all
> the other repositories
>
> Any guidance is well appreciated
>
> Cheers
> Enrico
>


Re: [VOTE] PIP-99 Pulsar Proxy Extensions

2021-09-28 Thread Yunze Xu
+1 (non binding)

Thanks,
Yunze


[PIP 100] Add seekByIndex for consumer api

2021-09-28 Thread JiangHaiting
Hi Pulsar Community,


I'm glad to have this opportunity to propose this PIP.


Currently we can reset the read position of a cursor by message id or
timestamp.
Since we introduced the index in broker metadata in 2.9.0,
resetting the cursor by index would be very helpful for other protocol handlers (KoP or RoP).


Also, as @BewareMyPower pointed out,
"users might want to seek to 1 message before. Currently they cannot
achieve this goal."
This PIP will make that possible.


I've already created a PR, see details at 
https://github.com/apache/pulsar/pull/12032
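A hypothetical sketch of what the API could enable (the names below are illustrative only; the actual API shape is defined in the PR and may differ):

```java
// Illustrative interface only; not the real Pulsar Consumer API.
interface IndexSeekableConsumer {
    long getLastIndex();          // broker-side index of the latest message (stored since 2.9.0)
    void seekByIndex(long index); // reset the cursor to the given index
}

class SeekBack {
    // "Seek to n messages before the latest" becomes simple index arithmetic,
    // clamped at the beginning of the topic.
    static long rewind(IndexSeekableConsumer consumer, long n) {
        long target = Math.max(0, consumer.getLastIndex() - n);
        consumer.seekByIndex(target);
        return target;
    }
}
```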



Thanks,
Haiting Jiang (Github:Jason918)

Re: [PIP 100] Add seekByIndex for consumer api

2021-09-28 Thread Yunze Xu
You need to create an issue first to start a discussion for your PIP.

Here’s the process of PIP:

1. The author(s) of the proposal will create a GitHub issue ticket choosing the
   template for PIP proposals.
2. The author(s) will send a note to the dev@pulsar.apache.org mailing list
   to start the discussion, using subject prefix `[PIP] xxx`.
3. Based on the discussion and feedback, some changes might be applied by
   authors to the text of the proposal.
4. Once some consensus is reached, there will be a vote to formally approve
   the proposal.
   The vote will be held on the dev@pulsar.apache.org mailing list. Everyone
   is welcome to vote on the proposal, though only the votes of PMC members
   will be considered binding.
   A lazy majority of at least 3 binding +1 votes is required.
   The vote should stay open for at least 48 hours.
5. When the vote is closed, if the outcome is positive, the state of the
   proposal is updated and the Pull Requests associated with this proposal can
   start to get merged into the master branch.

Thanks,
Yunze

> 2021年9月28日 下午10:04,JiangHaiting  写道:
> 
> Hi Pulsar Community,
> 
> 
> I'm glad to have this opportunity to propose this PIP.
> 
> 
> Currently we can reset the read position of a cursor by message id or 
> timestamp. 
> Since we formerly introduced index in broker metadata since 2.9.0, 
> reset cursor by index is very helpful in other protocol handler (KoP or RoP).
> 
> 
> Also, as @BewareMyPower pointed out that 
> "users might want to seek to 1 messages before. Currently they cannot 
> achieve this goal.". 
> And this PIP will make it possible.
> 
> 
> I've already created a PR, see details at 
> https://github.com/apache/pulsar/pull/12032
> 
> 
> 
> Thanks,
> Haiting Jiang (Github:Jason918)



Re: [PIP 100] Add seekByIndex for consumer api

2021-09-28 Thread Yunze Xu
I have edited the Wiki page for PIP:
https://github.com/apache/pulsar/wiki/Pulsar-Improvement-Proposal-(PIP)

Thanks,
Yunze

Re: [PROPOSAL] Support level increment delay for ReconsumerLater interface

2021-09-28 Thread Chris Kellogg
It seems like an application can already do something like this without any
changes to the clients?

An application could read the message properties and then figure out the
next time interval it should use for the reconsumeLater method. The
leveling concept could be a small piece of code the application implements
to figure out the next time to reconsume the message.

I think this API is a little confusing as well. I'm all for more flexibility
if users need that. However, in my opinion, if this feature is really
needed, an interface should be exposed as an API as opposed to a specific
implementation.
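For illustration, the application-side leveling could look like the sketch below. It parses the delay-level table from the proposal and picks the next interval from a retry counter the application tracks itself (for example in a message property; the counter and any property name would be application-defined):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

class RetryLevels {
    // Delay-level table taken from the proposal text.
    static final String MESSAGE_DELAYLEVEL =
            "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h";

    static List<Duration> parseLevels(String spec) {
        List<Duration> levels = new ArrayList<>();
        for (String token : spec.trim().split("\\s+")) {
            long value = Long.parseLong(token.substring(0, token.length() - 1));
            switch (token.charAt(token.length() - 1)) {
                case 's': levels.add(Duration.ofSeconds(value)); break;
                case 'm': levels.add(Duration.ofMinutes(value)); break;
                case 'h': levels.add(Duration.ofHours(value)); break;
                default: throw new IllegalArgumentException("bad unit in: " + token);
            }
        }
        return levels;
    }

    // Delay to use for a message that has already been redelivered
    // `retriesSoFar` times; clamps at the highest level.
    static Duration nextDelay(List<Duration> levels, int retriesSoFar) {
        return levels.get(Math.min(retriesSoFar, levels.size() - 1));
    }
}
```

The application would then call `consumer.reconsumeLater(msg, delay.toMillis(), TimeUnit.MILLISECONDS)` with the computed delay and increment its retry counter — exactly the leveling logic the PIP proposes, kept outside the client.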



On Tue, Sep 28, 2021 at 4:01 AM PengHui Li  wrote:

> Hi Xiaolong,
>
> Currently, in the Pulsar client, we have ack timeout, negative ack, and
> reconsume later method to achieve diverse message redelivery requirements.
> I agree with the client side flexible message redelivery controlling but I
> have a few concerns with the new API.
>
> 1. The new API looks very similar to the existing delay-queue based
> implementation but It's very different in nature, which might confuse
> users.
> 2. Does the redelivery level can be specified by users? In my opinion, we
> can provide a default exponentially backed off policy for users and we'd
> better support customize it.
> 3. I think if make some enhancements for the ack timeout is more
> reasonable, the ack timeout handling will not use the delay queue such as
> we have an AckTimePolicy there
> And by default, we can support an ExponentiallyBackoffAckTimePolicy,
> and XXXAckTimeoutPolicy and YYYAckTimeoutPolicy can be implemented by
> users.
>
> Thanks,
> Penghui.
>
> On Fri, Sep 10, 2021 at 4:33 PM r...@apache.org 
> wrote:
>
> > Hello everyone:
> >
> > I wrote a proposal to enhance the functionality of ReconsumeLater, the
> > specific content is as follows:
> >
> > ---
> >
> > # PIP 94: Support level increment delay for ReconsumerLater interface
> >
> > - Status: Draft
> > - Author: Xiaolong Ran
> > - Pull request:
> > - Mailing list discussion:
> > - Release:
> >
> > The purpose of this proposal is mainly to add ReconsumerLater on the
> > consumer side to retry in an incremental level
> >
> > ## Motivation
> >
> > At present, ReconsumrLater only supports specifying a specific delay time
> > for distribution processing. The usage is as follows:
> >
> > ```
> > while (true) {
> >  Message msg = consumer.receive();
> >
> >  try {
> >   // Process message...
> >
> >   consumer.acknowledge(msg);
> >  } catch (Throwable t) {
> >   log.warn("Failed to process message");
> >   consumer.reconsumeLater(msg, 1000 , TimeUnit.MILLISECONDS);
> >  }
> >  }
> > ```
> >
> > Its implementation principle is to use Pulsar's built-in delay message to
> > pass in the specified time as the parameter
> > of deliverAfter(), and then push the message to the consumer side again
> > after the time arrives.
> >
> > This is a good idea, which allows users to flexibly define their own
> delay
> > time in a specific scenario. But assuming
> > that the message is not processed correctly within the time specified by
> > the user, the behavior of ReconsumerLater has
> > ended at this time. Whether we can consider adding a retry scheme
> according
> > to the time level. Then when the first
> > specified time range is not processed correctly, ReconsumerLater() can
> > automatically retry according to the time level
> > until the user correctly processes the specific message.
> >
> > ## Implementation
> >
> > As mentioned above, if we can here follow a certain delay level from low
> to
> > high and allow it to automatically retry,
> > it is a more user-friendly way. For example, we can define the following
> > delay level:
> >
> > ```
> > MESSAGE_DELAYLEVEL = "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m
> 30m
> > 1h 2h"
> > ```
> >
> >
> > In this PIP, we mainly introduce two new API interfaces to users:
> >
> > 1. Specify the delay level
> >
> > ```
> > reconsumeLater(Message message, int delayLevel)
> > ```
> >
> > This implementation method is consistent with the current reconsumeLater
> > interface, but instead of specifying the
> > delay level, specify the specific delay time. For example, level `1`
> > corresponds to `1s`, and level `3` corresponds to `10s`.
> >
> >
> > 2. Retry with increasing level
> >
> > ```
> > reconsumeLater(Message message)
> > ```
> >
> > Different from the above two, it is a back-off retry, that is, the retry
> > interval after the first failure is 1 second,
> > and the retry interval after the second failure is 5 seconds, and so on,
> > the more the number of times, the longer the
> > interval.
> >
> > This kind of retry mechanism often has more practical applications in
> > business scenarios. If the consumption fails,
> > the general service will not be restored immediately. It is more
> reasonable
> > to use this gradual retry method.
> >
> >
> > ## Compatibility
> >
> > The current proposal will not cause 





PIP-100: Pulsar pluggable topic factory

2021-09-28 Thread Rajan Dhabalia
Hi,

I would like to propose a pluggable topic factory in Pulsar. Users select
the topic type (persistent/non-persistent) based on the application use
case and the required topic behavior. However, in some circumstances,
users need additional behavior on top of the existing implementation,
and may even want to inject a custom workflow into the existing topic
behavior. Such circumstances mostly arise when users want to smoothly
migrate topics or Pulsar clusters without impacting producer and consumer
applications. In such scenarios, users can override the publish or
dispatch behavior of the topic and plug in an additional workflow. For
example: perform dual writes to multiple topics during migration, skip
messages published from a specific source without explicit publish
failures, or ignore a specific subscription source without generating a
client-side error or impacting client applications. This feature will be
useful for any kind of migration where the Pulsar cluster is serving live
topics and requires custom topic-level treatment for flawless server-side
migration without impacting client applications, especially legacy
applications that are hard to change.
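One possible shape for such a factory, sketched purely from the description above (the actual PIP-100 interfaces on the wiki may differ; all names here are illustrative):

```java
// Illustrative interfaces only; not the real broker-side API.
interface Topic {
    String name();
    void publish(byte[] payload);
}

interface TopicFactory {
    Topic create(String topicName);
}

// Example custom factory: wraps the default topic to dual-write to a
// migration target, one of the use cases mentioned above.
class DualWriteTopicFactory implements TopicFactory {
    private final TopicFactory delegate;
    private final Topic migrationTarget;

    DualWriteTopicFactory(TopicFactory delegate, Topic migrationTarget) {
        this.delegate = delegate;
        this.migrationTarget = migrationTarget;
    }

    @Override
    public Topic create(String topicName) {
        Topic primary = delegate.create(topicName);
        return new Topic() {
            public String name() { return primary.name(); }
            public void publish(byte[] payload) {
                primary.publish(payload);          // normal publish path
                migrationTarget.publish(payload);  // extra write to the migration target
            }
        };
    }
}
```

Because the broker resolves topics through the factory, producers and consumers are unaware of the extra write, which is the point of doing the migration server-side.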

I have added a proposal to wiki page:
https://github.com/apache/pulsar/wiki/PIP-100:-Pulsar-pluggable-topic-factory


Thanks,
Rajan


Re: Correct semantics of producer close

2021-09-28 Thread Michael Marshall
Thanks for bringing this thread to the mailing list, Yunze.

I think the right change is to update the `closeAsync` method to first
flush `batchMessageContainer` and to then asynchronously wait for the
`pendingMessages` queue to drain. We could add a new timeout or rely
on the already implemented `sendTimeout` config to put an upper time
limit on `closeAsync`. My reasoning as well as responses to Joe and
Yunze follow:

> we need to define the behavior for how to process `pendingMessages`
> and `batchMessageContainer` when producer call `closeAsync`.

Yes, this is exactly the clarification required, and I agree that the
Javadoc is ambiguous and that the implementation doesn't align with
the Javadoc.

If we view the Javadoc as binding, then the fundamental question is
what messages are "pending". The `pendingMessages` seem pretty easy to
classify as "pending" given that they are already in flight on the
network.

I also consider `batchMessageContainer` to be "pending" because a
client application already has callbacks for the messages in this
container. These callbacks are expected to complete when the batch
message delivery completes. Since the client application already has a
reference to a callback, it isn't a problem that the producer
implementation initiates the flush logic. (Note that the current
design fails the `pendingMessages` but does not fail the
`batchMessageContainer` when `closeAsync` is called, so the callbacks
for that container are currently left incomplete forever if the client
is closed with an unsent batch. We will need to address this design in
the work that comes from this discussion.)

Further, the `ProducerImpl#failPendingMessages` method includes logic
to call `ProducerImpl#failPendingBatchMessages`, which implies that
these batched, but not sent, messages have been historically
considered "pending".

If we view the Javadoc as non-binding, I think my guiding influence
for the new design would be that the `closeAsync` method should result
in a "graceful" shutdown of the client.

> What exactly does "graceful" convey here?

This is a great question, and will likely drive the design here. I
view graceful to mean that the producer attempts to avoid artificial
failures. That means trying to drain the queue instead of
automatically failing all of the queue's callbacks. The tradeoff is
that closing the producer takes longer. This reasoning would justify
my claim that we should first flush the `batchMessageContainer`
instead of failing the batch without any effort at delivery, as that
would be artificial.

> There is no guarantee that either case will ensure the message
> is published.

I don't think that implementing `closeAsync` with graceful shutdown
logic implies a guarantee of message publishing. Rather, it guarantees
that failures will be the result of a real exception or a timeout.
Since calling `closeAsync` prevents additional messages from
delivering, users leveraging this functionality might be operating
with "at most once" delivery semantics where they'd prefer to deliver
the messages if possible, but they aren't going to delay application
shutdown indefinitely to deliver its last messages. If users need
stronger guarantees about whether their messages are delivered, they
are probably already using the flush methods to ensure that the
producer's queues are empty before calling `closeAsync`.
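That flush-before-close pattern is available today. A minimal sketch, using `FlushableProducer` as a stand-in for the two relevant methods (the real `org.apache.pulsar.client.api.Producer` exposes both `flushAsync()` and `closeAsync()`):

```java
import java.util.concurrent.CompletableFuture;

// Stand-in for the two Producer methods used here, so the sketch is
// self-contained.
interface FlushableProducer {
    CompletableFuture<Void> flushAsync();
    CompletableFuture<Void> closeAsync();
}

class GracefulClose {
    // Explicitly drain the producer's queues, then close. This is what a
    // user can do right now, regardless of how closeAsync semantics evolve.
    static CompletableFuture<Void> flushThenClose(FlushableProducer producer) {
        return producer.flushAsync().thenCompose(v -> producer.closeAsync());
    }
}
```

Resolving this discussion in favor of a graceful `closeAsync` would essentially fold this composition into the producer itself.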

I also agree that in all of these cases, we're assuming that users are
capturing references to the async callbacks and then making business
logic decisions based on the results of those callbacks.

Thanks,
Michael

On Tue, Sep 28, 2021 at 4:58 AM Yunze Xu  wrote:
>
> I can’t agree more, just like what I’ve said in PR 12195:
>
> > At any case, when you choose `sendAsync`, you should always make use of the 
> > returned future to confirm the result of all messages. In Kafka, it's the 
> > send callback.
>
> But I found many users are confused about the current behavior, especially
> those are used to Kafka’s close semantics. They might expect a simple try
> to flush existing messages, which works at a simple test environment, even
> there's no guarantee for exception cases.
>
>
>
> > 2021年9月28日 下午4:37,Joe F  写道:
> >
> > Clients should not depend on any of this behaviour, since the broker is at
> > the other end of an unreliable  network connection. The
> > semantic differences are kind of meaningless from a usability point, since
> > flushing on close =/= published.  What exactly does "graceful" convey
> > here?  Flush the  buffer on the client end and hope it makes it to the
> > server.
> >
> > Is there a  difference whether you flush(or process) pending messages  or
> > not? There is no guarantee that either case will ensure the message is
> > published.
> >
> > The only way to ensure that messages are published is to wait for the ack.
> > The correct model should be to wait for return on the blocking API, or wait
> > for future completion of the async API, then handle any publish errors and

Re: Problems with PIP ID Management

2021-09-28 Thread Michael Marshall
Thanks for your feedback Enrico.

I'd appreciate any other thoughts on this process since I am proposing
a change to a recently implemented PIP.

Also, if we like this new direction, I think we should update our
GitHub issue template to communicate this new process. I already have
a PR for my proposed changes:
https://github.com/apache/pulsar/pull/12176. Please take a look, if
you're able.

Thanks,
Michael

On Sat, Sep 25, 2021 at 1:29 AM Enrico Olivelli  wrote:
>
> Michael,
>
> Il Ven 24 Set 2021, 19:47 Michael Marshall  ha
> scritto:
>
> > Hi Enrico,
> >
> > Thank you for raising this concern. Apparently acquiring a distributed
> > lock on a PIP number is non-trivial :)
> >
>
> We could use the first 3 digits of a hash computed on the title of the PIP.
> Just joking:)
>
>
> > > What about sending an email to dev in order to ask for a new PIP id ?
> >
> > In this design, is it up to a committer (someone with write access to
> > the Wiki page) to reply to the request with a PIP number? I think this
> > could lead to confusion, as no one person would have the job and there
> > could be delays in responding.
> >
> > Instead of receiving a PIP number, I think we could ask users to
> > inspect the mailing list archive to determine the next PIP number.
> >
>
> This is a good idea.
> And it perfectly matches the rule that in the ASF the mailing list is the
> main source of truth.
> So I totally support this idea
>
>
>
>
> Then, the mailing list is the source of truth for PIP numbers, and
> > there isn't any delay. This protocol can still have collisions if
> > multiple people send PIPs at the same time. However, I think these
> > collisions could be resolved quickly: the earliest email in the
> > archive wins, and others need to pick the next number(s).
> >
> > Also, this raises a secondary question for me. In our new PIP process,
> > are we copying PIPs from GitHub issues into the GitHub wiki?
> >
> > In order to push this conversation forward, I wrote a PR with my
> > proposed solution to the problem:
> > https://github.com/apache/pulsar/pull/12176. I am not attached to my
> > solution per se, but I do think a solution to this problem should
> > include an update to the GitHub template to help new PIP writers
> > easily understand the process.
> >
>
> Agreed
>
> Thanks
> Enrico
>
>
> > > Does anyone volunteer to clean up the PIP list page ?
> > Once we settle on a protocol, we will need to clean up, as now we have
> > two PIP 97s.
> >
> > Thanks,
> > Michael
> >
> > On Thu, Sep 23, 2021 at 9:51 AM Enrico Olivelli 
> > wrote:
> > >
> > > Hello,
> > > Today I had to rename my PIP from PIP-93 to PIP-95 because PIP-93 was
> > > already taken.
> > > Then I realised that PIP-95 already exists on GitHub, so I moved to
> > > PIP-97.
> > >
> > > There are multiple PIP-95 on the Wiki page.
> > >
> > > https://github.com/apache/pulsar/wiki
> > >
> > > We need a way to assign new PIP ids
> > >
> > > What about sending an email to dev in order to ask for a new PIP id ?
> > >
> > > Only committers have write privileges to the Wiki pages, so any PIP id
> > > assignment needs a sponsor that assigns the PIP id and creates the wiki
> > > page.
> > >
> > > I propose to add a link to the PIP issue in the wiki page until the PIP
> > is
> > > accepted,
> > > because we recently decided to run PIP discussions as GH issues
> > >
> > > Does anyone volunteer to clean up the PIP list page ?
> > >
> > > Regards
> > > Enrico
> >


Re: Correct semantics of producer close

2021-09-28 Thread Yunze Xu
It’s a good point that `ProducerImpl#failPendingBatchMessages` treats
messages in batch container also as pending messages.

I agree with your definition of "graceful close”. It’s more like an “at most
once” semantics, as the original JavaDoc said:

> pending writes will not be retried

Thanks,
Yunze

> 2021年9月29日 上午5:24,Michael Marshall  写道:
> 
> Thanks for bringing this thread to the mailing list, Yunze.
> 
> I think the right change is to update the `closeAsync` method to first
> flush `batchMessageContainer` and to then asynchronously wait for the
> `pendingMessages` queue to drain. We could add a new timeout or rely
> on the already implemented `sendTimeout` config to put an upper time
> limit on `closeAsync`. My reasoning as well as responses to Joe and
> Yunze follow:
> 
>> we need to define the behavior for how to process `pendingMessages`
>> and `batchMessageContainer` when producer call `closeAsync`.
> 
> Yes, this is exactly the clarification required, and I agree that the
> Javadoc is ambiguous and that the implementation doesn't align with
> the Javadoc.
> 
> If we view the Javadoc as binding, then the fundamental question is
> what messages are "pending". The `pendingMessages` seem pretty easy to
> classify as "pending" given that they are already in flight on the
> network.
> 
> I also consider `batchMessageContainer` to be "pending" because a
> client application already has callbacks for the messages in this
> container. These callbacks are expected to complete when the batch
> message delivery completes. Since the client application already has a
> reference to a callback, it isn't a problem that the producer
> implementation initiates the flush logic. (Note that the current
> design fails the `pendingMessages` but does not fail the
> `batchMessageContainer` when `closeAsync` is called, so the callbacks
> for that container are currently left incomplete forever if the client
> is closed with an unsent batch. We will need to address this design in
> the work that comes from this discussion.)
> 
> Further, the `ProducerImpl#failPendingMessages` method includes logic
> to call `ProducerImpl#failPendingBatchMessages`, which implies that
> these batched, but not sent, messages have been historically
> considered "pending".
> 
> If we view the Javadoc as non-binding, I think my guiding influence
> for the new design would be that the `closeAsync` method should result
> in a "graceful" shutdown of the client.
> 
>> What exactly does "graceful" convey here?
> 
> This is a great question, and will likely drive the design here. I
> view graceful to mean that the producer attempts to avoid artificial
> failures. That means trying to drain the queue instead of
> automatically failing all of the queue's callbacks. The tradeoff is
> that closing the producer takes longer. This reasoning would justify
> my claim that we should first flush the `batchMessageContainer`
> instead of failing the batch without any effort at delivery, as that
> would be artificial.
> 
>> There is no guarantee that either case will ensure the message
>> is published.
> 
> I don't think that implementing `closeAsync` with graceful shutdown
> logic implies a guarantee of message publishing. Rather, it guarantees
> that failures will be the result of a real exception or a timeout.
> Since calling `closeAsync` prevents additional messages from
> delivering, users leveraging this functionality might be operating
> with "at most once" delivery semantics where they'd prefer to deliver
> the messages if possible, but they aren't going to delay application
> shutdown indefinitely to deliver its last messages. If users need
> stronger guarantees about whether their messages are delivered, they
> are probably already using the flush methods to ensure that the
> producer's queues are empty before calling `closeAsync`.
> 
> I also agree that in all of these cases, we're assuming that users are
> capturing references to the async callbacks and then making business
> logic decisions based on the results of those callbacks.
> 
> Thanks,
> Michael
> 
> On Tue, Sep 28, 2021 at 4:58 AM Yunze Xu  wrote:
>> 
>> I can’t agree more, just like what I’ve said in PR 12195:
>> 
>>> In any case, when you choose `sendAsync`, you should always make use of the
>>> returned future to confirm the result of all messages. In Kafka, it's the 
>>> send callback.
>> 
>> But I found many users are confused about the current behavior, especially
>> those who are used to Kafka's close semantics. They might expect a simple
>> attempt to flush existing messages, which works in a simple test
>> environment, even though there's no guarantee in exception cases.
>> 
>> 
>> 
>>> On Sep 28, 2021, at 4:37 PM, Joe F wrote:
>>> 
>>> Clients should not depend on any of this behaviour, since the broker is at
>>> the other end of an unreliable network connection. The
>>> semantic differences are kind of meaningless from a usability point, since
>>> flushing on close =/= published.  What exactly does "graceful" convey here?

Re: [PROPOSAL] Support level increment delay for ReconsumerLater interface

2021-09-28 Thread r...@apache.org
> I think making some enhancements to the ack timeout is more
> reasonable; the ack timeout handling would not use the delay queue if, say,
> we have an AckTimePolicy

Thanks, Penghui Li,

AckTimeout has its own problems, such as how long a reasonable timeout
should be. So in the Go SDK, we abandoned this method and used Nack's
logic instead of AckTimeout. In that case, can we also consider adding an
interface similar to a NackPolicy to the Nack logic and exposing it to
users, so that users can customize their own Nack policy?
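A pluggable policy like the one suggested could be sketched as follows. This is only an illustration of the idea: the `RedeliveryBackoffPolicy` interface and the exponential default below are hypothetical names, not part of the current Pulsar client API.

```java
// Hypothetical pluggable redelivery-backoff policy (illustrative names only).
interface RedeliveryBackoffPolicy {
    long delayMillis(int redeliveryCount);
}

// Default exponential policy: minDelay * 2^count, capped at maxDelay.
class ExponentialRedeliveryBackoff implements RedeliveryBackoffPolicy {
    private final long minDelayMs;
    private final long maxDelayMs;

    ExponentialRedeliveryBackoff(long minDelayMs, long maxDelayMs) {
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    @Override
    public long delayMillis(int redeliveryCount) {
        // Clamp the exponent first so the shift cannot overflow.
        if (redeliveryCount >= 63) {
            return maxDelayMs;
        }
        long delay = minDelayMs << redeliveryCount;
        return (delay < 0 || delay > maxDelayMs) ? maxDelayMs : delay;
    }
}

public class BackoffSketch {
    public static void main(String[] args) {
        RedeliveryBackoffPolicy policy =
                new ExponentialRedeliveryBackoff(1_000, 60_000);
        for (int i = 0; i < 8; i++) {
            System.out.println("attempt " + i + " -> "
                    + policy.delayMillis(i) + " ms");
        }
    }
}
```

A user-supplied XXXPolicy would simply be another implementation of the same interface.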

> 3. I think making some enhancements to the ack timeout is more
> reasonable; the ack timeout handling would not use the delay queue if, say,
> we have an AckTimePolicy there

AckTimeout and delayed messages are essentially both implemented on top of
an internal time tracker. From that point of view, I think either an
AckTimeout-based or a DelayMessage-based approach is feasible. With the
existing DelayMessage solution, most of the code logic can be reused; if we
implement it with AckTimeout instead, we need to repackage the time tracker.

--
Thanks
Xiaolong Ran

On Tue, Sep 28, 2021 at 11:45 PM Chris Kellogg wrote:

> It seems like an application can already do something like this without any
> changes to the clients?
>
> An application could read the message properties and then figure out the
> next time interval it should use for the ReconsumeLater method. The
> leveling concept could be a small piece of code the application implements
> to figure out the next time to reconsume the message.
>
> I think this API is a little confusing as well. I'm all for more flexibility
> if users need that. However, in my opinion, if this feature is really needed,
> an interface should be exposed as an API as opposed to a specific
> implementation.
>
>
>
> On Tue, Sep 28, 2021 at 4:01 AM PengHui Li  wrote:
>
> > Hi Xiaolong,
> >
> > Currently, in the Pulsar client, we have ack timeout, negative ack, and
> > the reconsume later method to achieve diverse message redelivery
> > requirements. I agree with client-side flexible message redelivery
> > control, but I have a few concerns with the new API.
> >
> > 1. The new API looks very similar to the existing delay-queue based
> > implementation but is very different in nature, which might confuse
> > users.
> > 2. Can the redelivery level be specified by users? In my opinion, we
> > can provide a default exponential backoff policy for users, and we'd
> > better support customizing it.
> > 3. I think making some enhancements to the ack timeout is more
> > reasonable; the ack timeout handling would not use the delay queue if,
> > say, we have an AckTimePolicy there.
> > By default, we can support an ExponentiallyBackoffAckTimePolicy,
> > and XXXAckTimeoutPolicy and YYYAckTimeoutPolicy can be implemented by
> > users.
> >
> > Thanks,
> > Penghui.
> >
> > On Fri, Sep 10, 2021 at 4:33 PM r...@apache.org  >
> > wrote:
> >
> > > Hello everyone:
> > >
> > > I wrote a proposal to enhance the functionality of ReconsumeLater, the
> > > specific content is as follows:
> > >
> > > ---
> > >
> > > # PIP 94: Support level increment delay for ReconsumerLater interface
> > >
> > > - Status: Draft
> > > - Author: Xiaolong Ran
> > > - Pull request:
> > > - Mailing list discussion:
> > > - Release:
> > >
> > > The purpose of this proposal is mainly to add ReconsumerLater on the
> > > consumer side to retry in an incremental level
> > >
> > > ## Motivation
> > >
> > > At present, ReconsumeLater only supports specifying a specific delay
> time
> > > for distribution processing. The usage is as follows:
> > >
> > > ```
> > > while (true) {
> > >  Message msg = consumer.receive();
> > >
> > >  try {
> > >   // Process message...
> > >
> > >   consumer.acknowledge(msg);
> > >  } catch (Throwable t) {
> > >   log.warn("Failed to process message");
> > >   consumer.reconsumeLater(msg, 1000 , TimeUnit.MILLISECONDS);
> > >  }
> > >  }
> > > ```
> > >
> > > Its implementation principle is to use Pulsar's built-in delay message
> to
> > > pass in the specified time as the parameter
> > > of deliverAfter(), and then push the message to the consumer side again
> > > after the time arrives.
> > >
> > > This is a good idea, which allows users to flexibly define their own
> > delay
> > > time in a specific scenario. But assuming
> > > that the message is not processed correctly within the time specified
> by
> > > the user, the behavior of ReconsumerLater has
> > > ended at this time. Whether we can consider adding a retry scheme
> > according
> > > to the time level. Then when the first
> > > specified time range is not processed correctly, ReconsumerLater() can
> > > automatically retry according to the time level
> > > until the user correctly processes the specific message.
> > >
> > > ## Implementation
> > >
> > > As mentioned above, if we can here follow a certain delay level from
> low
> > to
> > > high and allow it to automatically retry

Re: [PROPOSAL] Support level increment delay for ReconsumerLater interface

2021-09-28 Thread r...@apache.org
Thanks Chris:

> It seems like an application can already do something like this without
any
> changes to the clients?

No, we can't implement such logic today. Although we already provide the
reconsumeLater interface, in practice, if there are problems in the system,
the message may still not be processed correctly after a single retry, and
we need to manually trigger reconsumeLater multiple times. So the question
is whether we can implement reconsumeLater with automatic backoff, allowing
users to retry multiple times, instead of the current solution that only
retries once.

> An application could read the message properties and then figure out the
> next time interval it should use the for the ReconsumeLater method. The
> leveling concept could be a small piece of code the application implements
> to figure out the next time to reconsume the message.

Yes. In the Go SDK implementation, we also put the time of the next retry
in the message properties and call reconsumeLater again by reading that
time from the properties. The point of disagreement now is whether we
encapsulate this logic behind a new interface exposed to the user, or let
the user implement it themselves. I prefer the first option.
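The application-side version of that leveling idea can be sketched with a plain map standing in for the message properties. The property name and the delay-level table below are illustrative assumptions, not part of the Pulsar client API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of application-side leveling: the retry count is carried in the
// message properties and mapped to the next reconsumeLater delay.
// RETRY_COUNT_PROP and DELAY_LEVELS_MS are illustrative, not client API.
public class RetryLevelSketch {
    static final String RETRY_COUNT_PROP = "RECONSUME_TIMES";
    static final long[] DELAY_LEVELS_MS = {1_000, 5_000, 10_000, 30_000, 60_000};

    // Reads the current retry count from the properties and returns the
    // delay to pass to reconsumeLater for the next attempt.
    static long nextDelayMillis(Map<String, String> properties) {
        int count = Integer.parseInt(
                properties.getOrDefault(RETRY_COUNT_PROP, "0"));
        int level = Math.min(count, DELAY_LEVELS_MS.length - 1);
        return DELAY_LEVELS_MS[level];
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(RETRY_COUNT_PROP, "2");
        // Third attempt maps to the third level, i.e. 10 seconds.
        System.out.println(nextDelayMillis(props));
    }
}
```

Wrapping exactly this logic behind a client-provided interface is what the first option amounts to.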

--

Thanks
Xiaolong Ran

On Tue, Sep 28, 2021 at 11:45 PM Chris Kellogg wrote:

> It seems like an application can already do something like this without any
> changes to the clients?
>
> An application could read the message properties and then figure out the
> next time interval it should use for the ReconsumeLater method. The
> leveling concept could be a small piece of code the application implements
> to figure out the next time to reconsume the message.
>
> I think this API is a little confusing as well. I'm all for more flexibility
> if users need that. However, in my opinion, if this feature is really needed,
> an interface should be exposed as an API as opposed to a specific
> implementation.
>
>
>
> On Tue, Sep 28, 2021 at 4:01 AM PengHui Li  wrote:
>
> > Hi Xiaolong,
> >
> > Currently, in the Pulsar client, we have ack timeout, negative ack, and
> > the reconsume later method to achieve diverse message redelivery
> > requirements. I agree with client-side flexible message redelivery
> > control, but I have a few concerns with the new API.
> >
> > 1. The new API looks very similar to the existing delay-queue based
> > implementation but is very different in nature, which might confuse
> > users.
> > 2. Can the redelivery level be specified by users? In my opinion, we
> > can provide a default exponential backoff policy for users, and we'd
> > better support customizing it.
> > 3. I think making some enhancements to the ack timeout is more
> > reasonable; the ack timeout handling would not use the delay queue if,
> > say, we have an AckTimePolicy there.
> > By default, we can support an ExponentiallyBackoffAckTimePolicy,
> > and XXXAckTimeoutPolicy and YYYAckTimeoutPolicy can be implemented by
> > users.
> >
> > Thanks,
> > Penghui.
> >
> > On Fri, Sep 10, 2021 at 4:33 PM r...@apache.org  >
> > wrote:
> >
> > > Hello everyone:
> > >
> > > I wrote a proposal to enhance the functionality of ReconsumeLater, the
> > > specific content is as follows:
> > >
> > > ---
> > >
> > > # PIP 94: Support level increment delay for ReconsumerLater interface
> > >
> > > - Status: Draft
> > > - Author: Xiaolong Ran
> > > - Pull request:
> > > - Mailing list discussion:
> > > - Release:
> > >
> > > The purpose of this proposal is mainly to add ReconsumerLater on the
> > > consumer side to retry in an incremental level
> > >
> > > ## Motivation
> > >
> > > At present, ReconsumeLater only supports specifying a specific delay
> time
> > > for distribution processing. The usage is as follows:
> > >
> > > ```
> > > while (true) {
> > >  Message msg = consumer.receive();
> > >
> > >  try {
> > >   // Process message...
> > >
> > >   consumer.acknowledge(msg);
> > >  } catch (Throwable t) {
> > >   log.warn("Failed to process message");
> > >   consumer.reconsumeLater(msg, 1000 , TimeUnit.MILLISECONDS);
> > >  }
> > >  }
> > > ```
> > >
> > > Its implementation principle is to use Pulsar's built-in delay message
> to
> > > pass in the specified time as the parameter
> > > of deliverAfter(), and then push the message to the consumer side again
> > > after the time arrives.
> > >
> > > This is a good idea, which allows users to flexibly define their own
> > delay
> > > time in a specific scenario. But assuming
> > > that the message is not processed correctly within the time specified
> by
> > > the user, the behavior of ReconsumerLater has
> > > ended at this time. Whether we can consider adding a retry scheme
> > according
> > > to the time level. Then when the first
> > > specified time range is not processed correctly, ReconsumerLater() can
> > > automatically retry according to the time level

Re: [PROPOSAL] Support level increment delay for ReconsumerLater interface

2021-09-28 Thread r...@apache.org
> 1. The new API looks very similar to the existing delay-queue based
> implementation but is very different in nature, which might confuse
> users.

Hello PenghuiLi:

In response to this problem, I am also a bit confused. The current
implementation of reconsumeLater is also based on calling DelayMessage, and
this PIP is an enhancement of the current reconsumeLater. So the question
is whether we continue to use DelayMessage, or repackage the time tracker
ourselves and expose a clean API interface.
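For reference, the kind of internal time tracker that both AckTimeout and delayed messages rely on can be sketched with a `java.util.concurrent.DelayQueue`: retries are parked in the queue and only become visible for redelivery once their delay expires. This is a toy illustration with made-up names, not the broker's or client's actual implementation.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// One parked retry: a message id plus the instant it becomes deliverable.
class RetryEntry implements Delayed {
    final String messageId;
    final long deliverAtNanos;

    RetryEntry(String messageId, long delayMs) {
        this.messageId = messageId;
        this.deliverAtNanos = System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(delayMs);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(deliverAtNanos - System.nanoTime(),
                TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS),
                other.getDelay(TimeUnit.NANOSECONDS));
    }
}

public class TimeTrackerSketch {
    // Parks two retries and drains them; take() blocks until the shortest
    // delay expires, so entries come out in delay order, not insert order.
    static String[] demoOrder() {
        try {
            DelayQueue<RetryEntry> tracker = new DelayQueue<>();
            tracker.put(new RetryEntry("msg-2", 50));
            tracker.put(new RetryEntry("msg-1", 10));
            return new String[]{tracker.take().messageId,
                    tracker.take().messageId};
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        for (String id : demoOrder()) {
            System.out.println(id);
        }
    }
}
```

Whichever mechanism is chosen, the PIP's question reduces to where this queue lives and who feeds it.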

--

Thanks
Xiaolong Ran

On Tue, Sep 28, 2021 at 4:02 PM PengHui Li wrote:

> Hi Xiaolong,
>
> Currently, in the Pulsar client, we have ack timeout, negative ack, and
> the reconsume later method to achieve diverse message redelivery
> requirements. I agree with client-side flexible message redelivery
> control, but I have a few concerns with the new API.
>
> 1. The new API looks very similar to the existing delay-queue based
> implementation but is very different in nature, which might confuse
> users.
> 2. Can the redelivery level be specified by users? In my opinion, we
> can provide a default exponential backoff policy for users, and we'd
> better support customizing it.
> 3. I think making some enhancements to the ack timeout is more
> reasonable; the ack timeout handling would not use the delay queue if,
> say, we have an AckTimePolicy there.
> By default, we can support an ExponentiallyBackoffAckTimePolicy,
> and XXXAckTimeoutPolicy and YYYAckTimeoutPolicy can be implemented by
> users.
>
> Thanks,
> Penghui.
>
> On Fri, Sep 10, 2021 at 4:33 PM r...@apache.org 
> wrote:
>
> > Hello everyone:
> >
> > I wrote a proposal to enhance the functionality of ReconsumeLater, the
> > specific content is as follows:
> >
> > ---
> >
> > # PIP 94: Support level increment delay for ReconsumerLater interface
> >
> > - Status: Draft
> > - Author: Xiaolong Ran
> > - Pull request:
> > - Mailing list discussion:
> > - Release:
> >
> > The purpose of this proposal is mainly to add ReconsumerLater on the
> > consumer side to retry in an incremental level
> >
> > ## Motivation
> >
> > At present, ReconsumeLater only supports specifying a specific delay time
> > for distribution processing. The usage is as follows:
> >
> > ```
> > while (true) {
> >     Message<byte[]> msg = consumer.receive();
> >     try {
> >         // Process message...
> >         consumer.acknowledge(msg);
> >     } catch (Throwable t) {
> >         log.warn("Failed to process message");
> >         consumer.reconsumeLater(msg, 1000, TimeUnit.MILLISECONDS);
> >     }
> > }
> > ```
> >
> > Its implementation principle is to use Pulsar's built-in delay message to
> > pass in the specified time as the parameter
> > of deliverAfter(), and then push the message to the consumer side again
> > after the time arrives.
> >
> > This is a good idea, which allows users to flexibly define their own
> > delay time in a specific scenario. But suppose the message is not
> > processed correctly within the time specified by the user; at that
> > point, the behavior of reconsumeLater has already ended. We could
> > consider adding a retry scheme based on time levels. Then, when the
> > message is not processed correctly within the first specified time
> > range, reconsumeLater() can automatically retry according to the time
> > level until the user correctly processes the specific message.
> >
> > ## Implementation
> >
> > As mentioned above, if we can here follow a certain delay level from low
> to
> > high and allow it to automatically retry,
> > it is a more user-friendly way. For example, we can define the following
> > delay level:
> >
> > ```
> > MESSAGE_DELAYLEVEL = "1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m
> > 20m 30m 1h 2h"
> > ```
> >
> >
> > In this PIP, we mainly introduce two new API interfaces to users:
> >
> > 1. Specify the delay level
> >
> > ```
> > reconsumeLater(Message message, int delayLevel)
> > ```
> >
> > This implementation method is consistent with the current reconsumeLater
> > interface, but instead of specifying a specific delay time, you specify
> > a delay level. For example, level `1` corresponds to `1s`, and level `3`
> > corresponds to `10s`.
> >
> >
> > 2. Retry with increasing level
> >
> > ```
> > reconsumeLater(Message message)
> > ```
> >
> > Different from the above, this is a back-off retry: the retry interval
> > after the first failure is 1 second, the retry interval after the second
> > failure is 5 seconds, and so on; the more retries, the longer the
> > interval.
> >
> > This kind of retry mechanism often has more practical applications in
> > business scenarios. If the consumption fails,
> > the general service will not be restored immediately. It is more
> reasonable
> > to use this gradual retry method.
> >
> >
> > ## Compatibility
> >
> > The current proposal will not break compatibility. It exposes two
> > new API interfaces based on the
> >

Re: [PIP 100] Add seekByIndex for consumer api

2021-09-28 Thread JiangHaiting
Hi Yunze,


Thanks for the reminder.
I will create the issue and a new mail thread for this PIP.


-- Original --
From: "dev" <dev@pulsar.apache.org>

2. The proposal is sent to the dev@pulsar.apache.org mailing list
 to start the discussion, using subject prefix `[PIP] xxx`.
3. Based on the discussion and feedback, some changes might be applied by
 authors to the text of the proposal.
4. Once some consensus is reached, there will be a vote to formally approve
 the proposal.
 The vote will be held on the dev@pulsar.apache.org mailing list. Everyone
 is welcome to vote on the proposal, though it will be considered binding
 only for the votes of PMC members.
 It would be required to have a lazy majority of at least 3 binding +1 votes.
 The vote should stay open for at least 48 hours.
5. When the vote is closed, if the outcome is positive, the state of the
 proposal is updated and the Pull Requests associated with this proposal can
 start to get merged into the master branch.

Thanks,
Yunze

> On Sep 28, 2021, at 10:04, JiangHaiting wrote:
> https://github.com/apache/pulsar/pull/12032
>
> Thanks,
> Haiting Jiang (Github: Jason918)





---
Thanks,
Haiting Jiang (Github: Jason918)

[PIP] Add seek by index feature for consumer

2021-09-28 Thread JiangHaiting
Hi Pulsar Community,

I would like to propose a new feature for consumer, seek by index.
Currently we can reset the read position of a cursor by message id or
timestamp.
Since we introduced the index in broker entry metadata in 2.8.0,
resetting the cursor by index is very useful in some use cases,
e.g. in other protocol handlers (KoP or RoP),
or when "users might want to seek to 1 message before."

Here is the issue and PR for this PIP
https://github.com/apache/pulsar/issues/12234
https://github.com/apache/pulsar/pull/12032
---
Thanks,
Haiting Jiang (Github: Jason918)

Re: Correct semantics of producer close

2021-09-28 Thread Joe F
>I don't think that implementing `closeAsync` with graceful shutdown
logic implies a guarantee of message publishing. Rather, it guarantees
that failures will be the result of a real exception or a timeout.

I think that's beside the point. There is no definition of "real"
exceptions. At that point the app is publishing on a best-effort basis,
and there are no guarantees anywhere in the client or server.

There is no concept of "maybe published" or
"published-if-no-real-errors". What does that even mean? That is only a
can of worms that is going to add to developer confusion and lead to
Pulsar users finding out in the worst possible way that something got lost
because it never got published. It's a poor experience when you find it.
I have a real-life experience where a user used async APIs (in a lambda),
which hummed along fine. One day much later, the cloud had a hitch, and
they discovered a message was not delivered.

I am more concerned about developers discovering at the worst possible time
that "published-if-no-real-errors" is a concept.

My suggestion is to make this simple for developers.

The sync/async nature of close() [or any other API, for that matter] is
completely orthogonal to the API semantics, and is just a programmatic
choice to deal with how resources are managed within the program. That's
not material here.

A close() is an action that shuts down the producer right now, without
even waiting for acks of in-flight messages. A willingness to lose
pending/in-flight messages is explicit in that call. The producer will not
be around to deal with errors or to retry failed messages once close() is
invoked.

On the contrary, if the client does not want to deal with message loss,
then flush(), stick around to gather the acks, deal with errors and
retries, etc., and then do close(). Then close() will be just a resource
management action on the client.
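That flush-then-close pattern can be sketched with a toy producer standing in for the Pulsar client (nothing below is the real Pulsar API): keep the send futures, flush to gather the acks, then treat close() as pure resource cleanup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Toy producer: tracks in-flight acks so flushAsync() can wait for them.
// The real client completes acks when the broker responds; the toy "broker"
// acks immediately, which is enough to show the ordering of the calls.
class ToyProducer {
    private final List<CompletableFuture<Long>> inFlight = new ArrayList<>();
    private long nextId = 0;
    private boolean closed = false;

    CompletableFuture<Long> sendAsync(String payload) {
        CompletableFuture<Long> ack = new CompletableFuture<>();
        inFlight.add(ack);
        ack.complete(nextId++); // immediate ack in this toy
        return ack;
    }

    // Resolves once every in-flight ack has completed.
    CompletableFuture<Void> flushAsync() {
        return CompletableFuture.allOf(
                inFlight.toArray(new CompletableFuture[0]));
    }

    void close() { closed = true; }
    boolean isClosed() { return closed; }
}

public class FlushThenCloseSketch {
    public static void main(String[] args) {
        ToyProducer producer = new ToyProducer();
        CompletableFuture<Long> ack = producer.sendAsync("hello");
        // Gather acks (and handle errors) before close, so close() carries
        // no uncertainty about pending messages.
        producer.flushAsync().join();
        System.out.println("acked message id: " + ack.join());
        producer.close();
    }
}
```

With this ordering, any failure surfaces from the send futures or the flush, never from close() itself.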

So update the documentation to reflect that: if close() is called on a
producer with messages pending acks, those messages are left in doubt.
Avoid all mention of flushes, best effort, etc. Users must buy into
uncertainty, without any qualifications.

I would at all costs avoid using the term "graceful" anywhere.  That word
has specific semantics associated with it in the systems/storage domain ,
and what is being proposed here is nothing like that.

-j


On Tue, Sep 28, 2021 at 7:05 PM Yunze Xu 
wrote:

> It’s a good point that `ProducerImpl#failPendingBatchMessages` treats
> messages in batch container also as pending messages.
>
> I agree with your definition of "graceful close". It's more like an "at
> most once" semantics, like the original JavaDoc said:
>
> > pending writes will not be retried
>
> Thanks,
> Yunze
>
> > On Sep 29, 2021, at 5:24 AM, Michael Marshall wrote:
> >
> > Thanks for bringing this thread to the mailing list, Yunze.
> >
> > I think the right change is to update the `closeAsync` method to first
> > flush `batchMessageContainer` and to then asynchronously wait for the
> > `pendingMessages` queue to drain. We could add a new timeout or rely
> > on the already implemented `sendTimeout` config to put an upper time
> > limit on `closeAsync`. My reasoning as well as responses to Joe and
> > Yunze follow:
> >
> >> we need to define the behavior for how to process `pendingMessages`
> >> and `batchMessageContainer` when producer call `closeAsync`.
> >
> > Yes, this is exactly the clarification required, and I agree that the
> > Javadoc is ambiguous and that the implementation doesn't align with
> > the Javadoc.
> >
> > If we view the Javadoc as binding, then the fundamental question is
> > what messages are "pending". The `pendingMessages` seem pretty easy to
> > classify as "pending" given that they are already in flight on the
> > network.
> >
> > I also consider `batchMessageContainer` to be "pending" because a
> > client application already has callbacks for the messages in this
> > container. These callbacks are expected to complete when the batch
> > message delivery completes. Since the client application already has a
> > reference to a callback, it isn't a problem that the producer
> > implementation initiates the flush logic. (Note that the current
> > design fails the `pendingMessages` but does not fail the
> > `batchMessageContainer` when `closeAsync` is called, so the callbacks
> > for that container are currently left incomplete forever if the client
> > is closed with an unsent batch. We will need to address this design in
> > the work that comes from this discussion.)
> >
> > Further, the `ProducerImpl#failPendingMessages` method includes logic
> > to call `ProducerImpl#failPendingBatchMessages`, which implies that
> > these batched, but not sent, messages have been historically
> > considered "pending".
> >
> > If we view the Javadoc as non-binding, I think my guiding influence
> > for the new design would be that the `closeAsync` method should result
> > in a "graceful" shutdown of the client.