> However, with this optional field approach, we don't need to introduce a
> new protocol version, which could be arguably beneficial.

I don't see versioning the protocol as a negative as long as there is
sufficient benefit. In this case, the new command serves an explicit
purpose to make the protocol more reactive (the client knows what is
happening as it happens without any need to poll) and less wasteful
(no unnecessary data transfer).

One reason I want to push on this design is because I think it will be
harder to change once it is in place. Adding the new broker URL to the
close producer and close consumer commands necessarily couples the
events. Using separate commands allows them to be decoupled, and in my
opinion, that matches the reality of the problem. We know we want to
close the producer/consumer before we know where they should connect
next.

> I will leave the decision to the community.

I do not speak for the community (hopefully that's obvious), but I am
a community member advocating for a slightly more complex solution
that, in my opinion, will solve one part of the transfer problem. :) I
don't think anyone is going to say "you have to use new commands". I
think the goal of this PIP discussion is to come to a consensus so we
can move forward confidently.

> I will be replying to the PR.

Thanks, that discussion has been very productive!

Thanks,
Michael

On Thu, Aug 10, 2023 at 7:29 PM Heesung Sohn
<heesung.s...@streamnative.io.invalid> wrote:
>
> On Tue, Aug 8, 2023 at 9:56 PM Michael Marshall <mmarsh...@apache.org>
> wrote:
>
> > Thanks for your proposal, Heesung. Sorry for my late response. Just
> > want to make a few points to hopefully improve the implementation.
> > Overall, I think this feature is absolutely the right direction for
> > Pulsar and will be a great improvement. I remember discussing this at
> > some community meetings last year. See the last paragraph of [0].
> >
> > In general, my primary question is about the protocol changes you're
> > proposing, both the commands and the way the commands are sent.
> >
> > > I am worried about the backward compatibility if we add a new proto
> > message
> > > -- how can the old versions of clients handle this new proto message?
> >
> > The protocolVersion [1] object in the proto definition is how we
> > guarantee backwards and forwards compatibility for all combinations of
> > brokers/clients. The client and the broker each share their version
> > during the handshake and then they only use commands known to both
> > peers.
> >
> > We can introduce a new command here if we want/need. The nuance is
> > that the server logic will branch depending on the client's
> > protocolVersion.
> >
>
> Good Point. We can introduce new commands here instead of the optional
> fields by branching the protocolVersion.
> If the community thinks the new commands are cleaner, I can create new
> commands for this, as the protocolVersion guarantees the
> backward-compatbility.
> However, with this optional field approach, we don't need to introduce a
> new protocol version, which could be arguably beneficial.
> I will leave the decision to the community. Let me know if I have to use
> new commands and increase the protocolVersion for this change.
>
>
>
> > Next, I am wondering if there are any opportunities to decrease the
> > latency even more than are proposed in the PIP by introducing some
> > concurrency to decouple operations like creating new connections and
> > initializing resources used by each topic (like a managed ledger).
> > Because I used a mermaid diagram, I put the majority of this point in
> > a PR comment [2].
> >
>
>  I will be replying to the PR.
>
>
> > Thanks,
> > Michael
> >
> > [0] https://lists.apache.org/thread/ghlsprmqgkt8lfv634groy93c5dm1k1g
> > [1]
> > https://github.com/apache/pulsar/blob/15453964f6e6086b0c5e34736958cf749f16f5e1/pulsar-common/src/main/proto/PulsarApi.proto#L242-L268
> > [2] https://github.com/apache/pulsar/pull/20748#discussion_r1287927655
> >
> > On Fri, Aug 4, 2023 at 3:41 PM Heesung Sohn
> > <heesung.s...@streamnative.io.invalid> wrote:
> > >
> > > Hi,
> > >
> > > I updated the PIP to add more details about the recent discussion.
> > >
> > > Thanks,
> > > Heesung
> > >
> > > On Thu, Aug 3, 2023 at 6:46 PM Heesung Sohn <
> > heesung.s...@streamnative.io>
> > > wrote:
> > >
> > > >
> > > >
> > > > On Wed, Aug 2, 2023 at 8:02 PM PengHui Li <peng...@apache.org> wrote:
> > > >
> > > >> > One of the advantages of this proposal (add these optional
> > dstBrokerUrl
> > > >> fields in the existing CommandCloseProducer and CommandCloseConsumer)
> > is
> > > >> the backward compatibility
> > > >>
> > > >> Ah, yes. I missed this point. It's a good convince.
> > > >>
> > > >> > After the client connects to the destination broker, the next
> > > >> command(e.g.
> > > >> > ProducerCommand) requires
> > > >> > the destination broker to check the ownership again against its
> > local
> > > >> table
> > > >> > view of the bundle state channel.
> > > >>
> > > >> > Upon this local ownership check, there could be the following
> > scenarios:
> > > >>
> > > >> > Happy case:
> > > >> > - If the ownership state is `owned ` and the current broker is
> > indeed
> > > >> the
> > > >> > owner, the command completes.
> > > >> > Unhappy cases:
> > > >> > - If the ownership state is `owned ` and the current broker is not
> > the
> > > >> > owner, the command fails
> > > >> > (the broker returns an error to the client), and the client tries to
> > > >> find
> > > >> > the true new owner by lookups.
> > > >> > The global bundle state channel is eventually consistent, and the
> > > >> lookups
> > > >> > should eventually converge.
> > > >> > - if the ownership change is still in progress(`releasing`,
> > > >> `assigning`),
> > > >> > this check will be deferred
> > > >> > until the state becomes `owned` with a timeout.
> > > >>
> > > >> Could you please also update the explanation in the proposal?
> > > >> It looks like what is guaranteed and what is not in the new proposal.
> > > >> And why the mechanism can work with the unhappy case.
> > > >>
> > > >> Both the existing mechanism and the new mechanism follow the
> > > >> eventual consistency. The nuance is that the new mechanism will change
> > > >> the bundle owner first and then unload the topics. For the broker
> > logs,
> > > >> will more error messages related to the fenced ledger be introduced?
> > > >> Or the client reconnects will happen after the owner cache is updated?
> > > >>
> > > >>
> > > > As described in the protocol change in the PIP, the goal is to have
> > fewer
> > > > fenced ledger errors because the protocol change helps the client
> > > > reconnection(open ledger)
> > > >  happen at the very end.
> > > >
> > > >
> > > >> If the load manager crashed before unloading the topics (close
> > producers
> > > >> and consumers).
> > > >> Will the old broker provide service for this topic until the client
> > > >> triggers a
> > > >> new lookup?
> > > >>
> > > >
> > > > Sorry, I am not sure if I understood your question here, but
> > > > When the destination or source broker crashes in the middle,
> > > > the leader will find the orphan state and clean the ownership by
> > selecting
> > > > a new owner, and the client will reconnect to it.
> > > > During this transfer process, if alive, the source broker will serve
> > the
> > > > topic according to the protocol described in the PIP.
> > > >
> > > >
> > > >> Sorry, Another question about PIP-192. Is it required by the new
> > > >> mechanism?
> > > >> Will the situation also be improved without enabling PIP-192?
> > > >> We have added the new owner broker to the close producer and consumer
> > > >> command.
> > > >> We can also update the owner and then close the producer and consumers
> > > >> with
> > > >> the new owner.
> > > >>
> > > >
> > > > This PIP is based on PIP-192 because the proposed protocol change is
> > based
> > > > on the bundle states from PIP-192.
> > > > I assume the community could raise other PIPs to utilize the close
> > > > producer and consumer command changes for other logic too.
> > > >
> > > >
> > > >>
> > > >> Regards,
> > > >> Penghui
> > > >>
> > > >> On Thu, Aug 3, 2023 at 4:38 AM Heesung Sohn
> > > >> <heesung.s...@streamnative.io.invalid> wrote:
> > > >>
> > > >> > Hi PengHui,
> > > >> > Thank you for your questions. Please find my answers inline.
> > > >> >
> > > >> > On Wed, Aug 2, 2023 at 3:47 AM PengHui Li <peng...@apache.org>
> > wrote:
> > > >> >
> > > >> > > Hi Heesung,
> > > >> > >
> > > >> > > I'm sorry for not getting back to you sooner.
> > > >> > > The motivation and the solution look good to me.
> > > >> > >
> > > >> > > I just want to leave some comments to make the proposal clear
> > > >> > > for users and developers.
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > ----------------------------------------------------------------------------------------------------
> > > >> > >
> > > >> > > We should have a section to describe the values that will be added
> > > >> > > by this proposal.
> > > >> > >
> > > >> > > - Reduced the publish latency spike and e2e latency spike during
> > the
> > > >> > > reconnecting
> > > >> > > - Mitigated bottleneck(lookup) of a large number of topics in a
> > > >> cluster
> > > >> > >
> > > >> > > Maybe more, I can only think about the above two values.
> > > >> > >
> > > >> > >
> > > >> > Good point. I will clarify this part in the PIP.
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > ----------------------------------------------------------------------------------------------------
> > > >> > >
> > > >> > > For the proto changes, the client side can also use the HTTP
> > lookup
> > > >> > > service.
> > > >> > > We should also mention it in this proposal. Instead of adding
> > four new
> > > >> > > fields to
> > > >> > > the CommandCloseProducer and CommandCloseConsumer, maybe adding a
> > new
> > > >> > > message BrokerIdentifier, or others is better.
> > > >> > >
> > > >> >
> > > >> > I am worried about the backward compatibility if we add a new proto
> > > >> message
> > > >> > -- how can the old versions of clients handle this new proto
> > message?
> > > >> >
> > > >> > One of the advantages of this proposal (add these optional
> > dstBrokerUrl
> > > >> > fields in the existing CommandCloseProducer and
> > CommandCloseConsumer) is
> > > >> > the backward compatibility -- older clients can still work in the
> > older
> > > >> way
> > > >> > by going through the lookup requests.
> > > >> >
> > > >> > Ref: https://protobuf.dev/overview/#updating-defs
> > > >> > Updating Proto Definitions Without Updating Code
> > > >> > <https://protobuf.dev/overview/#updating-defs>
> > > >> >
> > > >> > It’s standard for software products to be backward compatible, but
> > it is
> > > >> > less common for them to be forward compatible. As long as you follow
> > > >> > some simple
> > > >> > practices <https://protobuf.dev/programming-guides/proto3/#updating
> > >
> > > >> when
> > > >> > updating .proto definitions, old code will read new messages without
> > > >> > issues, ignoring any newly added fields. ...
> > > >> >
> > > >> >
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > ----------------------------------------------------------------------------------------------------
> > > >> > >
> > > >> > > There are two points we need to clarify in the proposal.
> > > >> > >
> > > >> > > - If the client reconnects before the new owner takes ownership.
> > If
> > > >> the
> > > >> > > producer closed
> > > >> > >   after the new broker took ownership, would it be a potential
> > > >> > consistency
> > > >> > > issue?
> > > >> >
> > > >> > - If the owner changed again before the client reconnection. How did
> > > >> Pulsar
> > > >> > > handle this case?
> > > >> > >    As I understand, it's fine because the broker will check the
> > > >> ownership
> > > >> > > again when handling
> > > >> > >    producer and consumer creation. If it can be explained in the
> > > >> > proposal,
> > > >> > > it should be great
> > > >> > >    for developers to understand it deeply.
> > > >> > >
> > > >> >
> > > >> > After the client connects to the destination broker, the next
> > > >> command(e.g.
> > > >> > ProducerCommand) requires
> > > >> > the destination broker to check the ownership again against its
> > local
> > > >> table
> > > >> > view of the bundle state channel.
> > > >> >
> > > >> > Upon this local ownership check, there could be the following
> > scenarios:
> > > >> >
> > > >> > Happy case:
> > > >> > - If the ownership state is `owned ` and the current broker is
> > indeed
> > > >> the
> > > >> > owner, the command completes.
> > > >> > Unhappy cases:
> > > >> > - If the ownership state is `owned ` and the current broker is not
> > the
> > > >> > owner, the command fails
> > > >> > (the broker returns an error to the client), and the client tries to
> > > >> find
> > > >> > the true new owner by lookups.
> > > >> > The global bundle state channel is eventually consistent, and the
> > > >> lookups
> > > >> > should eventually converge.
> > > >> > - if the ownership change is still in progress(`releasing`,
> > > >> `assigning`),
> > > >> > this check will be deferred
> > > >> > until the state becomes `owned` with a timeout.
> > > >> >
> > > >> >
> > > >> > > Regards,
> > > >> > > Penghui
> > > >> > >
> > > >> > > On Wed, Aug 2, 2023 at 1:18 AM Heesung Sohn
> > > >> > > <heesung.s...@streamnative.io.invalid> wrote:
> > > >> > >
> > > >> > > > Hi,
> > > >> > > >
> > > >> > > > It has been over three weeks since this discussion started, and
> > I
> > > >> > replied
> > > >> > > > to the questions in the PIP.
> > > >> > > > I will send out a vote email tomorrow if looking good.
> > > >> > > >
> > > >> > > > Thanks,
> > > >> > > > Heesung
> > > >> > > >
> > > >> > > > On Tue, Jul 25, 2023 at 1:12 PM Heesung Sohn <
> > > >> > > heesung.s...@streamnative.io
> > > >> > > > >
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > > > Hi,
> > > >> > > > >
> > > >> > > > > I replied to the comments from Kai and Ran in the PIP.
> > > >> > > > >
> > > >> > > > > Please review the PIP by any chance.
> > > >> > > > >
> > > >> > > > > Thank you,
> > > >> > > > > Heesung
> > > >> > > > >
> > > >> > > > > On Thu, Jul 6, 2023 at 4:32 PM Heesung Sohn <
> > > >> > > > heesung.s...@streamnative.io>
> > > >> > > > > wrote:
> > > >> > > > >
> > > >> > > > >> Hi dev,
> > > >> > > > >>
> > > >> > > > >> I proposed this PIP,
> > https://github.com/apache/pulsar/pull/20748
> > > >> ,
> > > >> > to
> > > >> > > > >> make unloaded clients directly and gracefully connect to new
> > > >> owner
> > > >> > > > brokers
> > > >> > > > >> without lookups.
> > > >> > > > >>
> > > >> > > > >> Please take a look and let me know what you think.
> > > >> > > > >>
> > > >> > > > >> Regards,
> > > >> > > > >> Heesung
> > > >> > > > >>
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> >

Reply via email to