Re: [DISCUSS] Planning for Apache Pulsar 3.0

Massimiliano Mirelli Thu, 13 Oct 2022 05:01:18 -0700

Like Lari, I hear your concerns about not breaking client API
compatibility, but I share his view of being playful about the changes.
IMO, this mindset is essential for brainstorming. When it comes to delivery,
we should then act responsibly and according to a plan. The plan is what
ensures that compatibility is not at stake, and it describes the rollout
phases. So, I read Lari's thread as "Pulsar Community, let's *responsibly*
play".


But now, let's continue with the brainstorming. I am not sure whether my
suggestion is appropriate or whether pulsar-perf already supports this
(perhaps partially), so feedback and/or pointers to relevant material are
much appreciated. On my wishlist for Pulsar tooling is a pulsar-perf
subcommand that allows more advanced E2E performance validation of the
platform. Basically, it would provide E2E client-side metrics, for example:
1. percentile latencies
2. message loss
3. message re-ordering
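
To make the three metrics above concrete, here is a minimal Python sketch of
how they could be computed on the consumer side. It is only an illustration,
not existing pulsar-perf functionality: the `Received` record, its field
names, and `e2e_report` are hypothetical. It assumes the producer stamps each
message with an increasing sequence number and a send timestamp, and that
producer and consumer clocks are synchronized well enough for the latency
figures to be meaningful.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Received:
    """One consumed message (hypothetical record, not a Pulsar type)."""
    seq: int        # sequence number assigned by the producer
    sent_ms: float  # producer-side send timestamp, in milliseconds
    recv_ms: float  # consumer-side receive timestamp, in milliseconds


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list; p is in (0, 100]."""
    if not sorted_values:
        return None
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def e2e_report(sent_count, received):
    """Summarise latency, loss and re-ordering for one producer's stream.

    `received` must be in arrival order. A message counts as re-ordered
    when it arrives after a message with a higher sequence number.
    """
    latencies = sorted(r.recv_ms - r.sent_ms for r in received)
    seen = set()
    duplicates = 0
    reordered = 0
    highest = -1
    for r in received:
        if r.seq in seen:
            duplicates += 1
            continue
        seen.add(r.seq)
        if r.seq < highest:
            reordered += 1
        else:
            highest = r.seq
    return {
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
        "lost": sent_count - len(seen),  # sent but never observed
        "reordered": reordered,
        "duplicates": duplicates,
    }


if __name__ == "__main__":
    # Five messages sent (seq 0..4); seq 3 never arrives, seq 1 arrives late.
    arrivals = [
        Received(0, 0.0, 10.0),
        Received(2, 0.0, 30.0),
        Received(1, 0.0, 20.0),
        Received(4, 0.0, 40.0),
    ]
    print(e2e_report(5, arrivals))
```

A real tool would probably use a streaming histogram instead of sorting all
samples, and would track sequence numbers per producer and per partition.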

Thank you,
Max

On Wed, 12 Oct 2022 at 13:18, Enrico Olivelli <eolive...@gmail.com> wrote:

> On Wed, 12 Oct 2022 at 00:40, Matteo Merli
> <matteo.me...@gmail.com> wrote:
> >
> > Agree, though let's make separate discussions. Putting all random
> > ideas into the same cauldron is a good recipe for making no one able
> > to follow or see a common line.
> >
> > That's what I meant when I started the proposal of having 3.0
> > completely detached from "features".
> >
> > If you start making a big container, you're going to fill it up with
> > all the "breaking changes" that you want to include, because "hey,
> > that's the only window of opportunity". That in turn is the surest way
> > to not ship anything for the next 24 months, as all the changes are
> > unavoidably going to get delayed and will take a long time to
> > stabilize.
> >
> > Going back to API breakages:
> >  1. We never break wire protocol compatibility
> >  2. We try to never break client API
> >  3. We need a very good & compelling reason in order to break client API
> >  4. When we do so, we need to provide a clear path for users (e.g., a
> > pulsar-client-1.x compatible drop-in).
>
>
> I totally agree with Matteo and Joe:
> we cannot break compatibility.
>
> If we need to introduce some new form of inter-broker communication
> protocol, we must support the current protocol and provide a smooth
> upgrade path.
>
> Now there are many Pulsar clusters around the world that cannot
> tolerate stop-the-world upgrades
> and we MUST also allow some sort of rollback in case of problems.
>
>
>
> Enrico
>
> >
> >
> >
> >
> >
> >
> > --
> > Matteo Merli
> > <matteo.me...@gmail.com>
> >
> > On Tue, Oct 11, 2022 at 11:58 AM Dave Fisher <w...@apache.org> wrote:
> > >
> > > Let’s discuss any and all ideas for improvement. As each is discussed,
> > > we can figure out how to make them non-breaking. We all want Pulsar to
> > > improve.
> > >
> > > We should encourage an open discussion where no idea is automatically
> > > bad or wrong. They can just be discussed without fear.
> > >
> > > Thanks,
> > > Dave
> > >
> > > > On Oct 10, 2022, at 3:05 PM, Joe F <joefranc...@gmail.com> wrote:
> > > >
> > > > I would prefer that we avoid using the term “breaking changes”, which
> > > > is too vague to convey any specific meaning. So let me try to bring
> > > > some clarity.
> > > >
> > > >
> > > > There have been many changes to implementations, APIs and data storage
> > > > formats in Pulsar (and BookKeeper also). I have deployed many of these
> > > > changes to production. And I know that Matteo and Rajan (and others
> > > > too, about whom I’m not up to date) have implemented and deployed many
> > > > such changes. But none of those changes ever required taking the
> > > > system offline. NONE.
> > > >
> > > >
> > > > Pulsar was developed as a 24x7x365 system, and rolling upgrades and
> > > > rollbacks were a given. Like “this is water”, there was no special
> > > > callout needed for declaring this reality. No change, including
> > > > enhancements to wire protocols, broke client compatibility. Existing
> > > > clients continued to work; they might not be able to use all the new
> > > > features, but using new features would require the app to be rebuilt
> > > > anyway. (Checksums and e2e encryption are examples.)
> > > >
> > > >
> > > > We have even succeeded in getting Pulsar adopted for some use cases
> > > > just because the complexity of upgrading from K’s old clients to new
> > > > ones was costly enough to allow consideration of an alternative like
> > > > Pulsar. The business cost of forcing a client upgrade can be
> > > > significant, to the point of being unviable for the business. That
> > > > just cannot be hand-waved away.
> > > >
> > > >
> > > > There have also been changes in storage formats (the ZK metadata
> > > > change from text to binary is an example). But through all such
> > > > changes, compatibility and upgradeability have been a given. There has
> > > > never been a situation where a live Pulsar upgrade was not possible
> > > > or a coordinated client upgrade was mandatory.
> > > >
> > > >
> > > > So the question should not be about whether “significant” changes
> > > > should be made or not. Changes can be made and released in a way that
> > > > breaks *business*, or they can be made in a way that lets businesses
> > > > sail smoothly through that change. So the question is about how such
> > > > changes get rolled out.
> > > >
> > > >
> > > > And to that question, my strong opinion is that any change that does
> > > > not allow a live/rolling upgrade or rollback, or anything that forces
> > > > a client to upgrade just to continue functioning, is a non-starter.
> > > > All changes can be made in a compatible, phased manner, and in a way
> > > > that does not penalise older versions (older versions doing worse on
> > > > new releases is also not an acceptable way of making changes). Changes
> > > > can be made in a manner that makes A/B testing possible for the user,
> > > > with limited risk, who can then choose not to go back. It has all
> > > > been done in Pulsar before.
> > > >
> > > >
> > > > Would that be harder than just breaking stuff? Yes. But that is far
> > > > preferable to forcing users to take a hit.
> > > >
> > > >
> > > > -joe
> > > >
> > > > On Sat, Oct 8, 2022 at 1:25 PM Rajan Dhabalia <rdhaba...@apache.org> wrote:
> > > >
> > > > >> I would say first we should gather a list of changes which we want
> > > > >> to target and find out which improvements really need a major
> > > > >> version release. We can take the Pulsar-1.0 to Pulsar-2.0 upgrade as
> > > > >> an example of how to avoid major interruption and impact on existing
> > > > >> systems and still achieve our goal. So, the first step is discovery
> > > > >> of such features, and then we can discuss how to introduce them in
> > > > >> Pulsar with minimum impact on existing systems.
> > > >>
> > > >> Thanks,
> > > >> Rajan
> > > >>
> > > > >> On Sat, Oct 8, 2022 at 1:05 PM Devin Bost <devin.b...@gmail.com> wrote:
> > > >>
> > > > >>> I'm noticing some pushback on the idea of pre-emptively proposing
> > > > >>> any kind of breaking upgrade that would necessitate cutting a 3.0
> > > > >>> release. I do understand the concern about introducing a breaking
> > > > >>> change... For a distributed messaging application like Pulsar, if
> > > > >>> clients needed to be upgraded simultaneously with brokers, that
> > > > >>> could be extremely difficult or infeasible for companies to
> > > > >>> coordinate without treating it like a migration to a new technology.
> > > >>>
> > > > >>> At the same time, do we want to be completely closed to the
> > > > >>> possibility that a breaking change could be required at some point
> > > > >>> in the future? If a circumstance like that appears, those are the
> > > > >>> kinds of situations that can lead to a fork. Are there certain kinds
> > > > >>> of breaking changes that are more acceptable than others?
> > > >>>
> > > > >>> Also, if the forward-looking plan is to never introduce breaking
> > > > >>> changes, when *would* we ever cut a Pulsar 3.x release? Do we have
> > > > >>> any criteria on what kinds of changes would necessitate cutting a
> > > > >>> new major release but would still be considered acceptable by the
> > > > >>> community?
> > > >>>
> > > >>> --
> > > >>> Devin Bost
> > > >>> Sent from mobile
> > > >>> Cell: 801-400-4602
> > > >>>
> > > > >>> On Sat, Oct 8, 2022, 2:14 PM Rajan Dhabalia <rdhaba...@apache.org> wrote:
> > > >>>
> > > > >>>> This makes it sound as if the current state of Apache Pulsar has a
> > > > >>>> lot of issues and requires fundamental design changes to make it
> > > > >>>> promising, which is definitely not true, and I disagree with it. I
> > > > >>>> would also be careful about comparing with Kafka, as I still don't
> > > > >>>> think the Kafka release has anything to do with Pulsar's
> > > > >>>> improvement. I would still recommend listing all the changes in one
> > > > >>>> place so we can bring everyone onto the same page, discuss as a
> > > > >>>> community, and make sure existing use cases continue using Pulsar
> > > > >>>> rather than looking for alternatives because of a mistaken
> > > > >>>> impression of the disruption and effort needed to upgrade or
> > > > >>>> maintain Pulsar.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Rajan
> > > >>>>
> > > > >>>> On Fri, Oct 7, 2022 at 7:49 PM Lari Hotari <lhot...@apache.org> wrote:
> > > >>>>
> > > >>>>> We could all have our own favorite names for this work. :)
> > > >>>>>
> > > > >>>>> There's advice that you should disrupt yourself before someone
> > > > >>>>> disrupts you.
> > > > >>>>> Shouldn't we follow that advice for Apache Pulsar? We can disrupt
> > > > >>>>> Pulsar together with our Apache hats on. The catch is that since we
> > > > >>>>> are doing this, we will be able to learn and improve Pulsar so that
> > > > >>>>> we stay ahead of the competition. Pulsar was a long way ahead of the
> > > > >>>>> competition for so many years, but Kafka is finally catching up. Did
> > > > >>>>> Kafka surpass Pulsar in some aspects with the recent 3.3 release,
> > > > >>>>> where KRaft became GA? That's a question that many might be asking.
> > > > >>>>> Why wouldn't we rev up Pulsar's engine and show the tail lights to
> > > > >>>>> Kafka?
> > > >>>>>
> > > > >>>>> We don't have to have deadlines or any restrictions like that right
> > > > >>>>> now. The sky's the limit.
> > > > >>>>> Linus Torvalds has written a book called "Just for fun". I got my
> > > > >>>>> copy of this book signed by Linus himself in 2000 at an event that
> > > > >>>>> the book publisher had organized in Finland.
> > > >>>>>
> > > > >>>>> What if we did this "just for fun"? The intention could also be to
> > > > >>>>> beat Kafka, but that could be a boring goal for many. What if we
> > > > >>>>> could unleash some talent that is among us and hasn't had a chance
> > > > >>>>> to show its full potential? Open source is about joy. It is about
> > > > >>>>> welcoming everyone to join. Open source should be egoless, although
> > > > >>>>> we must all admit that we don't succeed in that aspect. We must
> > > > >>>>> fight our biases.
> > > >>>>>
> > > > >>>>> Jarek Potiuk explains the importance of being welcoming for success
> > > > >>>>> at Apache, in a 3-minute YouTube interview:
> > > > >>>>> https://www.youtube.com/watch?v=Dx5kQnVFo7E
> > > > >>>>> This interview is about Jarek's blog post "Success at Apache:
> > > > >>>>> Welcoming communities strengthens the Apache way":
> > > > >>>>> https://news.apache.org/foundation/entry/success-at-apache-welcoming-communities
> > > > >>>>> I was pleased to meet Jarek at ApacheCon among so many other
> > > > >>>>> welcoming personalities of the Apache community and the Apache
> > > > >>>>> Pulsar community.
> > > >>>>>
> > > >>>>> Goals have to be ambitious. What if we set the bar really high?
> > > >>>>> Apache Pulsar with 10 million topics in a cluster?
> > > >>>>> Why not go up to 100 million topics?
> > > >>>>> Just for fun. :)
> > > >>>>>
> > > >>>>> -Lari
> > > >>>>>
> > > >>>>> On 2022/10/07 22:53:59 Matteo Merli wrote:
> > > > >>>>>> I actually disagree with the term "Pulsar Next Gen", because I
> > > > >>>>>> haven't seen any proposal for which it would make sense to me to
> > > > >>>>>> be called so.
> > > > >>>>>>
> > > > >>>>>> Rajan: That's the whole point of breaking it down. If you
> > > > >>>>>> accumulate many "big" changes, it introduces a lot of risk of
> > > > >>>>>> instabilities and incompatibilities. Breaking it down into multiple
> > > > >>>>>> steps helps to see the incremental changes and introduce them in a
> > > > >>>>>> phased manner.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> --
> > > >>>>>> Matteo Merli
> > > >>>>>> <matteo.me...@gmail.com>
> > > >>>>>>
> > > > >>>>>> On Fri, Oct 7, 2022 at 3:37 PM Rajan Dhabalia <rdhaba...@apache.org> wrote:
> > > >>>>>>>
> > > >>>>>>> Hi,
> > > >>>>>>>
> > > > >>>>>>> Can we get the list of changes that we are planning to include in
> > > > >>>>>>> 3.0 in one place? One thing I would like to see in a major release
> > > > >>>>>>> is that it CANNOT impact existing use cases and users in any way
> > > > >>>>>>> that forces them to upgrade the client library. Applications using
> > > > >>>>>>> a < 3.0 version should continue getting all the client- and
> > > > >>>>>>> server-side enhancements and bug fixes. Failing to provide bug
> > > > >>>>>>> fixes and features to clients < 3.0 means we are forcing them to
> > > > >>>>>>> upgrade the client version and put in effort to handle every
> > > > >>>>>>> incompatibility. That's something we should definitely prevent,
> > > > >>>>>>> because Apache Pulsar is used by many large-scale business use
> > > > >>>>>>> cases and we should accommodate and motivate them to continue
> > > > >>>>>>> using Apache Pulsar.
> > > > >>>>>>> I understand that as a Pulsar community we should always try to
> > > > >>>>>>> progress and build better, but not at the cost of losing or
> > > > >>>>>>> reducing the Apache Pulsar community.
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Rajan
> > > >>>>>>>
> > > >>>>>>>
> > > > >>>>>>> On Fri, Oct 7, 2022 at 12:41 PM Lari Hotari <lhot...@apache.org> wrote:
> > > >>>>>>>
> > > >>>>>>>> Thank you, Matteo. I agree that features should be delivered
> > > >>>>> continuously
> > > >>>>>>>> when that is possible. In this case, that might not apply.
> > > >>>>>>>>
> > > > >>>>>>>> I also agree that calling this Pulsar 3.0 isn't necessarily
> > > > >>>>>>>> aligned with PIP-175 since an LTS release is when the major
> > > > >>>>>>>> version is bumped. I'm fine with calling this "Pulsar Next Gen"
> > > > >>>>>>>> or something that calls out that this is planning for making a
> > > > >>>>>>>> major leap in Pulsar.
> > > >>>>>>>>
> > > > >>>>>>>> There are several unresolved issues with PIP-45 and the Pulsar
> > > > >>>>>>>> Load balancer. The previously referenced email threads contain a
> > > > >>>>>>>> lot of context on this. Resolving the issues efficiently will
> > > > >>>>>>>> most likely result in breaking changes, which is why it deserves
> > > > >>>>>>>> a major version upgrade.
> > > >>>>>>>>
> > > > >>>>>>>> We have discussed before that it's crucial to have a path to
> > > > >>>>>>>> migrate users when there are breaking changes. This should be
> > > > >>>>>>>> covered in any of the solutions that are introduced. Optimally,
> > > > >>>>>>>> users of Pulsar would be able to upgrade seamlessly to Pulsar
> > > > >>>>>>>> Next Gen / Pulsar 3.0, but rolling back might not be directly
> > > > >>>>>>>> supported.
> > > >>>>>>>>
> > > > >>>>>>>> I welcome everyone to join this planning for the Apache Pulsar
> > > > >>>>>>>> Next Gen architecture. Please check the first email in this
> > > > >>>>>>>> thread for details of context, and start participating and
> > > > >>>>>>>> contributing today. The best way to contribute is to participate
> > > > >>>>>>>> in the email threads, since they contain details with better
> > > > >>>>>>>> context.
> > > >>>>>>>>
> > > >>>>>>>> -Lari
> > > >>>>>>>>
> > > >>>>>>>> On 2022/10/07 18:03:00 Matteo Merli wrote:
> > > > >>>>>>>>> Given the past experiences and the discussions that already
> > > > >>>>>>>>> happened around "PIP-175: Extend time based release process",
> > > > >>>>>>>>> the idea is to detach the 3.0 from "big-features" items or
> > > > >>>>>>>>> "incompatible changes".
> > > >>>>>>>>>
> > > > >>>>>>>>> The changes are going to get included as they are ready, within
> > > > >>>>>>>>> feature releases, and in a fully compatible way. We don't need
> > > > >>>>>>>>> to group them together and create unnecessary risk for the
> > > > >>>>>>>>> release schedule and the users.
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> --
> > > >>>>>>>>> Matteo Merli
> > > >>>>>>>>> <matteo.me...@gmail.com>
> > > >>>>>>>>>
> > > > >>>>>>>>> On Fri, Oct 7, 2022 at 10:47 AM Lari Hotari <lhot...@apache.org> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>> Hi all,
> > > >>>>>>>>>>
> > > > >>>>>>>>>> Greetings from ApacheCon North America 2022 in New Orleans!
> > > > >>>>>>>>>> We had a great conference with a dedicated Pulsar track.
> > > > >>>>>>>>>> Thanks to all presenters and everyone who attended. The talks
> > > > >>>>>>>>>> weren't recorded, but the slides will be posted later on the
> > > > >>>>>>>>>> conference website [1].
> > > >>>>>>>>>>
> > > > >>>>>>>>>> At ApacheCon there were several presentations about "the
> > > > >>>>>>>>>> Apache way" and what that means in practice. Based on that, we
> > > > >>>>>>>>>> all know that no person is nominated as the CTO of Apache
> > > > >>>>>>>>>> Pulsar who decides on Pulsar 3.0 and when that happens. It's
> > > > >>>>>>>>>> us, the community, that serve that role together. We come
> > > > >>>>>>>>>> together as individuals with the Apache hat on. Everyone is
> > > > >>>>>>>>>> equal in the community, regardless of whether they are
> > > > >>>>>>>>>> contributors, committers or PMC members.
> > > > >>>>>>>>>> We welcome everyone to participate. The small detail about
> > > > >>>>>>>>>> voting shouldn't stop anyone from participating in any aspects
> > > > >>>>>>>>>> of the planning for the roadmap.
> > > >>>>>>>>>>
> > > > >>>>>>>>>> I'd like to get the discussions going for Pulsar 3.0. We don't
> > > > >>>>>>>>>> need a separate decision to start planning that. Please
> > > > >>>>>>>>>> correct me if I'm wrong or if you have a different opinion.
> > > >>>>>>>>>>
> > > > >>>>>>>>>> There are a few previous discussion threads that are related
> > > > >>>>>>>>>> to Pulsar 3.0 planning.
> > > > >>>>>>>>>> If you are interested in getting involved with Apache Pulsar
> > > > >>>>>>>>>> 3.0 planning, I think that it makes sense for you to read
> > > > >>>>>>>>>> these threads carefully and reply to them. Please also suggest
> > > > >>>>>>>>>> what you think makes sense.
> > > >>>>>>>>>>
> > > > >>>>>>>>>> PIP-45 related:
> > > > >>>>>>>>>> https://lists.apache.org/thread/tvco1orf0hsyt59pjtfbwoq0vf6hfrcj
> > > > >>>>>>>>>> Pulsar Load balancer / namespace bundle related:
> > > > >>>>>>>>>> https://lists.apache.org/thread/roohoc9h2gthvmd7t81do4hfjs2gphpk
> > > > >>>>>>>>>> renaming topics:
> > > > >>>>>>>>>> https://lists.apache.org/thread/vrr75rrh4trqlp14objh3snlfvmzdrp2
> > > > >>>>>>>>>> backpressure:
> > > > >>>>>>>>>> https://lists.apache.org/thread/v7xy57qfzbhopoqbm75s6ng8xlhbr2q6
> > > >>>>>>>>>>
> > > >>>>>>>>>> Long list of Metadata inconsistency issues by Zac Bentley:
> > > >>>>>>>>>> https://github.com/apache/pulsar/issues/12555
> > > > >>>>>>>>>> That would be a good starting point for understanding the data
> > > > >>>>>>>>>> inconsistency issues related to the current PIP-45 design.
> > > > >>>>>>>>>> Perhaps those could already be addressed before Pulsar 3.0?
> > > >>>>>>>>>>
> > > > >>>>>>>>>> I'm looking forward to everyone's participation in the Apache
> > > > >>>>>>>>>> Pulsar 3.0 planning discussions.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best Regards,
> > > >>>>>>>>>>
> > > >>>>>>>>>> -Lari
> > > >>>>>>>>>>
> > > >>>>>>>>>> 1 - https://www.apachecon.com/acna2022/schedule.html
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
>
