Thanks Michael for sending out the notes. Recording is available here:
https://streamnative.zoom.us/rec/share/Eg2E7WfSOfPaHMdSphlrP-fN2NBjh4aT06eVTxv6TbBk4ujTltCcPNvq9kwHqMT4.mBdaRHY5eUXJM5bz
Passcode: .H?wa4WM


--
Matteo Merli
<matteo.me...@gmail.com>

On Thu, Apr 14, 2022 at 10:27 AM Michael Marshall <mmarsh...@apache.org> wrote:
>
> Hi Pulsar Community,
>
> Below are the meeting notes from today's community meeting.
>
> Disclaimer: I am the primary author of these notes. I took the notes
> while participating in the meeting discussions. It is possible that I
> missed or misunderstood information. If something is misattributed or
> misrepresented, please send a correction to this list and consider
> updating the Google doc.
>
> Source google doc:
> https://docs.google.com/document/d/19dXkVXeU2q_nHmkG8zURjKnYlvD96TbKf5KjYyASsOE
>
> Thanks,
> Michael
>
> 2022/04/14, (8:30 AM PST)
> -   Attendees:
> -   Matteo Merli
> -   Enrico Olivelli
> -   Andrey Yegorov
> -   Michael Marshall
> -   Dave Fisher
> -   Lari Hotari
> -   Massimiliano Mirelli
> -   Chris Bartholomew
> -   Hang Chen
> -   Aaron Williams
> -   Nicolò Boschi
> -   Leolinchen
> -   Penghui Li
>
> -   Discussions
>
> -   Enrico: 2.10 release process. Took a while. Do we want to talk
> about this? For 2.11, we should try to apply the new process. Matteo:
> 3 months from now we can release 2.11, we’ll create the branch in 2
> months. Matteo plans to set a date (by discussion on the mailing list)
> and wants more scrutiny on the mailing list. Dave: we should slow down
> cherry picking to 2.8 and 2.9, as well. Enrico: we are finding many
> fixes though, and for example, 2.8 has many users and many bug fixes.
> The cherry picked commits are all bug fixes. Michael: we should add
> some documentation about this to help new committers. Matteo: this
> documentation would help inform contributors too. Dave: where should
> we put this? Website? Matteo: we could also put it in the PR template.
>
> -   Michael: is 2.7.5 the last 2.7 release? Matteo: could keep it open
> for security bug fixes, like log4shell type fixes. Lari: 2.7.5 rc 1
> has test failures, so we’ll need an rc 2. The tests that are failing
> on 2.7.5 are passing on 2.7.4. Matteo: thinking through LTS and the
> cost to users of upgrading. There is a tension between shipping
> new features and how frequently users have to upgrade. One issue: the
> upgrade/downgrade compatibility is only guaranteed for one minor
> version. An LTS could help to support those users without adding
> features. We could offer guarantees from one LTS to the next LTS. We’d
> define support so users could stick with a version without worrying
> about getting left behind. What if 3.0, 4.0, and so on are LTS
> releases, and 3.x is just for features? The guarantee then is that you
> can go from 3.x to 4.0. Dave: what about current users on the 2.x
> versions? Matteo: we can discuss how to deal with existing versions,
> but we also need to figure out our preferred long term solution for
> how to work in the future. Dave: I like the idea of guaranteeing
> upgrade paths. Matteo: we could try to set a timeline for major
> releases, not just for minor releases, e.g. every 2 years for a major
> release. Discusses reasons for major releases and the nuance for how
> we could use this. Dave: are bookkeeper upgrade and transactions the
> major upgrade? Matteo: I didn’t have any feature in mind. I want to
> give people an upgrade path and create clarity. Michael: clarifies
> that you could upgrade from 3.0 to 4.0 then downgrade and it’d work.
> Matteo: yes. Feature defaults won’t be able to change because of this.
> Dave: relates well to creating a road map and telling people what is
> coming. Enrico: creating a road map is very hard in open source. We
> commit things that people contribute. In the ASF projects that I work on,
> contributions are hard to predict. Matteo: I agree it is hard to know.
> These major releases would be loosely timed. For example, auto
> partitioning is a major feature, but it is a bunch of work.
> Unpredictability is bad for the users. Michael: and you don’t want to
> create a hard upgrade path. Is it possible to use geo-replication (or
> something like it) to migrate clusters to simplify upgrades? Matteo:
> there was a work-in-progress proposal for green-blue deployments: spin
> up a new cluster and slowly migrate producers and consumers to it. The
> coordination would use topic termination to switch to the new cluster.
> Not sure that it is a general solution. Michael: how would breaking
> changes work for the major version upgrade? Matteo: we would do a
> compatibility layer. Also, the pulsar protocol hasn’t broken, and we
> version the api in such a way that the broker and client can each
> determine whether the peer supports a given feature.
>
> -   PRs
>
> -   Lari: Merged PR (https://github.com/apache/pulsar/pull/15067) to
> fix ManagedCursorImpl’s mark delete update logic, but asked for
> Matteo’s review. Lari plans to add more tests in the coming weeks to
> catch regressions associated with the change.
>
> -   Andrey: https://github.com/apache/pulsar/pull/15142 WIP pulsar +
> bk 4.15-ish. Requests review of preliminary work, mentions that there
> is a test failure he’s still investigating. Switched CI to use
> Bookkeeper 4.16-SNAPSHOT to identify needed changes. Worked on tests
> that broke. Some test classes were copied from bookkeeper, so he
> replaced those with copy/pasted new ones. The work is iterative, and
> there are still tests failing. Discussion with Matteo about tradeoffs
> for test base classes and ways to improve testing classes. Matteo says
> don’t worry about synchronizing tests between Pulsar/Bookkeeper. The
> test utilities in bookkeeper are different. Pulsar tests assume that
> bookkeeper works and are meant to test usage of bookkeeper.
> Matteo: how far do you think you are from completion? Andrey: hard to
> say, tests are passing locally, but failing on remote CI.
>
> -   Hang: https://github.com/apache/pulsar/issues/15111 Bookie lost
> data when skip write journal. Hang Chen says he has seen this many
> times in production. Enrico: if you don’t write to journal, this is a
> possible behavior. The next bookkeeper release will include a code
> change. Andrey: if you want to run without journal, increase write
> quorum. Matteo: use different racks to increase durability and
> decrease chance of catastrophic failure. Enrico: there are some
> problems in bk protocol, even if you have multiple replicas, you are
> going to lose data. 4.15 includes a change to the protocol for how the
> bookkeeper responds. This improves a fix for a specific edge case. The
> only fix is to upgrade. Andrey: reminder that 4.15 is in the process
> of being released. Matteo: is there any failure that happened during
> this time? Hang Chen: no failure during this time. Enrico: during
> recovery, the recovery tries to find missing entries in the ledger.
> Went on to discuss technical details of the improvement for 4.15.
> Matteo: the error appears strange, and the missing entries don’t seem
> to make sense. Mentions that rebuilding the index could be helpful.
> (Missed some technical details about bookkeeper, see issue for more
> context and discussion.)
