Thanks Michael for sending out the notes.

Recording is available here:
https://streamnative.zoom.us/rec/share/Eg2E7WfSOfPaHMdSphlrP-fN2NBjh4aT06eVTxv6TbBk4ujTltCcPNvq9kwHqMT4.mBdaRHY5eUXJM5bz
Passcode: .H?wa4WM
--
Matteo Merli
<matteo.me...@gmail.com>

On Thu, Apr 14, 2022 at 10:27 AM Michael Marshall <mmarsh...@apache.org> wrote:
>
> Hi Pulsar Community,
>
> Below are the meeting notes from today's community meeting.
>
> Disclaimer: I am the primary author of these notes. I took the notes
> while participating in the meeting discussions. It is possible that I
> missed or misunderstood information. If something is misattributed or
> misrepresented, please send a correction to this list and consider
> updating the Google doc.
>
> Source google doc:
> https://docs.google.com/document/d/19dXkVXeU2q_nHmkG8zURjKnYlvD96TbKf5KjYyASsOE
>
> Thanks,
> Michael
>
> 2022/04/14, (8:30 AM PST)
> - Attendees:
>   - Matteo Merli
>   - Enrico Olivelli
>   - Andrey Yegorov
>   - Michael Marshall
>   - Dave Fisher
>   - Lari Hotari
>   - Massimiliano Mirelli
>   - Chris Bartholomew
>   - Hang Chen
>   - Aaron Williams
>   - Nicolò Boschi
>   - Leolinchen
>   - Penghui Li
>
> - Discussions
>
>   - Enrico: 2.10 release process. Took a while. Do we want to talk
>     about this? For 2.11, we should try to apply the new process.
>     Matteo: 3 months from now we can release 2.11, we’ll create the
>     branch in 2 months. Matteo plans to set a date (by discussion on
>     the mailing list) and wants more scrutiny on the mailing list.
>     Dave: we should slow down cherry picking to 2.8 and 2.9, as well.
>     Enrico: we are finding many fixes though, and for example, 2.8 has
>     many users and many bug fixes. The cherry picked commits are all
>     bug fixes. Michael: we should add some documentation about this to
>     help new committers. Matteo: this documentation would help inform
>     contributors too. Dave: where should we put this? Website? Matteo:
>     we could also put it in the PR template.
>
>   - Michael: is 2.7.5 the last 2.7 release? Matteo: could keep it open
>     for security bug fixes, like log4shell type fixes. Lari: 2.7.5 rc 1
>     has test failures, so we’ll need an rc 2. The tests that are failing
>     on 2.7.5 are passing on 2.7.4.
>     Matteo: thinking through LTS and the cost to users of doing
>     upgrades. There is a tension between shipping new features and how
>     frequently users have to upgrade. One issue: the upgrade/downgrade
>     compatibility is only guaranteed for one minor version. An LTS
>     could help to support those users without adding features. We
>     could offer guarantees from one LTS to the next LTS. We’d define
>     support so users could stick with a version without worrying about
>     getting left behind. What if 3.0, 4.0, and so on are LTS, and 3.x
>     is just for features? The guarantee then is that you can go from
>     3.x to 4.0. Dave: what about current users on the 2.x versions?
>     Matteo: we can discuss how to deal with existing versions, but we
>     also need to figure out our preferred long term solution for how
>     to work in the future. Dave: I like the idea of guaranteeing
>     upgrade paths. Matteo: we could try to set a timeline for major
>     releases, not just for minor releases, e.g. every 2 years for a
>     major release. Discusses reasons for major releases and the nuance
>     for how we could use this. Dave: are bookkeeper upgrade and
>     transactions the major upgrade? Matteo: I didn’t have any feature
>     in mind. I want to give people an upgrade path and create clarity.
>     Michael: clarifies that you could upgrade from 3.0 to 4.0 then
>     downgrade and it’d work. Matteo: yes. Feature defaults won’t be
>     able to change because of this. Dave: relates well to creating a
>     road map and telling people what is coming. Enrico: creating a
>     road map is very hard in open source. We commit things that people
>     contribute. In the ASF projects that I work on, contributions are
>     hard to predict. Matteo: I agree it is hard to know. These major
>     releases would be loosely timed. For example, auto partitioning is
>     a major feature, but it is a bunch of work. Unpredictability is
>     bad for the users. Michael: and you don’t want to create a hard
>     upgrade path.
>     Is it possible to use geo-replication (or something like it) to
>     migrate clusters to simplify upgrades? Matteo: there was a
>     work-in-progress proposal for green-blue deployment: spin up a new
>     cluster and slowly migrate producers and consumers to it. The
>     coordination would use topic termination to switch to the new
>     cluster. Not sure that it is a general solution. Michael: how
>     would breaking changes work for the major version upgrade? Matteo:
>     we would do a compatibility layer. Also, the pulsar protocol
>     hasn’t broken, and we version the api in such a way that the
>     broker and client can determine whether the peer supports a given
>     feature.
>
> - PRs
>
>   - Lari: Merged PR (https://github.com/apache/pulsar/pull/15067) to
>     fix ManagedCursorImpl’s mark delete update logic, but asked for
>     Matteo’s review. Lari plans to add more tests in the coming weeks
>     to catch regressions associated with the change.
>
>   - Andrey: https://github.com/apache/pulsar/pull/15142 WIP pulsar +
>     bk 4.15-ish. Requests review of preliminary work, mentions that
>     there is a test failure he’s still investigating. Switched CI to
>     use Bookkeeper 4.16-SNAPSHOT to identify needed changes. Worked on
>     tests that broke. Some test classes were copied from bookkeeper,
>     so he replaced those with copy/pasted new ones. The work is
>     iterative, and there are still tests failing. Discussion with
>     Matteo about tradeoffs for test base classes and ways to improve
>     testing classes. Matteo says don’t worry about synchronizing tests
>     between Pulsar/Bookkeeper. The test utilities in bookkeeper are
>     different. Pulsar tests assume that bookkeeper works and are meant
>     to test usage of bookkeeper. Matteo: how far do you think you are
>     from completion? Andrey: hard to say, tests are passing locally,
>     but failing on remote CI.
>
>   - Hang: https://github.com/apache/pulsar/issues/15111 Bookie lost
>     data when skip write journal. Hang Chen says he has seen this many
>     times in production.
>     Enrico: if you don’t write to journal, this is a possible
>     behavior. The next bookkeeper release will include a code change.
>     Andrey: if you want to run without journal, increase write quorum.
>     Matteo: use different racks to increase durability and decrease
>     chance of catastrophic failure. Enrico: there are some problems in
>     bk protocol, even if you have multiple replicas, you are going to
>     lose data. 4.15 includes a change to the protocol for how the
>     bookkeeper responds. This improves a fix for a specific edge case.
>     The only fix is to upgrade. Andrey: reminder that 4.15 is in the
>     process of being released. Matteo: is there any failure that
>     happened during this time? Hang Chen: no failure during this time.
>     Enrico: during recovery, the recovery tries to find missing
>     entries in the ledger. Went on to discuss technical details of the
>     improvement for 4.15. Matteo: the error appears strange, and the
>     missing entries don’t seem to make sense. Mentions that rebuilding
>     the index could be helpful. (Missed some technical details about
>     bookkeeper, see issue for more context and discussion.)