Well said Josh. You’ve pretty much summarized my thoughts on this as well.
+1 to moving forward with this.

> On Apr 11, 2019, at 10:15 PM, Joshua McKenzie <jmcken...@apache.org> wrote:
>
> As one of the two people that re-wrote all our unit tests to try and help Sylvain get 8099 out the door, I think it's inaccurate to compare the scope and potential stability impact of this work to the truly sweeping work that went into 8099 (not to downplay the scope and extent of the work here).
>
>> TBH, one of the big reasons we tend to drop such large PRs is the fact that Cassandra's code is highly intertwined and it makes it hard to precisely change things. We need to iterate towards interfaces that allow us to iterate quickly and reduce the amount of highly intertwined code. It helps with testing as well. I want us to have a meaningful discussion around it before we drop a big PR.
>
> This has been a huge issue with our codebase since at least back when I first encountered it five years ago. To date, while we have made progress on this front, it's been nowhere near sufficient to mitigate the issues in the codebase and allow large, meaningful changes to land as smaller incremental patches or commits. Having yet another discussion around this (there have been many, many of them over the years) as a blocker for significant work to go into the codebase is an unnecessary and dangerous blocker. Not to say we shouldn't formalize a path to try and make incremental progress to improve the situation, far from it, but blocking other progress on a decade's worth of accumulated hygiene problems isn't going to make the community focus on fixing those problems imo; it'll just turn away contributions.
>
> So let me second jd's (and many others') opinion here: "it makes sense to get it right the first time, rather than applying bandaids to 4.0 and rewriting things for 4.next".
> And fwiw, asking people who have already done a huge body of work to reformat that work into a series of commits, or to break up that work in a fashion that's more to the liking of people not involved in either the writing of the patch or the reviewing of it, doesn't make much sense to me. As I am neither an assignee nor a reviewer on this contribution, I leave it up to the parties involved to do things professionally and with a high standard of quality. Admittedly, a large code change merging in like this has implications for rebasing on anyone else's work that's in flight, but be it one commit merged or 50, or be it one JIRA ticket or ten, the end result is the same; any large contribution in any format will ripple outwards and require re-work from others in the community.
>
> The one thing I *would* strongly argue for is performance benchmarking of the new messaging code on a representative sample of different general-purpose queries, LWTs, etc., preferably in a 3 node RF=3 cluster, plus a healthy suite of jmh micro-benches (assuming they're not already in the diff; if they are, disregard / sorry). From speaking with Aleksey offline about this work, my understanding is that that's something they plan on doing before putting a bow on things.
>
> In the balance between "fear change because it destabilizes" and "go forth blindly into that dark night, rewriting All The Things", I think the Cassandra project's willingness to jettison the old and introduce the new has served it well in keeping relevant as the years have gone by. I'd hate to see that culture of progress get mired in a dogmatic adherence to requirements on commit counts, lines of code allowed / expected on a given patch set, or any other metrics that might stymie the professional needs of some of the heaviest contributors to the project.
> On Wed, Apr 10, 2019 at 5:03 PM Oleksandr Petrov <oleksandr.pet...@gmail.com> wrote:
>
>> Sorry to pick only a few points to address, but I think these are the ones where I can contribute productively to the discussion.
>>
>>> In principle, I agree with the technical improvements you mention (backpressure / checksumming / etc). These things should be there. Are they a hard requirement for 4.0?
>>
>> One thing that comes to mind is protocol versioning and consistency. If the changes adding checksumming and the handshake do not make it into 4.0, we grow the upgrade matrix and have to put the changes into a separate protocol version. I'm not sure how many other internode protocol changes we have planned for 4.next, but this is definitely something we should keep in mind.
>>
>>> 2. We should not be measuring complexity in LoC, with the exception that all 20k lines *do need to be reviewed* (not just the important parts, and because code refactoring tools have bugs too) and more lines take more time.
>>
>> Everything should definitely be reviewed, but with different rigour: one thing is to review byte arithmetic and protocol formats, and a completely different thing is to verify that a Verb moved from one place to another is still used. Concentrating on a smaller subset doesn't make a patch smaller; it just helps to guide reviewers and observers. My primary goal was to help people, hence my reference to the jira comment I'm working on.
>>
>> On Wed, Apr 10, 2019 at 6:13 PM Sankalp Kohli <kohlisank...@gmail.com> wrote:
>>
>>> I think we should wait for the testing doc on confluence to come up and discuss what all needs to be added there to gain confidence.
>>>
>>> If the work to split the patch into smaller parts would delay 4.0 even more, can we instead use the time to add more test coverage, documentation and design docs to this component? Would that be a middle ground here?
>>> Examples of large changes not going well come down to a lack of testing, documentation and design docs, not to the changes being big. Being big adds to the complexity and increases the total bug count, but small changes can be equally bad in terms of impact.
>>>
>>>> On Apr 10, 2019, at 8:53 AM, Jordan West <jorda...@gmail.com> wrote:
>>>>
>>>> There is a lot to discuss here, so I'll try to keep my opinions brief:
>>>>
>>>> 1. The bug fixes are a requirement in order to have a stable 4.0. Whether they come from this patch or the original, I have less of an opinion on. I do think it's important to minimize code changes at this time in the development/freeze cycle — including large refactors, which add risk despite how they are being discussed here. From that perspective, I would prefer to see more targeted fixes, but since we don't have them and we have this patch, the decision is different.
>>>>
>>>> 2. We should not be measuring complexity in LoC, with the exception that all 20k lines *do need to be reviewed* (not just the important parts, and because code refactoring tools have bugs too) and more lines take more time. Otherwise, it's a poor metric for how long this will take to review. Further, it seems odd that the authors are projecting how long it will take to review — this should be the charge of the reviewers, and I would like to hear from them on this. It's clear this is a complex patch — as risky as something like 8099 (and the original rewrite by Jason). We should treat it as such and not try to merge it in quickly for the sake of it, repeating past mistakes. The goal of reviewing the messaging service work was to do just that. It would be a disservice to rush in these changes now. If the goal is to get the patch in, that should be the priority, not completing it "in two weeks".
>>>> It's clear several community members have pushed back on that and are not comfortable with the time frame.
>>>>
>>>> 3. If we need to add special forks of Netty classes to the code because of "how we use Netty", that is a concern to me w.r.t. quality, stability, and maintenance. Is there a place that documents/justifies our non-traditional usage? (I saw some in JavaDocs, but found it lacking in *why*, though it had a lot of how/what was changed.) Given folks in the community have decent relationships with the Netty team, perhaps we should leverage that as well. Have we reached out to them?
>>>>
>>>> 4. In principle, I agree with the technical improvements you mention (backpressure / checksumming / etc). These things should be there. Are they a hard requirement for 4.0? In my opinion no, and Dinesh has done a good job of providing some reasons as to why, so I won't go into much detail here. In short, a bug and a missing safety mechanism are not the same thing. It's also important to consider how many users a change like that covers and for what risk — we found a bug in 13304 late in review; had it slipped through, it would have subjected users to silent corruption they thought couldn't occur anymore, because we included the feature in a prod Cassandra release.
>>>>
>>>> 5. The patch could seriously benefit from some commit hygiene that would make it easier for folks to review. Had this been done, not only would review have been easier, but the piecemeal breakup of features/fixes could also have been done more easily. I don't buy the premise that this wasn't possible. If we had to add a feature/fix later, it would have been possible. I'm sure there was a smart way we could have organized it, if it was a priority.
>>>>
>>>> 6. Have any upgrade tests been done/added?
>>>> I would also like to see some performance benchmarks before merging, so that the patch is in a similar state as 14503 in terms of performance testing.
>>>>
>>>> I'm sure I have more thoughts, but these seem like the important ones for now.
>>>>
>>>> Jordan
>>>>
>>>> On Wed, Apr 10, 2019 at 8:21 AM Dinesh Joshi <djos...@icloud.com.invalid> wrote:
>>>>
>>>>> Here are my 2¢.
>>>>>
>>>>> I think the general direction of this work is valuable, but I have a few concerns I'd like to address. More inline.
>>>>>
>>>>>> On Apr 4, 2019, at 1:13 PM, Aleksey Yeschenko <alek...@apache.org> wrote:
>>>>>>
>>>>>> I would like to propose CASSANDRA-15066 [1] - an important set of bug fixes and stability improvements to internode messaging code that Benedict, I, and others have been working on for the past couple of months.
>>>>>>
>>>>>> First, some context. This work started off as a review of CASSANDRA-14503 (Internode connection management is race-prone [2]) and CASSANDRA-13630 (Support large internode messages with netty [3]) - a pre-4.0 confirmatory review of such a major new feature.
>>>>>>
>>>>>> However, as we dug in, we realized this was insufficient. With more than 50 bugs uncovered [4] - dozens of them critical to correctness and/or stability of the system - a substantial rework was necessary to guarantee a solid internode messaging subsystem for the 4.0 release.
>>>>>>
>>>>>> In addition to addressing all of the uncovered bugs [4] that were unique to trunk + 13630 [3] + 14503 [2], we used this opportunity to correct some long-existing, pre-4.0 bugs and stability issues. For the complete list of notable bug fixes, read the comments to CASSANDRA-15066 [1]. But I'd like to highlight a few.
>>>>> Do you have regression tests that will fail if these bugs are reintroduced at a later point?
>>>>>
>>>>>> # Lack of message integrity checks
>>>>>>
>>>>>> It's known that TCP checksums are too weak [5] and Ethernet CRC cannot be relied upon [6] for integrity. With sufficient scale or time, you will hit bit flips. Sadly, most of the time these go undetected. Cassandra's replication model makes this issue much more serious, as the faulty data can infect the cluster.
>>>>>>
>>>>>> We recognised this problem, and recently introduced a fix for server-client messages, implementing CRCs in CASSANDRA-13304 (Add checksumming to the native protocol) [7].
>>>>>
>>>>> This was discussed in my review, and Jason created a ticket [1] to track it. We explicitly decided to defer this work not only due to the feature freeze in the community but also for technical reasons detailed below.
>>>>>
>>>>> Regarding new features during the feature freeze window, we have had such discussions in the past. The most recent was the one I initiated on the Zstd compressor, which went positively, and we moved forward after assessing risk & community consensus.
>>>>>
>>>>> Regarding checksumming, please scroll down to the comments section in the link [2] you provided. You'll notice this discussion –
>>>>>
>>>>>> Daniel Fox Franke:
>>>>>> Please don't design new network protocols that don't either run over TLS or do some other kind of cryptographic authentication. If you have cryptographic authentication, then CRC is redundant.
>>>>>>
>>>>>> Evan Jones:
>>>>>> Good point. Even internal applications inside a data center should be using encryption, and today the performance impact is probably small (I haven't actually measured it myself these days).
>>>>> Enabling TLS & internode compression are mitigation strategies to avoid data corruption in transit. By your own admission in CASSANDRA-13304 [3], we don't require checksumming if TLS is present. Here's your full quote –
>>>>>
>>>>>> Aleksey Yeschenko:
>>>>>> Checksums and TLS are orthogonal. It just so happens that you don't need the former if you already have the latter.
>>>>>
>>>>> I want to be fair: later you did say that we don't want to force people to pay the cost of TLS overhead. However, I would also like to point out that with the introduction of Netty for internode communication, we have 4-5x the TLS performance thanks to the OpenSSL JNI bindings; you can refer to Norman's or my talks on the topic. So TLS is practical, & compression is necessary for performance. Both strategies work fine to protect against data corruption, making checksumming redundant. With SSL certificate hot reloading, it also avoids unnecessary cluster restarts, providing maximum availability.
>>>>>
>>>>> In the same vein, it's 2019, and if people are not using TLS for internode, then that is really bad for data security in our industry and we should not be encouraging it. In fact, I would go so far as to make TLS the default.
>>>>>
>>>>> Managing TLS infrastructure is beyond Cassandra's scope, and operators should have figured it out by now, for their own & their users' sake. Cassandra makes it super easy & performant to have TLS enabled. People should be using it.
>>>>>
>>>>>> But until CASSANDRA-15066 [1] lands, this is also a critical flaw internode. We have addressed it by ensuring that no matter what, whether you use SSL or not, whether you use internode compression or not, a protocol-level CRC is always present, for every message frame.
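For illustration, a protocol-level per-frame CRC along the lines described above might look like this minimal sketch. The class name and the [length][payload][crc] framing are hypothetical, using plain java.util.zip.CRC32; the actual CASSANDRA-15066 frame format is more involved.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch: a self-delimiting frame of [length][payload][crc32],
// verified on receipt regardless of TLS/compression settings.
public final class FramedCrc {
    // Encode: 4-byte payload length, payload bytes, 4-byte CRC32 of the payload.
    public static ByteBuffer encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer frame = ByteBuffer.allocate(4 + payload.length + 4);
        frame.putInt(payload.length);
        frame.put(payload);
        frame.putInt((int) crc.getValue());
        frame.flip();
        return frame;
    }

    // Decode: recompute the CRC and reject the frame on mismatch (bit flip in transit).
    public static byte[] decode(ByteBuffer frame) {
        int len = frame.getInt();
        byte[] payload = new byte[len];
        frame.get(payload);
        int expected = frame.getInt();
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        if ((int) crc.getValue() != expected)
            throw new IllegalStateException("frame CRC mismatch: corrupt payload");
        return payload;
    }

    public static void main(String[] args) {
        byte[] msg = "mutation:ks.table".getBytes();
        byte[] roundTripped = decode(encode(msg));
        System.out.println("round trip ok: " + java.util.Arrays.equals(msg, roundTripped));

        // Simulate a single bit flip in transit: decode must now fail.
        ByteBuffer corrupted = encode(msg);
        corrupted.put(6, (byte) (corrupted.get(6) ^ 0x01));
        boolean caught = false;
        try { decode(corrupted); } catch (IllegalStateException e) { caught = true; }
        System.out.println("corruption detected: " + caught);
    }
}
```

The point of the debate above is exactly this receive path: with TLS, the record MAC already rejects corrupted frames, so the explicit CRC check is redundant; without TLS, it is the only application-level defence.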
>>>>>> It's our deep and sincere belief that shipping a new implementation of the messaging protocol without application-level data integrity checks would be unacceptable for a modern database.
>>>>>
>>>>> My previous technical arguments have provided enough evidence that protocol-level checksumming is not a show stopper.
>>>>>
>>>>> The only reason I see for adding checksums to the protocol is that some users don't want to enable TLS and internode compression. As it stands, from your comments [7] it seems to be mandatory, and it adds unnecessary overhead when TLS and/or compression is enabled. Frankly, I don't think we need to risk destabilizing trunk for these use-cases. I want to reiterate that I believe in doing the right thing, but we have to make acceptable tradeoffs – as a community.
>>>>>
>>>>>> # Lack of back-pressure and memory usage constraints
>>>>>>
>>>>>> As it stands today, it's far too easy for a single slow node to become overwhelmed by messages from its peers. Conversely, multiple coordinators can be made unstable by the backlog of messages to deliver to just one struggling node.
>>>>>
>>>>> This is a known issue, and it could have been addressed as a separate bug fix – one that could be independently verified.
>>>>>
>>>>>> To address this problem, we have introduced strict memory usage constraints that apply TCP-level back-pressure, on both outbound and inbound. It is now impossible for a node to be swamped on inbound, and on outbound it is made significantly harder to overcommit resources. It's a simple, reliable mechanism that drastically improves cluster stability under load, and especially overload.
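The mechanism described above can be sketched roughly as follows. This is a hypothetical illustration of the general idea, not the CASSANDRA-15066 implementation: cap the bytes of in-flight inbound messages with a permit pool, so that when the budget is spent the reader blocks, stops draining the socket, and lets TCP flow control push back on the sender. The class name and the 1 MiB cap are invented for the example.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: a byte budget for in-flight inbound messages.
// When exhausted, acquire() blocks, which stalls the socket reader and
// translates into TCP-level back-pressure on the sending node.
public final class InboundLimiter {
    private final Semaphore permits;

    public InboundLimiter(int maxInFlightBytes) {
        this.permits = new Semaphore(maxInFlightBytes);
    }

    // Called before deserializing a message; blocks once the budget is spent.
    public void acquire(int messageBytes) throws InterruptedException {
        permits.acquire(messageBytes);
    }

    // Called after the message has been processed and its memory released.
    public void release(int messageBytes) {
        permits.release(messageBytes);
    }

    public int availableBytes() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws InterruptedException {
        InboundLimiter limiter = new InboundLimiter(1 << 20); // 1 MiB budget
        limiter.acquire(512 * 1024);                          // half the budget in flight
        System.out.println("available after acquire: " + limiter.availableBytes());
        limiter.release(512 * 1024);                          // message processed
        System.out.println("available after release: " + limiter.availableBytes());
    }
}
```

The same shape, applied on outbound with per-peer queues, is what makes it harder for a coordinator to overcommit memory to one struggling replica.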
>>>>>> Cassandra is a mature system, and introducing an entirely new messaging implementation without resolving this fundamental stability issue is difficult to justify, in our view.
>>>>>
>>>>> This is great in theory. I would really like to see objective measurements, like Chris did in CASSANDRA-14654 [4]. Netflix engineers tested the Netty refactor on a 200-node cluster [5] and Zero Copy Streaming [6], and reported their results. It's invaluable work. It would be great to see something similar.
>>>>>
>>>>>> # State of the patch, feature freeze and 4.0 timeline concerns
>>>>>>
>>>>>> The patch is essentially complete, with much improved unit tests all passing, dtests green, and extensive fuzz testing underway - with initial results all positive. We intend to further improve in-code documentation and test coverage in the next week or two, and do some minor additional code review, but we believe it will be basically good to commit in a couple of weeks.
>>>>>
>>>>> A 20K LoC patch is unverifiable, especially without much documentation. It places an undue burden on reviewers. It also messes up everyone's branches once you commit such a large refactor. Let's be considerate of others in the community. It is good engineering practice to limit patches to a reasonable size.
>>>>>
>>>>> More importantly, such large refactors lend themselves to new bugs that cannot be caught easily unless you have a very strong regression test suite, which Cassandra arguably lacks. Therefore I am of the opinion that the bug fixes can be applied piecemeal to the codebase. They should be small enough that they can be individually reviewed and independently verified.
>>>>>
>>>>> I also noticed that you have replaced Netty's classes [7].
>>>>> I am of the opinion that they should be upstreamed if they're better, so the wider Netty community benefits from them and we don't have to maintain our own classes.
>>>>>
>>>>>> P.S. I believe that once it's committed, we should cut our first 4.0 alpha. It's about time we started this train (:
>>>>>
>>>>> +1 on working towards an alpha, but that is a separate discussion from this issue.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Dinesh
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-14578
>>>>> [2] https://www.evanjones.ca/tcp-and-ethernet-checksums-fail.html
>>>>> [3] https://issues.apache.org/jira/browse/CASSANDRA-13304?focusedCommentId=16183034&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16183034
>>>>> [4] https://issues.apache.org/jira/browse/CASSANDRA-14654
>>>>> [5] https://issues.apache.org/jira/browse/CASSANDRA-14747
>>>>> [6] https://issues.apache.org/jira/browse/CASSANDRA-14765
>>>>> [7] https://issues.apache.org/jira/browse/CASSANDRA-15066?focusedCommentId=16801277&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16801277
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>>
>> --
>> alex p