+1 Thanks for articulating that so well Josh. Sam
> On 12 Apr 2019, at 16:19, Blake Eggleston <beggles...@apple.com.INVALID> > wrote: > > Well said Josh. You’ve pretty much summarized my thoughts on this as well. > > +1 to moving forward with this > >> On Apr 11, 2019, at 10:15 PM, Joshua McKenzie <jmcken...@apache.org> wrote: >> >> As one of the two people that re-wrote all our unit tests to try and help >> Sylvain get 8099 out the door, I think it's inaccurate to compare the scope >> and potential stability impact of this work to the truly sweeping work that >> went into 8099 (not to downplay the scope and extent of this work here). >> >> TBH, one of the big reasons we tend to drop such large PRs is the fact that >>> Cassandra's code is highly intertwined and it makes it hard to precisely >>> change things. We need to iterate towards interfaces that allow us to >>> iterate quickly and reduce the amount of highly intertwined code. It helps >>> with testing as well. I want us to have a meaningful discussion around it >>> before we drop a big PR. >> >> This has been a huge issue with our codebase since at least back when I >> first encountered it five years ago. To date, while we have made progress >> on this front, it's been nowhere near sufficient to mitigate the issues in >> the codebase and allow for large, meaningful changes in smaller incremental >> patches or commits. Having yet another discussion around this (there have >> been many, many of them over the years) as a blocker for significant work >> to go into the codebase is an unnecessary and dangerous blocker. Not to say >> we shouldn't formalize a path to try and make incremental progress to >> improve the situation, far from it, but blocking other progress on a >> decade's worth of accumulated hygiene problems isn't going to make the >> community focus on fixing those problems imo, it'll just turn away >> contributions. >> >> So let me second jd (and many others') opinion here: "it makes sense to get >> it right the first time, rather than applying bandaids to 4.0 and rewriting >> things for 4.next". And fwiw, asking people who have already done a huge >> body of work to reformat that work into a series of commits or to break up >> that work in a fashion that's more to the liking of people not involved in >> either the writing of the patch or reviewing of it doesn't make much sense >> to me. As I am neither an assignee nor reviewer on this contribution, I >> leave it up to the parties involved to do things professionally and with a >> high standard of quality. Admittedly, a large code change merging in like >> this has implications for rebasing on anyone else's work that's in flight, >> but be it one commit merged or 50, or be it one JIRA ticket or ten, the >> end-result is the same; any large contribution in any format will ripple >> outwards and require re-work from others in the community. >> >> The one thing I *would* strongly argue for is performance benchmarking of >> the new messaging code on a representative sample of different >> general-purpose queries, LWT's, etc, preferably in a 3 node RF=3 cluster, >> plus a healthy suite of jmh micro-benches (assuming they're not already in >> the diff. If they are, disregard / sorry). From speaking with Aleksey >> offline about this work, my understanding is that that's something they >> plan on doing before putting a bow on things. >> >> In the balance between "fear change because it destabilizes" and "go forth >> blindly into that dark night, rewriting All The Things", I think the >> Cassandra project's willingness to jettison the old and introduce the new >> has served it well in keeping relevant as the years have gone by. I'd hate >> to see that culture of progress get mired in a dogmatic adherence to >> requirements on commit counts, lines of code allowed / expected on a given >> patch set, or any other metrics that might stymie the professional needs of >> some of the heaviest contributors to the project. >> >> On Wed, Apr 10, 2019 at 5:03 PM Oleksandr Petrov <oleksandr.pet...@gmail.com> >> wrote: >> >>> Sorry to pick only a few points to address, but I think these are ones >>> where I can contribute productively to the discussion. >>> >>>> In principle, I agree with the technical improvements you >>> mention (backpressure / checksumming / etc). These things should be there. >>> Are they a hard requirement for 4.0? >>> >>> One thing that comes to mind is protocol versioning and consistency. If >>> changes adding checksumming and handshake do not make it to 4.0, we grow >>> the upgrade matrix and have to put changes to the separate protocol >>> version. I'm not sure how many other internode protocol changes we have >>> planned for 4.next, but this is definitely something we should keep in >>> mind. >>> >>>> 2. We should not be measuring complexity in LoC with the exception that >>> all 20k lines *do need to be review* (not just the important parts and >>> because code refactoring tools have bugs too) and more lines take more >>> time. >>> >>> Everything should definitely be reviewed. But with different rigour: one >>> thing is to review byte arithmetic and protocol formats and completely >>> different thing is to verify that Verb moved from one place to the other is >>> still used. Concentrating on a smaller subset doesn't make a patch smaller, >>> just helps to guide reviewers and observers, so my primary goal was to help >>> people, hence my reference to the jira comment I'm working on. >>> >>> >>> On Wed, Apr 10, 2019 at 6:13 PM Sankalp Kohli <kohlisank...@gmail.com> >>> wrote: >>> >>>> I think we should wait for testing doc on confluence to come up and >>>> discuss what all needs to be added there to gain confidence. >>>> >>>> If the work is more to split the patch into smaller parts and delays 4.0 >>>> even more, can we use time in adding more test coverage, documentation >>> and >>>> design docs to this component? Will that be a middle ground here ? >>>> >>>> Examples of large changes not going well is due to lack of testing, >>>> documentation and design docs not because they were big. Being big adds >>> to >>>> the complexity and increased the total bug count but small changes can be >>>> equally worse in terms of impact. >>>> >>>> >>>>> On Apr 10, 2019, at 8:53 AM, Jordan West <jorda...@gmail.com> wrote: >>>>> >>>>> There is a lot of discuss here so I’ll try to keep my opinions brief: >>>>> >>>>> 1. The bug fixes are a requirement in order to have a stable 4.0. >>> Whether >>>>> they come from this patch or the original I have less of an opinion. I >>> do >>>>> think its important to minimize code changes at this time in the >>>>> development/freeze cycle — including large refactors which add risk >>>> despite >>>>> how they are being discussed here. From that perspective, I would >>> prefer >>>> to >>>>> see more targeted fixes but since we don’t have them and we have this >>>> patch >>>>> the decision is different. >>>>> >>>>> 2. We should not be measuring complexity in LoC with the exception that >>>> all >>>>> 20k lines *do need to be review* (not just the important parts and >>>> because >>>>> code refactoring tools have bugs too) and more lines take more time. >>>>> Otherwise, its a poor metric for how long this will take to review. >>>>> Further, it seems odd that the authors are projecting how long it will >>>> take >>>>> to review — this should be the charge of the reviewers and I would like >>>> to >>>>> hear from them on this. Its clear this a complex patch — as risky as >>>>> something like 8099 (and the original rewrite by Jason). We should >>> treat >>>> it >>>>> as such and not try to merge it in quickly for the sake of it, >>> repeating >>>>> past mistakes. The goal of reviewing the messaging service work was to >>> do >>>>> just that. It would be a disservice to rush in these changes now. If >>> the >>>>> goal is to get the patch in that should be the priority, not completing >>>> it >>>>> “in two weeks”. Its clear several community members have pushed back on >>>>> that and are not comfortable with the time frame. >>>>> >>>>> 3. If we need to add special forks of Netty classes to the code because >>>> of >>>>> “how we use Netty” that is a concern to me w.r.t to quality, stability, >>>> and >>>>> maintenance. Is there a place that documents/justifies our >>>> non-traditional >>>>> usage (I saw some in JavaDocs but found it lacking in *why* but had a >>> lot >>>>> of how/what was changed). Given folks in the community have decent >>>>> relationships with the Netty team perhaps we should leverage that as >>>> well. >>>>> Have we reached out to them? >>>>> >>>>> 4. In principle, I agree with the technical improvements you mention >>>>> (backpressure / checksumming / etc). These things should be there. Are >>>> they >>>>> a hard requirement for 4.0? In my opinion no and Dinesh has done a good >>>> job >>>>> of providing some reasons as to why so I won’t go into much detail >>> here. >>>> In >>>>> short, a bug and a missing safety mechanism are not the same thing. Its >>>>> also important to consider how many users a change like that covers and >>>> for >>>>> what risk — we found a bug in 13304 late in review, had it slipped >>>> through >>>>> it would have subjected users to silent corruption they thought >>> couldn’t >>>>> occur anymore because we included the feature in a prod Cassandra >>>> release. >>>>> >>>>> 5. The patch could seriously benefit from some commit hygiene that >>> would >>>>> make it easier for folks to review. Had this been done not only would >>>>> review be easier but also the piecemeal breakup of features/fixes could >>>>> have been done more easily. I don’t buy the premise that this wasn’t >>>>> possible. If we had to add the feature/fix later it would have been >>>>> possible. I’m sure there was a smart way we could have organized it, if >>>> it >>>>> was a priority. >>>>> >>>>> 6. Have any upgrade tests been done/added? I would also like to see >>> some >>>>> performance benchmarks before merging so that the patch is in a similar >>>>> state as 14503 in terms of performance testing. >>>>> >>>>> I’m sure I have more thoughts but these seem like the important ones >>> for >>>>> now. >>>>> >>>>> Jordan >>>>> >>>>> On Wed, Apr 10, 2019 at 8:21 AM Dinesh Joshi >>> <djos...@icloud.com.invalid >>>>> >>>>> wrote: >>>>> >>>>>> Here's are my 2¢. >>>>>> >>>>>> I think the general direction of this work is valuable but I have a >>> few >>>>>> concerns I’d like to address. More inline. >>>>>> >>>>>>> On Apr 4, 2019, at 1:13 PM, Aleksey Yeschenko <alek...@apache.org> >>>>>> wrote: >>>>>>> >>>>>>> I would like to propose CASSANDRA-15066 [1] - an important set of bug >>>>>> fixes >>>>>>> and stability improvements to internode messaging code that Benedict, >>>> I, >>>>>>> and others have been working on for the past couple of months. >>>>>>> >>>>>>> First, some context. This work started off as a review of >>>>>> CASSANDRA-14503 >>>>>>> (Internode connection management is race-prone [2]), CASSANDRA-13630 >>>>>>> (Support large internode messages with netty) [3], and a pre-4.0 >>>>>>> confirmatory review of such a major new feature. >>>>>>> >>>>>>> However, as we dug in, we realized this was insufficient. With more >>>> than >>>>>> 50 >>>>>>> bugs uncovered [4] - dozens of them critical to correctness and/or >>>>>>> stability of the system - a substantial rework was necessary to >>>>>> guarantee a >>>>>>> solid internode messaging subsystem for the 4.0 release. >>>>>>> >>>>>>> In addition to addressing all of the uncovered bugs [4] that were >>>> unique >>>>>> to >>>>>>> trunk + 13630 [3] + 14503 [2], we used this opportunity to correct >>> some >>>>>>> long-existing, pre-4.0 bugs and stability issues. For the complete >>> list >>>>>> of >>>>>>> notable bug fixes, read the comments to CASSANDRA-15066 [1]. But I’d >>>> like >>>>>>> to highlight a few. >>>>>> >>>>>> Do you have regression tests that will fail if these bugs are >>>> reintroduced >>>>>> at a later point? >>>>>> >>>>>>> # Lack of message integrity checks >>>>>>> >>>>>>> It’s known that TCP checksums are too weak [5] and Ethernet CRC >>> cannot >>>> be >>>>>>> relied upon [6] for integrity. With sufficient scale or time, you >>> will >>>>>> hit >>>>>>> bit flips. Sadly, most of the time these go undetected. Cassandra’s >>>>>>> replication model makes this issue much more serious, as the faulty >>>> data >>>>>>> can infect the cluster. >>>>>>> >>>>>>> We recognised this problem, and recently introduced a fix for >>>>>> server-client >>>>>>> messages, implementing CRCs in CASSANDRA-13304 (Add checksumming to >>> the >>>>>>> native protocol) [7]. >>>>>> >>>>>> This was discussed in my review and Jason created a ticket[1] to track >>>> it. >>>>>> We explicitly decided to defer this work not only due to the feature >>>> freeze >>>>>> in the community but also for technical reasons detailed below. >>>>>> >>>>>> Regarding new features during the feature freeze window, we have had >>>> such >>>>>> discussions in the past. The most recent being the one I initiated on >>>> Zstd >>>>>> Compressor which went positively and we have moved forward after >>>> assessing >>>>>> risk & community consensus. >>>>>> >>>>>> Regarding checksumming, please scroll down to the comments section in >>>> the >>>>>> link[2] you provided. You'll notice this discussion – >>>>>> >>>>>>>> Daniel Fox Franke: >>>>>>>> Please don't design new network protocols that don't either run over >>>>>> TLS or do some other kind of cryptographic authentication. If you have >>>>>> cryptographic authentication, then CRC is redundant. >>>>>>> >>>>>>> Evan Jones: >>>>>>> Good point. Even internal applications inside a data center should be >>>>>> using encryption, and today the performance impact is probably small >>> (I >>>>>> haven't actually measured it myself these days). >>>>>> >>>>>> Enabling TLS & internode compression are mitigation strategies to >>> avoid >>>>>> data corruption in transit. By your own admission in >>>> CASSANDRA-13304[3], we >>>>>> don't require checksumming if TLS is present. Here's your full quote – >>>>>> >>>>>>> Aleksey Yeschenko: >>>>>>> >>>>>> >>>>>>> Checksums and TLS are orthogonal. It just so happens that you don't >>>> need >>>>>> the former if you already have the latter. >>>>>> >>>>>> I want to be fair, later you did say that we don't want to force >>> people >>>> to >>>>>> pay the cost of TLS overhead. However, I would also like to point out >>>> that >>>>>> with introduction of Netty for internode communication, we have 4-5x >>> the >>>>>> TLS performance thanks to OpenSSL JNI bindings. You can refer to >>> Norman >>>> or >>>>>> my talks on the topic. So TLS is practical & compression is necessary >>>> for >>>>>> performance. Both strategies work fine to protect against data >>>> corruption >>>>>> making checksumming redundant. With SSL certificate hot reloading, it >>>> also >>>>>> avoids unnecessary cluster restarts providing maximum availability. >>>>>> >>>>>> In the same vein, it's 2019 and if people are not using TLS for >>>> internode, >>>>>> then it is really really bad for data security in our industry and we >>>>>> should not be encouraging it. In fact, I would go so far as to make >>> TLS >>>> as >>>>>> the default. >>>>>> >>>>>> Managing TLS infrastructure is beyond Cassandra's scope and operators >>>>>> should figure it out by now for their & their user's sake. Cassandra >>>> makes >>>>>> it super easy & performant to have TLS enabled. People should be using >>>> it. >>>>>> >>>>>>> But until CASSANDRA-15066 [1] lands, this is also a critical flaw >>>>>>> internode. We have addressed it by ensuring that no matter what, >>>> whether >>>>>>> you use SSL or not, whether you use internode compression or not, a >>>>>>> protocol level CRC is always present, for every message frame. It’s >>> our >>>>>>> deep and sincere belief that shipping a new implementation of the >>>>>> messaging >>>>>>> protocol without application-level data integrity checks would be >>>>>>> unacceptable for a modern database. >>>>>> >>>>>> My previous technical arguments have provided enough evidence that >>>>>> protocol level checksumming is not a show stopper. >>>>>> >>>>>> The only reason I see for adding checksums in the protocol is when >>> some >>>>>> user doesn't want to enable TLS and internode compression. As it >>> stands, >>>>>> from your comments[7] it seems to be mandatory and adds unnecessary >>>>>> overhead when TLS and/or Compression is enabled. Frankly I don't think >>>> we >>>>>> need to risk destabilizing trunk for these use-cases. I want to >>>> reiterate >>>>>> that I believe in doing the right thing but we have to make acceptable >>>>>> tradeoffs – as a community. >>>>>> >>>>>>> # Lack of back-pressure and memory usage constraints >>>>>>> >>>>>>> As it stands today, it’s far too easy for a single slow node to >>> become >>>>>>> overwhelmed by messages from its peers. Conversely, multiple >>>>>> coordinators >>>>>>> can be made unstable by the backlog of messages to deliver to just >>> one >>>>>>> struggling node. >>>>>> >>>>>> This is a known issue and it could have been addressed a separate bug >>>> fix >>>>>> – one that could be independently verified. >>>>>> >>>>>>> To address this problem, we have introduced strict memory usage >>>>>> constraints >>>>>>> that apply TCP-level back-pressure, on both outbound and inbound. It >>>> is >>>>>>> now impossible for a node to be swamped on inbound, and on outbound >>> it >>>> is >>>>>>> made significantly harder to overcommit resources. It’s a simple, >>>>>> reliable >>>>>>> mechanism that drastically improves cluster stability under load, and >>>>>>> especially overload. >>>>>>> >>>>>>> Cassandra is a mature system, and introducing an entirely new >>> messaging >>>>>>> implementation without resolving this fundamental stability issue is >>>>>>> difficult to justify in our view. >>>>>> >>>>>> This is great in theory. I would really like to see objective >>>> measurements >>>>>> like Chris did in CASSANDRA-14654[4]. Netflix engineers tested the >>> Netty >>>>>> refactor with a 200 node cluster[5], Zero Copy Streaming[6] and >>> reported >>>>>> their results. It's invaluable work. It would be great to see >>> something >>>>>> similar. >>>>>> >>>>>>> # State of the patch, feature freeze and 4.0 timeline concerns >>>>>>> >>>>>>> The patch is essentially complete, with much improved unit tests all >>>>>>> passing, dtests green, and extensive fuzz testing underway - with >>>> initial >>>>>>> results all positive. We intend to further improve in-code >>>> documentation >>>>>>> and test coverage in the next week or two, and do some minor >>> additional >>>>>>> code review, but we believe it will be basically good to commit in a >>>>>> couple >>>>>>> weeks. >>>>>> >>>>>> A 20K LoC patch is unverifiable especially without much documentation. >>>> It >>>>>> places undue burden on reviewers. It also messes up everyone's >>> branches >>>>>> once you commit such a large refactor. Let's be considerate to others >>> in >>>>>> the community. It is a good engineering practice to limit patches to a >>>> size >>>>>> that is reasonable. >>>>>> >>>>>> More importantly such large refactors lend themselves to new bugs that >>>>>> cannot be caught easily unless you have a very strong regression test >>>> suite >>>>>> which Cassandra arguably lacks. Therefore I am of the opinion that the >>>> bug >>>>>> fixes can be applied piecemeal into the codebase. They should be small >>>>>> enough that can be individually reviewed and independently verified. >>>>>> >>>>>> I also noticed that you have replaced Netty's classes[7]. I am of the >>>>>> opinion that they should be upstreamed if they're better so the wider >>>> Netty >>>>>> community benefits from it and we don't have to maintain our own >>>> classes. >>>>>> >>>>>>> P.S. I believe that once it’s committed, we should cut our first 4.0 >>>>>> alpha. >>>>>>> It’s about time we started this train (: >>>>>> >>>>>> +1 on working towards an alpha but that is a separate discussion from >>>> this >>>>>> issue. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Dinesh >>>>>> >>>>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-14578 >>>>>> [2] https://www.evanjones.ca/tcp-and-ethernet-checksums-fail.html >>>>>> [3] >>>>>> >>>> >>> https://issues.apache.org/jira/browse/CASSANDRA-13304?focusedCommentId=16183034&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16183034 >>>>>> [4] https://issues.apache.org/jira/browse/CASSANDRA-14654 >>>>>> [5] https://issues.apache.org/jira/browse/CASSANDRA-14747 >>>>>> [6] https://issues.apache.org/jira/browse/CASSANDRA-14765 >>>>>> [7] >>>>>> >>>> >>> https://issues.apache.org/jira/browse/CASSANDRA-15066?focusedCommentId=16801277&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16801277 >>>>>> >>>>>> --------------------------------------------------------------------- >>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org >>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org >>>>>> >>>>>> >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org >>>> For additional commands, e-mail: dev-h...@cassandra.apache.org >>>> >>>> >>> >>> -- >>> alex p >>> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org For additional commands, e-mail: dev-h...@cassandra.apache.org