It is funny you say this: "tick-tock started based off of the 3.0 big bang “we broke everything” release"
*"Brain battles itself over short-term rewards, long-term goals"*
https://www.princeton.edu/pr/news/04/q4/1014-brain.htm

*Normalization of deviance in software: how broken practices become standard*
https://news.ycombinator.com/item?id=10811822

I had something really long written. I summarized it down to this thought.

Huge generalization coming:

Group 1: "I have 1GB of data on a 200GB disk, I am going to switch to level DB and see what happens. YOLO DB!"

vs.

Group 2: "I have 60GB of data on a 200GB disk. If I switch to level DB I have to do it in a way that does not impact my current users, and in a way that won't fill my disks, and doing this in a controlled way might take days."

Users gravitate toward Group 2, and as they move they become more risk averse. They are not going to want to upgrade more than twice a year. If they see risk they will not upgrade at all. If Group 2 is not upgrading, all the "testers" come from Group 1.

I think a new metric system would be fun. In the readme.txt:

TestAdded   T
DTestAdded  D
Feature     F
Fix         B
Ninja Fix   N
Refactor    R

Version 3.0 DDFFBBBBBBRRRRRTTTTDDDDDD
Version 3.1 FBBBBBBBBBBRRRRTTDD

Over time, if these do not gravitate toward FTD, we know we are headed in the wrong direction.

On Thu, Sep 15, 2016 at 2:57 PM, Jeremiah D Jordan <jeremiah.jor...@gmail.com> wrote:

> Because tick-tock started based off of the 3.0 big bang “we broke
> everything” release I don’t think we can judge wether or not it is working
> until we are another 6 months in. AKA when we would have been releasing
> the next big bang release. Right now a lot if not most of the bugs in a
> given tick tock release are bugs that were introduced in 3.0. Even the bug
> mentioned here, it is not a tick tock bug, it is a 3.0 bug.
>
>
> > On Sep 15, 2016, at 1:48 PM, Jake Luciani <jak...@gmail.com> wrote:
> >
> > I'm pretty sure everyone will agree Tick-Tock didn't go well and needs to
> > change.
> >
> > The problem for me is going back to the old way doesn't sound great.
> > There are parts of tick-tock I really like,
> > for example, the cadence and limited scope per release.
> >
> > I know at the summit there were a lot of ideas thrown around I can
> > regurgitate but perhaps people
> > who have been thinking about this would like to chime in and present
> > ideas?
> >
> > -Jake
> >
> > On Thu, Sep 15, 2016 at 2:28 PM, Benedict Elliott Smith
> > <bened...@apache.org> wrote:
> >
> >> I agree tick-tock is a failure. But for two reasons IMO:
> >>
> >> 1) Ultimately, the users are the real testers and it takes a while for a
> >> release to percolate into the wild for feedback. The reality is that a
> >> release doesn't have its tires properly kicked for at least three months
> >> after it's cut. So if we are to have any tocks, they should be completely
> >> unwed from the ticks, and should probably happen on a ~3M cadence to keep
> >> the labour down but the utility up (and there should probably still be
> >> more than one tock per tick)
> >>
> >> 2) Those promised resources to improved process never happened. We
> >> haven't even reached parity with the 2.1 release until very recently,
> >> i.e. no failing u/dtests.
> >>
> >>
> >> On 15 September 2016 at 19:08, Jeff Jirsa <jeff.ji...@crowdstrike.com>
> >> wrote:
> >>
> >>> I know we’ve got a lot of folks following the dev list without a lot of
> >>> background, so let’s make sure we get some context here so everyone can
> >>> be on the same page.
> >>>
> >>> Going to preface this wall of text by saying I’m +1 on a 3.5.1 (and
> >>> 3.3.1, etc) if it’s done AFTER 3.9 (I think we need to get 3.9 out first
> >>> before the RE manpower is spent on backporting fixes, even critical
> >>> fixes, because 3.9 has multiple critical fixes for people running 3.7).
> >>>
> >>> Now some background:
> >>>
> >>> For many years, Cassandra used to have a dev process that kept 3 active
> >>> branches - “bleeding edge”, a “stable”, and an “old stable” branch,
> >>> where developers would be committing ALL new contributions to the
> >>> bleeding edge, non-api-breaking changes to stable, and bugfixes only to
> >>> old stable. While the api changed and major features were added, that
> >>> bleeding edge would just be ‘trunk’, and it’d get cut into a major
> >>> version when it was ready to ship. We saw that with 2.2 / 2.1 / 2.0 (and
> >>> before that, 2.1 / 2.0 / 1.2, and before that 2.0 / 1.2 / 1.1 ). When
> >>> that bleeding edge got released as a major x.y.0, the third, oldest,
> >>> most stable branch went EOL, and new features would go into trunk for
> >>> the next major version.
> >>>
> >>> There were two big negatives observed with this:
> >>>
> >>> The first big negative is that if multiple major new features were in
> >>> flight, releases were prone to delay. Nobody wants to break an API on a
> >>> x.y.1 release, and nobody wants to add a new feature to a x.y.2 release,
> >>> so the project would delay the x.y releases if major features were
> >>> close, and then there’d be pressure to slip them in before they were
> >>> fully tested, or cut features to avoid delaying the release. This
> >>> pressure was observed to be bad for the project – it forced technical
> >>> compromises.
> >>>
> >>> The second downside that was observed was that nobody would try to run
> >>> the new versions when they launched, because they were buggy because
> >>> they were filled with new features. 2.2, for example, introduced RBAC,
> >>> commitlog compression, and user defined functions – major features that
> >>> needed to be tested.
> >>> Unfortunately, because there were few real-world testers, there were
> >>> still major bugs being found for months – the first production-ready
> >>> version of 2.2 is probably in the 2.2.5 or 2.2.6 range.
> >>>
> >>> For version 3, we moved to an alternate release, modeled on Intel’s
> >>> tick/tock https://en.wikipedia.org/wiki/Tick-Tock_model
> >>>
> >>> The intention was to allow new features into 3.even releases (3.0, 3.2,
> >>> 3.4, 3.6, and so on), with bugfixes in 3.odd releases (3.1, … ). The
> >>> hope was to allow more frequent releases to address the first big
> >>> negative (flood of new features that blocked releases), while also
> >>> helping to address the second – with fewer major features in a release,
> >>> they better get more/better test coverage.
> >>>
> >>> In the tick/tock model, anyone running 3.odd (like 3.5) should be
> >>> looking for bugfixes in 3.7. It’s certainly true that 3.5 is horribly
> >>> broken (as is 3.3, and 3.4, etc), but with this release model, the
> >>> bugfix SHOULD BE in 3.7. As I mentioned previously, we have precedent
> >>> for backporting critical fixes, but we don’t have a well defined bar
> >>> (that I see) for what’s critical enough for a backport.
> >>>
> >>> Jon is noting (and what many of us who run Cassandra in production have
> >>> really known for a very long time) is that nobody wants to run 3.newest
> >>> (even or odd), because 3.newest is likely broken (because it’s a complex
> >>> distributed database, and testing is hard, and it takes time and complex
> >>> workloads to find bugs). In the tick/tock model, because new features
> >>> went into 3.6, there are new features that may not be adequately
> >>> tested/validated in 3.7 a user of 3.5 doesn’t want, and isn’t willing to
> >>> accept the risk.
> >>>
> >>> The bottom line here is that tick/tock is probably a well intentioned
> >>> but failed attempt to bring stability to Cassandra’s releases.
> >>> The problems tick/tock was meant to solve are real problems, but
> >>> tick/tock doesn’t seem to be addressing them – new features invalidate
> >>> old testing, which makes it difficult/impossible for real users to sit
> >>> on the 3.odd versions.
> >>>
> >>> We’re due for cutting 3.9 and 3.0.9, and we have limited RE manpower to
> >>> get those out. Only after those are out would I be +1 on a 3.5.1, and
> >>> then only because if I were running 3.5, and I hit this bug, I wouldn’t
> >>> want to spend the ~$100k it would cost my organization to validate 3.7
> >>> prior to upgrading, and I don’t think it’s reasonable to ask users to
> >>> recompile a release for a ~10 line fix for a very nasty bug.
> >>>
> >>> I’m also very strongly recommend we (committers/PMC) reconsider
> >>> tick/tock for 4.x releases, because this is exactly the type of problem
> >>> that will continue to happen as we move forward. I suggest that we
> >>> either need to go back to the old model and do a better job of dealing
> >>> with feature creep and testing, or we need to better define what gets
> >>> backported, because the community needs a stable version to run, and
> >>> running latest odd release of tick/tock isn’t it.
> >>>
> >>> - Jeff
> >>>
> >>>
> >>> On 9/15/16, 10:31 AM, "dave_les...@apple.com on behalf of Dave Lester"
> >>> <dave_les...@apple.com> wrote:
> >>>
> >>>> How would cutting a 3.5.1 release possibly confuse users of the
> >>>> software? It would be easy to document the change and to send release
> >>>> notes.
> >>>>
> >>>> Given the bug’s critical nature and that it's a minor fix, I’m +1
> >>>> (non-binding) to a new release.
> >>>>
> >>>> Dave
> >>>>
> >>>>> On Sep 15, 2016, at 7:18 AM, Jeremiah D Jordan
> >>>>> <https://urldefense.proofpoint.com/v2/url?u=http-3A__jeremiah.jordan-40gmail.com&d=DQIFaQ&c=08AGY6txKsvMOP6lYkHQpPMRA1U6kqhAwGa8-0QCg3M&r=yfYEBHVkX6l0zImlOIBID0gmhluYPD5Jje-3CtaT3ow&m=srNzKwrs8hKPoJMZ4Ao18CYaMYKnbWaCHou6ui5tqdM&s=iM_LKKIhaiC0w6uz3lhK1lob4gJbKhLPqGNfPPLye6w&e=>
> >>>>> wrote:
> >>>>>
> >>>>> I’m with Jeff on this, 3.7 (bug fixes on 3.6) has already been
> >>>>> released with the fix. Since the fix applies cleanly anyone is free to
> >>>>> put it on top of 3.5 on their own if they like, but I see no reason to
> >>>>> put out a 3.5.1 right now and confuse people further.
> >>>>>
> >>>>> -Jeremiah
> >>>>>
> >>>>>
> >>>>>> On Sep 15, 2016, at 9:07 AM, Jonathan Haddad <j...@jonhaddad.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>> As I follow up, I suppose I'm only advocating for a fix to the odd
> >>>>>> releases. Sadly, Tick Tock versioning is misleading.
> >>>>>>
> >>>>>> If tick tock were to continue (and I'm very much against how it
> >>>>>> currently works) the whole even-features odd-fixes thing needs to
> >>>>>> stop ASAP, all it does it confuse people.
> >>>>>>
> >>>>>> The follow up to 3.4 (3.5) should have been 3.4.1, following semver,
> >>>>>> so people know it's bug fixes only to 3.4.
> >>>>>>
> >>>>>> Jon
> >>>>>>
> >>>>>> On Wed, Sep 14, 2016 at 10:37 PM Jonathan Haddad <j...@jonhaddad.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> In this particular case, I'd say adding a bug fix release for every
> >>>>>>> version that's affected would be the right thing. The issue is so
> >>>>>>> easily reproducible and will likely result in massive data loss for
> >>>>>>> anyone on 3.X WHERE X < 6 and uses the "date" type.
> >>>>>>>
> >>>>>>> This is how easy it is to reproduce:
> >>>>>>>
> >>>>>>> 1. Start Cassandra 3.5
> >>>>>>> 2. create KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
> >>>>>>> 3. use test;
> >>>>>>> 4. create table fail (id int primary key, d date);
> >>>>>>> 5. delete d from fail where id = 1;
> >>>>>>> 6. Stop Cassandra
> >>>>>>> 7. Start Cassandra
> >>>>>>>
> >>>>>>> You will get this, and startup will fail:
> >>>>>>>
> >>>>>>> ERROR 05:32:09 Exiting due to error while processing commit log
> >>>>>>> during initialization.
> >>>>>>> org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException:
> >>>>>>> Unexpected error deserializing mutation; saved to
> >>>>>>> /var/folders/0l/g2p6cnyd5kx_1wkl83nd3y4r0000gn/T/mutation6313332720566971713dat.
> >>>>>>> This may be caused by replaying a mutation against a table with the
> >>>>>>> same name but incompatible schema. Exception follows:
> >>>>>>> org.apache.cassandra.serializers.MarshalException: Expected 4 byte
> >>>>>>> long for date (0)
> >>>>>>>
> >>>>>>> I mean.. come on. It's an easy fix. It cleanly merges against 3.5
> >>>>>>> (and probably the other releases) and requires very little
> >>>>>>> investment from anyone.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Sep 14, 2016 at 9:40 PM Jeff Jirsa
> >>>>>>> <jeff.ji...@crowdstrike.com> wrote:
> >>>>>>>
> >>>>>>>> We did 3.1.1 and 3.2.1, so there’s SOME precedent for emergency
> >>>>>>>> fixes, but we certainly didn’t/won’t go back and cut new releases
> >>>>>>>> from every branch for every critical bug in future releases, so I
> >>>>>>>> think we need to draw the line somewhere. If it’s fixed in 3.7 and
> >>>>>>>> 3.0.x (x >= 6), it seems like you’ve got options (either stay on
> >>>>>>>> the tick and go up to 3.7, or bail down to 3.0.x)
> >>>>>>>>
> >>>>>>>> Perhaps, though, this highlights the fact that tick/tock may not be
> >>>>>>>> the best option long term. We’ve tried it for a year, perhaps we
> >>>>>>>> should instead discuss whether or not it should continue, or if
> >>>>>>>> there’s another process that gives us a better way to get useful
> >>>>>>>> patches into versions people are willing to run in production.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 9/14/16, 8:55 PM, "Jonathan Haddad" <j...@jonhaddad.com> wrote:
> >>>>>>>>
> >>>>>>>>> Common sense is what prevents someone from upgrading to yet
> >>>>>>>>> another completely unknown version with new features which have
> >>>>>>>>> probably broken even more stuff that nobody is aware of. The folks
> >>>>>>>>> I'm helping right deployed 3.5 when they got started because
> >>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__cassandra.apache.org&d=DQIBaQ&c=08AGY6txKsvMOP6lYkHQpPMRA1U6kqhAwGa8-0QCg3M&r=yfYEBHVkX6l0zImlOIBID0gmhluYPD5Jje-3CtaT3ow&m=MZ9nLcNNhQZkuXyH0NBbP1kSEE2M-SYgyVqZ88IJcXY&s=pLP3udocOcAG6k_sAb9p8tcAhtOhpFm6JB7owGhPQEs&e=
> >>>>>>>>> suggests it's acceptable for production. It turns out using 4 of
> >>>>>>>>> the built in datatypes of the database result in the server being
> >>>>>>>>> unable to restart without clearing out the commit logs and running
> >>>>>>>>> a repair. That screams critical to me. You shouldn't even be able
> >>>>>>>>> to install 3.5 without the patch I've supplied - that bug is a
> >>>>>>>>> ticking time bomb for anyone that installs it.
> >>>>>>>>>
> >>>>>>>>> On Wed, Sep 14, 2016 at 8:12 PM Michael Shuler
> >>>>>>>>> <mich...@pbandjelly.org> wrote:
> >>>>>>>>>
> >>>>>>>>>> What's preventing the use of the 3.6 or 3.7 releases where this
> >>>>>>>>>> bug is already fixed? This is also fixed in the 3.0.6/7/8
> >>>>>>>>>> releases.
> >>>>>>>>>>
> >>>>>>>>>> Michael
> >>>>>>>>>>
> >>>>>>>>>> On 09/14/2016 08:30 PM, Jonathan Haddad wrote:
> >>>>>>>>>>> Unfortunately CASSANDRA-11618 was fixed in 3.6 but was not back
> >>>>>>>>>>> ported to 3.5 as well, and it makes Cassandra effectively
> >>>>>>>>>>> unusable if someone is using any of the 4 types affected in any
> >>>>>>>>>>> of their schema.
> >>>>>>>>>>>
> >>>>>>>>>>> I have cherry picked & merged the patch back to here and will
> >>>>>>>>>>> put it in a JIRA as well tonight, I just wanted to get the ball
> >>>>>>>>>>> rolling asap on this.
> >>>>>>>>>>>
> >>>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_rustyrazorblade_cassandra_tree_fix-5Fcommitlog-5Fexception&d=DQIBaQ&c=08AGY6txKsvMOP6lYkHQpPMRA1U6kqhAwGa8-0QCg3M&r=yfYEBHVkX6l0zImlOIBID0gmhluYPD5Jje-3CtaT3ow&m=MZ9nLcNNhQZkuXyH0NBbP1kSEE2M-SYgyVqZ88IJcXY&s=ktY5tkT-nO1jtyc0EicbgZHXJYl03DvzuxqzyyOgzII&e=
> >>>>>>>>>>>
> >>>>>>>>>>> Jon
> >
> > --
> > http://twitter.com/tjake
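
P.S. A rough sketch of the readme.txt metric from the top of this message, in case it helps the discussion. It assumes each release is described by a string of the letter codes (T, D, F, B, N, R) and reads "gravitate toward FTD" as "the share of F, T and D letters should rise from release to release". That reading, and the names `CODES`, `HEALTHY` and `healthy_share`, are my own illustration and not an existing tool or project convention.

```python
from collections import Counter

# Letter codes as listed in the proposal.
CODES = {
    "T": "TestAdded",
    "D": "DTestAdded",
    "F": "Feature",
    "B": "Fix",
    "N": "Ninja Fix",
    "R": "Refactor",
}

# Assumption: "gravitating toward FTD" means these letters should dominate.
HEALTHY = frozenset("FTD")


def healthy_share(fingerprint: str) -> float:
    """Return the fraction of changes in a release that are F, T or D."""
    counts = Counter(fingerprint)
    unknown = set(counts) - set(CODES)
    if unknown:
        raise ValueError(f"unknown change codes: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[code] for code in HEALTHY) / total


# The two example fingerprints from the message.
releases = {
    "3.0": "DDFFBBBBBBRRRRRTTTTDDDDDD",
    "3.1": "FBBBBBBBBBBRRRRTTDD",
}

if __name__ == "__main__":
    for version, fingerprint in releases.items():
        print(f"Version {version}: F/T/D share = {healthy_share(fingerprint):.2f}")
```

If the share falls from one release to the next, that would be the "headed in the wrong direction" signal. A plain ratio is the simplest possible choice here; weighting fixes and ninja fixes differently would be just as defensible.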