Re: Summary of 4.0 Large Features/Breaking Changes (Was: Rough roadmap for 4.0)

Jason Brown Sun, 20 Nov 2016 06:51:05 -0800

Hey all,

One of the goals on my team, when working on large patches, is to get
community feedback on these initiatives before throwing them into prod.
This gets us a wider net of feedback (see Sylvain's continuing excellent
rounds of feedback to my work on CASSANDRA-8457), as well as making sure we
don't go too far off the deep end in terms of straying from the community
version. The latter point is crucial because if we make too many
incompatible changes to, for example, the internode messaging protocol or
the CQL protocol or the sstable file format, and deploy that, it may be
very difficult, if not impossible, to reconcile with future, in-development
versions of Cassandra.


We fully intend to "engineer and test the snot out of" the changes we are
working on as the whole point of us working on them is so we *can* run them
in production, at our scale. We aren't expecting others in the community to
dogfood it for us. There will be a delay between committing something
upstream, and us backporting it to a current version we run in production
and actually deploying it. However, you can be sure that any bugs we find
will be fixed ASAP; we have many users counting on it.

Thanks for listening,

-Jason


On Sat, Nov 19, 2016 at 11:04 AM, Blake Eggleston <[email protected]>
wrote:

> I think Ed's just using gossip 2.0 as a hypothetical example. His point is
> that we should only commit things when we have a high degree of confidence
> that they work correctly, not with the expectation that they don't.
>
>
> On November 19, 2016 at 10:52:38 AM, Michael Kjellman (
> [email protected]) wrote:
>
> Jason has asked for review and feedback many times. Maybe be constructive
> and review his code instead of just complaining (once again)?
>
> Sent from my iPhone
>
> > On Nov 19, 2016, at 1:49 PM, Edward Capriolo <[email protected]>
> > wrote:
> >
> > I would say start with a mindset like 'people will run this in
> > production', not like 'why would you expect this to work'.
> >
> > Now how does this logic affect feature development? Maybe use gossip 2.0
> > as an example.
> >
> > I will play my given Debbie Downer role. I could imagine 1 or 2 dtests and
> > the logic of 'don't expect it to work': unleash 4.0 onto hordes of newbies
> > with a Twitter announcement of the release and let bugs trickle in.
> >
> > One could also do something comprehensive, like test on clusters of 2 to
> > 1000 nodes. Test with Jepsen to see what happens during partitions, inject
> > things like JVM pauses and account for the behavior. Log convergence times
> > after given events.
> >
> > Take a stand and say, look, "we engineered and beat the crap out of this
> > feature. I deployed this release feature at my company and eat my dogfood.
> > You are not my crash test dummy."
> >
> >
> >> On Saturday, November 19, 2016, Jeff Jirsa <[email protected]> wrote:
> >>
> >> Any proposal to solve the problem you describe?
> >>
> >> --
> >> Jeff Jirsa
> >>
> >>
> >>> On Nov 19, 2016, at 8:50 AM, Edward Capriolo <[email protected]> wrote:
> >>>
> >>> This is especially relevant if people wish to focus on removing things.
> >>>
> >>> For example, gossip 2.0 sounds great, but seems geared toward huge
> >>> clusters, which is not likely a majority of users. For those with a 20 node
> >>> cluster, are the indirect benefits worth it?
> >>>
> >>> Also there seems to be a first push to remove things like compact storage
> >>> or thrift. Fine, great. But what is the realistic update path for someone?
> >>> If the big players are running 2.1 and maintaining backports, the average
> >>> shop without a dedicated team is going to be stuck saying (great features
> >>> in 4.0 that improve performance, I would probably switch but it's not
> >>> stable and we have that one compact storage CF and who knows what is going
> >>> to happen performance wise when)
> >>>
> >>> We really need to lose this "release won't be stable for 6 minor versions"
> >>> concept.
> >>>
> >>> On Saturday, November 19, 2016, Edward Capriolo <[email protected]>
> >>> wrote:
> >>>
> >>>>
> >>>>
> >>>> On Friday, November 18, 2016, Jeff Jirsa <[email protected]> wrote:
> >>>>
> >>>>> We should assume that we’re ditching tick/tock. I’ll post a thread on
> >>>>> 4.0-and-beyond here in a few minutes.
> >>>>>
> >>>>> The advantage of a prod release every 6 months is less incentive to push
> >>>>> unfinished work into a release.
> >>>>> The disadvantage of a prod release every 6 months is then we either have
> >>>>> a very short lifespan per-release, or we have to maintain lots of active
> >>>>> releases.
> >>>>>
> >>>>> 2.1 has been out for over 2 years, and a lot of people (including us) are
> >>>>> running it in prod – if we have a release every 6 months, that means we’d
> >>>>> be supporting 4+ releases at a time, just to keep parity with what we have
> >>>>> now? Maybe that’s ok, if we’re very selective about ‘support’ for 2+ year
> >>>>> old branches.
> >>>>>
> >>>>>
> >>>>> On 11/18/16, 3:10 PM, "[email protected] on behalf of Blake
> >>>>> Eggleston" <[email protected]> wrote:
> >>>>>
> >>>>>>> While stability is important if we push back large "core" changes
> >>>>>>> until later we're just setting ourselves up to face the same issues
> >>>>>>> later on
> >>>>>>
> >>>>>> In theory, yes. In practice, when incomplete features are earmarked for
> >>>>>> a certain release, those features are often rushed out, and not always
> >>>>>> fully baked.
> >>>>>>
> >>>>>> In any case, I don’t think it makes sense to spend too much time
> >>>>>> planning what goes into 4.0, and what goes into the next major release,
> >>>>>> with so many release strategy related decisions still up in the air. Are
> >>>>>> we going to ditch tick-tock? If so, what will its replacement look like?
> >>>>>> Specifically, when will the next “production” release happen? Without
> >>>>>> knowing that, it's hard to say if something should go in 4.0, or 4.5, or
> >>>>>> 5.0, or whatever.
> >>>>>>
> >>>>>> The reason I suggested a production release every 6 months is because
> >>>>>> (in my mind) it’s frequent enough that people won’t be tempted to rush
> >>>>>> features to hit a given release, but not so frequent that it’s not
> >>>>>> practical to support. It wouldn’t be the end of the world if some of
> >>>>>> these tickets didn’t make it into 4.0, because 4.5 would be fine.
> >>>>>>
> >>>>>> On November 18, 2016 at 1:57:21 PM, Kurt Greaves ([email protected])
> >>>>>> wrote:
> >>>>>>
> >>>>>>> On 18 November 2016 at 18:25, Jason Brown <[email protected]> wrote:
> >>>>>>>
> >>>>>>> #11559 (enhanced node representation) - decided it's *not* something
> >>>>>>> we need wrt #7544 storage port configurable per node, so we are
> >>>>>>> punting on
> >>>>>>>
> >>>>>>
> >>>>>> #12344 - Forward writes to replacement node with same address during
> >>>>>> replace depends on #11559. To be honest I'd say #12344 is pretty important,
> >>>>>> otherwise it makes it difficult to replace nodes without potentially
> >>>>>> requiring client code/configuration changes. It would be nice to get
> >>>>>> #12344 in for 4.0. It's marked as an improvement but I'd consider it a bug
> >>>>>> and thus think it could be included in a later minor release.
> >>>>>>
> >>>>>>> Introducing all of these in a single release seems pretty risky. I think
> >>>>>>> it would be safer to spread these out over a few 4.x releases (as they’re
> >>>>>>> finished) and give them time to stabilize before including them in an LTS
> >>>>>>> release. The downside would be having to maintain backwards compatibility
> >>>>>>> across the 4.x versions, but that seems preferable to delaying the release
> >>>>>>> of 4.0 to include these, and having another big bang release.
> >>>>>>
> >>>>>>
> >>>>>> I don't think anyone expects 4.0.0 to be stable. It's a major version
> >>>>>> change with lots of new features; in the production world people don't
> >>>>>> normally move to a new major version until it has been out for quite some
> >>>>>> time and several minor releases have passed. Really, most people are only
> >>>>>> migrating to 3.0.x now. While stability is important if we push back large
> >>>>>> "core" changes until later we're just setting ourselves up to face the same
> >>>>>> issues later on. There should be enough uptake on the early releases of 4.0
> >>>>>> from new users to help test and get it to a production-ready state.
> >>>>>>
> >>>>>>
> >>>>>> Kurt Greaves
> >>>>>> [email protected]
> >>>>>
> >>>>>
> >>>> I don't think anyone expects 4.0.0 to be stable
> >>>>
> >>>> Someone previously described 3.0 as the "break everything release".
> >>>>
> >>>> We know that many people are still on 2.1 and 3.0. Cassandra will always be
> >>>> maintaining 3 or 4 active branches and have adoption issues if releases are
> >>>> not stable and usable.
> >>>>
> >>>> Being that Cassandra was 1.0 years ago, I expect things to be stable. Half
> >>>> working features, or "added this, broke that", are not appealing to me.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Sorry this was sent from mobile. Will do less grammar and spell check than
> >>>> usual.
> >>>>
> >>>
> >>>
> >>
> >
> >
>
