We have precedent for changing defaults that have near-universal positive impact in patchlevel releases, yep.
disk_access_mode: auto -> mmap_index_only comes to mind.

- Scott

> On Nov 16, 2022, at 6:49 PM, Derek Chen-Becker <de...@chen-becker.org> wrote:
>
> I'm fine with not including G1 in 4.1, but would we consider inclusion
> for 4.1.X down the road once validation has been done?
>
> Derek
>
>
>> On Wed, Nov 16, 2022 at 4:39 PM David Capwell <dcapw...@apple.com> wrote:
>>
>> Getting poked in Slack to be more explicit in this thread…
>>
>> Switching to G1 on trunk, +1
>> Switching to G1 on 4.1, -1. 4.1 is about to be released and this isn’t a
>> bug fix but a perf improvement ticket and as such should go through
>> validation that the perf improvements are seen, there is not enough time
>> left for that added performance work burden so strongly feel it should be
>> pushed to 4.2/5.0 where it has plenty of time to be validated against. The
>> ticket even asks to avoid validating the claims; saying 'Hoping we can skip
>> due diligence on this ticket because the data is "in the past" already'.
>> Others have attempted both shenandoah and ZGC and found mixed results, so
>> nothing leads me to believe that won’t be true here either.
>>
>>>> On Nov 16, 2022, at 9:15 AM, J. D. Jordan <jeremiah.jor...@gmail.com>
>>>> wrote:
>>>
>>> Heap -
>>> +1 for G1 in trunk
>>> +0 for G1 in 4.1 - I think it’s worthwhile and fairly well tested but I
>>> understand pushback against changing this so late in the game.
>>>
>>> Memtable -
>>> -1 for off heap in 4.1. I think this needs more testing and isn’t something
>>> to change at the last minute.
>>> +1 for running performance/fuzz tests against the alternate memtable
>>> choices in trunk and switching if they don’t show regressions.
>>>
>>>> On Nov 16, 2022, at 10:48 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>>>>
>>>>
>>>> To clarify: -0 here on G1 as default for 4.1 as well; I'd like us to
>>>> prioritize digging into G1's behavior on small heaps vs. CMS w/our default
>>>> tuning sooner rather than later. With that info I'd likely be a strong +1
>>>> on the shift.
>>>>
>>>> -1 on switching to offheap_objects for 4.1 RC; again, think this is just a
>>>> small step away from being a +1 w/some more rigor around seeing the
>>>> current state of the technology's intersections.
>>>>
>>>> On Wed, Nov 16, 2022, at 7:47 AM, Aleksey Yeshchenko wrote:
>>>>> All right. I’ll clarify then.
>>>>>
>>>>> -0 on switching the default to G1 *this late* just before RC1.
>>>>> -1 on switching the default to offheap_objects *for 4.1 RC1*, but all for
>>>>> it in principle, for 4.2, after we run some more tests and resolve the
>>>>> concerns raised by Jeff.
>>>>>
>>>>> Let’s please try to avoid this kind of super late defaults switch going
>>>>> forward?
>>>>>
>>>>> --
>>>>> AY
>>>>>
>>>>>> On 16 Nov 2022, at 03:27, Derek Chen-Becker <de...@chen-becker.org>
>>>>>> wrote:
>>>>>>
>>>>>> For the record, I'm +100 on G1. Take it with whatever sized grain of
>>>>>> salt you think appropriate for a relative newcomer to the list, but
>>>>>> I've spent my last 7-8 years dealing with the intersection of
>>>>>> high-throughput, low latency systems and their interaction with GC and
>>>>>> in my personal experience G1 outperforms CMS in all cases and with
>>>>>> significantly less work (zero work, in many cases). The only things
>>>>>> I've seen perform better *with a similar heap footprint* are GenShen
>>>>>> (currently experimental) and Rust (beyond the scope of this topic).
>>>>>>
>>>>>> Derek
>>>>>>
>>>>>> On Tue, Nov 15, 2022 at 4:51 PM Jon Haddad <rustyrazorbl...@apache.org>
>>>>>> wrote:
>>>>>>>
>>>>>>> I'm curious what it would take for folks to be OK with merging this
>>>>>>> into 4.1? How much additional time would you want to feel comfortable?
>>>>>>>
>>>>>>> I should probably have been a little more vigorous in my +1 of Mick's
>>>>>>> PR. For a little background - I worked on several hundred clusters
>>>>>>> while at TLP, mostly dealing with stability and performance issues. A
>>>>>>> lot of them stemmed partially or wholly from the GC settings we ship in
>>>>>>> the project. Par New with CMS and small new gen results in a lot of
>>>>>>> premature promotion leading to high pause times into the hundreds of ms
>>>>>>> which pushes p99 latency through the roof.
>>>>>>>
>>>>>>> I'm a big +1 in favor of G1 because it's not just better for most
>>>>>>> people but it's better for _every_ new Cassandra user. The first
>>>>>>> experience that people have with the project is important, and our
>>>>>>> current GC settings are quite bad - so bad they lead to problems with
>>>>>>> stability in production. The G1 settings are mostly hands off, result
>>>>>>> in shorter pause times and are a big improvement over the status quo.
>>>>>>>
>>>>>>> Most folks don't do GC tuning, they use what we supply, and what we
>>>>>>> currently supply leads to a poor initial experience with the database.
>>>>>>> I think we owe the community our best effort even if it means pushing
>>>>>>> the release back a little bit.
>>>>>>>
>>>>>>> Just for some additional context, we're (Netflix) running 25K nodes on
>>>>>>> G1 across a variety of hardware in AWS with wildly varying workloads,
>>>>>>> and I haven't seen G1 be the root cause of a problem even once. The
>>>>>>> settings that Mick is proposing are almost identical to what we use (we
>>>>>>> use half of heap up to 30GB).
>>>>>>>
>>>>>>> I'd really appreciate it if we took a second to consider the community
>>>>>>> effect of another release that ships settings that cause significant
>>>>>>> pain for our users.
>>>>>>>
>>>>>>> Jon
>>>>>>>
>>>>>>> On 2022/11/10 21:49:36 Mick Semb Wever wrote:
>>>>>>>>>
>>>>>>>>> In case of GC, reasonably extensive performance testing should be the
>>>>>>>>> expectation. Potentially revisiting some of the G1 params for the 4.1
>>>>>>>>> reality - quite a lot has changed since those optional defaults were
>>>>>>>>> picked.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I've put our battle-tested g1 opts (from consultants at TLP and DataStax)
>>>>>>>> in the patch for CASSANDRA-18027
>>>>>>>>
>>>>>>>> In reality it is really not much of a change, g1 does make it simple.
>>>>>>>> Picking the correct ParallelGCThreads and ConcGCThreads and the floor to
>>>>>>>> the new heap (XX:NewSize) is still required, though we could do a much
>>>>>>>> better job of dynamic defaults to them.
>>>>>>>>
>>>>>>>> Alex Dejanovski's blog is a starting point:
>>>>>>>> https://thelastpickle.com/blog/2020/06/29/cassandra_4-0_garbage_collectors_performance_benchmarks.html
>>>>>>>> where this gc opt set was used (though it doesn't prove why those options
>>>>>>>> are chosen)
>>>>>>>>
>>>>>>>> The bar for objection to sneaking these into 4.1 was intended to be low,
>>>>>>>> and I stand by those that raise concerns.
>>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> +---------------------------------------------------------------+
>>>>>> | Derek Chen-Becker                                             |
>>>>>> | GPG Key available at https://keybase.io/dchenbecker and       |
>>>>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>>>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC   |
>>>>>> +---------------------------------------------------------------+
>>>>>
>>>>>
>>>>
>>
>
>
>
> --
> +---------------------------------------------------------------+
> | Derek Chen-Becker                                             |
> | GPG Key available at https://keybase.io/dchenbecker and       |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC   |
> +---------------------------------------------------------------+