Re: Should we change 4.1 to G1 and offheap_objects ?

Josh McKenzie Wed, 16 Nov 2022 08:48:13 -0800

To clarify: -0 here on G1 as default for 4.1 as well; I'd like us to prioritize 
digging into G1's behavior on small heaps vs. CMS w/our default tuning sooner 
rather than later. With that info I'd likely be a strong +1 on the shift.


-1 on switching to offheap_objects for 4.1 RC; again, think this is just a 
small step away from being a +1 w/some more rigor around seeing the current 
state of the technology's intersections.

On Wed, Nov 16, 2022, at 7:47 AM, Aleksey Yeshchenko wrote:
> All right. I’ll clarify then.
> 
> -0 on switching the default to G1 *this late* just before RC1.
> -1 on switching the default offheap_objects *for 4.1 RC1*, but all for it in 
> principle, for 4.2, after we run some more tests and resolve the concerns 
> raised by Jeff.
> 
> Let’s please try to avoid this kind of super late defaults switch going 
> forward?
> 
> —
> AY
> 
> > On 16 Nov 2022, at 03:27, Derek Chen-Becker <de...@chen-becker.org> wrote:
> > 
> > For the record, I'm +100 on G1. Take it with whatever sized grain of
> > salt you think appropriate for a relative newcomer to the list, but
> > I've spent my last 7-8 years dealing with the intersection of
> > high-throughput, low latency systems and their interaction with GC and
> > in my personal experience G1 outperforms CMS in all cases and with
> > significantly less work (zero work, in many cases). The only things
> > I've seen perform better *with a similar heap footprint* are GenShen
> > (currently experimental) and Rust (beyond the scope of this topic).
> > 
> > Derek
> > 
> > On Tue, Nov 15, 2022 at 4:51 PM Jon Haddad <rustyrazorbl...@apache.org> 
> > wrote:
> >> 
> >> I'm curious what it would take for folks to be OK with merging this into 
> >> 4.1?  How much additional time would you want to feel comfortable?
> >> 
> >> I should probably have been a little more vigorous in my +1 of Mick's PR.  
> >> For a little background - I worked on several hundred clusters while at 
> >> TLP, mostly dealing with stability and performance issues.  A lot of them 
> >> stemmed partially or wholly from the GC settings we ship in the project. 
> >> ParNew with CMS and small new gen results in a lot of premature promotion 
> >> leading to high pause times into the hundreds of ms which pushes p99 
> >> latency through the roof.
> >> 
> >> I'm a big +1 in favor of G1 because it's not just better for most people 
> >> but it's better for _every_ new Cassandra user.  The first experience that 
> >> people have with the project is important, and our current GC settings are 
> >> quite bad - so bad they lead to problems with stability in production.  
> >> The G1 settings are mostly hands off, result in shorter pause times and 
> >> are a big improvement over the status quo.
> >> 
> >> Most folks don't do GC tuning, they use what we supply, and what we 
> >> currently supply leads to a poor initial experience with the database.  I 
> >> think we owe the community our best effort even if it means pushing the 
> >> release back a little bit.
> >> 
> >> Just for some additional context, we're (Netflix) running 25K nodes on G1 
> >> across a variety of hardware in AWS with wildly varying workloads, and I 
> >> haven't seen G1 be the root cause of a problem even once.  The settings 
> >> that Mick is proposing are almost identical to what we use (we use half of 
> >> heap up to 30GB).
> >> 
> >> I'd really appreciate it if we took a second to consider the community 
> >> effect of another release that ships settings that cause significant pain 
> >> for our users.
> >> 
> >> Jon
> >> 
> >> On 2022/11/10 21:49:36 Mick Semb Wever wrote:
> >>>> 
> >>>> In case of GC, reasonably extensive performance testing should be the
> >>>> expectations. Potentially revisiting some of the G1 params for the 4.1
> >>>> reality - quite a lot has changed since those optional defaults were
> >>>> picked.
> >>>> 
> >>> 
> >>> 
> >>> I've put our battle-tested g1 opts (from consultants at TLP and DataStax)
> >>> in the patch for CASSANDRA-18027
> >>> 
> >>> In reality it is really not much of a change, g1 does make it simple.
> >>> Picking the correct ParallelGCThreads and ConcGCThreads and the floor to
> >>> the new heap (XX:NewSize) is still required, though we could do a much
> >>> better job of dynamic defaults to them.
> >>> 
> >>> Alex Dejanovski's blog is a starting point:
> >>> https://thelastpickle.com/blog/2020/06/29/cassandra_4-0_garbage_collectors_performance_benchmarks.html
> >>> where this gc opt set was used (though it doesn't prove why those options
> >>> are chosen)
> >>> 
> >>> The bar for objection to sneaking these into 4.1 was intended to be low,
> >>> and I stand by those that raise concerns.
> >>> 
> > 
> > 
> > 
> > -- 
> > +---------------------------------------------------------------+
> > | Derek Chen-Becker                                             |
> > | GPG Key available at https://keybase.io/dchenbecker and       |
> > | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> > | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> > +---------------------------------------------------------------+
> 
>
