Re: Should we change 4.1 to G1 and offheap_objects?

C. Scott Andreas Wed, 16 Nov 2022 19:02:51 -0800

We have precedent for changing defaults that have near-universal positive 
impact in patchlevel releases, yep.


disk_access_mode: auto -> mmap_index_only comes to mind.

- Scott

> On Nov 16, 2022, at 6:49 PM, Derek Chen-Becker <de...@chen-becker.org> wrote:
> 
> I'm fine with not including G1 in 4.1, but would we consider inclusion
> for 4.1.X down the road once validation has been done?
> 
> Derek
> 
> 
>> On Wed, Nov 16, 2022 at 4:39 PM David Capwell <dcapw...@apple.com> wrote:
>> 
>> Getting poked in Slack to be more explicit in this thread…
>> 
>> Switching to G1 on trunk, +1
>> Switching to G1 on 4.1, -1.  4.1 is about to be released, and this isn’t a
>> bug fix but a perf improvement ticket; as such it should go through
>> validation that the perf improvements are actually seen. There is not enough
>> time left for that added performance work, so I strongly feel it should be
>> pushed to 4.2/5.0, where there is plenty of time to validate it.  The
>> ticket even asks to skip validating the claims, saying 'Hoping we can skip
>> due diligence on this ticket because the data is “in the past” already'.
>> Others have tried both Shenandoah and ZGC and found mixed results, so
>> nothing leads me to believe that won’t be true here either.
>> 
>>>> On Nov 16, 2022, at 9:15 AM, J. D. Jordan <jeremiah.jor...@gmail.com> 
>>>> wrote:
>>> 
>>> Heap -
>>> +1 for G1 in trunk
>>> +0 for G1 in 4.1 - I think it’s worthwhile and fairly well tested, but I
>>> understand the pushback against changing this so late in the game.
>>> 
>>> Memtable -
>>> -1 for off-heap in 4.1. I think this needs more testing and isn’t something
>>> to change at the last minute.
>>> +1 for running performance/fuzz tests against the alternate memtable 
>>> choices in trunk and switching if they don’t show regressions.
>>> 
>>>> On Nov 16, 2022, at 10:48 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>>>> 
>>>> 
>>>> To clarify: -0 here on G1 as default for 4.1 as well; I'd like us to
>>>> prioritize digging into G1's behavior on small heaps vs. CMS with our default
>>>> tuning sooner rather than later. With that info I'd likely be a strong +1
>>>> on the shift.
>>>> 
>>>> -1 on switching to offheap_objects for 4.1 RC; again, I think this is just a
>>>> small step away from being a +1 with some more rigor around understanding
>>>> the current state of the technology's intersections.
>>>> 
>>>> On Wed, Nov 16, 2022, at 7:47 AM, Aleksey Yeshchenko wrote:
>>>>> All right. I’ll clarify then.
>>>>> 
>>>>> -0 on switching the default to G1 *this late* just before RC1.
>>>>> -1 on switching the default to offheap_objects *for 4.1 RC1*, but all for it
>>>>> in principle for 4.2, after we run some more tests and resolve the
>>>>> concerns raised by Jeff.
>>>>> 
>>>>> Let’s please try to avoid this kind of super late defaults switch going 
>>>>> forward?
>>>>> 
>>>>> —
>>>>> AY
>>>>> 
>>>>>> On 16 Nov 2022, at 03:27, Derek Chen-Becker <de...@chen-becker.org> 
>>>>>> wrote:
>>>>>> 
>>>>>> For the record, I'm +100 on G1. Take it with whatever sized grain of
>>>>>> salt you think appropriate for a relative newcomer to the list, but
>>>>>> I've spent the last 7-8 years dealing with the intersection of
>>>>>> high-throughput, low-latency systems and their interaction with GC, and
>>>>>> in my personal experience G1 outperforms CMS in all cases and with
>>>>>> significantly less work (zero work, in many cases). The only things
>>>>>> I've seen perform better *with a similar heap footprint* are GenShen
>>>>>> (currently experimental) and Rust (beyond the scope of this topic).
>>>>>> 
>>>>>> Derek
>>>>>> 
>>>>>> On Tue, Nov 15, 2022 at 4:51 PM Jon Haddad <rustyrazorbl...@apache.org> 
>>>>>> wrote:
>>>>>>> 
>>>>>>> I'm curious what it would take for folks to be OK with merging this 
>>>>>>> into 4.1?  How much additional time would you need to feel comfortable?
>>>>>>> 
>>>>>>> I should probably have been a little more vigorous in my +1 of Mick's 
>>>>>>> PR.  For a little background - I worked on several hundred clusters 
>>>>>>> while at TLP, mostly dealing with stability and performance issues.  A 
>>>>>>> lot of them stemmed partially or wholly from the GC settings we ship in 
>>>>>>> the project. ParNew with CMS and a small new gen results in a lot of 
>>>>>>> premature promotion, leading to pause times into the hundreds of ms, 
>>>>>>> which pushes p99 latency through the roof.
>>>>>>> 
>>>>>>> I'm a big +1 in favor of G1 because it's not just better for most 
>>>>>>> people but it's better for _every_ new Cassandra user.  The first 
>>>>>>> experience that people have with the project is important, and our 
>>>>>>> current GC settings are quite bad - so bad they lead to problems with 
>>>>>>> stability in production.  The G1 settings are mostly hands-off, result 
>>>>>>> in shorter pause times, and are a big improvement over the status quo.
>>>>>>> 
>>>>>>> Most folks don't do GC tuning, they use what we supply, and what we 
>>>>>>> currently supply leads to a poor initial experience with the database.  
>>>>>>> I think we owe the community our best effort even if it means pushing 
>>>>>>> the release back a little bit.
>>>>>>> 
>>>>>>> Just for some additional context, we're (Netflix) running 25K nodes on 
>>>>>>> G1 across a variety of hardware in AWS with wildly varying workloads, 
>>>>>>> and I haven't seen G1 be the root cause of a problem even once.  The 
>>>>>>> settings that Mick is proposing are almost identical to what we use (we 
>>>>>>> use half of RAM for the heap, up to 30GB).
>>>>>>> 
>>>>>>> I'd really appreciate it if we took a second to consider the community 
>>>>>>> effect of another release that ships settings that cause significant 
>>>>>>> pain for our users.
>>>>>>> 
>>>>>>> Jon
>>>>>>> 
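To make the heap-sizing rule Jon mentions concrete, here is a small Java sketch. It assumes "half of heap up to 30GB" means half of physical RAM, capped at 30 GiB; the class and method names (HeapSizing, heapBytes, jvmFlags) are hypothetical, and the logic is not taken from cassandra-env.sh or the CASSANDRA-18027 patch.

```java
// Hypothetical sketch of the "half of RAM, capped at 30GB" rule described above.
final class HeapSizing {
    static final long GIB = 1024L * 1024 * 1024;
    // Assumed cap from the thread ("up to 30GB"), treated as binary gigabytes.
    static final long MAX_HEAP = 30 * GIB;

    // Half of physical RAM, never more than MAX_HEAP.
    static long heapBytes(long physicalRamBytes) {
        return Math.min(physicalRamBytes / 2, MAX_HEAP);
    }

    // Renders matching -Xms/-Xmx flags in whole GiB (rounds down).
    static String jvmFlags(long physicalRamBytes) {
        long gib = heapBytes(physicalRamBytes) / GIB;
        return "-Xms" + gib + "G -Xmx" + gib + "G";
    }

    public static void main(String[] args) {
        for (long ramGib : new long[] {8, 32, 64, 128}) {
            System.out.println(ramGib + " GiB RAM -> " + jvmFlags(ramGib * GIB));
        }
    }
}
```

A cap a little under 32 GB is a common choice because larger heaps lose the JVM's compressed ordinary object pointers; setting -Xms equal to -Xmx avoids heap resizing at runtime.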
>>>>>>> On 2022/11/10 21:49:36 Mick Semb Wever wrote:
>>>>>>>>> 
>>>>>>>>> In case of GC, reasonably extensive performance testing should be the
>>>>>>>>> expectation, potentially revisiting some of the G1 params for the 4.1
>>>>>>>>> reality - quite a lot has changed since those optional defaults were
>>>>>>>>> picked.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I've put our battle-tested G1 opts (from consultants at TLP and DataStax)
>>>>>>>> in the patch for CASSANDRA-18027.
>>>>>>>> 
>>>>>>>> In reality it is not much of a change; G1 does make it simple.
>>>>>>>> Picking the correct ParallelGCThreads and ConcGCThreads, and the floor
>>>>>>>> for the young generation (-XX:NewSize), is still required, though we
>>>>>>>> could do a much better job of providing dynamic defaults for them.
>>>>>>>> 
>>>>>>>> Alex Dejanovski's blog is a starting point:
>>>>>>>> https://thelastpickle.com/blog/2020/06/29/cassandra_4-0_garbage_collectors_performance_benchmarks.html
>>>>>>>> where this GC opt set was used (though it doesn't show why those
>>>>>>>> options were chosen).
>>>>>>>> 
>>>>>>>> The bar for objecting to sneaking these into 4.1 was intended to be low,
>>>>>>>> and I stand by those who raise concerns.
>>>>>>>> 
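Mick notes that ParallelGCThreads and ConcGCThreads still have to be picked and that better dynamic defaults are possible. As a hedged starting point, the sketch below mirrors HotSpot's own ergonomics as described in the Java GC tuning guide: ParallelGCThreads uses every hardware thread up to 8 and then roughly 5/8 of the remainder, and G1's ConcGCThreads is roughly a quarter of that. The exact rounding is an assumption, the class and method names are hypothetical, and this is not the logic in the CASSANDRA-18027 patch. The -XX:NewSize floor is left out because the thread gives no value for it.

```java
// Hypothetical dynamic defaults for G1 thread counts, modeled on HotSpot ergonomics.
final class GcThreadDefaults {
    // All hardware threads up to 8, then roughly 5/8 of the remainder.
    static int parallelGcThreads(int cpus) {
        if (cpus <= 8) {
            return cpus;
        }
        return 8 + (cpus - 8) * 5 / 8;
    }

    // G1 concurrent marking threads: roughly a quarter of the parallel threads, at least 1.
    static int concGcThreads(int parallelThreads) {
        return Math.max((parallelThreads + 2) / 4, 1);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int parallel = parallelGcThreads(cpus);
        System.out.println("-XX:ParallelGCThreads=" + parallel
                + " -XX:ConcGCThreads=" + concGcThreads(parallel));
    }
}
```

For example, a 32-thread host would get ParallelGCThreads=23 and ConcGCThreads=6. Emitting explicit flags mostly matters when the JVM sees more CPUs than the process should use, such as several instances sharing one host.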
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> +---------------------------------------------------------------+
>>>>>> | Derek Chen-Becker                                             |
>>>>>> | GPG Key available at https://keybase.io/dchenbecker and       |
>>>>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>>>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>>>>>> +---------------------------------------------------------------+
>>>>> 
>>>>> 
>>>> 
>> 
> 
