> The fact o3 used "Bus-factor" as a dimension is just amazing. 
Yeah - that got me too.

On Fri, Jun 13, 2025, at 2:38 PM, Jon Haddad wrote:
> I'd be very happy to see async-profiler included with C*.  I've made extensive 
> use of it in my performance evaluations [1][2], and even posted a video about 
> it [3] for general Java perf analysis (among others).  It's part of 
> easy-cass-lab and is easily the most informative tool I've found for 
> getting to the bottom of anything performance related.
> 
> There's probably a good case to be made for including it with the C* artifact 
> as well as having it be something you can drop in. I lean towards including 
> it all the time, but I haven't run it this way myself yet, so there might be 
> some downside I'm unaware of.
> 
> When you call the asprof executable, it attaches the async-profiler to the 
> running JVM using jattach [4].  We could do this as well, if we wanted to 
> avoid including it with the release, but I don't know how much we really 
> benefit from that.  I've run into issues with it when it's unable to detach 
> correctly, then you're unable to reattach it until after the server is 
> restarted.  On the flip side, I don't know if you're able to set up all the 
> same options for arbitrary profiling when it's loaded as an agent and turned 
> on/off dynamically.  I think we can, based on the integration page [6], but I 
> haven't tried it yet.  It would be a bummer if we only had a single mode of 
> profiling available.  
> 
> The default mode, CPU profiling, is fantastic, but I've also made extensive 
> use of allocation profiling [5] to identify perf issues as well, so having 
> that available is a must, imo. Wall clock / off cpu profiling is great for 
> identifying when IO is the root cause, which isn't clearly revealed by on-cpu 
> profiling due to the way threads are scheduled.  When I look at a system I 
> typically do CPU / Wall / Alloc / Off-CPU to be thorough, and the last thing 
> you want to do is have to restart between each one.  You can also specify 
> specific Java methods, include or exclude frames matching specific regex, and 
> a whole slew of other options.  The latest version even supports continuous 
> profiling with heatmaps, although I haven't tried it yet.  
> 
> So hopefully the option we go with allows all of that, otherwise the limits 
> would impose more of a headache to me as I'd need to remove it and continue 
> to bring my own.
> 
> Under the hood, the async-profiler uses Linux perf events + asynchronous 
> polling of the Java stack to match them up and generate its reports.  As a 
> result, it requires certain permissions to run and get all the details I 
> like.  Specifically these kernel parameters:
> 
> sudo sysctl kernel.perf_event_paranoid=1
> sudo sysctl kernel.kptr_restrict=0
> 
> You also need to enable some capabilities for off-cpu profiling:
> 
> sudo find /usr/lib/jvm/ -type f -name 'java' -exec setcap \
>   "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" {} \;
> 
> Then you can do off-cpu with this wild cryptic version (shout out to Andrei 
> Pangin for helping me with this [7]):
> 
> asprof -e kprobe:schedule -i 2 --cstack dwarf -X '*Unsafe.park*' "${@:2}" $PID
> 
> There are also some subtle issues when it's run in a container, since by 
> default you don't have access to the perf_event_open syscall.  Just something 
> to keep in mind.  This is one of my main grievances with container 
> deployments.
> 
> Indeed Patrick, I am very happy to see this discussion!  Thanks Doug for 
> starting the thread.
> 
> Jon
> 
> [1] https://issues.apache.org/jira/browse/CASSANDRA-15452
> [2] https://issues.apache.org/jira/browse/CASSANDRA-19477
> [3] 
> https://www.youtube.com/watch?v=yNZtnzjyJRI&t=212s&pp=ygUOYXN5bmMgcHJvZmlsZXI%3D
> [4] 
> https://github.com/async-profiler/async-profiler/blob/2b556680dc8f5d02c3f26ac119d835dc2381e604/src/jattach/jattach_hotspot.c#L38
> [5] https://issues.apache.org/jira/browse/CASSANDRA-20428
> [6] 
> https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md
> [7] https://github.com/async-profiler/async-profiler/issues/907
> 
> 
> On Fri, Jun 13, 2025 at 10:18 AM Patrick McFadin <pmcfa...@gmail.com> wrote:
>> The fact o3 used "Bus-factor" as a dimension is just amazing. 
>> 
>> After reading more about the project, the possibilities are pretty 
>> interesting. I suspect we'll see this in a Haddad talk soon. 
>> 
>> On Fri, Jun 13, 2025 at 1:57 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>> I was curious if o3 (model from OpenAI) would be able to do a deep dive 
>>> health check on a repo to assist in considering taking it as a dependency. 
>>> The results can be found here: 
>>> https://chatgpt.com/share/684be703-1d4c-8002-b831-f997f829f4b4
>>> 
>>> Apparently it can, and can do it quite well. This was a useful time saver 
>>> (and honestly did a better job than I usually can in > 10x the time)
>>> 
>>> I'm +1 to taking this as a dependency on the lib in core C*. The rest of 
>>> the ecosystem can consume it (more easily if we move to a cassandra-shared 
>>> regime shared library build as well), and it opens up some interesting 
>>> opportunities for us in both how we test core C* proper and what we expose 
>>> in tooling.
>>> 
>>> On Thu, Jun 12, 2025, at 7:36 PM, Paulo Motta wrote:
>>>> I'd prefer to avoid calling an external process and use the library if 
>>>> possible. Not sure about including it in the project by default, but also 
>>>> not against.
>>>> 
>>>> If there's contention about including it, I wonder if it would make sense 
>>>> to explore Java's optional module extension [1] to make this available 
>>>> optionally? I can see this being useful for other extensions if we 
>>>> haven't explored that option.
>>>> 
>>>> Then we could have another project cassandra-sidecar-extensions (or 
>>>> similar) that would be linked by sidecar/advanced operators to enable 
>>>> an extended feature set in the main process.
>>>> 
>>>> 
>>>> [1] - 
>>>> https://openjdk.org/projects/jigsaw/doc/topics/optional.html
>>>> 
>>>> On Thu, 12 Jun 2025 at 17:57 Doug Rohrer <droh...@apple.com> wrote:
>>>>> Hey folks!
>>>>> 
>>>>> We're looking into enabling the sidecar to collect async profiles from 
>>>>> Cassandra and, digging through the async-profiler code and usage, it 
>>>>> seems like there may be a few different ways to do it. I’m curious if 
>>>>> other folks have already done this beyond just “run asprof with the pid 
>>>>> of the Cassandra process”, as I’m a bit hesitant to depend on executing 
>>>>> an external process from the Sidecar to gather the actual profile if we 
>>>>> can avoid it.
>>>>> 
>>>>> There seem to be some opportunities to integrate the profiler into 
>>>>> another project (see 
>>>>> https://github.com/async-profiler/async-profiler/blob/master/docs/IntegratingAsyncProfiler.md#using-java-api)
>>>>> but it seems this would end up having to be part of Cassandra, and 
>>>>> somehow callable via the sidecar (JMX? Some virtual table interface where 
>>>>> you insert a row to start a profile with the profiler options, and it 
>>>>> kicks off the profile, dumping the results into the table when it’s 
>>>>> done?).
>>>>> 
>>>>> The benefit in putting this functionality into Cassandra would be that 
>>>>> other consumers (in-jvm dtests, python dtests, other monitoring systems 
>>>>> where Sidecar isn’t available, easy-cass-lab) would be able to leverage 
>>>>> the same interface rather than having to re-invent the wheel each time.
>>>>> 
>>>>> Drawback is it’s another library, and one with native library 
>>>>> dependencies, added to the class path and loaded at runtime.
>>>>> 
>>>>> Thoughts? Previous experiences (good or bad)?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Doug
>>> 
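The two kernel settings Jon lists above could be checked before anyone tries to attach the profiler. What follows is only a minimal sketch, not something from the thread: the function name check_perf_settings and its optional directory argument are hypothetical, and it assumes a perf_event_paranoid value of 1 or lower is acceptable, since lower values are more permissive.

```shell
#!/bin/sh
# Preflight check for the kernel settings quoted in the thread.
# check_perf_settings is a hypothetical helper name; the optional first
# argument lets the check read from a directory other than /proc/sys/kernel.
check_perf_settings() {
    dir="${1:-/proc/sys/kernel}"
    status=0

    # Read both values; an unreadable file is treated as a failed check.
    paranoid=$(cat "$dir/perf_event_paranoid" 2>/dev/null) || paranoid=""
    kptr=$(cat "$dir/kptr_restrict" 2>/dev/null) || kptr=""

    # Assumption: anything at or below 1 is at least as permissive as
    # the value 1 given in the thread.
    if [ -z "$paranoid" ] || [ "$paranoid" -gt 1 ]; then
        echo "perf_event_paranoid=${paranoid:-unreadable} (thread suggests 1)"
        status=1
    fi

    if [ "$kptr" != "0" ]; then
        echo "kptr_restrict=${kptr:-unreadable} (thread suggests 0)"
        status=1
    fi

    if [ "$status" -eq 0 ]; then
        echo "kernel settings OK"
    fi
    return "$status"
}
```

Calling check_perf_settings with no argument reads the live values and returns non-zero if either one differs from what the thread suggests. It does not check the setcap capabilities or container restrictions on perf_event_open.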

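Jon's CPU / Wall / Alloc / Off-CPU pass can be written down as one asprof invocation per mode against the same running process. The sketch below only prints the commands rather than running them. The function name print_profile_commands, the 30-second default duration, the output directory and the file names are all assumptions for illustration; the off-CPU line reuses the kprobe options quoted in the thread.

```shell
#!/bin/sh
# Dry run: print one asprof command per profiling mode from the thread.
# print_profile_commands, the default duration and the output paths are
# hypothetical; nothing is executed and no profiler is attached.
print_profile_commands() {
    pid="$1"
    duration="${2:-30}"
    outdir="${3:-/tmp}"

    # CPU, wall-clock and allocation profiling differ only in the event name.
    for event in cpu wall alloc; do
        echo "asprof -d $duration -e $event -f $outdir/$event.html $pid"
    done

    # Off-CPU profiling, using the kprobe options quoted in the thread.
    echo "asprof -d $duration -e kprobe:schedule -i 2 --cstack dwarf -X '*Unsafe.park*' -f $outdir/offcpu.html $pid"
}
```

For example, print_profile_commands "$PID" 60 /var/tmp/profiles lists four commands that could be reviewed and then run in sequence without restarting the server between modes.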