On Wed, Apr 16, 2025 at 4:40 PM T.J. Mercier <tjmerc...@google.com> wrote:
>
> On Wed, Apr 16, 2025 at 4:08 PM Song Liu <s...@kernel.org> wrote:
> >
> > On Wed, Apr 16, 2025 at 3:51 PM T.J. Mercier <tjmerc...@google.com> wrote:
> > [...]
> > > >
> > > > IIUC, the iterator simply traverses elements in a linked list. I feel
> > > > it is overkill to implement a new BPF iterator for it.
> > >
> > > Like other BPF iterators such as kmem_cache_iter or task_iter.
> > > Cgroup_iter iterates trees instead of lists. This is iterating over
> > > kernel objects just like the docs say, "A BPF iterator is a type of
> > > BPF program that allows users to iterate over specific types of kernel
> > > objects". More complicated iteration should not be a requirement here.
> > >
> > > > Maybe we could simply use debugging tools like crash or drgn for
> > > > this? The access with these tools will not be protected by the
> > > > mutex. But from my personal experience, this is not a big issue for
> > > > user space debugging tools.
> > >
> > > drgn is *way* too slow, and even if it weren't, the dependencies for
> > > running it aren't available. crash needs debug symbols, which also
> > > aren't available on user builds. This is not just for manual
> > > debugging; it's for reporting memory use in production, or anything
> > > else someone might care to extract, like attachment info or refcounts.
> >
> > Could you please share more information about the use cases and
> > the time constraint here, and why drgn is too slow? Does most of the
> > delay come from parsing DWARF? This is mostly for my curiosity, because
> > I have been thinking about using drgn to do some monitoring in
> > production.
> >
> > Thanks,
> > Song
>
> These RunCommands have 10-second timeouts, for example. It's rare that
> I see them get exceeded, but it happens occasionally:
> https://cs.android.com/android/platform/superproject/main/+/main:frameworks/native/cmds/dumpstate/dumpstate.cpp;drc=98bdc04b7658fde0a99403fc052d1d18e7d48ea6;l=2008
Thanks for sharing this information.

> The last time I used drgn (admittedly back in 2023) it took over a
> minute to iterate through fewer than 200 cgroups. I'm not sure what the
> root cause of the slowness was, but I'd expect the DWARF processing to
> be done up-front once, and the slowness I experienced was not just at
> startup. Eventually I switched over to tracefs for that issue, which
> we still use for some telemetry.

I haven't tried drgn on Android. On the server side, iterating over 200
cgroups should be fairly fast (< 5 seconds, where DWARF parsing is the
most expensive part).

> Other uses are by statsd for telemetry, memory reporting on app kills
> or death, and for "dumpsys meminfo".

Here is another rookie question: it appears to me there is a file
descriptor associated with each DMA buffer. Can we achieve the same goal
with a task-file iterator?

Thanks,
Song
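To make the task-file question concrete, below is a rough user-space analogue of the walk a task_file iterator would do in the kernel: visit every task's fd table, keep the fds that refer to a dma-buf, and count each buffer once. It is only a sketch under several assumptions, not a proposed implementation: dma-buf fds show up in /proc/<pid>/fd as links starting with "/dmabuf:" (newer kernels, where each buffer has its own inode, so (st_dev, st_ino) identifies it), /proc/<pid>/fdinfo/<fd> for a dma-buf contains a "size:" line, and the caller has enough privilege (typically root) to read other tasks' fds. The helper names are hypothetical.

```python
import os
from typing import Dict, Optional, Tuple

# Assumption: on newer kernels a dma-buf fd's link target is "/dmabuf:<name>".
DMABUF_PREFIX = "/dmabuf:"


def is_dmabuf_link(target: str) -> bool:
    """Return True if a /proc/<pid>/fd/<n> link target looks like a dma-buf."""
    return target.startswith(DMABUF_PREFIX)


def parse_fdinfo_size(fdinfo: str) -> Optional[int]:
    """Extract the "size:" field from /proc/<pid>/fdinfo/<n> text, if present."""
    for line in fdinfo.splitlines():
        key, _, value = line.partition(":")
        if key == "size":
            return int(value.strip())
    return None


def scan_dmabufs(proc_root: str = "/proc") -> Dict[Tuple[int, int], int]:
    """Walk every task's fd table (roughly what a task_file iterator visits
    in the kernel) and map each distinct dma-buf (st_dev, st_ino) to its size.

    Best-effort snapshot: tasks and fds may come and go while we scan.
    """
    buffers: Dict[Tuple[int, int], int] = {}
    try:
        pids = [p for p in os.listdir(proc_root) if p.isdigit()]
    except OSError:
        return buffers  # no procfs at this path
    for pid in pids:
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # task exited, or we lack permission
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                if not is_dmabuf_link(os.readlink(link)):
                    continue
                st = os.stat(link)  # follows the magic link to the buffer's inode
                with open(os.path.join(proc_root, pid, "fdinfo", fd)) as f:
                    size = parse_fdinfo_size(f.read())
            except OSError:
                continue  # fd closed or task exited while we were looking
            if size is not None:
                # The same buffer can be open in many tasks; count it once.
                buffers[(st.st_dev, st.st_ino)] = size
    return buffers


if __name__ == "__main__":
    found = scan_dmabufs()
    print(f"{len(found)} dma-bufs, {sum(found.values())} bytes total")
```

One limitation of any fd-based walk like this: a buffer that is no longer held through an fd (for example, one that is only mmapped after its fd was closed, or only referenced inside the kernel) will not be found.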