On Tue, 30 Jun 2026 11:01:56 +0100 David Laight <[email protected]> wrote:
> > It's been a long time since I worked on this, but IIRC, it was to keep
> > the pressure down on the TLB when tracing. It updates at every
> > sched_switch that has a trace event occurring so, I likely used normal
> > pages which are part of the huge pages the kernel sets up and doesn't
> > affect the TLB as much. vmalloc does have impact on the TLB pressure,
> > and tracing should always try to avoid that.
>
> Isn't this a cache so that the pid numbers can be converted to strings
> when the trace is read out after the actual process has exited?
> That does mean that cache doesn't need to be updated on every trace
> request - it might be enough to just save on process exit and lookup the
> pid itself for running processes (the whole thing relies on pids not
> being reused).

Yes it's a cache but it only gets filled when needed. That is, after a
trace event occurred. Tracing is very commonly used with filtering, where
events can be seldom triggered. What is in the saved_cmdlines file should
only be tasks that were running when a trace occurred.

Now what we could do is add a flag to the task struct and only set that
when tracing happens. Only tasks that exit would be saved in this array.
The other tasks could be queried via iterating the tasks and reporting
any task with this bit set.

> > >
> > > map_pid_to_cmdline[] is 64k*sizeof(int) so the whole structure
> > > expands to 512k with about 64k/20 (about 3200) pid entries even
> > > though the default is 128.
> >
> > That's because it is not dynamic. That array needs to be able to hold
> > most PIDs. The default is 128 but it will expand to how much it can
> > hold to allocate the full map_pid_to_cmdline. The real default for 4096
> > page sized architectures is 6552 entries.
>
> That is double my 'quick calculation' - but both are a lot of entries.

I got the number by looking at saved_cmdlines_size after boot ;-)

> > >
> > > AFAICT there is only one copy of the data - so it could be static.
> > > Perhaps with pointers to map_pid_cmdline[] and (after this patch)
> > > pid_comm[], both of which could be separately resized.
> >
> > map_pid_to_cmdline[] is to hold the PID_MAX_DEFAULT amount of PIDs to
> > avoid collisions. I wouldn't resize it.
>
> If comm[] is only saved on process exit you'd likely get away with far
> fewer entries - getting collisions for processes that have exited is
> rather unlikely.
> (I wonder if I could make that work.)
>
> Does that memory get allocated at boot time?
> 512k is a lot to allocate for a feature that won't usually be used.
> OTOH you won't reliably get that much contiguous memory later on.
> Deferring to a later time (maybe as late as the first tracing_on())
> might be more reasonable - but that would have to use vmalloc().
>
> I'm also not sure about the code that lets you trace from boot.
> That must be able to initialise early - but I'm not sure how early.

Well tracing can start before init, so pretty early.

-- Steve
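
For illustration only, the flag idea discussed above (mark a task when a
trace event fires for it, save its comm only when it exits, and answer
queries for still-running tasks from the task itself) could be modelled
in userspace roughly as below. This is a toy sketch, not kernel code:
every name in it (toy_task, toy_spawn(), toy_trace_event(), toy_exit(),
toy_lookup_comm(), MAX_TASKS, SAVED_SLOTS) is hypothetical, the live-task
table stands in for iterating real tasks, and there is no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define COMM_LEN    16  /* same length as the kernel's TASK_COMM_LEN */
#define MAX_TASKS    8  /* toy live-task table size (hypothetical) */
#define SAVED_SLOTS  4  /* toy cache for exited tasks (hypothetical) */

struct toy_task {
	int  pid;             /* 0 means the slot is free */
	char comm[COMM_LEN];
	bool traced;          /* the proposed per-task flag */
};

struct saved_entry {
	int  pid;
	char comm[COMM_LEN];
};

static struct toy_task    live[MAX_TASKS];
static struct saved_entry saved[SAVED_SLOTS];
static size_t             saved_next;   /* next slot to overwrite */

/* Find a live task by pid; pid 0 finds a free slot. */
static struct toy_task *find_live(int pid)
{
	for (size_t i = 0; i < MAX_TASKS; i++)
		if (live[i].pid == pid)
			return &live[i];
	return NULL;
}

/* Create a task; returns 0 on success, -1 if pid is invalid or table full. */
int toy_spawn(int pid, const char *comm)
{
	struct toy_task *t;

	assert(comm != NULL);
	if (pid <= 0 || !(t = find_live(0)))
		return -1;
	t->pid = pid;
	t->traced = false;
	strncpy(t->comm, comm, COMM_LEN - 1);
	t->comm[COMM_LEN - 1] = '\0';
	return 0;
}

/* A trace event fired for this task: only set the flag, copy nothing. */
void toy_trace_event(int pid)
{
	struct toy_task *t = pid > 0 ? find_live(pid) : NULL;

	if (t)
		t->traced = true;
}

/* On exit, save the comm only if the task was ever traced. */
void toy_exit(int pid)
{
	struct toy_task *t = pid > 0 ? find_live(pid) : NULL;

	if (!t)
		return;
	if (t->traced) {
		struct saved_entry *e = &saved[saved_next];

		saved_next = (saved_next + 1) % SAVED_SLOTS;
		e->pid = t->pid;
		memcpy(e->comm, t->comm, COMM_LEN);
	}
	t->pid = 0;   /* free the live slot */
}

/* Resolve pid to comm: flagged live tasks first, then the exit cache. */
const char *toy_lookup_comm(int pid)
{
	struct toy_task *t;

	if (pid <= 0)
		return NULL;
	t = find_live(pid);
	if (t)
		return t->traced ? t->comm : NULL;
	for (size_t i = 0; i < SAVED_SLOTS; i++)
		if (saved[i].pid == pid)
			return saved[i].comm;
	return NULL;
}
```

In this model a task that never hit a trace event is never reported,
whether it is running or has exited, and the exit cache only has to be
large enough for traced tasks that have gone away.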
