Shailabh wrote: > Sends a separate "registration" message with cpumask to listen to. > Kernel stores (real) pid and cpumask.
Question: ========= Ah - good. So this means that I could configure a system with a fork/exit intensive, performance critical job on some dedicated CPUs, and be able to collect taskstat data from tasks exiting on the -other- CPUS, while avoiding collecting data from this special job, thus avoiding any taskstat collection performance impact on said job. If I'm understanding this correctly, excellent. Caveat: ======= Passing cpumasks across the kernel-user boundary can be tricky. Historically, Unix has a long tradition of boloxing up the passing of variable length data types across the kernel-user boundary. We've got perhaps a half dozen ways of getting these masks out of the kernel, and three ways of getting them (or the similar nodemasks) back into the kernel. The three ways being used in the sched_setaffinity system call, the mbind and set_mempolicy system calls, and the cpuset file system. All three of these ways have their controversial details: * The kernel cpumask mask size needed for sched_setaffinity calls is not trivially available to userland. * The nodemask bit size is off by one in the mbind and set_mempolicy calls. * The CPU and Node masks are ascii, not binary, in the cpuset calls. One option that might make sense for these task stat registrations would be to: 1) make the kernel/sched.c get_user_cpu_mask() routine generic, moving it to non-static lib/*.c code, and 2) provide a sensible way for user space to query the size of the kernel cpumask (and perhaps nodemask while you're at it.) Currently, the best way I know for user space to query the kernels cpumask and nodemask size is to examine the length of the ascii string values labeled "Cpus_allowed:" and "Mems_allowed:" in the file /proc/self/status. These ascii strings always require exactly nine ascii chars to express each 32 bits of kernel mask code, if you include in the count the trailing ',' comma or '\n' newline after each eight ascii character word. Probing /proc/self/status fields for these mask sizes is rather unobvious and indirect, and requires caching the result if you care at all about performance. Userland code in support of your taskstat facility might be better served by a more obvious way to size cpumasks. ... unless of course you're inclined to pass cpumasks formatted as ascii strings, in which case speak up, as I'd be delighted to throw in my 2 cents on how to do that ;). -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html