On 06-Jul-2016 09:13:25 AM, Steven Rostedt wrote: > On Tue, 5 Jul 2016 21:50:34 +0000 (UTC) > Mathieu Desnoyers <mathieu.desnoy...@efficios.com> wrote: > > > > > > >> + > > >> + TP_PROTO(struct task_struct *tsk), > > >> + > > >> + TP_ARGS(tsk), > > >> + > > >> + TP_STRUCT__entry( > > >> + __array( char, comm, TASK_COMM_LEN ) > > > > > > I could imagine this being a high frequency tracepoint, especially with > > > a lot of boosting going on. Can we nuke the comm recording and let the > > > userspace tools just hook to the sched_switch tracepoint for that? > > > > We can surely do that. > > > > Just to clarify: currently this tracepoint is *not* hooked on PI boosting, > > as described in the changelog. This tracepoint is about the prio attributes > > set by user-space. The PI boosting temporarily changes the task struct prio > > without updating the associated policy, which seems rather > > implementation-specific and odd to expose. > > > > Thoughts ? > > Ah, you're right, I was thinking it was at boosting. But still, it's a > rather hefty tracepoint (lots of fields), probably want to keep from > adding comm too.
Yes, I agree we can remove the comm field, it is easy to get from the previous sched_switch. > > >> + __field( pid_t, pid ) > > >> + __field( unsigned int, policy ) > > >> + __field( int, nice ) > > >> + __field( unsigned int, rt_priority ) > > >> + __field( u64, dl_runtime ) > > >> + __field( u64, dl_deadline ) > > >> + __field( u64, dl_period ) > > >> + ), > > >> + > > >> + TP_fast_assign( > > >> + memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN); > > >> + __entry->pid = tsk->pid; > > >> + __entry->policy = tsk->policy; > > >> + __entry->nice = task_nice(tsk); > > >> + __entry->rt_priority = tsk->rt_priority; > > >> + __entry->dl_runtime = tsk->dl.dl_runtime; > > >> + __entry->dl_deadline = tsk->dl.dl_deadline; > > >> + __entry->dl_period = tsk->dl.dl_period; > > >> + ), > > >> + > > >> + TP_printk("comm=%s pid=%d, policy=%s, nice=%d, rt_priority=%u, " > > >> + "dl_runtime=%Lu, dl_deadline=%Lu, > > >> dl_period=%Lu", > > >> + __entry->comm, __entry->pid, > > >> + __print_symbolic(__entry->policy, > > >> SCHEDULING_POLICY), > > >> + __entry->nice, __entry->rt_priority, > > >> + __entry->dl_runtime, __entry->dl_deadline, > > >> + __entry->dl_period) > > >> +); > > >> #endif /* _TRACE_SCHED_H */ > > >> > > >> /* This part must be outside protection */ > > >> diff --git a/kernel/fork.c b/kernel/fork.c > > >> index 7926993..ac4294a 100644 > > >> --- a/kernel/fork.c > > >> +++ b/kernel/fork.c > > >> @@ -1773,6 +1773,7 @@ long _do_fork(unsigned long clone_flags, > > >> struct pid *pid; > > >> > > >> trace_sched_process_fork(current, p); > > >> + trace_sched_prio_update(p); > > From the change log: > > "It is emitted in the code path of the sched_setscheduler, > sched_setattr, sched_setparam, nice and the fork system calls. For fork, it > is emitted > after the sched_process_fork tracepoint for timeline consistency and > because the PID is not yet set when sched_fork() is called." > > I'm not convinced this should be needed. I hate adding back to back > tracepoints. Indeed, having two tracepoints back to back is not pretty. We placed it here to get the priority of the newly created threads. Maybe a more appropriate way of doing that would be to extend the sched_process_fork tracepoint to output the same scheduling informations. Would you prefer that option ? Thanks, Julien