On 03/08/2012 05:20 PM, Pantelis Antoniou wrote:
The current issue is that scheduler development is not easily shared between
developers. Each developer has their own 'itch', be it Android use cases, server
workloads, VM, etc. The risk is high of optimizing for one's own use case and
causing severe degradation on most other use cases.
One way to fix this problem would be to develop a method with which one
could run a given use-case workload on a host, record the activity in an
interchangeable, portable trace format file, and then play it back on another
host via a playback application that generates a load approximately similar
to the one observed during recording.

Have you investigated whether the 'perf' tool, with its 'sched record' and
'sched replay' features, might be useful for this purpose?
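
For reference, the record/replay cycle in question could be scripted roughly
as below. This is only a minimal sketch: the helper names (sched_record_cmd,
sched_replay_cmd, timed_run) are made up for illustration, and it assumes
that 'perf' is installed and that perf.data is written to and read from the
current directory.

```python
import subprocess
import time


def sched_record_cmd(workload):
    """Build the argv for 'perf sched record <workload>'."""
    return ["perf", "sched", "record"] + list(workload)


def sched_replay_cmd():
    """Build the argv for 'perf sched replay' (assumed to read ./perf.data)."""
    return ["perf", "sched", "replay"]


def timed_run(argv):
    """Run argv to completion and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(argv, check=True)
    return time.monotonic() - start
```

Comparing timed_run(["find", "/"]) against
timed_run(sched_record_cmd(["find", "/"])) would give a rough idea of the
recording overhead; timed_run(sched_replay_cmd()) then replays the recorded
trace.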

I tried to record and replay various commonly used benchmarks, including
CPU-, I/O- and network-intensive workloads, and I have to say that the
recording and (especially) replaying overhead is quite high, at least for the
default Panda board configuration (where the main I/O is slow because the
root file system is on an SD card). Simple things like
'perf sched record sleep 10' work in most cases (but may still cause sample
loss, up to 10-20%). But when I tried to add some I/O, for example with
'find /', the total load became too high and the system (almost) hung, with a
lot of messages like:
INFO: task kjournald:512 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: rcu_preempt detected stalls on CPUs/tasks: 8055ec64 0 512 2
0x00000000
INFO: Stall ended before state dump start
INFO: task kjournald:512 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task flush-179:0:511 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kjournald:512 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Now I'm checking whether it's possible to do partial recording (by skipping
some kinds of unrelated samples) to reduce the load on the kernel tracing
subsystem and leave more CPU time for the user-space tasks.
Do you have any thoughts about this?
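
To make the partial recording idea concrete, here is a minimal sketch that
builds a plain 'perf record' command line restricted to a few scheduler
tracepoints, instead of the full set that 'perf sched record' enables. The
helper name build_partial_record_cmd and the choice of events to keep are
assumptions for illustration only; whether 'perf sched replay' still produces
a useful load from such a reduced trace has not been verified.

```python
# Hypothetical reduced event set. Which tracepoints can be skipped without
# breaking replay is an assumption, not something that has been measured.
DEFAULT_EVENTS = ("sched:sched_switch", "sched:sched_wakeup")


def build_partial_record_cmd(workload, events=DEFAULT_EVENTS,
                             mmap_pages=1024, output="perf.data"):
    """Build a system-wide 'perf record' argv limited to the given tracepoints."""
    if not workload:
        raise ValueError("workload command must not be empty")
    # -a: all CPUs, -c 1: sample every event occurrence,
    # -m: number of mmap data pages, -o: output file name.
    cmd = ["perf", "record", "-a", "-c", "1",
           "-m", str(mmap_pages), "-o", output]
    for event in events:
        cmd += ["-e", event]
    # "--" separates the perf options from the workload command.
    return cmd + ["--"] + list(workload)
```

For example, build_partial_record_cmd(["find", "/"]) yields an argv that can
be passed to subprocess.run(); the mmap_pages value is a knob to trade memory
for fewer lost samples.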
Thanks,
Dmitry
_______________________________________________
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev