The reply below is way too long, but oh well. The summary is: I am 100% certain that my ExprCount() loop is CPU bound. It is unfortunate that 'time' gives inaccurate information about this. It would be nice if the documentation acknowledged that, in order to save future generations from confusion. I think my test shows this quite effectively, and I have verified the CPU-bound nature of the test in many ways. While your results are different, they also show 'time' failing to recognize that the loop is CPU bound. I believe that your system is showing behavior consistent with mine -- the scheduler is working correctly, but 'time' is being misleading.
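For concreteness, the loop is essentially this (a simplified sketch, not necessarily the exact counttest.sh; the real script is in the blog post linked below):

    ExprCount() {
      i=0
      while [ $i -lt 1000 ]; do
        i=$(expr $i + 1)    # forks and execs a new expr process each iteration
      done
      echo "$i"
    }
    time ExprCount

Each iteration pays for a full fork/exec/exit of expr, so one run is roughly 1,000 process creations. That is why the CPU time lands in many short-lived expr processes rather than in bash itself, and why per-process tools miss it.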
You seem convinced that the task is not CPU bound on my machine, but I'm convinced that it is. The main reason I believe this is that if it is not CPU bound -- if there are significant delays between when bash and expr are running -- then that is a pri-0 bug in the Linux kernel. This would have massive implications for the efficiency of web servers and other systems that depend on Linux. Therefore a high burden of proof is needed to justify that claim.

The only evidence I saw that my system was not CPU bound when running my test script was 'time'. That's it. On the other side, as evidence that it is CPU bound (i.e., one CPU core's worth of CPU time being consumed per second) I have the following:

- It should be, and if it isn't it is a serious kernel bug.
- "mpstat -P ALL 1" shows 99.7% idle when the script is not running, and this drops to ~86.8% idle when it is running. That's a difference of 12.9%, which represents ~one core on my eight-core home system.
- "iostat -c 1" shows identical results.
- 'top' shows idle time of ~99.7% and ~86.8% in the summary section at the top, depending on whether the test is running. However, in the list of processes it only shows counttest.sh, using 15% of a core. That means that top is internally inconsistent -- it fails to show where the other 85% of a core's time is being spent. This is (IIRC) a known weakness of top: short-lived processes don't show up in the process list.
- The scheduler graph shown in my blog post clearly shows what is happening. You can see the full version of it here: http://www.cygnus-software.com/images/ZoomScreenshot_croppedbig.png It shows bash (in red) running for fractions of a millisecond, and each expr instance running about 0.5 ms. The OS is able to switch between processes far more often than every 24 ms because when expr ends the operating system *knows* that bash was waiting for it, so it immediately wakes up bash. The full scheduler doesn't have to run -- it just wakes up whoever was waiting. That is a crucial feature of the Windows and Linux kernels.
- xosview shows the load at 12-13% when running ExprCount(), and 0.1% when not.
- qps only showed four of my eight CPUs and I couldn't see a summary of total CPU usage. I'm not sure what scheduler information it was supposed to show.
- "perf stat counttest.sh ExprTest" shows 0.984 CPUs utilized.

So, *lots* of evidence that the bash process is CPU bound on my system. And this is correct and proper -- the scheduler in my kernel is working correctly. The kernel I'm running is shown below (uname output); I have a 64-bit 3.5 kernel at work -- same behavior.

Note that 'top' is potentially confusing because the units of the global CPU percentages and the process CPU percentages are different. The global CPU percentage is usage/num-cores, whereas the process percentage is usage-of-one-core. It takes eight processes at 100% to get the global percentage on an eight-core system to 100% (see the worked example below).

> At full load, my system shows 1200% cpu as measured by the standard
> definition of CPU seconds / real time.

On my version of 'top' (3.2.8) it does not use that definition *in its summary area*. It uses CPU seconds / (real time * num-cores). So if I have all eight cores fully loaded *then* it shows 100% CPU utilization. In the individual process area it uses your definition, but at the top it shows the percentage of total-available-CPU power.

> But 'top' never shows over 20% usage

Where? In the total of user and system at the top, or in the per-process section?
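To make the difference in units concrete, here is the arithmetic for one fully busy core on an eight-core system:

    per-process %CPU:  CPU seconds / real time               = 1 core     -> 100%
    summary %CPU used: CPU seconds / (real time * num cores) = 1/8        -> 12.5%
    summary %idle:                                             100 - 12.5 -> 87.5%

That 12.5%-used / 87.5%-idle split is exactly the ~13% drop in idle time that mpstat, top's summary area, and xosview all report while my test is running: one core's worth of work.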
Also, you said 'time' was showing your test using 71.48% CPU, and then you said that 'top' was showing 20% -- that's a heck of an anomaly.

> If your kernel is spending 80% of its time in the scheduler when there
> are no tasks being scheduled, I'd say you have a problem.

It's not. It's spending 80% of its time in process creation. This is shown clearly in the screenshot, and in the sampling profiler data that I examined.

> You can control that with cpuaffinity.

Yes, I know. I mention that technique in my blog post. I also mention that it is generally a bad idea unless your machine is devoted to a single task.

sched_latency_ns on my system is 24000000, just as on yours.

$ uname -a
Linux brucedglados2 3.2.0-38-generic-pae #60-Ubuntu SMP Wed Feb 13 13:47:26 UTC 2013 i686 i686 i386 GNU/Linux

I tried grabbing /proc/config.gz but that file doesn't exist.

Blog post is here:
http://randomascii.wordpress.com/2013/03/18/counting-to-ten-on-linux/

> If that latency of 24 is how long before the scheduler wakes up and
> switches tasks

No. Definitely not. The 24 ms scheduler latency is for rescheduling of long-running tasks. It does not apply to the case where a process is waiting on some event, such as a child process dying. I am 100% positive about this, on Linux and on Windows. Both operating systems can, if conditions demand it, schedule threads/processes hundreds of times in 24 ms. I look at process scheduling data on Windows all the time, and if Linux were that much less efficient then Steve Ballmer would be laughing maniacally and Linux wouldn't be used on servers.

> Note.. you said your loop took 2 seconds on your 3.2GHz sys.

Yes, but that was when my system was running at 1.1 GHz. When my system ramps up to its full speed, the 1,000 iterations are done in less than a second. Our systems are quite consistent there.

-----Original Message-----
From: Linda Walsh [mailto:b...@tlinx.org]
Sent: Tuesday, March 19, 2013 3:41 PM
To: Bruce Dawson
Cc: 'bug-bash'
Subject: Re: Bug/limitation in 'time' (kernel setings?)...

Linda Walsh wrote:
>
> Bruce Dawson wrote:
>> How many CPUs do you have? 20% usage doesn't mean anything without
>> knowing that. On a five-core system 20% usage would imply one full core.
---
Another tool to check out if you have it is 'qps'... it can show the time slice/schedule interval... I can see it scheduling 24 ms intervals for bash when it runs a separate proc/loop. It sounds a bit sketchy to me why it would take 76 ms for it to be scheduled for another run, but... that's about what it measures out to in real time. The fact that your kernel doesn't show you that... I don't know what to say, other than it's how your kernel is configured.

Now looking in my /proc/sys/kernel I see sched_latency_ns = 24000000 (24 ms). I also see the scheduler doesn't wake up all the time, but about every 4 ms. (I inserted the commas):

/proc/sys/kernel# for i in sched*; do
> echo $i: $(<$i)
> done
sched_autogroup_enabled: 1
sched_child_runs_first: 0
sched_domain:
sched_latency_ns: 24,000,000
sched_migration_cost_ns: 500,000
sched_min_granularity_ns: 3,000,000
sched_nr_migrate: 32
sched_rt_period_us: 1,000,000
sched_rt_runtime_us: 950,000
sched_shares_window_ns: 10,000,000
sched_time_avg_ms: 1000
sched_tunable_scaling: 1
sched_wakeup_granularity_ns: 4,000,000

So if the child runs first = 0, the parent keeps running at first. If that latency of 24 is how long before the scheduler wakes up and switches tasks, that's 24 ms wasted, then the same for the child.
That's about a 33% duty cycle there -- which can likely be adjusted by tweaking the above numbers...