Robert, all,

* Robert Haas (robertmh...@gmail.com) wrote:
> There is a considerable amount of variation in the amount of time this
> takes to run based on how much of the relation is cached. Clearly,
> there's no way for the system to cache it all, but it can cache a
> significant portion, and that affects the results to no small degree.
> dd on hydra prints information on the data transfer rate; on uncached
> 1GB segments, it runs at right around 400 MB/s, but that can soar to
> upwards of 3GB/s when the relation is fully cached. I tried flushing
> the OS cache via echo 1 > /proc/sys/vm/drop_caches, and found that
> immediately after doing that, the above command took 5m21s to run -
> i.e. ~321000 ms. Most of your test times are faster than that, which
> means they reflect some degree of caching. When I immediately reran
> the command a second time, it finished in 4m18s the second time, or
> ~258000 ms. The rate was the same as the first test - about 400 MB/s
> - for most of the files, but 27 of the last 28 files went much faster,
> between 1.3 GB/s and 3.7 GB/s.
[...]
> With 0 workers, first run took 883465.352 ms, and second run took 295050.106
> ms.
> With 8 workers, first run took 340302.250 ms, and second run took 307767.758
> ms.
>
> This is a confusing result, because you expect parallelism to help
> more when the relation is partly cached, and make little or no
> difference when it isn't cached. But that's not what happened.

These numbers seem to indicate that the oddball is the single-threaded
uncached run. If I followed correctly, the uncached 'dd' took 321s,
which is relatively close to the uncached-lots-of-workers and the two
cached runs. What in the world is the uncached single-thread case doing
that it takes an extra 543s over the uncached 8-worker run, or over
twice as long? It's clearly not disk i/o which is causing the slowdown,
based on your dd tests.

One possibility might be round-trip latency. The multi-threaded case is
able to keep the CPUs and the i/o system going, and the cached results
don't have as much latency since things are cached, but the
single-threaded uncached case, going i/o -> cpu -> i/o -> cpu, ends up
with a lot of wait time as it switches between being on CPU and waiting
on the i/o.

Just some thoughts.

Thanks,

Stephen
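As a rough illustration of the latency idea above, here is a toy model
in Python. It is only a sketch: the three timings are the ones quoted in
this thread, but the function names and the per-block costs passed to
them are made up for illustration, and nothing here reflects how the
executor actually schedules reads.

```python
# Timings quoted in this thread, in milliseconds.
DD_UNCACHED_MS = 321_000.0       # dd right after dropping the OS cache
SEQ_UNCACHED_MS = 883_465.352    # 0 workers, first (uncached) run
PAR_UNCACHED_MS = 340_302.250    # 8 workers, first (uncached) run


def serial_ms(n_blocks, io_ms, cpu_ms):
    """Hypothetical model: one process waits for each read, then
    processes the block, and only then issues the next read."""
    return n_blocks * (io_ms + cpu_ms)


def overlapped_ms(n_blocks, io_ms, cpu_ms):
    """Hypothetical model: reads and processing overlap, so after the
    first block the slower of the two stages sets the pace."""
    return io_ms + cpu_ms + (n_blocks - 1) * max(io_ms, cpu_ms)


# Gap between the two uncached runs, from the quoted numbers.
extra_ms = SEQ_UNCACHED_MS - PAR_UNCACHED_MS
ratio = SEQ_UNCACHED_MS / PAR_UNCACHED_MS

print(f"extra time for 0 workers: {extra_ms / 1000:.0f}s ({ratio:.2f}x)")
print(f"8 workers vs. dd:         {PAR_UNCACHED_MS / DD_UNCACHED_MS:.2f}x")

# Made-up per-block costs, only to show the shape of the effect.
print("serial model:    ", serial_ms(1000, 1.0, 1.5), "ms")
print("overlapped model:", overlapped_ms(1000, 1.0, 1.5), "ms")
```

In the serial model every block pays for both the read and the
processing, while in the overlapped model it pays only for the slower of
the two, which is the kind of gap the uncached 0-worker run might be
showing.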