On 8/14/2011 2:14 AM, Dion Kant wrote:
> On 08/13/2011 03:55 PM, Stan Hoeppner wrote:
>> My explanation to you wasn't fully correct. I confused specifying no
>> block size with specifying an insanely large block size. The other post
>> I was referring to dealt with people using a 1GB (or larger) block size
>> because it made the math easier for them when wanting to write a large
>> test file.
> Ok, that makes sense. It is the large specified bs which makes dd
> buffer the data first.
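Exactly -- dd has to assemble the whole block in memory before it can
issue a single write. To make the two styles concrete (target file name
hypothetical):

  # sane: derive the count from the block size; dd streams 4kB writes
  dd if=/dev/zero of=/mnt/test/file bs=4096 count=524288   # 4096 * 524288 = 2GB

  # lazy: dd builds one 2GB buffer in memory before writing a byte --
  # on a small-memory box most of that buffer gets pushed into swap first
  dd if=/dev/zero of=/mnt/test/file bs=2G count=1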
Yep, it's really dramatic on machines with low memory due to swapping.
When I first tested this phenomenon with a 1GB dd block size on my
machine with only 384 MB RAM and a 1GB swap partition, it took many
minutes to complete, vs tens of seconds using a 4kB block size. Almost
all of the 2GB test file was being pushed into swap, then being read
back from swap and written to the file -- swap and file on the same
physical disk. This is one of the reasons I keep this old machine
around -- problems of this nature show up more quickly and are more
easily identified.

>> Instead of dividing their total file size by 4096 and using the result
>> for "bs=4096 count=X" (which is the proper method I described to you)
>> they were simply specifying, for example, "bs=2G count=1" to write a 2
>> GB test file. Doing this causes the massive buffering I described, and
>> consequently, horrible performance, typically by a factor of 10 or
>> more, depending on the specific system.
>>
>> The horrible performance with bs=512 is likely due to the LVM block
>> size being 4096, forcing block writes that are 1/8th the normal size
>> and causing lots of merging. If you divide 120MB/s by 8 you get
>> 15MB/s, which IIRC from your original post is approximately the write
>> performance you were seeing, which was 19MB/s.
> Recall that I took LVM out of the loop already. So now I am doing the
> experiment by writing data straight to the block device, in my case
> /dev/sdb4. (If writing at the block device level does not perform, how
> will LVM be able to perform?)
>
> Inspired by your advice, I did some more investigation on this. I wrote
> a small test program, i.e. I took dd out of the loop as well. It writes
> 1 GB of test data with increasing block sizes directly to /dev/sdb4.
> Here are some results:
>
> root@dom0-2:~# ./bw
> Writing 1 GB
>      bs      time      rate
> (bytes)       (s)   (MiB/s)
>       1    85.476   12.5619
>       2    33.016   32.5218
>       4   23.6675   45.3679
>       8    20.112   53.3881
>      16     18.76   57.2356
>      32    17.872   60.0795
>      64    17.636   60.8834
>     128    17.096   62.8064
>     256    17.188   62.4704
>     512   16.8482   63.7303
>    1024   57.6053   18.6396
>    2048     57.94    18.532
>    4096    17.016   63.1019
>    8192    16.604   64.6675
>   16384    16.452   65.2649
>   32768    17.132   62.6748
>   65536    16.256    66.052
>  131072     16.44   65.3127
>  262144    16.264   66.0194
>  524288    16.388   65.5199

The dips at 1024 and 2048 are strange, but not entirely unexpected.

> The good and problematic block sizes do not really coincide with the
> ones I observe with dd, but the odd behaviour is there. There are some
> magic block sizes {1, 1024, 2048} which cause a drop in performance.
> Looking at vmstat output at the same time, I see unexpected bi, and
> the interrupt rate goes sky high.
>
> In my case it is the ahci driver handling the writes.

Yeah, every platform will have quirks.
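For anyone who wants to reproduce a similar sweep without writing C++,
plain GNU dd can approximate it. A rough sketch (needs root, and it
overwrites /dev/sdb4 just like the test program; conv=fdatasync makes
dd's reported rate include the flush to disk):

  #!/bin/sh
  # write 1 GiB per pass with increasing block sizes
  for bs in 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288
  do
      dd if=/dev/zero of=/dev/sdb4 bs=$bs count=$((1073741824 / bs)) \
          conv=fdatasync 2>&1 | tail -n 1   # keep only dd's summary line
  done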
> Here is the vmstat trace belonging to the bs=1 write, and I add some
> more observations below:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b  swpd    free    buff   cache  si  so    bi     bo    in    cs us sy  id wa
>  0  0     0 6379780   23820  112616   0   0     0      0    78    82  0  0 100  0
>  0  0     0 6379780   23820  112616   0   0     0      0    77    80  0  0 100  0
>  0  0     0 6379780   23820  112616   0   0     0      0    79    80  0  0 100  0
>  0  0     0 6379780   23820  112616   0   0     0      0    78    82  0  0 100  0
>  0  0     0 6379780   23820  112616   0   0     0      0    76    80  0  0 100  0
>  0  0     0 6379780   23820  112616   0   0     0      0    77    83  0  0 100  0
>  0  0     0 6379780   23820  112616   0   0     0      0    75    80  0  0 100  0
>  0  0     0 6379780   23820  112616   0   0     0      0    82    82  0  0 100  0
>  0  0     0 6379780   23820  112616   0   0     0      0    90    93  0  0 100  0
>  1  0     0 6376796   27132  112524   0   0   828      4   400   531  0  0 100  0
>  1  0     0 6346416   57496  112560   0   0  7590      0  2408  3877  5  0  92  3
>  1  0     0 6315788   88048  112580   0   0  7638      0  2435  3903  7  0  90  3
>  1  1     0 6284416  118548  112540   0   0  7624      0  2428  3903  6  0  91  3
>  1  0     0 6253168  148896  112564   0   0  7586      0  2403  3875  6  0  91  3
>  1  0     0 6221920  179284  112484   0   0  7596      0  2414  3884  5  0  93  2
>  1  0     0 6190672  209648  112540   0   0  7590      0  2417  3877  5  0  93  2
>  0  1     0 6160540  239796  112536   0   0  7540      0  6240  3851  6  0  76 18
>  0  1     0 6129540  269952  112584   0   0  7538      0  6255  3856  6  0  86  8
>  1  0     0 6098292  300116  112504   0   0  7540      0  6233  3853  5  0  89  6
>  1  0     0 6067540  330280  112552   0   0  7538      0  6196  3857  6  0  87  7
>  1  0     0 6036540  360452  112536   0   0  7542      0  6281  3868  5  0  89  6
>  1  0     0 6005540  390608  112464   0   0  7540      0  6268  3856  6  0  85  8
>  1  0     0 5974292  420788  112516   0   0  7542      0  6246  3865  6  0  86  7
>  1  0     0 5943416  450952  112444   0   0  7540      0  6253  3860  5  0  88  6
>  1  0     0 5912540  481128  112488   0   0  7546      0  6226  3861  6  0  86  7
>  1  0     0 5881292  511300  112472   0   0  7540      0  6225  3860  5  0  89  6
>  1  0     0 5850292  541456  112464   0   0  7538      0  6192  3858  6  0  86  7
>  0  2     0 5817664  570260  112516   0   0  7200  40706  5990  4820  6  0  81 13
>  0  2     0 5789268  597752  112472   0   0  6870      0  5775  5251  5  0  80 15
>  1  1     0 5760996  625164  112676   0   0  6854   8192  5795  5248  5  0  73 21
>  1  1     0 5732476  653232  112572   0   0  7014   8192  5285  5362  5  0  82 13
>  1  1     0 5704080  680924  112676   0   0  6922      0  2340  5290  3  0  92  5
>  1  1     0 5674504  709444  112540   0   0  7130   8192  2404  5469  5  0  71 24
>  1  1     0 5646184  737144  112484   0   0  6924      0  2320  5293  5  0  85 10
>  1  1     0 5617460  765004  112484   0   0  6966   8192  5844  5329  5  0  75 20
>  2  2     0 5588264  793288  112500   0   0  7068   8192  5313  5404  4  0  85 11
>  1  1     0 5559556  821084  112628   0   0  6948      0  2326  5309  8  0  78 14
>  1  1     0 5530468  849304  112476   0   0  7054   8192  2374  5395  5  0  75 20
>  1  1     0 5501892  876956  112464   0   0  6912   8192  2321  5285  5  0  85 10
>  0  2     0 5472936  905044  112584   0   0  7024      0  5889  5370  5  0  70 25
>  0  2     0 5444476  933096  112596   0   0  7010   8192  5874  5360  4  0  82 13
>  0  2     0 5415520  960924  112476   0   0  6960      0  5841  5323  6  0  70 24
>  1  1     0 5386580  989096  112696   0   0  7038   8192  5282  5384  6  0  69 25
>  2  2     0 5357624 1017164  112688   0   0  7016      0  2358  5362  4  0  89  7
>  1  1     0 5328428 1045280  112580   0   0  7028   8192  2356  5379  5  0  80 15
>  0  2     0 5296688 1072396  112540   0   0  6778  50068  2314  5194  0  0  99  1
>  0  2     0 5297044 1072396  112616   0   0     0  64520   317   176  0  0  75 24
>  0  2     0 5297044 1072396  112616   0   0     0  64520   310   175  0  0  77 23
>  0  2     0 5297044 1072396  112616   0   0     0  64520   300   161  0  0  85 15
>  0  2     0 5297052 1072396  112616   0   0     0  72204   317   180  0  0  77 22
>  0  2     0 5297052 1072396  112616   0   0     0  64520   307   170  0  0  84 16
>  0  1     0 5300540 1072396  112616   0   0     0  21310   309   203  0  0  98  2
>  1  0     0 6351440   52252  112680   0   0    54     25   688   343  1  1  63 35
>  1  0     0 6269720  133036  112600   0   0     0      0   575    88  7  0  93  0
>  1  0     0 6186516  213812  112560   0   0     0      0   568    83  9  0  91  0
>  1  0     0 6103560  294588  112512   0   0     0      0   569    85  6  0  94  0
>  1  0     0 6020852  375428  112688   0   0     0      0   571    84  9  0  90  0
>  1  0     0 5937896  456244  112664   0   0     0      0   571    86  7  0  93  0
>
> Writing to /dev/sdb4 starts when there is a step in the interrupt
> column. As long as the interrupts are high, there is bi related to
> this writing. After the initial buffering there is a first write to
> the disk at 40MB/s averaged over 2 seconds. Then there are only a
> couple of 8MB/s writes, while in the meantime the (kernel) buffer
> grows to 1072396 kB. Then the driver starts writing at the expected
> rate and the interrupt rate goes down to a reasonable level. It is
> only at the end of the write that the ahci driver gives back its
> buffer memory. After this, when the interrupt rate settles at a level
> of about 570, the ahci driver is swallowing the second write
> iteration, with a block size of 2 bytes.

What do you see when you insert large delays between iterations, or run
each iteration after clearing the baffles? I.e.:

  $ echo 3 > /proc/sys/vm/drop_caches

> Here is the code fragment responsible for writing and measuring:
>
>    sync();
>    gettimeofday(&tstart, &tz);
>    for (int i=0; i<Ntot/N; ++i)
>       sdb4->write(buf,N);
>    sdb4->flush();
>    sdb4->close();
>    sync();
>    gettimeofday(&tstop, &tz);
>
> N is the block size and sdb4 is
>
>    ofstream* sdb4 = new ofstream("/dev/sdb4", ofstream::binary);
>
> I think Stan is right that this may be something in the ahci kernel
> driver.

Could be. Could also just need tweaking -- queue_depth, elevator, etc.
Did you test with all 3 elevators or just one?
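For reference, those knobs live in sysfs (device name per your setup;
needs root, and the elevator can be switched on the fly between runs):

  cat /sys/block/sdb/queue/scheduler      # the active one shows in [brackets]
  echo noop > /sys/block/sdb/queue/scheduler
  echo deadline > /sys/block/sdb/queue/scheduler
  echo cfq > /sys/block/sdb/queue/scheduler
  cat /sys/block/sdb/device/queue_depth   # queue depth knob for this disk

Rerunning the block size sweep under each of the three would show
whether the 1024/2048 dips follow the elevator or the driver.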
> I have some 3ware controller lying around. I might repeat the
> experiments with it and post them here if someone is interested.

If it doesn't have the hardware write cache enabled you will likely see
worse performance than with the current drive/controller.

-- 
Stan