Hello guys,

After running a 16-thread sync-random-write test against qcow2, I observed that qcow2 seems to serialize all of its metadata-related writes. If qcow2 is designed to do this, *then what is the concern? What would go wrong if this ordering were relaxed?* By providing fewer features, raw files and QED scale well on parallel I/O workloads. I believe qcow2 does this for clear reasons. Thanks!
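One rough way to quantify "serialized" versus "parallel" is to count how many writes are dispatched to the disk (D) but not yet completed (C) at the same moment in a blkparse text dump. The sketch below assumes blkparse's default output format. It pairs D and C events by starting sector and ignores flushes, which carry no sector range. The helper names (`parse`, `max_in_flight`) are hypothetical. The sample lines are copied from the two traces in this mail, with the emphasis markers removed.

```python
import re
from typing import Iterable, List, Tuple

# Matches blkparse default-format lines that carry a sector range, e.g.
# "8,32 7 4054 30.087620855 21061 D WS 4096 + 128 [qemu-nbd]".
# Flush events ("Q FWS [qemu-nbd]") and flush completions ("C WS 0 [0]")
# have no "+ count" part and are therefore skipped.
EVENT_RE = re.compile(
    r"^\s*\d+,\d+\s+\d+\s+\d+\s+(?P<ts>\d+\.\d+)\s+\d+\s+"
    r"(?P<action>[A-Z])\s+(?P<rwbs>[A-Z]+)\s+(?P<sector>\d+)\s+\+\s+(?P<count>\d+)"
)


def parse(lines: Iterable[str]) -> List[Tuple[float, str, int]]:
    """Return sorted (timestamp, action, sector) tuples for write D and C events."""
    events = []
    for line in lines:
        m = EVENT_RE.match(line)
        if m and m.group("action") in ("D", "C") and "W" in m.group("rwbs"):
            events.append((float(m.group("ts")), m.group("action"), int(m.group("sector"))))
    return sorted(events)


def max_in_flight(events: List[Tuple[float, str, int]]) -> int:
    """Largest number of writes dispatched to the device but not yet completed."""
    outstanding = set()
    peak = 0
    for _ts, action, sector in events:
        if action == "D":
            outstanding.add(sector)
            peak = max(peak, len(outstanding))
        else:
            # A completion whose dispatch is not in the excerpt is simply ignored.
            outstanding.discard(sector)
    return peak


QCOW2_EXCERPT = """\
8,32 7 4054 30.087620855 21061 D WS 4096 + 128 [qemu-nbd]
8,32 7 4055 30.087626023 21061 D WS 363008 + 624 [qemu-nbd]
8,32 7 4056 30.087992341 0 C WS 4096 + 128 [0]
8,32 7 4057 30.089205833 0 C WS 363008 + 624 [0]
8,32 7 4058 30.089264151 21061 Q FWS [qemu-nbd]
8,32 7 4061 30.102978117 0 C WS 0 [0]
8,32 4 4930 30.103082669 21058 D WS 363632 + 16 [qemu-nbd]
8,32 0 4655 30.103243164 0 C WS 363632 + 16 [0]
"""

RAW_EXCERPT = """\
8,32 0 269 5.998464860 21154 D WS 335548672 + 128 [fio]
8,32 3 391 5.998474708 21146 D WS 67113216 + 128 [fio]
8,32 7 243 5.998483857 21159 D WS 503320832 + 128 [fio]
8,32 5 506 5.998494264 21149 D WS 167776512 + 128 [fio]
8,32 2 339 5.998509489 21156 D WS 402657536 + 128 [fio]
8,32 7 244 5.998825870 0 C WS 33558784 + 128 [0]
8,32 0 270 5.999112918 0 C WS 335548672 + 128 [0]
"""

print("qcow2 peak in flight:", max_in_flight(parse(QCOW2_EXCERPT.splitlines())))  # 2
print("raw peak in flight:", max_in_flight(parse(RAW_EXCERPT.splitlines())))  # 5
```

Pairing by starting sector is a simplification: two outstanding writes to the same sector, or a dump that starts in the middle of a request, would skew the count. For a full trace, feeding the whole blkparse output through the same two functions should still show the contrast between the two setups.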
Here the qcow2 image is attached to /dev/nbd0 via qemu-nbd. The underlying device is a spinning disk using the cfq scheduler on Linux 3.16.1. Below are two pieces of the trace. (The comments are based on my estimation and guesses; please correct me if I have misunderstood the behavior.) As the trace below shows, the requests are issued by several different threads. On top of nbd0 the writes are completely unordered, but after passing through qemu-nbd and the qcow2 image file, we see mostly serialized requests.

====QCOW2
8,32 7 4054 30.087620855 21061 *D WS 4096* + 128 [qemu-nbd] ----metadata?
8,32 7 4055 30.087626023 21061 *D WS 363008* + 624 [qemu-nbd] ----data? It got issued in parallel by chance.
8,32 7 4056 30.087992341 0 C WS 4096 + 128 [0]
8,32 7 4057 30.089205833 0 C WS 363008 + 624 [0]
8,32 7 4058 30.089264151 21061 Q FWS [qemu-nbd]
8,32 7 4059 30.089265478 21061 G FWS [qemu-nbd]
8,32 7 4060 30.089266386 21061 I FWS [qemu-nbd] ----Flush (Q -> G -> I -> C)
8,32 7 4061 30.102978117 0 C WS 0 [0]
8,32 4 4930 30.103082669 21058 *D WS 363632* + 16 [qemu-nbd] ----In very rare cases we see two writes to 6-digit sector numbers issued in parallel. Not this one!
8,32 0 4655 30.103243164 0 C WS 363632 + 16 [0]
8,32 4 4931 30.103261463 21058 Q FWS [qemu-nbd]
8,32 4 4932 30.103263349 21058 G FWS [qemu-nbd]
8,32 4 4933 30.103264326 21058 I FWS [qemu-nbd]
8,32 2 3772 30.103266142 21010 Q FWS [qemu-nbd]
8,32 2 3773 30.103268936 21010 G FWS [qemu-nbd]
8,32 2 3774 30.103270612 21010 I FWS [qemu-nbd]
8,32 3 3717 30.111390919 0 C WS 0 [0]
8,32 4 4934 30.129806741 0 C WS 0 [0]
8,32 6 4407 30.129880842 21062 Q FWS [qemu-nbd]
8,32 6 4408 30.129882728 21062 G FWS [qemu-nbd]
8,32 6 4409 30.129884125 21062 I FWS [qemu-nbd]
8,32 5 4807 30.130019058 0 C WS 0 [0]
8,32 5 4808 30.130033376 0 *D WS 1280* + 128 [swapper/0] ----This one looks like a metadata write.
8,32 3 3718 30.130417014 20895 C WS 1280 + 128 [0]
8,32 7 4062 30.130442436 20925 Q FWS [qemu-nbd]
8,32 7 4063 30.130450258 20925 G FWS [qemu-nbd]
8,32 7 4064 30.130451166 20925 I FWS [qemu-nbd]
8,32 6 4410 30.133539827 0 C WS 0 [0]
8,32 4 4935 30.133609250 20892 Q FWS [qemu-nbd]
8,32 4 4936 30.133625662 20892 G FWS [qemu-nbd]
8,32 4 4937 30.133626710 20892 I FWS [qemu-nbd]
8,32 7 4065 30.133758570 0 C WS 0 [0]
8,32 6 4411 30.133773516 21008 *D WS 2048* + 128 [qemu-nbd]
8,32 6 4412 30.134165396 0 C WS 2048 + 128 [0]
8,32 6 4413 30.134191167 21008 Q FWS [qemu-nbd]
8,32 6 4414 30.134192285 21008 G FWS [qemu-nbd]
8,32 6 4415 30.134193193 21008 I FWS [qemu-nbd]
8,32 4 4938 30.136255117 0 C WS 0 [0]
8,32 1 4780 30.136316368 21057 Q FWS [qemu-nbd]
8,32 1 4781 30.136318743 21057 G FWS [qemu-nbd]
8,32 1 4782 30.136320069 21057 I FWS [qemu-nbd]
8,32 5 4809 30.136467435 20891 C WS 0 [0]
====

On the raw partition things happen as I expected: the writes are issued in parallel.

==== raw partition
8,32 0 269 5.998464860 21154 D WS 335548672 + 128 [fio]
8,32 3 391 5.998474708 21146 D WS 67113216 + 128 [fio]
8,32 7 243 5.998483857 21159 D WS 503320832 + 128 [fio]
8,32 5 506 5.998494264 21149 D WS 167776512 + 128 [fio]
8,32 2 339 5.998509489 21156 D WS 402657536 + 128 [fio]
8,32 6 879 5.998522968 21158 D WS 469766400 + 128 [fio]
8,32 1 497 5.998537286 21151 D WS 234885376 + 128 [fio]
8,32 5 507 5.998553908 21144 D WS 4352 + 128 [fio]
8,32 2 340 5.998562568 21155 D WS 369103104 + 128 [fio]
8,32 6 880 5.998571159 21150 D WS 201330944 + 128 [fio]
8,32 5 508 5.998591064 21147 D WS 100667648 + 128 [fio]
8,32 2 341 5.998603635 21152 D WS 268439808 + 128 [fio]
8,32 6 881 5.998610410 21153 D WS 301994240 + 128 [fio]
8,32 6 882 5.998640860 21157 D WS 436211968 + 128 [fio]
8,32 2 342 5.998650429 21148 D WS 134222080 + 128 [fio]
8,32 7 244 5.998825870 0 C WS 33558784 + 128 [0]
8,32 7 245 5.998848638 21145 Q FWS [fio]
8,32 7 246 5.998850175 21145 G FWS [fio]
8,32 7 247 5.998851153 21145 I FWS [fio]
8,32 0 270 5.999112918 0 C WS 335548672 + 128 [0]
8,32 0 271 5.999142600 21154 Q FWS [fio]
8,32 0 272 5.999144137 21154 G FWS [fio]
8,32 0 273 5.999145045 21154 I FWS [fio]
8,32 3 392 5.999388302 0 C WS 67113216 + 128 [0]
....

-- 
Cheers!
吴兴博 Wu, Xingbo <wux...@gmail.com>