On Thu, Sep 30, 2010 at 8:19 PM, Venkateswararao Jujjuri (JV) <jv...@linux.vnet.ibm.com> wrote:
> On 9/30/2010 2:13 AM, Stefan Hajnoczi wrote:
>>
>> On Thu, Sep 30, 2010 at 1:50 AM, Venkateswararao Jujjuri (JV)
>> <jv...@linux.vnet.ibm.com> wrote:
>>>
>>> Code: Mainline QEMU (git://git.qemu.org/qemu.git)
>>> Machine: LS21 blade.
>>> Disk: Local disk through VirtIO.
>>> Did not select any cache option. Defaulting to writethrough.
>>>
>>> Command tested:
>>> 3 parallel instances of: dd if=/dev/zero of=/pmnt/my_pw bs=4k count=100000
>>>
>>> QEMU with smp=1
>>> 19.3 MB/s + 19.2 MB/s + 18.6 MB/s = 57.1 MB/s
>>>
>>> QEMU with smp=4
>>> 15.3 MB/s + 14.1 MB/s + 13.6 MB/s = 43.0 MB/s
>>>
>>> Is this expected?
>>
>> Did you configure with --enable-io-thread?
>
> Yes I did.
>>
>> Also, try using dd oflag=direct to eliminate effects introduced by the
>> guest page cache and really hit the disk.
>
> With oflag=direct I see no difference; the throughput is so low that I
> would not expect any difference. It is 225 KB/s for each thread, with
> either smp=1 or smp=4.
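For reference, the parallel-dd test described above can be scripted roughly like this. This is only a sketch: MNT, JOBS, COUNT, and the ddtest.* file names are illustrative stand-ins (the original run used of=/pmnt/my_pw and count=100000), and oflag=direct is left as an opt-in flag since it fails on filesystems without O_DIRECT support.

```shell
#!/bin/sh
# Run JOBS parallel dd writers and collect their throughput summaries.
# Defaults are small, illustrative values, not the original test's.
MNT=${MNT:-/tmp}
JOBS=${JOBS:-3}
COUNT=${COUNT:-1000}
DD_FLAGS=${DD_FLAGS:-}   # e.g. DD_FLAGS="oflag=direct" to bypass the page cache

for i in $(seq 1 "$JOBS"); do
    # dd prints its transfer-rate summary on stderr; capture it per instance.
    dd if=/dev/zero of="$MNT/ddtest.$i" bs=4k count="$COUNT" $DD_FLAGS \
        2> "$MNT/ddtest.$i.log" &
done
wait

# The MB/s figures quoted in the thread are these per-instance rates
# summed by hand across the parallel writers.
grep -h copied "$MNT"/ddtest.*.log
```

Run inside the guest against the VirtIO disk's mount point (MNT=/pmnt here) with and without DD_FLAGS="oflag=direct" to separate page-cache effects from raw disk throughput.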
If I understand correctly you are getting:

QEMU oflag=direct with smp=1
225 KB/s + 225 KB/s + 225 KB/s = 675 KB/s

QEMU oflag=direct with smp=4
225 KB/s + 225 KB/s + 225 KB/s = 675 KB/s

This suggests the degradation for smp=4 is guest kernel page cache or
buffered I/O related. Perhaps lock holder preemption?

Stefan