On Fri, Oct 1, 2010 at 4:04 PM, Venkateswararao Jujjuri (JV)
<jv...@linux.vnet.ibm.com> wrote:
> On 10/1/2010 6:38 AM, Ryan Harper wrote:
>>
>> * Stefan Hajnoczi <stefa...@gmail.com> [2010-10-01 03:48]:
>>>
>>> On Thu, Sep 30, 2010 at 8:19 PM, Venkateswararao Jujjuri (JV)
>>> <jv...@linux.vnet.ibm.com> wrote:
>>>>
>>>> On 9/30/2010 2:13 AM, Stefan Hajnoczi wrote:
>>>>>
>>>>> On Thu, Sep 30, 2010 at 1:50 AM, Venkateswararao Jujjuri (JV)
>>>>> <jv...@linux.vnet.ibm.com> wrote:
>>>>>>
>>>>>> Code: Mainline QEMU (git://git.qemu.org/qemu.git)
>>>>>> Machine: LS21 blade.
>>>>>> Disk: Local disk through virtio.
>>>>>> Did not select any cache option, so it defaults to writethrough.
>>>>>>
>>>>>> Command tested:
>>>>>> 3 parallel instances of: dd if=/dev/zero of=/pmnt/my_pw bs=4k count=100000
>>>>>>
>>>>>> QEMU with smp=1
>>>>>> 19.3 MB/s + 19.2 MB/s + 18.6 MB/s = 57.1 MB/s
>>>>>>
>>>>>> QEMU with smp=4
>>>>>> 15.3 MB/s + 14.1 MB/s + 13.6 MB/s = 43.0 MB/s
>>>>>>
>>>>>> Is this expected?
>>>>>
>>>>> Did you configure with --enable-io-thread?
>>>>
>>>> Yes I did.
>>>>
>>>>> Also, try using dd oflag=direct to eliminate effects introduced by the
>>>>> guest page cache and really hit the disk.
>>>>
>>>> With oflag=direct I see no difference, and the throughput is so slow
>>>> that I would not expect to see any difference.
>>>> It is 225 KB/s for each thread, either with smp=1 or with smp=4.
>>>
>>> If I understand correctly you are getting:
>>>
>>> QEMU oflag=direct with smp=1
>>> 225 KB/s + 225 KB/s + 225 KB/s = 675 KB/s
>>>
>>> QEMU oflag=direct with smp=4
>>> 225 KB/s + 225 KB/s + 225 KB/s = 675 KB/s
>>>
>>> This suggests the degradation for smp=4 is guest kernel page cache or
>>> buffered I/O related. Perhaps lock holder preemption?
>>
>> Or just a single spindle maxed out, because the blade hard drive doesn't
>> have its write cache enabled (it's disabled by default).
>
> Yes, I am sure we are hitting the max limit on the blade local disk.
> The question is why smp=4 degraded performance in the cached mode.
>
> I am running the latest upstream kernel (2.6.36-rc5) on the guest and using
> block I/O.
> Do we have any known issues there which could explain the performance
> degradation?
I suggested above that lock holder preemption might be the issue. If you check
/proc/lock_stat in a guest debug kernel after reproducing the poor performance,
do the lock statistics look suspicious (e.g. very long hold or wait times)?
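Roughly how I would collect that data (this assumes the guest debug kernel is
built with CONFIG_LOCK_STAT=y; the exact procfs knobs may vary between kernel
versions, so treat this as a sketch):

  # in the guest, after booting the lock-stat-enabled debug kernel
  echo 0 > /proc/lock_stat              # clear any previously collected stats
  echo 1 > /proc/sys/kernel/lock_stat   # start collecting lock statistics

  # ... reproduce the 3x parallel dd workload with smp=4 ...

  echo 0 > /proc/sys/kernel/lock_stat   # stop collecting
  cat /proc/lock_stat                   # inspect the holdtime/waittime columns

If a lock shows hold times on the order of a guest vcpu's scheduling timeslice,
that would point towards lock holder preemption.

Stefan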