My first guess is increased memory pressure caused by the Lustre 1.8 read cache. "Slow" messages are often caused by memory allocations taking a long time.
You could try disabling the read cache and see if that clears up the slow messages.

Kevin

On Apr 20, 2011, at 4:29 AM, James Rose <[email protected]> wrote:

> Hi
>
> We have been experiencing degraded performance for a few days on a
> fresh install of Lustre 1.8.5 (on RHEL5 using the Sun ext4 RPMs). The
> initial bulk load of the data is fine, but once in use for a while,
> writes become very slow to an individual OST. This blocks I/O for a
> few minutes and then carries on as normal. The slow writes then move
> to another OST. This can be seen in iostat, and many slow I/O
> messages appear in the logs (example included).
>
> The OSTs are between 87 and 90% full. Not ideal, but this caused no
> issues running 1.6.7.2 on the same hardware.
>
> The OSTs are RAID6 on external RAID chassis (Infortrend). Each OST
> is 5.4T (small). The server is dual AMD (4 cores), 16G RAM, and a
> QLogic FC HBA.
>
> I mounted the OSTs as ldiskfs and tried a few write tests. These
> also show the same behaviour.
>
> While a write operation is blocked there are hundreds of read tps
> and a very small kB/s read from the RAID, but no writes. As soon as
> this completes, writes go through at a more expected speed.
>
> Any idea what is going on?
>
> Many thanks
>
> James.
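[For anyone trying this: on a 1.8 OSS the read cache is controlled by
obdfilter tunables that can be changed at runtime with lctl. A minimal
sketch, assuming the stock Lustre 1.8 parameter names — check what
lctl get_param reports on your own OSS before relying on them:]

```shell
# On each OSS, check the current cache settings (assumes the Lustre 1.8
# obdfilter tunables; names may differ on other releases):
lctl get_param obdfilter.*.read_cache_enable
lctl get_param obdfilter.*.writethrough_cache_enable

# Disable the OSS read cache at runtime (not persistent across a
# remount of the OSTs):
lctl set_param obdfilter.*.read_cache_enable=0

# Optionally also stop newly written pages from being kept in cache:
lctl set_param obdfilter.*.writethrough_cache_enable=0
```

[If the slow messages stop after this, that would support the
memory-pressure theory above.]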
>
> Example error messages:
>
> Apr 20 04:53:04 oss5r-mgmt kernel: LustreError: dumping log to /tmp/lustre-log.1303271584.3935
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow quota init 286s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 39s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 39 previous similar messages
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow brw_start 39s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 38 previous similar messages
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 133s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 44 previous similar messages
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow brw_start 133s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 44 previous similar messages
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 236s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow i_mutex 40s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 2 previous similar messages
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 6 previous similar messages
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow i_mutex 277s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow direct_io 286s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 3 previous similar messages
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 285s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 1 previous similar message
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow commitrw commit 285s due to heavy IO load
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 1 previous similar message
> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow parent lock 236s due to heavy IO load
>
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
