I think we are in agreement here, but my point is that adding a read cache to Lustre (which is a feature I rely heavily on) isn't the fundamental problem; it just exposed it. The fundamental problem is ext[34] performance on larger disks.
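(For what it's worth, the read cache doesn't have to be all-or-nothing while this gets sorted out. Assuming the standard 1.8 obdfilter tunable names, it can be checked, capped, or disabled per OSS with something like:

    lctl get_param obdfilter.*.read_cache_enable
    lctl set_param obdfilter.*.read_cache_enable=0
    lctl set_param obdfilter.*.writethrough_cache_enable=0
    lctl set_param obdfilter.*.readcache_max_filesize=32M    # cap instead of disable

so it's easy to rule the cache in or out, but that still only masks the underlying ext[34] behaviour on large, nearly-full OSTs.)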
Jeremy

On Wed, Apr 20, 2011 at 8:57 AM, Kevin Van Maren <[email protected]> wrote:
> Yes, difficulty finding free disk space can also be a problem, but I
> could not recall big changes in how that worked since 1.6, other than
> memory pressure from the read cache pushing out the bitmaps.
> See http://jira.whamcloud.com/browse/LU-15
>
> Kevin
>
>
> James Rose wrote:
> > Hi Kevin,
> >
> > Thanks for the suggestion. I will try this out.
> >
> > For the moment it seems that it may be disk space related. I have
> > removed some data from the file system. Performance returned to where I
> > would expect it to be as space freed up (currently at 83% full). Since
> > freeing space I have seen two messages on an OSS where the number of
> > threads is tuned to the amount of RAM in the host, and six on an OSS that
> > has the number of threads set higher than it should be. This is a much
> > better situation than the steady stream I was experiencing last night.
> > Maybe disabling the read cache will remove the last few.
> >
> > I am still very curious what the rapid small reads seen when writing are,
> > as this showed up while mounted as ldiskfs, so not doing regular Lustre
> > operations at all.
> >
> > Thanks again for your help,
> >
> > James.
> >
> >
> > On Wed, 2011-04-20 at 08:48 -0300, Kevin Van Maren wrote:
> >
> >> First guess is the increased memory pressure caused by the Lustre 1.8
> >> read cache. Many times "slow" messages are caused by memory
> >> allocations taking a long time.
> >>
> >> You could try disabling the read cache and see if that clears up the
> >> slow messages.
> >>
> >> Kevin
> >>
> >>
> >> On Apr 20, 2011, at 4:29 AM, James Rose <[email protected]> wrote:
> >>
> >>
> >>> Hi
> >>>
> >>> We have been experiencing degraded performance for a few days on a
> >>> fresh install of Lustre 1.8.5 (on RHEL5 using the Sun ext4 RPMs). The
> >>> initial bulk load of the data will be fine, but once in use for a
> >>> while, writes become very slow to an individual OST. This will block IO
> >>> for a few minutes and then carry on as normal. The slow writes will
> >>> then move to another OST. This can be seen in iostat, and many slow
> >>> IO messages will be seen in the logs (example included).
> >>>
> >>> The OSTs are between 87-90% full. Not ideal, but this has not caused
> >>> any issues running 1.6.7.2 on the same hardware.
> >>>
> >>> The OSTs are RAID6 on external RAID chassis (Infortrend). Each OST
> >>> is 5.4T (small). The server is dual AMD (4 cores), 16G RAM, QLogic
> >>> FC HBA.
> >>>
> >>> I mounted the OSTs as ldiskfs and tried a few write tests. These
> >>> also show the same behaviour.
> >>>
> >>> While the write operation is blocked there will be hundreds of read
> >>> tps and a very small kB/s read from the RAID, but no writes. As
> >>> soon as this completes, writes will go through at a more expected
> >>> speed.
> >>>
> >>> Any idea what is going on?
> >>>
> >>> Many thanks
> >>>
> >>> James.
> >>>
> >>> Example error messages:
> >>>
> >>> Apr 20 04:53:04 oss5r-mgmt kernel: LustreError: dumping log to /tmp/lustre-log.1303271584.3935
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow quota init 286s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 39s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 39 previous similar messages
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow brw_start 39s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 38 previous similar messages
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 133s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 44 previous similar messages
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow brw_start 133s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 44 previous similar messages
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 236s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow i_mutex 40s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 2 previous similar messages
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 6 previous similar messages
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow i_mutex 277s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow direct_io 286s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 3 previous similar messages
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow journal start 285s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 1 previous similar message
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow commitrw commit 285s due to heavy IO load
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: Skipped 1 previous similar message
> >>> Apr 20 04:53:40 oss5r-mgmt kernel: Lustre: rho-OST0012: slow parent lock 236s due to heavy IO load
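P.S. For anyone following along, the per-OST fullness James mentions can be watched from any client with the usual lfs commands:

    lfs df -h    # per-OST space usage
    lfs df -i    # per-OST inode usage

and the ldiskfs-level write test he describes can be reproduced on an OSS with the OST mounted directly (OST_DEVICE below is just a placeholder for the real block device, and the OST must not be mounted by Lustre at the time):

    mount -t ldiskfs /dev/OST_DEVICE /mnt/ost
    dd if=/dev/zero of=/mnt/ost/ddtest bs=1M count=4096 oflag=direct
    iostat -x 5    # in another shell: look for high read tps with no writes during the stall

Remember to remove the test file and remount the OST afterwards.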
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
