Hello,
No, I don't see any backfill entries in ceph.log during that period. The
drives are WD2000FYYZ-01UL1B1, but I did not find anything suspicious in
SMART, and yes, I will check the other drives too.
Could I somehow determine which PG the file is placed in?
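If I understand the tooling correctly, something like this should map the
object to its PG and OSDs (assuming our bucket data pool is .rgw.buckets;
the pool name is a guess on my part):

    $ ceph osd map .rgw.buckets default.4181.1_products/800x600/537e28022fdcc.jpg
      # assumes .rgw.buckets is the radosgw bucket data pool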
Thanks
On 2014.05.23. 20:51, Craig Lewis wrote:
On 5/22/14 11:51, Győrvári Gábor wrote:
Hello,
I got these log entries on two nodes of a 3-node cluster. Each node has
2 OSDs, and only 2 OSDs on two separate nodes are affected, which is why
I don't understand the situation. There wasn't any extra IO on the
system at the given time.
We use radosgw with the S3 API to store objects; average ops are around
20-150, with bandwidth usage of 100-2000 kB/s read and only
50-1000 kB/s written.
osd_op(client.7821.0:67251068
default.4181.1_products/800x600/537e28022fdcc.jpg [cmpxattr
user.rgw.idtag (22) op 1 mode 1,setxattr user.rgw.idtag (33),call
refcount.put] 11.fe53a6fb e590) v4 currently waiting for subops from [2]
Are any of your PGs in recovery or backfill?
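Off the top of my head, something like this should show them (the exact
state names can vary by version):

    $ ceph health detail
    $ ceph pg dump | grep -E 'backfill|recover'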
I've seen this happen two different ways. The first time was because
I had the recovery and backfill parameters set too high for my
cluster. If your journals aren't SSDs, the default parameters are too
high. The recovery operations will use most of the IOPS and starve the
clients.
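If that's what's happening, you can throttle recovery on the fly;
roughly like this (the values here are just an example, tune them for
your hardware):

    $ ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

and then make the same settings permanent in the [osd] section of
ceph.conf so they survive restarts.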
The second time I saw this was when one disk was starting to fail.
Sectors started failing, and the drive spent a lot of time reading
and remapping bad sectors. Consumer-class SATA disks will retry bad
sectors for 30+ seconds. It happens in the drive firmware, so it's not
something you can stop. Enterprise-class drives give up more quickly,
since they assume you have another copy of the data. (Nobody uses
enterprise-class drives stand-alone; they're always in some sort of
storage array.)
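If the drive supports SCT ERC, you can check and shorten that retry
timeout with smartctl (assuming smartmontools is installed and the
drive exposes the feature):

    $ smartctl -l scterc /dev/sdX          # show current read/write recovery timeouts
    $ smartctl -l scterc,70,70 /dev/sdX    # set both to 7 seconds (units of 100 ms)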
I've had reports of 6+ OSDs blocking subops, and I traced it back to
one disk that was blocking others. I replaced that disk, and the
warnings went away.
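The per-OSD admin socket is handy for that kind of tracing; something
along these lines (run on the node hosting the OSD):

    $ ceph daemon osd.2 dump_ops_in_flight   # ops currently blocked on this OSD
    $ ceph daemon osd.2 dump_historic_ops    # recent slow ops with per-step timings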
If your cluster is healthy, check the SMART attributes for osd.2. If
osd.2 looks good, it might be another OSD. Check osd.2's logs, and check
any OSDs that are blocking osd.2. If your cluster is small, it might
be faster to just check all the disks instead of following the trail.
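To map osd.2 back to a physical disk for the SMART check, something like
this usually works (assuming the default /var/lib/ceph/osd mount layout):

    $ df /var/lib/ceph/osd/ceph-2   # find the backing device, e.g. /dev/sdb1
    $ smartctl -a /dev/sdb          # look at Reallocated_Sector_Ct and Current_Pending_Sector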
--
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com
--
Győrvári Gábor - Scr34m
scr...@frontember.hu