Hello,
No, I don't see any backfill entries in ceph.log during that period. The
drives are WD2000FYYZ-01UL1B1, but I did not find anything suspicious in
SMART, and yes, I will check the other drives too.
Could I somehow determine which PG the file is placed in?
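If I understand the tooling correctly, something like this should map the
object to its PG and OSDs (assuming our bucket data pool is .rgw.buckets;
the pool name is a guess on my part):

    $ ceph osd map .rgw.buckets default.4181.1_products/800x600/537e28022fdcc.jpg
      # assumes .rgw.buckets is the radosgw bucket data pool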
Thanks
On 2014.05.23. 20:51, Craig Lewis wrote:
On 5/22/14 11:51, Győrvári Gábor wrote:
Hello,
I got these log entries on two nodes of a 3-node cluster. Each node has
2 OSDs, and only 2 OSDs on two separate nodes are affected, which is why
I don't understand the situation. There wasn't any extra IO on the
system at the given time.
We use radosgw with the S3 API to store objects; average ops are around
20-150, with bandwidth usage of 100-2000 kB/s read and only
50-1000 kB/s written.
osd_op(client.7821.0:67251068
default.4181.1_products/800x600/537e28022fdcc.jpg [cmpxattr
user.rgw.idtag (22) op 1 mode 1,setxattr user.rgw.idtag (33),call
refcount.put] 11.fe53a6fb e590) v4 currently waiting for subops from [2]
Are any of your PGs in recovery or backfill?
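Off the top of my head, something like this should show them (the exact
state names can vary by version):

    $ ceph health detail
    $ ceph pg dump | grep -E 'backfill|recover'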
I've seen this happen two different ways. The first time was because
I had the recovery and backfill parameters set too high for my
cluster. If your journals aren't SSDs, the default parameters are too
high. The recovery operations will use most of the IOPS and starve the
clients.
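If that's what's happening, you can throttle recovery on the fly;
roughly like this (the values here are just an example, tune them for
your hardware):

    $ ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

and then make the same settings permanent in the [osd] section of
ceph.conf so they survive restarts.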
The second time I saw this was when one disk was starting to fail.
Sectors started failing, and the drive spent a lot of time reading
and remapping bad sectors. Consumer-class SATA disks will retry bad
sectors for 30+ seconds. It happens in the drive firmware, so it's not
something you can stop. Enterprise-class drives give up more quickly,
since they assume you have another copy of the data. (Nobody uses
enterprise-class drives stand-alone; they're always in some sort of
storage array.)
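If the drive supports SCT ERC, you can check and shorten that retry
timeout with smartctl (assuming smartmontools is installed and the
drive exposes the feature):

    $ smartctl -l scterc /dev/sdX          # show current read/write recovery timeouts
    $ smartctl -l scterc,70,70 /dev/sdX    # set both to 7 seconds (units of 100 ms)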
I've had reports of 6+ OSDs blocking subops, and I traced it back to
one disk that was blocking others. I replaced that disk, and the
warnings went away.
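The per-OSD admin socket is handy for that kind of tracing; something
along these lines (run on the node hosting the OSD):

    $ ceph daemon osd.2 dump_ops_in_flight   # ops currently blocked on this OSD
    $ ceph daemon osd.2 dump_historic_ops    # recent slow ops with per-step timings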
If your cluster is healthy, check the SMART attributes for osd.2. If
osd.2 looks good, it might be another OSD. Check osd.2's logs, and check
any OSDs that are blocking osd.2. If your cluster is small, it might
be faster to just check all the disks instead of following the trail.
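To map osd.2 back to a physical disk for the SMART check, something like
this usually works (assuming the default /var/lib/ceph/osd mount layout):

    $ df /var/lib/ceph/osd/ceph-2   # find the backing device, e.g. /dev/sdb1
    $ smartctl -a /dev/sdb          # look at Reallocated_Sector_Ct and Current_Pending_Sector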
--
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com
--
Győrvári Gábor - Scr34m
scr...@frontember.hu