Hi Greg,

> Yes, "bad crc" indicates that the checksums on an incoming message did
> not match what was provided — ie, the message got corrupted. You
> shouldn't try and fix that by playing around with the peering settings
> as it's not a peering bug.
> Unless there's a bug in the messaging layer causing this (very
> unlikely), you have bad hardware or a bad network configuration
> (people occasionally talk about MTU settings?). Fix that and things
> will work; don't and the only software tweaks you could apply are more
> likely to result in lost data than a happy cluster.
> -Greg


I suspected the network at first, but I did not observe any packet loss between
the two hosts, and neither host has trouble talking to the rest of its peers.
Only these two OSDs cannot talk to each other, so I figured a network issue was
unlikely. Network monitoring does show virtually no inbound traffic on those
links compared to the other ports on the switch, but no other peerings fail.

Is there anything you can suggest to drill down deeper?
Also, am I correct in assuming that, as a last resort, I can pull one of these
OSDs from the cluster to force a remapping to a different OSD, as a quick
temporary fix to get the cluster serving I/O properly again?


Many thanks for your help,

George
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
