On Mon, Aug 25, 2014 at 10:56 AM, Bruce McFarland
<bruce.mcfarl...@taec.toshiba.com> wrote:
> Thank you very much for the help.
>
> I'm moving osd_heartbeat_grace to the global section and trying to figure out 
> what's going on between the OSDs. Since increasing osd_heartbeat_grace in the 
> [mon] section of ceph.conf on the monitor I still see failures, but now they 
> are 2 seconds > osd_heartbeat_grace. It seems that no matter how much I 
> increase this value, OSDs are reported as failing just outside of it.
>
> I've looked at netstat -s on all of the nodes and will go back and look at 
> the network stats much more closely.
>
> Would it help to put the monitor on a 10G link to the storage nodes? 
> Everything is set up, but we chose to leave the monitor on a 1G link to the 
> storage nodes.
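
The move described above would put the grace setting in [global] so that the monitors and all OSDs agree on the same value. A minimal ceph.conf sketch (the value of 30 is only illustrative; the default is 20 seconds):

```
[global]
    # Seconds of missed heartbeats before peers report an OSD as failed.
    # Placing this in [global] keeps monitors and OSDs consistent.
    osd heartbeat grace = 30
```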

No. They're being marked down because they aren't responding to heartbeats
from their peer OSDs, and those peers are reporting the failures to the
monitor (whose connection is apparently working fine). The most likely guess
without more data is that you've got firewall rules set up blocking the
ports the OSDs are using to send their heartbeats... but it could be many
things in your network stack, your CPU scheduler, or elsewhere.
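
One quick way to rule out the firewall guess is to probe the OSD messenger/heartbeat port range (6800-7300 by default) from each storage node toward its peers. A minimal sketch in Python; the hostname and port range here are placeholders you'd replace with your own nodes' values:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def scan_osd_ports(host, first=6800, last=6810):
    """Map each port in the range to whether it accepted a connection.

    Run this from one storage node against another; ports that a
    running ceph-osd has bound on the target should show True unless
    a firewall rule is dropping the traffic.
    """
    return {port: port_open(host, port) for port in range(first, last + 1)}
```

Comparing the result against the ports the OSD daemons have actually bound on the target (e.g. via `netstat -tlnp` there) separates "daemon not listening" from "firewall blocking".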
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com