Thank you very much for the help. I'm moving osd_heartbeat_grace to the [global] section and trying to figure out what's going on between the OSDs. Since increasing osd_heartbeat_grace in the [mon] section of ceph.conf on the monitor, I still see failures, but now they are 2 seconds over osd_heartbeat_grace. It seems that no matter how much I increase this value, the OSDs are reported as failing just outside of it.
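For reference, here's roughly what I'm doing to double-check the value everywhere. This is just a sketch of my setup -- the admin-socket path and daemon names are the ones from this cluster, so adjust for yours:

    # ceph.conf on every node: keep the grace in [global] so both mons and OSDs pick it up
    #   [global]
    #   osd_heartbeat_grace = 35

    # Confirm what the running monitor actually loaded
    ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon01.asok config show | grep osd_heartbeat_grace

    # Push the new value into the running daemons without a restart
    ceph tell mon.* injectargs '--osd_heartbeat_grace 35'
    ceph tell osd.* injectargs '--osd_heartbeat_grace 35'

That way the conf file and the running daemons should agree, and injectargs takes effect immediately while I dig into the heartbeat failures themselves.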
I've looked at netstat -s on all of the nodes and will go back and look at the network stats much more closely. Would it help to put the monitor on a 10G link to the storage nodes? Everything is set up, but we chose to leave the monitor on a 1G link to the storage nodes.

-----Original Message-----
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Monday, August 25, 2014 10:50 AM
To: Bruce McFarland
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osd_heartbeat_grace set to 30 but osd's still fail for grace > 20

Each daemon only reads conf values from its section (or its daemon-type section, or the global section). You'll need to either duplicate the "osd heartbeat grace" value in the [mon] section or put it in the [global] section instead. This is one of the misleading values; sorry about that...

Anyway, as Christian said in your other thread, this isn't your issue — the OSD heartbeat failures are your issue. You'll need to sort out whatever's going on there.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Mon, Aug 25, 2014 at 10:45 AM, Bruce McFarland <bruce.mcfarl...@taec.toshiba.com> wrote:
> That's something that has been puzzling me. The monitor ceph.conf is set to 35, but its runtime config reports 20. I've restarted it after initial creation to try to get it to reload the ceph.conf settings, but it stays at 20.
>
> [root@ceph-mon01 ceph]# ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon01.asok config show | grep osd_heartbeat_grace
>   "osd_heartbeat_grace": "20",
> [root@ceph-mon01 ceph]#
>
> [root@ceph-mon01 ceph]# cat ceph.conf
> [global]
> auth_service_required = cephx
> filestore_xattr_use_omap = true
> auth_client_required = cephx
> auth_cluster_required = cephx
> mon_host = 209.243.160.84
> mon_initial_members = ceph-mon01
> fsid = 94bbb882-42e4-4a6c-bfda-125790616fcc
>
> osd_pool_default_pg_num = 4096
> osd_pool_default_pgp_num = 4096
>
> osd_pool_default_size = 3      # Write an object 3 times - number of replicas.
> osd_pool_default_min_size = 1  # Allow writing one copy in a degraded state.
>
> [mon]
> mon_osd_min_down_reporters = 2
>
> [osd]
> debug_ms = 1
> debug_osd = 20
> public_network = 209.243.160.0/24
> cluster_network = 10.10.50.0/24
> osd_journal_size = 96000
> osd_heartbeat_grace = 35
>
> [osd.0]
> .
> .
> .
> -----Original Message-----
> From: Gregory Farnum [mailto:g...@inktank.com]
> Sent: Monday, August 25, 2014 10:39 AM
> To: Bruce McFarland
> Cc: ceph-us...@ceph.com
> Subject: Re: [ceph-users] osd_heartbeat_grace set to 30 but osd's still fail for grace > 20
>
> On Sat, Aug 23, 2014 at 11:06 PM, Bruce McFarland <bruce.mcfarl...@taec.toshiba.com> wrote:
>> I see OSDs being failed for heartbeat reporting > default osd_heartbeat_grace of 20, but the runtime config shows that the grace is set to 30. Is there another variable for the osd or the mon I need to set for the non-default osd_heartbeat_grace of 30 to take effect?
>
> You need to also set the osd heartbeat grace on the monitors. If I were to guess, the OSDs are actually seeing each other as slow (after 30 seconds) and reporting it in, but the monitors have a grace of 20 seconds set, so that's what they're using to generate output.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com