Thank you very much for the help. 

I'm moving osd_heartbeat_grace to the [global] section and trying to figure out 
what's going on between the OSDs. Since increasing osd_heartbeat_grace in the 
[mon] section of ceph.conf on the monitor I still see failures, but now they 
exceed osd_heartbeat_grace by 2 seconds. It seems that no matter how much I 
increase this value, the OSDs are reported as failing just outside of it.
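
For reference, this is how I'm checking what each daemon is actually running 
with, and (assuming the injectargs syntax below is right for our release) how 
the grace could be bumped at runtime without a restart. The OSD admin socket 
path is assumed to follow the default layout:

  # on the monitor and on an OSD node, respectively
  ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon01.asok config show | grep osd_heartbeat_grace
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep osd_heartbeat_grace

  # push the new value to the running daemons (no restart)
  ceph tell mon.ceph-mon01 injectargs '--osd_heartbeat_grace 35'
  ceph tell osd.* injectargs '--osd_heartbeat_grace 35'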

I've looked at netstat -s on all of the nodes and will go back and look at the 
network stats much more closely.
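
Roughly what I'm planning to look for (the interface name is just a 
placeholder for whatever NICs carry the public/cluster traffic):

  # TCP retransmits, drops, and listen queue overflows
  netstat -s | egrep -i 'retrans|drop|overflow'
  # per-interface errors/drops
  ip -s link show eth0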

Would it help to put the monitor on a 10G link to the storage nodes? Everything 
is set up, but we chose to leave the monitor on a 1G link to the storage nodes.


-----Original Message-----
From: Gregory Farnum [mailto:g...@inktank.com] 
Sent: Monday, August 25, 2014 10:50 AM
To: Bruce McFarland
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] osd_heartbeat_grace set to 30 but osd's still fail 
for grace > 20

Each daemon only reads conf values from its section (or its daemon-type 
section, or the global section). You'll need to either duplicate the "osd 
heartbeat grace" value in the [mon] section or put it in the [global] section 
instead. This is one of the misleading values; sorry about that...
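
Roughly, either of these should do it (using the value from your conf):

[global]
osd_heartbeat_grace = 35

or, duplicated per daemon type:

[mon]
osd_heartbeat_grace = 35

[osd]
osd_heartbeat_grace = 35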

Anyway, as Christian said in your other thread, this isn't your issue — the OSD 
heartbeat failures are your issue. You'll need to sort out whatever's going on 
there.
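
Something like this might help narrow it down (log paths assume the defaults; 
the exact message wording varies a bit by release):

  # on each OSD node: which peers it stopped hearing from, and when
  grep 'heartbeat_check: no reply' /var/log/ceph/ceph-osd.*.log
  # on the monitor: who reported whom as failed
  grep failed /var/log/ceph/ceph.log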
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Mon, Aug 25, 2014 at 10:45 AM, Bruce McFarland 
<bruce.mcfarl...@taec.toshiba.com> wrote:
> That's something that has been puzzling me. The monitor's ceph.conf is set 
> to 35, but its runtime config reports 20. I've restarted it after initial 
> creation to try and get it to reload the ceph.conf settings, but it stays 
> at 20.
>
> [root@ceph-mon01 ceph]# ceph --admin-daemon 
> /var/run/ceph/ceph-mon.ceph-mon01.asok config show | grep osd_heartbeat_grace
>   "osd_heartbeat_grace": "20",
> [root@ceph-mon01 ceph]#
>
> [root@ceph-mon01 ceph]# cat ceph.conf
> [global]
> auth_service_required = cephx
> filestore_xattr_use_omap = true
> auth_client_required = cephx
> auth_cluster_required = cephx
> mon_host = 209.243.160.84
> mon_initial_members = ceph-mon01
> fsid = 94bbb882-42e4-4a6c-bfda-125790616fcc
>
> osd_pool_default_pg_num = 4096
> osd_pool_default_pgp_num = 4096
>
> osd_pool_default_size = 3  # Write an object 3 times - number of replicas.
> osd_pool_default_min_size = 1 # Allow writing one copy in a degraded state.
>
> [mon]
> mon_osd_min_down_reporters = 2
>
> [osd]
> debug_ms = 1
> debug_osd = 20
> public_network = 209.243.160.0/24
> cluster_network = 10.10.50.0/24
> osd_journal_size = 96000
> osd_heartbeat_grace = 35
>
> [osd.0]
> .
> .
> .
> -----Original Message-----
> From: Gregory Farnum [mailto:g...@inktank.com]
> Sent: Monday, August 25, 2014 10:39 AM
> To: Bruce McFarland
> Cc: ceph-us...@ceph.com
> Subject: Re: [ceph-users] osd_heartbeat_grace set to 30 but osd's 
> still fail for grace > 20
>
> On Sat, Aug 23, 2014 at 11:06 PM, Bruce McFarland 
> <bruce.mcfarl...@taec.toshiba.com> wrote:
>> I see OSDs being failed for heartbeat reporting > the default 
>> osd_heartbeat_grace of 20, but the runtime config shows that the 
>> grace is set to 30. Is there another variable for the osd or the mon 
>> I need to set for the non-default osd_heartbeat_grace of 30 to take effect?
>
> You need to also set the osd heartbeat grace on the monitors. If I 
> were to guess, the OSDs are actually seeing each other as slow (after
> 30 seconds) and reporting it in, but the monitors have a grace of 20 seconds 
> set so that's what they're using to generate output.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
