Hi David,

What about memory usage?

[1] 23 OSD nodes: 15x 10TB Seagate Ironwolf filestore with journals on Intel
DC P3700, 70% full cluster, Dual Socket E5-2620 v4 @ 2.10GHz, 128GB RAM.

If you upgrade to bluestore, memory usage will likely increase: 15x 10TB is
roughly 150GB of RAM needed, especially in recovery/backfilling scenarios like
these.
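
As a very rough sketch of that estimate (the ~1GB of RAM per TB of OSD is only
a rule of thumb, and the cache value below is an assumption, not a
recommendation):

  # 15 OSDs x 10 TB x ~1 GB RAM per TB of bluestore OSD
  #   => ~150 GB needed vs. 128 GB installed per node
  # The bluestore cache per OSD can be capped in ceph.conf, e.g.:
  [osd]
  bluestore_cache_size_hdd = 1073741824  # 1 GB; I believe this is the Luminous default for HDD OSDs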

Kind regards,
Caspar


2018-03-15 21:53 GMT+01:00 Dan van der Ster <d...@vanderster.com>:

> Did you use perf top or iotop to try to identify where the osd is stuck?
> Did you try increasing the op thread suicide timeout from 180s?
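>
> A minimal sketch of raising that at runtime across the OSDs (the 600s value
> is only an example, and osd.12 below is a placeholder id):
>
>   ceph tell osd.* injectargs '--osd_op_thread_suicide_timeout 600'
>   # or per daemon, on the OSD host, via the admin socket:
>   ceph daemon osd.12 config set osd_op_thread_suicide_timeout 600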
>
> Splitting should log at the beginning and end of an op, so it should be
> clear if it's taking longer than the timeout.
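>
> A quick way to check for those messages (assuming the default log location):
>
>   grep -iE 'split|merge' /var/log/ceph/ceph-osd.*.log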
>
> .. Dan
>
>
>
> On Mar 15, 2018 9:23 PM, "David Turner" <drakonst...@gmail.com> wrote:
>
> I am aware of the filestore splitting happening.  I manually split all of
> the subfolders a couple of weeks ago on this cluster, but every time we have
> backfilling, the newly moved PGs have a chance to split before the
> backfilling is done.  When that has happened in the past, it has caused some
> blocked requests and flapped OSDs if we didn't increase osd_heartbeat_grace,
> but it has never consistently killed OSDs during the task.  Maybe that's new
> in Luminous due to some of the priority and timeout settings.
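>
> For reference on the thresholds involved (the numbers below are illustrative
> only, not a recommendation): filestore splits a PG subdirectory once it holds
> more than roughly filestore_split_multiple * abs(filestore_merge_threshold) *
> 16 files, so the split point can be pushed out in ceph.conf, e.g.:
>
>   [osd]
>   filestore_merge_threshold = 40
>   filestore_split_multiple = 8
>   # split threshold ~= 8 * 40 * 16 = 5120 files per subdirectory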
>
> This problem in general seems unrelated to the subfolder splitting, though,
> since it started very early in the backfilling process, definitely before
> many of the recently moved PGs would have reached that point.  I've also
> confirmed that the OSDs that are dying are not just stuck (as it can look
> with filestore splitting), but are actually segfaulting and restarting.
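>
> (A couple of ways to tell a crash from a hang, for anyone checking their own
> OSDs; osd 254 is only an example id and the log path is assumed default:)
>
>   grep -A3 'Caught signal' /var/log/ceph/ceph-osd.254.log   # abort and backtrace in the osd log
>   journalctl -u ceph-osd@254 | tail -n 50                   # systemd restarting the unit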
>
> On Thu, Mar 15, 2018 at 4:08 PM Dan van der Ster <d...@vanderster.com>
> wrote:
>
>> Hi,
>>
>> Do you see any split or merge messages in the osd logs?
>> I recall some surprise filestore splitting on a few osds after the
>> luminous upgrade.
>>
>> .. Dan
>>
>>
>> On Mar 15, 2018 6:04 PM, "David Turner" <drakonst...@gmail.com> wrote:
>>
>> I upgraded a [1] cluster from Jewel 10.2.7 to Luminous 12.2.2 and last
>> week I added 2 nodes to the cluster.  The backfilling has been ATROCIOUS.
>> I have OSDs consistently [2] segfaulting during recovery.  There's no
>> pattern of which OSDs are segfaulting, which hosts have segfaulting OSDs,
>> etc... It's all over the cluster.  I have been trying variants on all of the
>> following settings with different levels of success, but I cannot eliminate
>> the blocked requests and segfaulting OSDs: osd_heartbeat_grace,
>> osd_max_backfills, osd_op_thread_suicide_timeout, osd_recovery_max_active,
>> osd_recovery_sleep_hdd, osd_recovery_sleep_hybrid,
>> osd_recovery_thread_timeout, and osd_scrub_during_recovery.  Except for
>> setting nobackfill on the cluster, I can't stop OSDs from segfaulting during
>> recovery.
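>>
>> For concreteness, runtime tweaks of that sort look roughly like this (the
>> values shown are only examples, not what finally helped):
>>
>>   ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
>>   ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0.5 --osd_heartbeat_grace 60'
>>   ceph osd set nobackfill    # the only thing that reliably stops the segfaults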
>>
>> Does anyone have any ideas for this?  I've been struggling with this for
>> over a week now.  For the first couple days I rebalanced the cluster and
>> had this exact same issue prior to adding new storage.  Even setting
>> osd_max_backfills to 1 and recovery_sleep to 1.0, with everything else on
>> defaults, doesn't help.
>>
>> Backfilling caused things to slow down on Jewel, but I wasn't having OSDs
>> segfault multiple times/hour like I am on Luminous.  So many OSDs are going
>> down that I had to set nodown to prevent potential data instability from OSDs
>> on multiple hosts going up and down all the time.  That blocks IO for every
>> OSD that dies either until it comes back up or I manually mark it down.  I
>> hope someone has some ideas for me here.  Our plan moving forward is to
>> only use half of the capacity of the drives by pretending they're 5TB
>> instead of 10TB to increase the spindle speed per TB.  Also migrating to
>> bluestore will hopefully help.
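>>
>> Roughly what that looks like in practice (the osd id and weight below are
>> placeholders; ~4.5 is about half the CRUSH weight of a 10TB drive):
>>
>>   ceph osd set nodown                    # stop the flapping from churning the osdmap
>>   ceph osd down 123                      # manually mark a hung osd down
>>   ceph osd crush reweight osd.123 4.5    # treat a 10TB drive as roughly 5TB of weight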
>>
>>
>> [1] 23 OSD nodes: 15x 10TB Seagate Ironwolf filestore with journals on
>> Intel DC P3700, 70% full cluster, Dual Socket E5-2620 v4 @ 2.10GHz, 128GB
>> RAM.
>>
>> [2]    -19> 2018-03-15 16:42:17.998074 7fe661601700  5 --
>> 10.130.115.25:6811/2942118 >> 10.130.115.48:0/372681 conn(0x55e3ea087000
>> :6811 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=1920 cs=1 l=1).
>> rx osd.254 seq 74507 0x55e3eb8e2e00 osd_ping(ping
>> e93182 stamp 2018-03-15 16:42:17.990698) v4
>>    -18> 2018-03-15 16:42:17.998091 7fe661601700  1 --
>> 10.130.115.25:6811/2942118 <== osd.254 10.130.115.48:0/372681 74507 ====
>> osd_ping(ping e93182 stamp 2018-03-15 16:42:17.990698)
>>  v4 ==== 2004+0+0 (492539280 0 0) 0x55e3eb8e2e00 con 0x55e3ea087000
>>    -17> 2018-03-15 16:42:17.998109 7fe661601700  1 heartbeat_map
>> is_healthy 'OSD::osd_op_tp thread 0x7fe639772700' had timed out after 60
>>    -16> 2018-03-15 16:42:17.998111 7fe661601700  1 heartbeat_map
>> is_healthy 'OSD::osd_op_tp thread 0x7fe639f73700' had timed out after 60
>>    -15> 2018-03-15 16:42:17.998120 7fe661601700  1 heartbeat_map
>> is_healthy 'OSD::osd_op_tp thread 0x7fe63a774700' had timed out after 60
>>    -14> 2018-03-15 16:42:17.998123 7fe661601700  1 heartbeat_map
>> is_healthy 'OSD::osd_op_tp thread 0x7fe63af75700' had timed out after 60
>>    -13> 2018-03-15 16:42:17.998126 7fe661601700  1 heartbeat_map
>> is_healthy 'OSD::osd_op_tp thread 0x7fe63b776700' had timed out after 60
>>    -12> 2018-03-15 16:42:17.998129 7fe661601700  1 heartbeat_map
>> is_healthy 'FileStore::op_tp thread 0x7fe654854700' had timed out after 60
>>    -11> 2018-03-15 16:42:18.004203 7fe661601700  5 --
>> 10.130.115.25:6811/2942118 >> 10.130.115.33:0/3348055
>> conn(0x55e3eb5f0000 :6811 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH
>> pgs=1894 cs=1 l=1). rx osd.169 seq 74633 0x55e3eb8e2e00 osd_ping(ping
>> e93182 stamp 2018-03-15 16:42:17.998828) v4
>>    -10> 2018-03-15 16:42:18.004230 7fe661601700  1 --
>> 10.130.115.25:6811/2942118 <== osd.169 10.130.115.33:0/3348055 74633
>> ==== osd_ping(ping e93182 stamp 2018-03-15 16:42:17.998828
>> ) v4 ==== 2004+0+0 (2306332339 0 0) 0x55e3eb8e2e00 con 0x55e3eb5f0000
>>     -9> 2018-03-15 16:42:18.004241 7fe661601700  1 heartbeat_map
>> is_healthy 'OSD::osd_op_tp thread 0x7fe639772700' had timed out after 60
>>     -8> 2018-03-15 16:42:18.004244 7fe661601700  1 heartbeat_map
>> is_healthy 'OSD::osd_op_tp thread 0x7fe639f73700' had timed out after 60
>>     -7> 2018-03-15 16:42:18.004246 7fe661601700  1 heartbeat_map
>> is_healthy 'OSD::osd_op_tp thread 0x7fe63a774700' had timed out after 60
>>     -6> 2018-03-15 16:42:18.004248 7fe661601700  1 heartbeat_map
>> is_healthy 'OSD::osd_op_tp thread 0x7fe63af75700' had timed out after 60
>>     -5> 2018-03-15 16:42:18.004249 7fe661601700  1 heartbeat_map
>> is_healthy 'OSD::osd_op_tp thread 0x7fe63b776700' had timed out after 60
>>     -4> 2018-03-15 16:42:18.004251 7fe661601700  1 heartbeat_map
>> is_healthy 'FileStore::op_tp thread 0x7fe654854700' had timed out after 60
>>     -3> 2018-03-15 16:42:18.004256 7fe661601700  1 heartbeat_map
>> is_healthy 'FileStore::op_tp thread 0x7fe654854700' had suicide timed out
>> after 180
>>     -2> 2018-03-15 16:42:18.004462 7fe6605ff700  5 --
>> 10.130.113.25:6811/2942118 >> 10.130.113.33:0/3348055
>> conn(0x55e3eb599800 :6811 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH
>> pgs=1937 cs=1 l=1). rx osd.169 seq 74633 0x55e3eef6d200 osd_ping(ping
>> e93182 stamp 2018-03-15 16:42:17.998828) v4
>>     -1> 2018-03-15 16:42:18.004502 7fe6605ff700  1 --
>> 10.130.113.25:6811/2942118 <== osd.169 10.130.113.33:0/3348055 74633
>> ==== osd_ping(ping e93182 stamp 2018-03-15 16:42:17.998828
>> ) v4 ==== 2004+0+0 (2306332339 0 0) 0x55e3eef6d200 con 0x55e3eb599800
>>      0> 2018-03-15 16:42:18.015185 7fe654854700 -1 *** Caught signal
>> (Aborted) **
>>  in thread 7fe654854700 thread_name:tp_fstore_op
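>>
>> (Note that the thread hitting the suicide timeout and aborting above is the
>> FileStore::op_tp / tp_fstore_op thread, so the relevant knob appears to be
>> filestore_op_thread_suicide_timeout, the 180s seen in the log, rather than
>> osd_op_thread_suicide_timeout.  A sketch of raising it, with the value only
>> an example:)
>>
>>   ceph tell osd.* injectargs '--filestore_op_thread_suicide_timeout 600'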
>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
