Re: [ceph-users] CephFS and many small files

2019-04-01 Thread Clausen , Jörn

Hi Paul!

Thanks for your answer. Yep, bluestore_min_alloc_size and your
calculation sound very reasonable to me :)


Am 29.03.2019 um 23:56 schrieb Paul Emmerich:

Are you running on HDDs? The minimum allocation size is 64kb by
default here. You can control that via the parameter
bluestore_min_alloc_size during OSD creation.
64 kb times 8 million files is 512 GB which is the amount of usable
space you reported before running the test, so that seems to add up.


My test cluster is virtualized on vSphere, but the OSDs are reported as 
HDDs. And our production cluster also uses HDDs only. All OSDs use the 
default value for bluestore_min_alloc_size.
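
For reference, a quick way to check what a running OSD currently has
configured (osd.0 is just an example, and the command has to be run on that
OSD's host). Note that the value which actually matters is the one in effect
when the OSD was created, which the running config may no longer reflect if
it has been changed since:

  ceph daemon osd.0 config get bluestore_min_alloc_size_hdd
  ceph daemon osd.0 config get bluestore_min_alloc_size_ssd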


If we should really consider tinkering with bluestore_min_alloc_size: As 
this is probably not tunable afterwards, we would need to replace all 
OSDs in a rolling update. Should we expect any problems while we have 
OSDs with mixed min_alloc_sizes?



There's also some metadata overhead etc. You might want to consider
enabling inline data in cephfs to handle small files in a
store-efficient way (note that this feature is officially marked as
experimental, though).
http://docs.ceph.com/docs/master/cephfs/experimental-features/#inline-data


I'll give it a try on my test cluster.
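
For reference, enabling it on a test filesystem should be roughly a one-liner
of this shape ("cephfs" is a placeholder for the filesystem name, and some
releases also ask for an extra confirmation flag because the feature is
experimental):

  ceph fs set cephfs inline_data true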

--
Jörn Clausen
Daten- und Rechenzentrum
GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel
Düsternbrookerweg 20
24105 Kiel



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck in active+clean+remapped

2019-04-01 Thread Vladimir Prokofev
As we fixed the failed node the next day, the cluster rebalanced to its
original state without any issues, so a crush dump would be irrelevant at
this point, I guess. We will have to wait for the next occurrence.
Here's the tunables part, maybe it will help to shed some light:

"tunables": {
"choose_local_tries": 0,
"choose_local_fallback_tries": 0,
"choose_total_tries": 50,
"chooseleaf_descend_once": 1,
"chooseleaf_vary_r": 1,
"chooseleaf_stable": 0,
"straw_calc_version": 1,
"allowed_bucket_algs": 22,
"profile": "firefly",
"optimal_tunables": 0,
"legacy_tunables": 0,
"minimum_required_version": "firefly",
"require_feature_tunables": 1,
"require_feature_tunables2": 1,
"has_v2_rules": 0,
"require_feature_tunables3": 1,
"has_v3_rules": 0,
"has_v4_buckets": 0,
"require_feature_tunables5": 0,
"has_v5_rules": 0
},
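
For the record, the same tunables (including the profile) can be pulled
without a full crush dump, and the stuck PG's peering state can be inspected
directly:

  ceph osd crush show-tunables
  ceph pg 20.a2 query        # peering/recovery state of the remapped PG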

Sun, 31 Mar 2019 at 13:28, huang jun :

> seems like the crush cannot get enough osds for this pg,
> what the output of 'ceph osd crush dump' and especially the 'tunables'
> section values?
>
> Vladimir Prokofev  wrote on Wed, 27 Mar 2019 at 4:02 AM:
> >
> > CEPH 12.2.11, pool size 3, min_size 2.
> >
> > One node went down today(private network interface started flapping, and
> after a while OSD processes crashed), no big deal, cluster recovered, but
> not completely. 1 PG stuck in active+clean+remapped state.
> >
> > PG_STAT: 20.a2
> > OBJECTS: 511, MISSING_ON_PRIMARY: 0, DEGRADED: 0, MISPLACED: 511, UNFOUND: 0
> > BYTES: 1584410172, LOG: 1500, DISK_LOG: 1500
> > STATE: active+clean+remapped, STATE_STAMP: 2019-03-26 20:50:18.639452
> > VERSION: 96149'189204, REPORTED: 96861:935872
> > UP: [26,14], UP_PRIMARY: 26, ACTING: [26,14,9], ACTING_PRIMARY: 26
> > LAST_SCRUB: 96149'189204, SCRUB_STAMP: 2019-03-26 10:47:36.174769
> > LAST_DEEP_SCRUB: 95989'187669, DEEP_SCRUB_STAMP: 2019-03-22 23:29:02.322848
> > SNAPTRIMQ_LEN: 0
> >
> > It states it's placed on OSDs 26,14 but should be on 26,14,9. As far as I
> can see there's nothing wrong with any of those OSDs, they work, host other
> PGs, peer with each other, etc. I tried restarting all of them one after
> another, but without any success.
> > OSD 9 hosts 95 other PGs, don't think it's PG overdose.
> >
> > Last line of log from osd.9 mentioning PG 20.a2:
> > 2019-03-26 20:50:16.294500 7fe27963a700  1 osd.9 pg_epoch: 96860
> pg[20.a2( v 96149'189204 (95989'187645,96149'189204]
> local-lis/les=96857/96858 n=511 ec=39164/39164 lis/c 96857/96855 les/c/f
> 96858/96856/66611 96859/96860/96855) [26,14]/[26,14,9] r=2 lpr=96860
> pi=[96855,96860)/1 crt=96149'189204 lcod 0'0 remapped NOTIFY mbc={}]
> state: transitioning to Stray
> >
> > Nothing else out of ordinary, just usual scrubs/deep-scrubs
> notifications.
> > Any ideas what it can be, or any other steps to troubleshoot this?
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Thank you!
> HuangJun
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-04-01 Thread mart.v
"















"
Thanks for this advice. It helped me to identify a subset of devices (only 3
of the whole cluster) where was this problem happening. The SAS adapter (LSI
SAS 3008) on my Supermicro board was the issue. There is a RAID mode enabled
by default. I have flashed the latest firmware (v16) and switched to IT mode
(no raid).




Issues with slow requests immediately ceased. I hope it will help someone 
else with the same issue :-)




Best, 

Martin

 
"














I am afraid I was not clear enough. Suppose that ceph health detail reports
a slow request involving osd.14




In osd.14 log I see this line:





2019-02-24 16:58:39.475740 7fe25a84d700  0 log_channel(cluster) log [WRN] :
slow request 30.328572 seconds old, received at 2019-02-24 16:58:09.147037:
osd_op(client.148580771.0:476351313 8.1d6 8:6ba6a916:::rbd_data.ba32e7238e1f
29.04b3:head [set-alloc-hint object_size 4194304 write_size 
4194304,write 3776512~4096] snapc 0=[] ondisk+write+known_if_redirected e
1242718) currently op_applied





Here the pg_num is 8.1d6





# ceph pg map 8.1d6

osdmap e1247126 pg 8.1d6 (8.1d6) -> up [14,38,24] acting [14,38,24]

[root@ceph-osd-02 ceph]# ceph pg map 8.1d6





So the problem is not necessarily is osd.14. It could also in osd.38 or osd.
24, or in the relevant hosts

 
"
"





 "
"







"___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] co-located cephfs client deadlock

2019-04-01 Thread Dan van der Ster
Hi all,

We have been benchmarking a hyperconverged cephfs cluster (kernel
clients + osd on same machines) for awhile. Over the weekend (for the
first time) we had one cephfs mount deadlock while some clients were
running ior.

All the ior processes are stuck in D state with this stack:

[] wait_on_page_bit+0x83/0xa0
[] __filemap_fdatawait_range+0x111/0x190
[] filemap_fdatawait_range+0x14/0x30
[] filemap_write_and_wait_range+0x56/0x90
[] ceph_fsync+0x55/0x420 [ceph]
[] do_fsync+0x67/0xb0
[] SyS_fsync+0x10/0x20
[] system_call_fastpath+0x22/0x27
[] 0x

We tried restarting the co-located OSDs, and tried evicting the
client, but the processes stay deadlocked.

We've seen the recent issue related to co-location
(https://bugzilla.redhat.com/show_bug.cgi?id=1665248) but we don't
have the `usercopy` warning in dmesg.

Are there other known issues related to co-locating?

Thanks!
Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS and many small files

2019-04-01 Thread Paul Emmerich
There are no problems with mixed bluestore_min_alloc_size; that's an
abstraction layer lower than the concept of multiple OSDs. (Also, you
always have that when mixing SSDs and HDDs)

I'm not sure about the real-world impacts of a lower min alloc size or
the rationale behind the default values for HDDs (64 KB) and SSDs (16 KB).
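
If someone does want to experiment, the value has to be in place before the
OSD is created; a minimal sketch, assuming ceph-volume and a throwaway disk
(/dev/sdx is a placeholder):

  # in ceph.conf on the OSD host, before creating the OSD:
  [osd]
  bluestore_min_alloc_size_hdd = 4096

  # then deploy the OSD so it is mkfs'd with the new value:
  ceph-volume lvm create --bluestore --data /dev/sdx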

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Apr 1, 2019 at 10:36 AM Clausen, Jörn  wrote:
>
> Hi Paul!
>
> Thanks for your answer. Yep, bluestore_min_alloc_size and your
> calculation sounds very reasonable to me :)
>
> Am 29.03.2019 um 23:56 schrieb Paul Emmerich:
> > Are you running on HDDs? The minimum allocation size is 64kb by
> > default here. You can control that via the parameter
> > bluestore_min_alloc_size during OSD creation.
> > 64 kb times 8 million files is 512 GB which is the amount of usable
> > space you reported before running the test, so that seems to add up.
>
> My test cluster is virtualized on vSphere, but the OSDs are reported as
> HDDs. And our production cluster also uses HDDs only. All OSDs use the
> default value for bluestore_min_alloc_size.
>
> If we should really consider tinkering with bluestore_min_alloc_size: As
> this is probably not tunable afterwards, we would need to replace all
> OSDs in a rolling update. Should we expect any problems while we have
> OSDs with mixed min_alloc_sizes?
>
> > There's also some metadata overhead etc. You might want to consider
> > enabling inline data in cephfs to handle small files in a
> > store-efficient way (note that this feature is officially marked as
> > experimental, though).
> > http://docs.ceph.com/docs/master/cephfs/experimental-features/#inline-data
>
> I'll give it a try on my test cluster.
>
> --
> Jörn Clausen
> Daten- und Rechenzentrum
> GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel
> Düsternbrookerweg 20
> 24105 Kiel
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] co-located cephfs client deadlock

2019-04-01 Thread Paul Emmerich
Which kernel version are you using? We've had lots of problems with
random deadlocks in kernels with cephfs but 4.19 seems to be pretty
stable.


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Apr 1, 2019 at 12:45 PM Dan van der Ster  wrote:
>
> Hi all,
>
> We have been benchmarking a hyperconverged cephfs cluster (kernel
> clients + osd on same machines) for awhile. Over the weekend (for the
> first time) we had one cephfs mount deadlock while some clients were
> running ior.
>
> All the ior processes are stuck in D state with this stack:
>
> [] wait_on_page_bit+0x83/0xa0
> [] __filemap_fdatawait_range+0x111/0x190
> [] filemap_fdatawait_range+0x14/0x30
> [] filemap_write_and_wait_range+0x56/0x90
> [] ceph_fsync+0x55/0x420 [ceph]
> [] do_fsync+0x67/0xb0
> [] SyS_fsync+0x10/0x20
> [] system_call_fastpath+0x22/0x27
> [] 0x
>
> We tried restarting the co-located OSDs, and tried evicting the
> client, but the processes stay deadlocked.
>
> We've seen the recent issue related to co-location
> (https://bugzilla.redhat.com/show_bug.cgi?id=1665248) but we don't
> have the `usercopy` warning in dmesg.
>
> Are there other known issues related to co-locating?
>
> Thanks!
> Dan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] co-located cephfs client deadlock

2019-04-01 Thread Dan van der Ster
It's the latest CentOS 7.6 kernel. Known pain there?

The user was running a 1.95 TiB ior benchmark -- so, trying to do
parallel writes to one single 1.95 TiB file.
We have
  max_file_size 2199023255552  (exactly 2 TiB)
so it should fit.

Thanks!
Dan


On Mon, Apr 1, 2019 at 1:06 PM Paul Emmerich  wrote:
>
> Which kernel version are you using? We've had lots of problems with
> random deadlocks in kernels with cephfs but 4.19 seems to be pretty
> stable.
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Mon, Apr 1, 2019 at 12:45 PM Dan van der Ster  wrote:
> >
> > Hi all,
> >
> > We have been benchmarking a hyperconverged cephfs cluster (kernel
> > clients + osd on same machines) for awhile. Over the weekend (for the
> > first time) we had one cephfs mount deadlock while some clients were
> > running ior.
> >
> > All the ior processes are stuck in D state with this stack:
> >
> > [] wait_on_page_bit+0x83/0xa0
> > [] __filemap_fdatawait_range+0x111/0x190
> > [] filemap_fdatawait_range+0x14/0x30
> > [] filemap_write_and_wait_range+0x56/0x90
> > [] ceph_fsync+0x55/0x420 [ceph]
> > [] do_fsync+0x67/0xb0
> > [] SyS_fsync+0x10/0x20
> > [] system_call_fastpath+0x22/0x27
> > [] 0x
> >
> > We tried restarting the co-located OSDs, and tried evicting the
> > client, but the processes stay deadlocked.
> >
> > We've seen the recent issue related to co-location
> > (https://bugzilla.redhat.com/show_bug.cgi?id=1665248) but we don't
> > have the `usercopy` warning in dmesg.
> >
> > Are there other known issues related to co-locating?
> >
> > Thanks!
> > Dan
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Samsung 983 NVMe M.2 - experiences?

2019-04-01 Thread Martin Overgaard Hansen
Hi Fabian,

We've just started building a cluster using the PM983 for the bucket index. Let 
me know if you want us to perform any test on them. 
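
If it helps, the first test people on this list usually ask for is a
single-job, queue-depth-1 sync write run, roughly like this (warning: this
writes directly to the device and destroys its data; /dev/nvme0n1 is a
placeholder):

  fio --name=journal-test --filename=/dev/nvme0n1 --direct=1 --sync=1 \
      --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based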

Thanks,
Martin

> -Original Message-
> From: ceph-users  On Behalf Of
> Fabian Figueredo
> Sent: 30 March 2019 07:55
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Samsung 983 NVMe M.2 - experiences?
> 
> Hello,
> I'm in the process of building a new ceph cluster, this time around I was
> considering going with NVMe SSD drives.
> In searching for something along the lines of 1TB per SSD drive, I found "Samsung
> 983 DCT 960GB NVMe M.2 Enterprise SSD for Business".
> 
> More info:
> https://www.samsung.com/us/business/products/computing/ssd/enterpris
> e/983-dct-960gb-mz-1lb960ne/
> 
> The idea is buy 10 units.
> 
> Anyone have any thoughts/experiences with these drives?
> 
> Thanks,
> Fabian

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] op_w_latency

2019-04-01 Thread Glen Baars
Hello Ceph Users,

I am finding that the write latency across my ceph clusters isn't great and I 
wanted to see what other people are getting for op_w_latency. Generally I am 
getting 70-110ms latency.

I am using: ceph --admin-daemon /var/run/ceph/ceph-osd.102.asok perf dump | 
grep -A3 '\"op_w_latency' | grep 'avgtime'

RAM, CPU and network don't seem to be the bottleneck. The drives are behind a
Dell H810p RAID card with a 1GB writeback cache and battery. I have tried with
LSI JBOD cards and haven't found it faster (as you would expect with a write
cache). The disks, through iostat -xyz 1, show 10-30% usage with general service
+ write latency around 3-4 ms. Queue depth is normally less than one. RocksDB
write latency is around 0.6 ms, read 1-2 ms. Usage is an RBD backend for CloudStack.

Dumping the ops seems to show the latency here: (ceph --admin-daemon 
/var/run/ceph/ceph-osd.102.asok dump_historic_ops_by_duration  |less)

{
"time": "2019-04-01 22:24:38.432000",
"event": "queued_for_pg"
},
{
"time": "2019-04-01 22:24:38.438691",
"event": "reached_pg"
},
{
"time": "2019-04-01 22:24:38.438740",
"event": "started"
},
{
"time": "2019-04-01 22:24:38.727820",
"event": "sub_op_started"
},
{
"time": "2019-04-01 22:24:38.728448",
"event": "sub_op_committed"
},
{
"time": "2019-04-01 22:24:39.129175",
"event": "commit_sent"
},
{
"time": "2019-04-01 22:24:39.129231",
"event": "done"
}
]
}
}

This write was a particularly slow one, and I am wondering if I have a few ops
that are taking a long time while most are fine.

What else can I do to figure out where the issue is?
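
One cheap next step is to compare that same counter across every OSD on a
host, to see whether the latency is uniform or dominated by a few disks; a
small sketch, assuming the default admin socket paths:

  for sock in /var/run/ceph/ceph-osd.*.asok; do
      echo -n "$sock: "
      ceph --admin-daemon "$sock" perf dump | grep -A3 '"op_w_latency' | grep avgtime
  done
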
This e-mail is intended solely for the benefit of the addressee(s) and any 
other named recipient. It is confidential and may contain legally privileged or 
confidential information. If you are not the recipient, any use, distribution, 
disclosure or copying of this e-mail is prohibited. The confidentiality and 
legal privilege attached to this communication is not waived or lost by reason 
of the mistaken transmission or delivery to you. If you have received this 
e-mail in error, please notify us immediately.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] co-located cephfs client deadlock

2019-04-01 Thread Yan, Zheng
On Mon, Apr 1, 2019 at 6:45 PM Dan van der Ster  wrote:
>
> Hi all,
>
> We have been benchmarking a hyperconverged cephfs cluster (kernel
> clients + osd on same machines) for awhile. Over the weekend (for the
> first time) we had one cephfs mount deadlock while some clients were
> running ior.
>
> All the ior processes are stuck in D state with this stack:
>
> [] wait_on_page_bit+0x83/0xa0
> [] __filemap_fdatawait_range+0x111/0x190
> [] filemap_fdatawait_range+0x14/0x30
> [] filemap_write_and_wait_range+0x56/0x90
> [] ceph_fsync+0x55/0x420 [ceph]
> [] do_fsync+0x67/0xb0
> [] SyS_fsync+0x10/0x20
> [] system_call_fastpath+0x22/0x27
> [] 0x
>

Are there hung osd requests in /sys/kernel/debug/ceph/xxx/osdc?
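
For anyone following along, those in-flight request lists live in debugfs on
the client node, for example:

  cat /sys/kernel/debug/ceph/*/osdc   # in-flight OSD requests
  cat /sys/kernel/debug/ceph/*/mdsc   # in-flight MDS requests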

> We tried restarting the co-located OSDs, and tried evicting the
> client, but the processes stay deadlocked.
>
> We've seen the recent issue related to co-location
> (https://bugzilla.redhat.com/show_bug.cgi?id=1665248) but we don't
> have the `usercopy` warning in dmesg.
>
> Are there other known issues related to co-locating?
>
> Thanks!
> Dan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS and many small files

2019-04-01 Thread Sergey Malinin
I haven't had any issues with 4k allocation size in a cluster holding 189M files.

April 1, 2019 2:04 PM, "Paul Emmerich"  wrote:

> I'm not sure about the real-world impacts of a lower min alloc size or
> the rationale behind the default values for HDDs (64) and SSDs (16kb).
> 
> Paul
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph nautilus upgrade problem

2019-04-01 Thread Mark Schouten

Hi,

Please let us know how this ended for you!
--
Mark Schouten 
Tuxis, Ede, https://www.tuxis.nl
T: +31 318 200208 
 
- Original message -
From: Stadsnet (jwil...@stads.net)
Date: 26-03-2019 16:42
To: Ashley Merrick (singap...@amerrick.co.uk)
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph nautilus upgrade problem

On 26-3-2019 16:39, Ashley Merrick wrote:
Have you upgraded any OSDs?


No, I didn't go through with the OSDs



On a test cluster I saw the same, and as I upgraded / restarted the OSDs the
PGs started to show online until it was 100%.

I know it says not to change anything to do with pools during the upgrade, so I
am guessing there is a code change that causes this until everything is on the
same version.

I will continue


On Tue, Mar 26, 2019 at 11:37 PM Stadsnet  wrote:

We did an upgrade from luminous to nautilus.

After upgrading the three monitors, we saw that all our PGs were inactive.

cluster:
id: 5bafad08-31b2-4716-be77-07ad2e2647eb
health: HEALTH_ERR
noout flag(s) set
1 scrub errors
Reduced data availability: 1429 pgs inactive
316 pgs not deep-scrubbed in time
520 pgs not scrubbed in time
3 monitors have not enabled msgr2

services:
mon: 3 daemons, quorum Ceph-Mon1,Ceph-Mon2,Ceph-Mon3 (age 51m)
mgr: Ceph-Mon1(active, since 23m), standbys: Ceph-Mon3, Ceph-Mon2
osd: 103 osds: 103 up, 103 in
flags noout
rgw: 2 daemons active (S3-Ceph1, S3-Ceph2)

data:
pools: 26 pools, 3248 pgs
objects: 134.92M objects, 202 TiB
usage: 392 TiB used, 486 TiB / 879 TiB avail
pgs: 100.000% pgs unknown
3248 unknown

System seems to keep working.

Did we lose the reference "-1 0 root default"?

Is there a fix for that?
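
As an aside, the "3 monitors have not enabled msgr2" warning in that output
is addressed separately, per the Nautilus upgrade notes, with:

  ceph mon enable-msgr2

That on its own is not expected to clear the unknown PGs, which, as noted
above, resolved as the OSDs were upgraded and restarted.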

ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-18 16.0 root ssd
-10 2.0 host Ceph-Stor1-SSD
80 nvme 2.0 osd.80 up 1.0 1.0
-11 2.0 host Ceph-Stor2-SSD
81 nvme 2.0 osd.81 up 1.0 1.0
-12 2.0 host Ceph-Stor3-SSD
82 nvme 2.0 osd.82 up 1.0 1.0
-13 2.0 host Ceph-Stor4-SSD
83 nvme 2.0 osd.83 up 1.0 1.0
-14 2.0 host Ceph-Stor5-SSD
84 nvme 2.0 osd.84 up 1.0 1.0
-15 2.0 host Ceph-Stor6-SSD
85 nvme 2.0 osd.85 up 1.0 1.0
-16 2.0 host Ceph-Stor7-SSD
86 nvme 2.0 osd.86 up 1.0 1.0
-17 2.0 host Ceph-Stor8-SSD
87 nvme 2.0 osd.87 up 1.0 1.0
-1 865.93420 root default
-2 110.96700 host Ceph-Stor1
0 hdd 9.09599 osd.0 up 1.0 1.0
1 hdd 9.09599 osd.1 up 1.0 1.0
2 hdd 9.09599 osd.2 up 1.0 1.0
3 hdd 9.09599 osd.3 up 1.0 1.0
4 hdd 9.09599 osd.4 up 1.0 1.0
5 hdd 9.09599 osd.5 up 1.0 1.0
6 hdd 9.09599 osd.6 up 1.0 1.0
7 hdd 9.09599 osd.7 up 1.0 1.0
8 hdd 9.09599 osd.8 up 1.0 1.0
9 hdd 9.09599 osd.9 up 1.0 1.0
88 hdd 9.09599 osd.88 up 1.0 1.0
89 hdd 9.09599 osd.89 up 1.0 1.0
-3 109.15189 host Ceph-Stor2
10 hdd 9.09599 osd.10 up 1.0 1.0
11 hdd 9.09599 osd.11 up 1.0 1.0
12 hdd 9.09599 osd.12 up 1.0 1.0
13 hdd 9.09599 osd.13 up 1.0 1.0
14 hdd 9.09599 osd.14 up 1.0 1.0
15 hdd 9.09599 osd.15 up 1.0 1.0
16 hdd 9.09599 osd.16 up 1.0 1.0
17 hdd 9.09599 osd.17 up 1.0 1.0
18 hdd 9.09599 osd.18 up 1.0 1.0
19 hdd 9.09599 osd.19 up 1.0 1.0
90 hdd 9.09598 osd.90 up 1.0 1.0
91 hdd 9.09598 osd.91 up 1.0 1.0
-4 109.15189 host Ceph-Stor3
20 hdd 9.09599 osd.20 up 1.0 1.0
21 hdd 9.09599 osd.21 up 1.0 1.0
22 hdd 9.09599 osd.22 up 1.0 1.0
23 hdd 9.09599 osd.23 up 1.0 1.0
24 hdd 9.09599 osd.24 up 1.0 1.0
25 hdd 9.09599 osd.25 up 1.0 1.0
26 hdd 9.09599 osd.26 up 1.0 1.0
27 hdd 9.09599 osd.27 up 1.0 1.0
28 hdd 9.09599 osd.28 up 1.0 1.0
29 hdd 9.09599 osd.29 up 1.0 1.0
92 hdd 9.09598 osd.92 up 1.0 1.0
93 hdd 9.09598 osd.93 up 0.80002 1.0
-5 109.15189 host Ceph-Stor4
30 hdd 9.09599 osd.30 up 1.0 1.0
31 hdd 9.09599 osd.31 up 1.0 1.0
32 hdd 9.09599 osd.32 up 1.0 1.0
33 hdd 9.09599 osd.33 up 1.0 1.0
34 hdd 9.09599 osd.34 up 0.90002 1.0
35 hdd 9.09599 osd.35 up 1.0 1.0
36 hdd 9.09599 osd.36 up 1.0 1.0
37 hdd 9.09599 osd.37 up 1.0 1.0
38 hdd 9.09599 osd.38 up 1.0 1.0
39 hdd 9.09599 osd.39 up 1.0 1.0
94 hdd 9.09598 osd.94 up 1.0 1.0
95 hdd 9.09598 osd.95 up 1.0 1.0
-6 109.15189 host Ceph-Stor5
40 hdd 9.09599 osd.40 up 1.0 1.0
41 hdd 9.09599 osd.41 up 1.0 1.0
42 hdd 9.09599 osd.42 up 1.0 1.0
43 hdd 9.09599 osd.43 up 1.0 1.0
44 hdd 9.09599 osd.44 up 1.0 1.0
45 hdd 9.09599 osd.45 up 1.0 1.0
46 hdd 9.09599 osd.46 up 1.0 1.0
47 hdd 9.09599 osd.47 up 1.0 1.0
48 hdd 9.09599 osd.48 up 1.0 1.0
49 hdd 9.09599 osd.49 up 1.0 1.0
96 hdd 9.09598 osd.96 up 1.0 1.0
97 hdd 9.09598 osd.97 up 1.0 1.0
-7 109.15187 host Ceph-Stor6
50 hdd 9.09599 osd.5

[ceph-users] MDS allocates all memory (>500G) replaying, OOM-killed, repeat

2019-04-01 Thread Pickett, Neale T
Hello


We are experiencing an issue where our ceph MDS gobbles up 500G of RAM, is 
killed by the kernel, dies, then repeats. We have 3 MDS daemons on different 
machines, and all are exhibiting this behavior. We are running the following 
versions (from Docker):


  *   ceph/daemon:v3.2.1-stable-3.2-luminous-centos-7
  *   ceph/daemon:v3.2.1-stable-3.2-luminous-centos-7
  *   ceph/daemon:v3.1.0-stable-3.1-luminous-centos-7 (downgraded in last-ditch 
effort to resolve, didn't help)

The machines hosting the MDS instances have 512G RAM. We tried adding swap, and 
the MDS just started eating into the swap (and got really slow, eventually 
being kicked out for exceeding the mds_beacon_grace of 240). 
mds_cache_memory_limit has been set to many values ranging from 200G down to the
default of 1073741824 (1 GiB), and the result of replay is always the same: it keeps
allocating memory until the kernel OOM killer stops it (or the mds_beacon_grace
period expires,
if swap is enabled).

Before it died, the active MDS reported 1.592 million inodes to Prometheus 
(ceph_mds_inodes) and 1.493 million caps (ceph_mds_caps).

This appears to be the same problem as 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030872.html

At this point I feel like my best option is to try to destroy the journal and 
hope things come back, but while we can probably recover from this, I'd like to 
prevent it happening in the future. Any advice?
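
If it does come to touching the journal, the usual hedge is to export and
inspect it first so the destructive steps stay reversible (cluster name and
backup path here are examples):

  cephfs-journal-tool --cluster=mycluster journal export /path/to/backup.bin
  cephfs-journal-tool --cluster=mycluster journal inspect
  cephfs-journal-tool --cluster=mycluster event get summary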


Neale Pickett 
A-4: Advanced Research in Cyber Systems
Los Alamos National Laboratory
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-iscsi: (Config.lock) Timed out (30s) waiting for excl lock on gateway.conf object

2019-04-01 Thread Jason Dillaman
What happens when you run "rados -p rbd lock list gateway.conf"?
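
For reference, the related lock subcommands look roughly like this (the lock
and locker names below are placeholders; check "rados lock" help on your
version for the exact argument order before breaking anything):

  rados -p rbd lock list gateway.conf
  rados -p rbd lock info gateway.conf <lock-name>
  rados -p rbd lock break gateway.conf <lock-name> <locker-name>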

On Fri, Mar 29, 2019 at 12:19 PM Matthias Leopold
 wrote:
>
> Hi,
>
> I upgraded my test Ceph iSCSI gateways to
> ceph-iscsi-3.0-6.g433bbaa.el7.noarch.
> I'm trying to use the new parameter "cluster_client_name", which - to me
> - sounds like I don't have to access the ceph cluster as "client.admin"
> anymore. I created a "client.iscsi" user and watched what happened. The
> gateways can obviously read the config (which I created when I was still
> client.admin), but when I try to change anything (like create a new disk
> in pool "iscsi") I get the following error:
>
> (Config.lock) Timed out (30s) waiting for excl lock on gateway.conf object
>
> I suspect this is related to the privileges of "client.iscsi", but I
> couldn't find the correct settings yet. The last thing I tried was:
>
> caps: [mon] allow r, allow command "osd blacklist"
> caps: [osd] allow * pool=rbd, profile rbd pool=iscsi
>
> Can anybody tell me how to solve this?
> My Ceph version is 12.2.10 on CentOS 7.
>
> thx
> Matthias
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS allocates all memory (>500G) replaying, OOM-killed, repeat

2019-04-01 Thread Pickett, Neale T
We decided to go ahead and try truncating the journal, but before we did, we
wanted to back it up. However, there are ridiculous values in the header. It
can't write a journal backup this large because (I presume) my ext4 filesystem
can't seek to this position in the (sparse) file.


I would not be surprised to learn that memory allocation is trying to do 
something similar, hence the allocation of all available memory. This seems 
like a new kind of journal corruption that isn't being reported correctly.

[root@lima /]# time cephfs-journal-tool --cluster=prodstore journal export 
backup.bin
journal is 24652730602129~673601102
2019-04-01 17:49:52.776977 7fdcb999e040 -1 Error 22 ((22) Invalid argument) 
seeking to 0x166be9401291
Error ((22) Invalid argument)

real    0m27.832s
user    0m2.028s
sys 0m3.438s
[root@lima /]# cephfs-journal-tool --cluster=prodstore event get summary
Events by type:
  EXPORT: 187
  IMPORTFINISH: 182
  IMPORTSTART: 182
  OPEN: 3133
  SUBTREEMAP: 129
  UPDATE: 42185
Errors: 0
[root@lima /]# cephfs-journal-tool --cluster=prodstore header get
{
"magic": "ceph fs volume v011",
"write_pos": 24653404029749,
"expire_pos": 24652730602129,
"trimmed_pos": 24652730597376,
"stream_format": 1,
"layout": {
"stripe_unit": 4194304,
"stripe_count": 1,
"object_size": 4194304,
"pool_id": 2,
"pool_ns": ""
}
}

[root@lima /]# printf "%x\n" "24653404029749"
166c1163c335
[root@lima /]# printf "%x\n" "24652730602129"
166be9401291

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS allocates all memory (>500G) replaying, OOM-killed, repeat

2019-04-01 Thread Pickett, Neale T
Since my problem is going to be archived on the Internet I'll keep following 
up, so the next person with this problem might save some time.


The seek was because ext4 can't seek to 23TB, but changing to an xfs mount to 
create this file resulted in success.


Here is what I wound up doing to fix this:


  *   Bring down all MDSes so they stop flapping
  *   Back up journal (as seen in previous message)
  *   Apply journal manually
  *   Reset journal manually
  *   Clear session table
  *   Clear other tables (not sure I needed to do this)
  *   Mark FS down
  *   Mark the rank 0 MDS as failed
  *   Reset the FS (yes, I really mean it)
  *   Restart MDSes
  *   Finally get some sleep

If anybody has any idea what may have caused this situation, I am keenly 
interested. If not, hopefully I at least helped someone else.
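
For the archive, those steps map onto roughly the following commands (a
sketch based on the disaster-recovery docs; "cephfs" stands in for the real
filesystem name, and on Luminous "mark FS down" is the cluster_down flag):

  cephfs-journal-tool journal export backup.bin
  cephfs-journal-tool event recover_dentries summary
  cephfs-journal-tool journal reset
  cephfs-table-tool all reset session
  cephfs-table-tool all reset snap
  cephfs-table-tool all reset inode
  ceph fs set cephfs cluster_down true
  ceph mds fail 0
  ceph fs reset cephfs --yes-i-really-mean-it
  # then restart the MDS daemons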



From: Pickett, Neale T
Sent: Monday, April 1, 2019 12:31
To: ceph-users@lists.ceph.com
Subject: Re: MDS allocates all memory (>500G) replaying, OOM-killed, repeat


We decided to go ahead and try truncating the journal, but before we did, we 
would try to back it up. However, there are ridiculous values in the header. It 
can't write a journal this large because (I presume) my ext4 filesystem can't 
seek to this position in the (sparse) file.


I would not be surprised to learn that memory allocation is trying to do 
something similar, hence the allocation of all available memory. This seems 
like a new kind of journal corruption that isn't being reported correctly.

[root@lima /]# time cephfs-journal-tool --cluster=prodstore journal export 
backup.bin
journal is 24652730602129~673601102
2019-04-01 17:49:52.776977 7fdcb999e040 -1 Error 22 ((22) Invalid argument) 
seeking to 0x166be9401291
Error ((22) Invalid argument)

real    0m27.832s
user    0m2.028s
sys 0m3.438s
[root@lima /]# cephfs-journal-tool --cluster=prodstore event get summary
Events by type:
  EXPORT: 187
  IMPORTFINISH: 182
  IMPORTSTART: 182
  OPEN: 3133
  SUBTREEMAP: 129
  UPDATE: 42185
Errors: 0
[root@lima /]# cephfs-journal-tool --cluster=prodstore header get
{
"magic": "ceph fs volume v011",
"write_pos": 24653404029749,
"expire_pos": 24652730602129,
"trimmed_pos": 24652730597376,
"stream_format": 1,
"layout": {
"stripe_unit": 4194304,
"stripe_count": 1,
"object_size": 4194304,
"pool_id": 2,
"pool_ns": ""
}
}

[root@lima /]# printf "%x\n" "24653404029749"
166c1163c335
[root@lima /]# printf "%x\n" "24652730602129"
166be9401291

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS allocates all memory (>500G) replaying, OOM-killed, repeat

2019-04-01 Thread Sergey Malinin
These steps pretty well correspond to
http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/
Were you able to replay the journal manually with no issues? IIRC,
"cephfs-journal-tool recover_dentries" would lead to OOM in the same way the
MDS does, and it has already been discussed on this list.
April 2, 2019 1:37 AM, "Pickett, Neale T"  wrote:
Here is what I wound up doing to fix this: 
* Bring down all MDSes so they stop flapping 
* Back up journal (as seen in previous message) 
* Apply journal manually 
* Reset journal manually 
* Clear session table 
* Clear other tables (not sure I needed to do this) 
* Mark FS down 
* Mark the rank 0 MDS as failed 
* Reset the FS (yes, I really mean it) 
* Restart MDSes 
* Finally get some sleep
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unable to list rbd block > images in nautilus dashboard

2019-04-01 Thread Wes Cilldhaire
Hi all,

I've been having an issue with the dashboard being unable to list block images. 
 In the mimic and luminous dashboards it would take a very long time to load, 
eventually telling me it was showing a cached list, and after a few auto 
refreshes it would finally show all rbd images and their properties.  In the 
nautilus dashboard, however, it just times out and never tries again, displaying 
'Could not load data. Please check the cluster health' - the cluster, however, 
reports healthy.  Using the CLI to retrieve information works, however it can 
be slow to calculate du for every image.

I know this isn't a problem with the dashboard itself but instead whatever 
mechanism its using under the hood to retrieve the information regarding the 
block images.  I'm not sure what component needs to be diagnosed here though.  
The cluster itself is performant, VMs running from the rbd images are 
performant.  I do have multiple rbd pools though, one on EC hdds for slow/large 
storage and another on replicated ssd for fast storage.  Is this an issue with 
having multiple rbd pools?  Or is this an issue with mon health?  I should 
mention that this is a relatively small cluster of just a couple nodes, single 
mon, single mds, 2 rgw, 13 osd - it's basically a lab and home storage.
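
On the CLI side, the slow du is usually down to images without the
object-map/fast-diff features; where the image features allow it
(exclusive-lock is required), something like this makes "rbd du" close to
instant (pool and image names are placeholders):

  rbd du --pool rbd_repl_ssd
  rbd feature enable rbd_repl_ssd/myimage object-map fast-diff
  rbd object-map rebuild rbd_repl_ssd/myimage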

Thanks,
Wes Cilldhaire
Sol1
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to list rbd block > images in nautilus dashboard

2019-04-01 Thread Wes Cilldhaire
Sorry, slight correction: the nautilus dashboard has finally listed the images, it 
just took even longer still.  It's also reporting the same "Warning: Displaying 
previously cached data for pools rbd, rbd_repl_ssd." message as before and is 
clearly struggling.

Thanks,
Wes Cilldhaire
Sol1


- Original Message -
From: "Wes Cilldhaire" 
To: "ceph-users" 
Sent: Tuesday, 2 April, 2019 11:38:30 AM
Subject: [ceph-users] Unable to list rbd block > images in nautilus dashboard

Hi all, 

I've been having an issue with the dashboard being unable to list block images. 
In the mimic and luminous dashboards it would take a very long time to load, 
eventually telling me it was showing a cached list, and after a few auto 
refreshes it would finally show all rbd images and their properties. In the 
nautilus dashboard however it just times out and never tries again, display 
'Could not load data. Please check the cluster health' - cluster however 
reports healthy. Using the cli to retrieve information works, however it can be 
slow to calculate du for every image. 

I know this isn't a problem with the dashboard itself but instead whatever 
mechanism its using under the hood to retrieve the information regarding the 
block images. I'm not sure what component needs to be diagnosed here though. 
The cluster itself is performant, VMs running from the rbd images are 
performant. I do have multiple rbd pools though, one on EC hdds for slow/large 
storage and another on replicated ssd for fast storage. Is this an issue with 
having multiple rbd pools? Or is this an issue with mon health? I should 
mention that this is a relatively small cluster of just a couple nodes, single 
mon, single mds, 2 rgw, 13 osd - it's basically a lab and home storage. 

Thanks, 
Wes Cilldhaire 
Sol1 
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Update crushmap when monitors are down

2019-04-01 Thread Pardhiv Karri
Hi,

Our Ceph production cluster went down while updating the crush map. Now we can't
get our monitors to come online, and when they come online for a fraction of
a second we see crush map errors in the logs. How can we update the crush map
when the monitors are down, as none of the ceph commands are working?
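
With all mons down, the crush map can at least be inspected offline from a
stopped monitor's store; a hedged sketch (mon id and paths are examples, the
tool ships in the ceph-test packages on some distros, and it is safest to work
on a copy of the store):

  systemctl stop ceph-mon@sh1ora1301
  ceph-monstore-tool /var/lib/ceph/mon/ceph-sh1ora1301 get osdmap -- --out /tmp/osdmap
  osdmaptool /tmp/osdmap --export-crush /tmp/crush.bin
  crushtool -d /tmp/crush.bin -o /tmp/crush.txt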

Thanks,
Pardhiv Karri
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Update crushmap when monitors are down

2019-04-01 Thread huang jun
Can you provide detailed error logs from when the mon crashes?

Pardhiv Karri  wrote on Tue, 2 Apr 2019 at 9:02 AM:
>
> Hi,
>
> Our ceph production cluster is down when updating crushmap. Now we can't get 
> out monitors to come online and when they come online for a fraction of a 
> second we see crush map errors in logs. How can we update crushmap when 
> monitors are down as none of the ceph commands are working.
>
> Thanks,
> Pardhiv Karri
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Thank you!
HuangJun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Update crushmap when monitors are down

2019-04-01 Thread Pardhiv Karri
Hi Huang,

We are on ceph Luminous 12.2.11

The primary is sh1ora1300, but that is not coming up at all. sh1ora1301 and
sh1ora1302 are coming up and are in quorum as per the log, but we are still
not able to run any ceph commands. Below is part of the log.

2019-04-02 00:48:06.751139 mon.sh1ora1302 mon.2 10.15.29.21:6789/0 104 :
cluster [INF] mon.sh1ora1302 calling monitor election
2019-04-02 00:48:51.644339 mon.sh1ora1302 mon.2 10.15.29.21:6789/0 105 :
cluster [INF] mon.sh1ora1302 calling monitor election
2019-04-02 00:51:41.706135 mon.sh1ora1301 mon.1 10.15.29.15:6789/0 292 :
cluster [WRN] overall HEALTH_WARN crush map has legacy tunables (require
bobtail, min is firefly); 399 osds down; 14 hosts (17 osds) down;
785718/146017356 objects misplaced (0.538%); 10/48672452 objects unfound
(0.000%); Reduced data availability: 11606 pgs inactive, 86 pgs down, 779
pgs peering, 3081 pgs stale; Degraded data redundancy: 59329035/146017356
objects degraded (40.631%), 16508 pgs degraded, 19795 pgs undersized; 1/3
mons down, quorum sh1ora1301,sh1ora1302
2019-04-02 00:52:15.583292 mon.sh1ora1301 mon.1 10.15.29.15:6789/0 293 :
cluster [INF] mon.sh1ora1301 calling monitor election
2019-04-02 00:52:31.224838 mon.sh1ora1301 mon.1 10.15.29.15:6789/0 294 :
cluster [INF] mon.sh1ora1301 calling monitor election
2019-04-02 00:52:31.256251 mon.sh1ora1301 mon.1 10.15.29.15:6789/0 295 :
cluster [INF] mon.sh1ora1301 calling monitor election
2019-04-02 00:52:39.810572 mon.sh1ora1301 mon.1 10.15.29.15:6789/0 296 :
cluster [INF] mon.sh1ora1301 is new leader, mons sh1ora1301,sh1ora1302 in
quorum (ranks 1,2)

Thanks,
Pardhiv Karri

On Mon, Apr 1, 2019 at 6:16 PM huang jun  wrote:

> Can you provide detail error logs  when mon crash?
>
> Pardhiv Karri  wrote on Tue, 2 Apr 2019 at 9:02 AM:
> >
> > Hi,
> >
> > Our ceph production cluster is down when updating crushmap. Now we can't
> get out monitors to come online and when they come online for a fraction of
> a second we see crush map errors in logs. How can we update crushmap when
> monitors are down as none of the ceph commands are working.
> >
> > Thanks,
> > Pardhiv Karri
> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Thank you!
> HuangJun
>


-- 
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"
__

[ceph-users] MDS stuck at replaying status

2019-04-01 Thread Albert Yue
Hi,

This happened after we restarted the active MDS: somehow the standby MDS
daemon cannot take over successfully and is stuck in up:replay. It is
showing the following log. Any idea on how to fix this?
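
While it sits in that state it is worth checking whether replay is actually
progressing (the MDS is often just busy chewing through a large journal) and,
if it is, giving it more headroom; a hedged sketch, run on the MDS host and a
mon host respectively (the injected value lasts until the daemons restart):

  ceph daemon mds.WXS0023 status
  ceph tell mon.\* injectargs '--mds_beacon_grace 300'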

2019-04-02 12:54:00.985079 7f6f70670700  1 mds.WXS0023 respawn
2019-04-02 12:54:00.985095 7f6f70670700  1 mds.WXS0023  e:
'/usr/bin/ceph-mds'
2019-04-02 12:54:00.985097 7f6f70670700  1 mds.WXS0023  0:
'/usr/bin/ceph-mds'
2019-04-02 12:54:00.985099 7f6f70670700  1 mds.WXS0023  1: '-f'
2019-04-02 12:54:00.985100 7f6f70670700  1 mds.WXS0023  2: '--cluster'
2019-04-02 12:54:00.985101 7f6f70670700  1 mds.WXS0023  3: 'ceph'
2019-04-02 12:54:00.985102 7f6f70670700  1 mds.WXS0023  4: '--id'
2019-04-02 12:54:00.985103 7f6f70670700  1 mds.WXS0023  5: 'WXS0023'
2019-04-02 12:54:00.985104 7f6f70670700  1 mds.WXS0023  6: '--setuser'
2019-04-02 12:54:00.985105 7f6f70670700  1 mds.WXS0023  7: 'ceph'
2019-04-02 12:54:00.985106 7f6f70670700  1 mds.WXS0023  8: '--setgroup'
2019-04-02 12:54:00.985107 7f6f70670700  1 mds.WXS0023  9: 'ceph'
2019-04-02 12:54:00.985142 7f6f70670700  1 mds.WXS0023 respawning with exe
/usr/bin/ceph-mds
2019-04-02 12:54:00.985145 7f6f70670700  1 mds.WXS0023  exe_path
/proc/self/exe
2019-04-02 12:54:02.139272 7ff8a739a200  0 ceph version 12.2.5
(cad919881333ac92274171586c827e01f554a70a) luminous (stable), process
(unknown), pid 3369045
2019-04-02 12:54:02.141565 7ff8a739a200  0 pidfile_write: ignore empty
--pid-file
2019-04-02 12:54:06.675604 7ff8a0ecd700  1 mds.WXS0023 handle_mds_map
standby
2019-04-02 12:54:26.114757 7ff8a0ecd700  1 mds.0.136021 handle_mds_map i am
now mds.0.136021
2019-04-02 12:54:26.114764 7ff8a0ecd700  1 mds.0.136021 handle_mds_map
state change up:boot --> up:replay
2019-04-02 12:54:26.114779 7ff8a0ecd700  1 mds.0.136021 replay_start
2019-04-02 12:54:26.114784 7ff8a0ecd700  1 mds.0.136021  recovery set is
2019-04-02 12:54:26.114789 7ff8a0ecd700  1 mds.0.136021  waiting for osdmap
14333 (which blacklists prior instance)
2019-04-02 12:54:26.141256 7ff89a6c0700  0 mds.0.cache creating system
inode with ino:0x100
2019-04-02 12:54:26.141454 7ff89a6c0700  0 mds.0.cache creating system
inode with ino:0x1
2019-04-02 12:54:50.148022 7ff89dec7700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2019-04-02 12:54:50.148049 7ff89dec7700  1 mds.beacon.WXS0023 _send
skipping beacon, heartbeat map not healthy
2019-04-02 12:54:52.143637 7ff8a1ecf700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2019-04-02 12:54:54.148122 7ff89dec7700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2019-04-02 12:54:54.148157 7ff89dec7700  1 mds.beacon.WXS0023 _send
skipping beacon, heartbeat map not healthy
2019-04-02 12:54:57.143730 7ff8a1ecf700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2019-04-02 12:54:58.148239 7ff89dec7700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2019-04-02 12:54:58.148249 7ff89dec7700  1 mds.beacon.WXS0023 _send
skipping beacon, heartbeat map not healthy
2019-04-02 12:55:02.143819 7ff8a1ecf700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2019-04-02 12:55:02.148311 7ff89dec7700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2019-04-02 12:55:02.148330 7ff89dec7700  1 mds.beacon.WXS0023 _send
skipping beacon, heartbeat map not healthy
2019-04-02 12:55:06.148393 7ff89dec7700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2019-04-02 12:55:06.148416 7ff89dec7700  1 mds.beacon.WXS0023 _send
skipping beacon, heartbeat map not healthy
2019-04-02 12:55:07.143914 7ff8a1ecf700  1 heartbeat_map is_healthy
'MDSRank' had timed out after 15
2019-04-02 12:55:07.615602 7ff89e6c8700  1 heartbeat_map reset_timeout
'MDSRank' had timed out after 15
2019-04-02 12:55:07.618294 7ff8a0ecd700  1 mds.WXS0023 map removed me
(mds.-1 gid:7441294) from cluster due to lost contact; respawning
2019-04-02 12:55:07.618296 7ff8a0ecd700  1 mds.WXS0023 respawn
2019-04-02 12:55:07.618314 7ff8a0ecd700  1 mds.WXS0023  e:
'/usr/bin/ceph-mds'
2019-04-02 12:55:07.618318 7ff8a0ecd700  1 mds.WXS0023  0:
'/usr/bin/ceph-mds'
2019-04-02 12:55:07.618319 7ff8a0ecd700  1 mds.WXS0023  1: '-f'
2019-04-02 12:55:07.618320 7ff8a0ecd700  1 mds.WXS0023  2: '--cluster'
2019-04-02 12:55:07.618320 7ff8a0ecd700  1 mds.WXS0023  3: 'ceph'
2019-04-02 12:55:07.618321 7ff8a0ecd700  1 mds.WXS0023  4: '--id'
2019-04-02 12:55:07.618321 7ff8a0ecd700  1 mds.WXS0023  5: 'WXS0023'
2019-04-02 12:55:07.618322 7ff8a0ecd700  1 mds.WXS0023  6: '--setuser'
2019-04-02 12:55:07.618323 7ff8a0ecd700  1 mds.WXS0023  7: 'ceph'
2019-04-02 12:55:07.618323 7ff8a0ecd700  1 mds.WXS0023  8: '--setgroup'
2019-04-02 12:55:07.618325 7ff8a0ecd700  1 mds.WXS0023  9: 'ceph'
2019-04-02 12:55:07.618352 7ff8a0ecd700  1 mds.WXS0023 respawning with exe
/usr/bin/ceph-mds
2019-04-02 12:55:07.618353 7ff8a0ecd700  1 mds.WXS0023  exe_path
/proc/self/exe
2019-04-02 12:55:09.174064 7f4c596be200  0 ceph version 12.2.5
(cad919881333ac92274171586c827e01f554a70a) luminous (stabl