Re: [ceph-users] cephfs causing high load on vm, taking down 15 min later another cephfs vm

2019-05-23 Thread Frank Schilder
Hi Marc, if you can exclude network problems, you can ignore this message. The only time we observed something that might be similar to your problem was when a network connection was overloaded. Potential causes include - broadcast storm - the "too much cache memory" issues https://www.suse.c

Re: [ceph-users] RGW metadata pool migration

2019-05-23 Thread Janne Johansson
On Wed 22 May 2019 at 17:43, Nikhil Mitra (nikmitra) <nikmi...@cisco.com> wrote: > Hi All, > > What are the metadata pools in an RGW deployment that need to sit on the > fastest medium to better the client experience from an access standpoint ? > > Also is there an easy way to migrate these pools

Re: [ceph-users] RGW metadata pool migration

2019-05-23 Thread Konstantin Shalygin
What are the metadata pools in an RGW deployment that need to sit on the fastest medium to better the client experience from an access standpoint ? Also is there an easy way to migrate these pools in a PROD scenario with minimal to no-outage if possible ? Just change crush rule to place defaul
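
A minimal sketch of the crush-rule approach described above, assuming a replicated setup with an ssd device class and the default-zone pool names (adjust to your zone):

```
# create a replicated rule that only picks OSDs carrying the "ssd" device class
ceph osd crush rule create-replicated rgw-meta-ssd default host ssd

# point the RGW metadata/index pools at the new rule; data migrates online
for pool in .rgw.root default.rgw.control default.rgw.meta \
            default.rgw.log default.rgw.buckets.index; do
    ceph osd pool set "$pool" crush_rule rgw-meta-ssd
done
```

The resulting backfill is the main client-visible impact, so moving the pools one at a time keeps it closer to "no outage".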

Re: [ceph-users] Crush rule for "ssd first" but without knowing how much

2019-05-23 Thread Dan van der Ster
Did I understand correctly: you have a crush tree with both ssd and hdd devices, and you want to direct PGs to the ssds, until they reach some fullness threshold, and only then start directing PGs to the hdds? I can't think of a crush rule alone to achieve that. But something you could do is add a
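
Dan's suggestion is cut off above, so the sketch below is not his proposal, just a related pattern for mixed ssd/hdd trees: a crush rule that puts the primary copy on ssd and the remaining replicas on hdd (assumes device classes are set, a root named default, and an unused rule id):

```
rule ssd-primary {
    id 5
    type replicated
    min_size 1
    max_size 10
    step take default class ssd
    step chooseleaf firstn 1 type host
    step emit
    step take default class hdd
    step chooseleaf firstn -1 type host
    step emit
}
```

Note that this is still not "fill ssd first, then spill over to hdd"; as Dan says, crush placement is deterministic and has no notion of fullness.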

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh
Hi, we have set the PGs to recover and now they are stuck in active+recovery_wait+degraded and instructing them to deep-scrub does not change anything. Hence, the rados report is empty. Is there a way to stop the recovery wait to start the deep-scrub and get the output? I guess the recovery_w
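
The usual way to see why a PG is sitting in recovery_wait and to (re)queue a scrub is something like the sketch below; the PG id 1.2f3 is a made-up placeholder:

```
ceph health detail        # lists the degraded PGs, unfound objects and blocked requests
ceph pg 1.2f3 query       # per-PG detail, including which OSDs it is waiting on
ceph pg deep-scrub 1.2f3  # the scrub is only queued; it will run once recovery lets it through
```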

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Dan van der Ster
What's the full ceph status? Normally recovery_wait just means that the relevant osd's are busy recovering/backfilling another PG. On Thu, May 23, 2019 at 10:53 AM Kevin Flöh wrote: > > Hi, > > we have set the PGs to recover and now they are stuck in > active+recovery_wait+degraded and instructi
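
To see what those OSDs are busy with, and optionally widen the recovery concurrency a little, something like this is common (the values are examples, not recommendations; higher limits cost client I/O):

```
ceph pg ls recovering       # PGs actively recovering right now
ceph pg ls recovery_wait    # PGs queued behind them
# per-OSD recovery/backfill concurrency caps:
ceph tell osd.* injectargs '--osd-recovery-max-active 4 --osd-max-backfills 2'
```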

[ceph-users] Update mimic to nautilus documentation error

2019-05-23 Thread Andres Rojas Guerrero
Hi all, I have followed the Ceph documentation in order to update from Mimic to Nautilus: https://ceph.com/releases/v14-2-0-nautilus-released/ The process went well, but I have seen that two links with important information don't work: "v2 network protocol" "Updating ceph.conf and mon_host" h

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Marc Roos
I have been following this thread for a while, and thought I need to have "major ceph disaster" alert on the monitoring ;) http://www.f1-outsourcing.eu/files/ceph-disaster.mp4 -Original Message- From: Kevin Flöh [mailto:kevin.fl...@kit.edu] Sent: Thursday 23 May 2019 10:51 To:

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh
This is the current status of ceph:

  cluster:
    id: 23e72372-0d44-4cad-b24f-3641b14b86f4
    health: HEALTH_ERR
            9/125481144 objects unfound (0.000%)
            Degraded data redundancy: 9/497011417 objects degraded (0.000%), 7 pgs degraded
            9 stuck requests are bl

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Dan van der Ster
I think those osds (1, 11, 21, 32, ...) need a little kick to re-peer their degraded PGs. Open a window with `watch ceph -s`, then in another window slowly do ceph osd down 1 # then wait a minute or so for that osd.1 to re-peer fully. ceph osd down 11 ... Continue that for each o
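
Scripted, that procedure looks roughly like this; the OSD ids and the pause are placeholders, and watching `ceph -s` between steps, as suggested, is the important part:

```
# "ceph osd down" is non-destructive: the OSD re-asserts itself and re-peers its PGs
for osd in 1 11 21 32; do
    ceph osd down "$osd"
    sleep 60   # give the OSD time to fully re-peer before kicking the next one
done
```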

[ceph-users] Ceph dovecot

2019-05-23 Thread Marc Roos
Sorry for not waiting until it is published on the ceph website but, anyone attended this talk? Is it production ready? https://cephalocon2019.sched.com/event/M7j8

Re: [ceph-users] Ceph dovecot

2019-05-23 Thread Wido den Hollander
On 5/23/19 12:02 PM, Marc Roos wrote: > > Sorry for not waiting until it is published on the ceph website but, > anyone attended this talk? Is it production ready? > Danny from Deutsche Telekom can answer this better, but no, it's not production ready. It seems it's more challenging to get

Re: [ceph-users] Ceph dovecot

2019-05-23 Thread Kai Wagner
Hi Marc, let me add Danny so he's aware of your request. Kai On 23.05.19 12:13, Wido den Hollander wrote: > > On 5/23/19 12:02 PM, Marc Roos wrote: >> Sorry for not waiting until it is published on the ceph website but, >> anyone attended this talk? Is it production ready? >> > Danny from Deut

Re: [ceph-users] Update mimic to nautilus documentation error

2019-05-23 Thread Andres Rojas Guerrero
I have found that it's better to follow these links from the documentation, not from the Ceph Blog: http://docs.ceph.com/docs/nautilus/releases/nautilus/ Here the links are working. On 23/5/19 10:56, Andres Rojas Guerrero wrote: > Hi all, I have followed the Ceph documentation in order to upd

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh
thank you for this idea, it has improved the situation. Nevertheless, there are still 2 PGs in recovery_wait. ceph -s gives me:

  cluster:
    id: 23e72372-0d44-4cad-b24f-3641b14b86f4
    health: HEALTH_WARN
            3/125481112 objects unfound (0.000%)
            Degraded data redundanc

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Alexandre Marangone
The PGs will stay active+recovery_wait+degraded until you solve the unfound objects issue. You can follow this doc to look at which objects are unfound[1] and if no other recourse mark them lost [1] http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#unfound-objects . On T
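
In short, the commands from that troubleshooting page look like this; the PG id is a placeholder, and marking objects lost is irreversible, so it really is the last resort:

```
ceph health detail                      # shows which PGs still have unfound objects
ceph pg 1.2f3 list_unfound              # lists the unfound objects in that PG
ceph pg 1.2f3 mark_unfound_lost revert  # or "delete" -- only once no OSD can still supply them
```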

[ceph-users] large omap object in usage_log_pool

2019-05-23 Thread shubjero
Hi there, We have an old cluster that was built on Giant that we have maintained and upgraded over time and are now running Mimic 13.2.5. The other day we received a HEALTH_WARN about 1 large omap object in the pool '.usage' which is our usage_log_pool defined in our radosgw zone. I am trying to
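
For reference, a rough way to confirm which usage-log object grew too large and to trim old entries; the object name and date range below are made up:

```
ceph health detail                              # confirms the pool holding the large omap object
ceph log last 1000 | grep -i 'large omap'       # the cluster log names the exact object
rados -p .usage listomapkeys usage.22 | wc -l   # count the omap keys on the suspect object
radosgw-admin usage trim --start-date=2017-01-01 --end-date=2018-12-31
```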

[ceph-users] [events] Ceph Day Netherlands July 2nd - CFP ends June 3rd

2019-05-23 Thread Mike Perez
Hi everyone, We will be having Ceph Day Netherlands July 2nd! https://ceph.com/cephdays/netherlands-2019/ The CFP will be ending June 3rd, so there is still time to get your Ceph related content in front of the Ceph community ranging from all levels of expertise: https://zfrmz.com/E3ouYm0NiPF1b

Re: [ceph-users] Ceph and multiple RDMA NICs

2019-05-23 Thread Lazuardi Nasution
Hi David and Justinas, I'm interested in this old thread. Has it been solved? Would you mind sharing the solution and a reference regarding David's statement about some threads on the ML about RDMA? Best regards, > Date: Fri, 02 Mar 2018 06:12:18 + > From: David Turner > To: Justinas LINGY

[ceph-users] Cephfs free space vs ceph df free space disparity

2019-05-23 Thread Robert Ruge
Ceph newbie question. I have a disparity between the free space that my cephfs file system is showing and what ceph df is showing. As you can see below, my cephfs file system says there is 9.5TB free; however, ceph df says there is 186TB, which with replication size 3 should equate to 62TB free spa
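
A common explanation: with a single data pool, cephfs reports that pool's MAX AVAIL, which already divides by the replica count and is capped by the fullest OSD, so imbalance shrinks it well below raw-free/3. A quick check, assuming one data pool:

```
ceph df detail     # per-pool MAX AVAIL = usable space after replication, limited by the fullest OSD
ceph osd df tree   # a few very full OSDs drag MAX AVAIL (and the cephfs df output) down
ceph fs status     # shows which data pool the cephfs statistics are based on
```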

Re: [ceph-users] large omap object in usage_log_pool

2019-05-23 Thread Konstantin Shalygin
in the config. ```"rgw_override_bucket_index_max_shards": "8",```. Should this be increased? Should be decreased to default `0`, I think. Modern Ceph releases resolve large omaps automatically via bucket dynamic resharding: ``` { "option": { "name": "rgw_dynamic_resharding",
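
A quick way to verify that dynamic resharding is enabled and actually working; the admin-socket path and bucket name are assumptions for a typical deployment:

```
# on an RGW host: check the values the running daemon uses
ceph daemon /var/run/ceph/ceph-client.rgw.*.asok config show | \
    grep -E 'rgw_dynamic_resharding|rgw_override_bucket_index_max_shards'
radosgw-admin reshard list                       # buckets currently queued for resharding
radosgw-admin reshard status --bucket=mybucket   # per-shard status of one bucket
```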