[ceph-users] Re: bad balancing (octopus)

2020-06-05 Thread Eugen Block
Hi, osd.52 has a crush weight of 3.6 while osd.34 has a weight of 1.0 (although it's a 2.7 TB disk). That seems odd, but it explains the imbalance, probably caused by the reweight cronjob. Maybe the combination of the balancer module and the reweight cronjob messed things up; hard to say. I'm
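
A way to inspect this, and a sketch of the implied fix (assuming the 1.0 really is osd.34's crush weight; adjust the TiB value and ids to your disks):

  # Compare crush weight, reweight and utilization per OSD:
  ceph osd df tree
  # Reset a crush weight to the disk's size in TiB (~2.72 for a 2.7 TB disk):
  ceph osd crush reweight osd.34 2.72
  # Restore an override reweight the cronjob may have lowered:
  ceph osd reweight 52 1.0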

[ceph-users] Re: log_channel(cluster) log [ERR] : Error -2 reading object

2020-06-05 Thread Eugen Block
Hi, is it an EC pool with fast_read enabled? [1] sounds like a possible explanation if your cluster is not up-to-date, I guess. Regards, Eugen [1] https://github.com/ceph/ceph/pull/24225 Quoting Frank Schilder: Hi all, I found these messages today: 2020-06-04 17:07:57.471 7fa0aa16e
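
To check whether fast_read is set on the affected pool, a quick sketch (the pool name is a placeholder):

  ceph osd pool get <ec-pool> fast_read
  # Disable it as a workaround until the fix from [1] is deployed:
  ceph osd pool set <ec-pool> fast_read 0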

[ceph-users] mds behind on trimming - replay until memory exhausted

2020-06-05 Thread Francois Legrand
Hi all, We have a Ceph Nautilus cluster (14.2.8) with two CephFS filesystems and 3 mds (1 active for each fs + one failover). We are transferring all the data (~600M files) from one FS (which was in EC 3+2) to the other FS (in R3). On the old FS we first removed the snapshots (to avoid strays pro

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-05 Thread Frank Schilder
Hi George, yes, I believe your interpretation is correct. Because the chassis buckets have new bucket IDs, the distribution hashing will change. I also believe that the trick to avoid data movement in your situation is to export the new crush map, swap the IDs between corresponding host and bu
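
The export/edit/import cycle being suggested would look roughly like this (file names are placeholders; verify the mappings before injecting):

  ceph osd getcrushmap -o crush.bin        # export the binary crush map
  crushtool -d crush.bin -o crush.txt      # decompile to editable text
  # ... swap the id values between the corresponding host and chassis buckets ...
  crushtool -c crush.txt -o crush-new.bin  # recompile
  crushtool -i crush-new.bin --test --show-mappings --rule 0 --num-rep 3 | head
  ceph osd setcrushmap -i crush-new.bin    # inject only after checking the mappings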

[ceph-users] Re: mds behind on trimming - replay until memory exhausted

2020-06-05 Thread Francois Legrand
Hi, Thanks for your answer. I have: osd_op_queue=wpq osd_op_queue_cut_off=low I can try to set osd_op_queue_cut_off to high, but it will be useful only if the mds gets active, true? For now, the mds_cache_memory_limit is set to 8 589 934 592 (so 8GB, which seems reasonable for an mds server with
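
For reference, a sketch of applying those settings through the config database on Nautilus (osd_op_queue_cut_off only takes effect after restarting the OSDs, as far as I know):

  ceph config set osd osd_op_queue_cut_off high          # requires OSD restarts
  ceph config set mds mds_cache_memory_limit 8589934592  # 8 GiB
  ceph config get mds mds_cache_memory_limit             # verify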

[ceph-users] Re: mds behind on trimming - replay until memory exhausted

2020-06-05 Thread Francois Legrand
I was also wondering if setting mds dump cache after rejoin could help? On 05/06/2020 at 12:49, Frank Schilder wrote: Out of interest, I did the same on a mimic cluster a few months ago, running up to 5 parallel rsync sessions without any problems. I moved about 120TB. Each rsync was runni

[ceph-users] Re: Change mon bind address / Change IPs with the orchestrator

2020-06-05 Thread Simon Sutter
Hello, Ok, thanks Wido. I have now sorted out the correct network configuration and deployed new mons with the new IP. Everything is now on the new IP and works so far. Simon From: Wido den Hollander Sent: Thursday, June 4, 2020 08:47:29 To: Simon Su

[ceph-users] Re: Cephadm and Ceph versions

2020-06-05 Thread Simon Sutter
Hello Andy, I had mixed experiences with cephadm. What I would do: check whether all your daemons are indeed running in the corresponding containers on every node. You can check it with "ceph orch ps". If that is the case, you can get rid of the old rpms and install the new ceph-common v15 rpm. You
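
A rough sketch of that sequence (the exact package list below is hypothetical; adjust to what rpm reports on your nodes):

  ceph orch ps              # all daemons should show up as containers
  ceph versions             # everything should report 15.x
  rpm -qa | grep -i ceph    # find the leftover v13 rpms
  dnf remove ceph-mon ceph-osd ceph-mds ceph-mgr ceph-radosgw   # hypothetical list
  dnf install ceph-common   # keep the v15 CLI tools on the host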

[ceph-users] Re: mds behind on trimming - replay until memory exhausted

2020-06-05 Thread Francois Legrand
On 05/06/2020 at 14:18, Frank Schilder wrote: Hi Francois, I was also wondering if setting mds dump cache after rejoin could help? Haven't heard of that option. Is there some documentation? I found it on: https://docs.ceph.com/docs/nautilus/cephfs/mds-config-ref/ mds dump cache after re

[ceph-users] Cephadm and Ceph versions

2020-06-05 Thread biohazd
Hi, I had a cluster on v13 (Mimic) and have converted it to Octopus (15.2.3) using cephadm. The dashboard shows everything as v15. What do I need to do with the Ceph rpms that are installed, as they are all Ceph version 13? Do I remove them and install Ceph rpms with version 15? Regard

[ceph-users] Re: log_channel(cluster) log [ERR] : Error -2 reading object

2020-06-05 Thread Frank Schilder
Hi Eugen, thanks, yes it sounds like that. It's an EC pool with fast_read enabled. Our cluster is on 13.2.8 and I plan to upgrade to 13.2.10 soonish. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Euge

[ceph-users] Re: mds behind on trimming - replay until memory exhausted

2020-06-05 Thread Frank Schilder
Out of interest, I did the same on a mimic cluster a few months ago, running up to 5 parallel rsync sessions without any problems. I moved about 120TB. Each rsync was running on a separate client with its own cache. I made sure that the sync dirs were all disjoint (no overlap of files/directorie
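
A minimal sketch of that pattern, with disjoint source trees and one session per client (paths are placeholders):

  # client 1:
  rsync -aHAX --numeric-ids /mnt/oldfs/dir1/ /mnt/newfs/dir1/
  # client 2:
  rsync -aHAX --numeric-ids /mnt/oldfs/dir2/ /mnt/newfs/dir2/
  # ... up to 5 sessions, each on its own client (own cache),
  # with no overlap of files/directories between the source trees.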

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-05 Thread Kyriazis, George
Hmm, sounds quite dangerous. On the other hand, and from prior experience, it could take weeks/months for the cluster to rebalance, so I'll give it a try. From the looks of it, there is no other reference to IDs, is that correct? Just swap IDs between chassis and host and I should be OK? (Sorry

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-05 Thread Kyriazis, George
Hmm, From what I see in the crush map, “nodes” refers to other “nodes” by name, not by ID. In fact, I don’t see anything in the crush map referred to by ID. As we said before, though, the crush algorithm figures out the hashes based on the IDs. I am not sure what else refers to them, though

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-05 Thread Anthony D'Atri
> Why don't you do the crush map change together with that? All the data will be reshuffled then any w Couldn't the upmap trick be used to manage this? Compute a full set to pin the before mappings, then incrementally remove them so that data moves in a controlled fashion? I suspect tha
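
For the record, the upmap pinning described here would be done with pg-upmap-items, roughly like this (PG and OSD ids are placeholders; requires luminous+ clients):

  ceph osd set-require-min-compat-client luminous
  # Pin a PG back onto the OSD that currently holds its data,
  # so the crush change doesn't move it:
  ceph osd pg-upmap-items <pgid> <from-osd> <to-osd>
  # Later, drop the pin to let this PG move in a controlled fashion:
  ceph osd rm-pg-upmap-items <pgid>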

[ceph-users] Re: mds behind on trimming - replay until memory exhausted

2020-06-05 Thread Frank Schilder
Hi Francois, > I was also wondering if setting mds dump cache after rejoin could help? Haven't heard of that option. Is there some documentation? > I have: > osd_op_queue=wpq > osd_op_queue_cut_off=low > I can try to set osd_op_queue_cut_off to high, but it will be useful only if the mds ge

[ceph-users] Re: mds behind on trimming - replay until memory exhausted

2020-06-05 Thread Frank Schilder
Hi Francois, thanks for the link. The option "mds dump cache after rejoin" is for debugging purposes only. It will write the cache after rejoin to a file, but not drop the cache. This will not help you. I think this was implemented recently to make it possible to send a cache dump file to devel
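
For completeness, dumping (and inspecting) the cache of a running MDS via the admin socket looks like this (daemon name and output path are placeholders):

  ceph daemon mds.<name> cache status
  ceph daemon mds.<name> dump cache /tmp/mds-cache.txt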

[ceph-users] Re: changing acces vlan for all the OSDs - potential downtime ?

2020-06-05 Thread Stan Lea
I would add as well: ceph osd set norecover Stan Lea ‐‐‐ Original Message ‐‐‐ On Friday, June 5, 2020 1:03 AM, Konstantin Shalygin wrote: > On 6/4/20 4:26 PM, Adrian Nicolae wrote: >> Hi all, I have a Ceph cluster with a standard setup: - the public network: MONs and
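
Taken together, the flags would be set before the VLAN change and cleared afterwards; e.g. (norecover is from this mail, the others are the usual maintenance flags, not quoted from the thread):

  ceph osd set noout
  ceph osd set norebalance
  ceph osd set nobackfill
  ceph osd set norecover
  # ... perform the network change ...
  ceph osd unset norecover
  ceph osd unset nobackfill
  ceph osd unset norebalance
  ceph osd unset noout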

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-05 Thread Kyriazis, George
I’m hesitant to do this, too. I think I’ll pass and just wait for the remapping. :-) George > On Jun 5, 2020, at 12:58 PM, Frank Schilder wrote: > I never changed IDs before, I'm just extra cautious. If they do not show up explicitly anywhere else than inside the bucket definitions, then

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-05 Thread Frank Schilder
Wido replied to you, check this thread. You really need to understand exactly the file you get. The IDs are used to refer to items from within other items. You need to make sure that any such cross-reference is updated as well. It is not just changing the ID tag in a bucket item; you also need
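
To illustrate with a made-up excerpt of a decompiled map: buckets reference their members by name, but every bucket carries an id that the placement hashing uses, so a swap must stay consistent everywhere (names and weights below are invented):

  host node1 {
          id -3        # swap this id ...
          alg straw2
          hash 0
          item osd.0 weight 2.728
          item osd.1 weight 2.728
  }
  chassis chassis1 {
          id -9        # ... with this one
          alg straw2
          hash 0
          item node1 weight 5.456
  }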

[ceph-users] Re: ceph orch upgrade stuck at the beginning.

2020-06-05 Thread Gencer W . Genç
Hi Sebastian, I know Ceph isn't meant for that. See, we have 3 clusters. 2 of them have 9 nodes each, with 3 mons and 3 managers. Only one of them is a 2-node cluster. We use this 2-node cluster only for testing and development purposes. We didn't want to spend more resources on a test-only environment. Thank you

[ceph-users] Re: mds behind on trimming - replay until memory exhausted

2020-06-05 Thread Francois Legrand
Hi, Unfortunately adding swap did not solve the problem! I added 400 GB of swap. It used about 18GB of swap after consuming all the RAM and stopped with the following logs: 2020-06-05 21:33:31.967 7f251e7eb700  1 mds.lpnceph-mds04.in2p3.fr Updating MDS map to version 324691 from mon.1 2020-0

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-05 Thread Frank Schilder
I never changed IDs before, I'm just extra cautious. If they do not show up explicitly anywhere else than inside the bucket definitions, then it is probably an easy edit and just swapping them. If you try this, could you please report back to the list if it works as expected, maybe with example

[ceph-users] Re: dealing with spillovers

2020-06-05 Thread Reed Dier
I'm going to piggyback on this somewhat. I've battled RocksDB spillovers over the course of the life of the cluster since moving to bluestore; however, I have always been able to compact it well enough. But now I am stumped at getting this to compact via $ceph tell osd.$osd compact, which has
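
If the online compaction keeps failing, an offline compaction with the OSD stopped is another option (a sketch; the OSD id is a placeholder):

  systemctl stop ceph-osd@<id>
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact
  systemctl start ceph-osd@<id>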

[ceph-users] Re: crashing OSDs: ceph_assert(h->file->fnode.ino != 1)

2020-06-05 Thread Igor Fedotov
Hi Simon, On 6/2/2020 10:59 PM, Simon Leinen wrote: >> Igor Fedotov writes: >> 2) Main device space is highly fragmented - 0.84012572151981013 where 1.0 is the maximum. Can't say for sure but I presume it's pretty full as well. > As I said, these disks aren't that full as far as bytes are concerned. B
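
For reference, the fragmentation figure quoted here can be read from a running OSD roughly like this (available in recent releases, as far as I know; the OSD id is a placeholder):

  ceph daemon osd.<id> bluestore allocator score block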

[ceph-users] Re: dealing with spillovers

2020-06-05 Thread Igor Fedotov
This might help - see comment #4 at https://tracker.ceph.com/issues/44509 And just for the sake of information collection - what Ceph version is used in this cluster? Did you set up the DB volume along with OSD deployment, or was it added later as was done in the ticket above? Thanks, Igor

[ceph-users] Re: Cephadm Hangs During OSD Apply

2020-06-05 Thread m
Cool. Cool cool cool. Looks like the issue I was experiencing was resolved by https://github.com/ceph/ceph/pull/34745. Didn't know encrypted OSDs weren't supported at all. v15.2.0 did seem to handle them fine; looks like 15.2.1 and 15.2.2 have some regression there. Under 15.2.1 OSDs are now

[ceph-users] Re: mds behind on trimming - replay until memory exhausted

2020-06-05 Thread Frank Schilder
Hi Francois, yes, the beacon grace needs to be higher due to the latency of swap. Not sure if 60s will do. For this particular recovery operation, you might want to go much higher (1h) and watch the cluster health closely. Good luck and best regards, = Frank Schilder AIT Risø Ca
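
A sketch of raising the grace for the duration of the recovery (the value is in seconds; remember to lower it again afterwards):

  ceph config set global mds_beacon_grace 3600   # 1h, for this recovery only
  ceph config get mds mds_beacon_grace           # verify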

[ceph-users] Re: dealing with spillovers

2020-06-05 Thread Reed Dier
The WAL/DB was part of the OSD deployment. The OSD is running 14.2.9. Would grabbing the ceph-kvstore-tool bluestore-kv stats as in that ticket be of any use here? Thanks, Reed > On Jun 5, 2020, at 5:27 PM, Igor Fedotov wrote: > This might help - see comment #4 at https://tracker.ce
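
Grabbing those stats would look roughly as in the ticket, with the OSD stopped first (the OSD id is a placeholder):

  systemctl stop ceph-osd@<id>
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> stats
  systemctl start ceph-osd@<id>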