[ceph-users] Re: [EXTERN] Re: Urgent help with degraded filesystem needed

2024-06-19 Thread Stefan Kooman
Hi, On 19-06-2024 11:15, Dietmar Rieder wrote: Please follow https://docs.ceph.com/en/nautilus/cephfs/disaster-recovery-experts/#disaster-recovery-experts. OK, when I run the cephfs-journal-tool I get an error: # cephfs-journal-tool journal export backup.bin Error ((22) Invalid argument) My
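[Editor's note] For context on the error quoted above: on Nautilus and newer releases cephfs-journal-tool usually has to be told which journal to work on via --rank, and omitting it is a common cause of "(22) Invalid argument". A minimal sketch of the backup step from the disaster-recovery-experts guide, assuming a file system named "cephfs" with rank 0:
  cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
  cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary   # next step in the guide, only after the export succeeds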

[ceph-users] Re: [EXTERN] Urgent help with degraded filesystem needed

2024-07-01 Thread Stefan Kooman
Hi Dietmar, On 29-06-2024 10:50, Dietmar Rieder wrote: Hi all, finally we were able to repair the filesystem and it seems that we did not lose any data. Thanks for all suggestions and comments. Here is a short summary of our journey: Thanks for writing this up. This might be useful for som

[ceph-users] Re: [EXTERN] Urgent help with degraded filesystem needed

2024-07-02 Thread Stefan Kooman
Hi Venky, On 02-07-2024 09:45, Venky Shankar wrote: Hi Stefan, On Mon, Jul 1, 2024 at 2:30 PM Stefan Kooman wrote: Hi Dietmar, On 29-06-2024 10:50, Dietmar Rieder wrote: Hi all, finally we were able to repair the filesystem and it seems that we did not lose any data. Thanks for all

[ceph-users] Re: Pacific 16.2.15 `osd noin`

2024-07-08 Thread Stefan Kooman
On 02-04-2024 15:09, Zakhar Kirpichenko wrote: Hi, I'm adding a few OSDs to an existing cluster, the cluster is running with `osd noout,noin`: cluster: id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86 health: HEALTH_WARN noout,noin flag(s) set Specifically `noin` is docum
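[Editor's note] A quick reference for the flag handling discussed here (a sketch, not taken from the thread itself):
  ceph osd set noin      # previously "out" OSDs will not be marked "in" when they start
  ceph osd unset noin
  ceph osd unset noout
If I recall correctly, newly created OSDs are governed by the separate mon_osd_auto_mark_new_in option (default true), which is why they can come up "in" even while noin is set.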

[ceph-users] Re: [EXTERN] Urgent help with degraded filesystem needed

2024-07-10 Thread Stefan Kooman
Hi, On 01-07-2024 10:34, Stefan Kooman wrote: Not that I know of. But changes in behavior of Ceph (daemons) and or Ceph kernels would be good to know about indeed. I follow the ceph-kernel mailing list to see what is going on with the development of kernel CephFS. And there is a thread

[ceph-users] cephadm for Ubuntu 24.04

2024-07-10 Thread Stefan Kooman
Hi, Is it possible to only build "cephadm", so not the other ceph packages / daemons? Or can we think about a way to have cephadm packages built for all supported mainstream linux releases during the supported lifetime of a Ceph release: i.e. debian, Ubuntu LTS, CentOS Stream? I went ahead a

[ceph-users] Re: cephadm for Ubuntu 24.04

2024-07-11 Thread Stefan Kooman
On 11-07-2024 09:55, Malte Stroem wrote: Hello Stefan, have a look: https://docs.ceph.com/en/latest/cephadm/install/#curl-based-installation Yeah, I have read that part. Just download cephadm. It will work on any distro. curl --silent --remote-name --location https://download.ceph.com/r
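[Editor's note] For reference, the curl-based installation in the linked docs boils down to something like the following; the exact release path is an assumption here and changes per Ceph release:
  curl --silent --remote-name --location https://download.ceph.com/rpm-18.2.4/el9/noarch/cephadm
  chmod +x cephadm
  ./cephadm version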

[ceph-users] Re: cephadm for Ubuntu 24.04

2024-07-11 Thread Stefan Kooman
On 11-07-2024 14:20, John Mulligan wrote: On Thursday, July 11, 2024 4:22:28 AM EDT Stefan Kooman wrote: On 11-07-2024 09:55, Malte Stroem wrote: Hello Stefan, have a look: https://docs.ceph.com/en/latest/cephadm/install/#curl-based-installation Yeah, I have read that part. Just download

[ceph-users] Re: cephadm for Ubuntu 24.04

2024-07-12 Thread Stefan Kooman
On 12-07-2024 09:33, tpDev Tester wrote: Hi, On 11.07.2024 at 14:20, John Mulligan wrote: ... as far as I know, we still have an issue https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2063456 with ceph on 24.04. I tried the offered fix, but was still unable to establish a running clust

[ceph-users] Re: Cephadm has a small wart

2024-07-19 Thread Stefan Kooman
On 19-07-2024 14:04, Tim Holloway wrote: Ah. Makes sense. Might be nice if the container build appended something like "cephadm container" to the redhat-release string, though. A more concerning item is that the container is based on CentOS 8 Stream. I'd feel more comfortable if the base OS was

[ceph-users] Re: v19.1.0 Squid RC0 released

2024-07-19 Thread Stefan Kooman
Hi, On 12-07-2024 00:27, Yuri Weinstein wrote: ... * For packages, see https://docs.ceph.com/en/latest/install/get-packages/ I see that only packages have been built for Ubuntu 22.04 LTS. Will there also be packages built for 24.04 LTS (the current LTS)? Thanks, Gr. Stefan _

[ceph-users] Re: Ceph on Ubuntu 24.04 - Arm64

2024-07-30 Thread Stefan Kooman
On 30-07-2024 15:48, John Mulligan wrote: On Tuesday, July 30, 2024 8:46:31 AM EDT Daniel Brown wrote: Is there any workable solution for running Ceph on Ubuntu 24.04 on Arm64? I’ve tried about every package install method I could think of, short of compiling it myself. I’m aiming for a “cepha

[ceph-users] Re: bluefs _allocate unable to allocate on bdev 2

2024-09-12 Thread Stefan Kooman
On 12-09-2024 06:43, Szabo, Istvan (Agoda) wrote: Maybe we are running into this bug Igor? https://github.com/ceph/ceph/pull/48854 That would be a solution for the bug you might be hitting (unable to allocate 64K aligned blocks for RocksDB). I would not be surprised if you hit this issue if

[ceph-users] Re: bluefs _allocate unable to allocate on bdev 2

2024-09-12 Thread Stefan Kooman
*From:* Stefan Kooman *Sent:* Thursday, September 12, 2024 3:54 PM *To:* Szabo, Istvan (Agoda) ; igor.fedo...@croit.io *Cc:* Ceph Users *Subject:* Re: [ceph-users] Re: bluefs _allocate unable to allocate on bdev 2 Email received from the internet. If in doubt, don't click any

[ceph-users] Re: Numa pinning best practices

2024-09-13 Thread Stefan Kooman
On 07-05-2024 22:37, Szabo, Istvan (Agoda) wrote: Hi, Haven't really found a proper descripton in case of 2 socket how to pin osds to numa node, only this: https://tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments#Ceph-Storage-Node-NUMA-Tuning Tuning for All Flash Deployment
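[Editor's note] Beyond the wiki page referenced here, recent Ceph releases can report and pin NUMA affinity themselves; a small sketch (OSD id and node number are placeholders):
  ceph osd numa-status                     # show detected NUMA node and network/storage affinity per OSD
  ceph config set osd.0 osd_numa_node 0    # pin osd.0 to NUMA node 0, takes effect on OSD restart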

[ceph-users] Re: Slow ops on OSDs

2020-10-06 Thread Stefan Kooman
On 2020-10-06 13:05, Igor Fedotov wrote: > > On 10/6/2020 1:04 PM, Kristof Coucke wrote: >> Another strange thing is going on: >> >> No client software is using the system any longer, so we would expect >> that all IOs are related to the recovery (fixing of the degraded PG). >> However, the disks

[ceph-users] Re: Slow ops on OSDs

2020-10-06 Thread Stefan Kooman
On 2020-10-06 14:18, Kristof Coucke wrote: > Ok, I did the compact on 1 osd. > The utilization is back to normal, so that's good... Thumbs up to you guys! We learned the hard way, but happy to spot the issue and share the info. > Though, one thing I want to get out of the way before adapting the

[ceph-users] Re: Slow ops on OSDs

2020-10-06 Thread Stefan Kooman
On 2020-10-06 15:27, Igor Fedotov wrote: > I'm working on improving PG removal in master, see: > https://github.com/ceph/ceph/pull/37496 > > Hopefully this will help in case of "cleanup after rebalancing" issue > which you presumably had. That would be great. Does the offline compaction with the
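[Editor's note] For completeness, the two compaction variants discussed in this thread look roughly like this (OSD id is a placeholder):
  ceph tell osd.12 compact                                          # online RocksDB compaction
  systemctl stop ceph-osd@12                                        # offline variant: stop the OSD first
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact
  systemctl start ceph-osd@12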

[ceph-users] Re: Ceph User Survey 2020 - Working Group Invite

2020-10-09 Thread Stefan Kooman
On 2020-10-09 19:12, anantha.ad...@intel.com wrote: > Hello all, > > This is an invite to all interested to join a working group being formed > for 2020 Ceph User Survey planning. I'm interested. How and when will this working group come together? Gr. Stefan

[ceph-users] Re: Ubuntu 20 with octopus

2020-10-12 Thread Stefan Kooman
On 2020-10-12 08:58, Seena Fallah wrote: > I've seen this PR that reverts the latest ubuntu version from 20.04 to > 18.04 because of some failures! > Are there any updates on this? > https://github.com/ceph/ceph/pull/35110 Apparently there have been attempts to get Ceph built on Focal. I did not g

[ceph-users] Re: Ubuntu 20 with octopus

2020-10-12 Thread Stefan Kooman
On 2020-10-12 09:28, Robert Sander wrote: > Hi, > > Am 12.10.20 um 02:31 schrieb Seena Fallah: >> >> Does anyone has any production cluster with ubuntu 20 (focal) or any >> suggestion or any bugs that prevents to deploy Ceph octopus on Ubuntu 20? > > The underlying distribution does not matter an

[ceph-users] Re: Huge RAM Ussage on OSD recovery

2020-10-21 Thread Stefan Kooman
On 2020-10-20 23:57, Ing. Luis Felipe Domínguez Vega wrote: > Hi, today my infra provider had a blackout, then Ceph tried to > recover but is in an inconsistent state because many OSDs cannot recover > themselves because the kernel kills them by OOM. Even now one OSD that was OK > goes down by OOM kille

[ceph-users] Re: Hardware needs for MDS for HPC/OpenStack workloads?

2020-10-23 Thread Stefan Kooman
On 2020-10-22 14:34, Matthew Vernon wrote: > Hi, > > We're considering the merits of enabling CephFS for our main Ceph > cluster (which provides object storage for OpenStack), and one of the > obvious questions is what sort of hardware we would need for the MDSs > (and how many!). Is it a many pa

[ceph-users] Re: Ceph not showing full capacity

2020-10-24 Thread Stefan Kooman
On 2020-10-24 14:53, Amudhan P wrote: > Hi, > > I have created a test Ceph cluster with Ceph Octopus using cephadm. > > Cluster total RAW disk capacity is 262 TB but it's allowing to use of only > 132TB. > I have not set quota for any of the pool. what could be the issue? Unbalance? What does ce

[ceph-users] Re: Ceph not showing full capacity

2020-10-25 Thread Stefan Kooman
On 2020-10-25 05:33, Amudhan P wrote: > Yes, There is a unbalance in PG's assigned to OSD's. > `ceph osd df` output snip > ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META   >    AVAIL    %USE   VAR   PGS  STATUS >  0    hdd  5.45799   1.0  5.5 TiB  3.6 TiB  3.6 TiB  9.7 M

[ceph-users] Re: Ceph not showing full capacity

2020-10-25 Thread Stefan Kooman
On 2020-10-25 15:20, Amudhan P wrote: > Hi, > > For my quick understanding How PG's are responsible for allowing space > allocation to a pool? > > My understanding that PG's basically helps in object placement when the > number of PG's for a OSD's is high there is a high possibility that PG > get

[ceph-users] Re: Issues with the ceph-bluestore-tool during cluster upgrade from Mimic to Nautilus

2020-10-26 Thread Stefan Kooman
On 2020-09-14 16:22, Igor Fedotov wrote: > Thanks! > > Now got the root cause. The fix is on its way... What is the commit / PR for this fix? Is this fixed in 14.2.12? Gr. Stefan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an

[ceph-users] Re: Issues with the ceph-bluestore-tool during cluster upgrade from Mimic to Nautilus

2020-10-26 Thread Stefan Kooman
On 2020-09-27 22:11, Igor Fedotov wrote: > > On 9/25/2020 6:07 PM, sa...@planethoster.info wrote: >> Hi Igor, >> >> The only thing abnormal about this osdstore is that it was created by >> Mimic 13.2.8 and I can see that the OSDs size of this osdstore are not >> the same as the others in the clust

[ceph-users] Re: Monitor persistently out-of-quorum

2020-10-29 Thread Stefan Kooman
On 2020-10-29 01:26, Ki Wong wrote: > Hello, > > I am at my wit's end. > > So I made a mistake in the configuration of my router and one > of the monitors (out of 3) dropped out of the quorum and nothing > I’ve done allow it to rejoin. That includes reinstalling the > monitor with ceph-ansible.

[ceph-users] Re: pgs stuck backfill_toofull

2020-10-29 Thread Stefan Kooman
On 2020-10-29 06:55, Mark Johnson wrote: > I've been struggling with this one for a few days now. We had an OSD report > as near full a few days ago. Had this happen a couple of times before and a > reweight-by-utilization has sorted it out in the past. Tried the same again > but this time we
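[Editor's note] A hedged sketch of the knobs usually involved with backfill_toofull (thresholds are examples only):
  ceph osd test-reweight-by-utilization 120   # dry run: show what reweighting at 120% of mean utilization would do
  ceph osd reweight-by-utilization 120
  ceph osd set-backfillfull-ratio 0.91        # temporary headroom while data drains; revert afterwards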

[ceph-users] MDS restarts after enabling msgr2

2020-10-29 Thread Stefan Kooman
Hi List, After a successful upgrade from Mimic 13.2.8 to Nautilus 14.2.12 we enabled msgr2. Soon after that both of the MDS servers (active / active-standby) restarted. We did not hit any ASSERTS this time, so that's good :>. However, I have not seen this happening on four different test cluster
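[Editor's note] For readers following along, enabling msgr2 after a Nautilus upgrade is a single step, and the result can be verified in the monmap (a sketch):
  ceph mon enable-msgr2
  ceph mon dump | grep -E 'v2:|v1:'   # each mon should now advertise a v2 (3300) and a v1 (6789) address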

[ceph-users] Re: Beginner's installation questions about network

2020-11-13 Thread Stefan Kooman
On 2020-11-13 21:19, E Taka wrote: > Hello, > > I want to install Ceph Octopus on Ubuntu 20.04. The nodes have 2 > network interfaces: 192.168.1.0/24 for the cluster network, and > 10.10.0.0/16 is the public network. When I bootstrap with cephadm, which > Network do I use? That means, do i u
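[Editor's note] To make the answer concrete: cephadm bootstrap wants an address in the public network, and the cluster network can be added as a config option afterwards. A sketch using the ranges from the question (the mon IP is a placeholder):
  cephadm bootstrap --mon-ip 10.10.0.1                    # an address in the 10.10.0.0/16 public network
  ceph config set global cluster_network 192.168.1.0/24   # dedicated replication/cluster network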

[ceph-users] Re: osd out cant' bring it back online

2020-12-01 Thread Stefan Kooman
On 2020-11-30 15:55, Oliver Weinmann wrote: > I have another error "pgs undersized", maybe this is also causing trouble? This is a result of the loss of one OSD, and the PGs located on it. As you only have 1 OSDs left, the cluster cannot recover on a third OSD (assuming defaults here). The cluste

[ceph-users] Re: osd out cant' bring it back online

2020-12-01 Thread Stefan Kooman
On 2020-12-01 10:21, Oliver Weinmann wrote: > Hi Stefan, > > unfortunately It doesn't start. > > The failed osd (osd.0) is located on gedaopl02 > > I can start the service but then after a minute or so it fails. Maybe > I'm looking at the wrong log file, but it's empty: Maybe it hits a timeout

[ceph-users] Re: osd out cant' bring it back online

2020-12-01 Thread Stefan Kooman
On 2020-12-01 13:19, Oliver Weinmann wrote: > > podman ps -a didn't show that container. So I googled and stumbled over > this post: > > https://github.com/containers/podman/issues/2553 > > I was able to fix it by running: > > podman rm --storage > e43f8533d6418267d7e6f3a408a566b4221df4fb51b13

[ceph-users] Re: slow down keys/s in recovery

2020-12-02 Thread Stefan Kooman
On 12/1/20 12:37 AM, Seena Fallah wrote: Hi all, Is there any configuration to slow down keys/s in recovery mode? Not just keys, but you can limit recovery / backfill like this: ceph tell 'osd.*' injectargs '--osd_max_backfills 1' ceph tell 'osd.*' injectargs '--osd_recovery_max_active 1' Gr

[ceph-users] Re: slow down keys/s in recovery

2020-12-02 Thread Stefan Kooman
On 12/2/20 2:46 PM, Seena Fallah wrote: I did the same but it moved 200K keys/s! You might also want to decrease the op priority (as in _increasing_ the number) of "osd_recovery_op_priority". Gr. Stefan ___ ceph-users mailing list -- ceph-users@ceph.

[ceph-users] Re: slow down keys/s in recovery

2020-12-02 Thread Stefan Kooman
On 12/2/20 2:55 PM, Seena Fallah wrote: This is what I used in recovery: osd max backfills = 1 osd recovery max active = 1 osd recovery op priority = 1 ^^ Shouldn't this go to 63 instead of 1? At least if I read this post from SUSE correctly I think it should [1]. osd recovery priority = 1

[ceph-users] Re: slow down keys/s in recovery

2020-12-02 Thread Stefan Kooman
On 12/2/20 3:04 PM, Seena Fallah wrote: I don't think so! I want to slow down the recovery not speed up and it says I should reduce these values. osd recovery op priority: This is the priority set for recovery operation. Lower the number, higher the recovery priority. Higher recovery priority

[ceph-users] Re: slow down keys/s in recovery

2020-12-02 Thread Stefan Kooman
On 12/2/20 5:36 PM, Seena Fallah wrote: If it uses PriorityQueue  Data Structure an element with high priority should be dequeued before an element with low prio

[ceph-users] Re: Ceph Outage (Nautilus) - 14.2.11 [EXT]

2020-12-16 Thread Stefan Kooman
On 12/16/20 10:21 AM, Matthew Vernon wrote: Hi, On 15/12/2020 20:44, Suresh Rama wrote: TL;DR: use a real NTP client, not systemd-timesyncd +1. We have a lot of "ntp" daemons running, but on Ceph we use "chrony", and it's way faster with converging (especially with very unstable clock sourc

[ceph-users] cephfs flags question

2020-12-17 Thread Stefan Kooman
Hi List, In order te reproduce an issue we see on a production cluster (cephFS client: ceph-fuse outperform kernel client by a factor of 5) we would like to have a test cluster to have the same cephfs "flags" as production. However, it's not completely clear how certain features influence the

[ceph-users] Re: cephfs flags question

2020-12-17 Thread Stefan Kooman
On 12/17/20 5:54 PM, Patrick Donnelly wrote: file system flags are not the same as the "feature" flags. See this doc for the feature flags: https://docs.ceph.com/en/latest/cephfs/administration/#minimum-client-version Thanks for making that clear. Note that the new "fs feature" and "fs requ

[ceph-users] Re: cephfs flags question

2020-12-17 Thread Stefan Kooman
On 12/17/20 7:45 PM, Patrick Donnelly wrote: When a file system is newly created, it's assumed you want all the stable features on, including multiple MDS, directory fragmentation, snapshots, etc. That's what those flags are for. If you've been upgrading your cluster, you need to turn those on

[ceph-users] Re: cephfs flags question

2020-12-18 Thread Stefan Kooman
Hi, On 12/17/20 8:57 PM, Patrick Donnelly wrote: On Thu, Dec 17, 2020 at 11:35 AM Stefan Kooman wrote: On 12/17/20 7:45 PM, Patrick Donnelly wrote: When a file system is newly created, it's assumed you want all the stable features on, including multiple MDS, directory fragment

[ceph-users] Re: cephfs flags question

2020-12-23 Thread Stefan Kooman
On 12/19/20 7:37 PM, Patrick Donnelly wrote: Well that's interesting. I don't have an explanation unfortunately. You upgraded the MDS too, right? Only scenario that could cause this I can think of is that the MDS were never restarted/upgraded to nautilus. Yes, the MDSes were upgraded and rest

[ceph-users] Re: High read IO on RocksDB/WAL since upgrade to Octopus

2020-12-31 Thread Stefan Kooman
On 12/31/20 4:16 AM, Glen Baars wrote: Hello Ceph Users, Since upgrading from Nautilus to Octopus ( cluster started in luminous ) I have been trying to debug why the RocksDB/WAL is maxing out the SSD drives. ( QD > 32, 12000 read IOPS, 200 write IOPS ). From what Nautilus release did you upg

[ceph-users] Re: CEPHFS - MDS gracefull handover of rank 0

2021-01-27 Thread Stefan Kooman
On 1/27/21 3:51 PM, Konstantin Shalygin wrote: Martin, also before restart - issue cache drop command to active mds Don't do this if you have a large cache. It will make your MDS unresponsive and replaced by a standby if available. There is a PR to fix this: https://github.com/ceph/ceph/pull/
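[Editor's note] The command being discussed, available since Nautilus (MDS name and timeout are placeholders):
  ceph tell mds.mds1 cache drop 600   # ask the active MDS to trim its cache, with a 600 second timeout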

[ceph-users] Re: CEPHFS - MDS gracefull handover of rank 0

2021-01-29 Thread Stefan Kooman
On 1/28/21 5:10 AM, Konstantin Shalygin wrote: Interesting, thanks. Do you know tracker ticket for this? No, not even sure if there is a tracker for this. Gr. Stefan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ce

[ceph-users] Re: CEPHFS - MDS gracefull handover of rank 0

2021-02-12 Thread Stefan Kooman
On 1/27/21 9:08 AM, Martin Hronek wrote: So before the next MDS the FS config was changed to one active and one standby-replay node, the idea was that since the MDS replay nodes follows the active one the handover would be smoother. The active state was reached faster, but we still noticed

[ceph-users] Re: 10G stackabe lacp switches

2021-02-15 Thread Stefan Kooman
On 2/15/21 12:16 PM, mj wrote: As we would like to be able to add more storage hosts, we need to loose the meshed network setup. My idea is to add two stacked 10G ethernet switches to the setup, so we can start using lacp bonded networking over two physical switches. Looking around, we can

[ceph-users] Re: Network design issues

2021-02-15 Thread Stefan Kooman
On 2/15/21 5:38 PM, Frank Schilder wrote: Hi Stefan, I think you gave me the right pointers. Last summer I was looking up exactly this, how do Dell switches hash connections onto members of a LAG. What I found was, that the only option was by MAC. I did a test with iperf using several connect

[ceph-users] Re: POC Hardware questions

2021-02-16 Thread Stefan Kooman
On 2/16/21 9:01 AM, Oliver Weinmann wrote: Dear All, A question that probably has been asked by many other users before. I want to do a POC. For the POC I can use old decommissioned hardware. Currently I have 3 x IBM X3550 M5 with: 1 Dualport 10G NIC Intel(R) Xeon(R) CPU E5-2637 v3 @ 3.50

[ceph-users] Re: Network design issues

2021-02-23 Thread Stefan Kooman
On 2/21/21 9:51 AM, Frank Schilder wrote: Hi Stefan, thanks for the additional info. Dell will put me in touch with their deployment team soonish and then I can ask about matching abilities. It turns out that the problem I observed might have a much more profane reason. I saw really long peri

[ceph-users] Re: mds lost very frequently

2021-02-24 Thread Stefan Kooman
On 2/6/20 6:04 PM, Stefan Kooman wrote: Hi, After setting: ceph config set mds mds_recall_max_caps 1 (5000 before change) and ceph config set mds mds_recall_max_decay_rate 1.0 (2.5 before change) And the: ceph tell 'mds.*' injectargs '--mds_recall_max_caps 1&

[ceph-users] cephfs: unable to mount share with 5.11 mainline, ceph 15.2.9, MDS 14.1.16

2021-03-02 Thread Stefan Kooman
Hi, On a CentOS 7 VM with mainline kernel (5.11.2-1.el7.elrepo.x86_64 #1 SMP Fri Feb 26 11:54:18 EST 2021 x86_64 x86_64 x86_64 GNU/Linux) and with Ceph Octopus 15.2.9 packages installed. The MDS server is running Nautilus 14.2.16. Messenger v2 has been enabled. Port 3300 of the monitors is r

[ceph-users] Re: cephfs: unable to mount share with 5.11 mainline, ceph 15.2.9, MDS 14.1.16

2021-03-02 Thread Stefan Kooman
On 3/2/21 5:16 PM, Jeff Layton wrote: On Tue, 2021-03-02 at 09:25 +0100, Stefan Kooman wrote: Hi, On a CentOS 7 VM with mainline kernel (5.11.2-1.el7.elrepo.x86_64 #1 SMP Fri Feb 26 11:54:18 EST 2021 x86_64 x86_64 x86_64 GNU/Linux) and with I'm guessing this is a stable series kernel

[ceph-users] Re: cephfs: unable to mount share with 5.11 mainline, ceph 15.2.9, MDS 14.1.16

2021-03-02 Thread Stefan Kooman
On 3/2/21 5:42 PM, Ilya Dryomov wrote: On Tue, Mar 2, 2021 at 9:26 AM Stefan Kooman wrote: Hi, On a CentOS 7 VM with mainline kernel (5.11.2-1.el7.elrepo.x86_64 #1 SMP Fri Feb 26 11:54:18 EST 2021 x86_64 x86_64 x86_64 GNU/Linux) and with Ceph Octopus 15.2.9 packages installed. The MDS server

[ceph-users] Re: cephfs: unable to mount share with 5.11 mainline, ceph 15.2.9, MDS 14.1.16

2021-03-02 Thread Stefan Kooman
On 3/2/21 6:00 PM, Jeff Layton wrote: v2 support in the kernel is keyed on the ms_mode= mount option, so that has to be passed in if you're connecting to a v2 port. Until the mount helpers get support for that option you'll need to specify the address and port manually if you want to use v2. I
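[Editor's note] Until mount.ceph learned about ms_mode, a v2 mount had to spell out address, port and mode by hand; a hedged example (address, credentials and mount point are placeholders):
  mount -t ceph 192.0.2.10:3300:/ /mnt/cephfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret,ms_mode=prefer-crc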

[ceph-users] Re: cephfs: unable to mount share with 5.11 mainline, ceph 15.2.9, MDS 14.1.16

2021-03-02 Thread Stefan Kooman
On 3/2/21 6:54 PM, Ilya Dryomov wrote: --- snip --- osd.0 up in weight 1 up_from 98071 up_thru 98719 down_at 98068 last_clean_interval [96047,98067) [v2:[2001:7b8:80:1:0:1:2:1]:6848/505534,v1:[2001:7b8:80:1:0:1:2:1]:6854/505534,v2:0.0.0.0:6860/505534,v1:0.0.0.0:6866/505534] Where did "v

[ceph-users] Re: cephfs: unable to mount share with 5.11 mainline, ceph 15.2.9, MDS 14.1.16

2021-03-02 Thread Stefan Kooman
On 3/2/21 7:17 PM, Stefan Kooman wrote: What is output of "ceph daemon osd.0 config get ms_bind_ipv4" on the osd0 node? ceph daemon osd.0 config get ms_bind_ipv4 {     "ms_bind_ipv4": "true" } And ceph daemon mds.mds1 config get ms_bind_ipv4 {     &quo

[ceph-users] Re: cephfs: unable to mount share with 5.11 mainline, ceph 15.2.9, MDS 14.1.16

2021-03-03 Thread Stefan Kooman
On 3/2/21 6:00 PM, Jeff Layton wrote: v2 support in the kernel is keyed on the ms_mode= mount option, so that has to be passed in if you're connecting to a v2 port. Until the mount helpers get support for that option you'll need to specify the address and port manually if you want to use v2.

[ceph-users] Re: cephfs: unable to mount share with 5.11 mainline, ceph 15.2.9, MDS 14.1.16

2021-03-03 Thread Stefan Kooman
On 3/3/21 1:16 PM, Ilya Dryomov wrote: I have tested with 5.11 kernel (5.11.2-arch1-1 #1 SMP PREEMPT Fri, 26 Feb 2021 18:26:41 + x86_64 GNU/Linux) port 3300 and ms_mode=crc as well as ms_mode=prefer-crc and that works when cluster is running with ms_bind_ipv4=false. So the "fix" is to have

[ceph-users] Re: cephfs: unable to mount share with 5.11 mainline, ceph 15.2.9, MDS 14.1.16

2021-03-03 Thread Stefan Kooman
On 3/3/21 1:16 PM, Ilya Dryomov wrote: Sure. You are correct that the kernel client needs a bit a work as we haven't considered dual stack configurations there at all. https://tracker.ceph.com/issues/49581 Gr. Stefan ___ ceph-users mailing list --

[ceph-users] Re: cephfs: unable to mount share with 5.11 mainline, ceph 15.2.9, MDS 14.1.16

2021-03-03 Thread Stefan Kooman
On 3/3/21 1:16 PM, Ilya Dryomov wrote: And from this documentation: https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/#ipv4-ipv6-dual-stack-mode we learned that dual stack is not possible for any current stable release, but might be possible with latest code. So the takeawa

[ceph-users] Re: balance OSD usage.

2021-03-08 Thread Stefan Kooman
On 3/8/21 6:40 AM, Norman.Kern wrote: I met the same problem, I set the reweight value to make it not worse. Do you solve it by setting balancer? My ceph verison is 14.2.5. Yes, since luminous there is also support for balancing PGs over OSDs through "upmap". That way the distribution should
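[Editor's note] The upmap balancer mentioned here is enabled with a couple of commands; it requires all clients to be at least Luminous:
  ceph osd set-require-min-compat-client luminous
  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status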

[ceph-users] Re: Ceph server

2021-03-10 Thread Stefan Kooman
On 3/10/21 5:43 PM, Ignazio Cassano wrote: Hello, what do you think about of ceph cluster made up of 6 nodes each one with the following configuration ? A+ Server 1113S-WN10RT Barebone Supermicro A+ Server 1113S-WN10RT - 1U - 10x U.2 NVMe - 2x M.2 - Dual 10-Gigabit LAN - 750W Redundant Processor

[ceph-users] Re: Ceph server

2021-03-10 Thread Stefan Kooman
On 3/10/21 8:12 PM, Stefan Kooman wrote: On 3/10/21 5:43 PM, Ignazio Cassano wrote: Hello, what do you think about of ceph cluster made up of 6 nodes each one with the following configuration ? I forgot to ask: Are you planning on only OSDs or should this be OSDs and MONs and ? In case of

[ceph-users] Ceph 14.2.17 ceph-mgr module issue

2021-03-12 Thread Stefan Kooman
Hi, After upgrading a Ceph cluster to 14.2.17 with ceph-ansible (docker containers) the manager hits an issue: Module 'volumes' has failed dependency: No module named typing, python trace: 2021-03-12 17:04:22.358 7f299ac75e40 1 mgr[py] Loading python module 'volumes' 2021-03-12 17:04:22.4

[ceph-users] Re: Ceph 14.2.17 ceph-mgr module issue

2021-03-12 Thread Stefan Kooman
On 3/12/21 5:46 PM, David Caro wrote: I might be wrong, but maybe the containers are missing something? The easiest way to check if accessing those directly, but from the looks of it it seems some python packages/installation issue. Adding also more info like 'ceph versions', 'docker images'/

[ceph-users] Re: Ceph 14.2.17 ceph-mgr module issue

2021-03-12 Thread Stefan Kooman
On 3/12/21 6:18 PM, David Caro wrote: I got the latest docker image from the public docker repo: dcaro@vulcanus$ docker pull ceph/daemon:latest-nautilus latest-nautilus: Pulling from ceph/daemon 2d473b07cdd5: Pull complete 6ab62ee0cbfb: Pull complete 8d5f9072ae2b: Pull complete 5cf35aefd364: Pu

[ceph-users] Re: Networking Idea/Question

2021-03-16 Thread Stefan Kooman
On 3/15/21 5:34 PM, Dave Hall wrote: Hello, If anybody out there has tried this or thought about it, I'd like to know... I've been thinking about ways to squeeze as much performance as possible from the NICs  on a Ceph OSD node.  The nodes in our cluster (6 x OSD, 3 x MGR/MON/MDS/RGW) curre

[ceph-users] Re: Diskless boot for Ceph nodes

2021-03-16 Thread Stefan Kooman
On 3/16/21 6:37 PM, Stephen Smith6 wrote: Hey folks - thought I'd check and see if anyone has ever tried to use ephemeral (tmpfs / ramfs based) boot disks for Ceph nodes? croit.io does that quite succesfully I believe [1]. Gr. Stefan [1]: https://www.croit.io/software/features ___

[ceph-users] Re: Networking Idea/Question

2021-03-17 Thread Stefan Kooman
On 3/17/21 7:44 AM, Janne Johansson wrote: Den ons 17 mars 2021 kl 02:04 skrev Tony Liu : What's the purpose of "cluster" network, simply increasing total bandwidth or for some isolations? Not having client traffic (that only occurs on the public network) fight over bandwidth with OSD<->OSD t

[ceph-users] Re: Diskless boot for Ceph nodes

2021-03-17 Thread Stefan Kooman
On 3/17/21 12:34 AM, Nico Schottelius wrote: On 2021-03-16 22:06, Stefan Kooman wrote: On 3/16/21 6:37 PM, Stephen Smith6 wrote: Hey folks - thought I'd check and see if anyone has ever tried to use ephemeral (tmpfs / ramfs based) boot disks for Ceph nodes? croit.io does that

[ceph-users] Re: Quick quota question

2021-03-17 Thread Stefan Kooman
On 3/17/21 11:28 AM, Andrew Walker-Brown wrote: Hi Magnus, Thanks for the reply. Just to be certain (I’m having a slow day today), it’s the amount of data stored by the clients. As an example. a pool using 3 replicas and a quota 3TB : clients would be able to create up to 3TB of data and Ce
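[Editor's note] Pool quotas are expressed in client-visible (stored) bytes, not raw bytes including replicas, which is the point being confirmed here. A sketch with a placeholder pool name:
  ceph osd pool set-quota mypool max_bytes 3298534883328   # 3 TiB of stored data
  ceph osd pool get-quota mypool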

[ceph-users] Re: ceph-ansible in Pacific and beyond?

2021-03-17 Thread Stefan Kooman
On 3/17/21 7:51 PM, Martin Verges wrote: I am still not convinced that containerizing everything brings any benefits except the collocation of services. Is there even a benefit? Decoupling from underlying host OS. On a test cluster I'm running Ubuntu Focal on the host (and a bunch of othe

[ceph-users] Re: ceph-ansible in Pacific and beyond?

2021-03-18 Thread Stefan Kooman
On 3/18/21 9:09 AM, Janne Johansson wrote: Den ons 17 mars 2021 kl 20:17 skrev Matthew H : "A containerized environment just makes troubleshooting more difficult, getting access and retrieving details on Ceph processes isn't as straightforward as with a non containerized infrastructure. I am

[ceph-users] Re: ceph octopus mysterious OSD crash

2021-03-18 Thread Stefan Kooman
On 3/18/21 9:28 PM, Philip Brown wrote: I've been banging on my ceph octopus test cluster for a few days now. 8 nodes. each node has 2 SSDs and 8 HDDs. They were all autoprovisioned so that each HDD gets an LVM slice of an SSD as a db partition. service_type: osd service_id: osd_spec_default pl

[ceph-users] Re: ceph octopus mysterious OSD crash

2021-03-18 Thread Stefan Kooman
On 3/19/21 2:20 AM, Philip Brown wrote: yup cephadm and orch was used to set all this up. Current state of things: ceph osd tree shows 33  hdd  1.84698  osd.33  destroyed  0  1.0 ^^ Destroyed, ehh, this doesn't look good to me. Ceph thinks this OSD is dest

[ceph-users] Re: ceph-ansible in Pacific and beyond?

2021-03-19 Thread Stefan Kooman
On 3/17/21 5:50 PM, Matthew Vernon wrote: Hi, I caught up with Sage's talk on what to expect in Pacific ( https://www.youtube.com/watch?v=PVtn53MbxTc ) and there was no mention of ceph-ansible at all. Is it going to continue to be supported? We use it (and uncontainerised packages) for all

[ceph-users] Re: ceph octopus mysterious OSD crash

2021-03-19 Thread Stefan Kooman
On 3/19/21 3:53 PM, Philip Brown wrote: mkay. Sooo... what's the new and nifty proper way to clean this up? The outsider's view is, "I should just be able to run 'ceph orch osd rm 33'" Can you spawn a cephadm shell and run: ceph osd rm 33? And / or: ceph osd crush rm 33, or try to do it with

[ceph-users] Re: ceph octopus mysterious OSD crash

2021-03-19 Thread Stefan Kooman
On 3/19/21 6:22 PM, Philip Brown wrote: I made *some* progress for cleanup. I could already do "ceph osd rm 33" from my master. But doing the cleanup on the actual OSD node was problematical. ceph-volume lvm zap xxx wasnt working properly.. because the device wasnt fully released because a
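[Editor's note] For anyone hitting the same situation, a hedged outline of the replacement flow on Octopus (device path and OSD id are examples; check the current docs before running any of it):
  ceph orch osd rm 33 --replace             # drain and mark the OSD destroyed, keeping the id for reuse
  ceph osd purge 33 --yes-i-really-mean-it  # alternative: remove the id completely instead of reusing it
  ceph-volume lvm zap --destroy /dev/sdX    # on the OSD host, wipe the old LVs/device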

[ceph-users] Re: high number of kernel clients per osd slow down

2021-03-19 Thread Stefan Kooman
On 3/19/21 7:20 PM, Andrej Filipcic wrote: Hi, I am testing 15.2.10 on a large cluster (RH8). cephfs pool (size=1) with 122 nvme OSDs works fine till the number of clients is relatively low. Writing from 400 kernel clients (ior benchmark), 8 streams each, causes issues. Writes are initially f

[ceph-users] Re: ceph octopus mysterious OSD crash

2021-03-19 Thread Stefan Kooman
On 3/19/21 7:47 PM, Philip Brown wrote: I see. I dont think it works when 7/8 devices are already configured, and the SSD is already mostly sliced. OK. If it is a test cluster you might just blow it all away. By doing this you are simulating a "SSD" failure taking down all HDDs with it. It

[ceph-users] Re: ceph octopus mysterious OSD crash

2021-03-19 Thread Stefan Kooman
On 3/19/21 9:11 PM, Philip Brown wrote: if we cant replace a drive on a node in a crash situation, without blowing away the entire node seems to me ceph octopus fails the "test" part of the "test cluster" :-/ I agree. This should not be necessary. And I'm sure there is, or there will be f

[ceph-users] Re: Device class not deleted/set correctly

2021-03-23 Thread Stefan Kooman
On 3/22/21 3:52 PM, Nico Schottelius wrote: Hello, follow up from my mail from 2020 [0], it seems that OSDs sometimes have "multiple classes" assigned: [15:47:15] server6.place6:/var/lib/ceph/osd/ceph-4# ceph osd crush rm-device-class osd.4 done removing class of osd(s): 4 [15:47:17] server6.

[ceph-users] Re: Device class not deleted/set correctly

2021-03-23 Thread Stefan Kooman
On 3/23/21 11:00 AM, Nico Schottelius wrote: Stefan Kooman writes: OSDs from the wrong class (hdd). Does anyone have a hint on how to fix this? Do you have: osd_class_update_on_start enabled? So this one is a bit funky. It seems to be off, but the behaviour would indicate it isn&#
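[Editor's note] The relevant commands and the config option discussed here, for reference (OSD id is an example):
  ceph osd crush rm-device-class osd.4
  ceph osd crush set-device-class hdd osd.4
  ceph config set osd osd_class_update_on_start false   # keep OSDs from re-setting their class at boot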

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-23 Thread Stefan Kooman
On 3/23/21 8:29 AM, Dan van der Ster wrote: Hi Sam, Yeah somehow `lo:` is not getting skipped, probably due to those patches. (I guess it is because the 2nd patch looks for `lo:` but in fact the ifa_name is probably just `lo` without the colon) https://github.com/ceph/ceph/blob/master/src/

[ceph-users] Re: Advice needed: stuck cluster halfway upgraded, comms issues and MON space usage

2021-03-23 Thread Stefan Kooman
On 3/23/21 2:52 PM, Dan van der Ster wrote: Not sure. But anyway ceph has been skipping interfaces named "lo" since v10, but then dropped that in 14.2.18 (by accident, IMO). You should be able to get your osds listening to the correct IP using cluster network = 10.1.50.0/8 public network = 10.1

[ceph-users] Re: upgrade problem nautilus 14.2.15 -> 14.2.18? (Broken ceph!)

2021-03-25 Thread Stefan Kooman
On 3/25/21 8:47 PM, Simon Oosthoek wrote: On 25/03/2021 20:42, Dan van der Ster wrote: netstat -anp | grep LISTEN | grep mgr # netstat -anp | grep LISTEN | grep mgr tcp    0  0 127.0.0.1:6801  0.0.0.0:* LISTEN 1310/ceph-mgr tcp    0  0 127.0.0.1:6800  0.0.

[ceph-users] Re: Possible to update from luminous 12.2.8 to nautilus latest?

2021-03-26 Thread Stefan Kooman
On 3/26/21 6:35 AM, Szabo, Istvan (Agoda) wrote: Hi, Is it possible to do a big jump or needs to go slower to luminous latest, then mimic latest, then nautilus latest? You do that at once. But do read the upgrade documentation [1]: Especially this part: If your cluster was originally instal

[ceph-users] Re: Cephfs metadata and MDS on same node

2021-03-26 Thread Stefan Kooman
On 3/9/21 4:03 PM, Jesper Lykkegaard Karlsen wrote: Dear Ceph’ers I am about to upgrade MDS nodes for Cephfs in the Ceph cluster (erasure code 8+3 ) I am administrating. Since they will get plenty of memory and CPU cores, I was wondering if it would be a good idea to move metadata OSDs (NVMe'

[ceph-users] Re: memory consumption by osd

2021-03-29 Thread Stefan Kooman
On 3/28/21 4:58 AM, Tony Liu wrote: I don't see any problems yet. All OSDs are working fine. Just that 1.8GB free memory concerns me. I know 256GB memory for 10 OSDs (16TB HDD) is a lot, I am planning to reduce it or increate osd_memory_target (if that's what you meant) to boost performance. But
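[Editor's note] The knob referred to here (value is an example):
  ceph config set osd osd_memory_target 8589934592   # 8 GiB per OSD daemon
  ceph config get osd.0 osd_memory_target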

[ceph-users] Re: Nautilus: Reduce the number of managers

2021-03-29 Thread Stefan Kooman
On 3/28/21 3:52 PM, Dave Hall wrote: Hello, We are in the process of bringing new hardware online that will allow us to get all of the MGRs, MONs, MDSs, etc.  off of our OSD nodes and onto dedicated management nodes.   I've created MGRs and MONs on the new nodes, and I found procedures for di

[ceph-users] Re: Device class not deleted/set correctly

2021-03-30 Thread Stefan Kooman
On 3/25/21 1:05 PM, Nico Schottelius wrote: it seems there is no reference to it in the ceph documentation. Do you have any pointers to it? Not anymore with new Ceph documentation. Out of curiosity, do you have any clue why it's not in there anymore? It might still be, but I cannot find it

[ceph-users] Re: forceful remap PGs

2021-03-30 Thread Stefan Kooman
On 3/30/21 12:55 PM, Boris Behrens wrote: I just moved one PG away from the OSD, but the diskspace will not get freed. How did you move? I would suggest you use upmap: ceph osd pg-upmap-items Invalid command: missing required parameter pgid() osd pg-upmap-items <pgid> <id|osd.id> [<id|osd.id>...] : set pg_upm
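[Editor's note] A concrete (placeholder) invocation, since the usage string above is easy to misread — arguments are the PG id followed by from/to OSD pairs:
  ceph osd pg-upmap-items 1.7f 33 12   # remap PG 1.7f from osd.33 to osd.12
  ceph osd rm-pg-upmap-items 1.7f      # drop the exception again once space is freed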

[ceph-users] Re: Preferred order of operations when changing crush map and pool rules

2021-03-30 Thread Stefan Kooman
On 3/30/21 3:00 PM, Thomas Hukkelberg wrote: Any thoughts or insight on how to achieve this with minimal data movement and risk of cluster downtime would be welcome! I would do so with Dan's "upmap-remap" script [1]. See [2] for his presentation. We have used that quite a few times now (also

[ceph-users] Re: forceful remap PGs

2021-03-30 Thread Stefan Kooman
On 3/30/21 3:02 PM, Boris Behrens wrote: I reweighted the OSD to .0 and then forced the backfilling. How long does it take for ceph to free up space? I looks like it was doing this, but it could also be the "backup cleanup job" that removed images from the buckets. I don't have any numbers o

[ceph-users] Re: First 6 nodes cluster with Octopus

2021-03-31 Thread Stefan Kooman
On 3/30/21 9:02 PM, mabi wrote: Hello, I am planning to setup a small Ceph cluster for testing purpose with 6 Ubuntu nodes and have a few questions mostly regarding planning of the infra. 1) Based on the documentation the OS requirements mentions Ubuntu 18.04 LTS, is it ok to use Ubuntu 20.04
