[ceph-users] Re: doc: Is there a ceph-users list archive that archives before June 2019?

2025-07-02 Thread Gregory Farnum
Looks like marc.info has some: https://marc.info/?l=ceph-users&r=1&w=2 :) -Greg On Wed, Jul 2, 2025 at 5:45 AM Zac Dover wrote: > > Here is the archive of the ceph-users mailing list that docs.ceph.com and > ceph.io link to: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/ > > This arch

[ceph-users] Re: cephfs + SELinux: .snap folder is unlabeled

2025-06-26 Thread Gregory Farnum
Hi Fabian, how are you running Samba to do this? It sounds like you're mounting cephfs with the kernel and re-exporting by Samba? There are a whole bunch of ways that will get pretty messy. In this particular case: snapshot folders are fundamentally read only with Ceph and there's not a plausible w

[ceph-users] Re: Question: What's special about subvolumes?

2025-06-24 Thread Gregory Farnum
On Sat, Jun 21, 2025 at 12:23 AM Patrick Begou wrote: > > Hi James, > > this is an interesting discussion for me. This raises 2 questions for my > understanding of the concept: > > 1) do you mean snapshots are not taken in account for quotas ? A user > can remove files after a snapshot to retrieve

[ceph-users] Re: Limits on the number of cephx keys used

2025-06-10 Thread Gregory Farnum
I'm sure at some scale on some hardware it is possible to run into bottlenecks, but no reported issues with scaling come to mind. CephX keys are durably stored in the monitor's RocksDB instance, which it uses to store all of its data. This scales well but not infinitely, but I don't think we've ru

[ceph-users] Re: Spurious CephFS backtrace objects in non-default pool

2025-06-03 Thread Gregory Farnum
any issues to be caused by having the backtrace objects. We were mostly right…) On Tue, Jun 3, 2025 at 5:48 PM Hector Martin wrote: > > > On 2025/06/04 0:49, Gregory Farnum wrote: > > On Mon, Jun 2, 2025 at 5:45 PM Hector Martin wrote: > >> > >> On 2025/06/03 1:3

[ceph-users] Re: Spurious CephFS backtrace objects in non-default pool

2025-06-03 Thread Gregory Farnum
On Mon, Jun 2, 2025 at 5:45 PM Hector Martin wrote: > > On 2025/06/03 1:33, Gregory Farnum wrote: > > On Mon, Jun 2, 2025 at 6:12 AM Hector Martin wrote: > >> I guess this is so existing clients can find the real pool if they have > >> the old pool cached or some

[ceph-users] Re: Spurious CephFS backtrace objects in non-default pool

2025-06-02 Thread Gregory Farnum
On Mon, Jun 2, 2025 at 6:12 AM Hector Martin wrote: > I guess this is so existing clients can find the real pool if they have > the old pool cached or something? Yes, that's why. We don't have any tools to help clean it up as it "shouldn't" have an impact and is a necessary part of the protocol i

[ceph-users] Re: How does Ceph delete objects / data on disk?

2025-05-29 Thread Gregory Farnum
As others have suggested, it's not in any sense a "secure erase" where data is overwritten. At the RADOS level, when an OSD receives a delete operation, it twiddles the metadata so the object isn't there any more, and BlueStore marks the space as unallocated, etc. You could go in and get it with di

[ceph-users] Re: Rogue EXDEV errors when hardlinking

2025-03-21 Thread Gregory Farnum
Sounds like the scenario addressed in this PR: https://github.com/ceph/ceph/pull/47399 The tracker ticket it links indicates it should be fixed in reasonably modern point releases, but the PR has a better description of the issue. :) So it is presumably an older mds version, and the workload invol

[ceph-users] Re: Unintuitive (buggy?) CephFS behaviour when dealing with pool_namespace layout attribute

2025-03-05 Thread Gregory Farnum
This is certainly intended behavior. If you checked the layout on the particular file, you would see it hasn’t changed. Directory layouts are the default for new files, not a control mechanism for existing files. It might be confusing, so we can talk about different presentations if there’s a bett
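
A quick illustration of that distinction, assuming a CephFS mount at /mnt/cephfs and a directory that already holds a file (all paths hypothetical):

    # set a namespace on the directory -- this only becomes the default for new files
    setfattr -n ceph.dir.layout.pool_namespace -v myns /mnt/cephfs/dir
    # a pre-existing file keeps the layout it was created with
    getfattr -n ceph.file.layout /mnt/cephfs/dir/old_file
    # a file created afterwards inherits the directory layout
    touch /mnt/cephfs/dir/new_file
    getfattr -n ceph.file.layout /mnt/cephfs/dir/new_file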

[ceph-users] Re: ceph iscsi gateway

2025-02-10 Thread Gregory Farnum
The iSCSI gateway is likely to disappear in the future and is definitely in minimal maintenance mode right now. However, as with all removed features, if we do that it will have plenty of warning — we will announce it is deprecated in a major release without changing or removing it, and then remove

[ceph-users] Re: CephFS subdatapool in practice?

2025-02-05 Thread Gregory Farnum
On Thu, Jan 30, 2025 at 5:20 PM Otto Richter (Codeberg e.V.) wrote: > Any caveats to be aware of, such as overhead, latency, long-term > maintainability? > > > Since certain "interesting" features have been deprecated in the past, such as > inline_data, I thought it would be better to ask here for

[ceph-users] Ceph Tentacle release timeline — when?

2025-02-05 Thread Gregory Farnum
Hi all, We in the Ceph Steering Committee are discussing when we want to target the Tentacle release for, as we find ourselves in an unusual scheduling situation: * Historically, we have targeted our major release in early Spring. I believe this was initially aligned to the Ubuntu LTS release. (Wit

[ceph-users] Re: CephFS: Revert snapshot

2024-12-05 Thread Gregory Farnum
On Thu, Dec 5, 2024 at 1:37 PM Andre Tann wrote: > Hi all, > > in CephFS, creating and removing a snapshot is as easy as > > - mkdir .snap/SNAPSHOT_NAME > - rmdir .snap/SNAPSHOT_NAME > > Both steps are completed in no time, because no data has to be moved > around. > > However, wh

[ceph-users] Re: CephFS empty files in a Frankenstein system

2024-11-26 Thread Gregory Farnum
I haven’t gone through all the details since you seem to know you’ve done some odd stuff, but the “.snap” issue is because you’ve run into a CephFS feature which I recently discovered is embarrassingly under-documented: https://docs.ceph.com/en/reef/dev/cephfs-snapshots So that’s a special fake di

[ceph-users] Ceph Steering Committee 2024-11-25

2024-11-25 Thread Gregory Farnum
Another light meeting (we're appreciating it after our heavy governance discussions!): * Cancel next week due to Cephalocon travel * We were blocked on the quincy release due to some build issues with Ganesha (apparently we were pointed at the wrong kind of CentOS repo, and then the team was asking

[ceph-users] Re: Ceph Octopus packages missing at download.ceph.com

2024-11-18 Thread Gregory Farnum
Octopus should be back; sorry for the inconvenience. That said, everybody should really have upgraded past that by now. :) -Greg On Sun, Nov 17, 2024 at 6:40 AM Tim Holloway wrote: > As to the comings and goings of Octopus from download.ceph.com I cannot > speak. I had enough grief when IBM Red

[ceph-users] Ceph Steering Committee 2024-11-11

2024-11-11 Thread Gregory Farnum
We had a short meeting today, with no pre-set topics. The main topic that came up is a bug detected on upgrades to squid when using the balancer in large clusters: https://tracker.ceph.com/issues/68657. The RADOS team would like to do a squid point release once our quincy release is out the door (i

[ceph-users] Re: MDS and stretched clusters

2024-11-01 Thread Gregory Farnum
On Thu, Oct 31, 2024 at 12:54 PM Sake Ceph wrote: > We're looking for the multiple mds daemons to be active in zone A and > standby(-replay) in zone B. > This scenario would also benefit people who have more powerfull hardware > in zone A than zone B. > > Kind regards, > Sake > > > Op 31-10-2024

[ceph-users] Re: MDS and stretched clusters

2024-10-29 Thread Gregory Farnum
No, unfortunately this needs to be done at a higher level and is not included in Ceph right now. Rook may be able to do this, but I don't think cephadm does. Adam, is there some way to finagle this with pod placement rules (ie, tagging nodes as mds and mds-standby, and then assigning special mds co

[ceph-users] Re: How Ceph cleans stale object on primary OSD failure?

2024-10-23 Thread Gregory Farnum
On Wed, Oct 23, 2024 at 5:44 AM Maged Mokhtar wrote: > > This is tricky but i think you are correct, B and C will keep the new > object copy and not revert to old object version (or not revert to > object removal if it had not existed). > To be 100% sure you have to dig into code to verify this.

[ceph-users] Re: The ceph monitor crashes every few days

2024-10-10 Thread Gregory Farnum
On Wed, Oct 9, 2024 at 7:28 AM 李明 wrote: > Hello, > > ceph version is 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) > nautilus (stable) > > and the rbd info command is also slow, sometimes it needs 6 seconds. rbd > snap create command takes 17 seconds. There is another cluster with the >

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Gregory Farnum
Yes, this was an old lesson and AFAIK nobody has intentionally pushed the bounds in a long time because it was a very painful lesson for anybody who ran into it. The main problem was the increase in ram use scaling with PGs, which in normal operation is often fine but as we all know balloons in fa

[ceph-users] (belated) CLT notes

2024-08-08 Thread Gregory Farnum
Hi folks, the CLT met on Monday August 5. We discussed a few topics: * The mailing lists are a problem to moderate right now with a huge increase in spam. We have two problems: 1) the moderation system's web front-end apparently isn't operational. That's getting fixed. 2) The moderation is a big l

[ceph-users] Re: Ceph tracker broken?

2024-07-01 Thread Gregory Farnum
You currently have "Email notifications" set to "For any event on all my projects". I believe that's the firehose setting, so I've gone ahead and changed it to "Only for things I watch or I'm involved in". I'm unaware of any reason that would have been changed on the back end, though there were som

[ceph-users] Re: How to setup NVMeoF?

2024-05-30 Thread Gregory Farnum
There's a major NVMe effort underway but it's not even merged to master yet, so I'm not sure how docs would have ended up in the Reef doc tree. :/ Zac, any idea? Can we pull this out? -Greg On Thu, May 30, 2024 at 7:03 AM Robert Sander wrote: > > Hi, > > On 5/30/24 14:18, Frédéric Nass wrote: >

[ceph-users] Re: cephfs-data-scan orphan objects while mds active?

2024-05-16 Thread Gregory Farnum
; --- > Olli Rajala - Lead TD > Anima Vitae Ltd. > www.anima.fi > --- > > > On Tue, May 14, 2024 at 9:41 AM Gregory Farnum wrote: > > > The cephfs-data-scan tools are built with the expectation that they'll > > be run off

[ceph-users] Re: cephfs-data-scan orphan objects while mds active?

2024-05-13 Thread Gregory Farnum
The cephfs-data-scan tools are built with the expectation that they'll be run offline. Some portion of them could be run without damaging the live filesystem (NOT all, and I'd have to dig in to check which is which), but they will detect inconsistencies that don't really exist (due to updates that

[ceph-users] Re: question about rbd_read_from_replica_policy

2024-04-04 Thread Gregory Farnum
On Thu, Apr 4, 2024 at 8:23 AM Anthony D'Atri wrote: > > Network RTT? No, it's sadly not that clever. There's a crush_location configurable that you can set on clients (to a host, or a datacenter, or any other CRUSH bucket), and as long as part of it matches the CRUSH map then it will feed IOs to
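
A hedged sketch of the client-side configuration Greg describes; the bucket names must match ones that actually exist in your CRUSH map, and the exact crush_location syntax should be checked against your release:

    [client]
        rbd_read_from_replica_policy = localize
        crush_location = datacenter=dc1 host=client-host-1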

[ceph-users] Re: Are we logging IRC channels?

2024-03-22 Thread Gregory Farnum
I put it on the list for the next CLT. :) (though I imagine it will move to the infrastructure meeting from there.) On Fri, Mar 22, 2024 at 4:42 PM Mark Nelson wrote: > Sure! I think Wido just did it all unofficially, but afaik we've lost > all of those records now. I don't know if Wido still

[ceph-users] Re: MDS_CLIENT_LATE_RELEASE, MDS_SLOW_METADATA_IO, and MDS_SLOW_REQUEST errors and slow osd_ops despite hardware being fine

2024-03-15 Thread Gregory Farnum
On Fri, Mar 15, 2024 at 6:15 AM Ivan Clayson wrote: > Hello everyone, > > We've been experiencing on our quincy CephFS clusters (one 17.2.6 and > another 17.2.7) repeated slow ops with our client kernel mounts > (Ceph 17.2.7 and version 4 Linux kernels on all clients) that seem to > originate fro

[ceph-users] Re: Telemetry endpoint down?

2024-03-11 Thread Gregory Farnum
We had a lab outage Thursday and it looks like this service wasn’t restarted after that occurred. Fixed now and we’ll look at how to prevent that in future. -Greg On Mon, Mar 11, 2024 at 6:46 AM Konstantin Shalygin wrote: > Hi, seems telemetry endpoint is down for a some days? We have connection

[ceph-users] Re: Ceph-storage slack access

2024-03-08 Thread Gregory Farnum
Much of our infrastructure (including website) was down for ~6 hours yesterday. Some information on the sepia list, and more in the slack/irc channel. -Greg On Fri, Mar 8, 2024 at 9:48 AM Zac Dover wrote: > > I ping www.ceph.io and ceph.io with no difficulty: > > > zdover@NUC8i7BEH:~$ ping www.ce

[ceph-users] Re: Minimum amount of nodes needed for stretch mode?

2024-03-07 Thread Gregory Farnum
On Thu, Mar 7, 2024 at 9:09 AM Stefan Kooman wrote: > > Hi, > > TL;DR > > Failure domain considered is data center. Cluster in stretch mode [1]. > > - What is the minimum amount of monitor nodes (apart from tie breaker) > needed per failure domain? You need at least two monitors per site. This is
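
For reference, a sketch of the documented stretch-mode setup with two monitors per data center plus a tie breaker (monitor names, sites and the CRUSH rule name are placeholders):

    ceph mon set_location a datacenter=dc1
    ceph mon set_location b datacenter=dc1
    ceph mon set_location c datacenter=dc2
    ceph mon set_location d datacenter=dc2
    ceph mon set_location e datacenter=dc3   # tie breaker
    ceph mon set election_strategy connectivity
    ceph mon enable_stretch_mode e stretch_rule datacenter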

[ceph-users] Re: Ceph-storage slack access

2024-03-07 Thread Gregory Farnum
The slack workspace is bridged to our also-published irc channels. I don't think we've done anything to enable xmpp (and two protocols is enough work to keep alive!). -Greg On Wed, Mar 6, 2024 at 9:07 AM Marc wrote: > > Is it possible to access this also with xmpp? > > > > > At the very bottom of

[ceph-users] Re: Ceph-storage slack access

2024-03-06 Thread Gregory Farnum
On Wed, Mar 6, 2024 at 8:56 AM Matthew Vernon wrote: > > Hi, > > On 06/03/2024 16:49, Gregory Farnum wrote: > > Has the link on the website broken? https://ceph.com/en/community/connect/ > > We've had trouble keeping it alive in the past (getting a non-expiring > &

[ceph-users] Re: Ceph-storage slack access

2024-03-06 Thread Gregory Farnum
Has the link on the website broken? https://ceph.com/en/community/connect/ We've had trouble keeping it alive in the past (getting a non-expiring invite), but I thought that was finally sorted out. -Greg On Wed, Mar 6, 2024 at 8:46 AM Matthew Vernon wrote: > > Hi, > > How does one get an invite t

[ceph-users] Re: cephfs inode backtrace information

2024-01-31 Thread Gregory Farnum
The docs recommend a fast SSD pool for the CephFS *metadata*, but the default data pool can be more flexible. The backtraces are relatively small — it's an encoded version of the path an inode is located at, plus the RADOS hobject, which is probably more of the space usage. So it should fit fine in
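
If the goal is to keep a small, fast default data pool (which then mostly holds backtraces) and steer bulk data elsewhere, a minimal sketch with hypothetical pool and path names is:

    ceph fs add_data_pool cephfs cephfs_bulk_data
    setfattr -n ceph.dir.layout.pool -v cephfs_bulk_data /mnt/cephfs/bulk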

[ceph-users] Re: Debian 12 support

2023-11-15 Thread Gregory Farnum
There are versioning and dependency issues (both of packages, and compiler toolchain pieces) which mean that the existing reef releases do not build on Debian. Our upstream support for Debian has always been inconsistent because we don’t have anybody dedicated or involved enough in both Debian and

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-01 Thread Gregory Farnum
We have seen issues like this a few times and they have all been kernel client bugs with CephFS’ internal “capability” file locking protocol. I’m not aware of any extant bugs like this in our code base, but kernel patches can take a long and winding path before they end up on deployed systems. Mos

[ceph-users] Re: Not able to find a standardized restoration procedure for subvolume snapshots.

2023-09-27 Thread Gregory Farnum
Unfortunately, there’s not any such ability. We are starting long-term work on making this smoother, but CephFS snapshots are read-only and there’s no good way to do a constant-time or low-time “clone” operation, so you just have to copy the data somewhere and start work on it from that position :/
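
In practice, restoring therefore means copying out of the read-only .snap tree, e.g. (paths and snapshot name hypothetical):

    cp -a /mnt/cephfs/mydir/.snap/before-upgrade/important_file /mnt/cephfs/mydir/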

[ceph-users] Re: CVE-2023-43040 - Improperly verified POST keys in Ceph RGW?

2023-09-27 Thread Gregory Farnum
We discussed this in the CLT today and Casey can talk more about the impact and technical state of affairs. This was disclosed on the security list and it’s rated as a bug that did not require hotfix releases due to the limited escalation scope. -Greg On Wed, Sep 27, 2023 at 1:37 AM Christian Roh

[ceph-users] Ceph leadership team notes 9/27

2023-09-27 Thread Gregory Farnum
Hi everybody, The CLT met today as usual. We only had a few topics under discussion: * the User + Dev relaunch went off well! We’d like reliable recordings and have found Jitsi to be somewhat glitchy; Laura will communicate about workarounds for that while we work on a longer-term solution (self-ho

[ceph-users] Re: RHEL / CephFS / Pacific / SELinux unavoidable "relabel inode" error?

2023-08-02 Thread Gregory Farnum
I don't think we've seen this reported before. SELinux gets a hefty workout from Red Hat with their downstream ODF for OpenShift (Kubernetes), so it certainly works at a basic level. SELinux is a fussy beast though, so if you're eg mounting CephFS across RHEL nodes and invoking SELinux against it,

[ceph-users] Re: CephFS snapshots: impact of moving data

2023-07-06 Thread Gregory Farnum
Moving files around within the namespace never changes the way the file data is represented within RADOS. It’s just twiddling metadata bits. :) -Greg On Thu, Jul 6, 2023 at 3:26 PM Dan van der Ster wrote: > Hi Mathias, > > Provided that both subdirs are within the same snap context (subdirs belo

[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-05-25 Thread Gregory Farnum
I haven’t checked the logs, but the most obvious way this happens is if the mtime set on the directory is in the future compared to the time on the client or server making changes — CephFS does not move times backwards. (This causes some problems but prevents many, many others when times are not sy

[ceph-users] Re: [EXTERN] Re: cephfs max_file_size

2023-05-24 Thread Gregory Farnum
On Tue, May 23, 2023 at 11:52 PM Dietmar Rieder wrote: > > On 5/23/23 15:58, Gregory Farnum wrote: > > On Tue, May 23, 2023 at 3:28 AM Dietmar Rieder > > wrote: > >> > >> Hi, > >> > >> can the cephfs "max_file_size" setting be chan

[ceph-users] Re: [Help appreciated] ceph mds damaged

2023-05-23 Thread Gregory Farnum
On Tue, May 23, 2023 at 1:55 PM Justin Li wrote: > > Dear All, > > After an unsuccessful upgrade to pacific, MDS were offline and could not get > back on. Checked the MDS log and found below. See cluster info from below as > well. Appreciate it if anyone can point me to the right direction. Thank

[ceph-users] Re: cephfs max_file_size

2023-05-23 Thread Gregory Farnum
On Tue, May 23, 2023 at 3:28 AM Dietmar Rieder wrote: > > Hi, > > can the cephfs "max_file_size" setting be changed at any point in the > lifetime of a cephfs? > Or is it critical for existing data if it is changed after some time? Is > there anything to consider when changing, let's say, from 1TB
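
For reference, the setting is changed per filesystem; a sketch with a placeholder filesystem name:

    # raise the limit to 4 TiB for the filesystem named "cephfs"
    ceph fs set cephfs max_file_size 4398046511104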

[ceph-users] Re: mds dump inode crashes file system

2023-05-16 Thread Gregory Farnum
On Fri, May 12, 2023 at 5:28 AM Frank Schilder wrote: > > Dear Xiubo and others. > > >> I have never heard about that option until now. How do I check that and > >> how to I disable it if necessary? > >> I'm in meetings pretty much all day and will try to send some more info > >> later. > > > >

[ceph-users] Re: mds dump inode crashes file system

2023-05-10 Thread Gregory Farnum
to head home > now ... > > Thanks and best regards, > = > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > > From: Gregory Farnum > Sent: Wednesday, May 10, 2023 4:26 PM > To: Frank Schilder >

[ceph-users] Re: mds dump inode crashes file system

2023-05-10 Thread Gregory Farnum
This is a very strange assert to be hitting. From a code skim my best guess is the inode somehow has an xattr with no value, but that's just a guess and I've no idea how it would happen. Somebody recently pointed you at the (more complicated) way of identifying an inode path by looking at its RADOS

[ceph-users] Re: quincy 17.2.6 - write performance continuously slowing down until OSD restart needed

2023-05-02 Thread Gregory Farnum
On Tue, May 2, 2023 at 7:54 AM Igor Fedotov wrote: > > > On 5/2/2023 11:32 AM, Nikola Ciprich wrote: > > I've updated cluster to 17.2.6 some time ago, but the problem persists. > > This is > > especially annoying in connection with https://tracker.ceph.com/issues/56896 > > as restarting OSDs is q

[ceph-users] Re: Ceph stretch mode / POOL_BACKFILLFULL

2023-04-27 Thread Gregory Farnum
On Fri, Apr 21, 2023 at 7:26 AM Kilian Ries wrote: > > Still didn't find out what will happen when the pool is full - but tried a > little bit in our testing environment and i were not able to get the pool > full before an OSD got full. So in first place one OSD reached the full ratio > (pool n

[ceph-users] Re: Bug, pg_upmap_primaries.empty()

2023-04-26 Thread Gregory Farnum
Looks like you've somehow managed to enable the upmap balancer while allowing a client that's too old to understand it to mount. Radek, this is a commit from yesterday; is it a known issue? On Wed, Apr 26, 2023 at 7:49 AM Nguetchouang Ngongang Kevin wrote: > > Good morning, i found a bug on cep

[ceph-users] Re: CephFS thrashing through the page cache

2023-04-05 Thread Gregory Farnum
On Fri, Mar 17, 2023 at 1:56 AM Ashu Pachauri wrote: > > Hi Xiubo, > > As you have correctly pointed out, I was talking about the stipe_unit > setting in the file layout configuration. Here is the documentation for > that for anyone else's reference: > https://docs.ceph.com/en/quincy/cephfs/file-l

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-22 Thread Gregory Farnum
> Best regards, > = > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > > From: Gregory Farnum > Sent: Wednesday, March 22, 2023 4:14 PM > To: Frank Schilder > Cc: ceph-users@ceph.io > Subject: Re: [ceph-users] Re: ln: failed to c

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-22 Thread Gregory Farnum
Do you have logs of what the nfs server is doing? Managed to reproduce it in terms of direct CephFS ops? On Wed, Mar 22, 2023 at 8:05 AM Frank Schilder wrote: > I have to correct myself. It also fails on an export with "sync" mode. > Here is an strace on the client (strace ln envs/satwindspy/in

[ceph-users] Re: mds damage cannot repair

2023-02-13 Thread Gregory Farnum
A "backtrace" is an xattr on the RADOS object storing data for a given file, and it contains the file's (versioned) path from the root. So a bad backtrace means there's something wrong with that — possibly just that there's a bug in the version of the code that's checking it, because they're genera

[ceph-users] Re: Health warning - POOL_TARGET_SIZE_BYTES_OVERCOMMITED

2023-02-13 Thread Gregory Farnum
On Mon, Feb 13, 2023 at 4:16 AM Sake Paulusma wrote: > > Hello, > > I configured a stretched cluster on two datacenters. It's working fine, > except this weekend the Raw Capicity exceeded 50% and the error > POOL_TARGET_SIZE_BYTES_OVERCOMMITED showed up. > > The command "ceph df" is showing the

[ceph-users] Re: Frequent calling monitor election

2023-02-09 Thread Gregory Farnum
Also, that the current leader (ceph-01) is one of the monitors proposing an election each time suggests the problem is with getting commit acks back from one of its followers. On Thu, Feb 9, 2023 at 8:09 AM Dan van der Ster wrote: > > Hi Frank, > > Check the mon logs with some increased debug lev

[ceph-users] Re: MDS_DAMAGE dir_frag

2022-12-12 Thread Gregory Farnum
On Mon, Dec 12, 2022 at 12:10 PM Sascha Lucas wrote: > Hi Dhairya, > > On Mon, 12 Dec 2022, Dhairya Parmar wrote: > > > You might want to look at [1] for this, also I found a relevant thread > [2] > > that could be helpful. > > > > Thanks a lot. I already found [1,2], too. But I did not considere

[ceph-users] Re: what happens if a server crashes with cephfs?

2022-12-08 Thread Gregory Farnum
Manuel Holtgrewe > Sent: Thursday, December 8, 2022 12:38 PM > To: Charles Hedrick > Cc: Gregory Farnum ; Dhairya Parmar ; > ceph-users@ceph.io > Subject: Re: [ceph-users] Re: what happens if a server crashes with cephfs? > > Hi Charles, > > are you concerned with a singl

[ceph-users] Re: what happens if a server crashes with cephfs?

2022-12-08 Thread Gregory Farnum
>> thanks. I'm evaluating cephfs for a computer science dept. We have users >> that run week-long AI training jobs. They use standard packages, which they >> probably don't want to modify. At the moment we use NFS. It uses synchronous >> I/O, so if something goes wr

[ceph-users] Re: what happens if a server crashes with cephfs?

2022-12-07 Thread Gregory Farnum
More generally, as Manuel noted you can (and should!) make use of fsync et al for data safety. Ceph’s async operations are not any different at the application layer from how data you send to the hard drive can sit around in volatile caches until a consistency point like fsync is invoked. -Greg On

[ceph-users] Re: Implications of pglog_hardlimit

2022-11-29 Thread Gregory Farnum
On Tue, Nov 29, 2022 at 1:18 PM Joshua Timmer wrote: > I've got a cluster in a precarious state because several nodes have run > out of memory due to extremely large pg logs on the osds. I came across > the pglog_hardlimit flag which sounds like the solution to the issue, > but I'm concerned that
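
For context, the flag is set cluster-wide and the limits it enforces are the pg log tunables; a hedged sketch (values are examples, not recommendations):

    ceph osd set pglog_hardlimit
    ceph config set osd osd_max_pg_log_entries 10000
    ceph config set osd osd_min_pg_log_entries 3000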

[ceph-users] Re: CephFS performance

2022-11-22 Thread Gregory Farnum
In addition to not having resiliency by default, my recollection is that BeeGFS also doesn't guarantee metadata durability in the event of a crash or hardware failure like CephFS does. There's not really a way for us to catch up to their "in-memory metadata IOPS" with our "on-disk metadata IOPS". :

[ceph-users] Re: 16.2.11 branch

2022-10-31 Thread Gregory Farnum
On Fri, Oct 28, 2022 at 8:51 AM Laura Flores wrote: > > Hi Christian, > > There also is https://tracker.ceph.com/versions/656 which seems to be > > tracking > > the open issues tagged for this particular point release. > > > > Yes, thank you for providing the link. > > If you don't mind me asking

[ceph-users] Re: Slow monitor responses for rbd ls etc.

2022-10-18 Thread Gregory Farnum
On Fri, Oct 7, 2022 at 7:53 AM Sven Barczyk wrote: > > Hello, > > > > we are encountering a strange behavior on our Ceph. (All Ubuntu 20 / All > mons Quincy 17.2.4 / Oldest OSD Quincy 17.2.0 ) > Administrative commands like rbd ls or create are so slow, that libvirtd is > running into timeouts and

[ceph-users] Re: disable stretch_mode possible?

2022-10-17 Thread Gregory Farnum
On Mon, Oct 17, 2022 at 4:40 AM Enrico Bocchi wrote: > > Hi, > > I have played with stretch clusters a bit but never managed to > un-stretch them fully. > > From my experience (using Pacific 16.2.9), once the stretch mode is > enabled, the replicated pools switch to the stretch_rule with size 4,

[ceph-users] Re: CLT meeting summary 2022-09-28

2022-09-28 Thread Gregory Farnum
On Wed, Sep 28, 2022 at 9:15 AM Adam King wrote: > Budget Discussion > >- Going to investigate current resources being used, see if any costs >can be cut >- What can be moved from virtual environments to internal ones? >- Need to take inventory of what resources we currently have

[ceph-users] Re: Power outage recovery

2022-09-15 Thread Gregory Farnum
Recovery from OSDs loses the mds and rgw keys they use to authenticate with cephx. You need to get those set up again by using the auth commands. I don’t have them handy but it is discussed in the mailing list archives. -Greg On Thu, Sep 15, 2022 at 3:28 PM Jorge Garcia wrote: > Yes, I tried res
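
As a sketch of the kind of command involved, re-creating an MDS key looks roughly like this (daemon id and keyring path are placeholders; check the caps against the docs for your release):

    ceph auth get-or-create mds.a mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' \
        > /var/lib/ceph/mds/ceph-a/keyring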

[ceph-users] Re: data usage growing despite data being written

2022-09-07 Thread Gregory Farnum
> Is there a way to find out how many osdmaps are currently being kept? > ____ > From: Gregory Farnum > Sent: Wednesday, September 7, 2022 10:58 AM > To: Wyll Ingersoll > Cc: ceph-users@ceph.io > Subject: Re: [ceph-users] data usage growing desp

[ceph-users] Re: data usage growing despite data being written

2022-09-07 Thread Gregory Farnum
tore-tool to migrate PGs to their proper destinations (letting the cluster clean up the excess copies if you can afford to — deleting things is always scary). But I haven't had to help recover a death-looping cluster in around a decade, so that's about all the options I can offer up. -Gre

[ceph-users] Re: cephfs blocklist recovery and recover_session mount option

2022-09-07 Thread Gregory Farnum
On Tue, Aug 16, 2022 at 3:14 PM Vladimir Brik wrote: > > Hello > > I'd like to understand what is the proper/safe way to > recover when a cephfs client becomes blocklisted by the MDS. > > The man page of mount.ceph talks about recover_session=clean > option, but it has the following text I am not
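
For reference, the option is passed at mount time on the kernel client, e.g. (client name and mountpoint are placeholders):

    mount -t ceph :/ /mnt/cephfs -o name=myclient,recover_session=clean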

[ceph-users] Re: data usage growing despite data being written

2022-09-07 Thread Gregory Farnum
On Tue, Sep 6, 2022 at 2:08 PM Wyll Ingersoll wrote: > > > Our cluster has not had any data written to it externally in several weeks, > but yet the overall data usage has been growing. > Is this due to heavy recovery activity? If so, what can be done (if > anything) to reduce the data generate

[ceph-users] Re: [Help] Does MSGR2 protocol use openssl for encryption

2022-09-02 Thread Gregory Farnum
We partly rolled our own with AES-GCM. See https://docs.ceph.com/en/quincy/rados/configuration/msgr2/#connection-modes and https://docs.ceph.com/en/quincy/dev/msgr2/#frame-format -Greg On Wed, Aug 24, 2022 at 4:50 PM Jinhao Hu wrote: > > Hi, > > I have a question about the MSGR protocol Ceph used
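
The connection modes in the first link are selected with the ms_*_mode options; a hedged example that forces the AES-GCM 'secure' mode everywhere:

    ceph config set global ms_cluster_mode secure
    ceph config set global ms_service_mode secure
    ceph config set global ms_client_mode secure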

[ceph-users] Re: CephFS MDS sizing

2022-09-02 Thread Gregory Farnum
On Sun, Aug 28, 2022 at 12:19 PM Vladimir Brik wrote: > > Hello > > Is there a way to query or get an approximate value of an > MDS's cache hit ratio without using "dump loads" command > (which seems to be a relatively expensive operation) for > monitoring and such? Unfortunately, I'm not seeing o

[ceph-users] Re: Changing the cluster network range

2022-09-02 Thread Gregory Farnum
On Mon, Aug 29, 2022 at 12:49 AM Burkhard Linke wrote: > > Hi, > > > some years ago we changed our setup from a IPoIB cluster network to a > single network setup, which is a similar operation. > > > The OSD use the cluster network for heartbeats and backfilling > operation; both use standard tcp c

[ceph-users] Re: Potential bug in cephfs-data-scan?

2022-08-25 Thread Gregory Farnum
On Fri, Aug 19, 2022 at 7:17 AM Patrick Donnelly wrote: > > On Fri, Aug 19, 2022 at 5:02 AM Jesper Lykkegaard Karlsen > wrote: > > > > Hi, > > > > I have recently been scanning the files in a PG with "cephfs-data-scan > > pg_files ...". > > Why? > > > Although, after a long time the scan was sti

[ceph-users] Re: CephFS perforamnce degradation in root directory

2022-08-15 Thread Gregory Farnum
I was wondering if it had something to do with quota enforcement. The other possibility that occurs to me is if other clients are monitoring the system, or an admin pane (eg the dashboard) is displaying per-volume or per-client stats, they may be poking at the mountpoint and interrupting exclusive

[ceph-users] Re: linux distro requirements for reef

2022-08-10 Thread Gregory Farnum
The immediate driver is both a switch to newer versions of python, and to newer compilers supporting more C++20 features. More generally, supporting multiple versions of a distribution is a lot of work and when Reef comes out next year, CentOS9 will be over a year old. We generally move new stable

[ceph-users] Re: Upgrade from Octopus to Pacific cannot get monitor to join

2022-07-28 Thread Gregory Farnum
ode' they are running because it's > all in docker containers. But maybe I'm missing something obvious > > Thanks > > > > > July 27, 2022 4:34 PM, "Gregory Farnum" wrote: > > On Wed, Jul 27, 2022 at 10:24 AM wrote: > > Currently running Oc

[ceph-users] Re: Ceph Stretch Cluster - df pool size (Max Avail)

2022-07-28 Thread Gregory Farnum
https://tracker.ceph.com/issues/56650 There's a PR in progress to resolve this issue now. (Thanks, Prashant!) -Greg On Thu, Jul 28, 2022 at 7:52 AM Nicolas FONTAINE wrote: > > Hello, > > We have exactly the same problem. Did you find an answer or should we > open a bug report? > > Sincerely, > >

[ceph-users] Re: cannot set quota on ceph fs root

2022-07-28 Thread Gregory Farnum
On Thu, Jul 28, 2022 at 1:01 AM Frank Schilder wrote: > > Hi all, > > I'm trying to set a quota on the ceph fs file system root, but it fails with > "setfattr: /mnt/adm/cephfs: Invalid argument". I can set quotas on any > sub-directory. Is this intentional? The documentation > (https://docs.cep
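
For comparison, the same vxattr works as expected on a sub-directory of a mounted filesystem, e.g.:

    # 100 GB quota on a sub-directory (setting it on the FS root returns EINVAL here)
    setfattr -n ceph.quota.max_bytes -v 100000000000 /mnt/cephfs/projects
    getfattr -n ceph.quota.max_bytes /mnt/cephfs/projects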

[ceph-users] Re: Cluster running without monitors

2022-07-28 Thread Gregory Farnum
On Thu, Jul 28, 2022 at 5:32 AM Johannes Liebl wrote: > > Hi Ceph Users, > > > I am currently evaluating different cluster layouts and as a test I stopped > two of my three monitors while client traffic was running on the nodes. > > > Only when I restarted an OSD all PGs which were related to th

[ceph-users] Re: Upgrade from Octopus to Pacific cannot get monitor to join

2022-07-27 Thread Gregory Farnum
On Wed, Jul 27, 2022 at 10:24 AM wrote: > Currently running Octopus 15.2.16, trying to upgrade to Pacific using > cephadm. > > 3 mon nodes running 15.2.16 > 2 mgr nodes running 16.2.9 > 15 OSD's running 15.2.16 > > The mon/mgr nodes are running in lxc containers on Ubuntu running docker > from th

[ceph-users] Re: octopus v15.2.17 QE Validation status

2022-07-26 Thread Gregory Farnum
t us know. -Greg > > On Tue, Jul 26, 2022 at 3:16 PM Gregory Farnum wrote: > > > > We can’t do the final release until the recent mgr/volumes security > fixes get merged in, though. > > https://github.com/ceph/ceph/pull/47236 > > > > On Tue, Jul 26, 202

[ceph-users] Re: octopus v15.2.17 QE Validation status

2022-07-26 Thread Gregory Farnum
We can’t do the final release until the recent mgr/volumes security fixes get merged in, though. https://github.com/ceph/ceph/pull/47236 On Tue, Jul 26, 2022 at 3:12 PM Ramana Krisna Venkatesh Raja < rr...@redhat.com> wrote: > On Thu, Jul 21, 2022 at 10:28 AM Yuri Weinstein > wrote: > > > > Deta

[ceph-users] Re: LibCephFS Python Mount Failure

2022-07-26 Thread Gregory Farnum
It looks like you’re setting environment variables that force your new keyring, it you aren’t telling the library to use your new CephX user. So it opens your new keyring and looks for the default (client.admin) user and doesn’t get anything. -Greg On Tue, Jul 26, 2022 at 7:54 AM Adam Carrgilson
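
A minimal python-cephfs sketch of naming the CephX user explicitly instead of relying on environment variables (user name and keyring path are hypothetical):

    import cephfs

    fs = cephfs.LibCephFS(conffile='/etc/ceph/ceph.conf', auth_id='myuser')  # not client.admin
    fs.conf_set('keyring', '/etc/ceph/ceph.client.myuser.keyring')
    fs.mount()
    print(fs.statfs(b'/'))
    fs.shutdown()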

[ceph-users] Re: ceph-fs crashes on getfattr

2022-07-12 Thread Gregory Farnum
roken more or less all the way around. > > > > Best regards, > > = > > Frank Schilder > > AIT Risø Campus > > Bygning 109, rum S14 > > > > > > From: Gregory Farnum > > Sent: 11 July

[ceph-users] Re: ceph-fs crashes on getfattr

2022-07-11 Thread Gregory Farnum
On Mon, Jul 11, 2022 at 8:26 AM Frank Schilder wrote: > > Hi all, > > we made a very weird observation on our ceph test cluster today. A simple > getfattr with a misspelled attribute name sends the MDS cluster into a > crash+restart loop. Something as simple as > > getfattr -n ceph.dir.layout.

[ceph-users] Re: cephfs client permission restrictions?

2022-06-23 Thread Gregory Farnum
On Thu, Jun 23, 2022 at 8:18 AM Wyll Ingersoll wrote: > > Is it possible to craft a cephfs client authorization key that will allow the > client read/write access to a path within the FS, but NOT allow the client to > modify the permissions of that path? > For example, allow RW access to /cephfs
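
For reference, the plain path-restricted part of such a key is created with something like (filesystem, client and path names are placeholders):

    ceph fs authorize cephfs client.writer /data rw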

[ceph-users] Re: Possible to recover deleted files from CephFS?

2022-06-14 Thread Gregory Farnum
On Tue, Jun 14, 2022 at 8:50 AM Michael Sherman wrote: > > Hi, > > We discovered that a number of files were deleted from our cephfs filesystem, > and haven’t been able to find current backups or snapshots. > > Is it possible to “undelete” a file by modifying metadata? Using > `cephfs-journal-to

[ceph-users] Re: Feedback/questions regarding cephfs-mirror

2022-06-10 Thread Gregory Farnum
On Wed, Jun 8, 2022 at 12:36 AM Andreas Teuchert wrote: > > > Hello, > > we're currently evaluating cephfs-mirror. > > We have two data centers with one Ceph cluster in each DC. For now, the > Ceph clusters are only used for CephFS. On each cluster we have one FS > that contains a directory for cu

[ceph-users] Re: Ceph on RHEL 9

2022-06-10 Thread Gregory Farnum
We aren't building for Centos 9 yet, so I guess the python dependency declarations don't work with the versions in that release. I've put updating to 9 on the agenda for the next CLT. (Do note that we don't test upstream packages against RHEL, so if Centos Stream does something which doesn't match

[ceph-users] Re: Stretch cluster questions

2022-05-16 Thread Gregory Farnum
I'm not quite clear where the confusion is coming from here, but there are some misunderstandings. Let me go over it a bit: On Tue, May 10, 2022 at 1:29 AM Frank Schilder wrote: > > > What you are missing from stretch mode is that your CRUSH rule wouldn't > > guarantee at least one copy in surviv

[ceph-users] Re: repairing damaged cephfs_metadata pool

2022-05-16 Thread Gregory Farnum
On Tue, May 10, 2022 at 2:47 PM Horvath, Dustin Marshall wrote: > > Hi there, newcomer here. > > I've been trying to figure out if it's possible to repair or recover cephfs > after some unfortunate issues a couple of months ago; these couple of nodes > have been offline most of the time since th

[ceph-users] Re: Incomplete file write/read from Ceph FS

2022-05-06 Thread Gregory Farnum
Do you have any locking which guarantees that nodes don't copy files which are still in the process of being written? CephFS will guarantee any readers see the results of writes which are already reported complete while reading, but I don't see any guarantees about atomicity in https://docs.microso

[ceph-users] Re: [progress WARNING root] complete: ev ... does not exist, oh my!

2022-05-06 Thread Gregory Farnum
On Fri, May 6, 2022 at 5:58 AM Harry G. Coin wrote: > > I tried searching for the meaning of a ceph Quincy all caps WARNING > message, and failed. So I need help. Ceph tells me my cluster is > 'healthy', yet emits a bunch of 'progress WARNING root] comlete ev' ... > messages. Which I score rig
