[ceph-users] Re: cephadm and remoto package

2023-06-26 Thread Florian Haas
Hi Shashi, I just ran into this myself, and I thought I'd share the solution/workaround that I applied. On 15/05/2023 22:08, Shashi Dahal wrote: Hi, I followed this documentation: https://docs.ceph.com/en/pacific/cephadm/adoption/ This is the error I get when trying to enable cephadm. ceph
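For context, a minimal sketch of the adoption sequence from the linked Pacific guide; daemon names and the host name are placeholders, and the step that fails with the remoto error is presumably the module enable:

$ cephadm adopt --style legacy --name mon.ceph1    # convert an existing mon to cephadm management
$ cephadm adopt --style legacy --name mgr.ceph1    # likewise for the mgr
$ ceph mgr module enable cephadm                   # the "enable cephadm" step referred to above
$ ceph orch set backend cephadm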

[ceph-users] Re: A change in Ceph leadership...

2021-10-18 Thread Florian Haas
On 15/10/2021 17:13, Josh Durgin wrote: Thanks so much Sage, it's difficult to put into words how much you've done over the years. You're always a beacon of the best aspects of open source - kindness, wisdom, transparency, and authenticity. So many folks have learned so much from you, and that's

[ceph-users] BlueStore _txc_add_transaction errors (possibly related to bug #38724)

2019-08-09 Thread Florian Haas
Hi everyone, it seems there have been several reports in the past related to BlueStore OSDs crashing from unhandled errors in _txc_add_transaction: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-April/03.html http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-January/032172.ht

[ceph-users] Re: BlueStore _txc_add_transaction errors (possibly related to bug #38724)

2019-08-09 Thread Florian Haas
Hi Sage! Whoa that was quick. :) On 09/08/2019 16:27, Sage Weil wrote: >> https://tracker.ceph.com/issues/38724#note-26 > > { > "op_num": 2, > "op_name": "truncate", > "collection": "2.293_head", > "oid": > "#-4:c96337db:::temp_recovering_

[ceph-users] Re: BlueStore _txc_add_transaction errors (possibly related to bug #38724)

2019-08-14 Thread Florian Haas
Hi Tom, responding back on this briefly so that people are in the loop; I'll have more details in a blog post that I hope to get around to writing. On 12/08/2019 11:34, Thomas Byrne - UKRI STFC wrote: >> And bluestore should refuse to start if the configured limit is > 4GB. Or >> something alon
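The "configured limit" under discussion is, I assume, osd_max_object_size; a quick sketch of checking what a cluster and a running OSD actually use (option name is my assumption from context, not stated in the quoted text):

$ ceph config get osd osd_max_object_size          # value from the config database
$ ceph daemon osd.0 config get osd_max_object_size # value a running OSD is actually using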

[ceph-users] Re: BlueStore _txc_add_transaction errors (possibly related to bug #38724)

2019-08-14 Thread Florian Haas
On 12/08/2019 21:07, Alexandre Marangone wrote: >> rados -p volumes stat 'obj-vS6RN9\uQwvXU9DP' >> error stat-ing volumes/obj-vS6RN9\uQwvXU9DP: (2) No such file or directory > I believe you need to substitute \u with _ Yes indeed, thank you! Cheers, Florian __
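Spelled out, the substitution Alexandre suggested turns the failing stat into the working one (pool and object name taken from the quoted example):

$ rados -p volumes stat 'obj-vS6RN9\uQwvXU9DP'   # fails: (2) No such file or directory
$ rados -p volumes stat 'obj-vS6RN9_QwvXU9DP'    # works: \u in listings stands for an underscore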

[ceph-users] RBD, OpenStack Nova, libvirt, qemu-guest-agent, and FIFREEZE: is this working as intended?

2019-08-21 Thread Florian Haas
Hi everyone, apologies in advance; this will be long. It's also been through a bunch of edits and rewrites, so I don't know how well I'm expressing myself at this stage — please holler if anything is unclear and I'll be happy to try to clarify. I am currently in the process of investigating the b
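For anyone wanting to check the same behaviour on their own deployment, one way to see whether a freeze is actually issued through the guest agent is to query it directly via libvirt; a sketch only, with the domain name as a placeholder (these commands are not part of the original message):

$ virsh qemu-agent-command instance-00000001 '{"execute": "guest-fsfreeze-status"}'
$ virsh domfsfreeze instance-00000001   # freeze manually, for comparison
$ virsh domfsthaw instance-00000001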

[ceph-users] Re: RBD, OpenStack Nova, libvirt, qemu-guest-agent, and FIFREEZE: is this working as intended?

2019-08-21 Thread Florian Haas
Hi Jason! Thanks for the quick reply. On 21/08/2019 16:51, Jason Dillaman wrote: > > It just looks like this was an oversight from the OpenStack developers > when Nova RBD "direct" ephemeral image snapshot support was added [1]. > I would open a bug ticket against Nova for the issue. > > [1] https:/

[ceph-users] Re: RBD, OpenStack Nova, libvirt, qemu-guest-agent, and FIFREEZE: is this working as intended?

2019-08-21 Thread Florian Haas
On 21/08/2019 18:05, dhils...@performair.com wrote: > Florian; > > Forgive my lack of knowledge of OpenStack, and your environment / use case. > > Why would you need / want to snapshot an ephemeral disk? Isn't the point of > ephemeral storage to not be persistent? Fair point, but please consid

[ceph-users] Re: RBD, OpenStack Nova, libvirt, qemu-guest-agent, and FIFREEZE: is this working as intended?

2019-08-23 Thread Florian Haas
Just following up here to report back and close the loop: On 21/08/2019 16:51, Jason Dillaman wrote: > It just looks like this was an oversight from the OpenStack developers > when Nova RBD "direct" ephemeral image snapshot support was added [1]. > I would open a bug ticket against Nova for the is

[ceph-users] Luminous and mimic: adding OSD can crash mon(s) and lead to loss of quorum

2019-08-23 Thread Florian Haas
Hi everyone, there are a couple of bug reports about this in Redmine but only one (unanswered) mailing list message[1] that I could find. So I figured I'd raise the issue here again and copy the original reporters of the bugs (they are BCC'd, because in case they are no longer subscribed it wouldn

[ceph-users] Re: Luminous and mimic: adding OSD can crash mon(s) and lead to loss of quorum

2019-08-23 Thread Florian Haas
On 23/08/2019 13:34, Paul Emmerich wrote: > Is this reproducible with crushtool? Not for me. > ceph osd getcrushmap -o crushmap > crushtool -i crushmap --update-item XX 1.0 osd.XX --loc host > hostname-that-doesnt-exist-yet -o crushmap.modified > Replacing XX with the osd ID you tried to add. Ju
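Reformatted for readability, the reproduction Paul suggested, plus an offline sanity check of the modified map (the --test step is my addition, not part of the quoted message); XX stands for the OSD ID that was being added:

$ ceph osd getcrushmap -o crushmap
$ crushtool -i crushmap --update-item XX 1.0 osd.XX \
    --loc host hostname-that-doesnt-exist-yet -o crushmap.modified
$ crushtool -i crushmap.modified --test --show-statistics   # exercise the modified map offline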

[ceph-users] Re: Luminous and mimic: adding OSD can crash mon(s) and lead to loss of quorum

2019-08-26 Thread Florian Haas
On 23/08/2019 22:14, Paul Emmerich wrote: > On Fri, Aug 23, 2019 at 3:54 PM Florian Haas wrote: >> >> On 23/08/2019 13:34, Paul Emmerich wrote: >>> Is this reproducible with crushtool? >> >> Not for me. >> >>> ceph osd getcrushmap -o crushmap

[ceph-users] Heavily-linked lists.ceph.com pipermail archive now appears to lead to 404s

2019-08-29 Thread Florian Haas
Hi, is there any chance the list admins could copy the pipermail archive from lists.ceph.com over to lists.ceph.io? It seems to contain an awful lot of messages referred elsewhere by their archive URL, many (all?) of which appear to now lead to 404s. Example: google "Set existing pools to use hdd

[ceph-users] Re: Heavily-linked lists.ceph.com pipermail archive now appears to lead to 404s

2019-09-03 Thread Florian Haas
Hi, replying to my own message here in a shameless attempt to re-up this. I really hope that the list archive can be resurrected in one way or another... Cheers, Florian On 29/08/2019 15:00, Florian Haas wrote: > Hi, > > is there any chance the list admins could copy the pipermai

[ceph-users] Re: Heavily-linked lists.ceph.com pipermail archive now appears to lead to 404s

2019-09-05 Thread Florian Haas
On 03/09/2019 18:42, Ilya Dryomov wrote: > On Tue, Sep 3, 2019 at 6:29 PM Florian Haas wrote: >> >> Hi, >> >> replying to my own message here in a shameless attempt to re-up this. I >> really hope that the list archive can be resurrected in one way or >> an

[ceph-users] Large omap objects in radosgw .usage pool: is there a way to reshard the rgw usage log?

2019-10-09 Thread Florian Haas
Hi, I am currently dealing with a cluster that's been in use for 5 years and during that time, has never had its radosgw usage log trimmed. Now that the cluster has been upgraded to Nautilus (and has completed a full deep-scrub), it is in a permanent state of HEALTH_WARN because of one large omap
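For reference, old usage log entries can be removed with radosgw-admin; a minimal sketch, with the date range as a placeholder:

$ radosgw-admin usage show --show-log-sum=true                       # get a feel for what the log holds
$ radosgw-admin usage trim --start-date=2014-01-01 --end-date=2018-12-31   # drop entries in that range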

[ceph-users] Re: Large omap objects in radosgw .usage pool: is there a way to reshard the rgw usage log?

2019-10-09 Thread Florian Haas
On 09/10/2019 09:07, Florian Haas wrote: > Also, is anyone aware of any adverse side effects of increasing these > thresholds, and/or changing the usage log sharding settings, that I > should keep in mind here? Sorry, I should have checked the latest in the list archives; Paul Emmerich
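For anyone looking for the knobs in question: the thresholds are presumably the large-omap warning limits, which can be inspected (and, with care, raised) via the config database; option names are assumed from context rather than quoted from the thread:

$ ceph config get osd osd_deep_scrub_large_omap_object_key_threshold
$ ceph config get osd osd_deep_scrub_large_omap_object_value_sum_threshold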

[ceph-users] Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Florian Haas
Hello, I am running into an "interesting" issue with a PG that is being flagged as inconsistent during scrub (causing the cluster to go to HEALTH_ERR), but doesn't actually appear to contain any inconsistent objects. $ ceph health detail HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg incon
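For completeness, a sketch of the usual way to inspect a PG flagged like this, using the PG ID that appears later in the thread:

$ ceph health detail
$ rados list-inconsistent-obj 10.10d --format=json-pretty   # returns no inconsistent objects here, despite the scrub error
$ ceph pg deep-scrub 10.10d                                 # re-scrub to see whether the error persists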

[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Florian Haas
On 14/10/2019 13:20, Dan van der Ster wrote: > Hey Florian, > > What does the ceph.log ERR or ceph-osd log show for this inconsistency? > > -- Dan Hi Dan, what's in the log is (as far as I can see) consistent with the pg query output: 2019-10-14 08:33:57.345 7f1808fb3700 0 log_channel(cluster

[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Florian Haas
On 14/10/2019 13:29, Dan van der Ster wrote: >> Hi Dan, >> >> what's in the log is (as far as I can see) consistent with the pg query >> output: >> >> 2019-10-14 08:33:57.345 7f1808fb3700 0 log_channel(cluster) log [DBG] : >> 10.10d scrub starts >> 2019-10-14 08:33:57.345 7f1808fb3700 -1 log_chann

[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-14 Thread Florian Haas
On 14/10/2019 17:21, Dan van der Ster wrote: >> I'd appreciate a link to more information if you have one, but a PG >> autoscaling problem wouldn't really match with the issue already >> appearing in pre-Nautilus releases. :) > > https://github.com/ceph/ceph/pull/30479 Thanks! But no, this doesn'

[ceph-users] Re: Recurring issue: PG is inconsistent, but lists no inconsistent objects

2019-10-15 Thread Florian Haas
On 14/10/2019 22:57, Reed Dier wrote: > I had something slightly similar to you. > > However, my issue was specific/limited to the device_health_metrics pool > that is auto-created with 1 PG when you turn that mgr feature on. > > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg56315.htm

[ceph-users] Re: Static website hosting with RGW

2019-10-25 Thread Florian Haas
On 25/10/2019 02:38, Oliver Freyermuth wrote: > Also, if there's an expert on this: Exposing a bucket under a tenant as > static website is not possible since the colon (:) can't be encoded in DNS, > right? There are certainly much better-qualified radosgw experts than I am, but as I understand

[ceph-users] Re: Bogus Entries in RGW Usage Log / Large omap object in rgw.log pool

2019-10-29 Thread Florian Haas
Hi David, On 28/10/2019 20:44, David Monschein wrote: > Hi All, > > Running an object storage cluster, originally deployed with Nautilus > 14.2.1 and now running 14.2.4. > > Last week I was alerted to a new warning from my object storage cluster: > > [root@ceph1 ~]# ceph health detail > HEALTH_

[ceph-users] Quincy: osd_pool_default_crush_rule being ignored?

2024-09-24 Thread Florian Haas
Hello everyone, my cluster has two CRUSH rules: the default replicated_rule (rule_id 0), and another rule named rack-aware (rule_id 1). Now, if I'm not misreading the config reference, I should be able to define that all future-created pools use the rack-aware rule, by setting osd_pool_defau
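A sketch of reproducing the symptom described here, with rule ID 1 being the rack-aware rule and the pool name a placeholder; whether the option is set via ceph.conf or the config database shouldn't matter for the result:

$ ceph osd crush rule ls
$ ceph config set global osd_pool_default_crush_rule 1
$ ceph osd pool create testpool 32
$ ceph osd pool get testpool crush_rule   # expected: rack-aware; observed: replicated_rule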

[ceph-users] Re: Quincy: osd_pool_default_crush_rule being ignored?

2024-09-25 Thread Florian Haas
On 25/09/2024 09:05, Eugen Block wrote: Hi, for me this worked in a 17.2.7 cluster just fine Huh, interesting! (except for erasure-coded pools). Okay, *that* bit is expected. https://docs.ceph.com/en/quincy/rados/configuration/pool-pg-config-ref/#confval-osd_pool_default_crush_rule does

[ceph-users] Re: Quincy: osd_pool_default_crush_rule being ignored?

2024-09-25 Thread Florian Haas
On 25/09/2024 15:21, Eugen Block wrote: Hm, do you have any local ceph.conf on your client which has an override for this option as well? No. By the way, how do you bootstrap your cluster? Is it cephadm based? This one is bootstrapped (on Quincy) with ceph-ansible. And when the "ceph confi

[ceph-users] Re: Quincy: osd_pool_default_crush_rule being ignored?

2024-09-25 Thread Florian Haas
Hi Eugen, I've just torn down and completely respun my cluster, on 17.2.7. Recreated my CRUSH rule, set osd_pool_default_crush_rule to its rule_id, 1. Created a new pool. That new pool still has crush_rule 0, just as before and contrary to what you're seeing. I'm a bit puzzled, because I'm

[ceph-users] doc: https://docs.ceph.com/ root URL still redirects to Reef

2024-12-16 Thread Florian Haas
Hi everyone, A little while back I noticed that the docs.ceph.com root URL still redirects to the docs for Reef: $ curl -I https://docs.ceph.com/ HTTP/2 302 date: Fri, 06 Dec 2024 13:04:15 GMT content-type: text/html; charset=utf-8 content-length: 0 location: https://docs.ceph.com/en/reef/ Th

[ceph-users] Experimental upgrade of a Cephadm-managed Squid cluster to Ubuntu Noble (walk-through and RFC)

2024-12-17 Thread Florian Haas
Hi everyone, as part of keeping our Ceph course updated, we recently went through the *experimental* process of upgrading a Cephadm-managed cluster from Ubuntu Jammy to Noble. Note that at this point there are no community-built Ceph packages that are available for Noble, though there *are* p

[ceph-users] Re: Experimental upgrade of a Cephadm-managed Squid cluster to Ubuntu Noble (walk-through and RFC)

2024-12-18 Thread Florian Haas
On 18/12/2024 15:37, Robert Sander wrote: Hi Florian, On 17.12.24 20:10, Florian Haas wrote: 1. Disable orchestrator scheduling for the affected node: "ceph orch host label add _no_schedule". 14. Re-enable orchestrator scheduling with "ceph orch host label rm _no_schedule".
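Spelled out with the hostname argument the label commands require, the two quoted steps look roughly like this ("nodeX" is a placeholder; the intermediate steps of the walk-through are omitted here):

$ ceph orch host label add nodeX _no_schedule   # step 1: stop the orchestrator from placing new daemons on nodeX
# ... perform the OS upgrade on nodeX ...
$ ceph orch host label rm nodeX _no_schedule    # step 14: allow the orchestrator to schedule daemons there again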

[ceph-users] Re: Experimental upgrade of a Cephadm-managed Squid cluster to Ubuntu Noble (walk-through and RFC)

2024-12-20 Thread Florian Haas
On 20/12/2024 09:16, Robert Sander wrote: Hi Florian, Am 12/18/24 um 16:18 schrieb Florian Haas: To illustrate why, assume you've got 3 Mons in your cluster. Now, on one of your physical hosts that runs a Mon, you enter maintenance mode. This will just shut down the Mon. Now you proceed
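For reference, the commands being discussed, with the hostname as a placeholder; the point of the argument is to verify quorum before touching a second Mon host:

$ ceph orch host maintenance enter nodeX
$ ceph quorum_status --format json-pretty | grep -A5 quorum_names   # confirm 2 of 3 Mons are still in quorum
$ ceph orch host maintenance exit nodeX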

[ceph-users] Re: Experimental upgrade of a Cephadm-managed Squid cluster to Ubuntu Noble (walk-through and RFC)

2025-01-02 Thread Florian Haas
On 02/01/2025 16:37, Redouane Kachach wrote: Just to comment on the ceph.target. Technically in a containerized ceph a node can host daemons from *many ceph clusters* (each with its own ceph_fsid). The ceph.target is a global unit and it's the root for all the clusters running in the node. There
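In concrete terms, a cephadm host carries the global ceph.target plus one per-cluster target named after the FSID; stopping the per-cluster target leaves any other cluster's daemons on the node running. A sketch, with the FSID as a placeholder:

$ systemctl list-units 'ceph*.target'
$ systemctl stop 'ceph-<fsid>.target'   # only this cluster's daemons; ceph.target would cover all clusters on the node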

[ceph-users] Re: Unintuitive (buggy?) CephFS behaviour when dealing with pool_namespace layout attribute

2025-03-06 Thread Florian Haas
On 05/03/2025 20:45, Frédéric Nass wrote: Hi Florian, Point 1 is certainly a bug regarding the choice of terms in the response (confusion between file and directory). Well... no, I don't think so. Rather, I'd guess it's simply a result of setfattr returning ENOTEMPTY (errno 39), which the sh
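The errno-to-message mapping being referred to is easy to verify: ENOTEMPTY (39) is rendered by libc as "Directory not empty", which is where the confusing file-vs-directory wording comes from. A quick check:

$ python3 -c 'import errno, os; print(errno.ENOTEMPTY, os.strerror(errno.ENOTEMPTY))'
39 Directory not empty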

[ceph-users] Re: Unintuitive (buggy?) CephFS behaviour when dealing with pool_namespace layout attribute

2025-03-06 Thread Florian Haas
On 06/03/2025 09:46, Florian Haas wrote: On 05/03/2025 20:45, Frédéric Nass wrote: Hi Florian, Point 1 is certainly a bug regarding the choice of terms in the response (confusion between file and directory). Well... no, I don't think so. Rather, I'd guess it's simply a res

[ceph-users] Unintuitive (buggy?) CephFS behaviour when dealing with pool_namespace layout attribute

2025-03-05 Thread Florian Haas
Hello everyone, I'm seeing some behaviour in CephFS that strikes me as unexpected, and I wonder if others have thoughts about it. Consider this scenario: * Ceph Reef (18.2.4) deployed with Cephadm running on Ubuntu Jammy, CephFS client is running kernel 5.15.0-133-generic. * CephFS is mounte
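For readers unfamiliar with the attribute in question, it is one of CephFS's virtual extended attributes; a minimal sketch of setting and reading it back, with the namespace name and mount path as placeholders:

$ setfattr -n ceph.dir.layout.pool_namespace -v myns /mnt/cephfs/somedir
$ getfattr -n ceph.dir.layout.pool_namespace /mnt/cephfs/somedir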