Thanks for the reply @Eugen Block
Yes, something else is wrong on my server, but I have no clue why it's
failing or what the cause of the bootstrap failure is.
I was able to bootstrap with ed25519 keys on another server.
First, thanks Xiubo for your feedback!
To go further on the points raised by Sake:
- How does this happen? -> There were no preliminary signs before the incident
- Is this avoidable? -> Good question, I'd also like to know how!
- How to fix the issue? -> So far, no fix nor workaround from w
It depends on the cluster. In general I would say that if your PG count is
already good in terms of PGs per OSD (say between 100 and 200 each), add
capacity and then re-evaluate your PG count afterwards.
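To see where you stand, something like this shows the current PGs per OSD and the autoscaler's per-pool view (a rough sketch; column layout varies a bit between releases):
ceph osd df tree                 # PGS column = PGs currently on each OSD
ceph osd pool autoscale-status   # per-pool PG targets according to the autoscaler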
If you have a lot of time before the gear will be racked and could undergo
some PG splits before the new g
Rados results were approved, and we successfully upgraded the gibba cluster.
Now waiting on @Dan Mick to upgrade the LRC.
On Thu, May 30, 2024 at 8:32 PM Yuri Weinstein wrote:
> I reran rados on the fix https://github.com/ceph/ceph/pull/57794/commits
> and seeking approvals from Radek and Laur
Hi All,
I'm going to be adding a bunch of OSDs to our cephfs cluster shortly
(increasing the total size by 50%). We're on Reef, will be deploying
using the cephadm method, and the new OSDs are exactly the same size
and disk type as the current ones.
So, after adding the new OSDs, my underst
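In case it's useful, a rough sketch of how I'd keep an eye on the data movement once the new OSDs are in (the flag usage is optional and just an assumption on my part):
ceph osd set norebalance     # optionally hold off rebalancing while the OSDs are created
# ... new OSDs get deployed by the orchestrator ...
ceph osd unset norebalance
ceph -s                      # watch backfill/recovery progress
ceph osd df                  # watch PGs and data spread onto the new OSDs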
>
> You could check if your devices support NVMe namespaces and create more
> than one namespace on the device.
Wow, tricky. Will give it a try.
Thanks!
Łukasz Borek
luk...@borek.org.pl
On Tue, 4 Jun 2024 at 16:26, Robert Sander
wrote:
> Hi,
>
> On 6/4/24 16:15, Anthony D'Atri wrote:
>
Hi,
I recently upgraded my RHCS cluster from v4 to v5 and moved to containerized
daemons (podman) along the way. I noticed that there are a huge number of logs
going to journald on each of my hosts. I am unsure why there are so many.
I tried changing the logging level at runtime with commands lik
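To give an idea of what I mean, and as an assumption on my part about the relevant knobs for containerized daemons (option names may differ slightly between releases):
ceph config set global log_to_stderr false             # containers send stderr to journald
ceph config set global mon_cluster_log_to_stderr false
ceph config set global log_to_file true                # optionally log to files instead
ceph config set global mon_cluster_log_to_file true
ceph config set global debug_ms 0/0                    # and turn down chatty debug subsystems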
Hi,
I'm using reef (18.2.2); the docs talk about setting up a multi-site
setup with a spec file e.g.
rgw_realm: apus
rgw_zonegroup: apus_zg
rgw_zone: eqiad
placement:
label: "rgw"
but I don't think it's possible to configure the "hostnames" parameter
of the zonegroup (and thus control what
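The only route I'm aware of is editing the zonegroup out-of-band after applying the spec, roughly like this (a sketch, names taken from the example above):
radosgw-admin zonegroup get --rgw-zonegroup=apus_zg > zg.json
# edit the "hostnames": [] array in zg.json
radosgw-admin zonegroup set --rgw-zonegroup=apus_zg < zg.json
radosgw-admin period update --commit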
Hi,
On 6/4/24 16:15, Anthony D'Atri wrote:
I've wondered for years what the practical differences are between using a
namespace and a conventional partition.
Namespaces show up as separate block devices in the kernel.
The orchestrator will not touch any devices that contain a partition
tab
Or partition, or use LVM.
I've wondered for years what the practical differences are between using a
namespace and a conventional partition.
> On Jun 4, 2024, at 07:59, Robert Sander wrote:
>
> On 6/4/24 12:47, Lukasz Borek wrote:
>
>> Using cephadm, is it possible to cut part of the NVME dr
Hi all!
I'm running Ceph Quincy 17.2.7 in a cluster. On Monday I updated the OS from
AlmaLinux 9.3 to 9.4; since then Grafana shows a "No Data" message in all
Ceph-related panels but, for example, the node information is still fine
(Host Detail Dashboard).
I have redeployed the mgr service with cepha
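A sketch of the redeploy commands for the monitoring stack, in case that is the next thing to try (assuming default cephadm service names):
ceph orch redeploy grafana
ceph orch redeploy prometheus
ceph orch ps | grep -E 'grafana|prometheus'   # check the daemons come back as running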
Hi Robert,
I tried, but that doesn't work :(
Exiting maintenance mode results in the error: "missing 2 required
positional arguments: 'hostname' and 'addr'"
Running the command a second time looks like it works, but then I get
errors when starting the containers. The start up fail
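For reference, the sequence I'm talking about is roughly (hostname is a placeholder):
ceph orch host maintenance enter <hostname>
# ... reinstall / reboot the host ...
ceph orch host maintenance exit <hostname>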
Hi,
I don't have much to contribute, but according to the source code [1]
this seems to be a non-fatal message:
void CreatePrimaryRequest::handle_unlink_peer(int r) {
  CephContext *cct = m_image_ctx->cct;
  ldout(cct, 15) << "r=" << r << dendl;
  if (r < 0) {
    lderr(cct) << "failed to un
Hi Patrick,
it has been a year now and we have not had a single crash since upgrading to
16.2.13. We still have the 19 corrupted files which are reported by 'damage
ls'. Is it now possible to delete the corrupted files without taking the
filesystem offline?
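For reference, the entries I mean come from the following kind of query; I would of course not clear anything without your confirmation (a sketch, addressing the MDS by <fs_name>:rank, adjust to your naming):
ceph tell mds.<fs_name>:0 damage ls               # lists the damage entries with their ids
ceph tell mds.<fs_name>:0 damage rm <damage_id>   # clears an entry from the damage table (not the file itself)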
On 22.05.2023 at 20:23, Patri
Hi,
On 6/4/24 14:35, Sake Ceph wrote:
* Store host labels (we use labels to deploy the services)
* Fail-over MDS and MGR services if running on the host
* Remove host from cluster
* Add host to cluster again with correct labels
AFAIK the steps above are not necessary. It should be sufficient
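For anyone finding this later, the orchestrator commands involved (whichever subset turns out to be needed) look roughly like this; host, address and label are placeholders:
ceph orch host ls                                  # note the current labels
ssh-copy-id -f -i /etc/ceph/ceph.pub root@<host>   # after the reinstall, restore the cluster's SSH key
ceph orch host add <host> <addr>                   # re-add the host if it was removed
ceph orch host label add <host> <label>            # re-apply labels so services get scheduled again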
Hi all
I'm working on a way to automate the OS upgrade of our hosts. This involves
a complete reinstall of the OS.
What is the correct way to do this? At the moment I'm using the following:
* Store host labels (we use labels to deploy the services)
* Fail-over MDS and MGR services if running
On 6/4/24 12:47, Lukasz Borek wrote:
Using cephadm, is it possible to cut part of the NVME drive for OSD and
leave rest space for RocksDB/WALL?
Not out of the box.
You could check if your devices support NVMe namespaces and create more
than one namespace on the device. The kernel then sees m
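If the drive supports namespace management, the rough nvme-cli sequence is (a sketch; sizes are in blocks and the controller id has to match your device):
nvme id-ctrl /dev/nvme0 | grep -E 'oacs|^nn'         # check namespace management support / max namespaces
nvme create-ns /dev/nvme0 --nsze=<blocks> --ncap=<blocks> --flbas=0
nvme attach-ns /dev/nvme0 --namespace-id=2 --controllers=<ctrl-id>
nvme ns-rescan /dev/nvme0                            # kernel then exposes e.g. /dev/nvme0n2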
Hi Xiubo
Thank you for the explanation! This won't be an issue for us, but it made me
think twice :)
Kind regards,
Sake
> Op 04-06-2024 12:30 CEST schreef Xiubo Li :
>
>
> On 6/4/24 15:20, Sake Ceph wrote:
> > Hi,
> >
> > A little break into this thread, but I have some questions:
> > * How d
>
> I have certainly seen cases where the OMAPS have not stayed within the
> RocksDB/WAL NVME space and have been going down to disk.
How can I monitor the OMAP size and make sure it doesn't grow out of the NVMe?
> The OP's numbers suggest IIRC like 120GB-ish for WAL+DB, though depending on
> workload spillover coul
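What I have found so far, though I'm not sure it's the full picture (OSD id is a placeholder; the perf dump runs on the OSD's host, e.g. inside cephadm shell):
ceph osd df                              # OMAP and META columns per OSD
ceph health detail | grep -i spillover   # BLUEFS_SPILLOVER warning if the DB spilled onto the slow device
ceph daemon osd.<id> perf dump bluefs    # db_used_bytes vs slow_used_bytes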
Hi,
I think there's something else wrong with your setup; I could
bootstrap a cluster without any issue with ed25519 keys:
ceph:~ # ssh-keygen -t ed25519
Generating public/private ed25519 key pair.
ceph:~ # cephadm --image quay.io/ceph/ceph:v18.2.2 bootstrap --mon-ip
[IP] [some more options] --s
On 6/4/24 15:20, Sake Ceph wrote:
Hi,
A little break into this thread, but I have some questions:
* How does this happen, that the filesystem gets into readonly mode
For a detailed explanation you can refer to the Ceph PR:
https://github.com/ceph/ceph/pull/55421.
* Is this avoidable?
* How-
Hello,
I wanted to try out (in a lab Ceph setup) what exactly is going to happen
when part of the data on an OSD disk gets corrupted. I created a simple test
where I went through the block device data until I found something
that resembled user data (using dd and hexdump) (/dev/sdd is a block
devic
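For completeness, the way I plan to get Ceph to notice and report the corruption is roughly (pg id is a placeholder):
ceph pg deep-scrub <pgid>   # force a deep scrub of the PG holding the object
ceph health detail          # the inconsistency shows up here after the scrub
rados list-inconsistent-obj <pgid> --format=json-pretty
ceph pg repair <pgid>       # repair from the remaining healthy replicas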
What exactly does your crush rule look like right now? I assume it's
supposed to distribute data across two sites, and since one site is
missing, the PGs stay in a degraded state until the site comes back up.
You would need to either change the crush rule or assign a different
one to that pool whic
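A sketch of what I mean (pool and rule names are placeholders):
ceph osd pool get <pool> crush_rule                             # which rule the pool uses now
ceph osd crush rule dump <rule-name>                            # inspect its failure domains
ceph osd crush rule create-replicated <new-rule> <root> host    # e.g. a single-site replicated rule
ceph osd pool set <pool> crush_rule <new-rule>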
Hi,
if you can verify which data has been removed, and that the client is
still connected, you might find out who was responsible for it.
Do you know which files in which directories are missing? Does that
maybe already reveal one or several users/clients?
You can query the mds daemons and insp
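A sketch of the kind of query I mean (MDS addressing by <fs_name>:rank; adjust to your deployment):
ceph tell mds.<fs_name>:0 session ls            # connected clients with their IPs and mount info
ceph tell mds.<fs_name>:0 dump_historic_ops     # recent client requests handled by the MDS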
Hi,
A little break into this thread, but I have some questions:
* How does this happen, that the filesystem gets into readonly mode
* Is this avoidable?
* How to fix the issue, because I didn't see a workaround in the mentioned
tracker (or I missed it)
* With this bug around, should you use c