[ceph-users] Re: Balancer vs. Autoscaler

2021-09-23 Thread Jan-Philipp Litza
I'll have to do some reading on what "pgp" means, but you are correct: The pg_num is already equal to pg_num_target, and only pgp_num is increasing (halfway there - at least that's something). Thanks for the suggestions, though not really applicable here! Richard Bade wrote: > If you look at the
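
For context: pg_num is the number of PGs a pool has, while pgp_num is the number of PGs actually used for data placement; the mgr raises pgp_num toward pg_num gradually to cap the amount of misplaced data. A minimal way to watch that progress, assuming a placeholder pool name "mypool":

  ceph osd pool get mypool pg_num    # PGs the pool has
  ceph osd pool get mypool pgp_num   # PGs used for placement (the value still catching up)
  ceph osd pool ls detail            # also shows pg_num_target / pgp_num_target per pool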

[ceph-users] Re: High overwrite latency

2021-09-23 Thread Nico Schottelius
Hey Erwin, I'd recommend checking the individual OSD performance in the slower cluster. We have seen such issues with SSDs that wore out - it might just be a specific OSD / pg that you are hitting. Best regards, Nico Erwin Ceph writes: > Hi, > > We do run several Ceph clusters, but one h
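
A minimal sketch of how one might compare OSDs in the slower cluster; osd.12 is just an example ID:

  ceph osd perf            # per-OSD commit/apply latency; a worn SSD usually stands out
  ceph tell osd.12 bench   # short synthetic write benchmark on a single OSD
  ceph device ls           # map suspicious OSDs to their physical drives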

[ceph-users] Re: Cluster downtime due to unsynchronized clocks

2021-09-23 Thread Burkhard Linke
Hi, On 9/23/21 9:49 AM, Mark Schouten wrote: Hi, Last night we had downtime on a simple three-node cluster. Here’s what happened: 2021-09-23 00:18:48.331528 mon.node01 (mon.0) 834384 : cluster [WRN] message from mon.2 was stamped 8.401927s in the future, clocks not synchronized 2021-09-23 00

[ceph-users] Re: Cluster downtime due to unsynchronized clocks

2021-09-23 Thread 胡 玮文
> On 23 Sep 2021, at 15:50, Mark Schouten wrote: > > Hi, > > Last night we had downtime on a simple three-node cluster. Here’s > what happened: > 2021-09-23 00:18:48.331528 mon.node01 (mon.0) 834384 : cluster [WRN] > message from mon.2 was stamped 8.401927s in the future, clocks not > synchronized > 202

[ceph-users] Re: Cluster downtime due to unsynchronized clocks

2021-09-23 Thread Robert Sander
On 23.09.21 at 09:49, Mark Schouten wrote: The cause of this timeshift is the terrible way that systemd-timesyncd works, depending on a single NTP server. I always kill that one with fire: systemctl disable --now systemd-timesyncd.service; systemctl mask systemd-timesyncd.service; and then us
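
Spelled out, the switch might look like this on a Debian/Ubuntu-style node (package and unit names are assumptions; adjust for your distribution):

  systemctl disable --now systemd-timesyncd.service
  systemctl mask systemd-timesyncd.service
  apt install chrony                # dnf/yum install chrony on EL-based systems
  systemctl enable --now chrony     # the unit is called chronyd on some distributions
  chronyc sources -v                # verify several NTP sources are reachable and selected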

[ceph-users] Re: Cluster downtime due to unsynchronized clocks

2021-09-23 Thread Dan van der Ster
On Thu, Sep 23, 2021 at 10:23 AM Robert Sander wrote: > > On 23.09.21 at 09:49, Mark Schouten wrote: > > > The cause of this timeshift is the terrible way that systemd-timesyncd > > works, depending on a single NTP-server. > > I always kill that one with fire: > > systemctl disable --now systemd-

[ceph-users] Error while adding Ceph/RBD for Cloudstack/KVM: pool not found

2021-09-23 Thread Mevludin Blazevic
Hi everyone, I've tried to connect my Ceph cluster to Cloudstack/KVM via the management GUI using the RBD protocol, but I am getting the error that the rbd pool does not exist, although I have created such an rbd pool, initialized it and created a user for it. I have performed the steps
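
For comparison, a minimal sequence for creating and preparing an RBD pool plus a client key; the pool and user names below are placeholders and not necessarily what CloudStack expects:

  ceph osd pool create cloudstack 64
  ceph osd pool application enable cloudstack rbd
  rbd pool init cloudstack
  ceph auth get-or-create client.cloudstack mon 'profile rbd' osd 'profile rbd pool=cloudstack'

CloudStack then needs the monitor address, the pool name and that user's key when the RBD primary storage is added.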

[ceph-users] Force MGR to be active one

2021-09-23 Thread Pascal Weißhaupt
Hi, is it possible to force a specific MGR to be the active one? In the Zabbix configuration, we can only specify one MGR on a node, so when that one is not in an active state, Zabbix gives us warnings about it. Pascal

[ceph-users] Cluster downtime due to unsynchronized clocks

2021-09-23 Thread Mark Schouten
Hi, Last night we had downtime on a simple three-node cluster. Here’s what happened: 2021-09-23 00:18:48.331528 mon.node01 (mon.0) 834384 : cluster [WRN] message from mon.2 was stamped 8.401927s in the future, clocks not synchronized 2021-09-23 00:18:57.783437 mon.node01 (mon.0) 834386 : clu
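
To see how far the monitors disagree and what the cluster tolerates, something like the following helps (mon_clock_drift_allowed defaults to 0.05 s):

  ceph health detail                            # MON_CLOCK_SKEW details while skew is present
  ceph time-sync-status                         # per-mon skew as measured by the leader
  ceph config get mon mon_clock_drift_allowed   # warning threshold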

[ceph-users] when mds_all_down, opening "file system" page provokes dashboard crash

2021-09-23 Thread Francois Legrand
Hi, I am testing an upgrade (from 14.2.16 to 16.2.5) on my ceph test cluster (bare metal). I noticed (when reaching the mds upgrade) that after I stopped all the mds, opening the "file system" page on the dashboard results in a crash of the dashboard (and also of the mgr). Has someone had th

[ceph-users] Re: when mds_all_down, opening "file system" page provokes dashboard crash

2021-09-23 Thread Francois Legrand
The crash report is: { "backtrace": [ "/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7f86044313c0]", "gsignal()", "abort()", "/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911) [0x7f86042d2911]", "/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c) [0x7f86
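
The full report can also be pulled from the cluster's crash module instead of the truncated log; the crash ID below is a placeholder:

  ceph crash ls                  # list recent daemon crashes with their IDs
  ceph crash info <crash-id>     # full backtrace and metadata for one crash
  ceph crash archive <crash-id>  # acknowledge it so the RECENT_CRASH warning clears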

[ceph-users] Re: when mds_all_down, opening "file system" page provokes dashboard crash

2021-09-23 Thread Ernesto Puerta
The backtrace seems to point to this tracker issue (https://tracker.ceph.com/issues/51757). Kind Regards, Ernesto On Thu, Sep 23, 2021 at 2:59 PM Francois Legrand wrote: > The crash report is : > > { > "backtrace": [ > "/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) > [0x7f86044313c0

[ceph-users] Orchestrator is internally ignoring applying a spec against SSDs, apparently determining they're rotational.

2021-09-23 Thread Chris
This seems similar to Bug #52301 (https://tracker.ceph.com/issues/52301); however, in this case various device display commands correctly describe the devices. I have five nodes with identical inventories. After applying the following spec, 4 of the nodes filled out their OSDs as expected. Node 5 and
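
To double-check what the orchestrator and the kernel report for the odd node, something along these lines may help (the hostname node5 is a placeholder):

  ceph orch device ls node5 --refresh   # the orchestrator's view of that host's disks
  lsblk -d -o NAME,ROTA,MODEL           # run on node5: the kernel's rotational flag per disk
  cat /sys/block/sdX/queue/rotational   # 0 = non-rotational (SSD), 1 = rotational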

[ceph-users] Re: Force MGR to be active one

2021-09-23 Thread ceph
You should be able to stop and start the other mgr services when your desired mgr is the active one. The recently started mgrs will then be standby. Hth, Mehmet On 23 September 2021 13:28:06 CEST, "Pascal Weißhaupt" wrote: >Hi, > > > >is it possible to force a specific MGR to be the active one
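
A sketch of that procedure; the daemon name and the systemd unit are placeholders, and the unit naming differs on cephadm clusters (ceph-<fsid>@mgr.<name>.service):

  ceph mgr stat                       # shows which mgr is currently active
  ceph mgr fail <active-mgr>          # demote the active mgr so a standby takes over
  systemctl restart ceph-mgr@node02   # once the desired mgr is active, restart the others
                                      # so they rejoin as standbys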

[ceph-users] ceph-iscsi / tcmu-runner bad performance with VMware ESXi

2021-09-23 Thread José H . Freidhof
Hello everyone, I need some help with our Ceph 16.2.5 cluster used as an iSCSI target for ESXi nodes. Background info: - we have built 3 OSD nodes with 60 BlueStore OSDs and 60x 6 TB spinning disks, 12 SSDs and 3 NVMe drives - OSD nodes have 32 cores and 256 GB RAM - the OSD disks are connected to

[ceph-users] "Partitioning" in RGW

2021-09-23 Thread Manuel Holtgrewe
Dear all, Is it possible to achieve the following with rgw and the S3 protocol? I have a central Ceph cluster with rgw/S3 in my organisation and I have an internal network zone and a DMZ. Access from the internal network to Ceph is of course allowed. I want to expose certain parts of the Ceph in

[ceph-users] Re: Remoto 1.1.4 in Ceph 16.2.6 containers

2021-09-23 Thread David Galloway
I just repushed the 16.2.6 container with remoto 1.2.1 in it. On 9/22/21 4:19 PM, David Orman wrote: > https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-4b2736a28c > > ^^ if people want to test and provide feedback for a potential merge > to EPEL8 stable. > > David > > On Wed, Sep 22, 20

[ceph-users] Is this really an 'error'? "pg_autoscaler... has overlapping roots"

2021-09-23 Thread Harry G. Coin
Is there anything to be done about groups of log messages like "pg_autoscaler ERROR root] pool has overlapping roots"? The cluster reports it is healthy, and yet this is reported as an error, so -- is it an error that ought to have been reported, or is it not an error? Thanks Harry Coin
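
The message typically means that different pools use CRUSH rules whose roots share OSDs (for example the default root plus a device-class shadow root), so the autoscaler cannot attribute capacity cleanly. A few commands to see which root each pool actually maps to, as a starting point:

  ceph osd crush tree --show-shadow   # default root and per-device-class shadow roots
  ceph osd pool ls detail             # crush_rule used by each pool
  ceph osd crush rule dump            # the root/device class each rule takes
  ceph osd pool autoscale-status      # the autoscaler's own per-pool view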

[ceph-users] Re: Orchestrator is internally ignoring applying a spec against SSDs, apparently determining they're rotational.

2021-09-23 Thread Eugen Block
Hi, as a workaround you could just set the rotational flag yourself: echo 0 > /sys/block/sd[X]/queue/rotational That's the flag ceph-volume looks for, and it should at least enable you to deploy the rest of the OSDs. Of course, you'll need to figure out why the rotational flag is se
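
If the flag reverts at reboot, a udev rule is one way to make that workaround stick; this is only a sketch, the file name is hypothetical, and it assumes every sd[X] device on that host really is an SSD:

  # /etc/udev/rules.d/99-ssd-rotational.rules (hypothetical)
  ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}="0"

  # apply without a reboot
  udevadm control --reload-rules && udevadm trigger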

[ceph-users] Re: Successful Upgrade from 14.2.22 to 15.2.14

2021-09-23 Thread Rainer Krienke
Hello Dan, I am also running a production 14.2.22 cluster with 144 HDD OSDs and I am wondering whether I should stay with this release or upgrade to Octopus. So your info is very valuable... One more question: you described that OSDs do an expected fsck and that this took roughly 10 min. I guess t
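
If the per-OSD conversion time is the concern, the on-mount repair can reportedly be deferred and run later per OSD; treat the option below as something to verify against the release notes of the exact target release before relying on it:

  ceph config set osd bluestore_fsck_quick_fix_on_mount false   # skip the conversion at first start
  # later, per OSD while it is stopped:
  ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<id>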