[ceph-users] Re: Balancer vs. Autoscaler

2021-09-23 Thread Jan-Philipp Litza
I'll have to do some reading on what "pgp" means, but you are correct:
The pg_num is already equal to pg_num_target, and only pgp_num is
increasing (halfway there - at least that's something).

Thanks for the suggestions, though not really applicable here!
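
For reference, the remaining progress can be watched with something like this
(the pool name is a placeholder):

ceph osd pool ls detail | grep -E 'pg_num|pgp_num'
ceph osd pool get <pool> pg_num
ceph osd pool get <pool> pgp_num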

Richard Bade wrote:
> If you look at the current pg_num in that pool ls detail command that
> Dan mentioned, you can set the pool pg_num to what that value currently
> is, which will effectively pause the PG changes. I did this recently
> when decreasing the number of PGs in a pool, which took several weeks
> to complete. This let me get some other maintenance done before
> setting the pg_num back to the target number again.
> This works well for a reduction, but I'm not sure if it works as well for
> an increase, as I think the pg_num may reach the target much faster and
> then just the pgp_num changes until they match.
> 
> Rich
> 
> On Wed, 22 Sept 2021 at 23:06, Dan van der Ster  wrote:
>>
>> To get an idea how much work is left, take a look at `ceph osd pool ls
>> detail`. There should be pg_num_target... The osds will merge or split PGs
>> until pg_num matches that value.
>>
>> .. Dan
>>
>>
>> On Wed, 22 Sep 2021, 11:04 Jan-Philipp Litza,  wrote:
>>
>>> Hi everyone,
>>>
>>> I had the autoscale_mode set to "on" and the autoscaler went to work and
>>> started adjusting the number of PGs in that pool. Since this implies a
>>> huge shift in data, the reweights that the balancer had carefully
>>> adjusted (in crush-compat mode) are now rubbish, and more and more OSDs
>>> become nearful (we sadly have very differently sized OSDs).
>>>
>>> Now apparently both manager modules, balancer and pg_autoscaler, have
>>> the same threshold for operation, namely target_max_misplaced_ratio. So
>>> the balancer won't become active as long as the pg_autoscaler is still
>>> adjusting the number of PGs.
>>>
>>> I already set the autoscale_mode to "warn" on all pools, but apparently
>>> the autoscaler is determined to finish what it started.
>>>
>>> Is there any way to pause the autoscaler so the balancer has a chance of
>>> fixing the reweights? Because even in manual mode (ceph balancer
>>> optimize), the balancer won't compute a plan when the misplaced ratio is
>>> higher than target_max_misplaced_ratio.
>>>
>>> I know about "ceph osd reweight-*", but they adjust the reweights
>>> (visible in "ceph osd tree"), whereas the balancer adjusts the "compat
>>> weight-set", which I don't know how to convert back to the old-style
>>> reweights.
>>>
>>> Best regards,
>>> Jan-Philipp
> 

-- 
Jan-Philipp Litza
PLUTEX GmbH
Hermann-Ritter-Str. 108
28197 Bremen

Hotline: 0800 100 400 800
Telefon: 0800 100 400 821
Telefax: 0800 100 400 888
E-Mail: supp...@plutex.de
Internet: http://www.plutex.de

USt-IdNr.: DE 815030856
Handelsregister: Amtsgericht Bremen, HRB 25144
Geschäftsführer: Torben Belz, Hendrik Lilienthal


[ceph-users] Re: High overwrite latency

2021-09-23 Thread Nico Schottelius


Hey Erwin,

I'd recommend checking the individual OSD performance in the slower
cluster. We have seen such issues with SSDs that have worn out - it might
just be a specific OSD / PG that you are hitting.
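
Something like the following can help to spot a single slow OSD
(osd.12 is just an example id):

ceph osd perf | sort -n -k 2 | tail
ceph tell osd.12 bench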

Best regards,

Nico

Erwin Ceph  writes:

> Hi,
>
> We do run several Ceph clusters, but one has a strange problem.
>
> It is running Octopus 15.2.14 on 9 (HP 360 Gen 8, 64 GB, 10 Gbps) servers, 48 
> OSDs (all 2 TB Samsung SSDs with Bluestore). Monitoring in Grafana shows 
> these three latency values
> over 7 days:
>
> ceph_osd_op_r_latency_sum: avg 1.16 ms, max 9.95 ms
> ceph_osd_op_w_latency_sum: avg 5.85 ms, max 26.2 ms
> ceph_osd_op_rw_latency_sum: avg 110 ms, max 388 ms
>
> Average throughput is around 30 MB/sec read and 40 MB/sec write, both at 
> around 2000 IOPS.
>
> On another cluster (hardware almost the same, identical software versions), 
> but 25% lower load, there the values are:
>
> ceph_osd_op_r_latency_sum: avg 1.09 ms, max 6.55 ms
> ceph_osd_op_w_latency_sum: avg 4.46 ms, max 14.4 ms
> ceph_osd_op_rw_latency_sum: avg 4.94 ms, max 17.6 ms
>
> I can't find any difference in HBA controller settings, network or 
> kernel tuning. Does anyone have any ideas?
>
> Regards,
> Erwin
>


--
Sustainable and modern Infrastructures by ungleich.ch


[ceph-users] Re: Cluster downtime due to unsynchronized clocks

2021-09-23 Thread Burkhard Linke

Hi,

On 9/23/21 9:49 AM, Mark Schouten wrote:

> Hi,
>
> Last night we’ve had downtime on a simple three-node cluster. Here’s
> what happened:
> 2021-09-23 00:18:48.331528 mon.node01 (mon.0) 834384 : cluster [WRN]
> message from mon.2 was stamped 8.401927s in the future, clocks not
> synchronized
> 2021-09-23 00:18:57.783437 mon.node01 (mon.0) 834386 : cluster [WRN] 1
> clock skew 8.40163s > max 0.05s
> 2021-09-23 00:18:57.783486 mon.node01 (mon.0) 834387 : cluster [WRN] 2
> clock skew 8.40146s > max 0.05s
> 2021-09-23 00:18:59.843444 mon.node01 (mon.0) 834388 : cluster [WRN]
> Health check failed: clock skew detected on mon.node02, mon.node03
> (MON_CLOCK_SKEW)
>
> The cause of this timeshift is the terrible way that systemd-timesyncd
> works, depending on a single NTP server. If that one is going haywire,
> systemd-timesyncd does not check with others, but just sets the clock on
> your machine incorrectly. We will fix this with chrony.
>
> However, what I don’t understand is why the cluster does not see
> the single monitor as incorrect, but the two correct machines as
> incorrect. Is this because one of the three is master-ish?



I would assume that the time of the mon leader is taken as the reference. If 
both other mons have a clock skew relative to the leader, the mon quorum will be impacted.




> Obviously we will fix the time issues, but I would like to understand
> the reasoning of Ceph to stop functioning because one monitor has
> incorrect time.


We do not rely on external NTP servers for internal synchronization. NTP
is running on one of our central switches, and all hosts use that switch
as their time source. The switch itself synchronizes to an external NTP
server (but we are currently thinking about using an NTP USB receiver on
one machine as an additional reference). Even if the internet connection is
lost, external NTP sync is impossible and the switch's time starts to drift,
all machines will drift in the same way.



Regards,

Burkhard




[ceph-users] Re: Cluster downtime due to unsynchronized clocks

2021-09-23 Thread 胡 玮文
> On 2021-09-23, at 15:50, Mark Schouten wrote:
>
> Hi,
>
> Last night we’ve had downtime on a simple three-node cluster. Here’s
> what happened:
> 2021-09-23 00:18:48.331528 mon.node01 (mon.0) 834384 : cluster [WRN]
> message from mon.2 was stamped 8.401927s in the future, clocks not
> synchronized
> 2021-09-23 00:18:57.783437 mon.node01 (mon.0) 834386 : cluster [WRN] 1
> clock skew 8.40163s > max 0.05s
> 2021-09-23 00:18:57.783486 mon.node01 (mon.0) 834387 : cluster [WRN] 2
> clock skew 8.40146s > max 0.05s
> 2021-09-23 00:18:59.843444 mon.node01 (mon.0) 834388 : cluster [WRN]
> Health check failed: clock skew detected on mon.node02, mon.node03
> (MON_CLOCK_SKEW)
>
> The cause of this timeshift is the terrible way that systemd-timesyncd
> works, depending on a single NTP server. If that one is going haywire,
> systemd-timesyncd does not check with others, but just sets the clock on
> your machine incorrectly. We will fix this with chrony.
>
> However, what I don’t understand is why the cluster does not see
> the single monitor as incorrect, but the two correct machines as
> incorrect. Is this because one of the three is master-ish?

I believe yes. “ceph mon stat” will tell you which one is the leader.

> Obviously we will fix the time issues, but I would like to understand
> the reasoning of Ceph to stop functioning because one monitor has
> incorrect time.
>
> Thanks!
>
> --
> Mark Schouten
> CTO, Tuxis B.V. | https://www.tuxis.nl/
>   | +31 318 200208


[ceph-users] Re: Cluster downtime due to unsynchronized clocks

2021-09-23 Thread Robert Sander

On 23.09.21 at 09:49, Mark Schouten wrote:


> The cause of this timeshift is the terrible way that systemd-timesyncd
> works, depending on a single NTP server.


I always kill that one with fire:

systemctl disable --now systemd-timesyncd.service
systemctl mask systemd-timesyncd.service

and then use chrony or ntpd.

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin


[ceph-users] Re: Cluster downtime due to unsynchronized clocks

2021-09-23 Thread Dan van der Ster
On Thu, Sep 23, 2021 at 10:23 AM Robert Sander
 wrote:
>
> On 23.09.21 at 09:49, Mark Schouten wrote:
>
> > The cause of this timeshift is the terrible way that systemd-timesyncd
> > works, depending on a single NTP-server.
>
> I always kill that one with fire:
>
> systemctl disable --now systemd-timesyncd.service
> systemctl mask systemd-timesyncd.service
>
> and then use chrony or ntpd.

Moreover, if you use chronyd, be sure to also enable the `chrony-wait`
service, which will ensure that ceph daemons are not started before
time is synchronized.
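
On EL-based systems that should look roughly like this (a sketch):

systemctl enable --now chronyd.service
systemctl enable chrony-wait.service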

Cheers, Dan


[ceph-users] Error while adding Ceph/RBD for Cloudstack/KVM: pool not found

2021-09-23 Thread Mevludin Blazevic

Hi everyone,

I've tried to connect my Ceph cluster to CloudStack/KVM via the
management GUI using the RBD protocol, but I am getting the error that
the rbd pool does not exist, although I have created such an RBD pool,
initialized it and created a user for it. I have performed the
steps described at https://docs.ceph.com/en/pacific/rbd/rbd-cloudstack/.
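
In short, the pool setup from that guide was roughly the following (the
client name below is only an example, 'sparci-rbd' matches the ceph df
output further down):

ceph osd pool create sparci-rbd
rbd pool init sparci-rbd
ceph auth get-or-create client.cloudstack mon 'profile rbd' osd 'profile rbd pool=sparci-rbd'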


I really do not know where I went wrong during the installation. As far
as I know, no preliminary configuration is needed on the KVM machines.
The QEMU/KVM and libvirt installation was carried out while installing
CloudStack. The libvirt secret XML should be created after entering the
information in the GUI, but nothing happens. I always get the same
error:


2021-09-23 11:53:57,737 ERROR [kvm.storage.LibvirtStorageAdaptor] 
(agentRequest-Handler-5:null) (logid:699ed00c) Failed to create RBD 
storage pool: org.libvirt.LibvirtException: failed to create the RBD 
IoCTX. Does the pool 'sparci-rbd' exist?: No such file or directory
2021-09-23 11:53:57,738 ERROR [kvm.storage.LibvirtStorageAdaptor] 
(agentRequest-Handler-5:null) (logid:699ed00c) Failed to create the RBD 
storage pool, cleaning up the libvirt secret


While filling out the information in the GUI, I picked a monitor
node that is listed in /etc/ceph/ceph.conf. Running the ceph df
command shows that the rbd pool really exists:


--- POOLS ---
POOL                   ID  PGS  STORED  OBJECTS  ...
device_health_metrics   1    1  25 MiB      320  ...
sparci-ec               2   32     0 B        0  ...
sparci-rbd              3   32    19 B        1  ...
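
As an additional check, listing the pool directly from the KVM host with
the CloudStack credentials should work, something like (user name, keyring
path and monitor address would need to be adapted):

rbd --id cloudstack -m <mon-ip> --keyring /etc/ceph/ceph.client.cloudstack.keyring ls sparci-rbd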

Have I missed some extra installation steps needed on the Ceph machines?

Cheers

Mevludin

--
Mevludin Blazevic

University of Koblenz-Landau
Computing Centre (GHRKO)
Universitaetsstrasse 1
D-56070 Koblenz, Germany



[ceph-users] Force MGR to be active one

2021-09-23 Thread Pascal Weißhaupt
Hi,



Is it possible to force a specific MGR to be the active one? In the Zabbix
configuration, we can only specify one MGR per node, so when that one is not in
an active state, Zabbix gives us warnings about it.





Pascal


[ceph-users] Cluster downtime due to unsynchronized clocks

2021-09-23 Thread Mark Schouten
Hi,

Last night we’ve had downtime on a simple three-node cluster. Here’s 
what happened:
2021-09-23 00:18:48.331528 mon.node01 (mon.0) 834384 : cluster [WRN] 
message from mon.2 was stamped 8.401927s in the future, clocks not 
synchronized
2021-09-23 00:18:57.783437 mon.node01 (mon.0) 834386 : cluster [WRN] 1 
clock skew 8.40163s > max 0.05s
2021-09-23 00:18:57.783486 mon.node01 (mon.0) 834387 : cluster [WRN] 2 
clock skew 8.40146s > max 0.05s
2021-09-23 00:18:59.843444 mon.node01 (mon.0) 834388 : cluster [WRN] 
Health check failed: clock skew detected on mon.node02, mon.node03 
(MON_CLOCK_SKEW)

The cause of this timeshift is the terrible way that systemd-timesyncd
works, depending on a single NTP server. If that one is going haywire,
systemd-timesyncd does not check with others, but just sets the clock on
your machine incorrectly. We will fix this with chrony.
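
A minimal chrony.conf sketch with several sources and an initial clock step
would look something like this (server names and thresholds are only examples):

server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
makestep 1.0 3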

However, what I don’t understand is why the cluster does not see
the single monitor as incorrect, but the two correct machines as
incorrect. Is this because one of the three is master-ish?

Obviously we will fix the time issues, but I would like to understand 
the reasoning of Ceph to stop functioning because one monitor has 
incorrect time.

Thanks!

--
Mark Schouten
CTO, Tuxis B.V. | https://www.tuxis.nl/
  | +31 318 200208


[ceph-users] when mds_all_down open "file system" page provoque dashboard crash

2021-09-23 Thread Francois Legrand

Hi,

I am testing an upgrade (from 14.2.16 to 16.2.5) on my ceph test
cluster (bare metal).


I noticed (when reaching the MDS upgrade) that after I stopped all the
MDS daemons, opening the "file system" page in the dashboard results in a crash
of the dashboard (and also of the mgr). Has anyone else had this issue?


F.



[ceph-users] Re: when mds_all_down open "file system" page provoque dashboard crash

2021-09-23 Thread Francois Legrand

The crash report is :

{
    "backtrace": [
    "/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0) [0x7f86044313c0]",
    "gsignal()",
    "abort()",
    "/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911) [0x7f86042d2911]",
    "/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c) [0x7f86042de38c]",
    "/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7) [0x7f86042de3f7]",
    "/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9) [0x7f86042de6a9]",
    "(std::__throw_out_of_range(char const*)+0x41) [0x7f86042d537e]",
    "(Client::resolve_mds(std::__cxx11::basic_stringstd::char_traits, std::allocator > const&, 
std::vector >*)+0x1306) 
[0x563db199e076]",
    "(Client::mds_command(std::__cxx11::basic_stringstd::char_traits, std::allocator > const&, 
std::vector, 
std::allocator >, std::allocatorstd::char_traits, std::allocator > > > const&, 
ceph::buffer::v15_2_0::list const&, ceph::buffer::v15_2_0::list*, 
std::__cxx11::basic_string, 
std::allocator >*, Context*)+0x179) [0x563db19baa69]",

    "/usr/bin/ceph-mgr(+0x1d185d) [0x563db17db85d]",
    "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a170e) 
[0x7f860d5e770e]",
    "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d) 
[0x7f860d3bad6d]",

    "_PyEval_EvalFrameDefault()",
    "_PyEval_EvalCodeWithName()",
    "_PyFunction_Vectorcall()",
    "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8daa) 
[0x7f860d5eedaa]",
    "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d) 
[0x7f860d3bad6d]",

    "_PyEval_EvalFrameDefault()",
    "_PyEval_EvalCodeWithName()",
    "_PyFunction_Vectorcall()",
    "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d) 
[0x7f860d3bad6d]",

    "_PyEval_EvalFrameDefault()",
    "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b) 
[0x7f860d3c606b]",

    "PyVectorcall_Call()",
    "_PyEval_EvalFrameDefault()",
    "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b) 
[0x7f860d3c606b]",
    "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d) 
[0x7f860d3bad6d]",

    "_PyEval_EvalFrameDefault()",
    "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b) 
[0x7f860d3c606b]",
    "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d) 
[0x7f860d3bad6d]",

    "_PyEval_EvalFrameDefault()",
    "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b) 
[0x7f860d3c606b]",
    "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8e2b) 
[0x7f860d5eee2b]",

    "PyVectorcall_Call()",
    "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x116c01) 
[0x7f860d45cc01]",
    "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x17d51b) 
[0x7f860d4c351b]",

    "/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f8604425609]",
    "clone()"
    ],
    "ceph_version": "16.2.5",
    "os_id": "ubuntu",
    "os_name": "Ubuntu",
    "os_version": "20.04.3 LTS (Focal Fossa)",
    "os_version_id": "20.04",
    "process_name": "ceph-mgr",
    "stack_sig": 
"9a65d0019b8102fdaee8fd29c30e3aef3b86660d33fc6cd9bd51f57844872b2a",

    "timestamp": "2021-09-23T12:27:29.137868Z",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-86-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#97-Ubuntu SMP Fri Sep 17 19:19:40 UTC 2021"
}
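
For reference, the report can also be retrieved from the cluster with:

ceph crash ls
ceph crash info <crash-id>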

On 23/09/2021 at 14:55, Francois Legrand wrote:

Hi,

I am testing an upgrade (from 14.2.16 to 16.2.5)  on my ceph test 
cluster (bar metal).


I noticed (when reaching the mds upgrade) that after I stopped all the 
mds, opening the "file system" page on the dashboard result in a crash 
of the dashboard (and also of the mgr). Does someone had this issue ?


F.




[ceph-users] Re: when mds_all_down open "file system" page provoque dashboard crash

2021-09-23 Thread Ernesto Puerta
The backtrace seems to point to this tracker:
https://tracker.ceph.com/issues/51757

Kind Regards,
Ernesto


On Thu, Sep 23, 2021 at 2:59 PM Francois Legrand 
wrote:

> The crash report is :
>
> {
>  "backtrace": [
>  "/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)
> [0x7f86044313c0]",
>  "gsignal()",
>  "abort()",
>  "/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911) [0x7f86042d2911]",
>  "/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c) [0x7f86042de38c]",
>  "/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7) [0x7f86042de3f7]",
>  "/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9) [0x7f86042de6a9]",
>  "(std::__throw_out_of_range(char const*)+0x41) [0x7f86042d537e]",
>  "(Client::resolve_mds(std::__cxx11::basic_string std::char_traits, std::allocator > const&,
> std::vector >*)+0x1306)
> [0x563db199e076]",
>  "(Client::mds_command(std::__cxx11::basic_string std::char_traits, std::allocator > const&,
> std::vector,
> std::allocator >, std::allocator std::char_traits, std::allocator > > > const&,
> ceph::buffer::v15_2_0::list const&, ceph::buffer::v15_2_0::list*,
> std::__cxx11::basic_string,
> std::allocator >*, Context*)+0x179) [0x563db19baa69]",
>  "/usr/bin/ceph-mgr(+0x1d185d) [0x563db17db85d]",
>  "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a170e)
> [0x7f860d5e770e]",
>  "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)
> [0x7f860d3bad6d]",
>  "_PyEval_EvalFrameDefault()",
>  "_PyEval_EvalCodeWithName()",
>  "_PyFunction_Vectorcall()",
>  "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8daa)
> [0x7f860d5eedaa]",
>  "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)
> [0x7f860d3bad6d]",
>  "_PyEval_EvalFrameDefault()",
>  "_PyEval_EvalCodeWithName()",
>  "_PyFunction_Vectorcall()",
>  "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)
> [0x7f860d3bad6d]",
>  "_PyEval_EvalFrameDefault()",
>  "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)
> [0x7f860d3c606b]",
>  "PyVectorcall_Call()",
>  "_PyEval_EvalFrameDefault()",
>  "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)
> [0x7f860d3c606b]",
>  "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)
> [0x7f860d3bad6d]",
>  "_PyEval_EvalFrameDefault()",
>  "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)
> [0x7f860d3c606b]",
>  "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x74d6d)
> [0x7f860d3bad6d]",
>  "_PyEval_EvalFrameDefault()",
>  "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x8006b)
> [0x7f860d3c606b]",
>  "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x2a8e2b)
> [0x7f860d5eee2b]",
>  "PyVectorcall_Call()",
>  "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x116c01)
> [0x7f860d45cc01]",
>  "/lib/x86_64-linux-gnu/libpython3.8.so.1.0(+0x17d51b)
> [0x7f860d4c351b]",
>  "/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609) [0x7f8604425609]",
>  "clone()"
>  ],
>  "ceph_version": "16.2.5",
>  "os_id": "ubuntu",
>  "os_name": "Ubuntu",
>  "os_version": "20.04.3 LTS (Focal Fossa)",
>  "os_version_id": "20.04",
>  "process_name": "ceph-mgr",
>  "stack_sig":
> "9a65d0019b8102fdaee8fd29c30e3aef3b86660d33fc6cd9bd51f57844872b2a",
>  "timestamp": "2021-09-23T12:27:29.137868Z",
>  "utsname_machine": "x86_64",
>  "utsname_release": "5.4.0-86-generic",
>  "utsname_sysname": "Linux",
>  "utsname_version": "#97-Ubuntu SMP Fri Sep 17 19:19:40 UTC 2021"
> }
>
> On 23/09/2021 at 14:55, Francois Legrand wrote:
> > Hi,
> >
> > I am testing an upgrade (from 14.2.16 to 16.2.5) on my ceph test
> > cluster (bare metal).
> >
> > I noticed (when reaching the MDS upgrade) that after I stopped all the
> > MDS daemons, opening the "file system" page in the dashboard results in a crash
> > of the dashboard (and also of the mgr). Has anyone else had this issue?
> >
> > F.
> >


[ceph-users] Orchestrator is internally ignoring applying a spec against SSDs, apparently determining they're rotational.

2021-09-23 Thread Chris
This seems similar to Bug#52301 (https://tracker.ceph.com/issues/52301);
however, in this case, various device display commands correctly describe
the devices.

I have five nodes with identical inventories. After applying the following
spec, 4 of the nodes filled out their OSDs as expected.  Node 5 and all of
its OSDs were omitted entirely because one of the SSDs is being identified
as a rotational disk.

The osd spec assigns one SSD to handle the db for all the spinning drives
and another SSD to handle the wal for those same drives.  The SSDs are
identical, so I can't refer to them by model number or type.  Instead the
spec limits the use of only 1 SSD for each role.

A secondary problem this causes is that the osd apply spec can't be deleted
or modified because it's unable to complete its mission until all the OSDs
already created are manually deleted.  I've already redeployed the cluster
twice in an effort to overcome the issue; seems that's not a workaround!

Has anyone seen this or perhaps knows of a way to clear the blockage?  20%
of my cluster is unusable till I figure this out.


*osdspec.yaml*

service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
  limit: 1
wal_devices:
  rotational: 0
  limit: 1


*example good node: cephadm ceph-volume inventory*

Device Path   Size rotates available Model name
/dev/sda  931.51 GBTrueFalse ST91000640SS
/dev/sdb  931.51 GBTrueFalse ST91000640SS
/dev/sdc  931.51 GBTrueFalse ST91000640SS
/dev/sdd  931.51 GBTrueFalse ST91000640SS
/dev/sde  931.51 GBTrueFalse ST91000640SS
/dev/sdf  186.31 GBFalse   False HUSSL4020BSS600
/dev/sdg  419.19 GBTrueFalse ST9450404SS
/dev/sdh  419.19 GBTrueFalse ST9450404SS
/dev/sdi  186.31 GBFalse   False HUSSL4020BSS600
/dev/sdj  136.73 GBTrueFalse ST9146852SS
/dev/sdk  136.73 GBTrueFalse ST9146852SS
/dev/sdl  136.73 GBTrueFalse ST9146852SS
/dev/sdn  136.73 GBTrueFalse ST9146852SS
/dev/sdo  136.73 GBTrueFalse ST9146852SS
/dev/sdp  136.73 GBTrueFalse ST9146852SS
/dev/sdq  136.73 GBTrueFalse ST9146852SS
/dev/sdr  136.73 GBTrueFalse ST9146852SS
/dev/sds  136.73 GBTrueFalse ST9146852SS
/dev/sdt  136.73 GBTrueFalse ST9146852SS
/dev/sdu  136.73 GBTrueFalse ST9146852SS
/dev/sdv  136.73 GBTrueFalse ST9146852SS
/dev/sdw  136.73 GBTrueFalse ST9146852SS
/dev/sdx  136.73 GBTrueFalse ST9146852SS
/dev/sdy  136.12 GBTrueFalse VIRTUAL DISK


*problem node cephadm ceph-volume inventory*

Device Path   Size rotates available Model name
/dev/sdr  186.31 GBFalse   True  HUSSL4020BSS600
/dev/sds  186.31 GBFalse   True  HUSSL4020BSS600
/dev/sdaa 232.89 GBTrueFalse FUJITSU MHZ2250B
/dev/sdc  136.73 GBTrueFalse ST9146852SS
/dev/sdd  136.73 GBTrueFalse ST9146852SS
/dev/sde  136.73 GBTrueFalse ST9146852SS
/dev/sdf  136.73 GBTrueFalse ST9146852SS
/dev/sdg  136.73 GBTrueFalse ST9146852SS
/dev/sdh  136.73 GBTrueFalse ST9146852SS
/dev/sdi  136.73 GBTrueFalse ST9146852SS
/dev/sdk  136.73 GBTrueFalse ST9146852SS
/dev/sdl  136.73 GBTrueFalse ST9146852SS
/dev/sdm  136.73 GBTrueFalse ST9146852SS
/dev/sdn  136.73 GBTrueFalse ST9146852SS
/dev/sdo  931.51 GBTrueFalse ST91000640SS
/dev/sdp  931.51 GBTrueFalse ST91000640SS
/dev/sdq  419.19 GBTrueFalse ST9450404SS
/dev/sdt  419.19 GBTrueFalse ST9450404SS
/dev/sdu  136.73 GBTrueFalse ST9146852SS
/dev/sdv  931.51 GBTrueFalse ST91000640SS
/dev/sdw  931.51 GBTrueFalse ST91000640SS
/dev/sdx  931.51 GBTrueFalse ST91000640SS
/dev/sdy  931.51 GBTrueFalse ST91000640SS
/dev/sdz  931.51 GBTrueFalse ST91000640SS


*log excerpt from the ceph mgr container's logs*

I can provide a dump of the cephadm log from the mgr container.

[ceph-users] Re: Force MGR to be active one

2021-09-23 Thread ceph
You should be able to stop and start the other mgr services when your desired 
mgr is the active one. The recently started mgrs will then be standbys.
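
Alternatively, you could check which mgr is active and fail it until the one
you want takes over, something like (the host name is an example):

ceph mgr dump | grep active_name
ceph mgr fail node01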

Hth
Mehmet

On 23 September 2021 at 13:28:06 MESZ, "Pascal Weißhaupt" wrote:
>Hi,
>
>
>
>is it possible to force a specifig MGR to be the active one? In the Zabbix 
>configuration, we only can specify 1 MGR on a node, so when this one is not in 
>an active state, Zabbix gives us warnings about it.
>
>
>
>
>
>Pascal


[ceph-users] ceph-iscsi / tcmu-runner bad pefromance with vmware esxi

2021-09-23 Thread José H . Freidhof
Hello everyone,

I need some help with our Ceph 16.2.5 cluster serving as an iSCSI target for ESXi nodes.

Background info:

   - we have built 3x OSD nodes with 60 bluestore OSDs, backed by 60x 6 TB
   spinning disks, 12 SSDs and 3 NVMes.
   - OSD nodes have 32 cores and 256 GB RAM
   - the OSD disks are connected to a SCSI RAID controller ... each disk is
   configured as RAID0 with write-back enabled to use the RAID controller
   cache etc.
   - we have 3x MONs and 2x iSCSI gateways
   - all servers are connected to a 10 Gbit network (switches)
   - all servers have two 10 Gbit network adapters configured as bond-rr
   - we created one RBD pool with autoscaling and 128 PGs (at the moment)
   - the pool currently holds 5 RBD images... 2x 10 TB and 3x 500 GB with
   the exclusive-lock feature and striping v2 (4 MB object / 1 MB stripe / count 4)
   - all the images are attached to the two iSCSI gateways running
   tcmu-runner 1.5.4 and exposed as iSCSI targets
   - we have 6 ESXi 6.7u3 servers as compute nodes connected to the Ceph
   iSCSI target

esxi iscsi config:
esxcli system settings advanced set -o /ISCSI/MaxIoSizeKB -i 512
esxcli system module parameters set -m iscsi_vmk -p iscsivmk_LunQDepth=64
esxcli system module parameters set -m iscsi_vmk -p iscsivmk_HostQDepth=64
esxcli system settings advanced set --int-value 1 --option
/DataMover/HardwareAcceleratedMove

The OSD nodes, MONs, RGW/iSCSI gateways and ESXi nodes are all connected to
the 10 Gbit network with bond-rr.

rados bench test (on the rbd pool):

root@cd133-ceph-osdh-01:~# rados bench -p rbd 10 write
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size
4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_cd133-ceph-osdh-01_87894
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
0   0 0 0 0 0   -   0
1  166953   211.987   2120.2505780.249261
2  16   129   113   225.976   2400.2965190.266439
3  16   183   167   222.641   2160.2194220.273838
4  16   237   221   220.974   2160.469045 0.28091
5  16   292   276   220.773   2200.249321 0.27565
6  16   339   323   215.307   1880.205553 0.28624
7  16   390   374   213.688   2040.1884040.290426
8  16   457   441   220.472   2680.1812540.286525
9  16   509   493   219.083   2080.2505380.286832
   10  16   568   552   220.772   2360.3078290.286076
Total time run: 10.2833
Total writes made:  568
Write size: 4194304
Object size:4194304
Bandwidth (MB/sec): 220.941
Stddev Bandwidth:   22.295
Max bandwidth (MB/sec): 268
Min bandwidth (MB/sec): 188
Average IOPS:   55
Stddev IOPS:5.57375
Max IOPS:   67
Min IOPS:   47
Average Latency(s): 0.285903
Stddev Latency(s):  0.115162
Max latency(s): 0.88187
Min latency(s): 0.119276
Cleaning up (deleting benchmark objects)
Removed 568 objects
Clean up completed and total clean up time :3.18627

The rados bench says that at least 250 MB/s is possible... and I have seen really
much more... up to 550 MB/s.
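
For comparison, roughly the same test through the RBD layer would be something
like (the image name is an example):

rbd bench --io-type write --io-size 4M --io-threads 16 --io-total 10G rbd/image01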

If I start iftop on one OSD node I see the Ceph iSCSI gateways (shown as rgw) and
the traffic is nearly 80 MB/s.
[image: grafik]


The Ceph dashboard shows that the iSCSI write performance is only 40 MB/s;
the max value I saw was between 40 and 60 MB/s... very poor.
[image: grafik]


If I look at the vCenter and ESXi datastore performance I see very high
storage device latencies between 50 and 100 ms... very bad.
[image: grafik]


root@cd133-ceph-mon-01:/home/cephadm# ceph config dump
WHO   MASK   LEVEL
OPTION   VALUE
   RO
global   basic
container_image
docker.io/ceph/ceph@sha256:829ebf54704f2d827de00913b171e5da741aad9b53c1f35ad59251524790eceb
 *
global   advanced
journal_max_write_bytes  1073714824
global   advanced
journal_max_write_entries1
global   advanced
mon_osd_cache_size   1024
global   dev
osd_client_watch_timeout 15
global 

[ceph-users] "Partitioning" in RGW

2021-09-23 Thread Manuel Holtgrewe
Dear all,

Is it possible to achieve the following with rgw and the S3 protocol?

I have a central Ceph cluster with rgw/S3 in my organisation and I have an
internal network zone and a DMZ. Access from the internal network to Ceph
is of course allowed.

I want to expose certain parts of the Ceph in the DMZ. The easiest solution
would be to simply put a reverse proxy in the DMZ and allow the reverse
proxy to access my rgws via HTTP(S) in the firewall.

However, this would also expose ALL of my S3 data to the DMZ.

Is there a built-in feature in Ceph/rgw that would allow me to limit access
to certain buckets only when they come from the DMZ?

Of course, I could use the multi-tenancy feature OR even use user prefixes
to limit access to a "public" tenant or users with the prefix "public-". This
would be fairly simple to configure with nginx, for example, by forwarding
everything matching '/public:*' to "https://s3.example.com/public:*".

Best wishes,
Manuel


[ceph-users] Re: Remoto 1.1.4 in Ceph 16.2.6 containers

2021-09-23 Thread David Galloway
I just repushed the 16.2.6 container with remoto 1.2.1 in it.
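
To verify, something like this should show the packaged version inside the
image (the package name is assumed to be the EPEL one):

podman run --rm --entrypoint rpm docker.io/ceph/ceph:v16.2.6 -q python3-remoto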

On 9/22/21 4:19 PM, David Orman wrote:
> https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-4b2736a28c
> 
> ^^ if people want to test and provide feedback for a potential merge
> to EPEL8 stable.
> 
> David
> 
> On Wed, Sep 22, 2021 at 11:43 AM David Orman  wrote:
>>
>> I'm wondering if this was installed using pip/pypi before, and now
>> switched to using EPEL? That would explain it - 1.2.1 may never have
>> been pushed to EPEL.
>>
>> David
>>
>> On Wed, Sep 22, 2021 at 11:26 AM David Orman  wrote:
>>>
>>> We'd worked on pushing a change to fix
>>> https://tracker.ceph.com/issues/50526 for a deadlock in remoto here:
>>> https://github.com/alfredodeza/remoto/pull/63
>>>
>>> A new version, 1.2.1, was built to help with this. With the Ceph
>>> release 16.2.6 (at least), we see 1.1.4 is again part of the
>>> containers. Looking at EPEL8, all that is built now is 1.1.4. We're
>>> not sure what happened, but would it be possible to get 1.2.1 pushed
>>> to EPEL8 again, and figure out why it was removed? We'd then need a
>>> rebuild of the 16.2.6 containers to 'fix' this bug.
>>>
>>> This is definitely a high urgency bug, as it impacts any deployments
>>> with medium to large counts of OSDs or split db/wal devices, like many
>>> modern deployments.
>>>
>>> https://koji.fedoraproject.org/koji/packageinfo?packageID=18747
>>> https://dl.fedoraproject.org/pub/epel/8/Everything/x86_64/Packages/p/
>>>
>>> Respectfully,
>>> David Orman


[ceph-users] Is this really an 'error'? "pg_autoscaler... has overlapping roots"

2021-09-23 Thread Harry G. Coin

Is there anything to be done about groups of log messages like

"pg_autoscaler ERROR root] pool  has overlapping roots"

The cluster reports it is healthy, and yet this is reported as an error, 
so-- is it an error that ought to have been reported, or is it not an error?
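
For context, the CRUSH root each pool maps to can be checked with something like:

ceph osd pool ls detail        # note the crush_rule of each pool
ceph osd crush rule dump       # the 'take' step shows the root a rule uses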


Thanks

Harry Coin




[ceph-users] Re: Orchestrator is internally ignoring applying a spec against SSDs, apparently determining they're rotational.

2021-09-23 Thread Eugen Block

Hi,

as a workaround you could just set the rotational flag yourself:

echo 0 > /sys/block/sd[X]/queue/rotational

That's the flag ceph-volume looks for, and it should at least
enable you to deploy the rest of the OSDs. Of course, you'll need to
figure out why the rotational flag is set incorrectly, otherwise
replacing/redeploying OSDs will most likely fail again.
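
A quick way to check what the kernel currently reports (before and after)
would be something like:

lsblk -d -o NAME,ROTA,MODEL
cat /sys/block/sdX/queue/rotational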


Regards,
Eugen



Quoting Chris :


This seems similar to Bug#52301 https://tracker.ceph.com/issues/52301
however in this case, various device display commands correctly describe
the devices.

I have five nodes with identical inventories. After applying the following
spec, 4 of the nodes filled out their OSDs as expected.  Node 5 and all of
its OSDs were omitted entirely because one of the SSDs is being identified
as a rotational disk.

The osd spec assigns one SSD to handle the db for all the spinning drives
and another SSD to handle the wal for those same drives.  The SSDs are
identical, so I can't refer to them by model number or type.  Instead the
spec limits the use of only 1 SSD for each role.

A secondary problem this causes is that the osd apply spec can't be deleted
or modified because it's unable to complete its mission until all the OSDs
already created are manually deleted.  I've already redeployed the cluster
twice in an effort to overcome the issue; seems that's not a workaround!

Has anyone seen this or perhaps knows of a way to clear the blockage?  20%
of my cluster is unusable till I figure this out.


*osdspec.yaml*

service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
  limit: 1
wal_devices:
  rotational: 0
  limit: 1


*example good node: cephadm ceph-volume inventory*

Device Path   Size rotates available Model name
/dev/sda  931.51 GBTrueFalse ST91000640SS
/dev/sdb  931.51 GBTrueFalse ST91000640SS
/dev/sdc  931.51 GBTrueFalse ST91000640SS
/dev/sdd  931.51 GBTrueFalse ST91000640SS
/dev/sde  931.51 GBTrueFalse ST91000640SS
/dev/sdf  186.31 GBFalse   False HUSSL4020BSS600
/dev/sdg  419.19 GBTrueFalse ST9450404SS
/dev/sdh  419.19 GBTrueFalse ST9450404SS
/dev/sdi  186.31 GBFalse   False HUSSL4020BSS600
/dev/sdj  136.73 GBTrueFalse ST9146852SS
/dev/sdk  136.73 GBTrueFalse ST9146852SS
/dev/sdl  136.73 GBTrueFalse ST9146852SS
/dev/sdn  136.73 GBTrueFalse ST9146852SS
/dev/sdo  136.73 GBTrueFalse ST9146852SS
/dev/sdp  136.73 GBTrueFalse ST9146852SS
/dev/sdq  136.73 GBTrueFalse ST9146852SS
/dev/sdr  136.73 GBTrueFalse ST9146852SS
/dev/sds  136.73 GBTrueFalse ST9146852SS
/dev/sdt  136.73 GBTrueFalse ST9146852SS
/dev/sdu  136.73 GBTrueFalse ST9146852SS
/dev/sdv  136.73 GBTrueFalse ST9146852SS
/dev/sdw  136.73 GBTrueFalse ST9146852SS
/dev/sdx  136.73 GBTrueFalse ST9146852SS
/dev/sdy  136.12 GBTrueFalse VIRTUAL DISK


*problem node cephadm ceph-volume inventory*

Device Path   Size rotates available Model name
/dev/sdr  186.31 GBFalse   True  HUSSL4020BSS600
/dev/sds  186.31 GBFalse   True  HUSSL4020BSS600
/dev/sdaa 232.89 GBTrueFalse FUJITSU MHZ2250B
/dev/sdc  136.73 GBTrueFalse ST9146852SS
/dev/sdd  136.73 GBTrueFalse ST9146852SS
/dev/sde  136.73 GBTrueFalse ST9146852SS
/dev/sdf  136.73 GBTrueFalse ST9146852SS
/dev/sdg  136.73 GBTrueFalse ST9146852SS
/dev/sdh  136.73 GBTrueFalse ST9146852SS
/dev/sdi  136.73 GBTrueFalse ST9146852SS
/dev/sdk  136.73 GBTrueFalse ST9146852SS
/dev/sdl  136.73 GBTrueFalse ST9146852SS
/dev/sdm  136.73 GBTrueFalse ST9146852SS
/dev/sdn  136.73 GBTrueFalse ST9146852SS
/dev/sdo  931.51 GBTrueFalse ST91000640SS
/dev/sdp  931.51 GBTrueFalse ST91000640SS
/dev/sdq  419.19 GBTrueFalse ST9450404SS
/dev/sdt  419.19 GBTrueFalse ST9450404SS
/dev/sdu  136.73 GBTrueFalse ST9146852SS
/dev/sdv  931.51 GB

[ceph-users] Re: Successful Upgrade from 14.2.22 to 15.2.14

2021-09-23 Thread Rainer Krienke

Hello Dan,

I am also running a production 14.2.22 cluster with 144 HDD OSDs and I
am wondering whether I should stay with this release or upgrade to Octopus, so
your info is very valuable...


One more question: you described that the OSDs do an expected fsck and that
this took roughly 10 minutes. I guess the fsck is done in parallel for all
OSDs of one host? So the total downtime for one host due to the fsck
should not be much more than, say, 15 minutes, right?
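
If I read the Octopus release notes correctly, the automatic conversion/fsck
on first start can also be postponed and scheduled later with something like
this (please correct me if that's wrong):

ceph config set osd bluestore_fsck_quick_fix_on_mount false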


Are you using SSDs or HDDs in your cluster?

Thanks
Rainer

On 21.09.21 at 12:09, Dan van der Ster wrote:

Dear friends,

This morning we upgraded our pre-prod cluster from 14.2.22 to 15.2.14,
successfully, following the procedure at
https://docs.ceph.com/en/latest/releases/octopus/#upgrading-from-mimic-or-nautilus
It's a 400TB cluster which is 10% used with 72 osds (block=hdd,
block.db=ssd) and 40M objects.

* The mons upgraded cleanly as expected.
* One minor surprise was that the mgrs respawned themselves moments
after the leader restarted into octopus:



--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 
1001312
