[ceph-users] Re: cephfs-top causes 16 mgr modules have recently crashed

2024-01-23 Thread Jos Collin

This fix is in the MDS.
I think you need to read 
https://docs.ceph.com/en/quincy/cephadm/upgrade/#staggered-upgrade.
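
The staggered upgrade page describes limiting an upgrade run to specific daemon types. A sketch (the image tag is an example; mgr daemons must be upgraded first, and since the fix is in the MDS, the mds daemons need a run too, e.g. in the later maintenance window):

```shell
# Staggered upgrade: mgr first, other daemon types in later runs.
ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.7 --daemon-types mgr
ceph orch upgrade status
# Later, e.g. during a maintenance window:
ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.7 --daemon-types mon
ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.7 --daemon-types mds,osd
```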


On 23/01/24 12:19, Özkan Göksu wrote:

Hello Jos.
Thank you for the reply.

I can upgrade to 17.2.7, but I wonder: can I upgrade only MON+MGR for 
this issue, or do I need to upgrade all the components?
Otherwise I need to wait a few weeks. I don't want to request a 
maintenance window during delivery time.


root@ud-01:~# ceph orch upgrade ls
{
    "image": "quay.io/ceph/ceph",
    "registry": "quay.io",
    "bare_image": "ceph/ceph",
    "versions": [
        "18.2.1",
        "18.2.0",
        "18.1.3",
        "18.1.2",
        "18.1.1",
        "18.1.0",
        "17.2.7",
        "17.2.6",
        "17.2.5",
        "17.2.4",
        "17.2.3",
        "17.2.2",
        "17.2.1",
        "17.2.0"
    ]
}

Best regards

On Tue, 23 Jan 2024 at 07:42, Jos Collin wrote:


Please have this fix: https://tracker.ceph.com/issues/59551. It's
backported to quincy.

On 23/01/24 03:11, Özkan Göksu wrote:
> Hello
>
> When I run cephfs-top it causes an mgr module crash. Can you please
> tell me the reason?
>
> My environment:
> My ceph version 17.2.6
> Operating System: Ubuntu 22.04.2 LTS
> Kernel: Linux 5.15.0-84-generic
>
> I created the cephfs-top user with the following command:
> ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r' > /etc/ceph/ceph.client.fstop.keyring
>
> This is the crash report:
>
> root@ud-01:~# ceph crash info 2024-01-22T21:25:59.313305Z_526253e3-e8cc-4d2c-adcb-69a7c9986801
> {
>      "backtrace": [
>          "  File \"/usr/share/ceph/mgr/stats/module.py\", line 32, in notify\n    self.fs_perf_stats.notify_cmd(notify_id)",
>          "  File \"/usr/share/ceph/mgr/stats/fs/perf_stats.py\", line 177, in notify_cmd\n    metric_features = int(metadata[CLIENT_METADATA_KEY][\"metric_spec\"][\"metric_flags\"][\"feature_bits\"], 16)",
>          "ValueError: invalid literal for int() with base 16: '0x'"
>      ],
>      "ceph_version": "17.2.6",
>      "crash_id":
> "2024-01-22T21:25:59.313305Z_526253e3-e8cc-4d2c-adcb-69a7c9986801",
>      "entity_name": "mgr.ud-01.qycnol",
>      "mgr_module": "stats",
>      "mgr_module_caller": "ActivePyModule::notify",
>      "mgr_python_exception": "ValueError",
>      "os_id": "centos",
>      "os_name": "CentOS Stream",
>      "os_version": "8",
>      "os_version_id": "8",
>      "process_name": "ceph-mgr",
>      "stack_sig":
> "971ae170f1fff7f7bc0b7ae86d164b2b0136a8bd5ca7956166ea5161e51ad42c",
>      "timestamp": "2024-01-22T21:25:59.313305Z",
>      "utsname_hostname": "ud-01",
>      "utsname_machine": "x86_64",
>      "utsname_release": "5.15.0-84-generic",
>      "utsname_sysname": "Linux",
>      "utsname_version": "#93-Ubuntu SMP Tue Sep 5 17:16:10 UTC 2023"
> }
>
>
> Best regards.
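
The quoted backtrace comes down to int() being handed the bare string "0x" as the feature-bits field. A minimal reproduction, with a guard of the kind a fix would need (the function name here is illustrative, not the actual mgr code):

```python
def parse_feature_bits(raw: str) -> int:
    """Parse a hex feature-bits string; treat a bare '0x' as no features."""
    try:
        return int(raw, 16)
    except ValueError:
        # int("0x", 16) raises ValueError: invalid literal for int() with base 16
        return 0

print(parse_feature_bits("0x1f"))  # 31
print(parse_feature_bits("0x"))    # 0 instead of crashing the stats module
```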
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




[ceph-users] cephadm discovery service certificate absent after upgrade.

2024-01-23 Thread Nicolas FOURNIL
 Hello,

I've just done a fresh upgrade from Quincy to Reef and my graphs are now
blank. After investigating, it seems that the discovery service is not
working because there is no certificate:

# ceph orch sd dump cert
Error EINVAL: No certificate found for service discovery

Maybe an upgrade issue?

Is there a way to generate or replace the certificate properly?

Regards

Nicolas F.


[ceph-users] Re: cephadm discovery service certificate absent after upgrade.

2024-01-23 Thread David C.
Hello Nicolas,

I don't know if it's an update issue.

If this is not a problem for you, you can consider redeploying
grafana/prometheus.

It is also possible to inject your own certificates:

https://docs.ceph.com/en/latest/cephadm/services/monitoring/#example

https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/templates/services/prometheus/prometheus.yml.j2
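
The monitoring documentation linked above injects custom certificates through the cephadm config-key store; roughly like this (file paths are placeholders):

```shell
# Store a custom certificate/key for Grafana in the config-key store;
# cephadm renders them into the container on reconfig.
ceph config-key set mgr/cephadm/grafana_crt -i ./grafana.crt
ceph config-key set mgr/cephadm/grafana_key -i ./grafana.key
ceph orch reconfig grafana
```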



Regards,

*David CASIER*




On Tue, 23 Jan 2024 at 10:56, Nicolas FOURNIL wrote:

>  Hello,
>
> I've just fresh upgrade from Quincy to Reef and my graphs are now blank...
> after investigations, it seems that discovery service is not working
> because of no certificate :
>
> # ceph orch sd dump cert
> Error EINVAL: No certificate found for service discovery
>
> Maybe an upgrade issue ?
>
> Is there a way to generate or replace the certificate properly ?
>
> Regards
>
> Nicolas F.


[ceph-users] Re: cephadm discovery service certificate absent after upgrade.

2024-01-23 Thread Nicolas FOURNIL
Hello,

Thanks for the advice, but the Prometheus cert is OK (self-signed) and
tested with curl and a web browser.

It seems to be the "service discovery" certificate from cephadm that is
missing, but I cannot figure out how to set it.

There is a function in the code that creates this certificate inside the
key store, but how to call it ... that's the question :-(

Regards.



On Tue, 23 Jan 2024 at 15:52, David C. wrote:

> Hello Nicolas,
>
> I don't know if it's an update issue.
>
> If this is not a problem for you, you can consider redeploying
> grafana/prometheus.
>
> It is also possible to inject your own certificates :
>
> https://docs.ceph.com/en/latest/cephadm/services/monitoring/#example
>
>
> https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/templates/services/prometheus/prometheus.yml.j2
>
> 
>
> Regards,
>
> *David CASIER*
> 
>
>
>


[ceph-users] Re: cephadm discovery service certificate absent after upgrade.

2024-01-23 Thread David C.
Is the cephadm http server service starting correctly (in the mgr logs)?

IPv6?


Regards,

*David CASIER*





On Tue, 23 Jan 2024 at 16:29, Nicolas FOURNIL wrote:

> Hello,
>
> Thanks for advice but Prometheus cert is ok, (Self signed) and tested with
> curl and web navigator.
>
>  it seems to be the "Service discovery" certificate from cephadm who is
> missing but I cannot figure out how to set it.
>
> There's in the code a function to create this certificate inside the Key
> store but how ... that's the point :-(
>
> Regards.
>
>
>


[ceph-users] Re: cephadm discovery service certificate absent after upgrade.

2024-01-23 Thread Nicolas FOURNIL
IPv6 only: yes, ms_bind_ipv6=true is already set.

I tried rotating the keys for node-exporter and I get this:

2024-01-23T16:43:56.098796+ mgr.srv06-r2b-fl1.foxykh (mgr.342408) 87074 : cephadm [INF] Rotating authentication key for node-exporter.srv06-r2b-fl1
2024-01-23T16:43:56.099224+ mgr.srv06-r2b-fl1.foxykh (mgr.342408) 87075 : cephadm [ERR] unknown daemon type node-exporter
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1039, in _check_daemons
    self.mgr._daemon_action(daemon_spec, action=action)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 2203, in _daemon_action
    return self._rotate_daemon_key(daemon_spec)
  File "/usr/share/ceph/mgr/cephadm/module.py", line 2147, in _rotate_daemon_key
    'entity': daemon_spec.entity_name(),
  File "/usr/share/ceph/mgr/cephadm/services/cephadmservice.py", line 108, in entity_name
    return get_auth_entity(self.daemon_type, self.daemon_id, host=self.host)
  File "/usr/share/ceph/mgr/cephadm/services/cephadmservice.py", line 47, in get_auth_entity
    raise OrchestratorError(f"unknown daemon type {daemon_type}")
orchestrator._interface.OrchestratorError: unknown daemon type node-exporter

I tried removing and recreating the service: same result ... how do I
stop the rotation now :-/
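
For what it's worth, the traceback says node-exporter has no cephx identity at all (get_auth_entity() only knows cephx-backed daemon types), so a key rotation can never apply to it. Redeploying the daemon, rather than rotating its key, is the usual way to clear the pending action; a sketch using the daemon name from the log:

```shell
# node-exporter carries no cephx key, so a key rotation cannot apply.
# Redeploying the daemon clears the scheduled action (daemon name taken
# from the log above).
ceph orch daemon redeploy node-exporter.srv06-r2b-fl1
ceph orch ps --daemon-type node-exporter
```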



On Tue, 23 Jan 2024 at 17:18, David C. wrote:

> Is the cephadm http server service starting correctly (in the mgr logs)?
>
> IPv6 ?
> 
>
> Regards,
>
> *David CASIER*
> 
>
>
>
>


[ceph-users] Re: cephadm discovery service certificate absent after upgrade.

2024-01-23 Thread David C.
According to the source code, the certificates are generated automatically
at startup, hence my question whether the service started correctly.

I also had problems with IPv6-only setups, but I don't have more details
at hand.
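
Since the certificate is created when cephadm's internal HTTP server starts, failing over the active mgr is a low-risk way to trigger regeneration; a sketch (assumes a standby mgr exists):

```shell
# Fail over to a standby mgr so the cephadm HTTP server starts fresh
# and (re)generates the service-discovery certificate.
ceph mgr fail
# Once the new mgr is active, check the certificate again:
ceph orch sd dump cert
```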


Regards,

*David CASIER*



On Tue, 23 Jan 2024 at 17:46, Nicolas FOURNIL wrote:

> IPv6 only : Yes, the -ms_bind_ipv6=true is already set-
>
> I had tried a rotation of the keys for node-exporter and I get this :
>
> 2024-01-23T16:43:56.098796+ mgr.srv06-r2b-fl1.foxykh (mgr.342408)
> 87074 : cephadm [INF] Rotating authentication key for
> node-exporter.srv06-r2b-fl1
> 2024-01-23T16:43:56.099224+ mgr.srv06-r2b-fl1.foxykh (mgr.342408)
> 87075 : cephadm [ERR] unknown daemon type node-exporter
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1039, in _check_daemons
> self.mgr._daemon_action(daemon_spec, action=action)
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 2203, in
> _daemon_action
> return self._rotate_daemon_key(daemon_spec)
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 2147, in
> _rotate_daemon_key
> 'entity': daemon_spec.entity_name(),
>   File "/usr/share/ceph/mgr/cephadm/services/cephadmservice.py", line 108,
> in entity_name
> return get_auth_entity(self.daemon_type, self.daemon_id,
> host=self.host)
>   File "/usr/share/ceph/mgr/cephadm/services/cephadmservice.py", line 47,
> in get_auth_entity
> raise OrchestratorError(f"unknown daemon type {daemon_type}")
> orchestrator._interface.OrchestratorError: unknown daemon type
> node-exporter
>
> Tried to remove & recreate service : it's the same ... how to stop the
> rotation now :-/
>
>
>


[ceph-users] Re: Unable to locate "bluestore_compressed_allocated" & "bluestore_compressed_original" parameters while executing "ceph daemon osd.X perf dump" command.

2024-01-23 Thread Alam Mohammad
Hi Eugen,

We are planning to build a cluster with an erasure-coded (EC) pool to save some 
disk space. For that we have experimented with compression settings on the RBD 
pool using the following parameters:

On the pool we have set the parameters below:
Compression mode: Aggressive
Compression type: lz4
Compression ratio: 0.85


Additionally, I have configured global settings in the Ceph configuration file 
for BlueStore compression:

bluestore_compression_mode aggressive
bluestore_compression_algorithm lz4
bluestore_compression_required_ratio 0.85

However, when executing the ceph tell command [ceph tell osd.0 perf dump | 
grep -E '(compress_.*_count|bluestore_compressed_)'], we are getting only 
the parameters below:

    "compress_success_count": 4,
    "compress_rejected_count": 0,

but not the following parameters:

    "bluestore_compressed_allocated": 12288,
    "bluestore_compressed_original": 24576,


Is there a specific aspect of the configuration that I might be overlooking? If 
so, could you please provide guidance on how to properly configure compression 
settings to effectively save disk space in the Ceph cluster?
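
For reference, the pool-level settings listed above correspond to these commands (the pool name is a placeholder):

```shell
# Per-pool compression settings; "mypool" is an example name.
ceph osd pool set mypool compression_mode aggressive
ceph osd pool set mypool compression_algorithm lz4
ceph osd pool set mypool compression_required_ratio 0.85
# Read a setting back:
ceph osd pool get mypool compression_mode
```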


Any guidance or insight would be greatly appreciated.


Regards,
Mohammad Saif


[ceph-users] Re: Cephadm orchestrator and special label _admin in 17.2.7

2024-01-23 Thread Albert Shih
On 18/01/2024 at 11:58:47+0100, Robert Sander wrote:
Hi, 

> 
> According to the documentation¹ the special host label _admin instructs the
> cephadm orchestrator to place a valid ceph.conf and the
> ceph.client.admin.keyring into /etc/ceph of the host.
> 
> I noticed that (at least) on 17.2.7 only the keyring file is placed in
> /etc/ceph, but not ceph.conf.
> 
> Both files are placed into the /var/lib/ceph//config directory.
> 
> Has something changed?
> 

I'd just like to know if it's a very bad idea to rsync /etc/ceph from
the «_admin» server to the other ceph cluster servers.

In fact I added something like

for host in `cat /usr/local/etc/ceph_list_noeuds.txt`
do
  /usr/bin/rsync -av /etc/ceph/ceph* $host:/etc/ceph/
done

in a cronjob
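
As an alternative to the rsync cronjob, cephadm can distribute those files itself when the other hosts carry the _admin label (host names below are placeholders); note the ceph.conf placement caveat from the original post may still apply on 17.2.7:

```shell
# Let the orchestrator manage /etc/ceph on additional hosts: the _admin
# label asks cephadm to place ceph.conf and the admin keyring there.
ceph orch host label add node2 _admin
ceph orch host label add node3 _admin
# Check which hosts carry the label:
ceph orch host ls
```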

Regards. 

-- 
Albert SHIH 🦫 🐸
France
Heure locale/Local time:
mar. 23 janv. 2024 18:18:39 CET


[ceph-users] Re: Unable to locate "bluestore_compressed_allocated" & "bluestore_compressed_original" parameters while executing "ceph daemon osd.X perf dump" command.

2024-01-23 Thread Eugen Block

Hi,

ceph.conf is not used anymore the way it was before cephadm. Just add  
the config to the config store (see my previous example) and it should  
be applied to all OSDs.
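
That advice maps to commands like these, using the option names from the question (a sketch; run against the cluster rather than editing ceph.conf):

```shell
# Set BlueStore compression for all OSDs via the central config store.
ceph config set osd bluestore_compression_mode aggressive
ceph config set osd bluestore_compression_algorithm lz4
ceph config set osd bluestore_compression_required_ratio 0.85
# Verify:
ceph config get osd bluestore_compression_mode
```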


Regards
Eugen



