[ceph-users] Re: 1 PG stucked in "active+undersized+degraded for long time

2023-06-22 Thread Eugen Block

Hi,

have you tried restarting the primary OSD (currently 343)? It looks  
like this PG is part of an EC pool; are there enough hosts available,  
assuming your failure domain is host? I assume that ceph isn't able to  
recreate the shard on a different OSD. You could share your osd tree  
and the crush rule as well as the erasure profile so we can get a  
better picture.
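
For example, something like this should collect that information (the  
pool and profile names are placeholders for your actual ones):

  ceph osd tree
  ceph osd crush rule dump
  ceph osd pool get <ec-pool> erasure_code_profile
  ceph osd erasure-code-profile get <profile>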


Thanks,
Eugen

Quoting siddhit.ren...@nxtgen.com:


Hello All,

Ceph version: 14.2.5-382-g8881d33957  
(8881d33957b54b101eae9c7627b351af10e87ee8) nautilus (stable)


Issue:
1 PG stuck in "active+undersized+degraded" for a long time
Degraded data redundancy: 44800/8717052637 objects degraded (0.001%), 1 pg degraded, 1 pg undersized


#ceph pg dump_stuck
PG_STAT  STATE                       UP                                                    UP_PRIMARY  ACTING                                                ACTING_PRIMARY
15.28f0  active+undersized+degraded  [2147483647,343,355,415,426,640,302,392,78,202,607]  343         [2147483647,343,355,415,426,640,302,392,78,202,607]  343


PG Query:
#ceph pg 15.28f0 query

{
"state": "active+undersized+degraded",
"snap_trimq": "[]",
"snap_trimq_len": 0,
"epoch": 303362,
"up": [
2147483647,
343,
355,
415,
426,
640,
302,
392,
78,
202,
607
],
"acting": [
2147483647,
343,
355,
415,
426,
640,
302,
392,
78,
202,
607
],
"acting_recovery_backfill": [
"78(8)",
"202(9)",
"302(6)",
"343(1)",
"355(2)",
"392(7)",
"415(3)",
"426(4)",
"607(10)",
"640(5)"
],
"info": {
"pgid": "15.28f0s1",
"last_update": "303161'598853",
"last_complete": "303161'598853",
"log_tail": "261289'595825",
"last_user_version": 598853,
"last_backfill": "MAX",
"last_backfill_bitwise": 1,
"purged_snaps": [],
"history": {
"epoch_created": 19841,
"epoch_pool_created": 16141,
"last_epoch_started": 303017,
"last_interval_started": 303016,
"last_epoch_clean": 250583,
"last_interval_clean": 250582,
"last_epoch_split": 19841,
"last_epoch_marked_full": 0,
"same_up_since": 303016,
"same_interval_since": 303016,
"same_primary_since": 256311,
"last_scrub": "255277'537760",
"last_scrub_stamp": "2021-04-11 03:18:39.164439",
"last_deep_scrub": "255277'537756",
"last_deep_scrub_stamp": "2021-04-10 01:42:16.182528",
"last_clean_scrub_stamp": "2021-04-11 03:18:39.164439"
},
"stats": {
"version": "303161'598853",
"reported_seq": "3594551",
"reported_epoch": "303362",
"state": "active+undersized+degraded",
"last_fresh": "2023-06-20 19:03:59.135295",
"last_change": "2023-06-20 15:11:12.569114",
"last_active": "2023-06-20 19:03:59.135295",
"last_peered": "2023-06-20 19:03:59.135295",
"last_clean": "2021-04-11 15:21:44.271834",
"last_became_active": "2023-06-20 15:11:12.569114",
"last_became_peered": "2023-06-20 15:11:12.569114",
"last_unstale": "2023-06-20 19:03:59.135295",
"last_undegraded": "2023-06-20 15:11:10.430426",
"last_fullsized": "2023-06-20 15:11:10.430154",
"mapping_epoch": 303016,
"log_start": "261289'595825",
"ondisk_log_start": "261289'595825",
"created": 19841,
"last_epoch_clean": 250583,
"parent": "0.0",
"parent_split_bits": 14,
"last_scrub": "255277'537760",
"last_scrub_stamp": "2021-04-11 03:18:39.164439",
"last_deep_scrub": "255277'537756",
"last_deep_scrub_stamp": "2021-04-10 01:42:16.182528",
"last_clean_scrub_stamp": "2021-04-11 03:18:39.164439",
"log_size": 3028,
"ondisk_log_size": 3028,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"manifest_stats_invalid": false,
"snaptrimq_len": 0,
"stat_sum": {
"num_bytes": 54989065178,
"num_objects": 44800,
"num_object_clones": 0,
"num_object_copies": 492800,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 44800,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,

[ceph-users] ceph quincy repo update to debian bookworm...?

2023-06-22 Thread Christian Peters

Hi ceph users/maintainers,

I installed ceph quincy on debian bullseye as a ceph client and now want 
to update to bookworm.

I see that at the moment only bullseye is supported.

https://download.ceph.com/debian-quincy/dists/bullseye/

Will there be an update of

deb https://download.ceph.com/debian-quincy/ bullseye main

to

deb https://download.ceph.com/debian-quincy/ bookworm main

in the near future!?

Regards,

Christian



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Pacific bluefs enospc bug with newly created OSDs

2023-06-22 Thread Igor Fedotov
Quincy brings support for a 4K allocation unit but doesn't start using it 
immediately. Instead, it falls back to 4K when bluefs is unable to 
allocate more space with the default size. And even this mode isn't 
permanent: bluefs attempts to bring larger units back from time to time.



Thanks,

Igor

On 22/06/2023 00:04, Fox, Kevin M wrote:

Does Quincy automatically switch existing OSDs to 4K, or do you need to deploy a 
new OSD to get the 4K size?

Thanks,
Kevin


From: Igor Fedotov 
Sent: Wednesday, June 21, 2023 5:56 AM
To: Carsten Grommel; ceph-users@ceph.io
Subject: [ceph-users] Re: Ceph Pacific bluefs enospc bug with newly created OSDs



Hi Carsten,

please also note a workaround to bring the OSDs back for e.g. data
recovery - set bluefs_shared_alloc_size to 32768.

This will hopefully allow the OSD to start up and pull data out of it. But I
would discourage you from using such OSDs long term, as fragmentation
might evolve and this workaround would become ineffective as well.

Please do not apply this change to healthy OSDs as it's irreversible.
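
For reference, a minimal sketch of applying that workaround to a single
affected OSD (osd.5 is only an illustrative ID; adjust the unit name to
your deployment):

  ceph config set osd.5 bluefs_shared_alloc_size 32768
  systemctl restart ceph-osd@5    # or the corresponding cephadm unit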


BTW, having two namespaces on an NVMe drive is a good alternative to Logical
Volumes if for some reason one needs two "physical" disks for an OSD setup...

Thanks,

Igor

On 21/06/2023 11:41, Carsten Grommel wrote:

Hi Igor,

thank you for your answer!


first of all Quincy does have a fix for the issue, see
https://tracker.ceph.com/issues/53466 (and its Quincy counterpart
https://tracker.ceph.com/issues/58588)

Thank you, I somehow missed that release, good to know!


SSD or HDD? Standalone or shared DB volume? I presume the latter... What
is disk size and current utilization?

Please share ceph-bluestore-tool's bluefs-bdev-sizes command output if
possible

We use 4 TB NVMe SSDs with a shared DB, yes; mainly Micron with some Dell
and Samsung in this cluster:

Micron_7400_MTFDKCB3T8TDZ_214733D291B1 cloud5-1561:nvme5n1  osd.5

All disks are at ~88% utilization. I noticed that at around 92% our
disks tend to run into this bug.

Here are some bluefs-bdev-sizes from different OSDs on different hosts
in this cluster:

ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-36/

inferring bluefs devices from bluestore path

1 : device size 0x37e3ec0 : using 0x2e1b390(2.9 TiB)

ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-24/

inferring bluefs devices from bluestore path

1 : device size 0x37e3ec0 : using 0x2d4e318d000(2.8 TiB)

ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-5/

inferring bluefs devices from bluestore path

1 : device size 0x37e3ec0 : using 0x2f2da93d000(2.9 TiB)


Generally, given my assumption that DB volume is currently collocated
and you still want to stay on Pacific, you might want to consider
redeploying OSDs with a standalone DB volume setup.

Just create large enough (2x of the current DB size seems to be pretty
conservative estimation for that volume's size) additional LV on top of
the same physical disk. And put DB there...

Separating DB from main disk would result in much less fragmentation at
DB volume and hence work around the problem. The cost would be having
some extra spare space at DB volume unavailable for user data .

I guess that makes sense, so the suggestion would be to deploy the OSD and
DB on the same NVMe

but with different logical volumes, or to update to Quincy.
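
As a rough sketch of that layout (the VG/LV names and sizes below are purely
illustrative and assume a volume group already exists on the NVMe):

  lvcreate -L 3.3T -n osd-block-5 vg-nvme5
  lvcreate -L 300G -n osd-db-5 vg-nvme5
  ceph-volume lvm create --data vg-nvme5/osd-block-5 --block.db vg-nvme5/osd-db-5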

Thank you!

Carsten

*From: *Igor Fedotov 
*Date: *Tuesday, 20 June 2023 at 12:48
*To: *Carsten Grommel , ceph-users@ceph.io

*Subject: *Re: [ceph-users] Ceph Pacific bluefs enospc bug with newly
created OSDs

Hi Carsten,

first of all Quincy does have a fix for the issue, see
https://tracker.ceph.com/issues/53466 (and its Quincy counterpart
https://tracker.ceph.com/issues/58588)

Could you please share a bit more info on OSD disk layout?

SSD or HDD? Standalone or shared DB volume? I presume the latter... What
is disk size and current utilization?

Please share ceph-bluestore-tool's bluefs-bdev-sizes command output if
possible


Generally, given my assumption that DB volume is currently collocated
and you still want to stay on Pacific, you might want to consider
redeploying OSDs with a standalone DB volume setup.

Just create large enough (2x of the current DB size seems to be pretty
conservative estimation for that volume's size) additional LV on top of
the same physical disk. And put DB there...

Separating DB from main disk would result in much less fragmentation at
DB volume and hence work around the problem. The cost would be having
some extra spare space at DB volume unavailable for user data .


Hope this helps,

Igor


On 20/06/2023 10:29, Carsten Grommel wrote:

Hi all,

we are experiencing the “bluefs enospc bug” again after redeploying all OSDs of our Pacific Cluster.

I know that our cluster is a bit too utilized at the moment with 87.26 % raw usage but still this should not happen afaik.

W

[ceph-users] Re: 1 PG stucked in "active+undersized+degraded for long time

2023-06-22 Thread Damian

Hi Siddhit

You need more OSD's. Please read:

https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-pg/#erasure-coded-pgs-are-not-active-clean

Greetings

Damian

On 2023-06-20 15:53, siddhit.ren...@nxtgen.com wrote:

Hello All,

Ceph version: 14.2.5-382-g8881d33957 
(8881d33957b54b101eae9c7627b351af10e87ee8) nautilus (stable)


Issue:
1 PG stuck in "active+undersized+degraded" for a long time
Degraded data redundancy: 44800/8717052637 objects degraded (0.001%), 1 pg degraded, 1 pg undersized


#ceph pg dump_stuck
PG_STAT  STATE                       UP                                                    UP_PRIMARY  ACTING                                                ACTING_PRIMARY
15.28f0  active+undersized+degraded  [2147483647,343,355,415,426,640,302,392,78,202,607]  343         [2147483647,343,355,415,426,640,302,392,78,202,607]  343


PG Query:
#ceph pg 15.28f0 query

{
"state": "active+undersized+degraded",
"snap_trimq": "[]",
"snap_trimq_len": 0,
"epoch": 303362,
"up": [
2147483647,
343,
355,
415,
426,
640,
302,
392,
78,
202,
607
],
"acting": [
2147483647,
343,
355,
415,
426,
640,
302,
392,
78,
202,
607
],
"acting_recovery_backfill": [
"78(8)",
"202(9)",
"302(6)",
"343(1)",
"355(2)",
"392(7)",
"415(3)",
"426(4)",
"607(10)",
"640(5)"
],
"info": {
"pgid": "15.28f0s1",
"last_update": "303161'598853",
"last_complete": "303161'598853",
"log_tail": "261289'595825",
"last_user_version": 598853,
"last_backfill": "MAX",
"last_backfill_bitwise": 1,
"purged_snaps": [],
"history": {
"epoch_created": 19841,
"epoch_pool_created": 16141,
"last_epoch_started": 303017,
"last_interval_started": 303016,
"last_epoch_clean": 250583,
"last_interval_clean": 250582,
"last_epoch_split": 19841,
"last_epoch_marked_full": 0,
"same_up_since": 303016,
"same_interval_since": 303016,
"same_primary_since": 256311,
"last_scrub": "255277'537760",
"last_scrub_stamp": "2021-04-11 03:18:39.164439",
"last_deep_scrub": "255277'537756",
"last_deep_scrub_stamp": "2021-04-10 01:42:16.182528",
"last_clean_scrub_stamp": "2021-04-11 03:18:39.164439"
},
"stats": {
"version": "303161'598853",
"reported_seq": "3594551",
"reported_epoch": "303362",
"state": "active+undersized+degraded",
"last_fresh": "2023-06-20 19:03:59.135295",
"last_change": "2023-06-20 15:11:12.569114",
"last_active": "2023-06-20 19:03:59.135295",
"last_peered": "2023-06-20 19:03:59.135295",
"last_clean": "2021-04-11 15:21:44.271834",
"last_became_active": "2023-06-20 15:11:12.569114",
"last_became_peered": "2023-06-20 15:11:12.569114",
"last_unstale": "2023-06-20 19:03:59.135295",
"last_undegraded": "2023-06-20 15:11:10.430426",
"last_fullsized": "2023-06-20 15:11:10.430154",
"mapping_epoch": 303016,
"log_start": "261289'595825",
"ondisk_log_start": "261289'595825",
"created": 19841,
"last_epoch_clean": 250583,
"parent": "0.0",
"parent_split_bits": 14,
"last_scrub": "255277'537760",
"last_scrub_stamp": "2021-04-11 03:18:39.164439",
"last_deep_scrub": "255277'537756",
"last_deep_scrub_stamp": "2021-04-10 01:42:16.182528",
"last_clean_scrub_stamp": "2021-04-11 03:18:39.164439",
"log_size": 3028,
"ondisk_log_size": 3028,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"manifest_stats_invalid": false,
"snaptrimq_len": 0,
"stat_sum": {
"num_bytes": 54989065178,
"num_objects": 44800,
"num_object_clones": 0,
"num_object_copies": 492800,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 44800,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 44800,
"num_whiteouts": 0,
"num_read": 201078,
"num_read_kb": 30408632,
"num_write": 219335,
  

[ceph-users] CephFS snapshots: impact of moving data

2023-06-22 Thread Kuhring, Mathias
Dear Ceph community,

We want to restructure (i.e. move around) a lot of data (hundreds of 
terabyte) in our CephFS.
And now I was wondering what happens within snapshots when I move data 
around within a snapshotted folder.
I.e. do I need to account for a lot of increased storage usage due to older 
snapshots differing from the new restructured state?
In the end it is just metadata changes. Are the snapshots aware of this?

Consider the following examples.

Copying data:
Let's say I have a folder /test, with a file XYZ in sub-folder 
/test/sub1 and an empty sub-folder /test/sub2.
I create snapshot snapA in /test/.snap, copy XYZ to sub-folder 
/test/sub2, delete it from /test/sub1 and create another snapshot snapB.
I would have two snapshots each with distinct copies of XYZ, hence using 
double the space in the FS:
/test/.snap/snapA/sub1/XYZ <-- copy 1
/test/.snap/snapA/sub2/
/test/.snap/snapB/sub1/
/test/.snap/snapB/sub2/XYZ <-- copy 2

Moving data:
Let's assume the same structure.
But now after creating snapshot snapA, I move XYZ to sub-folder 
/test/sub2 and then create the other snapshot snapB.
The directory tree will look the same. But how is this treated internally?
Once I move the data, will there be an actual copy created in snapA to 
represent the old state?
Or will this remain the same data (like a link to the inode or so)?
And hence not double the storage used for that file.

I couldn't find (or understand) anything related to this in the docs.
The closest seems to be the hard-link section here:
https://docs.ceph.com/en/quincy/dev/cephfs-snapshots/#hard-links
Which unfortunately goes a bit over my head.
So I'm not sure if this answers my question.

Thank you all for your help. Appreciate it.

Best Wishes,
Mathias Kuhring

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How does a "ceph orch restart SERVICE" affect availability?

2023-06-22 Thread Mikael Öhman
Thank you Eugen!
After finding out what the target name actually was, it all worked like a charm.

Best regards, Mikael


On Wed, Jun 21, 2023 at 11:05 AM Eugen Block  wrote:

> Hi,
>
> > Will that try to be smart and just restart a few at a time to keep things
> > up and available. Or will it just trigger a restart everywhere
> > simultaneously.
>
> basically, that's what happens for example during an upgrade if
> services are restarted. It's designed to be a rolling upgrade
> procedure so restarting all daemons of a specific service at the same
> time would cause an interruption. So the daemons are scheduled to
> restart and the mgr decides when it's safe to restart the next (this
> is a test cluster started on Nautilus, but it's on Quincy now):
>
> nautilus:~ # ceph orch restart osd.osd-hdd-ssd
> Scheduled to restart osd.5 on host 'nautilus'
> Scheduled to restart osd.0 on host 'nautilus'
> Scheduled to restart osd.2 on host 'nautilus'
> Scheduled to restart osd.1 on host 'nautilus2'
> Scheduled to restart osd.4 on host 'nautilus2'
> Scheduled to restart osd.7 on host 'nautilus2'
> Scheduled to restart osd.3 on host 'nautilus3'
> Scheduled to restart osd.8 on host 'nautilus3'
> Scheduled to restart osd.6 on host 'nautilus3'
>
> When it comes to OSDs it's possible (or even likely) that multiple
> OSDs are restarted at the same time, depending on the pools (and their
> replication size) they are part of. But ceph tries to avoid "inactive
> PGs" which is critical, of course. An edge case would be a pool with
> size 1 where restarting an OSD would cause an inactive PG until the
> OSD is up again. But since size 1 would be a bad idea anyway (except
> for testing purposes) you'd have to live with that.
> If you have the option I'd recommend to create a test cluster and play
> around with these things to get a better understanding, especially
> when it comes to upgrade tests etc.
>
> > I guess in my current scenario, restarting one host at the time makes
> most
> > sense, with a
> > systemctl restart ceph-{fsid}.target
> > and then checking that "ceph -s" says OK before proceeding to the next
>
> Yes, if your crush-failure-domain is host that should be safe, too.
>
> Regards,
> Eugen
>
> Quoting Mikael Öhman :
>
> > The documentation very briefly explains a few core commands for
> restarting
> > things;
> >
> https://docs.ceph.com/en/quincy/cephadm/operations/#starting-and-stopping-daemons
> > but I feel I'm lacking quite some details of what is safe to do.
> >
> > I have a system in production, clusters connected via CephFS and some
> > shared block devices.
> > We would like to restart some things due to some new network
> > configurations. Going daemon by daemon would take forever, so I'm curious
> > as to what happens if one tries the command;
> >
> > ceph orch restart osd
> >
> > Will that try to be smart and just restart a few at a time to keep things
> > up and available. Or will it just trigger a restart everywhere
> > simultaneously.
> >
> > I guess in my current scenario, restarting one host at the time makes
> most
> > sense, with a
> > systemctl restart ceph-{fsid}.target
> > and then checking that "ceph -s" says OK before proceeding to the next
> > host, but I'm still curious as to what the "ceph orch restart xxx"
> command
> > would do (but not enough to try it out in production)
> >
> > Best regards, Mikael
> > Chalmers University of Technology
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Grafana service fails to start due to bad directory name after Quincy upgrade

2023-06-22 Thread Adiga, Anantha
Hi Eugen,

Thank you so much for the details.  Here is the update (comments in-line >>):

Regards,
Anantha
-Original Message-
From: Eugen Block  
Sent: Monday, June 19, 2023 5:27 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Grafana service fails to start due to bad directory 
name after Quincy upgrade

Hi,

so grafana is starting successfully now? What did you change?  
>> I stopped and removed the Grafana image and started it from the "Ceph 
>> Dashboard" service. The version is still 6.7.4. I also had to change the 
>> following.
>> I do not have a way to make this permanent; if the service is redeployed I 
>> will lose the changes.
>> I did not save the file that cephadm generated. This was one reason why the 
>> Grafana service would not start. I had to replace it with the one below to 
>> resolve this issue.
[users]
  default_theme = light
[auth.anonymous]
  enabled = true
  org_name = 'Main Org.'
  org_role = 'Viewer'
[server]
  domain = 'bootstrap.storage.lab'
  protocol = https
  cert_file = /etc/grafana/certs/cert_file
  cert_key = /etc/grafana/certs/cert_key
  http_port = 3000
  http_addr =
[snapshots]
  external_enabled = false
[security]
  disable_initial_admin_creation = false
  cookie_secure = true
  cookie_samesite = none
  allow_embedding = true
  admin_password = paswd-value
  admin_user = user-name

Also this was the other change: 
# This file is generated by cephadm.
apiVersion: 1   <--  This was the line added to 
var/lib/ceph/d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e/grafana.fl31ca104ja0201/etc/grafana/provisioning/datasources/ceph-dashboard.yml
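
A possible way to keep such customizations across redeploys (assuming the 
cephadm per-service configuration template mechanism is available in your 
release; the file name is the one above) might be:

  ceph config-key set mgr/cephadm/services/grafana/grafana.ini -i grafana.ini
  ceph orch redeploy grafana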
>>
Regarding the container images, yes there are defaults in cephadm which can be 
overridden with ceph config. Can you share this output?

ceph config dump | grep container_image
>>
Here it is
root@fl31ca104ja0201:/# ceph config dump | grep container_image
global                                              basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
mgr                                                 advanced  mgr/cephadm/container_image_alertmanager   docker.io/prom/alertmanager:v0.16.2   *
mgr                                                 advanced  mgr/cephadm/container_image_base           quay.io/ceph/daemon
mgr                                                 advanced  mgr/cephadm/container_image_grafana        docker.io/grafana/grafana:6.7.4       *
mgr                                                 advanced  mgr/cephadm/container_image_node_exporter  docker.io/prom/node-exporter:v0.17.0  *
mgr                                                 advanced  mgr/cephadm/container_image_prometheus     docker.io/prom/prometheus:v2.7.2      *
client.rgw.default.default.fl31ca104ja0201.ninovs   basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
client.rgw.default.default.fl31ca104ja0202.yhjkmb   basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
client.rgw.default.default.fl31ca104ja0203.fqnriq   basic     container_image                            quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e  *
>>
I tend to always use a specific image as described here [2]. I also haven't 
deployed grafana via dashboard yet so I can't really comment on that as well as 
on the warnings you report.


>> OK. The need for that is that in Quincy, when you enable Loki and Promtail, 
>> the Ceph dashboard pulls in a Grafana dashboard to view the daemon logs. 
>> I will let you know once that issue is resolved.

Regards,
Eugen

[2]
https://docs.ceph.com/en/latest/cephadm/services/monitoring/#using-custom-images
>> Thank you I am following the document now
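
For reference, pinning a specific Grafana image as described there would look 
roughly like this (the tag is an assumption taken from the Quincy documentation, 
not verified on this cluster):

  ceph config set mgr mgr/cephadm/container_image_grafana quay.io/ceph/ceph-grafana:8.3.5
  ceph orch redeploy grafana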

Zitat von "Adiga, Anantha" :

> Hi Eugene,
>
> Thank you for your response, here is the update.
>
> The upgrade to Quincy was done following the cephadm orch upgrade 
> procedure: ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.6
>
> Upgrade completed without errors. After the upgrade, upon creating 
> the Grafana service from the Ceph dashboard, it deployed Grafana 6.7.4.
> The version is hardcoded in the code; should it not be 8.3.5 as listed 
> below in the Quincy documentation? See below
>
> [Grafana service started from Ceph dashboard]
>
> Quincy documentation states: 
> https://docs.ceph.com/en/latest/releases/quincy/
> ……documentation snippet
> Monitoring and alerting:
> 43 new alerts have been added (totalling 68) improving observability 
> of events affecting: cluster health, monitors, storage devices, PGs 
> and CephFS.
> Alert

[ceph-users] ceph orch host label rm : does not update label removal

2023-06-22 Thread Adiga, Anantha
Hi ,

Not sure if the labels are really removed, or if the update is not working?



root@fl31ca104ja0201:/# ceph orch host ls
HOST             ADDR           LABELS                                            STATUS
fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
4 hosts in cluster
root@fl31ca104ja0201:/#
root@fl31ca104ja0201:/#
root@fl31ca104ja0201:/# ceph orch host label rm fl31ca104ja0302 rgws.ceph
Removed label rgws.ceph from host fl31ca104ja0302
root@fl31ca104ja0201:/# ceph orch host ls
HOST             ADDR           LABELS                                            STATUS
fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
4 hosts in cluster
root@fl31ca104ja0201:/# ceph orch host label rm fl31ca104ja0302 rgws.ceph 
--force
Removed label rgws.ceph from host fl31ca104ja0302
root@fl31ca104ja0201:/# ceph orch host ls
HOST             ADDR           LABELS                                            STATUS
fl31ca104ja0201  XX.XX.XXX.139  ceph clients mdss mgrs monitoring mons osds rgws
fl31ca104ja0202  XX.XX.XXX.140  ceph clients mdss mgrs mons osds rgws
fl31ca104ja0203  XX.XX.XXX.141  ceph clients mdss mgrs mons osds rgws
fl31ca104ja0302  XX.XX.XXX.5    _admin mgrs,ceph osd,rgws.ceph
4 hosts in cluster
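
One way to check how the labels are actually stored (e.g. whether "rgws.ceph" 
is a separate label or part of a label containing commas) might be:

  ceph orch host ls --format json-pretty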

Regards,
Anantha
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Removing the encryption: (essentially decrypt) encrypted RGW objects

2023-06-22 Thread Casey Bodley
hi Jayanth,

i don't know that we have a supported way to do this. the
s3-compatible method would be to copy the object onto itself without
requesting server-side encryption. however, this wouldn't prevent
default encryption if rgw_crypt_default_encryption_key was still
enabled. furthermore, rgw has not implemented support for copying
encrypted objects, so this would fail for other forms of server-side
encryption too. this has been tracked in
https://tracker.ceph.com/issues/23264
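
for reference, the copy-onto-itself approach would look roughly like this with
the AWS CLI (bucket, key and endpoint are placeholders), though as noted above
rgw currently rejects copies of encrypted objects:

  aws s3api copy-object --bucket mybucket --key mykey --copy-source mybucket/mykey --metadata-directive REPLACE --endpoint-url http://rgw.example.com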

On Sat, Jun 17, 2023 at 12:13 PM Jayanth Reddy
 wrote:
>
> Hello Users,
> We've a big cluster (Quincy) with almost 1.7 billion RGW objects, and we've
> enabled SSE on as per
> https://docs.ceph.com/en/quincy/radosgw/encryption/#automatic-encryption-for-testing-only
> (yes, we've chosen this insecure method to store the key)
> We're now in the process of implementing RGW multisite, but stuck due to
> https://tracker.ceph.com/issues/46062 and list at
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/PQW66JJ5DCRTH5XFGTRESF3XXTOSIWFF/#43RHLUVFYNSDLZPXXPZSSXEDX34KWGJX
>
> Was wondering if there is a way to decrypt the objects in-place with the
> applied symmetric key. I tried to remove
> the rgw_crypt_default_encryption_key from the mon configuration database
> (on a test cluster), but as expected, RGW daemons throw 500 server errors
> as it can not work on encrypted objects.
>
> There is a PR being worked on about introducing the command option at
> https://github.com/ceph/ceph/pull/51842 but it appears it takes some time
> to be merged.
>
> Cheers,
> Jayanth Reddy
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] changing crush map on the fly?

2023-06-22 Thread Angelo Höngens
Hey,

Just to confirm my understanding: If I set up a 3-OSD cluster really
fast with an EC42 pool, and I set the crush map to an OSD failure
domain, the data will be distributed among the OSDs, and of course
there won't be protection against host failure. And yes, I know that's
a bad idea, but I need the extra storage really fast, and it's a
backup of other data. So availability is important, but not critical.

If I then add 5 more hosts a week later, I can just edit the crush map
and change the failure domain from osd to host, put the crush map
back in, and ceph should automatically redistribute all the PGs over
the OSDs again to be fully host-fault tolerant, right?

Am I understanding this correctly?
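
A rough sketch of that edit cycle (the rule step shown is the typical EC form;
verify against your own decompiled map):

  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  # in crush.txt, change the pool's rule step from
  #   step chooseleaf indep 0 type osd
  # to
  #   step chooseleaf indep 0 type host
  crushtool -c crush.txt -o crush-new.bin
  ceph osd setcrushmap -i crush-new.bin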

Angelo.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph iSCSI GW is too slow when compared with Raw RBD performance

2023-06-22 Thread Work Ceph
Hello guys,

We have a Ceph cluster that runs just fine with Ceph Octopus; we use RBD
for some workloads, RadosGW (via S3) for others, and iSCSI for some Windows
clients.

We started noticing some unexpected performance issues with iSCSI. I mean,
an SSD pool is reaching 100 MB/s of write throughput for an image, when it can
reach up to 600+ MB/s of write throughput for the same image when mounted and
consumed directly via RBD.

Is that performance degradation expected? We would expect some degradation,
but not as much as this one.
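
One way to get a like-for-like baseline is to run the same fio workload against
the raw RBD image and against the iSCSI-attached disk (pool, image and client
names are placeholders):

  fio --name=seqwrite --ioengine=rbd --clientname=admin --pool=ssd-pool --rbdname=testimage --rw=write --bs=4M --iodepth=16 --numjobs=1 --size=10G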

Also, we have a question regarding the use of Intel Turbo Boost. Should we
disable it? Is it possible that the root cause of the slowness in the iSCSI
GW is the use of the Intel Turbo Boost feature, which reduces the
clock of some cores?

Any feedback is much appreciated.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io