[ceph-users] Changing bucket owner in a multi-zonegroup Ceph cluster

2023-06-25 Thread Ramin Najjarbashi
Hi all

I have a Ceph cluster consisting of two zonegroups with metadata syncing
enabled. I need to change the owner of a bucket that is located in the
secondary zonegroup.

I followed the steps below:

1. Unlinked the bucket from the old user on the secondary zonegroup:

   $ radosgw-admin bucket unlink --uid OLD_UID -b test-change-owner

2. Linked the bucket to the new user on the secondary zonegroup:

   $ radosgw-admin bucket link --uid NEW_UID -b test-change-owner

3. Changed the owner of the bucket on the primary (master) zonegroup:

   $ radosgw-admin bucket chown --uid NEW_UID -b test-change-owner
After executing the last command on the primary zonegroup, the bucket owner
was successfully changed. However, the ownership of the objects within the
bucket still remains with the old user.

When I executed the same radosgw-admin bucket chown command on the secondary
zonegroup, I received a warning about inconsistent metadata between zones,
but the bucket owner was changed successfully there as well.
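
For reference, a minimal sketch of how one might verify whether the change
actually converged across zones (assuming a standard multisite setup; the
bucket name is the one from my example above, and the grep is only for
readability):

$ radosgw-admin sync status                    # overall sync state, run on the secondary zone
$ radosgw-admin metadata sync status           # are the metadata shards caught up?
$ radosgw-admin bucket stats --bucket=test-change-owner | grep owner   # should report NEW_UID on both sides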

My questions are:

1. What is the best way to change the owner of a bucket in a multi-zonegroup
   cluster?
2. What are the potential impacts of running the chown command on the
   secondary zonegroup? Is it possible to have inconsistent metadata between
   zones in this case?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw hang under pressure

2023-06-25 Thread Szabo, Istvan (Agoda)
Hi,

Can you check the read and write latency of your OSDs?
Maybe it hangs because it is waiting for PGs; perhaps those PGs are under
scrub or busy with something else.
Also, with many small objects, don't rely on the PG autoscaler; it might not
tell you to increase pg_num even when it should.
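
A quick way to check both (a rough sketch; the pool name and pg_num value
below are only placeholders):

$ ceph osd perf                                    # per-OSD commit/apply latency
$ ceph osd pool autoscale-status                   # what the autoscaler thinks about pg_num
$ ceph osd pool set <your-data-pool> pg_num 256    # example only: raise pg_num manually if needed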

Istvan Szabo
Staff Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

On 2023. Jun 23., at 19:12, Rok Jaklič  wrote:



We are experiencing something similar (slow GET responses) when sending, for
example, 1k delete requests, on Ceph v16.2.13.

Rok

On Mon, Jun 12, 2023 at 7:16 PM grin  wrote:

Hello,

ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy
(stable)

There is a single (test) radosgw serving plenty of test traffic. Under a heavy
request rate ("heavy" in a relative sense, about 1k req/s) it hangs pretty
reliably: low-traffic threads seem to keep working (like handling occasional
PUTs), but GETs are completely unresponsive and all attention seems to be
spent on futexes.

The effect is extremely similar to

https://ceph-users.ceph.narkive.com/I4uFVzH9/radosgw-civetweb-hangs-once-around-850-established-connections
(subject: Radosgw (civetweb) hangs once around 850 established connections)
except this is quincy so it's beast instead of civetweb. The effect is the
same as described there, except the cluster is way smaller (about 20-40
OSDs).

I observed that when I start radosgw -f with debug 20/20 it almost never
hangs, so my guess is some ugly race condition. However, I am a bit clueless
about how to actually debug it, since enabling debugging makes it go away.
Debug 1 (the default) with -d seems to hang after a while, but it's not that
simple to induce; I'm still testing under 4/4.

Also, I do not see much to configure for beast.
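
In case it helps, a rough sketch of what can still be inspected on a running
radosgw through its admin socket (the socket path and instance name are
assumptions, adjust to your deployment):

$ ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config show | grep -E 'rgw_thread_pool_size|rgw_frontends'
$ ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok config set debug_rgw 4/4    # raise debug on the running daemon
$ ceph daemon /var/run/ceph/ceph-client.rgw.<name>.asok objecter_requests           # in-flight RADOS requests the gateway is waiting on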

To answer the questions in the original (2016) thread:
- Debian stable
- no visible limits issue
- no obvious memory leak observed
- no other visible resource shortage
- strace says everyone is waiting on futexes, about 600-800 threads, apart
from the one serving occasional PUTs
- the TCP port doesn't respond.

IRC didn't react. ;-)

Thanks,
Peter

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




[ceph-users] Re: alerts in dashboard

2023-06-25 Thread Ben
The attached screenshot was filtered out. Here it is, partially:

Name                                   | Severity | Group   | Duration   | Summary
CephadmDaemonFailed                    | critical | cephadm | 30 seconds | A Ceph daemon managed by cephadm is down
CephadmPaused                          | warning  | cephadm | 1 minute   | Orchestration tasks via cephadm are PAUSED
CephadmUpgradeFailed                   | critical | cephadm | 30 seconds | Ceph version upgrade has failed
CephDaemonCrash                        | critical | generic | 1 minute   | One or more Ceph daemons have crashed, and are pending acknowledgement
CephDeviceFailurePredicted             | warning  | osd     | 1 minute   | Device(s) predicted to fail soon
CephDeviceFailurePredictionTooHigh     | critical | osd     | 1 minute   | Too many devices are predicted to fail, unable to resolve
CephDeviceFailureRelocationIncomplete  | warning  | osd     | 1 minute   | Device failure is predicted, but unable to relocate data
CephFilesystemDamaged                  | critical | mds     | 1 minute   | CephFS filesystem is damaged
CephFilesystemDegraded                 | critical | mds     | 1 minute   | CephFS filesystem is degraded
CephFilesystemFailureNoStandby         | critical | mds     | 1 minute   | MDS daemon failed, no further standby available

Meanwhile, the cluster status is green/OK. What should we do about this?
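
If these alerts are actually firing (and not just the list of configured
alert rules), a rough sketch of how one might cross-check them against
cluster state (this is an assumption on my part, not something visible in
the screenshot):

$ ceph health detail
$ ceph crash ls-new        # crashes still pending acknowledgement (CephDaemonCrash)
$ ceph orch ps             # is any cephadm-managed daemon not in 'running' state?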

Thanks,
Ben
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] copy file in nfs over cephfs error "error: error in file IO (code 11)"

2023-06-25 Thread farhad kh
Hi everybody,

We have a problem with the NFS Ganesha load balancer. When we use rsync -av
to copy files from another share to the Ceph NFS share path, we get this
error:

`rsync -rav /mnt/elasticsearch/newLogCluster/acr-202*
/archive/Elastic-v7-archive`

rsync: close failed on "/archive/Elastic-v7-archive/":
Input/output error (5)
rsync error: error in file IO (code 11) at receiver.c(586) [Receiver=3.1.3]

We use an ingress for load balancing the NFS service, and no other problems
are observed in the cluster.
Below is information about the pool, volume path and quota:

The mount:
10.20.32.161:/volumes/arch-1/arch   30T  5.0T   26T  17%  /archive

# ceph osd pool get-quota arch-bigdata-data
quotas for pool 'arch-bigdata-data':
  max objects: N/A
  max bytes  : 30 TiB  (current num bytes: 5488192308978 bytes)

---
# ceph fs subvolume info   arch-bigdata arch arch-1
{
"atime": "2023-06-11 13:32:22",
"bytes_pcent": "16.64",
"bytes_quota": 32985348833280,
"bytes_used": 5488566602388,
"created_at": "2023-06-11 13:32:22",
"ctime": "2023-06-25 10:45:35",
"data_pool": "arch-bigdata-data",
"features": [
"snapshot-clone",
"snapshot-autoprotect",
"snapshot-retention"
],
"gid": 0,
"mode": 16877,
"mon_addrs": [
"10.20.32.153:6789",
"10.20.32.155:6789",
"10.20.32.154:6789"
],
"mtime": "2023-06-25 10:38:48",
"path": "/volumes/arch-1/arch/f246a31b-7103-41b9-8005-63d00efe88e4",
"pool_namespace": "",
"state": "complete",
"type": "subvolume",
"uid": 0
}
Has anyone experienced this error before? What would you suggest to solve it?
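
If it helps to narrow this down, a rough sketch of checks on the
orchestrator-managed NFS side (assuming the cephadm/mgr nfs module is in
use; the cluster id and pseudo path below are placeholders guessed from the
mount above):

# ceph nfs cluster ls
# ceph nfs cluster info <cluster_id>           # ingress and backend daemon addresses
# ceph nfs export ls <cluster_id>
# ceph nfs export info <cluster_id> /archive   # pseudo path of the export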
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent

2023-06-25 Thread Jorge JP
Hello,

After a deep-scrub, my cluster shows this error:

HEALTH_ERR 1/38578006 objects unfound (0.000%); 1 scrub errors; Possible data 
damage: 1 pg recovery_unfound, 1 pg inconsistent; Degraded data redundancy: 
2/77158878 objects degraded (0.000%), 1 pg degraded
[WRN] OBJECT_UNFOUND: 1/38578006 objects unfound (0.000%)
pg 32.15c has 1 unfound objects
[ERR] OSD_SCRUB_ERRORS: 1 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent
pg 32.15c is active+recovery_unfound+degraded+inconsistent, acting [49,47], 
1 unfound
[WRN] PG_DEGRADED: Degraded data redundancy: 2/77158878 objects degraded 
(0.000%), 1 pg degraded
pg 32.15c is active+recovery_unfound+degraded+inconsistent, acting [49,47], 
1 unfound
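
For reference, a rough sketch of the commands typically used to inspect this
state (the pg id is taken from the output above; the last command is
destructive and only a last resort once the missing object really cannot be
recovered):

ceph pg 32.15c query                        # which OSDs were probed for the unfound object
ceph pg 32.15c list_unfound                 # identify the unfound object
rados list-inconsistent-obj 32.15c --format=json-pretty   # details of the scrub inconsistency
ceph pg repair 32.15c                       # ask the primary to repair the inconsistent replica
ceph pg 32.15c mark_unfound_lost revert     # last resort: revert (or 'delete') the unfound object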


I have been searching the internet for how to solve this, but I'm confused.

Can anyone help me?

Thank you! (Sorry for my English.)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io