[ceph-users] MDS stuck in rejoin

2023-07-20 Thread Frank Schilder
Hi all,

we had a client with the warning "[WRN] MDS_CLIENT_OLDEST_TID: 1 clients 
failing to advance oldest client/flush tid". I looked at the client and there 
was nothing going on, so I rebooted it. After the client was back, the message 
was still there. To clean this up I failed the MDS. Unfortunately, the MDS that 
took over remained stuck in rejoin without doing anything. All that happened 
in the log was:

[root@ceph-10 ceph]# tail -f ceph-mds.ceph-10.log
2023-07-20T15:54:29.147+0200 7fedb9c9f700  1 mds.2.896604 rejoin_start
2023-07-20T15:54:29.161+0200 7fedb9c9f700  1 mds.2.896604 rejoin_joint_start
2023-07-20T15:55:28.005+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to 
version 896614 from mon.4
2023-07-20T15:56:00.278+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to 
version 896615 from mon.4
[...]
2023-07-20T16:02:54.935+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to 
version 896653 from mon.4
2023-07-20T16:03:07.276+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to 
version 896654 from mon.4

After some time I decided to give another fail a try and, this time, the 
replacement daemon went to active state really fast.

If I have a message like the above, what is the clean way of getting the client 
clean again (version: 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) 
octopus (stable))?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-mgr ssh connections left open

2023-07-20 Thread John Mulligan
On Tuesday, July 18, 2023 10:56:12 AM EDT Wyll Ingersoll wrote:
> Every night at midnight, our ceph-mgr daemons open up ssh connections to the
> other nodes and then leave them open. Eventually they become zombies. I
> cannot figure out what module is causing this or how to turn it off.  If
> left unchecked over days/weeks, the zombie ssh connections just keep
> growing; the only way to clear them is to restart ceph-mgr services.
> 
> Any idea what is causing this or how it can be disabled?
> 
> Example:
> 
> 
> ceph 1350387 1350373  7 Jul17 ?01:19:39 /usr/bin/ceph-mgr -n
> mgr.mon03 -f --setuser ceph --setgroup ceph --default-log-to-file=false
> --default-log-to-stderr=true --default-log-stderr-prefix
> 
> ceph 1350548 1350387  0 Jul17 ?00:00:01 ssh -C -F
> /tmp/cephadm-conf-d0khggdz -i /tmp/cephadm-identity-onf2msju -o
> ServerAliveInterval=7 -o ServerAliveCountMax=3 xxx@10.4.1.11 sudo python
> 
> [...snip...]

Is this cluster on pacific?  The module in question is likely to be `cephadm` 
but the cephadm ssh backend has been changed and the team assumes problems 
like this no longer occur.
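
(For what it's worth, a rough sketch of confirming the module and clearing the 
stale children in the meantime - the mgr name is taken from the ps output above, 
and that a failover clears them is only an assumption based on the observation 
that restarting ceph-mgr does:)

ceph mgr module ls | grep cephadm    # is the cephadm/orchestrator module enabled?
ceph orch status                     # which orchestrator backend is active
ceph mgr fail mon03                  # fail over to a standby mgr instead of a full restart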

Hope that helps!

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-mgr ssh connections left open

2023-07-20 Thread Wyll Ingersoll


Yes, it is ceph pacific 16.2.11.

Is this a known issue that is fixed in a more recent pacific update?  We're not 
ready to move to quincy yet.

thanks,
   Wyllys


From: John Mulligan 
Sent: Thursday, July 20, 2023 10:30 AM
To: ceph-users@ceph.io 
Cc: Wyll Ingersoll 
Subject: Re: [ceph-users] ceph-mgr ssh connections left open

On Tuesday, July 18, 2023 10:56:12 AM EDT Wyll Ingersoll wrote:
> Every night at midnight, our ceph-mgr daemons open up ssh connections to the
> other nodes and then leave them open. Eventually they become zombies. I
> cannot figure out what module is causing this or how to turn it off.  If
> left unchecked over days/weeks, the zombie ssh connections just keep
> growing; the only way to clear them is to restart ceph-mgr services.
>
> Any idea what is causing this or how it can be disabled?
>
> Example:
>
>
> ceph 1350387 1350373  7 Jul17 ?01:19:39 /usr/bin/ceph-mgr -n
> mgr.mon03 -f --setuser ceph --setgroup ceph --default-log-to-file=false
> --default-log-to-stderr=true --default-log-stderr-prefix
>
> ceph 1350548 1350387  0 Jul17 ?00:00:01 ssh -C -F
> /tmp/cephadm-conf-d0khggdz -i /tmp/cephadm-identity-onf2msju -o
> ServerAliveInterval=7 -o ServerAliveCountMax=3 xxx@10.4.1.11 sudo python
>
> [...snip...]

Is this cluster on pacific?  The module in question is likely to be `cephadm`
but the cephadm ssh backend has been changed and the team assumes problems
like this no longer occur.

Hope that helps!


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-mgr ssh connections left open

2023-07-20 Thread John Mulligan
On Thursday, July 20, 2023 10:36:02 AM EDT Wyll Ingersoll wrote:
> Yes, it is ceph pacific 16.2.11.
> 
> Is this a known issue that is fixed in a more recent pacific update?  We're
> not ready to move to quincy yet.
> 
> thanks,
>Wyllys
> 


To the best of my knowledge there's no fix in pacific, I'm sorry to say. It was 
resolved by using a completely different library to make the ssh connections.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Workload that delete 100 M object daily via lifecycle

2023-07-20 Thread Paul JURCO
Enabling debug LC will make the LC process run more often, but please keep in
mind that it might not respect the expiration time you set. By design it treats
the time set in the interval as one day.
So, if it runs more often, you will end up removing objects sooner than after
365 days (as an example) if set to do so.

Please test using:

 rgw_lifecycle_work_time 00:00-23:59
(to run all day), and

 rgw_lc_debug_interval 86400
meaning it will run every 4h.
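
(For completeness, a rough sketch of setting those two options with the ceph CLI - 
the option names are the RGW options discussed here, but the config target is an 
assumption and depends on your deployment:)

ceph config set client.rgw rgw_lifecycle_work_time "00:00-23:59"
ceph config set client.rgw rgw_lc_debug_interval 86400
# restart the RGW daemons afterwards so the new values are picked up
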
Paul


On Wed, Jul 19, 2023 at 5:04 AM Anthony D'Atri 
wrote:

> Indeed that's very useful.  I improved the documentation for that not long
> ago, took a while to sort out exactly what it was about.
>
> Normally LC only runs once a day as I understand it, there's a debug
> option that compresses time so that it'll run more frequently, as having to
> wait for a day to see the effect of changes harks back to the uucp days ;)
>
> > On Jul 18, 2023, at 21:37, Hoan Nguyen Van  wrote:
> >
> > You can enable debug lc to test and tuning rgw lc parameter.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] what is the point of listing "auth: unable to find a keyring on /etc/ceph/ceph.client nfs-ganesha

2023-07-20 Thread Marc


I need some help understanding this. I have configured nfs-ganesha for cephfs 
using something like this in ganesha.conf

FSAL { Name = CEPH; User_Id = "testing.nfs"; Secret_Access_Key = 
"AAA=="; }

But I constantly have these messages in the ganesha logs, 6x per user_id:

auth: unable to find a keyring on /etc/ceph/ceph.client.testing

I thought this was a ganesha authentication order issue, but they[1] say it has 
to do with ceph. I am still on Nautilus, so maybe this has been fixed in newer 
releases. I still have a hard time understanding why this is an issue with the 
ceph libraries.
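
(A minimal workaround sketch, assuming librados just wants a keyring file at one 
of its default search paths and using the client name from the FSAL block above - 
the key is the placeholder from that snippet, not a real secret:)

cat > /etc/ceph/ceph.client.testing.nfs.keyring <<'EOF'
[client.testing.nfs]
    key = AAA==
EOF
chmod 600 /etc/ceph/ceph.client.testing.nfs.keyring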


[1]
https://github.com/nfs-ganesha/nfs-ganesha/issues/974

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: index object in shard begins with hex 80

2023-07-20 Thread Christopher Durham
Ok,
I think I figured this out. First, as I think I wrote earlier, these objects in 
the ugly namespace begin with "<80>0_", and as such are "bucket log 
index" entries according to the bucket_index_prefixes[] in cls_rgw.cc.
These objects were multiplying, and caused the 'Large omap object' warnings. 
Our users were creating *a lot* of small objects.
We have a multi-site environment, with replication between the two sites for 
all buckets.
Recently, we had some inadvertent downtime on the slave zone side. Checking the 
bucket in question, the large omap warning ONLY showed up on the slave side. 
Turns out the bucket in question has expiration set for all objects after a few 
days. Since the date of the downtime, NO objects have been deleted on the slave 
side! Deleting the 'extra' objects on the slave side by hand, and then running 
'bucket sync init' on the bucket on both sides seems to have resolved the 
situation. But this may be a bug in data sync when the slave side is not 
available for a time.

-Chris

On Tuesday, July 18, 2023 at 12:14:18 PM MDT, Dan van der Ster 
 wrote:  
 
 Hi Chris,
Those objects are in the so called "ugly namespace" of the rgw, used to prefix 
special bucket index entries.

// No UTF-8 character can begin with 0x80, so this is a safe indicator
// of a special bucket-index entry for the first byte. Note: although
// it has no impact, the 2nd, 3rd, or 4th byte of a UTF-8 character
// may be 0x80.
#define BI_PREFIX_CHAR 0x80

You can use --omap-key-file and some sed magic to interact with those keys, 
e.g. like this example from my archives [1]. (In my example I needed to remove 
orphaned olh entries -- in your case you can generate uglykeys.txt in whichever 
way is meaningful for your situation.)

BTW, to be clear, I'm not suggesting you blindly delete those keys. You would 
need to confirm that they are not needed by a current bucket instance before 
deleting, lest some index get corrupted.

Cheers, Dan
__
Clyso GmbH | Ceph Support and Consulting | https://www.clyso.com
[1] 
# radosgw-admin bi list --bucket=xxx --shard-id=0 > xxx.bilist.0
# cat xxx.bilist.0 | jq -r '.[]|select(.type=="olh" and .entry.key.name=="") | .idx' > uglykeys.txt
# head -n2 uglykeys.txt
�1001_00/2a/002a985cc73a01ce738da460b990e9b2fa849eb4411efb0a4598876c2859d444/2018_12_11/2893439/3390300/metadata.gz
�1001_02/5f/025f8e0fc8234530d6ae7302adf682509f0f7fb68666391122e16d00bd7107e3/2018_11_14/2625203/3034777/metadata.gz

# cat do_remove.sh

# usage: "bash do_remove.sh | sh -x"
while read f;
do
    echo -n $f | sed 's/^.1001_/echo -n -e \\x801001_/'; echo ' > mykey && rados rmomapkey -p default.rgw.buckets.index .dir.zone.bucketid.xx.indexshardnumber --omap-key-file mykey';
done < uglykeys.txt
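
(And for reference, a sketch of reading one of these keys before touching it - 
this assumes getomapval honours --omap-key-file the same way rmomapkey does 
above; the key is the example from Chris' mail and the pool/object names are the 
same placeholders:)

printf '\x800_4771163.3444695458.6' > onekey
rados -p default.rgw.buckets.index getomapval .dir.zone.bucketid.xx.indexshardnumber --omap-key-file onekey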




On Tue, Jul 18, 2023 at 9:27 AM Christopher Durham  wrote:

Hi,
I am using ceph 17.2.6 on rocky linux 8.
I got a large omap object warning today.
Ok, So I tracked it down to a shard for a bucket in the index pool of an s3 
pool.

However, when listing the omapkeys with:
# rados -p pool.index listomapkeys .dir.zone.bucketid.xx.indexshardnumber
it is clear that the problem is caused by many omapkeys with the following name 
format:

<80>0_4771163.3444695458.6
A hex dump of the output of the listomapkeys command above indicates that the 
first 'character' is indeed hex 80, but as there is no equivalent ascii for hex 
80, I am not sure how to 'get at' those keys to see the values, delete them, 
etc. The index keys not of the format above appear to be fine, indicating s3 
object names as expected.

The rest of the index shards for the bucket are reasonable and have less than 
osd_deep_scrub_large_omap_object_key_threshold index objects, and the overall 
total of objects in the bucket is way less than 
osd_deep_scrub_large_omap_object_key_threshold*num_shards. 

These weird objects seem to be created occasionally. Yes, the 
bucket is used heavily.

Any advice here?
-Chris




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

  
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Quincy 17.2.6 - Rados gateway crash -

2023-07-20 Thread xadhoom76
Hi, we have a service that is still crashing when an S3 client (Veeam Backup) 
starts to write data.
Main log from the rgw service:
 req 13170422438428971730 0.00886s s3:get_obj WARNING: couldn't find acl 
header for object, generating
 default
2023-07-20T14:36:45.331+ 7fa5adb4c700 -1 *** Caught signal (Aborted) **

And


"
2023-07-19T22:04:15.968+ 7ff07305b700  1 beast: 0x7fefc7178710: 
172.16.199.11 - veeam90 [19/Jul/2023:22:04:15.948 +] "PUT 
/veeam90/Veeam/Backu
p/veeam90/Clients/%7Bd14cd688-57b4-4809-a1d9-14cafd191b11%7D/34387bbd-bec9-4a40-a04d-6a890d5d6407/CloudStg/Data/%7Bf687ee0f-fb50-4ded-b3a8-3f67ca7f244
b%7D/%7B6f31c277-734c-46fd-98d5-c560aa6dc776%7D/144113_f3fd31c9ee2a45aeeadda0de3cbc9064_
 HTTP/1.1" 200 63422 - "APN/1.
0 Veeam/1.0 Backup/12.0" - latency=0.02216s
2023-07-19T22:04:15.972+ 7ff08307b700  1 == starting new request 
req=0x7fefc7682710 =
2023-07-19T22:04:15.972+ 7ff087083700  1 == starting new request 
req=0x7fefc737c710 =
2023-07-19T22:04:15.972+ 7ff071057700  1 == starting new request 
req=0x7fefc72fb710 =
2023-07-19T22:04:15.972+ 7ff0998a8700  1 == starting new request 
req=0x7fefc71f9710 =
2023-07-19T22:04:15.972+ 7fefe473e700 -1 *** Caught signal (Aborted) **
 in thread 7fefe473e700 thread_name:radosgw

 ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
 1: /lib64/libpthread.so.0(+0x12cf0) [0x7ff102d62cf0]
 2: gsignal()
 3: abort()
 4: /lib64/libstdc++.so.6(+0x9009b) [0x7ff101d5209b]
 5: /lib64/libstdc++.so.6(+0x9653c) [0x7ff101d5853c]
 6: /lib64/libstdc++.so.6(+0x95559) [0x7ff101d57559]
 7: __gxx_personality_v0()
 8: /lib64/libgcc_s.so.1(+0x10b03) [0x7ff101736b03]
 9: _Unwind_Resume()
 10: /lib64/libradosgw.so.2(+0x538c5b) [0x7ff105246c5b]

--
--
   -10> 2023-07-19T22:04:15.972+ 7ff071057700  2 req 8167590275148061076 
0.0s s3:put_obj pre-executing
-9> 2023-07-19T22:04:15.972+ 7ff071057700  2 req 8167590275148061076 
0.0s s3:put_obj check rate limiting
-8> 2023-07-19T22:04:15.972+ 7ff071057700  2 req 8167590275148061076 
0.0s s3:put_obj executing
-7> 2023-07-19T22:04:15.972+ 7ff0998a8700  1 == starting new 
request req=0x7fefc71f9710 =
-6> 2023-07-19T22:04:15.972+ 7ff0998a8700  2 req 15658207768827051601 
0.0s initializing for trans_id = tx0d94d21014832be51-0064b85
ddf-3dfe-backup
-5> 2023-07-19T22:04:15.972+ 7ff0998a8700  2 req 15658207768827051601 
0.0s getting op 1
-4> 2023-07-19T22:04:15.972+ 7ff0998a8700  2 req 15658207768827051601 
0.0s s3:put_obj verifying requester
-3> 2023-07-19T22:04:15.972+ 7ff0998a8700  2 req 15658207768827051601 
0.0s s3:put_obj normalizing buckets and tenants
-2> 2023-07-19T22:04:15.972+ 7ff0998a8700  2 req 15658207768827051601 
0.0s s3:put_obj init permissions
-1> 2023-07-19T22:04:15.972+ 7ff011798700  2 req 15261257039771290446 
0.024000257s s3:put_obj completing
 0> 2023-07-19T22:04:15.972+ 7fefe473e700 -1 *** Caught signal 
(Aborted) **
 in thread 7fefe473e700 thread_name:radosgw

"

Anyone have this issue?
Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: librbd hangs during large backfill

2023-07-20 Thread Jack Hayhurst
We did have a peering storm, we're past that portion of the backfill and still 
experiencing new instances of rbd volumes hanging. It is for sure not just the 
peering storm.

We've got 22.184% objects misplaced yet, with a bunch of pgs left to backfill 
(like 75k). Our rbd pool is using about 1.7 PiB of storage, so we're looking at 
something like 370 TiB yet to backfill, as a rough estimate. This specific pool 
is replicated, with size=3.

RAW STORAGE:
    CLASS     SIZE       AVAIL      USED       RAW USED   %RAW USED
    hdd       21 PiB     11 PiB     10 PiB     10 PiB     48.73
    TOTAL     21 PiB     11 PiB     10 PiB     10 PiB     48.73

POOLS:
    POOL    ID    PGS      STORED     OBJECTS    USED       %USED    MAX AVAIL
    pool    14    32768    574 TiB    147.16M    1.7 PiB    68.87    260 TiB

We did see a lot of rbd volumes that hung, often giving the buffer i/o errors 
previously sent - whether that was the peering storm or backfills is uncertain. 
As suggested, we've already been detaching/reattaching the rbd volumes, pushing 
the primary active osd for pgs to another, and sometimes rebooting the kernel 
on the vm to clear the io queue. A combination of those brings the rbd volume 
block device back for a while.

We're no longer in a peering storm and we're seeing the rbd volumes going into 
an unresponsive state again - including cases where they were unresponsive, we 
intervened and got them responsive, and then they went unresponsive again. All 
pgs are in an active state, some active+remapped+backfilling, some 
active+undersized+remapped+backfilling, etc.

We also run the object gateway off the same cluster with the same backfill; the 
object gateway is not experiencing issues. Also, the osds participating in the 
backfill are not saturated with i/o, or seeing abnormal load for our usual 
backfill operations.

But with the continuing backfill, we're seeing rbd volumes on active pgs going 
back into a blocked state. We can do about the same with detaching the volume / 
bouncing the pg to a new primary acting osd, but we'd rather have these stop 
going unresponsive in the first place. Any suggestions towards that direction 
are greatly appreciated.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds terminated

2023-07-20 Thread dxodnd
I think the rook-ceph mds is not responding to the liveness probe (confirmed by 
k8s describe on the mds pod). I don't think it's the memory, as I don't limit it, 
and I have the cpu set to 500m per mds, but what direction should I go from here?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1 PG stucked in "active+undersized+degraded for long time

2023-07-20 Thread siddhit . renake
Hello Eugen,

Requested details are as below.

PG ID: 15.28f0
Pool ID: 15
Pool: default.rgw.buckets.data
Pool EC Ratio: 8:3
Number of Hosts: 12

## crush dump for rule ##
#ceph osd crush rule dump data_ec_rule
{
"rule_id": 1,
"rule_name": "data_ec_rule",
"ruleset": 1,
"type": 3,
"min_size": 3,
"max_size": 11,
"steps": [
{
"op": "set_chooseleaf_tries",
"num": 5
},
{
"op": "set_choose_tries",
"num": 100
},
{
"op": "take",
"item": -50,
"item_name": "root_data~hdd"
},
{
"op": "chooseleaf_indep",
"num": 0,
"type": "host"
},
{
"op": "emit"
}
]
}

## From Crushmap dump ##
rule data_ec_rule {
id 1
type erasure
min_size 3
max_size 11
step set_chooseleaf_tries 5
step set_choose_tries 100
step take root_data class hdd
step chooseleaf indep 0 type host
step emit
}

## EC Profile ##
ceph osd erasure-code-profile get data
crush-device-class=hdd
crush-failure-domain=host
crush-root=root_data
jerasure-per-chunk-alignment=false
k=8
m=3
plugin=jerasure
technique=reed_sol_van
w=8

OSD Tree:
https://pastebin.com/raw/q6u7aSeu
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1 PG stucked in "active+undersized+degraded for long time

2023-07-20 Thread siddhit . renake
What would be the appropriate way to restart the primary OSD in this case (343)?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds terminated

2023-07-20 Thread dxodnd
This issue has been closed.
If any rook-ceph users see this: when mds replay takes a long time, look at the 
logs in the mds pod.
If it's going well and then abruptly terminates, try describing the mds pod, 
and if the liveness probe terminated it, try increasing the threshold of the 
liveness probe.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds terminated

2023-07-20 Thread dxodnd
If any rook-ceph users see the situation where the mds is stuck in replay, look 
at the logs of the mds pod.

When it runs and then terminates repeatedly, check whether there is a "liveness 
probe terminated" error message by typing "kubectl describe pod -n (namespace) 
(mds pod name)".

If that error message is there, it helps to increase the liveness probe 
threshold.

In my case, this resolved the issue.
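
(A rough sketch of those checks - the namespace and names are placeholders, and 
the exact CR field for the probe settings may differ per rook version:)

kubectl -n rook-ceph get pods | grep mds              # find the mds pod
kubectl -n rook-ceph logs -f <mds-pod-name>           # watch the replay progress
kubectl -n rook-ceph describe pod <mds-pod-name>      # look for liveness probe events
kubectl -n rook-ceph edit cephfilesystem <fs-name>    # adjust the liveness probe settings
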
Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw multisite sync not syncing data, error: RGW-SYNC:data:init_data_sync_status: ERROR: failed to read remote data log shards

2023-07-20 Thread david . piper
Hey Christian, 

What does sync look like on the first site?  And does restarting the RGW 
instances on the first site fix up your issues?

We saw issues in the past that sound a lot like yours. We've adopted the 
practice of restarting the RGW instances in the first cluster after deploying a 
second cluster, and that's got sync working in both directions.
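
(E.g., something like the following on the first site - the restart command 
depends on how the RGWs are deployed, so the service/unit names below are 
placeholders:)

radosgw-admin sync status                 # run in the first zone to see its view of the sync
systemctl restart ceph-radosgw.target     # package-based installs
ceph orch restart rgw.<service-name>      # cephadm-managed clusters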

Dave
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGWs offline after upgrade to Nautilus

2023-07-20 Thread Ben . Zieglmeier
Hello,

We have an RGW cluster that was recently upgraded from 12.2.11 to 14.2.22. The 
upgrade went mostly fine, though now several of our RGWs will not start. One 
RGW is working fine, the rest will not initialize. They are on a crash loop. 
This is part of a multisite configuration, and is currently not the master 
zone. Current master zone is running 14.2.22. These are the only two zones in 
the zonegroup. After turning debug up to 20, these are the log snippets between 
each crash:
```
2023-07-20 14:29:56.371 7fd8dec40900 20 RGWRados::pool_iterate: got 
periods.1b6e1a93-98ba-4378-bc5c-d36cd5542f11.52
2023-07-20 14:29:56.371 7fd8dec40900 20 RGWRados::pool_iterate: got 
periods.1b6e1a93-98ba-4378-bc5c-d36cd5542f11.54
2023-07-20 14:29:56.371 7fd8dec40900 20 RGWRados::pool_iterate: got 
realms_names. 
2023-07-20 14:29:56.371 7fd8dec40900 20 RGWRados::pool_iterate: got 
2023-07-20 14:29:56.371 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.371 7fd8dec40900 20 rados_obj.operate() r=-2 bl.length=0
2023-07-20 14:29:56.371 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=-2 bl.length=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=-2 bl.length=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=46
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=114
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=46
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.374 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=686
2023-07-20 14:29:56.374 7fd8dec40900 20 period zonegroup init ret 0
2023-07-20 14:29:56.374 7fd8dec40900 20 period zonegroup name 
2023-07-20 14:29:56.374 7fd8dec40900 20 using current period zonegroup 

2023-07-20 14:29:56.374 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.374 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=46
2023-07-20 14:29:56.374 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.375 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=903
2023-07-20 14:29:56.375 7fd8dec40900 10 Cannot find current period zone using 
local zone
2023-07-20 14:29:56.375 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.375 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=903
2023-07-20 14:29:56.375 7fd8dec40900 20 zone 
2023-07-20 14:29:56.375 7fd8dec40900 20 generating connection object for zone 
 id f10b465f-bf18-47d0-a51c-ca4f17118ee1
2023-07-20 14:34:56.198 7fd8cafe8700 -1 Initialization timeout, failed to 
initialize
```

I’ve checked all file permissions, filesystem free space, disabled selinux and 
firewalld, tried turning up the initialization timeout to 600, and tried 
removing all non-essential config from ceph.conf. All produce the same results. 
I would greatly appreciate any other ideas or insight.

Thanks,
Ben
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1 PG stucked in "active+undersized+degraded for long time

2023-07-20 Thread Matthew Leonard (BLOOMBERG/ 120 PARK)
Assuming you're running systemd-managed OSDs, you can run the following command 
on the host that OSD 343 resides on.

systemctl restart ceph-osd@343 

From: siddhit.ren...@nxtgen.com At: 07/20/23 13:44:36 UTC-4:00To:  
ceph-users@ceph.io
Subject: [ceph-users] Re: 1 PG stucked in "active+undersized+degraded for long 
time

What would be the appropriate way to restart the primary OSD in this case (343)?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Adding datacenter level to CRUSH tree causes rebalancing

2023-07-20 Thread Niklas Hambüchen

Thank you both Michel and Christian.

Looks like I will have to do the rebalancing eventually.
From past experience with Ceph 16 the rebalance will likely take at least a 
month with my 500 M objects.

It seems like a good idea to upgrade to Ceph 17 first as Michel suggests.

Unless:

I was hoping that Ceph might have a way to reduce the rebalancing, given that 
all constraints about failure domains are already fulfilled.

In particular, I was wondering whether I could play with the names of the 
"datacenter"s, to bring them in the same (alphabetical?) order as the hosts 
were so far.
I suspect that this is what avoided the reshuffling on my mini test cluster.
I think it would be in alignment with Table 1 from the CRUSH paper: 
https://ceph.com/assets/pdfs/weil-crush-sc06.pdf

E.g. perhaps

take(root)
select(1, row)
select(3, cabinet)
emit

yields the same result as

take(root)
select(3, row)
select(1, cabinet)
emit

?
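
(I suppose one way to check this without touching the live cluster would be 
crushtool's test mode - a sketch, with rule id, replica count and file names as 
placeholders:)

ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-mappings > before.txt
# decompile, add/rename the datacenter level, recompile:
crushtool -d crushmap.bin -o crushmap.txt
crushtool -c crushmap-edited.txt -o crushmap-new.bin
crushtool -i crushmap-new.bin --test --rule 1 --num-rep 3 --show-mappings > after.txt
diff before.txt after.txt    # identical mappings would mean no data movement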


Niklas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1 PG stucked in "active+undersized+degraded for long time

2023-07-20 Thread Anthony D'Atri
Sometimes one can even get away with "ceph osd down 343" which doesn't affect 
the process.  I have had occasions when this goosed peering in a less-intrusive 
way.  I believe it just marks the OSD down in the mons' map, and when that 
makes it to the OSD, the OSD responds with "I'm not dead yet" and gets marked 
up again.

> On Jul 20, 2023, at 13:50, Matthew Leonard (BLOOMBERG/ 120 PARK) 
>  wrote:
> 
> Assuming you're running systemctl OSDs you can run the following command on 
> the host that OSD 343 resides on.
> 
> systemctl restart ceph-osd@343 
> 
> From: siddhit.ren...@nxtgen.com At: 07/20/23 13:44:36 UTC-4:00To:  
> ceph-users@ceph.io
> Subject: [ceph-users] Re: 1 PG stucked in "active+undersized+degraded for 
> long time
> 
> What would be the appropriate way to restart the primary OSD in this case (343)?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Adding datacenter level to CRUSH tree causes rebalancing

2023-07-20 Thread Michel Jouvin

Hi Niklas,

As I said, ceph placement is based on more than fulfilling the failure 
domain constraint. This is a core feature of ceph's design. There is no 
reason for a rebalancing on a cluster with a few hundred OSDs to last a 
month. Before 17 you have to adjust the max backfills parameter, whose 
default is 1, a very conservative value. Using 2 should already reduce the 
rebalancing to a few days. But my experience shows that if it is an option, 
upgrading to quincy first may be a better option due to the autotuning 
of the number of backfills based on the real load of the cluster.
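
(For reference, a rough sketch of that tuning - the value is just the example 
above, and on quincy the mClock scheduler largely takes over this knob:)

ceph config set osd osd_max_backfills 2
# or at runtime, without persisting it:
ceph tell 'osd.*' injectargs '--osd_max_backfills=2'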


If your cluster is using cephadm, upgrading to quincy is very 
straightforward and should be complete in a couple of hours for the cluster 
size I mentioned.


Cheers,

Michel
Sent from my mobile
On 20 July 2023 20:15:54, Niklas Hambüchen wrote:


Thank you both Michel and Christian.

Looks like I will have to do the rebalancing eventually.
From past experience with Ceph 16 the rebalance will likely take at least a 
month with my 500 M objects.


It seems like a good idea to upgrade to Ceph 17 first as Michel suggests.

Unless:

I was hoping that Ceph might have a way to reduce the rebalancing, given 
that all constraints about failure domains are already fulfilled.


In particular, I was wondering whether I could play with the names of the 
"datacenter"s, to bring them in the same (alphabetical?) order as the hosts 
were so far.

I suspect that this is what avoided the reshuffling on my mini test cluster.
I think it would be in alignment with Table 1 from the CRUSH paper: 
https://ceph.com/assets/pdfs/weil-crush-sc06.pdf


E.g. perhaps

take(root)
select(1, row)
select(3, cabinet)
emit

yields the same result as

take(root)
select(3, row)
select(1, cabinet)
emit

?


Niklas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Adding datacenter level to CRUSH tree causes rebalancing

2023-07-20 Thread Anthony D'Atri
I can believe the month timeframe for a cluster with multiple large spinners 
behind each HBA.  I’ve witnessed such personally.

> On Jul 20, 2023, at 4:16 PM, Michel Jouvin  
> wrote:
> 
> Hi Niklas,
> 
> As I said, ceph placement is based on more than fulfilling the failure domain 
> constraint. This is a core feature of ceph's design. There is no reason for a 
> rebalancing on a cluster with a few hundred OSDs to last a month. Before 17 
> you have to adjust the max backfills parameter, whose default is 1, 
> a very conservative value. Using 2 should already reduce the rebalancing to a 
> few days. But my experience shows that if it is an option, upgrading to quincy 
> first may be a better option due to the autotuning of the number of 
> backfills based on the real load of the cluster.
> 
> If your cluster is using cephadm, upgrading to quincy is very straightforward 
> and should be complete in a couple of hours for the cluster size I mentioned.
> 
> Cheers,
> 
> Michel
> Sent from my mobile
> On 20 July 2023 20:15:54, Niklas Hambüchen wrote:
> 
>> Thank you both Michel and Christian.
>> 
>> Looks like I will have to do the rebalancing eventually.
>> From past experience with Ceph 16 the rebalance will likely take at least a 
>> month with my 500 M objects.
>> 
>> It seems like a good idea to upgrade to Ceph 17 first as Michel suggests.
>> 
>> Unless:
>> 
>> I was hoping that Ceph might have a way to reduce the rebalancing, given 
>> that all constraints about failure domains are already fulfilled.
>> 
>> In particular, I was wondering whether I could play with the names of the 
>> "datacenter"s, to bring them in the same (alphabetical?) order as the hosts 
>> were so far.
>> I suspect that this is what avoided the reshuffling on my mini test 
>> cluster.
>> I think it would be in alignment with Table 1 from the CRUSH paper: 
>> https://ceph.com/assets/pdfs/weil-crush-sc06.pdf
>> 
>> E.g. perhaps
>> 
>> take(root)
>> select(1, row)
>> select(3, cabinet)
>> emit
>> 
>> yields the same result as
>> 
>> take(root)
>> select(3, row)
>> select(1, cabinet)
>> emit
>> 
>> ?
>> 
>> 
>> Niklas
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds terminated

2023-07-20 Thread Venky Shankar
On Thu, Jul 20, 2023 at 11:19 PM  wrote:
>
> If any rook-ceph users see the situation where the mds is stuck in replay, then 
> look at the logs of the mds pod.
>
> When it runs and then terminates repeatedly, check whether there is a "liveness 
> probe terminated" error message by typing "kubectl describe pod -n 
> (namespace) (mds pod name)"
>
> If that error message is there, it helps to increase the liveness probe 
> threshold
>
> In my case, it resolved the issue.

Would you mind sharing what version of ceph (mds) was used? In a
particular (pacific) release, the mds would abort when it received a
metric update message (from a client) that it did not understand.

> Thanks
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS stuck in rejoin

2023-07-20 Thread Xiubo Li


On 7/20/23 22:09, Frank Schilder wrote:

Hi all,

we had a client with the warning "[WRN] MDS_CLIENT_OLDEST_TID: 1 clients failing to 
advance oldest client/flush tid". I looked at the client and there was nothing going 
on, so I rebooted it. After the client was back, the message was still there. To clean 
this up I failed the MDS. Unfortunately, the MDS that took over remained stuck in 
rejoin without doing anything. All that happened in the log was:


BTW, are you using the kclient or the userspace client? How long was the 
MDS stuck in the rejoin state?


This means that on the client side the oldest client request (tid) has been stuck 
for too long; maybe under heavy load too many requests were generated in a 
short time and the oldest request was stuck for too long in the MDS.




[root@ceph-10 ceph]# tail -f ceph-mds.ceph-10.log
2023-07-20T15:54:29.147+0200 7fedb9c9f700  1 mds.2.896604 rejoin_start
2023-07-20T15:54:29.161+0200 7fedb9c9f700  1 mds.2.896604 rejoin_joint_start
2023-07-20T15:55:28.005+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to 
version 896614 from mon.4
2023-07-20T15:56:00.278+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to 
version 896615 from mon.4
[...]
2023-07-20T16:02:54.935+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to 
version 896653 from mon.4
2023-07-20T16:03:07.276+0200 7fedb9c9f700  1 mds.ceph-10 Updating MDS map to 
version 896654 from mon.4


Did you see any slow request logs in the mds log files? And any other 
suspect logs in dmesg, if it's the kclient?
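
(E.g., a quick sketch of where to look - the mds name is a placeholder:)

ceph health detail                        # shows which client id is failing to advance its oldest tid
ceph tell mds.<name> session ls           # find the session for that client id
ceph tell mds.<name> dump_ops_in_flight   # any long-stuck requests on the MDS side?
dmesg | grep -i ceph                      # on the client, if it is the kclient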




After some time I decided to give another fail a try and, this time, the 
replacement daemon went to active state really fast.

If I have a message like the above, what is the clean way of getting the client 
clean again (version: 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) 
octopus (stable))?


I think your steps are correct.

Thanks

- Xiubo



Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RGWs offline after upgrade to Nautilus

2023-07-20 Thread Eugen Block

Hi,

a couple of threads with similar error messages all lead back to some 
sort of pool or osd issue. What is your current cluster status (ceph 
-s)? Do you have some full OSDs? Those can cause this initialization 
timeout, as can hitting the max_pg_per_osd limit. So a few more cluster 
details could help here.
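
(E.g., a quick sketch of the details meant here:)

ceph -s
ceph health detail
ceph df
ceph osd df tree    # shows nearfull/full OSDs and PG counts per OSD at a glance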


Thanks,
Eugen

Zitat von "Ben.Zieglmeier" :


Hello,

We have an RGW cluster that was recently upgraded from 12.2.11 to  
14.2.22. The upgrade went mostly fine, though now several of our  
RGWs will not start. One RGW is working fine, the rest will not  
initialize. They are on a crash loop. This is part of a multisite  
configuration, and is currently not the master zone. Current master  
zone is running 14.2.22. These are the only two zones in the  
zonegroup. After turning debug up to 20, these are the log snippets  
between each crash:

```
2023-07-20 14:29:56.371 7fd8dec40900 20 RGWRados::pool_iterate: got  
periods.1b6e1a93-98ba-4378-bc5c-d36cd5542f11.52
2023-07-20 14:29:56.371 7fd8dec40900 20 RGWRados::pool_iterate: got  
periods.1b6e1a93-98ba-4378-bc5c-d36cd5542f11.54
2023-07-20 14:29:56.371 7fd8dec40900 20 RGWRados::pool_iterate: got  
realms_names. 
2023-07-20 14:29:56.371 7fd8dec40900 20 RGWRados::pool_iterate: got  


2023-07-20 14:29:56.371 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.371 7fd8dec40900 20 rados_obj.operate() r=-2 bl.length=0
2023-07-20 14:29:56.371 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=-2 bl.length=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=-2 bl.length=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=46
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=114
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.373 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=46
2023-07-20 14:29:56.373 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.374 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=686
2023-07-20 14:29:56.374 7fd8dec40900 20 period zonegroup init ret 0
2023-07-20 14:29:56.374 7fd8dec40900 20 period zonegroup name 
2023-07-20 14:29:56.374 7fd8dec40900 20 using current period  
zonegroup 

2023-07-20 14:29:56.374 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.374 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=46
2023-07-20 14:29:56.374 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.375 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=903
2023-07-20 14:29:56.375 7fd8dec40900 10 Cannot find current period  
zone using local zone

2023-07-20 14:29:56.375 7fd8dec40900 20 rados->read ofs=0 len=0
2023-07-20 14:29:56.375 7fd8dec40900 20 rados_obj.operate() r=0 bl.length=903
2023-07-20 14:29:56.375 7fd8dec40900 20 zone 
2023-07-20 14:29:56.375 7fd8dec40900 20 generating connection object  
for zone  id f10b465f-bf18-47d0-a51c-ca4f17118ee1
2023-07-20 14:34:56.198 7fd8cafe8700 -1 Initialization timeout,  
failed to initialize

```

I’ve checked all file permissions, filesystem free space, disabled  
selinux and firewalld, tried turning up the initialization timeout  
to 600, and tried removing all non-essential config from ceph.conf.  
All produce the same results. I would greatly appreciate any other  
ideas or insight.


Thanks,
Ben
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD tries (and fails) to scrub the same PGs over and over

2023-07-20 Thread Eugen Block

Hi,

what's the cluster status? Is there recovery or backfilling going on?


Quoting Vladimir Brik:

I have a PG that hasn't been scrubbed in over a month and not  
deep-scrubbed in over two months.


I tried forcing with `ceph pg (deep-)scrub` but with no success.

Looking at the logs of that PG's primary OSD it looks like every  
once in a while it attempts (and apparently fails) to scrub that PG,  
along with two others, over and over. For example:


2023-07-19T16:26:07.082 ... 24.3ea scrub starts
2023-07-19T16:26:10.284 ... 27.aae scrub starts
2023-07-19T16:26:11.169 ... 24.aa scrub starts
2023-07-19T16:26:12.153 ... 24.3ea scrub starts
2023-07-19T16:26:13.346 ... 27.aae scrub starts
2023-07-19T16:26:16.239 ... 24.aa scrub starts
...

Lines like that are repeated throughout the log file.


Has anyone seen something similar? How can I debug this?

I am running 17.2.5


Vlad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io