[ceph-users] WG: Multisite sync issue

2022-02-25 Thread Poß , Julian
Hi,

I set up multisite with 2 Ceph clusters and multiple RGWs and 
realms/zonegroups.
This setup was installed using the ceph-ansible branch "stable-5.0", with 
focal+octopus.
During some testing, I noticed that the replication somehow does not work 
as expected.

With s3cmd, I put a small file of 1.9 kB into a bucket on the master zone:
s3cmd put /etc/hosts s3://test/

Then I can see from the output of "radosgw-admin sync status --rgw_realm 
internal" that the cluster indeed has something to sync, switching back to 
"nothing to sync" after a couple of seconds.
"radosgw-admin sync error list --rgw_realm internal" is empty, too.
However, if I look via s3cmd at the secondary zone, I can't see the file. Even 
if I look at the ceph pools directly, the data didn't get replicated.
If I proceed by uploading the file again, with the same command and without a 
change, basically just updating it, or by restarting the RGW daemon of the 
secondary zone, the affected file gets replicated.

I spotted this issue with all my realms/zonegroups. But even with "debug_rgw = 
20" and "debug_rgw_sync = 20" I can't spot any obvious errors in the logs.

It also worries me that replication won't work with multiple RGWs in one zone 
when one of them is unavailable, for instance during maintenance.
I somehow expected ceph to work its way through the list of available 
endpoints, and only fail if none are available.
...Or am I missing something here?

Any help whatsoever is very much appreciated.
I am pretty new to multisite and have been stuck on this for a couple of days already.

Thanks, Julian


Here is some additional information, including some log snippets:

# On the master site, I can see the file in the bilog right away
radosgw-admin bilog list --bucket test/test --rgw_realm internal
{
    "op_id": "3#001.445.5",
    "op_tag": "b9794e07-8f6c-4c45-a981-a73c3a4dc863.8360.106",
    "op": "write",
    "object": "hosts",
    "instance": "",
    "state": "complete",
    "index_ver": 1,
    "timestamp": "2022-02-24T09:14:41.957638774Z",
    "ver": {
        "pool": 7,
        "epoch": 2
    },
    "bilog_flags": 0,
    "versioned": false,
    "owner": "",
    "owner_display_name": "",
    "zones_trace": [
        {
            "entry": "b9794e07-8f6c-4c45-a981-a73c3a4dc863:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3"
        }
    ]
},


# RGW log of secondary zone shows the sync attempt:
2022-02-24T09:14:52.502+ 7f1419ff3700  0 
RGW-SYNC:data:sync:shard[72]:entry[test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3]:
 triggering sync of source bucket/shard 
test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3

# but the secondary zone doesn't actually show the new file in the bilog
radosgw-admin bilog list --bucket test/test --rgw_realm internal

# and the datalog shard that, according to the log file, had the data to sync 
doesn't even seem to exist on the secondary zone
radosgw-admin datalog list --shard-id 72 --rgw_realm internal
ERROR: list_bi_log_entries(): (2) No such file or directory


# RGW log at the master zone; there is one 404 in there which worries me a bit
2022-02-24T09:14:52.515+ 7ff5816e2700  1 beast: 0x7ff6387f77c0: 
192.168.85.71 - - [2022-02-24T09:14:52.515949+] "GET 
/admin/log/?type=bucket-index&bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&info&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609
 HTTP/1.1" 200 94 - - -
2022-02-24T09:14:52.527+ 7ff512604700  1 beast: 0x7ff6386747c0: 
192.168.85.71 - - [2022-02-24T09:14:52.527950+] "GET 
/test?rgwx-bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&versions&format=json&objs-container=true&key-marker&version-id-marker&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609
 HTTP/1.1" 404 146 - - -
2022-02-24T09:14:52.535+ 7ff559e93700  1 beast: 0x7ff6386747c0: 
192.168.85.71 - - [2022-02-24T09:14:52.535950+] "GET 
/admin/log?bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&format=json&marker=001.445.5&type=bucket-index&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609
 HTTP/1.1" 200 2 - - -



# if I update the file by re-uploading it, or restart the RGW daemon of the 
secondary zone, the affected file gets synced
s3cmd put /etc/hosts s3://test/

# again, there is the sync attempt from the secondary zone rgw
2022-02-24T12:04:52.452+ 7f1419ff3700  0 
RGW-SYNC:data:sync:shard[72]:entry[test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3]:
 triggering sync of source bucket/shard 
test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3

# But now the file does show up in the bilog and the datalog
radosgw-admin bilog list --bucket test/test --rgw_realm internal
{
    "op_id": "3#001.456.5",
    "op_tag": "_e1zRfGuaFH7mLumu1gapeLzHo9zYU6M",
    "op": "write",
    "object": "hosts",
    "instance": "

[ceph-users] Re: Archive in Ceph similar to Hadoop Archive Utility (HAR)

2022-02-25 Thread Janne Johansson
On Fri, 25 Feb 2022 at 08:49, Anthony D'Atri wrote:
> There was a similar discussion last year around Software Heritage’s archive 
> project, suggest digging up that thread.
> Some ideas:
>
> * Pack them into (optionally compressed) tarballs - from a quick search it 
> sorta looks like HAR uses a similar model.  Store the tarballs as RGW 
> objects, or as RBD volumes, or on CephFS.

After doing several different kinds of storage solutions in my career,
this above advice is REALLY important. Many hard-to-solve problems
have started out with "it is just one million files/objects", and when
you reach 50M and sound the alarm, people try to throw money at the
problem instead, and then you reach 200-300-400M and then you can't ask
for the index in finite time without it being invalid by the time the
list is complete.

If you have a possibility to stick 10,100,1000 small items into a
.tar, into a .zip, into whatever, DO IT. Do it before the numbers grow
too large to handle. When the numbers grow too big, you seldom get the
chance to both keep running in the too-large setup AND re-pack them at
the same time.
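
A minimal sketch of that idea with the tools already used elsewhere in this
thread (paths and bucket name are made up):

# pack many small files into a single object before storing it
tar -czf /tmp/batch-0001.tar.gz -C /data/small-files .
s3cmd put /tmp/batch-0001.tar.gz s3://archive/batch-0001.tar.gz
# later: fetch the one large object and unpack only what is needed
s3cmd get s3://archive/batch-0001.tar.gz /tmp/batch-0001.tar.gz
mkdir -p /tmp/restore
tar -xzf /tmp/batch-0001.tar.gz -C /tmp/restore ./wanted/file.txt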

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: WG: Multisite sync issue

2022-02-25 Thread Eugen Block

Hi,

I would stop all RGWs except one in each cluster to limit the places  
and logs to look at. Do you have a loadbalancer as endpoint or do you  
have a list of all RGWs as endpoints?
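
To see what is currently configured, something like this should show it
(a sketch; realm name taken from your mail):

radosgw-admin zonegroup get --rgw_realm internal   # "endpoints" list per zone
radosgw-admin zone get --rgw_realm internal        # endpoints of the local zone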



Quoting "Poß, Julian":


Hi,

i did setup multisite with 2 ceph clusters and multiple rgw's and  
realms/zonegroups.
This setup was installed using ceph ansible branch "stable-5.0",  
with focal+octopus.
During some testing, i noticed that somehow the replication seems to  
not work as expected.


With s3cmd, i put a small file of 1.9 kb into a bucket on the master zone
s3cmd put /etc/hosts s3://test/

Then i can see at the output of "radosgw-admin sync status  
--rgw_realm internal", that the cluster has indeed to sync  
something, and switching back to "nothing to sync" after a couple of  
seconds.

"radosgw-admin sync error list --rgw_realm internal" is emtpy, too.
However, if i look via s3cmd on the secondary zone, i can't see the  
file. Even if i look at the ceph pools directly, the data didn't get  
replicated.
If i proceed by uploading the file again, with the same command and  
without a change, basically just updating it, or by restarting rgw  
deamon of the secondary zone, the affected file gets replicated.


I spotted this issue with all my realms/zonegroups. But even with  
"debug_rgw = 20" and debug_rgw_sync = "20" i can't spot any obvious  
errors in the logs.


It also worries me that replication won't work with multiple rgws in  
one zone, but one of them being unavailable, for instance during  
maintenance.
I did somehow expect ceph to work it's way though the list of  
available endpoints, and only fail if none are available.

...Or am I missing something here?

Any help whatsoever is very much appreciated.
I am pretty new to multisite and stuck on this for a couple of days  
now already.


Thanks, Julian


Here is some additional information, including some log snippets:

# ON Master site, i can see the file in the bilog right away
radosgw-admin bilog list --bucket test/test --rgw_realm internal
{
"op_id": "3#001.445.5",
"op_tag": "b9794e07-8f6c-4c45-a981-a73c3a4dc863.8360.106",
"op": "write",
"object": "hosts",
"instance": "",
"state": "complete",
"index_ver": 1,
"timestamp": "2022-02-24T09:14:41.957638774Z",
"ver": {
"pool": 7,
"epoch": 2
},
"bilog_flags": 0,
"versioned": false,
"owner": "",
"owner_display_name": "",
"zones_trace": [
{
"entry":  
"b9794e07-8f6c-4c45-a981-a73c3a4dc863:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3"

}
]
},


# RGW log of secondary zone shows the sync attempt:
2022-02-24T09:14:52.502+ 7f1419ff3700  0  
RGW-SYNC:data:sync:shard[72]:entry[test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3]: triggering sync of source bucket/shard  
test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3


# but the secondary zone, doesnt actually show the new file in the bilog
radosgw-admin bilog list --bucket test/test --rgw_realm internal

# and the shard log that according to the logfile had the data to  
sync in it, doesn't seem to even exist at the secondary zone

radosgw-admin datalog list --shard-id 72 --rgw_realm internal
ERROR: list_bi_log_entries(): (2) No such file or directory


# RGW Log at master zone, there is one 404 in there which worries me a bit
2022-02-24T09:14:52.515+ 7ff5816e2700  1 beast: 0x7ff6387f77c0:  
192.168.85.71 - - [2022-02-24T09:14:52.515949+] "GET  
/admin/log/?type=bucket-index&bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&info&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609 HTTP/1.1" 200 94 - -  
-
2022-02-24T09:14:52.527+ 7ff512604700  1 beast: 0x7ff6386747c0:  
192.168.85.71 - - [2022-02-24T09:14:52.527950+] "GET  
/test?rgwx-bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&versions&format=json&objs-container=true&key-marker&version-id-marker&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609 HTTP/1.1" 404 146 - -  
-
2022-02-24T09:14:52.535+ 7ff559e93700  1 beast: 0x7ff6386747c0:  
192.168.85.71 - - [2022-02-24T09:14:52.535950+] "GET  
/admin/log?bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&format=json&marker=001.445.5&type=bucket-index&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609 HTTP/1.1" 200 2 - -  
-




# if i update the file, by reuploading it, or restart the rgw deamon  
of the secondary zone, the affected file gets synced

s3cmd put /etc/hosts s3://test/

# again, there is the sync attempt from the secondary zone rgw
2022-02-24T12:04:52.452+ 7f1419ff3700  0  
RGW-SYNC:data:sync:shard[72]:entry[test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3]: triggering sync of source bucket/shard  
test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3


# But now the file

[ceph-users] Re: WG: Multisite sync issue

2022-02-25 Thread Poß , Julian
Hi Eugen,

there is currently only one RGW installed for each region+realm.
So the places to look at are already pretty much limited.

As of now, the RGWs themselves are the endpoints. So far no loadbalancer has been 
put into place there.

Best, Julian

-----Original Message-----
From: Eugen Block
Sent: Friday, 25 February 2022 10:52
To: ceph-users@ceph.io
Subject: [ceph-users] Re: WG: Multisite sync issue



Hi,

I would stop all RGWs except one in each cluster to limit the places and logs 
to look at. Do you have a loadbalancer as endpoint or do you have a list of all 
RGWs as endpoints?


Quoting "Poß, Julian":

> Hi,
>
> i did setup multisite with 2 ceph clusters and multiple rgw's and 
> realms/zonegroups.
> This setup was installed using ceph ansible branch "stable-5.0", with 
> focal+octopus.
> During some testing, i noticed that somehow the replication seems to 
> not work as expected.
>
> With s3cmd, i put a small file of 1.9 kb into a bucket on the master 
> zone s3cmd put /etc/hosts s3://test/
>
> Then i can see at the output of "radosgw-admin sync status --rgw_realm 
> internal", that the cluster has indeed to sync something, and 
> switching back to "nothing to sync" after a couple of seconds.
> "radosgw-admin sync error list --rgw_realm internal" is emtpy, too.
> However, if i look via s3cmd on the secondary zone, i can't see the 
> file. Even if i look at the ceph pools directly, the data didn't get 
> replicated.
> If i proceed by uploading the file again, with the same command and 
> without a change, basically just updating it, or by restarting rgw 
> deamon of the secondary zone, the affected file gets replicated.
>
> I spotted this issue with all my realms/zonegroups. But even with 
> "debug_rgw = 20" and debug_rgw_sync = "20" i can't spot any obvious 
> errors in the logs.
>
> It also worries me that replication won't work with multiple rgws in 
> one zone, but one of them being unavailable, for instance during 
> maintenance.
> I did somehow expect ceph to work it's way though the list of 
> available endpoints, and only fail if none are available.
> ...Or am I missing something here?
>
> Any help whatsoever is very much appreciated.
> I am pretty new to multisite and stuck on this for a couple of days 
> now already.
>
> Thanks, Julian
>
>
> Here is some additional information, including some log snippets:
>
> # ON Master site, i can see the file in the bilog right away 
> radosgw-admin bilog list --bucket test/test --rgw_realm internal
> {
> "op_id": "3#001.445.5",
> "op_tag": "b9794e07-8f6c-4c45-a981-a73c3a4dc863.8360.106",
> "op": "write",
> "object": "hosts",
> "instance": "",
> "state": "complete",
> "index_ver": 1,
> "timestamp": "2022-02-24T09:14:41.957638774Z",
> "ver": {
> "pool": 7,
> "epoch": 2
> },
> "bilog_flags": 0,
> "versioned": false,
> "owner": "",
> "owner_display_name": "",
> "zones_trace": [
> {
> "entry":
> "b9794e07-8f6c-4c45-a981-a73c3a4dc863:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3"
> }
> ]
> },
>
>
> # RGW log of secondary zone shows the sync attempt:
> 2022-02-24T09:14:52.502+ 7f1419ff3700  0
> RGW-SYNC:data:sync:shard[72]:entry[test/test:b9794e07-8f6c-4c45-a981-a
> 73c3a4dc863.8366.3:3]: triggering sync of source bucket/shard
> test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3
>
> # but the secondary zone, doesnt actually show the new file in the 
> bilog radosgw-admin bilog list --bucket test/test --rgw_realm internal
>
> # and the shard log that according to the logfile had the data to sync 
> in it, doesn't seem to even exist at the secondary zone radosgw-admin 
> datalog list --shard-id 72 --rgw_realm internal
> ERROR: list_bi_log_entries(): (2) No such file or directory
>
>
> # RGW Log at master zone, there is one 404 in there which worries me a 
> bit
> 2022-02-24T09:14:52.515+ 7ff5816e2700  1 beast: 0x7ff6387f77c0:
> 192.168.85.71 - - [2022-02-24T09:14:52.515949+] "GET
> /admin/log/?type=bucket-index&bucket-instance=test%2Ftest%3Ab9794e07-8
> f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&info&rgwx-zonegroup=7d35d818-088
> 1-483a-b1bf-47ec21f26609 HTTP/1.1" 200 94 - -
> -
> 2022-02-24T09:14:52.527+ 7ff512604700  1 beast: 0x7ff6386747c0:
> 192.168.85.71 - - [2022-02-24T09:14:52.527950+] "GET
> /test?rgwx-bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3
> a4dc863.8366.3%3A3&versions&format=json&objs-container=true&key-marker
> &version-id-marker&rgwx-zonegroup=7d35d818-0881-483a-b1bf-47ec21f26609 
> HTTP/1.1" 404 146 - -
> -
> 2022-02-24T09:14:52.535+ 7ff559e93700  1 beast: 0x7ff6386747c0:
> 192.168.85.71 - - [2022-02-24T09:14:52.535

[ceph-users] taking out ssd osd's, having backfilling with hdd's?

2022-02-25 Thread Marc
I am taking out SSDs and get backfilling on HDDs; how is this possible?


 2   active+remapped+backfill_wait
 1   active+remapped+backfilling

Pools 51, 53 and 20 are backfilling; these pools have the following crush rules:

replicated_ruleset
    "steps": [
        {
            "op": "take",
            "item": -10,
            "item_name": "default~hdd"
        },

fs_data.ec21
        {
            "op": "take",
            "item": -10,
            "item_name": "default~hdd"
        },
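
To double-check which rule and device class each pool actually maps to, something
like this should do (a sketch, using the names above):

ceph osd pool get fs_data.ec21 crush_rule     # which rule the pool uses
ceph osd crush rule dump replicated_ruleset   # full rule, including the "take" step
ceph osd crush tree --show-shadow             # per-device-class shadow trees (default~hdd, default~ssd)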





ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Using NFS-Ganesha V4 with current ceph docker image V16.2.7 ?

2022-02-25 Thread Uwe Richter


Hello all,

I want to use NFS-Ganesha V4 for its "POSIX ACL support for FSAL_CEPH"
(=> https://github.com/nfs-ganesha/nfs-ganesha/wiki/ReleaseNotes_4 )
with a docker container from quay.io/ceph/ceph in our running cluster.

For e.g. tag v16.2.7 (see the manifest ContainerConfig.Cmd at
https://quay.io/repository/ceph/ceph/manifest/sha256:c3a89afac4f9c83c716af57e08863f7010318538c7e2cd911458800097f7d97d
 )
and for CEPH_VERSION==master, 
https://buildlogs.centos.org/centos/8/storage/x86_64/nfsganesha-3/ is used, while NFS-Ganesha V4 packages are right there in

https://buildlogs.centos.org/centos/8/storage/x86_64/nfsganesha-4/

Is there a way to use the packages from
https://buildlogs.centos.org/centos/8/storage/x86_64/nfsganesha-4/
with the current docker pacific image, and if so, how?
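
One approach I am considering is building a small derived image (an untested
sketch; the repo file name and dnf steps are my own assumptions, and it presumes
the ganesha packages already in the pacific image can simply be upgraded from
that repo):

# Containerfile
FROM quay.io/ceph/ceph:v16.2.7
RUN printf '[nfsganesha4]\nname=nfsganesha4\nbaseurl=https://buildlogs.centos.org/centos/8/storage/x86_64/nfsganesha-4/\ngpgcheck=0\nenabled=1\n' \
      > /etc/yum.repos.d/nfsganesha4.repo \
 && dnf -y upgrade 'nfs-ganesha*' \
 && dnf clean all

# build and point the cluster at the derived image
podman build -t local/ceph:v16.2.7-ganesha4 .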


Thanks & best regards
Uwe


--
Uwe Richter
FSU Jena, Fakultät für Mathematik und Informatik, KSZ
Ernst-Abbe-Platz 2, Zi. 3418, D-07740 Jena
mailto:uwe.rich...@uni-jena.de, tel:+49.3641.9.46044
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: WG: Multisite sync issue

2022-02-25 Thread Eugen Block

I see, then I misread your statement about multiple RGWs:

It also worries me that replication won't work with multiple rgws in  
one zone, but one of them being unavailable, for instance during  
maintenance.


Is there anything other than the RGW logs pointing to any issues? I  
find it strange that a restart of the RGW fixes it. Is this  
always reproducible?


Quoting "Poß, Julian":


Hi Eugen,

there is currently only one RGW installed for each region+realm.
So the places to look at are already pretty much limited.

As of now, the RGWs itself are the endpoints. So far no loadbalancer  
has been put into place there.


Best, Julian

-----Original Message-----
From: Eugen Block
Sent: Friday, 25 February 2022 10:52
To: ceph-users@ceph.io
Subject: [ceph-users] Re: WG: Multisite sync issue




Hi,

I would stop all RGWs except one in each cluster to limit the  
places and logs to look at. Do you have a loadbalancer as endpoint  
or do you have a list of all RGWs as endpoints?



Quoting "Poß, Julian":


Hi,

i did setup multisite with 2 ceph clusters and multiple rgw's and
realms/zonegroups.
This setup was installed using ceph ansible branch "stable-5.0", with
focal+octopus.
During some testing, i noticed that somehow the replication seems to
not work as expected.

With s3cmd, i put a small file of 1.9 kb into a bucket on the master
zone s3cmd put /etc/hosts s3://test/

Then i can see at the output of "radosgw-admin sync status --rgw_realm
internal", that the cluster has indeed to sync something, and
switching back to "nothing to sync" after a couple of seconds.
"radosgw-admin sync error list --rgw_realm internal" is emtpy, too.
However, if i look via s3cmd on the secondary zone, i can't see the
file. Even if i look at the ceph pools directly, the data didn't get
replicated.
If i proceed by uploading the file again, with the same command and
without a change, basically just updating it, or by restarting rgw
deamon of the secondary zone, the affected file gets replicated.

I spotted this issue with all my realms/zonegroups. But even with
"debug_rgw = 20" and debug_rgw_sync = "20" i can't spot any obvious
errors in the logs.

It also worries me that replication won't work with multiple rgws in
one zone, but one of them being unavailable, for instance during
maintenance.
I did somehow expect ceph to work it's way though the list of
available endpoints, and only fail if none are available.
...Or am I missing something here?

Any help whatsoever is very much appreciated.
I am pretty new to multisite and stuck on this for a couple of days
now already.

Thanks, Julian


Here is some additional information, including some log snippets:

# ON Master site, i can see the file in the bilog right away
radosgw-admin bilog list --bucket test/test --rgw_realm internal
{
"op_id": "3#001.445.5",
"op_tag": "b9794e07-8f6c-4c45-a981-a73c3a4dc863.8360.106",
"op": "write",
"object": "hosts",
"instance": "",
"state": "complete",
"index_ver": 1,
"timestamp": "2022-02-24T09:14:41.957638774Z",
"ver": {
"pool": 7,
"epoch": 2
},
"bilog_flags": 0,
"versioned": false,
"owner": "",
"owner_display_name": "",
"zones_trace": [
{
"entry":
"b9794e07-8f6c-4c45-a981-a73c3a4dc863:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3"
}
]
},


# RGW log of secondary zone shows the sync attempt:
2022-02-24T09:14:52.502+ 7f1419ff3700  0
RGW-SYNC:data:sync:shard[72]:entry[test/test:b9794e07-8f6c-4c45-a981-a
73c3a4dc863.8366.3:3]: triggering sync of source bucket/shard
test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3

# but the secondary zone, doesnt actually show the new file in the
bilog radosgw-admin bilog list --bucket test/test --rgw_realm internal

# and the shard log that according to the logfile had the data to sync
in it, doesn't seem to even exist at the secondary zone radosgw-admin
datalog list --shard-id 72 --rgw_realm internal
ERROR: list_bi_log_entries(): (2) No such file or directory


# RGW Log at master zone, there is one 404 in there which worries me a
bit
2022-02-24T09:14:52.515+ 7ff5816e2700  1 beast: 0x7ff6387f77c0:
192.168.85.71 - - [2022-02-24T09:14:52.515949+] "GET
/admin/log/?type=bucket-index&bucket-instance=test%2Ftest%3Ab9794e07-8
f6c-4c45-a981-a73c3a4dc863.8366.3%3A3&info&rgwx-zonegroup=7d35d818-088
1-483a-b1bf-47ec21f26609 HTTP/1.1" 200 94 - -
-
2022-02-24T09:14:52.527+ 7ff512604700  1 beast: 0x7ff6386747c0:
192.168.85.71 - - [2022-02-24T09:14:52.527950+] "GET
/test?rgwx-bucket-instance=test%2Ftest%3Ab9794e07-8f6c-4c45-a981-a73c3
a4dc863.8366.3%3A3&versions&format=json&objs-container=true&key-marker
&versio

[ceph-users] removing osd, reweight 0, backfilling done, after purge, again backfilling.

2022-02-25 Thread Marc


I have a clean cluster state, with the OSDs that I am going to remove at a 
reweight of 0. And then, after executing 'ceph osd purge 19', I again get 
remapping+backfilling?

Is this indeed the correct procedure, or is it outdated?
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-osds-manual






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: removing osd, reweight 0, backfilling done, after purge, again backfilling.

2022-02-25 Thread Janne Johansson
On Fri, 25 Feb 2022 at 13:00, Marc wrote:
> I have a clean cluster state, with the osd's that I am going to remove a 
> reweight of 0. And then after executing 'ceph osd purge 19', I have again 
> remapping+backfilling done?
>
> Is this indeed the correct procedure, or is this old?
> https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-osds-manual

When you either 1) purge an OSD, or 2) ceph osd crush reweight it to 0.0,
you change the total weight of the OSD host. So if you ceph osd
reweight an OSD, it will push its PGs to other OSDs on the same host
and empty itself, but that host now has more PGs than it really
should. When you do one of the two steps above, the host weight
becomes corrected and the extra PGs move to other OSD hosts. This will
also affect the total weight of the whole subtree, so other PGs might
start moving as well, on hosts not directly related, but this is more
uncommon.
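
Put as commands, the difference looks roughly like this (a sketch; osd.19 is
just an example):

ceph osd reweight 19 0                     # override weight: PGs move to other OSDs
                                           # on the SAME host, host weight unchanged
ceph osd crush reweight osd.19 0           # CRUSH weight: the host (and subtree) weight
                                           # shrinks, so PGs also move to other hosts
ceph osd purge 19 --yes-i-really-mean-it   # removes the OSD; same weight effect as
                                           # crush reweight 0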

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Archive in Ceph similar to Hadoop Archive Utility (HAR)

2022-02-25 Thread Bobby
thanks Anthony and Janneexactly what I have been looking for!

On Fri, Feb 25, 2022 at 9:25 AM Janne Johansson  wrote:

> Den fre 25 feb. 2022 kl 08:49 skrev Anthony D'Atri <
> anthony.da...@gmail.com>:
> > There was a similar discussion last year around Software Heritage’s
> archive project, suggest digging up that thread.
> > Some ideas:
> >
> > * Pack them into (optionally compressed) tarballs - from a quick search
> it sorta looks like HAR uses a similar model.  Store the tarballs as RGW
> objects, or as RBD volumes, or on CephFS.
>
> After doing several different kinds of storage solutions in my career,
> this above advice is REALLY important. Many hard to solve problems
> have started out with "it is just one million files/objects" and when
> you reach 50 and sound the alarm, people try to throw money at the
> problem instead, and then you reach 2-3-400M and then you can't ask
> for the index in finite time without it being invalid by the time the
> list is complete.
>
> If you have a possibility to stick 10,100,1000 small items into a
> .tar, into a .zip, into whatever, DO IT. Do it before the numbers grow
> too large to handle. When the numbers grow too big, you seldom get the
> chance to both keep running in the too-large setup AND re-pack them at
> the same time.
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Archive in Ceph similar to Hadoop Archive Utility (HAR)

2022-02-25 Thread Anthony D'Atri
You bet, glad to help.  

Zillions of small files indeed present a relatively higher metadata overhead, 
and can be problematic in multiple ways.  When using RGW, indexless buckets may 
be advantageous.  

Another phenomenon is space amplification — with say a 1 GB file/object, a 
partially full last allocated block is a trivial amount of wasted space, 
sometimes called internal fragmentation.  As the files get smaller, this 
becomes an increasingly larger ratio. 

Mark’s sheet is terrific for visualizing this:

https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI/edit?usp=sharing

Work was done a couple of releases ago to allow lowering the default 
min_alloc_size because of the inefficiency with small RGW objects especially.  
A subtle additional factor that is often missed is that RADOS writes full 
stripes, adding another layer of potential incremental wasted space that can be 
increased by misaligned / larger EC profiles vs replication.  
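
A quick way to put a number on that internal fragmentation (an illustrative
sketch; 64 KiB was the earlier bluestore HDD min_alloc_size default, 4 KiB the
lowered one):

# wasted-space ratio for a given object size vs. allocation unit
obj=4096; alloc=65536
awk -v o="$obj" -v a="$alloc" 'BEGIN {
    alloced = int((o + a - 1) / a) * a;    # round up to whole allocation units
    printf "object=%d B  allocated=%d B  wasted=%.1f%%\n", o, alloced, 100 * (alloced - o) / alloced
}'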


> On Feb 25, 2022, at 4:18 AM, Bobby  wrote:
> 
> 
> 
> thanks Anthony and Janneexactly what I have been looking for!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: WG: Multisite sync issue

2022-02-25 Thread Poß , Julian
As far as I can tell, it can be reproduced every time, yes.

That statement was actually about two RGWs in one zone. That is also something 
that I tested, because I felt like ceph should be able to handle that HA-like 
on its own.

But for the main issue, there is indeed only one RGW running in each zone.
And as far as I can tell, I see no issues other than what I posted in my 
initial mail.

Best, Julian

-----Original Message-----
From: Eugen Block
Sent: Friday, 25 February 2022 12:57
To: ceph-users@ceph.io
Subject: [ceph-users] Re: WG: Multisite sync issue

I see, then I misread your statement about multiple RGWs:

> It also worries me that replication won't work with multiple rgws in 
> one zone, but one of them being unavailable, for instance during 
> maintenance.

Is there anything other than the RGW logs pointing to any issues? I find it 
strange that a restart of the RGW fixes it. Is this always reproducible?

Quoting "Poß, Julian":

> Hi Eugen,
>
> there is currently only one RGW installed for each region+realm.
> So the places to look at are already pretty much limited.
>
> As of now, the RGWs itself are the endpoints. So far no loadbalancer 
> has been put into place there.
>
> Best, Julian
>
> -Ursprüngliche Nachricht-
> Von: Eugen Block 
> Gesendet: Freitag, 25. Februar 2022 10:52
> An: ceph-users@ceph.io
> Betreff: [ceph-users] Re: WG: Multisite sync issue
>
> This email originated from outside of CGM. Please do not click links 
> or open attachments unless you know the sender and know the content is 
> safe.
>
>
> Hi,
>
> I would stop alle RGWs except one in each cluster to limit the places 
> and logs to look at. Do you have a loadbalancer as endpoint or do you 
> have a list of all RGWs as endpoints?
>
>
> Zitat von "Poß, Julian" :
>
>> Hi,
>>
>> i did setup multisite with 2 ceph clusters and multiple rgw's and 
>> realms/zonegroups.
>> This setup was installed using ceph ansible branch "stable-5.0", with
>> focal+octopus.
>> During some testing, i noticed that somehow the replication seems to 
>> not work as expected.
>>
>> With s3cmd, i put a small file of 1.9 kb into a bucket on the master 
>> zone s3cmd put /etc/hosts s3://test/
>>
>> Then i can see at the output of "radosgw-admin sync status 
>> --rgw_realm internal", that the cluster has indeed to sync something, 
>> and switching back to "nothing to sync" after a couple of seconds.
>> "radosgw-admin sync error list --rgw_realm internal" is emtpy, too.
>> However, if i look via s3cmd on the secondary zone, i can't see the 
>> file. Even if i look at the ceph pools directly, the data didn't get 
>> replicated.
>> If i proceed by uploading the file again, with the same command and 
>> without a change, basically just updating it, or by restarting rgw 
>> deamon of the secondary zone, the affected file gets replicated.
>>
>> I spotted this issue with all my realms/zonegroups. But even with 
>> "debug_rgw = 20" and debug_rgw_sync = "20" i can't spot any obvious 
>> errors in the logs.
>>
>> It also worries me that replication won't work with multiple rgws in 
>> one zone, but one of them being unavailable, for instance during 
>> maintenance.
>> I did somehow expect ceph to work it's way though the list of 
>> available endpoints, and only fail if none are available.
>> ...Or am I missing something here?
>>
>> Any help whatsoever is very much appreciated.
>> I am pretty new to multisite and stuck on this for a couple of days 
>> now already.
>>
>> Thanks, Julian
>>
>>
>> Here is some additional information, including some log snippets:
>>
>> # ON Master site, i can see the file in the bilog right away 
>> radosgw-admin bilog list --bucket test/test --rgw_realm internal
>> {
>> "op_id": "3#001.445.5",
>> "op_tag": "b9794e07-8f6c-4c45-a981-a73c3a4dc863.8360.106",
>> "op": "write",
>> "object": "hosts",
>> "instance": "",
>> "state": "complete",
>> "index_ver": 1,
>> "timestamp": "2022-02-24T09:14:41.957638774Z",
>> "ver": {
>> "pool": 7,
>> "epoch": 2
>> },
>> "bilog_flags": 0,
>> "versioned": false,
>> "owner": "",
>> "owner_display_name": "",
>> "zones_trace": [
>> {
>> "entry":
>> "b9794e07-8f6c-4c45-a981-a73c3a4dc863:test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3"
>> }
>> ]
>> },
>>
>>
>> # RGW log of secondary zone shows the sync attempt:
>> 2022-02-24T09:14:52.502+ 7f1419ff3700  0 
>> RGW-SYNC:data:sync:shard[72]:entry[test/test:b9794e07-8f6c-4c45-a981-
>> a
>> 73c3a4dc863.8366.3:3]: triggering sync of source bucket/shard
>> test/test:b9794e07-8f6c-4c45-a981-a73c3a4dc863.8366.3:3
>>
>> # but the secondary zone, doesnt actually show the new file in the 
>> bilog radosgw-admin bilog list --bucket test/test --rgw_realm 
>> internal
>>
>> # and the shard log that according to the 

[ceph-users] quay.io image no longer existing, required for node add to repair cluster

2022-02-25 Thread Kai Börnert

Hi,

what would be the correct way to move forward?

I have a 3 node cephadm installed cluster, one node died, the other two 
are fine and work as expected, so no data loss, but a lot of 
remapped/degraded.


The dead node was replaced and I wanted to add it to the cluster using 
"ceph orch host add"


The current container_image seems to be set globally to 
quay.io/ceph/ceph@sha256:5d042251e1faa1408663508099cf97b256364300365d403ca5563a518060abac 
after some update a while back to 16.2.6. (I did not set the image to a 
digest; ceph upgrade apparently did this.)


The new node cannot pull this image, as it no longer exists on quay.io.

I tried to copy the image via docker save & docker load, however the 
digest is not preserved that way, for security reasons.


I kinda do not want to do an additional ceph upgrade until the health is 
back to OK.



Is there some other way to transfer the image to the new host?

Is it expected that images on quay may disappear at any time?

Is it possible to force ceph to use a tag instead of a digest? Then I 
could fix it easily myself.



Greetings,

Kai


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2022-02-25 Thread Igor Fedotov

Hi Sebastian,

I submitted a ticket https://tracker.ceph.com/issues/54409 which shows 
my analysis based on your previous log (from 21-02-2022), which wasn't 
verbose enough at the debug-bluestore level to draw a final conclusion.


Unfortunately the last logs (from 24-02-2022) you shared don't include 
the point where the actual corruption happened, as the previous log did. 
These new logs miss the last successful OSD start, which apparently 
corrupts DB data. Do you have any output prior to their content?


If not, could you please reproduce that once again? Generally I'd like 
to see the OSD log for a broken startup along with a couple of restarts back 
- the event sequence for the failure seems to be as follows:


1) The OSD is shut down for the first time. It (for unclear reasons) keeps a 
set of deferred writes to be applied once again.


2) OSD is started up which triggers deferred writes submissions. They 
overlap (again for unclear reasons so far) with DB data content written 
shortly before. The OSD starts properly but DB data corruption has 
happened at this point


3) OSD is restarted again which reveals the data corruption and since 
that point OSD is unable to start.


So these last new logs include only 3) for now, while I need 1) & 
2) as well...



Thanks,

Igor


On 2/24/2022 3:04 AM, Sebastian Mazza wrote:

Hi Igor,

I let ceph rebuild the OSD.7. Then I added
```
[osd]
debug bluefs = 20
 debug bdev = 20
 debug bluestore = 20
```
to the ceph.conf of all 3 nodes and shut down all 3 nodes without writing 
anything to the pools on the HDDs (the Debian VM was not even running).
Immediately at the first boot, OSD.5 and 6 crashed with the same “Bad table 
magic number” error. The OSDs 5 and 6 are on the same node, but not on the node 
of OSD 7, which crashed the last two times.

Logs and corrupted rocks DB Files: https://we.tl/t-ZBXYp8r4Hq
I have saved the entire /var/log directory of every node and the result of
```
$ ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-5 --out-dir 
/tmp/osd.5-data
$ ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-6 --out-dir 
/tmp/osd.6-data
```
Let me know if you need something else.


I hope you can now track it down. I'm really looking forward to your 
interpretation of the logs.


Best Regards,
Sebastian



On 22.02.2022, at 11:44, Igor Fedotov  wrote:

Hi Sebastian,

On 2/22/2022 3:01 AM, Sebastian Mazza wrote:

Hey Igor!



thanks a lot for the new logs - looks like they provides some insight.

I'm glad the logs are helpful.



At this point I think the root cause is apparently a race between deferred 
writes replay and some DB maintenance task happening on OSD startup. It seems 
that deferred write replay updates a block extent which RocksDB/BlueFS are 
using. Hence the target BlueFS file gets all-zeros content. Evidently that's 
just a matter of chance whether they use conflicting physical extent or not 
hence the occasional nature of the issue...

Do I understand that correctly: the corruption of the RocksDB (table overwritten 
by zeros) happens at the first start of the OSD after “*** Immediate shutdown 
(osd_fast_shutdown=true) ***”? Before the system launches the OSD service, the 
RocksDB is still fine?

Looks like that. From logs I can see an unexpected write to specific extent 
(LBA 0x63) which shouldn't occur and at which RocksDB subsequently fails.



So first of all I'm curious if you have any particular write patterns that can 
be culprits? E.g. something like disk wiping procedure which writes all-zeros 
to an object followed by object truncate or removal comes to my mind. If you 
can identify something like that - could you please collect OSD log for such an 
operation (followed by OSD restart) with debug-bluestore set to 20?

To the best of my knowledge the OSD was hardly doing anything, and I do not see any 
pattern that would fit your explanation.
However, you certainly understand a lot more about it than I do, so I will try to 
explain everything that could be relevant.

The Cluster has 3 Nodes. Each has a 240GB NVMe m.2 SSD as boot drive, which 
should not be relevant. Each node has 3 OSDs, one is on an U.2 NVMe SSD with 
2TB and the other two are on 12TB HDDs.

I have configured two crush rules ‘c3nvme’ and ‘ec4x2hdd’. The ‘c3nvme’ is a 
replicated rule that uses only OSDs with class ’nvme’. The second rule is a 
tricky erasure rule. It selects exactly 2 OSDs on exactly 4 Hosts with class 
‘hdd’. So it only works for a size of exactly 8. That means that a pool that 
uses this rule has always only “undersized” placement groups, since the cluster 
has only 3 nodes. (I did not add the fourth server after the first crash in 
December, since we want to reproduce the problem.)

The pools device_health_metrics, test-pool, fs.metadata-root-pool, 
fs.data-root-pool, fs.data-nvme.c-pool, and block-nvme.c-pool uses the crush 
rule ‘c3nvme’ with a size of 3 and a min size of 2. The pools 
fs.data-hdd.ec-pool, block-hdd

[ceph-users] Re: quay.io image no longer existing, required for node add to repair cluster

2022-02-25 Thread Adam King
For the last question, cephadm has a config option for whether or not it
tries to convert image tags to repo digest (ceph config set mgr
mgr/cephadm/use_repo_digest true/false). I'm not sure if setting it to
false helps if the tag has already been converted though.

In terms of getting the cluster in order,

In the case there are actually daemons on this replaced node, and this image
doesn't exist anymore, you can redeploy the individual daemons on the host via
"ceph orch daemon redeploy <daemon-name> <image>" to whatever 16.2.6
image you want to use for now. They would still be on a slightly different
image than the other daemons, but if they're the same minor version I
imagine it's okay. Once they've been redeployed with a functional image and
are up and running and the health warnings go away, you can upgrade the
cluster to whichever image you were redeploying those daemons with, and then
they should all end up in line. I do think you would need to add the host
back to the cluster first before you could redeploy the daemons in this
fashion though. Having the host back in the cluster, even if the daemons
are all down, shouldn't cause issues.

If the replaced node doesn't actually have any daemons yet, maybe setting
the global container image to an image that exists ("ceph config set global
container_image <image>") and then adding the host should allow you
to place daemons on the host as normal. Again, once things are healthy, you
can use upgrade to make sure every daemon is on the same image.
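
Roughly, as commands (a sketch; the image tag and names are placeholders):

ceph config set global container_image quay.io/ceph/ceph:v16.2.6    # point cephadm at an image that still exists
ceph orch host add <new-host>
ceph orch daemon redeploy <daemon-name> quay.io/ceph/ceph:v16.2.6   # per daemon, if any already exist on that host
# once healthy, align all daemons on one image again:
ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6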

- Adam King

On Fri, Feb 25, 2022 at 10:06 AM Kai Börnert  wrote:

> Hi,
>
> what would be the correct way to move forward?
>
> I have a 3 node cephadm installed cluster, one node died, the other two
> are fine and work as expected, so no data loss, but a lot of
> remapped/degraded.
>
> The dead node was replaced and I wanted to add it to the cluster using
> "ceph orch host add"
>
> The current container_image seems to be global:
>
> quay.io/ceph/ceph@sha256:5d042251e1faa1408663508099cf97b256364300365d403ca5563a518060abac
> after some update a while back to 16.2.6. (I did not set the image to a
> digest, ceph upgrade did this apparently)
>
> The new node cannot pull this image, as it no longer exists on quay.io.
>
> I tried to copy the image via docker save & docker load, however the
> digest is not filled due to security reasons this way.
>
> I kinda do not want to do an additional ceph upgrade until the health is
> back at ok.
>
>
> Is there some other way to transfer the image to the new host?
>
> Is it expected, that images on quay max dissapear at any time?
>
> Is it possible to force ceph to use a tag instead of a digest? As I
> could fix it easily myself then?
>
>
> Greetings,
>
> Kai
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quay.io image no longer existing, required for node add to repair cluster

2022-02-25 Thread Kai Börnert

Thank you very much :)

ceph config set global container_image <image> was the solution to 
get the new node to deploy fully,


and with ceph config set mgr mgr/cephadm/use_repo_digest true/false it 
will hopefully never happen again.


Now let's hope the recovery goes without further trouble.

Greetings,

Kai

On 2/25/22 16:43, Adam King wrote:
For the last question, cephadm has a config option for whether or not 
it tries to convert image tags to repo digest (ceph config set mgr 
mgr/cephadm/use_repo_digest true/false). I'm not sure if setting it to 
false helps if the tag has already been converted though.


In terms of getting the cluster in order,

In the case there are actually daemons on this replaced node, if this 
image doesn't exist anymore you can deploy the individual daemons on 
the host via "ceph orch daemon redeploy  " to 
whatever 16.2.6 image you want to use for now. They would still be on 
a slightly different image than the other daemons but if they're the 
same minor version I imagine it's okay. Once they've been redeployed 
with a functional image and are up and running and the health warnings 
go away you can upgrade the cluster to whichever image you were 
redeploying those daemons with and then they should all end up in 
line. I do think you would need to add the host back to the cluster 
first before you could redeploy the daemons in this fashion though. 
Having the host back in the cluster, even if the daemons are all down, 
shouldn't cause issues.


If the replaced node doesn't actually have any daemons yet, maybe 
setting the global container image to an image that exists "ceph 
config set global container_image " then adding the host I 
think should allow you to place daemons on the host as normal. Again, 
once things are healthy, you can use upgrade to make sure every daemon 
is on the same image.


- Adam King

On Fri, Feb 25, 2022 at 10:06 AM Kai Börnert  
wrote:


Hi,

what would be the correct way to move forward?

I have a 3 node cephadm installed cluster, one node died, the
other two
are fine and work as expected, so no data loss, but a lot of
remapped/degraded.

The dead node was replaced and I wanted to add it to the cluster
using
"ceph orch host add"

The current container_image seems to be global:

quay.io/ceph/ceph@sha256:5d042251e1faa1408663508099cf97b256364300365d403ca5563a518060abac



after some update a while back to 16.2.6. (I did not set the image
to a
digest, ceph upgrade did this apparently)

The new node cannot pull this image, as it no longer exists on
quay.io .

I tried to copy the image via docker save & docker load, however the
digest is not filled due to security reasons this way.

I kinda do not want to do an additional ceph upgrade until the
health is
back at ok.


Is there some other way to transfer the image to the new host?

Is it expected, that images on quay max dissapear at any time?

Is it possible to force ceph to use a tag instead of a digest? As I
could fix it easily myself then?


Greetings,

Kai


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quay.io image no longer existing, required for node add to repair cluster

2022-02-25 Thread Robert Sander

On 25.02.22 16:43, Adam King wrote:

ceph config set mgr mgr/cephadm/use_repo_digest false


Nice to know.

The other question is: Why is the digest changing for a released Ceph 
image with a specific version tag?


What changes are made to the container image that are not in the release 
notes?


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quay.io image no longer existing, required for node add to repair cluster

2022-02-25 Thread Adam King
I don't know for sure, but it's possibly a result of the centos 8 EOL stuff
from a few weeks ago (they removed some repos and a lot of our build stuff
broke). I think we had to update some of our container images to deal with
that.

- Adam King

On Fri, Feb 25, 2022 at 10:55 AM Robert Sander 
wrote:

> On 25.02.22 16:43, Adam King wrote:
> > ceph config set mgr mgr/cephadm/use_repo_digest false
>
> Nice to know.
>
> The other question is: Why is the digest changing for a released Ceph
> image with a specific version tag?
>
> What changes are made to the container image that are not in the release
> notes?
>
> Regards
> --
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> https://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quay.io image no longer existing, required for node add to repair cluster

2022-02-25 Thread Robert Sander

On 25.02.22 17:24, Adam King wrote:

I don't know for sure, but it's possibly a result of the centos 8 EOL 
stuff from a few weeks ago (they removed seom repos and a lot of our 
build stuff broke). I think we had to update some of our container 
images to deal with that.


IMHO container image changes should also be trackable somehow in the 
version number (of the container image, not Ceph).


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Multisite sync issue

2022-02-25 Thread Mule Te (TWL007)
We have the same issue on Ceph 15.2.15.

In our testing cluster, it seems like Ceph 16 solved this issue. The PR 
https://github.com/ceph/ceph/pull/41316 seems to remove it, but I do 
not know why it has not been merged back to Ceph 15.

Also, here is a new issue in the Ceph tracker that describes the same issue you 
have: https://tracker.ceph.com/issues/53737

Thanks

> On Feb 25, 2022, at 10:07 PM, Poß, Julian  wrote:
> 
> As far as i can tell, it can be reproduced every time, yes.
> 
> That statement was actually about two RGW in one zone. That is also something 
> that I tested.
> Because I felt like ceph should be able to handle that ha-like on its own.
> 
> But for the main issue, there is indeed only one rgw in each zone running.
> Well as far as I can tell, I see no issues others than what I posted in my 
> initial mail.
> 
> Best, Julian
> 
> -Ursprüngliche Nachricht-
> Von: Eugen Block  
> Gesendet: Freitag, 25. Februar 2022 12:57
> An: ceph-users@ceph.io
> Betreff: [ceph-users] Re: WG: Multisite sync issue
> 
> I see, then I misread your statement about multiple RGWs:
> 
>> It also worries me that replication won't work with multiple rgws in 
>> one zone, but one of them being unavailable, for instance during 
>> maintenance.
> 
> Is there anything else than the RGW logs pointing to any issues? I find it 
> strange that after a restart of the RGW fixes it. Is this always reproducable?
> 
> Zitat von "Poß, Julian" :
> 
>> Hi Eugen,
>> 
>> there is currently only one RGW installed for each region+realm.
>> So the places to look at are already pretty much limited.
>> 
>> As of now, the RGWs itself are the endpoints. So far no loadbalancer 
>> has been put into place there.
>> 
>> Best, Julian
>> 
>> -Ursprüngliche Nachricht-
>> Von: Eugen Block 
>> Gesendet: Freitag, 25. Februar 2022 10:52
>> An: ceph-users@ceph.io
>> Betreff: [ceph-users] Re: WG: Multisite sync issue
>> 
>> This email originated from outside of CGM. Please do not click links 
>> or open attachments unless you know the sender and know the content is 
>> safe.
>> 
>> 
>> Hi,
>> 
>> I would stop alle RGWs except one in each cluster to limit the places 
>> and logs to look at. Do you have a loadbalancer as endpoint or do you 
>> have a list of all RGWs as endpoints?
>> 
>> 
>> Zitat von "Poß, Julian" :
>> 
>>> Hi,
>>> 
>>> i did setup multisite with 2 ceph clusters and multiple rgw's and 
>>> realms/zonegroups.
>>> This setup was installed using ceph ansible branch "stable-5.0", with
>>> focal+octopus.
>>> During some testing, i noticed that somehow the replication seems to 
>>> not work as expected.
>>> 
>>> With s3cmd, i put a small file of 1.9 kb into a bucket on the master 
>>> zone s3cmd put /etc/hosts s3://test/
>>> 
>>> Then i can see at the output of "radosgw-admin sync status 
>>> --rgw_realm internal", that the cluster has indeed to sync something, 
>>> and switching back to "nothing to sync" after a couple of seconds.
>>> "radosgw-admin sync error list --rgw_realm internal" is emtpy, too.
>>> However, if i look via s3cmd on the secondary zone, i can't see the 
>>> file. Even if i look at the ceph pools directly, the data didn't get 
>>> replicated.
>>> If i proceed by uploading the file again, with the same command and 
>>> without a change, basically just updating it, or by restarting rgw 
>>> deamon of the secondary zone, the affected file gets replicated.
>>> 
>>> I spotted this issue with all my realms/zonegroups. But even with 
>>> "debug_rgw = 20" and debug_rgw_sync = "20" i can't spot any obvious 
>>> errors in the logs.
>>> 
>>> It also worries me that replication won't work with multiple rgws in 
>>> one zone, but one of them being unavailable, for instance during 
>>> maintenance.
>>> I did somehow expect ceph to work it's way though the list of 
>>> available endpoints, and only fail if none are available.
>>> ...Or am I missing something here?
>>> 
>>> Any help whatsoever is very much appreciated.
>>> I am pretty new to multisite and stuck on this for a couple of days 
>>> now already.
>>> 
>>> Thanks, Julian
>>> 
>>> 
>>> Here is some additional information, including some log snippets:
>>> 
>>> # ON Master site, i can see the file in the bilog right away 
>>> radosgw-admin bilog list --bucket test/test --rgw_realm internal
>>>{
>>>"op_id": "3#001.445.5",
>>>"op_tag": "b9794e07-8f6c-4c45-a981-a73c3a4dc863.8360.106",
>>>"op": "write",
>>>"object": "hosts",
>>>"instance": "",
>>>"state": "complete",
>>>"index_ver": 1,
>>>"timestamp": "2022-02-24T09:14:41.957638774Z",
>>>"ver": {
>>>"pool": 7,
>>>"epoch": 2
>>>},
>>>"bilog_flags": 0,
>>>"versioned": false,
>>>"owner": "",
>>>"owner_display_name": "",
>>>"zones_trace": [
>>>{
>>>"entry"

[ceph-users] Quincy release candidate v17.1.0 is available

2022-02-25 Thread Josh Durgin
This is the first release candidate for Quincy. The final release is slated
for the end of March.

This release has been through large-scale testing thanks to several
organizations, including Pawsey Supercomputing Centre, who allowed us to
harden cephadm and the ceph dashboard on their 4000-OSD cluster. Subsequent
logical scale testing went up to over 8000 OSDs in a cluster.

One major improvement is in quality of service - the mclock scheduler,
providing quality of service for ceph clients relative to background
operations, is now the default. For more information, see the docs [0].
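
For reference, checking and tuning it looks roughly like this (a sketch; profile
names per the linked docs):

ceph config get osd osd_op_queue                          # "mclock_scheduler" is the new default
ceph config set osd osd_mclock_profile high_client_ops    # bias QoS toward client I/O
                                                          # (others: balanced, high_recovery_ops)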

There are many other improvements, with release notes under construction
[1][2].

Please try it out and report any issues you encounter!

Josh

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-17.1.0.tar.gz
* Containers at
quay.ceph.io/ceph-ci/ceph@sha256:eadcf0385e99e595a865bcb02845b42e11bb55a62165b1403f954d2f7c4e1e07
* For packages, see https://docs.ceph.com/docs/master/install/get-packages/

[0]
https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#mclock-profile-types
[1] https://github.com/ceph/ceph/blob/quincy/PendingReleaseNotes
[2] https://github.com/ceph/ceph/pull/45048
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io