[ceph-users] Re: Huge amounts of objects orphaned by lifecycle policy.

2024-06-27 Thread Casey Bodley
hi Adam,

On Thu, Jun 27, 2024 at 4:41 AM Adam Prycki  wrote:
>
> Hello,
>
> I have a question. Do people use rgw lifecycle policies in production?
> I had big hopes for this technology, but in practice it seems to be very
> unreliable.
>
> Recently I've been testing different pool layouts and using lifecycle
> policies to move data between them. Once I checked for orphaned objects,
> I discovered that my pools were full of them. One pool was over 1/3
> orphans by volume. The orphaned objects belonged to data that had been
> moved by lifecycle.
>
> Yesterday I decided to recreate one of the pools with 3TiB of data. All
> 3TiB was located in a single directory of some buckets. I created a
> lifecycle which should move it all to the STANDARD pool and ran
> radosgw-admin lc process --bucket. After the lifecycle finished executing,
> the ceph pool still contained 1TiB of data. Removing the objects from the
> rgw-orphan-list output reduced the pool size to 65GiB and 17k objects.
>
> The 17k rados __shadow objects seem to belong to s3 objects which were
> not moved by lifecycle. I tried running lifecycle from radosgw-admin,
> but lifecycle seems to be unable to move them. s3cmd info shows that
> they still report the old storage class. The filenames don't contain
> special characters other than spaces. I have directories with
> sequentially named objects, and some of them cannot be moved by lifecycle.
>
> Deleting all the objects from the original 3TiB dataset also doesn't help.
> After running gc and the orphan-finding tool there are still 1.2k rados
> objects which should have been deleted but are not considered orphans.

i assume you used `radosgw-admin gc process` here - can you confirm
whether you added the --include-all option? without that option,
garbage collection won't delete objects newer than
rgw_gc_obj_min_wait (2 hours by default), in case they're still being
read. it sounds like these rados objects may still be in the gc queue,
which could explain why they aren't considered orphans
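for reference, a guarded sketch of the commands involved (the flags are
as described above; the guard makes this a no-op on a host without
radosgw-admin, so treat it as a sketch rather than a tested procedure):

```shell
# Sketch: inspect and flush the RGW garbage-collection queue immediately.
# Without --include-all, gc skips entries younger than rgw_gc_obj_min_wait
# (2 hours by default), in case the underlying rados objects are still read.
GC_FLAGS="--include-all"
if command -v radosgw-admin >/dev/null 2>&1; then
  radosgw-admin gc list $GC_FLAGS      # entries still queued, including recent ones
  radosgw-admin gc process $GC_FLAGS   # delete them without waiting out the 2 hours
else
  echo "radosgw-admin not found; skipping (sketch only)"
fi
```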

>
> I've been testing on 18.2.2.
>
> Best regards
> Adam Prycki
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: squid 19.1.0 RC QE validation status

2024-07-01 Thread Casey Bodley
On Mon, Jul 1, 2024 at 10:23 AM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/66756#note-1
>
> Release Notes - TBD
> LRC upgrade - TBD
>
> (Reruns were not done yet.)
>
> Seeking approvals/reviews for:
>
> smoke
> rados - Radek, Laura
> rgw- Casey

rgw approved, thanks

one rgw/notifications job crashed due to
https://tracker.ceph.com/issues/65337. the fix was already backported
to squid, but merged after we forked the RC. i would not consider it a
blocker for this RC

> fs - Venky
> orch - Adam King
> rbd, krbd - Ilya
> quincy-x, reef-x - Laura, Neha
> powercycle - Brad
> perf-basic - Yaarit, Laura
> crimson-rados - Samuel
> ceph-volume - Guillaume
>
> Pls let me know if any tests were missed from this list.


[ceph-users] Re: reef 18.2.3 QE validation status

2024-07-03 Thread Casey Bodley
(cc Thomas Goirand)

in April, an 18.2.3 tarball was uploaded to
https://download.ceph.com/tarballs/ceph_18.2.3.orig.tar.gz. that's been
picked up and packaged by the Debian project under the assumption that it
was a supported release

when we do finally release 18.2.3, we will presumably overwrite that
existing tarball with the latest contents. Thomas has requested that we
re-number this release to 18.2.4 to prevent further confusion. Thomas,
could you explain the consequences of overwriting that tarball in place?

On Tue, Jul 2, 2024 at 10:06 AM Yuri Weinstein  wrote:

> After fixing the issues identified below we cherry-picked all PRs from
> this list for 18.2.3
> https://pad.ceph.com/p/release-cherry-pick-coordination.
>
> The question to the dev leads: do you think we can proceed with the
> release without rerunning suites, as they were already approved?
>
> Please reply with your recommendations.
>
>
> On Thu, Jun 6, 2024 at 4:59 PM Yuri Weinstein  wrote:
>
>> Please see the update from Laura below for the status of this release.
>>
>> Dev Leads help is appreciated to expedite fixes necessary to publish it
>> soon.
>>
>> "Hi all, we have hit another blocker with this release.
>>
>> Due to centos 8 stream going end of life, we only have the option of
>> releasing centos 9 stream containers.
>>
>> However, we did not test the efficacy of centos 9 stream containers
>> against the orch and upgrade suites during the initial 18.2.3 release cycle.
>>
>> This problem is tracked here: https://tracker.ceph.com/issues/66334
>>
>> What needs to happen now is:
>>
>> 1.  The orch team needs to fix all references to centos 8 stream in the
>> orch suite
> >> 2.  fs, rados, etc. need to fix their respective jobs the same way in the
>> upgrade suite
>>
>> The easiest way to tackle that is to raise a PR against main and backport
>> to stable releases since this problem actually affects main and all other
>> releases.
>>
>> Then, we will:
>>
>> 1.  Rerun orch and upgrade with these fixes
>> 2.  Re-approve orch and upgrade
>> 3.  Re-upgrade gibba and LRC
>>
>> Then the release will be unblocked."
>>
>> On Tue, Jun 4, 2024 at 3:26 PM Laura Flores  wrote:
>>
>>> Rados results were approved, and we successfully upgraded the gibba
>>> cluster.
>>>
>>> Now waiting on @Dan Mick  to upgrade the LRC.
>>>
>>> On Thu, May 30, 2024 at 8:32 PM Yuri Weinstein 
>>> wrote:
>>>
 I reran rados on the fix
 https://github.com/ceph/ceph/pull/57794/commits
 and seeking approvals from Radek and Laura

 https://tracker.ceph.com/issues/65393#note-1

 On Tue, May 28, 2024 at 2:12 PM Yuri Weinstein 
 wrote:
 >
 > We have discovered some issues (#1 and #2) during the final stages of
 > testing that require considering a delay in this point release until
 > all options and risks are assessed and resolved.
 >
 > We will keep you all updated on the progress.
 >
 > Thank you for your patience!
 >
 > #1 https://tracker.ceph.com/issues/66260
 > #2 https://tracker.ceph.com/issues/61948#note-21
 >
 > On Wed, May 1, 2024 at 3:41 PM Yuri Weinstein 
 wrote:
 > >
 > > We've run into a problem during the last verification steps before
 > > publishing this release after upgrading the LRC to it  =>
 > > https://tracker.ceph.com/issues/65733
 > >
 > > After this issue is resolved, we will continue testing and
 publishing
 > > this point release.
 > >
 > > Thanks for your patience!
 > >
 > > On Thu, Apr 18, 2024 at 11:29 PM Christian Rohmann
 > >  wrote:
 > > >
 > > > On 18.04.24 8:13 PM, Laura Flores wrote:
 > > > > Thanks for bringing this to our attention. The leads have
 decided that
 > > > > since this PR hasn't been merged to main yet and isn't
 approved, it
 > > > > will not go in v18.2.3, but it will be prioritized for v18.2.4.
 > > > > I've already added the PR to the v18.2.4 milestone so it's sure
 to be
 > > > > picked up.
 > > >
 > > > Thanks a bunch. If you miss the train, you miss the train - fair
 enough.
 > > > Nice to know there is another one going soon and that bug is
 going to be
 > > > on it !
 > > >
 > > >
 > > > Regards
 > > >
 > > > Christian

>>>
>>>
>>> --
>>>
>>> Laura Flores
>>>
>>> She/Her/Hers
>>>
>>> Software Engineer, Ceph Storage 
>>>
>>> Chicago, IL
>>>
>>> lflo...@ibm.com | lflo...@redhat.com 
>>> M: +17087388804
>>>
>>>

[ceph-users] Re: reef 18.2.3 QE validation status

2024-07-09 Thread Casey Bodley
this was discussed in the ceph leadership team meeting yesterday, and
we've agreed to re-number this release to 18.2.4

On Wed, Jul 3, 2024 at 1:08 PM  wrote:
>
>
> On Jul 3, 2024 5:59 PM, Kaleb Keithley  wrote:
> >
> >
> >
>
> > Replacing the tar file is problematic too, if only because it's a potential 
> > source of confusion for people who aren't paying attention.
>
> It'd really be the worst thing to do.
>
> > I'm not sure I believe that making this next release 18.2.4 really solves 
> > anything
>
> It solves *my* problem that the old version of the file is already in the 
> Debian archive and cannot be replaced there. By all means, please find a 
> better solution for the long term. In the meantime, do *not* re-release an 
> already released tarball.
>
> Cheers,
>
> Thomas Goirand (zigo)
>


[ceph-users] Re: Large omap in index pool even if properly sharded and not "OVER"

2024-07-09 Thread Casey Bodley
in general, these omap entries should be evenly spread over the
bucket's index shard objects. but there are two features that may
cause entries to clump on a single shard:

1. for versioned buckets, multiple versions of the same object name
map to the same index shard. this can become an issue if an
application is repeatedly overwriting an object without cleaning up
old versions. lifecycle rules can help to manage these noncurrent
versions

2. during a multipart upload, all of the parts are tracked on the same
index shard as the final object name. if applications are leaving a
lot of incomplete multipart uploads behind (especially if they target
the same object name) this can lead to similar clumping. the S3 api
has operations to list and abort incomplete multipart uploads, along
with lifecycle rules to automate their cleanup
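both mitigations above can be expressed as bucket lifecycle rules. a
hedged sketch in the AWS-style JSON form (the rule IDs and day counts
here are illustrative assumptions, not taken from this thread):

```json
{
  "Rules": [
    {
      "ID": "expire-noncurrent-versions",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 30}
    },
    {
      "ID": "abort-stale-multipart-uploads",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
    }
  ]
}
```

this can be applied against the rgw endpoint with something like
`aws s3api put-bucket-lifecycle-configuration --bucket X
--lifecycle-configuration file://lc.json`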

separately, multisite clusters use these same index shards to store
replication logs. if sync gets far enough behind, these log entries
can also lead to large omap warnings

On Tue, Jul 9, 2024 at 10:25 AM Szabo, Istvan (Agoda)
 wrote:
>
> It's the same bucket:
> https://gist.github.com/Badb0yBadb0y/d80c1bdb8609088970413969826d2b7d
>
>
> 
> From: Eugen Block 
> Sent: Tuesday, July 9, 2024 8:03 PM
> To: Szabo, Istvan (Agoda) 
> Cc: ceph-users@ceph.io 
> Subject: Re: [ceph-users] Re: Large omap in index pool even if properly 
> sharded and not "OVER"
>
> Email received from the internet. If in doubt, don't click any link nor open 
> any attachment !
> 
>
> Are those three different buckets? Could you share the stats for each of them?
>
> radosgw-admin bucket stats --bucket=
>
> Zitat von "Szabo, Istvan (Agoda)" :
>
> > Hello,
> >
> > Yeah, still:
> >
> > the .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l
> > 290005
> >
> > and the
> > .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.726 | wc -l
> > 289378
> >
> > And just make me happy more I have one more
> > .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.6 | wc -l
> > 181588
> >
> > This is my crush tree (I'm using host based crush rule)
> > https://gist.githubusercontent.com/Badb0yBadb0y/9bea911701184a51575619bc99cca94d/raw/e5e4a918d327769bb874aaed279a8428fd7150d5/gistfile1.txt
> >
> > I'm wondering whether the issue could be that hosts 2s13-15 have fewer
> > nvme osds than the others (though size-wise the same as the other 12
> > hosts, which have 8x nvme osds each)?
> > But the pgs are located like this:
> >
> > pg26.427
> > osd.261 host8
> > osd.488 host13
> > osd.276 host4
> >
> > pg26.606
> > osd.443 host12
> > osd.197 host8
> > osd.524 host14
> >
> > pg26.78c
> > osd.89 host7
> > osd.406 host11
> > osd.254 host6
> >
> > If pg26.78c weren't here, I'd say the host-based nvme osd distribution
> > is 100% the issue; however, this pg is not located on any of
> > the 4x nvme osd nodes 😕
> >
> > Ty
> >
> > 
> > From: Eugen Block 
> > Sent: Tuesday, July 9, 2024 6:02 PM
> > To: ceph-users@ceph.io 
> > Subject: [ceph-users] Re: Large omap in index pool even if properly
> > sharded and not "OVER"
> >
> > Email received from the internet. If in doubt, don't click any link
> > nor open any attachment !
> > 
> >
> > Hi,
> >
> > the number of shards looks fine, maybe this was just a temporary
> > burst? Did you check if the rados objects in the index pool still have
> > more than 200k omap objects? I would try someting like
> >
> > rados -p  listomapkeys
> > .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l
> >
> >
> > Zitat von "Szabo, Istvan (Agoda)" :
> >
> >> Hi,
> >>
> >> I have a pretty big bucket which is sharded with 1999 shards, so in
> >> theory it can hold close to 200m objects (199,900,000).
> >> Currently it has 54m objects.
> >>
> >> Bucket limit check looks also good:
> >>  "bucket": ""xyz,
> >>  "tenant": "",
> >>  "num_objects": 53619489,
> >>  "num_shards": 1999,
> >>  "objects_per_shard": 26823,
> >>  "fill_status": "OK"
> >>
> >> This is the bucket id:
> >> "id": "9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1"
> >>
> >> This is the log lines:
> >> 2024-06-27T10:41:05.679870+0700 osd.261 (osd.261) 9643 : cluster
> >> [WRN] Large omap object found. Object:
> >> 26:e433e65c:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151:head
> >>  PG: 26.3a67cc27 (26.427) Key count: 236919 Size
> >> (bytes):
> >> 89969920
> >>
> >> 2024-06-27T10:43:35.557835+0700 osd.89 (osd.89) 9000 : cluster [WRN]
> >> Large omap object found. Object:
> >> 26:31ff4df1:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.726:head
> >>  PG: 26.8fb2ff8c (26.78c) Key count: 236495 Size
> >> (bytes):
> >> 95560458
> >>
> >> I tried to deep-scrub the affected pgs and the osds mentioned in the
> >> log, but it didn't help.
> >> Why? What am I missing?
> >>
> >> Thank you in advance for your help.
> >>

[ceph-users] Re: Large omap in index pool even if properly sharded and not "OVER"

2024-07-10 Thread Casey Bodley
On Tue, Jul 9, 2024 at 12:41 PM Szabo, Istvan (Agoda)
 wrote:
>
> Hi Casey,
>
> 1.
> Regarding versioning, the user doesn't use versioning, if I'm not mistaken:
> https://gist.githubusercontent.com/Badb0yBadb0y/d80c1bdb8609088970413969826d2b7d/raw/baee46865178fff454c224040525b55b54e27218/gistfile1.txt
>
> 2.
> Regarding multiparts, if it had multipart trash, it would be listed 
> here:
> https://gist.githubusercontent.com/Badb0yBadb0y/d80c1bdb8609088970413969826d2b7d/raw/baee46865178fff454c224040525b55b54e27218/gistfile1.txt
> as a rgw.multimeta under the usage, right?
>
> 3.
> Regarding the multisite idea, this bucket was a multisite bucket last 
> year, but we had to reshard (accepting losing the replica on the 2nd site 
> and just keeping it in the master site), and at that time, as expected, it 
> disappeared completely from the 2nd site (I guess the 40TB of trash is 
> still there, but I can't really find how to clean it 🙁 ). Now it is a 
> single-site bucket.
> Also, this is the index pool; multisite logs should go to the rgw.log pool, 
> shouldn't they?

some replication logs are in the log pool, but the per-object logs are
stored in the bucket index objects. you can inspect these with
`radosgw-admin bilog list --bucket=X`. by default, that will only list
--max-entries=1000. you can add --shard-id=Y to look at specific
'large omap' objects

even if your single-site bucket doesn't exist on the secondary zone,
changes on the primary zone are probably still generating these bilog
entries. you would need to do something like `radosgw-admin bucket
sync disable --bucket=X` to make it stop. because you don't expect
these changes to replicate, it's safe to delete any of this bucket's
bilog entries with `radosgw-admin bilog trim --end-marker 9
--bucket=X`. depending on ceph version, you may need to run this trim
command in a loop until the `bilog list` output is empty
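a hedged sketch of that trim loop (the bucket name X is a placeholder as
above; the empty-output check and the safety cap are my assumptions,
so adjust them to your version's `bilog list` output format):

```shell
# Sketch: trim a bucket's bilog in a loop until `bilog list` comes back empty.
BUCKET=X          # placeholder bucket name
MAX_ROUNDS=100    # safety cap so the loop can't spin forever
i=0
while [ "$i" -lt "$MAX_ROUNDS" ]; do
  # if the command fails (no cluster, no radosgw-admin), stop quietly
  out=$(radosgw-admin bilog list --bucket="$BUCKET" --max-entries=1 2>/dev/null) || break
  case "$out" in ""|"[]") break ;; esac   # log is empty: done
  radosgw-admin bilog trim --end-marker 9 --bucket="$BUCKET"
  i=$((i + 1))
done
```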

radosgw does eventually trim bilogs in the background after they're
processed, but the secondary zone isn't processing them in this case

>
> Thank you
>
>
> 
> From: Casey Bodley 
> Sent: Tuesday, July 9, 2024 10:39 PM
> To: Szabo, Istvan (Agoda) 
> Cc: Eugen Block ; ceph-users@ceph.io 
> Subject: Re: [ceph-users] Re: Large omap in index pool even if properly 
> sharded and not "OVER"
>
> Email received from the internet. If in doubt, don't click any link nor open 
> any attachment !
> 
>
> in general, these omap entries should be evenly spread over the
> bucket's index shard objects. but there are two features that may
> cause entries to clump on a single shard:
>
> 1. for versioned buckets, multiple versions of the same object name
> map to the same index shard. this can become an issue if an
> application is repeatedly overwriting an object without cleaning up
> old versions. lifecycle rules can help to manage these noncurrent
> versions
>
> 2. during a multipart upload, all of the parts are tracked on the same
> index shard as the final object name. if applications are leaving a
> lot of incomplete multipart uploads behind (especially if they target
> the same object name) this can lead to similar clumping. the S3 api
> has operations to list and abort incomplete multipart uploads, along
> with lifecycle rules to automate their cleanup
>
> separately, multisite clusters use these same index shards to store
> replication logs. if sync gets far enough behind, these log entries
> can also lead to large omap warnings
>
> On Tue, Jul 9, 2024 at 10:25 AM Szabo, Istvan (Agoda)
>  wrote:
> >
> > It's the same bucket:
> > https://gist.github.com/Badb0yBadb0y/d80c1bdb8609088970413969826d2b7d
> >
> >
> > 
> > From: Eugen Block 
> > Sent: Tuesday, July 9, 2024 8:03 PM
> > To: Szabo, Istvan (Agoda) 
> > Cc: ceph-users@ceph.io 
> > Subject: Re: [ceph-users] Re: Large omap in index pool even if properly 
> > sharded and not "OVER"
> >
> > Email received from the internet. If in doubt, don't click any link nor 
> > open any attachment !
> > 
> >
> > Are those three different buckets? Could you share the stats for each of 
> > them?
> >
> > radosgw-admin bucket stats --bucket=
> >
> > Zitat von "Szabo, Istvan (Agoda)" :
> >
> > > Hello,
> > >
> > > Yeah, still:
> > >
> > > the .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2479481907.1.151 | wc -l
> > > 290005
> > >
> > > and the
> > > .dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.2

[ceph-users] Re: Large omap in index pool even if properly sharded and not "OVER"

2024-07-10 Thread Casey Bodley
On Wed, Jul 10, 2024 at 6:23 PM Richard Bade  wrote:
>
> Hi Casey,
> Thanks for that info on the bilog. I'm in a similar situation with
> large omap objects and we have also had to reshard buckets on
> multisite losing the index on the secondary.
> We also now have a lot of buckets with sync disable so I wanted to
> check that it's always safe to trim the bilog on buckets with sync
> disabled?

in general, the only risk of trimming bilogs is that they refer to
changes you still want to replicate

after `bucket sync disable`, it's fine to trim if it isn't happening
automatically. it's fine even if you want to `bucket sync enable`
later, because that restarts a 'full sync' and skips any bilog entries
from before

> I can see some stale entries with "completed" state and a timestamp of
> a number of months ago but also some that say pending and have no
> timestamp.
>
> Istvan, I can also possibly help with your orphaned 40TB on the secondary 
> zone.
> Each object has the bucket marker in its name. If you do a `rados -p
> {pool_name} ls` and find all the ones that start with the bucket
> marker (found with `radosgw-admin bucket stats
> --bucket={bucket_name}`) then you can do one of two things:
> 1, `rados rm` the object
> 2, restore the index with info from the object itself
> - create a dummy index template (use `radosgw-admin bi get` on a
> known good index to get the structure)
> - grab the etag from the object xattribs and use this and the name
> in the template (`rados -p {pool} getxattr {objname} user.rgw.etag`)
> - use ` radosgw-admin bi put` to create the index
> - use `radosgw-admin bucket check --check-objects --fix
> --bucket={bucket_name}` to fix up the bucket object count and object
> sizes at the end
>
> This process takes quite some time and I can't say if it's 100%
> perfect but it enabled us to get to a state where we could delete the
> buckets and clean up the objects.
> I hope this helps.
>
> Regards,
> Richard
>
> On Thu, 11 Jul 2024 at 01:25, Casey Bodley  wrote:
> >
> > On Tue, Jul 9, 2024 at 12:41 PM Szabo, Istvan (Agoda)
> >  wrote:
> > >
> > > Hi Casey,
> > >
> > > 1.
> > > Regarding versioning, the user doesn't use versioning, if I'm not 
> > > mistaken:
> > > https://gist.githubusercontent.com/Badb0yBadb0y/d80c1bdb8609088970413969826d2b7d/raw/baee46865178fff454c224040525b55b54e27218/gistfile1.txt
> > >
> > > 2.
> > > Regarding multiparts, if it had multipart trash, it would be 
> > > listed here:
> > > https://gist.githubusercontent.com/Badb0yBadb0y/d80c1bdb8609088970413969826d2b7d/raw/baee46865178fff454c224040525b55b54e27218/gistfile1.txt
> > > as a rgw.multimeta under the usage, right?
> > >
> > > 3.
> > > Regarding the multisite idea, this bucket was a multisite bucket 
> > > last year, but we had to reshard (accepting losing the replica on the 
> > > 2nd site and just keeping it in the master site), and at that time, as 
> > > expected, it disappeared completely from the 2nd site (I guess the 40TB 
> > > of trash is still there, but I can't really find how to clean it 🙁 ). 
> > > Now it is a single-site bucket.
> > > Also, this is the index pool; multisite logs should go to the rgw.log 
> > > pool, shouldn't they?
> >
> > some replication logs are in the log pool, but the per-object logs are
> > stored in the bucket index objects. you can inspect these with
> > `radosgw-admin bilog list --bucket=X`. by default, that will only list
> > --max-entries=1000. you can add --shard-id=Y to look at specific
> > 'large omap' objects
> >
> > even if your single-site bucket doesn't exist on the secondary zone,
> > changes on the primary zone are probably still generating these bilog
> > entries. you would need to do something like `radosgw-admin bucket
> > sync disable --bucket=X` to make it stop. because you don't expect
> > these changes to replicate, it's safe to delete any of this bucket's
> > bilog entries with `radosgw-admin bilog trim --end-marker 9
> > --bucket=X`. depending on ceph version, you may need to run this trim
> > command in a loop until the `bilog list` output is empty
> >
> > radosgw does eventually trim bilogs in the background after they're
> > processed, but the secondary zone isn't processing them in this case
> >
> > >
> > > Thank you
> > >
> > >
> > > 
> > > From: Casey Bodley 
> > > Sent: Tuesday

[ceph-users] Re: v19.1.0 Squid RC0 released

2024-07-19 Thread Casey Bodley
On Fri, Jul 19, 2024 at 9:04 AM Stefan Kooman  wrote:
>
> Hi,
>
> On 12-07-2024 00:27, Yuri Weinstein wrote:
>
> ...
>
> > * For packages, see https://docs.ceph.com/en/latest/install/get-packages/
>
> I see that only packages have been build for Ubuntu 22.04 LTS. Will
> there also be packages built for 24.04 LTS (the current LTS)?

our lab doesn't have any builders running ubuntu 24.04 yet,
unfortunately. you can track progress in
https://tracker.ceph.com/issues/66914. it's unlikely to happen for the
initial squid release, but i would very much like to see 24.04 support
added in a later point release when it's ready and tested

>
> Thanks,
>
> Gr. Stefan
>


[ceph-users] Re: [Query] Safe to discard bucket lock objects in reshard pool?

2021-01-19 Thread Casey Bodley
On Tue, Jan 19, 2021 at 10:57 AM Prasad Krishnan
 wrote:
>
> Dear Ceph users,
>
> We have a slightly dated version of Luminous cluster in which dynamic
> bucket resharding was accidentally enabled due to a misconfig (we don't use
> this feature since the number of objects per bucket is capped).
>
> This resulted in creation of the RGW reshard pool with lots of bucket
> reshard lock objects (we have thousands of buckets) which is leading to
> clutter. Also, we've run into a malloc failure issue (similar to
> https://tracker.ceph.com/issues/21826 but not the same since we already use
> tcmalloc) on the OSDs in which these reshard lock objects are located and
> we'd like to reduce the objects that have to be copied out.
>
> My question to the community is: "Is it safe to discard the bucket reshard
> lock objects if we know that we'll never use the reshard feature on the
> cluster again?".

yes, they can be safely deleted. those locks just prevent multiple
radosgws from trying to reshard at the same time. any such locks would
have expired long ago, so removing them should have no observable
effect
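a hedged sketch of that cleanup (the pool name is a placeholder -
substitute whatever your reshard pool is actually called, per
`ceph osd pool ls`; note this removes *every* object in that pool,
which is only safe under the "never reshard again" premise above):

```shell
# Sketch: delete all reshard lock objects by emptying the reshard pool.
RESHARD_POOL=default.rgw.reshard   # placeholder name; verify on your cluster
if command -v rados >/dev/null 2>&1; then
  # list every object in the pool and remove it, one by one
  rados -p "$RESHARD_POOL" ls 2>/dev/null | while read -r obj; do
    rados -p "$RESHARD_POOL" rm "$obj"
  done
else
  echo "rados CLI not found; skipping (sketch only)"
fi
```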

>
> The RGWs performed resharding several months ago due to a misconfiguration
> and we already have stale bucket instances which are due for cleanup on
> this cluster.
>
> Thanks,
> Prasad Krishnan
>


[ceph-users] Re: Storage-class split objects

2021-02-10 Thread Casey Bodley
On Wed, Feb 10, 2021 at 8:31 AM Marcelo  wrote:
>
> Hello all!
>
> We have a cluster where there are HDDs for data and NVMEs for journals and
> indexes. We recently added pure SSD hosts, and created a storage class SSD.
> To do this, we created a default.rgw.hot.data pool, associated a crush rule
> using SSD, and created a HOT storage class in the placement target. The
> problem is that when we send an object using the HOT storage class, it ends
> up in both the STANDARD storage class pool and the HOT pool.
>
> STANDARD pool:
> # rados -p default.rgw.buckets.data ls
> d86dade5-d401-427b-870a-0670ec3ecb65.385198.4_LICENSE
>
> # rados -p default.rgw.buckets.data stat
> d86dade5-d401-427b-870a-0670ec3ecb65.385198.4_LICENSE
> default.rgw.buckets.data/d86dade5-d401-427b-870a-0670ec3ecb65.385198.4_LICENSE
> mtime 2021-02-09 14: 54: 14.00, size 0
>
>
> HOT pool:
> # rados -p default.rgw.hot.data ls
> d86dade5-d401-427b-870a-0670ec3ecb65.385198.4__shadow_.rmpla1NTgArcUQdSLpW4qEgTDlbhn9f_0
>
>
> # rados -p default.rgw.hot.data stat
> d86dade5-d401-427b-870a-0670ec3ecb65.385198.4__shadow_.rmpla1NTgArcUQdSLpW4qEgTDlbhn9f_0
> default.rgw.hot.data/d86dade5-d401-427b-870a-0670ec3ecb65.385198.4__shadow_.rmpla1NTgArcUQdSLpW4qEgTDlbhn9f_0
> mtime 2021-02-09 14: 54: 14.00, size 15220
>
> The object itself is in the HOT pool; however, it creates this other object,
> similar to an index, in the STANDARD pool. Monitoring with iostat, we noticed
> that this behavior generates unnecessary IO on disks that should not need to
> be touched.
>
> Why this behavior? Are there any ways around it?

this object in the STANDARD pool is called the 'head object', and it
holds the s3 object's metadata - including an attribute that says
which storage class the object's data is in

when an S3 client downloads the object with a 'GET /bucket/LICENSE'
request, it doesn't specify the storage class. so radosgw has to find
its head object in a known location (the bucket's default storage
class pool) in order to figure out which pool holds the object's data

>
> Thanks, Marcelo


[ceph-users] Re: Storage-class split objects

2021-02-11 Thread Casey Bodley
On Thu, Feb 11, 2021 at 9:31 AM Marcelo  wrote:
>
> Hi Casey, thank you for the reply.
>
> I was wondering, just as the placement target is in the bucket metadata in
> the index, if it would not be possible to insert the storage-class
> information in the metadata of the object that is in the index as well. Or
> did I get it wrong and there is absolutely no type of object metadata in
> the index, just a listing of the objects?

the bucket index is for bucket listing, so each entry in the index
stores enough metadata (mtime, etag, size, etc) to satisfy the
s3/swift bucket listing APIs. this does include the storage class for
each object

but GetObject requests don't read from the bucket index, they just
look for a 'head object' with the object's name

for objects in the default storage class, we also store the first
chunk (4M) of data in the head object - so a GetObject request can
satisfy small object reads in a single round trip

for objects in non-default storage classes, we need one level of
indirection to locate the data. we *could* potentially go through the
bucket index for this, but the index itself is optional (see indexless
buckets) and has a looser consistency model than the head object,
which we can write atomically when an upload finishes

>
> Thanks again, Marcelo.
>
>
> Em qua., 10 de fev. de 2021 às 11:43, Casey Bodley 
> escreveu:
>
> > On Wed, Feb 10, 2021 at 8:31 AM Marcelo  wrote:
> > >
> > > Hello all!
> > >
> > > We have a cluster where there are HDDs for data and NVMEs for journals
> > and
> > > indexes. We recently added pure SSD hosts, and created a storage class
> > SSD.
> > > To do this, we create a default.rgw.hot.data pool, associate a crush rule
> > > using SSD and create a HOT storage class in the placement-target. The
> > > problem is when we send an object to use a HOT storage class, it is in
> > both
> > > the STANDARD storage class pool and the HOT pool.
> > >
> > > STANDARD pool:
> > > # rados -p default.rgw.buckets.data ls
> > > d86dade5-d401-427b-870a-0670ec3ecb65.385198.4_LICENSE
> > >
> > > # rados -p default.rgw.buckets.data stat
> > > d86dade5-d401-427b-870a-0670ec3ecb65.385198.4_LICENSE
> > >
> > default.rgw.buckets.data/d86dade5-d401-427b-870a-0670ec3ecb65.385198.4_LICENSE
> > > mtime 2021-02-09 14: 54: 14.00, size 0
> > >
> > >
> > > HOT pool:
> > > # rados -p default.rgw.hot.data ls
> > >
> > d86dade5-d401-427b-870a-0670ec3ecb65.385198.4__shadow_.rmpla1NTgArcUQdSLpW4qEgTDlbhn9f_0
> > >
> > >
> > > # rados -p default.rgw.hot.data stat
> > >
> > d86dade5-d401-427b-870a-0670ec3ecb65.385198.4__shadow_.rmpla1NTgArcUQdSLpW4qEgTDlbhn9f_0
> > >
> > default.rgw.hot.data/d86dade5-d401-427b-870a-0670ec3ecb65.385198.4__shadow_.rmpla1NTgArcUQdSLpW4qEgTDlbhn9f_0
> > > mtime 2021-02-09 14: 54: 14.00, size 15220
> > >
> > > The object itself is in the HOT pool, however it creates this other
> > object
> > > similar to an index in the STANDARD pool. Monitoring with iostat we
> > noticed
> > > that this behavior generates an unnecessary IO on disks that do not need
> > to
> > > be touched.
> > >
> > > Why this behavior? Are there any ways around it?
> >
> > this object in the STANDARD pool is called the 'head object', and it
> > holds the s3 object's metadata - including an attribute that says
> > which storage class the object's data is in
> >
> > when an S3 client downloads the object with a 'GET /bucket/LICENSE'
> > request, it doesn't specify the storage class. so radosgw has to find
> > its head object in a known location (the bucket's default storage
> > class pool) in order to figure out which pool holds the object's data
> >
> > >
> > > Thanks, Marcelo


[ceph-users] Re: Rados gateway static website

2021-03-30 Thread Casey Bodley
this error 2039 is ERR_NO_SUCH_WEBSITE_CONFIGURATION. if you want to
access a bucket via rgw_dns_s3website_name, you have to set a website
configuration on the bucket - see
https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketWebsite.html
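
the original report below uses s3cmd, so for example — the index and
error document names are assumptions, pick whatever your site uses:

```shell
# enable website hosting on the bucket; document names are examples:
s3cmd ws-create --ws-index=index.html --ws-error=error.html s3://sky

# confirm the website configuration took effect:
s3cmd ws-info s3://sky
```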

On Tue, Mar 30, 2021 at 10:05 AM Marcel Kuiper  wrote:
>
>
> despite the examples that can be found on the internet, I have trouble
> setting up a static website that serves from an S3 bucket. If anyone could
> point me in the right direction, that would be much appreciated.
>
> Marcel
>
> I created an index.html in the bucket sky
>
> gm-rc3-jumphost01@ceph/s3cmd (master)$ ./s3cmd info s3://sky/index.html
> s3://sky/index.html (object):
> File size: 42046
> Last mod:  Tue, 30 Mar 2021 13:28:02 GMT
> MIME type: text/html
> Storage:   STANDARD
> MD5 sum:   93acaccebb23a18da33ec4294d99ea1a
> SSE:   none
> Policy:none
> CORS:  none
> ACL:   *anon*: READ
> ACL:   Generic Sky Account: FULL_CONTROL
>
> And curl returns
>
> gm-rc3-jumphost01@tmp/skills$ curl
> https://sky.static.gm.core.local/index.html
> 
>   404 Not Found
>   
>404 Not Found
>
> Code: NoSuchWebsiteConfiguration
> BucketName: sky
> RequestId: tx000ba-00606327b8-cca124-rc3-gm
> HostId: cca124-rc3-gm-rc3
>
> Config of the rados instance
>
> [client.radosgw.rc3-gm]
> debug_rgw = 20
> ms_debug = 1
> rgw_zonegroup = rc3
> rgw_zone = rc3-gm
> rgw_enable_static_website = true
> rgw_enable_apis = s3website
> rgw expose bucket = true
> rgw_dns_name = gm-rc3-radosgw.gm.core.local
> rgw_dns_s3website_name = static.gm.core.local
> rgw_resolve_cname = true
> host = gm-rc3-s3web01
> keyring = /etc/ceph/ceph.client.radosgw.rc3-gm.keyring
> log_file = /var/log/ceph/radosgw.log
> user = ceph
> rgw_frontends = civetweb port=443s
> ssl_certificate=/etc/ceph/ssl/key_cert_ca.pem
>
> DNS (from pdnsutil list-zone)
> *.static.gm.core.local  3600IN  CNAME   gm-rc3-s3web01.gm.core.local
>
> The logs show
>
> 2021-03-30 15:32:53.725 7ff760fcd700  2
> RGWDataChangesLog::ChangesRenewThread: start
> 2021-03-30 15:32:58.409 7ff746798700 20 HTTP_ACCEPT=*/*
> 2021-03-30 15:32:58.409 7ff746798700 20
> HTTP_HOST=sky.static.gm.core.local
> 2021-03-30 15:32:58.409 7ff746798700 20 HTTP_USER_AGENT=curl/7.58.0
> 2021-03-30 15:32:58.409 7ff746798700 20 HTTP_VERSION=1.1
> 2021-03-30 15:32:58.409 7ff746798700 20 REMOTE_ADDR=10.128.160.47
> 2021-03-30 15:32:58.409 7ff746798700 20 REQUEST_METHOD=GET
> 2021-03-30 15:32:58.409 7ff746798700 20 REQUEST_URI=/index.html
> 2021-03-30 15:32:58.409 7ff746798700 20 SCRIPT_URI=/index.html
> 2021-03-30 15:32:58.409 7ff746798700 20 SERVER_PORT=443
> 2021-03-30 15:32:58.409 7ff746798700 20 SERVER_PORT_SECURE=443
> 2021-03-30 15:32:58.409 7ff746798700  1 == starting new request
> req=0x7ff746791740 =
> 2021-03-30 15:32:58.409 7ff746798700  2 req 196 0.000s initializing for
> trans_id = tx000c4-006063288a-cca124-rc3-gm
> 2021-03-30 15:32:58.409 7ff746798700 10 rgw api priority: s3=-1
> s3website=1
> 2021-03-30 15:32:58.409 7ff746798700 10 host=sky.static.gm.core.local
> 2021-03-30 15:32:58.409 7ff746798700 20 subdomain=sky
> domain=static.gm.core.local in_hosted_domain=1
> in_hosted_domain_s3website=1
> 2021-03-30 15:32:58.409 7ff746798700 20 final domain/bucket
> subdomain=sky domain=static.gm.core.local in_hosted_domain=1
> in_hosted_domain_s3website=1 s->info.domain=static.gm.core.local
> s->info.request_uri=/sky/index.html
> 2021-03-30 15:32:58.409 7ff746798700 20 get_handler
> handler=29RGWHandler_REST_Obj_S3Website
> 2021-03-30 15:32:58.409 7ff746798700 10
> handler=29RGWHandler_REST_Obj_S3Website
> 2021-03-30 15:32:58.409 7ff746798700  2 req 196 0.000s getting op 0
> 2021-03-30 15:32:58.409 7ff746798700 10
> op=28RGWGetObj_ObjStore_S3Website
> 2021-03-30 15:32:58.409 7ff746798700  2 req 196 0.000s s3:get_obj
> verifying requester
> 2021-03-30 15:32:58.409 7ff746798700 20 req 196 0.000s s3:get_obj
> rgw::auth::StrategyRegistry::s3_main_strategy_t: trying
> rgw::auth::s3::AWSAuthStrategy
> 2021-03-30 15:32:58.409 7ff746798700 20 req 196 0.000s s3:get_obj
> rgw::auth::s3::AWSAuthStrategy: trying rgw::auth::s3::S3AnonymousEngine
> 2021-03-30 15:32:58.409 7ff746798700 20 req 196 0.000s s3:get_obj
> rgw::auth::s3::S3AnonymousEngine granted access
> 2021-03-30 15:32:58.409 7ff746798700 20 req 196 0.000s s3:get_obj
> rgw::auth::s3::AWSAuthStrategy granted access
> 2021-03-30 15:32:58.409 7ff746798700  2 req 196 0.000s s3:get_obj
> normalizing buckets and tenants
> 2021-03-30 15:32:58.409 7ff746798700 10 s->object=index.html
> s->bucket=sky
> 2021-03-30 15:32:58.409 7ff746798700  2 req 196 0.000s s3:get_obj init
> permissions
> 2021-03-30 15:32:58.409 7ff746798700 15 decode_policy Read
> AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/";>skyGeneric
> Sky Account xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
> xsi:type="CanonicalUser">skyGeneric Sky
> AccountFULL_

[ceph-users] Re: RGW failed to start after upgrade to pacific

2021-04-06 Thread Casey Bodley
thanks for the details. this is a regression from changes to the
datalog storage for multisite - this -5 error is coming from the new
'fifo' backend. as a workaround, you can set the new
'rgw_data_log_backing' config variable back to 'omap'

Adam has fixes already merged to the pacific branch; be aware that the
first pacific point release will change the name of
'rgw_data_log_backing' to 'rgw_default_data_log_backing' and default
back to 'fifo'
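
concretely, the workaround looks something like this — the restart
command is just an example, use whatever manages your radosgw units:

```shell
# fall back to the omap backend until the fixed point release:
ceph config set client.rgw rgw_data_log_backing omap

# radosgw only reads this at startup, so restart the gateways, e.g.:
systemctl restart ceph-radosgw.target
```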

On Tue, Apr 6, 2021 at 2:37 AM Martin Verges  wrote:
>
> Hello,
>
> we see the same problems. Deleting all the pools and redeploying rgw solved
> it on that test cluster; however, that is no solution for production ;)
>
> systemd[1]: Started Ceph rados gateway.
> radosgw[7171]: 2021-04-04T14:37:51.508+ 7fc6641efc00  0 deferred set
> uid:gid to 167:167 (ceph:ceph)
> radosgw[7171]: failed to chown /dev/null: (30) Read-only file system
> radosgw[7171]: 2021-04-04T14:37:51.508+ 7fc6641efc00  0 ceph version
> 16.2.0-31-g5922b2b9c1 (5922b2b9c17f0877f84b0b3f2557ab72a628cbfe) pacific
> (stable), process radosgw, pid 7171
> radosgw[7171]: 2021-04-04T14:37:51.508+ 7fc6641efc00  0 framework:
> beast
> radosgw[7171]: 2021-04-04T14:37:51.508+ 7fc6641efc00  0 framework conf
> key: ssl_port, val: 443
> radosgw[7171]: 2021-04-04T14:37:51.508+ 7fc6641efc00  0 framework conf
> key: port, val: 80
> radosgw[7171]: 2021-04-04T14:37:51.508+ 7fc6641efc00  0 framework conf
> key: ssl_certificate, val: /etc/ceph/rgwcert.pem
> radosgw[7171]: 2021-04-04T14:37:51.508+ 7fc6641efc00  1 radosgw_Main
> not setting numa affinity
> radosgw[7171]: 2021-04-04T14:37:51.680+ 7fc6641efc00 -1 static int
> rgw::cls::fifo::FIFO::create(librados::v14_2_0::IoCtx,
> std::__cxx11::string, std::unique_ptr*,
> optional_yield, std::optional,
> std::optional >, bool, uint64_t, uint64_t):925
> create_meta failed: r=-5
> radosgw[7171]: 2021-04-04T14:37:51.680+ 7fc6641efc00 -1 static int
> rgw::cls::fifo::FIFO::create(librados::v14_2_0::IoCtx,
> std::__cxx11::string, std::unique_ptr*,
> optional_yield, std::optional,
> std::optional >, bool, uint64_t, uint64_t):925
> create_meta failed: r=-5
> radosgw[7171]: 2021-04-04T14:37:51.680+ 7fc6641efc00 -1 int
> RGWDataChangesLog::start(const RGWZone*, const RGWZoneParams&, RGWSI_Cls*,
> librados::v14_2_0::Rados*): Error when starting backend: Input/output error
> radosgw[7171]: 2021-04-04T14:37:51.680+ 7fc6641efc00  0 ERROR: failed
> to start datalog_rados service ((5) Input/output error
> radosgw[7171]: 2021-04-04T14:37:51.680+ 7fc6641efc00 -1 int
> RGWDataChangesLog::start(const RGWZone*, const RGWZoneParams&, RGWSI_Cls*,
> librados::v14_2_0::Rados*): Error when starting backend: Input/output error
> radosgw[7171]: 2021-04-04T14:37:51.680+ 7fc6641efc00  0 ERROR: failed
> to init services (ret=(5) Input/output error)
> radosgw[7171]: 2021-04-04T14:37:51.700+ 7fc6641efc00 -1 Couldn't init
> storage provider (RADOS)
> radosgw[7171]: 2021-04-04T14:37:51.700+ 7fc6641efc00 -1 Couldn't init
> storage provider (RADOS)
> systemd[1]: ceph-rado...@rgw.new-croit-host-C0DE01.service: Main process
> exited, code=exited, status=5/NOTINSTALLED
> systemd[1]: ceph-rado...@rgw.new-croit-host-C0DE01.service: Unit entered
> failed state.
> systemd[1]: ceph-rado...@rgw.new-croit-host-C0DE01.service: Failed with
> result 'exit-code'.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> On Mon, 5 Apr 2021 at 19:59, Robert Sander 
> wrote:
>
> > Hi,
> >
> > On 04.04.21 at 15:22, 胡 玮文 wrote:
> >
> > > bash[9823]: debug 2021-04-04T13:01:04.995+ 7ff80f172440 -1 static
> > int rgw::cls::fifo::FIFO::create(librados::v14_2_0::IoCtx,
> > std::__cxx11::string, std::unique_ptr*,
> > optional_yield, std::optional,
> > std::optional >, bool, uint64_t, uint64_t):925
> > create_meta failed: r=-5
> > > bash[9823]: debug 2021-04-04T13:01:04.995+ 7ff80f172440 -1 int
> > RGWDataChangesLog::start(const RGWZone*, const RGWZoneParams&, RGWSI_Cls*,
> > librados::v14_2_0::Rados*): Error when starting backend: Input/output error
> > > bash[9823]: debug 2021-04-04T13:01:04.995+ 7ff80f172440  0 ERROR:
> > failed to start datalog_rados service ((5) Input/output error
> > > bash[9823]: debug 2021-04-04T13:01:04.995+ 7ff80f172440  0 ERROR:
> > failed to init services (ret=(5) Input/output error)
> >
> > I see the same issues on an upgraded clust

[ceph-users] Re: Revisit Large OMAP Objects

2021-04-14 Thread Casey Bodley
On Wed, Apr 14, 2021 at 11:44 AM  wrote:
>
> Konstantin;
>
> Dynamic resharding is disabled in multisite environments.
>
> I believe you mean radosgw-admin reshard stale-instances rm.
>
> Documentation suggests this shouldn't be run in a multisite environment.  
> Does anyone know the reason for this?

say there's a bucket with 10 objects in it, and that's been fully
replicated to a secondary zone. if you want to remove the bucket, you
delete its objects then delete the bucket

when the bucket is deleted, rgw can't delete its bucket instance yet
because the secondary zone may not be caught up with sync - it
requires access to the bucket instance (and its index) to sync those
last 10 object deletions

so the risk with 'stale-instances rm' in multisite is that you might
delete instances before other zones catch up, which can lead to
orphaned objects
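
a hedged way to reduce that risk is to confirm sync has caught up
first (the bucket name is a placeholder):

```shell
# overall multisite sync state as seen from this zone:
radosgw-admin sync status

# per-bucket view, useful before touching that bucket's instances:
radosgw-admin bucket sync status --bucket=mybucket
```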

>
> Is it, in fact, safe, even in a multisite environment?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director – Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
> -Original Message-
> From: Konstantin Shalygin [mailto:k0...@k0ste.ru]
> Sent: Wednesday, April 14, 2021 12:15 AM
> To: Dominic Hilsbos
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Revisit Large OMAP Objects
>
> Run reshard instances rm
> And reshard your bucket by hand or leave dynamic resharding process to do 
> this work
>
>
> k
>
> Sent from my iPhone
>
> > On 13 Apr 2021, at 19:33, dhils...@performair.com wrote:
> >
> > All;
> >
> > We run 2 Nautilus clusters, with RADOSGW replication (14.2.11 --> 14.2.16).
> >
> > Initially our bucket grew very quickly, as I was loading old data into it 
> > and we quickly ran into Large OMAP Object warnings.
> >
> > I have since done a couple manual reshards, which has fixed the warning on 
> > the primary cluster.  I have never been able to get rid of the issue on the 
> > cluster with the replica.
> >
> > A prior conversation on this list led me to this command:
> > radosgw-admin reshard stale-instances list --yes-i-really-mean-it
> >
> > The results of which look like this:
> > [
> >"nextcloud-ra:f91aeff8-a365-47b4-a1c8-928cd66134e8.185262.1",
> >"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.6",
> >"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.2",
> >"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.5",
> >"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.4",
> >"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.3",
> >"nextcloud:f91aeff8-a365-47b4-a1c8-928cd66134e8.53761.1",
> >"3520ae821f974340afd018110c1065b8/OS 
> > Development:f91aeff8-a365-47b4-a1c8-928cd66134e8.4298264.1",
> >
> > "10dfdfadb7374ea1ba37bee1435d87ad/volumebackups:f91aeff8-a365-47b4-a1c8-928cd66134e8.4298264.2",
> >"WorkOrder:f91aeff8-a365-47b4-a1c8-928cd66134e8.44130.1"
> > ]
> >
> > I find this particularly interesting, as nextcloud-ra, /OS 
> > Development, /volumebackups, and WorkOrder buckets no longer exist.
> >
> > When I run:
> > for obj in $(rados -p 300.rgw.buckets.index ls | grep 
> > f91aeff8-a365-47b4-a1c8-928cd66134e8.3512190.1);   do   printf "%-60s 
> > %7d\n" $obj $(rados -p 300.rgw.buckets.index listomapkeys $obj | wc -l);   
> > done
> >
> > I get the expected 64 entries, with counts around 2 +/- 1000.
> >
> > Are the above listed stale instances ok to delete?  If so, how do I go 
> > about doing so?
> >
> > Thank you,
> >
> > Dominic L. Hilsbos, MBA
> > Director - Information Technology
> > Perform Air International Inc.
> > dhils...@performair.com
> > www.PerformAir.com
> >


[ceph-users] Re: Configuring an S3 gateway

2021-04-22 Thread Casey Bodley
On Thu, Apr 22, 2021 at 2:26 PM Fabrice Bacchella
 wrote:
>
> I'm trying to configure an S3 gateway with pacific and can't wrap my mind 
> around.
>
> In the configuration file, my configuration is:
>
> [client.radosgw.fa41]
>   rgw_data = /data/ceph/data/radosgw/$cluster.$id
>   log_file = /data/ceph/logs/$cluster-radosgw.$id.log
>   rgw_frontends = "beast ssl_endpoint=0.0.0.0:443 
> ssl_certificate=/data/ceph/conf/ceph.crt 
> ssl_private_key=/data/ceph/conf/ceph.key"
>
> and radosgw ignore it:
>
> # /usr/bin/radosgw -d --cluster ngceph --name client.fa41 --setuser ceph 
> --setgroup ceph

the --name should match the ceph.conf section. does '--name
client.radosgw.fa41' work?
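
i.e., reusing the command line from the post below with the full
section name (cluster name and paths as in the original report):

```shell
/usr/bin/radosgw -d --cluster ngceph --name client.radosgw.fa41 \
    --setuser ceph --setgroup ceph
```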

> 2021-04-22T20:19:44.362+0200 7fdf51416480  0 deferred set uid:gid to 167:167 
> (ceph:ceph)
> 2021-04-22T20:19:44.363+0200 7fdf51416480  0 ceph version 16.2.1 
> (afb9061ab4117f798c858c741efa6390e48ccf10) pacific (stable), process radosgw, 
> pid 9780
> 2021-04-22T20:19:44.363+0200 7fdf51416480  0 framework: beast
> 2021-04-22T20:19:44.363+0200 7fdf51416480  0 framework conf key: port, val: 
> 7480
> 2021-04-22T20:19:44.363+0200 7fdf51416480  1 radosgw_Main not setting numa 
> affinity
> 2021-04-22T20:19:45.585+0200 7fdf51416480  0 framework: beast
> 2021-04-22T20:19:45.586+0200 7fdf51416480  0 framework conf key: 
> ssl_certificate, val: config://rgw/cert/$realm/$zone.crt
> 2021-04-22T20:19:45.586+0200 7fdf51416480  0 framework conf key: 
> ssl_private_key, val: config://rgw/cert/$realm/$zone.key
> 2021-04-22T20:19:45.586+0200 7fdf51416480  0 starting handler: beast
> 2021-04-22T20:19:45.592+0200 7fdf51416480  0 WARNING: cannot open socket for 
> endpoint=[::]:7480, Address family not supported by protocol
> 2021-04-22T20:19:45.627+0200 7fdf51416480  0 set uid:gid to 167:167 
> (ceph:ceph)
> 2021-04-22T20:19:45.811+0200 7fdf51416480  1 mgrc service_daemon_register 
> rgw.134130 metadata {arch=x86_64,ceph_release=pacific,ceph_version=ceph 
> version 16.2.1 (afb9061ab4117f798c858c741efa6390e48ccf10) pacific 
> (stable),ceph_version_short=16.2.1,cpu=Intel Core Processor (Haswell, no 
> TSX),distro=centos,distro_description=CentOS Linux 
> 8,distro_version=8,frontend_config#0=beast 
> port=7480,frontend_type#0=beast,hostname=fa41,id=fa41,kernel_description=#1 
> SMP Thu Apr 8 19:01:30 UTC 
> 2021,kernel_version=4.18.0-240.22.1.el8_3.x86_64,mem_swap_kb=0,mem_total_kb=16211232,num_handles=1,os=Linux,pid=9780,zone_id=2dc75a54-8c59-42bc-98a8-35542fdc4e52,zone_name=default,zonegroup_id=d11b8d14-7608-4b1d-a548-09b5dd813a7a,zonegroup_name=default}
>
> I don't get what I'm missing. Is there any typo in the configuration that I'm 
> missing ?
>
> I've verified using strace, and it reads the expected configuration file.


[ceph-users] Re: [v15.2.11] radosgw / RGW crash at start, Segmentation Fault

2021-05-07 Thread Casey Bodley
this is https://tracker.ceph.com/issues/50218, a radosgw build issue
specific to ubuntu bionic that affected all of our releases. the build
issue has been resolved, so the next point releases should resolve the
crashes

On Fri, May 7, 2021 at 10:51 AM Gilles Mocellin
 wrote:
>
> Hello,
>
> Since I upgrade to Ceph Octopus v15.2.11, on Ubuntu 18.04,
> Radosgw crash straight at start.
>
> On Two clusters, one Lab, and some test on a production cluster, shows
> the same crash for radosgw.
>
> As I don't find any similar bug in the Tracker, neither in this mailing
> list... Am I alone ?
>
> The logs are :
>
> May 07 14:04:59 fidcl-mrs4-sto-sds-07 systemd[1]: Started Ceph rados
> gateway.
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]: *** Caught signal
> (Segmentation fault) **
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  in thread
> 7ff556655140 thread_name:radosgw
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  ceph version
> 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  1: (()+0x3f040)
> [0x7ff554e19040]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  2:
> (std::locale::operator=(std::locale const&)+0x28) [0x55aadc0ade88]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  3:
> (std::ios_base::imbue(std::locale const&)+0x2e) [0x55aadc14ec3e]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  4:
> (std::basic_ios >::imbue(std::locale
> const&)+0x44) [0x55aadc0f5c54]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  5:
> (std::basic_ostream >&
> boost::asio::ip::operator<<  May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  6: (()+0x40ce0a)
> [0x7ff555847e0a]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  7:
> (radosgw_Main(int, char const**)+0x3430) [0x7ff5559c8430]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  8:
> (__libc_start_main()+0xe7) [0x7ff554dfbbf7]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  9:
> (_start()+0x2a) [0x55aadc0a836a]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:
> 2021-05-07T14:05:00.309+0200 7ff556655140 -1 *** Caught signal
> (Segmentation fault) **
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  in thread
> 7ff556655140 thread_name:radosgw
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  ceph version
> 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  1: (()+0x3f040)
> [0x7ff554e19040]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  2:
> (std::locale::operator=(std::locale const&)+0x28) [0x55aadc0ade88]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  3:
> (std::ios_base::imbue(std::locale const&)+0x2e) [0x55aadc14ec3e]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  4:
> (std::basic_ios >::imbue(std::locale
> const&)+0x44) [0x55aadc0f5c54]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  5:
> (std::basic_ostream >&
> boost::asio::ip::operator<<  May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  6: (()+0x40ce0a)
> [0x7ff555847e0a]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  7:
> (radosgw_Main(int, char const**)+0x3430) [0x7ff5559c8430]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  8:
> (__libc_start_main()+0xe7) [0x7ff554dfbbf7]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  9:
> (_start()+0x2a) [0x55aadc0a836a]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  NOTE: a copy of
> the executable, or `objdump -rdS ` is needed to interpret
> this.
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  0>
> 2021-05-07T14:05:00.309+0200 7ff556655140 -1 *** Caught signal
> (Segmentation fault) **
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  in thread
> 7ff556655140 thread_name:radosgw
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  ceph version
> 15.2.11 (e3523634d9c2227df9af89a4eac33d16738c49cb) octopus (stable)
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  1: (()+0x3f040)
> [0x7ff554e19040]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  2:
> (std::locale::operator=(std::locale const&)+0x28) [0x55aadc0ade88]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  3:
> (std::ios_base::imbue(std::locale const&)+0x2e) [0x55aadc14ec3e]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  4:
> (std::basic_ios >::imbue(std::locale
> const&)+0x44) [0x55aadc0f5c54]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  5:
> (std::basic_ostream >&
> boost::asio::ip::operator<<  May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  6: (()+0x40ce0a)
> [0x7ff555847e0a]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  7:
> (radosgw_Main(int, char const**)+0x3430) [0x7ff5559c8430]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  8:
> (__libc_start_main()+0xe7) [0x7ff554dfbbf7]
> May 07 14:05:00 fidcl-mrs4-sto-sds-07 radosgw[155303]:  9:
> (_start()+

[ceph-users] Re: Do people still use LevelDBStore?

2021-10-13 Thread Casey Bodley
+1 from a dev's perspective. we don't test leveldb, and we don't
expect it to perform as well as rocksdb in ceph, so i don't see any
value in keeping it

the rados team put a ton of effort into converting existing clusters
to rocksdb, so i would be very surprised if removing leveldb left any
users stuck without an upgrade path

On Wed, Oct 13, 2021 at 2:13 PM Ken Dreyer  wrote:
>
> I think it's a great idea to remove it.
>
> - Ken
>
> On Wed, Oct 13, 2021 at 12:52 PM Adam C. Emerson  wrote:
> >
> > Good day,
> >
> > Some time ago, the LevelDB maintainers turned -fno-rtti on in their
> > build. As we don't use -fno-rtti, building LevelDBStore
> > against newer LevelDB packages can fail.
> >
> > This has made me wonder, are there still people who use LevelDBStore
> > and rely on it, or can we deprecate and/or remove it?
> >
> > ___
> > Dev mailing list -- d...@ceph.io
> > To unsubscribe send an email to dev-le...@ceph.io
> >
>


[ceph-users] Re: Dashboard and Object Gateway

2023-10-17 Thread Casey Bodley
hey Tim,

your changes to rgw_admin_entry probably aren't taking effect on the
running radosgws. you'd need to restart them in order to set up the
new route

there also seems to be some confusion about the need for a bucket
named 'default'. radosgw just routes requests with paths starting with
'/{rgw_admin_entry}' to a separate set of admin-related rest apis.
otherwise they fall back to the s3 api, which treats '/foo' as a
request for bucket foo - that's why you see NoSuchBucket errors when
it's misconfigured

also note that, because of how these apis are nested,
rgw_admin_entry='default' would prevent users from creating and
operating on a bucket named 'default'
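
for example, with cephadm the restart could look like this — the rgw
service name is a placeholder, check 'ceph orch ls' for yours:

```shell
ceph orch ls rgw                      # find the rgw service name
ceph orch restart rgw.myrealm.myzone  # placeholder service name
```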

On Tue, Oct 17, 2023 at 7:03 AM Tim Holloway  wrote:
>
> Thank you, Ondřej!
>
> Yes, I set the admin entry to "default". It's just the latest
> result of failed attempts ("admin" didn't work for me either). I did
> say there were some horrors in there!
>
> If I got your sample URL pattern right, the results of a GET on
> "http://x.y.z/default"; return 404, NoSuchBucket. If that means that I
> didn't properly set rgw_enable_apis, then I probably don't know how to
> set it right.
>
>Best Regards,
>   Tim
>
> On Tue, 2023-10-17 at 08:35 +0200, Ondřej Kukla wrote:
> > Hello Tim,
> >
> > I was also struggling with this when I was configuring the object
> > gateway for the first time.
> >
> > There is a few things that you should check to make sure the
> > dashboard would work.
> >
> > 1. You need to have the admin api enabled on all rgws with the
> > rgw_enable_apis option. (As far as I know you are not able to force
> > the dashboard to use one rgw instance)
> > 2. It seems that you have the rgw_admin_entry set to a non default
> > value - the default is admin but it seems that you have “default" (by
> > the name of the bucket) make sure that you have this also set on all
> > rgws.
> >
> > You can confirm that both of these settings are set properly by
> > sending GET request to ${rgw-ip}:${port}/${rgw_admin_entry}
> > “default" in your case -> it should return 405 Method Not Supported
> >
> > Btw there is actually no bucket that you would be able to see in the
> > administration. It’s just abstraction on the rgw.
> >
> > Reagards,
> >
> > Ondrej
> >
> > > On 16. 10. 2023, at 22:00, Tim Holloway  wrote:
> > >
> > > First, an abject apology for the horrors I'm about to unveil. I
> > > made a
> > > cold migration from GlusterFS to Ceph a few months back, so it was
> > > a
> > > learn-/screwup/-as-you-go affair.
> > >
> > > For reasons of presumed compatibility with some of my older
> > > servers, I
> > > started with Ceph Octopus. Unfortunately, Octopus seems to have
> > > been a
> > > nexus of transitions from older Ceph organization and management to
> > > a
> > > newer (cephadm) system combined with a relocation of many ceph
> > > resources and compounded by stale bits of documentation (notably
> > > some
> > > references to SysV procedures and an obsolete installer that
> > > doesn't
> > > even come with Octopus).
> > >
> > > A far bigger problem was a known issue where actions would be
> > > scheduled
> > > but never executed if the system was even slightly dirty. And of
> > > course, since my system was hopelessly dirty, that was a major
> > > issue.
> > > Finally I took a risk and bumped up to Pacific, where that issue no
> > > longer exists. I won't say that I'm 100% clean even now, but at
> > > least
> > > the remaining crud is in areas where it cannot do any harm.
> > > Presumably.
> > >
> > > Given that, the only bar now remaining to total joy has been my
> > > inability to connect via the Ceph Dashboard to the Object Gateway.
> > >
> > > This seems to be an oft-reported problem, but generally referenced
> > > relative to higher-level administrative interfaces like Kubernetes
> > > and
> > > rook. I'm interfacing more directly, however. Regardless, the error
> > > reported is notably familiar:
> > >
> > > [quote]
> > > The Object Gateway Service is not configured
> > > Error connecting to Object Gateway: RGW REST API failed request
> > > with
> > > status code 404
> > > (b'{"Code":"NoSuchBucket","Message":"","BucketName":"default","Requ
> > > estI
> > > d":"tx00' b'000dd0c65b8bda685b4-00652d8e0f-5e3a9b-
> > > default","HostId":"5e3a9b-default-defa' b'ult"}')
> > > Please consult the documentation on how to configure and enable the
> > > Object Gateway management functionality.
> > > [/quote]
> > >
> > > In point of fact, what this REALLY means in my case is that the
> > > bucket
> > > that is supposed to contain the necessary information for the
> > > dashboard
> > > and rgw to communicate has not been created. Presumably that
> > > SHOULDhave
> > > been done by the "ceph dashboard set-rgw-credentials" command, but
> > > apparently isn't, because the default zone has no buckets at all,
> > > much
> > > less one named "default".
> > >
> > > By way of reference, the dashboard is definitely trying to interact
> > > with the rgw cont

[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-17 Thread Casey Bodley
On Mon, Oct 16, 2023 at 2:52 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/63219#note-2
> Release Notes - TBD
>
> Issue https://tracker.ceph.com/issues/63192 appears to be failing several 
> runs.
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura
> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey

rgw approved, thanks!

> fs - Venky
> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya
>
> upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve
>
> client-upgrade-quincy-reef - Laura
>
> powercycle - Brad pls confirm
>
> ceph-volume - Guillaume pls take a look
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef release.
>
> Thx
> YuriW


[ceph-users] Re: Dashboard and Object Gateway

2023-10-17 Thread Casey Bodley
you're right that many docs still mention ceph.conf, after the mimic
release added a centralized config database to ceph-mon. you can read
about the mon-based 'ceph config' commands in
https://docs.ceph.com/en/reef/rados/configuration/ceph-conf/#commands

to modify rgw_admin_entry for all radosgw instances, you'd use a command like:

$ ceph config set client.rgw rgw_admin_entry admin

then restart radosgws because they only read that value on startup
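
for example, something like this (the orch service name is an example;
check `ceph orch ls` for yours, and skip the orch command on
non-cephadm deployments):

```shell
# store rgw_admin_entry for all radosgw instances in the central config db
ceph config set client.rgw rgw_admin_entry admin

# confirm the value was stored
ceph config get client.rgw rgw_admin_entry

# restart the radosgw daemons so they pick up the new value
ceph orch restart rgw.default
```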

On Tue, Oct 17, 2023 at 9:54 AM Tim Holloway  wrote:
>
> Thanks, Casey!
>
> I'm not really certain where to set this option. While Ceph is very
> well-behaved once you know what to do, the nature of Internet-based
> documentation (and occasionally incompletely-updated manuals) is that
> stale information is often given equal weight to the obsolete
> information. It's a problem I had as support for JavaServer Faces, in
> fact. I spent literally years correcting people who'd got their
> examples from obsoleted sources.
>
> If I was to concoct a "Really, Really Newbies Intro to Ceph" I think
> that the two most fundamental items explained would be "Ceph as
> traditional services" versus "Ceph as Containerized services" (As far
> as I can tell, both are still viable but containerization - at least
> for me - is a preferable approach). And the ceph.conf file versus
> storing operational parameters within Ceph entities (e.g. buckets or
> pseudo-buckets like RGW is doing). While lots of stuff still references
> ceph.conf for configuration, I'm feeling like it's actually no longer
> authoritative for some options, may be an alternative source for others
> (with which source has priority being unclear), and may contain stuff
> that Ceph no longer even looks at because it has moved on.
>
> Such is my plight.
>
> I have no problem with making the administrative interface look
> "bucket-like". Or for that matter, having the RGW report it as a
> (missing) bucket if it isn't configured. But knowing where to inject
> the magic that activates that interface eludes me and whether to do it
> directly on the RGW container host (and how) or on my master host is
> totally unclear to me. It doesn't help that this is an item that has
> multiple values, not just on/off or that by default the docs seem to
> imply it should be already preset to standard values out of the box.
>
>Thanks,
>   Tim
>
> On Tue, 2023-10-17 at 09:11 -0400, Casey Bodley wrote:
> > hey Tim,
> >
> > your changes to rgw_admin_entry probably aren't taking effect on the
> > running radosgws. you'd need to restart them in order to set up the
> > new route
> >
> > there also seems to be some confusion about the need for a bucket
> > named 'default'. radosgw just routes requests with paths starting
> > with
> > '/{rgw_admin_entry}' to a separate set of admin-related rest apis.
> > otherwise they fall back to the s3 api, which treats '/foo' as a
> > request for bucket foo - that's why you see NoSuchBucket errors when
> > it's misconfigured
> >
> > also note that, because of how these apis are nested,
> > rgw_admin_entry='default' would prevent users from creating and
> > operating on a bucket named 'default'
> >
> > On Tue, Oct 17, 2023 at 7:03 AM Tim Holloway 
> > wrote:
> > >
> > > Thank you, Ondřej!
> > >
> > > Yes, I set the admin entry set to "default". It's just the latest
> > > result of failed attempts ("admin" didn't work for me either). I
> > > did
> > > say there were some horrors in there!
> > >
> > > If I got your sample URL pattern right, the results of a GET on
> > > "http://x.y.z/default" return 404, NoSuchBucket. If that means that
> > > I
> > > didn't properly set rgw_enable_apis, then I probably don't know how
> > > to
> > > set it right.
> > >
> > >Best Regards,
> > >   Tim
> > >
> > > On Tue, 2023-10-17 at 08:35 +0200, Ondřej Kukla wrote:
> > > > Hello Tim,
> > > >
> > > > I was also struggling with this when I was configuring the object
> > > > gateway for the first time.
> > > >
> > > > There is a few things that you should check to make sure the
> > > > dashboard would work.
> > > >
> > > > 1. You need to have the admin api enabled on all rgws with the
> > > > rgw_enable_apis option. (As far as I know you are not able to
> > > > force
> > > >

[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-18 Thread Casey Bodley
On Mon, Oct 16, 2023 at 2:52 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/63219#note-2
> Release Notes - TBD
>
> Issue https://tracker.ceph.com/issues/63192 appears to be failing several 
> runs.
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura
> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey
> fs - Venky
> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya
>
> upgrade/quincy-p2p - Known issue IIRC, Casey pls confirm/approve

sorry, missed this part

these point-to-point upgrade tests are failing because they're running
s3-tests against older quincy releases that don't have fixes for the
bugs they're testing. we don't maintain separate tests for each point
release, so we can't expect these upgrade tests to pass in general

specifically:
test_post_object_wrong_bucket is failing because it requires the
17.2.7 fix from https://github.com/ceph/ceph/pull/53757
test_set_bucket_tagging is failing because it requires the 17.2.7 fix
from https://github.com/ceph/ceph/pull/50103

so the rgw failures are expected, but i can't tell whether they're
masking other important upgrade test coverage

>
> client-upgrade-quincy-reef - Laura
>
> powercycle - Brad pls confirm
>
> ceph-volume - Guillaume pls take a look
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades -- N/A for quincy now after reef release.
>
> Thx
> YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>


[ceph-users] Re: Modify user op status=-125

2023-10-24 Thread Casey Bodley
errno 125 is ECANCELED, which is the code we use when we detect a
racing write. so it sounds like something else is modifying that user
at the same time. does it eventually succeed if you retry?
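
since ECANCELED signals a transient race rather than a permanent
failure, a short retry loop is usually enough. a minimal sketch (the
commented radosgw-admin call at the bottom is just an illustration):

```shell
#!/bin/sh
# retry: run a command up to $1 times, sleeping briefly between
# attempts; returns 0 once the command succeeds, 1 on give-up
retry() {
  max=$1; shift
  n=1
  until "$@"; do
    [ "$n" -ge "$max" ] && return 1
    n=$((n + 1))
    sleep 1
  done
  return 0
}

# example (hypothetical user):
#   retry 10 radosgw-admin user modify --uid=testuser --display-name="Test User"
```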

On Tue, Oct 24, 2023 at 9:21 AM mahnoosh shahidi
 wrote:
>
> Hi all,
>
> I couldn't understand from the docs what status -125 means. I'm
> getting 500 response status code when I call rgw admin APIs and the only
> log in the rgw log files is as follows.
>
> s3:get_obj recalculating target
> initializing for trans_id =
> tx0aa90f570fb8281cf-006537bf9e-84395fa-default
> s3:get_obj reading permissions
> getting op 1
> s3:put_obj verifying requester
> s3:put_obj normalizing buckets and tenants
> s3:put_obj init permissions
> s3:put_obj recalculating target
> s3:put_obj reading permissions
> s3:put_obj init op
> s3:put_obj verifying op mask
> s3:put_obj verifying op permissions
> s3:put_obj verifying op params
> s3:put_obj pre-executing
> s3:put_obj executing
> :modify_user completing
> WARNING: set_req_state_err err_no=125 resorting to 500
> :modify_user op status=-125
> :modify_user http status=500
> == req done req=0x7f3f85a78620 op status=-125 http_status=500
> latency=0.076000459s ==
>
> Can anyone explain what this error means and why it's happening?
>
> Best Regards,
> Mahnoosh
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


[ceph-users] Re: Modify user op status=-125

2023-10-24 Thread Casey Bodley
i don't suppose you're using sts roles with AssumeRole?
https://tracker.ceph.com/issues/59495 tracks a bug where each
AssumeRole request was writing to the user metadata unnecessarily,
which would race with your admin api requests

On Tue, Oct 24, 2023 at 9:56 AM mahnoosh shahidi
 wrote:
>
> Thanks Casey for your explanation,
>
> Yes it succeeded eventually. Sometimes after about 100 retries. It's odd that 
> it stays in racing condition for that much time.
>
> Best Regards,
> Mahnoosh
>
> On Tue, Oct 24, 2023 at 5:17 PM Casey Bodley  wrote:
>>
>> errno 125 is ECANCELED, which is the code we use when we detect a
>> racing write. so it sounds like something else is modifying that user
>> at the same time. does it eventually succeed if you retry?
>>
>> On Tue, Oct 24, 2023 at 9:21 AM mahnoosh shahidi
>>  wrote:
>> >
>> > Hi all,
>> >
>> > I couldn't understand from the docs what status -125 means. I'm
>> > getting 500 response status code when I call rgw admin APIs and the only
>> > log in the rgw log files is as follows.
>> >
>> > s3:get_obj recalculating target
>> > initializing for trans_id =
>> > tx0aa90f570fb8281cf-006537bf9e-84395fa-default
>> > s3:get_obj reading permissions
>> > getting op 1
>> > s3:put_obj verifying requester
>> > s3:put_obj normalizing buckets and tenants
>> > s3:put_obj init permissions
>> > s3:put_obj recalculating target
>> > s3:put_obj reading permissions
>> > s3:put_obj init op
>> > s3:put_obj verifying op mask
>> > s3:put_obj verifying op permissions
>> > s3:put_obj verifying op params
>> > s3:put_obj pre-executing
>> > s3:put_obj executing
>> > :modify_user completing
>> > WARNING: set_req_state_err err_no=125 resorting to 500
>> > :modify_user op status=-125
>> > :modify_user http status=500
>> > == req done req=0x7f3f85a78620 op status=-125 http_status=500
>> > latency=0.076000459s ==
>> >
>> > Can anyone explain what this error means and why it's happening?
>> >
>> > Best Regards,
>> > Mahnoosh
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>>


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-10-25 Thread Casey Bodley
if you have an administrative user (created with --admin), you should
be able to use its credentials with awscli to delete or overwrite this
bucket policy
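
for example (bucket name, display name and endpoint are placeholders;
the admin user can be temporary):

```shell
# create a temporary admin user if none exists
radosgw-admin user create --uid=tmp-admin --display-name="temp admin" \
  --admin --gen-access-key --gen-secret

# with that user's keys configured in awscli, drop the offending policy
aws --endpoint-url http://rgw.example.com:8080 \
  s3api delete-bucket-policy --bucket mybucket

# clean up the temporary user afterwards
radosgw-admin user rm --uid=tmp-admin
```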

On Wed, Oct 25, 2023 at 4:11 PM Wesley Dillingham  
wrote:
>
> I have a bucket which got injected with bucket policy which locks the
> bucket even to the bucket owner. The bucket now cannot be accessed (even
> get its info or delete bucket policy does not work) I have looked in the
> radosgw-admin command for a way to delete a bucket policy but do not see
> anything. I presume I will need to somehow remove the bucket policy from
> however it is stored in the bucket metadata / omap etc. If anyone can point
> me in the right direction on that I would appreciate it. Thanks
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-10-25 Thread Casey Bodley
On Wed, Oct 25, 2023 at 4:59 PM Wesley Dillingham  
wrote:
>
> Thank you, I am not sure (inherited cluster). I presume such an admin user 
> created after-the-fact would work?

yes

> Is there a good way to discover an admin user other than iterating over all 
> users and retrieving user information? (I presume "radosgw-admin user info 
> --uid=..." would illustrate such administrative access?)

not sure there's an easy way to search existing users, but you could
create a temporary admin user for this repair

>
> Respectfully,
>
> Wes Dillingham
> w...@wesdillingham.com
> LinkedIn
>
>
> On Wed, Oct 25, 2023 at 4:41 PM Casey Bodley  wrote:
>>
>> if you have an administrative user (created with --admin), you should
>> be able to use its credentials with awscli to delete or overwrite this
>> bucket policy
>>
>> On Wed, Oct 25, 2023 at 4:11 PM Wesley Dillingham  
>> wrote:
>> >
>> > I have a bucket which got injected with bucket policy which locks the
>> > bucket even to the bucket owner. The bucket now cannot be accessed (even
>> > get its info or delete bucket policy does not work) I have looked in the
>> > radosgw-admin command for a way to delete a bucket policy but do not see
>> > anything. I presume I will need to somehow remove the bucket policy from
>> > however it is stored in the bucket metadata / omap etc. If anyone can point
>> > me in the right direction on that I would appreciate it. Thanks
>> >
>> > Respectfully,
>> >
>> > *Wes Dillingham*
>> > w...@wesdillingham.com
>> > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>>


[ceph-users] Re: RGW access logs with bucket name

2023-10-30 Thread Casey Bodley
another option is to enable the rgw ops log, which includes the bucket
name for each request

the http access log line that's visible at log level 1 follows a known
apache format that users can scrape, so i've resisted adding extra
s3-specific stuff like bucket/object names there. there was some
recent discussion around this in
https://github.com/ceph/ceph/pull/50350, which had originally extended
that access log line
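
if i recall the knobs correctly, enabling the ops log looks something
like this (option names should be verified against your release's
docs):

```shell
# record an ops-log entry (which includes the bucket name) per request
ceph config set client.rgw rgw_enable_ops_log true

# either stream entries to a unix domain socket for scraping...
ceph config set client.rgw rgw_ops_log_socket_path /var/run/ceph/rgw-ops.sock

# ...or keep them in the log pool, readable with `radosgw-admin log list`
ceph config set client.rgw rgw_ops_log_rados true

# restart the radosgws to apply
```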

On Mon, Oct 30, 2023 at 6:03 AM Boris Behrens  wrote:
>
> Hi Dan,
>
> we are currently moving all the logging into lua scripts, so it is not an
> issue anymore for us.
>
> Thanks
>
> ps: the ceph analyzer is really cool. plusplus
>
> Am Sa., 28. Okt. 2023 um 22:03 Uhr schrieb Dan van der Ster <
> dan.vanders...@clyso.com>:
>
> > Hi Boris,
> >
> > I found that you need to use debug_rgw=10 to see the bucket name :-/
> >
> > e.g.
> > 2023-10-28T19:55:42.288+ 7f34dde06700 10 req 3268931155513085118
> > 0.0s s->object=... s->bucket=xyz-bucket-123
> >
> > Did you find a more convenient way in the meantime? I think we should
> > log bucket name at level 1.
> >
> > Cheers, Dan
> >
> > --
> > Dan van der Ster
> > CTO
> >
> > Clyso GmbH
> > p: +49 89 215252722 | a: Vancouver, Canada
> > w: https://clyso.com | e: dan.vanders...@clyso.com
> >
> > Try our Ceph Analyzer: https://analyzer.clyso.com
> >
> > On Thu, Mar 30, 2023 at 4:15 AM Boris Behrens  wrote:
> > >
> > > Sadly not.
> > > I only see the the path/query of a request, but not the hostname.
> > > So when a bucket is accessed via hostname (
> > https://bucket.TLD/object?query)
> > > I only see the object and the query (GET /object?query).
> > > When a bucket is accessed bia path (https://TLD/bucket/object?query) I
> > can
> > > see also the bucket in the log (GET bucket/object?query)
> > >
> > > Am Do., 30. März 2023 um 12:58 Uhr schrieb Szabo, Istvan (Agoda) <
> > > istvan.sz...@agoda.com>:
> > >
> > > > It has the full url begins with the bucket name in the beast logs http
> > > > requests, hasn’t it?
> > > >
> > > > Istvan Szabo
> > > > Staff Infrastructure Engineer
> > > > ---
> > > > Agoda Services Co., Ltd.
> > > > e: istvan.sz...@agoda.com
> > > > ---
> > > >
> > > > On 2023. Mar 30., at 17:44, Boris Behrens  wrote:
> > > >
> > > > 
> > > >
> > > > Bringing up that topic again:
> > > > is it possible to log the bucket name in the rgw client logs?
> > > >
> > > > currently I am only to know the bucket name when someone access the
> > bucket
> > > > via https://TLD/bucket/object instead of https://bucket.TLD/object.
> > > >
> > > > Am Di., 3. Jan. 2023 um 10:25 Uhr schrieb Boris Behrens  > >:
> > > >
> > > > Hi,
> > > >
> > > > I am looking forward to move our logs from
> > > >
> > > > /var/log/ceph/ceph-client...log to our logaggregator.
> > > >
> > > >
> > > > Is there a way to have the bucket name in the log file?
> > > >
> > > >
> > > > Or can I write the rgw_enable_ops_log into a file? Maybe I could work
> > with
> > > >
> > > > this.
> > > >
> > > >
> > > > Cheers and happy new year
> > > >
> > > > Boris
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend
> > im
> > > > groüen Saal.
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> > > groüen Saal.
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe sen

[ceph-users] Ceph Leadership Team Meeting: 2023-11-1 Minutes

2023-11-01 Thread Casey Bodley
quincy 17.2.7: released!
* major 'dashboard v3' changes causing issues?
https://github.com/ceph/ceph/pull/54250 did not merge for 17.2.7
* planning a retrospective to discuss what kind of changes should go
in minor releases when members of the dashboard team are present

reef 18.2.1:
* most PRs already tested/merged
* possibly start validation next week?


[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-07 Thread Casey Bodley
On Mon, Nov 6, 2023 at 4:31 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/63443#note-1
>
> Seeking approvals/reviews for:
>
> smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures)
> rados - Neha, Radek, Travis, Ernesto, Adam King
> rgw - Casey

rgw results are approved. https://github.com/ceph/ceph/pull/54371
merged to reef but is needed on reef-release

> fs - Venky
> orch - Adam King
> rbd - Ilya
> krbd - Ilya
> upgrade/quincy-x (reef) - Laura PTL
> powercycle - Brad
> perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> TIA
> YuriW
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-07 Thread Casey Bodley
On Tue, Nov 7, 2023 at 12:41 PM Jayanth Reddy
 wrote:
>
> Hello Wesley and Casey,
>
> We've ended up with the same issue and here it appears that even the user 
> with "--admin" isn't able to do anything. We're now unable to figure out if 
> it is due to bucket policies, ACLs or IAM of some sort. I'm seeing these IAM 
> errors in the logs
>
> ```
>
> Nov  7 00:02:00 ceph-05 radosgw[4054570]: req 8786689665323103851 
> 0.00368s s3:get_obj Error reading IAM Policy: Terminate parsing due to 
> Handler error.
>
> Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583 
> 0.0s s3:list_bucket Error reading IAM Policy: Terminate parsing due 
> to Handler error.

it's failing to parse the bucket policy document, but the error
message doesn't say what's wrong with it

disabling rgw_policy_reject_invalid_principals might help if it's
failing on the Principal
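
a sketch of what that would look like (the option's availability varies
by release, so check `ceph config help rgw_policy_reject_invalid_principals`
first; the orch service name is an example):

```shell
# relax strict principal validation so the policy parses again
ceph config set client.rgw rgw_policy_reject_invalid_principals false

# restart the radosgws to apply
ceph orch restart rgw.default
```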

> Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583 
> 0.0s s3:list_bucket init_permissions on 
> :window-dev[1d0fa0b4-04eb-48f9-889b-a60de865ccd8.24143.10]) failed, ret=-13
> Nov  7 22:51:40 ceph-feed-05 radosgw[4054570]: req 13293029267332025583 
> 0.0s op->ERRORHANDLER: err_no=-13 new_err_no=-13
>
> ```
>
> Please help what's wrong here. We're in Ceph v17.2.7.
>
> Regards,
> Jayanth
>
> On Thu, Oct 26, 2023 at 7:14 PM Wesley Dillingham  
> wrote:
>>
>> Thank you, this has worked to remove the policy.
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> w...@wesdillingham.com
>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>
>>
>> On Wed, Oct 25, 2023 at 5:10 PM Casey Bodley  wrote:
>>
>> > On Wed, Oct 25, 2023 at 4:59 PM Wesley Dillingham 
>> > wrote:
>> > >
>> > > Thank you, I am not sure (inherited cluster). I presume such an admin
>> > user created after-the-fact would work?
>> >
>> > yes
>> >
>> > > Is there a good way to discover an admin user other than iterating over
>> > all users and retrieving user information? (I presume "radosgw-admin user info
>> > --uid=..." would illustrate such administrative access?)
>> >
>> > not sure there's an easy way to search existing users, but you could
>> > create a temporary admin user for this repair
>> >
>> > >
>> > > Respectfully,
>> > >
>> > > Wes Dillingham
>> > > w...@wesdillingham.com
>> > > LinkedIn
>> > >
>> > >
>> > > On Wed, Oct 25, 2023 at 4:41 PM Casey Bodley  wrote:
>> > >>
>> > >> if you have an administrative user (created with --admin), you should
>> > >> be able to use its credentials with awscli to delete or overwrite this
>> > >> bucket policy
>> > >>
>> > >> On Wed, Oct 25, 2023 at 4:11 PM Wesley Dillingham <
>> > w...@wesdillingham.com> wrote:
>> > >> >
>> > >> > I have a bucket which got injected with bucket policy which locks the
>> > >> > bucket even to the bucket owner. The bucket now cannot be accessed
>> > (even
>> > >> > get its info or delete bucket policy does not work) I have looked in
>> > the
>> > >> > radosgw-admin command for a way to delete a bucket policy but do not
>> > see
>> > >> > anything. I presume I will need to somehow remove the bucket policy
>> > from
>> > >> > however it is stored in the bucket metadata / omap etc. If anyone can
>> > point
>> > >> > me in the right direction on that I would appreciate it. Thanks
>> > >> >
>> > >> > Respectfully,
>> > >> >
>> > >> > *Wes Dillingham*
>> > >> > w...@wesdillingham.com
>> > >> > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>> > >> > ___
>> > >> > ceph-users mailing list -- ceph-users@ceph.io
>> > >> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> > >> >
>> > >>
>> >
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-08 Thread Casey Bodley
i've opened https://tracker.ceph.com/issues/63485 to allow
admin/system users to override policy parsing errors like this. i'm
not sure yet where this parsing regression was introduced. in reef,
https://github.com/ceph/ceph/pull/49395 added better error messages
here, along with a rgw_policy_reject_invalid_principals option to be
strict about principal names


to remove a bucket policy that fails to parse with "Error reading IAM
Policy", you can follow these steps:

1. find the bucket's instance id using the 'bucket stats' command

$ radosgw-admin bucket stats --bucket {bucketname} | grep id

2. use the rados tool to remove the bucket policy attribute
(user.rgw.iam-policy) from the bucket instance metadata object

$ rados -p default.rgw.meta -N root rmxattr
.bucket.meta.{bucketname}:{bucketid} user.rgw.iam-policy

3. radosgws may be caching the existing bucket metadata and xattrs, so
you'd either need to restart them or clear their metadata caches

$ ceph daemon client.rgw.xyz cache zap
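
put together, the repair looks roughly like this (bucket name and
daemon name are examples, jq is used only for convenience, and zones
configured with non-default metadata pools will differ):

```shell
BUCKET=mybucket

# 1. look up the bucket's instance id
BUCKET_ID=$(radosgw-admin bucket stats --bucket "$BUCKET" | jq -r .id)

# 2. strip the unparseable policy attribute from the instance metadata object
rados -p default.rgw.meta -N root rmxattr \
  ".bucket.meta.${BUCKET}:${BUCKET_ID}" user.rgw.iam-policy

# 3. zap the metadata cache on each radosgw (or just restart them)
ceph daemon client.rgw.xyz cache zap
```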

On Wed, Nov 8, 2023 at 9:06 AM Jayanth Reddy  wrote:
>
> Hello Wesley,
> Thank you for the response. I tried the same but ended up with 403.
>
> Regards,
> Jayanth
>
> On Wed, Nov 8, 2023 at 7:34 PM Wesley Dillingham  
> wrote:
>>
>> Jaynath:
>>
>> Just to be clear with the "--admin" user's key's you have attempted to 
>> delete the bucket policy using the following method: 
>> https://docs.aws.amazon.com/cli/latest/reference/s3api/delete-bucket-policy.html
>>
>> This is what worked for me (on a 16.2.14 cluster). I didn't attempt to 
>> interact with the affected bucket in any way other than "aws s3api 
>> delete-bucket-policy"
>>
>> Respectfully,
>>
>> Wes Dillingham
>> w...@wesdillingham.com
>> LinkedIn
>>
>>
>> On Wed, Nov 8, 2023 at 8:30 AM Jayanth Reddy  
>> wrote:
>>>
>>> Hello Casey,
>>>
>>> We're totally stuck at this point and none of the options seem to work. 
>>> Please let us know if there is something in metadata or index to remove 
>>> those applied bucket policies. We downgraded to v17.2.6 and encountering 
>>> the same.
>>>
>>> Regards,
>>> Jayanth
>>>
>>> On Wed, Nov 8, 2023 at 7:14 AM Jayanth Reddy  
>>> wrote:
>>>>
>>>> Hello Casey,
>>>>
>>>> And on further inspection, we identified that there were bucket policies 
>>>> set from the initial days; we were in v16.2.12.
>>>> We upgraded the cluster to v17.2.7 two days ago and it seems obvious that 
>>>> the IAM error logs are generated the next minute rgw daemon upgraded from 
>>>> v16.2.12 to v17.2.7. Looks like there is some issue with parsing.
>>>>
>>>> I'm thinking to downgrade back to v17.2.6 and earlier, please let me know 
>>>> if this is a good option for now.
>>>>
>>>> Thanks,
>>>> Jayanth
>>>> 
>>>> From: Jayanth Reddy 
>>>> Sent: Tuesday, November 7, 2023 11:59:38 PM
>>>> To: Casey Bodley 
>>>> Cc: Wesley Dillingham ; ceph-users 
>>>> ; Adam Emerson 
>>>> Subject: Re: [ceph-users] Re: owner locked out of bucket via bucket policy
>>>>
>>>> Hello Casey,
>>>>
>>>> Thank you for the quick response. I see 
>>>> `rgw_policy_reject_invalid_principals` is not present in v17.2.7. Please 
>>>> let me know.
>>>>
>>>> Regards
>>>> Jayanth
>>>>
>>>> On Tue, Nov 7, 2023 at 11:50 PM Casey Bodley  wrote:
>>>>
>>>> On Tue, Nov 7, 2023 at 12:41 PM Jayanth Reddy
>>>>  wrote:
>>>> >
>>>> > Hello Wesley and Casey,
>>>> >
>>>> > We've ended up with the same issue and here it appears that even the 
>>>> > user with "--admin" isn't able to do anything. We're now unable to 
>>>> > figure out if it is due to bucket policies, ACLs or IAM of some sort. 
>>>> > I'm seeing these IAM errors in the logs
>>>> >
>>>> > ```
>>>> >
>>>> > Nov  7 00:02:00 ceph-05 radosgw[4054570]: req 8786689665323103851 
>>>> > 0.00368s s3:get_obj Error reading IAM Policy: Terminate parsing due 
>>>> > to Handler error.
>>>> >
>>>> > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583 
>>>> > 0.0s s3:list_b

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-09 Thread Casey Bodley
On Wed, Nov 8, 2023 at 11:10 AM Yuri Weinstein  wrote:
>
> We merged 3 PRs and rebuilt "reef-release" (Build 2)
>
> Seeking approvals/reviews for:
>
> smoke - Laura, Radek 2 jobs failed in "objectstore/bluestore" tests
> (see Build 2)
> rados - Neha, Radek, Travis, Ernesto, Adam King
> rgw - Casey reapprove on Build 2

rgw reapproved

> fs - Venky, approve on Build 2
> orch - Adam King
> upgrade/quincy-x (reef) - Laura PTL
> powercycle - Brad (known issues)
>
> We need to close
> https://tracker.ceph.com/issues/63391
> (https://github.com/ceph/ceph/pull/54392) - Travis, Guillaume
> https://tracker.ceph.com/issues/63151 - Adam King do we need anything for 
> this?
>
> On Wed, Nov 8, 2023 at 6:33 AM Travis Nielsen  wrote:
> >
> > Yuri, we need to add this issue as a blocker for 18.2.1. We discovered this 
> > issue after the release of 17.2.7, and don't want to hit the same blocker 
> > in 18.2.1 where some types of OSDs are failing to be created in new 
> > clusters, or failing to start in upgraded clusters.
> > https://tracker.ceph.com/issues/63391
> >
> > Thanks!
> > Travis
> >
> > On Wed, Nov 8, 2023 at 4:41 AM Venky Shankar  wrote:
> >>
> >> Hi Yuri,
> >>
> >> On Wed, Nov 8, 2023 at 2:32 AM Yuri Weinstein  wrote:
> >> >
> >> > 3 PRs above mentioned were merged and I am returning some tests:
> >> > https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd
> >> >
> >> > Still seeing approvals.
> >> > smoke - Laura, Radek, Prashant, Venky in progress
> >> > rados - Neha, Radek, Travis, Ernesto, Adam King
> >> > rgw - Casey in progress
> >> > fs - Venky
> >>
> >> There's a failure in the fs suite
> >>
> >> 
> >> https://pulpito.ceph.com/vshankar-2023-11-07_05:14:36-fs-reef-release-distro-default-smithi/7450325/
> >>
> >> Seems to be related to nfs-ganesha. I've reached out to Frank Filz
> >> (#cephfs on ceph slack) to have a look. WIll update as soon as
> >> possible.
> >>
> >> > orch - Adam King
> >> > rbd - Ilya approved
> >> > krbd - Ilya approved
> >> > upgrade/quincy-x (reef) - Laura PTL
> >> > powercycle - Brad
> >> > perf-basic - in progress
> >> >
> >> >
> >> > On Tue, Nov 7, 2023 at 8:38 AM Casey Bodley  wrote:
> >> > >
> >> > > On Mon, Nov 6, 2023 at 4:31 PM Yuri Weinstein  
> >> > > wrote:
> >> > > >
> >> > > > Details of this release are summarized here:
> >> > > >
> >> > > > https://tracker.ceph.com/issues/63443#note-1
> >> > > >
> >> > > > Seeking approvals/reviews for:
> >> > > >
> >> > > > smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures)
> >> > > > rados - Neha, Radek, Travis, Ernesto, Adam King
> >> > > > rgw - Casey
> >> > >
> >> > > rgw results are approved. https://github.com/ceph/ceph/pull/54371
> >> > > merged to reef but is needed on reef-release
> >> > >
> >> > > > fs - Venky
> >> > > > orch - Adam King
> >> > > > rbd - Ilya
> >> > > > krbd - Ilya
> >> > > > upgrade/quincy-x (reef) - Laura PTL
> >> > > > powercycle - Brad
> >> > > > perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
> >> > > >
> >> > > > Please reply to this email with approval and/or trackers of known
> >> > > > issues/PRs to address them.
> >> > > >
> >> > > > TIA
> >> > > > YuriW
> >> > > > ___
> >> > > > ceph-users mailing list -- ceph-users@ceph.io
> >> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >> > > >
> >> > >
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> >>
> >> --
> >> Cheers,
> >> Venky
> >> ___
> >> Dev mailing list -- d...@ceph.io
> >> To unsubscribe send an email to dev-le...@ceph.io
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io


[ceph-users] Re: RGW: user modify default_storage_class does not work

2023-11-13 Thread Casey Bodley
my understanding is that default placement is stored at the bucket
level, so changes to the user's default placement only take effect for
newly-created buckets
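
so for objects in an existing bucket, the client has to request the
class explicitly per request (or the bucket has to be recreated after
the user change). for example with awscli (endpoint and names are
placeholders):

```shell
# upload with an explicit storage class
aws --endpoint-url http://rgw.example.com:8080 \
  s3api put-object --bucket testbucket --key testdefault-object \
  --storage-class COLD --body ./payload

# confirm the stored class (only reported when it's not STANDARD)
aws --endpoint-url http://rgw.example.com:8080 \
  s3api head-object --bucket testbucket --key testdefault-object \
  --query StorageClass
```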

On Sun, Nov 12, 2023 at 9:48 PM Huy Nguyen  wrote:
>
> Hi community,
> I'm using Ceph version 16.2.13. I tried to set default_storage_class but 
> it seems like it didn't work.
>
> Here is steps I did:
> I already had a storage class name COLD, then I modify the user 
> default_storage_class like this:
> radosgw-admin user modify --uid testuser --placement-id default-placement 
> --storage-class COLD
>
> after that, the user info shows it correctly:
> radosgw-admin user info --uid testuser
> {
> ...
> "op_mask": "read, write, delete",
> "default_placement": "default-placement",
> "default_storage_class": "COLD",
> ...
>
> Then I put a file using boto3, without specifying any storage class:
> s3.Object(bucket_name, 'testdefault-object').put(Body="0"*1000)
>
> But the object still lands in the STANDARD storage class. I don't know if
> this is a bug or if I missed something.
>
> Thanks
>


[ceph-users] Re: Help on rgw metrics (was rgw_user_counters_cache)

2024-01-31 Thread Casey Bodley
On Wed, Jan 31, 2024 at 3:43 AM garcetto  wrote:
>
> good morning,
>   i was struggling trying to understand why i cannot find this setting on
> my reef version. is it because it is only in the latest dev version of
> ceph and not before?

that's right, this new feature will be part of the squid release. we
don't plan to backport it to reef

>
> https://docs.ceph.com/en/*latest*
> /radosgw/metrics/#user-bucket-counter-caches
>
> Reef gives 404
> https://docs.ceph.com/en/reef/radosgw/metrics/
>
> thank you!
>


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-01-31 Thread Casey Bodley
On Mon, Jan 29, 2024 at 4:39 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/64151#note-1
>
> Seeking approvals/reviews for:
>
> rados - Radek, Laura, Travis, Ernesto, Adam King
> rgw - Casey

rgw approved, thanks

> fs - Venky
> rbd - Ilya
> krbd - in progress
>
> upgrade/nautilus-x (pacific) - Casey PTL (regweed tests failed)
> upgrade/octopus-x (pacific) - Casey PTL (regweed tests failed)
>
> upgrade/pacific-x (quincy) - in progress
> upgrade/pacific-p2p - Ilya PTL (maybe rbd related?)
>
> ceph-volume - Guillaume
>
> TIA
> YuriW
>


[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-02 Thread Casey Bodley
On Fri, Feb 2, 2024 at 11:21 AM Chris Palmer  wrote:
>
> Hi Matthew
>
> AFAIK the upgrade from quincy/deb11 to reef/deb12 is not possible:
>
>   * The packaging problem you can work around, and a fix is pending
>   * You have to upgrade both the OS and Ceph in one step
>   * The MGR will not run under deb12 due to the PyO3 lack of support for
> subinterpreters.
>
> If you do attempt an upgrade, you will end up stuck with a partially
> upgraded cluster. The MONs will be on deb12/reef and cannot be
> downgraded, and the MGR will be stuck on deb11/quincy. We have a test
> cluster in that state with no way forward or back.
>
> I fear the MGR problem will spread as time goes on and PyO3 updates
> occur. And it's not good that it can silently corrupt data in existing,
> apparently-working installations.
>
> No-one has picked up issue 64213 that I raised yet.
>
> I'm tempted to raise another issue for qa: the debian 12 package cannot
> have been tested as it just won't work either as an upgrade or a new
> install.

you're right that the debian packages don't get tested:

https://docs.ceph.com/en/reef/start/os-recommendations/#platforms

>
> Regards, Chris
>
>
> On 02/02/2024 14:40, Matthew Darwin wrote:
> > Chris,
> >
> > Thanks for all the investigations you are doing here. We're on
> > quincy/debian11.  Is there any working path at this point to
> > reef/debian12?  Ideally I want to go in two steps.  Upgrade ceph first
> > or upgrade debian first, then do the upgrade to the other one. Most of
> > our infra is already upgraded to debian 12, except ceph.
> >
> > On 2024-01-29 07:27, Chris Palmer wrote:
> >> I have logged this as https://tracker.ceph.com/issues/64213
> >>
> >> On 16/01/2024 14:18, DERUMIER, Alexandre wrote:
> >>> Hi,
> >>>
> > ImportError: PyO3 modules may only be initialized once per
> > interpreter
> > process
> >
> > and ceph -s reports "Module 'dashboard' has failed dependency: PyO3
> > modules may only be initialized once per interpreter process
> >>> We have the same problem on proxmox8 (based on debian12) with ceph
> >>> quincy or reef.
> >>>
> >>> It seem to be related to python version on debian12
> >>>
> >>> (we have no fix for this currently)
> >>>
> >>>
> >>>


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-08 Thread Casey Bodley
thanks, i've created https://tracker.ceph.com/issues/64360 to track
these backports to pacific/quincy/reef

On Thu, Feb 8, 2024 at 7:50 AM Stefan Kooman  wrote:
>
> Hi,
>
> Is this PR: https://github.com/ceph/ceph/pull/54918 included as well?
>
> You definitely want to build the Ubuntu / debian packages with the
> proper CMAKE_CXX_FLAGS. The performance impact on RocksDB is _HUGE_.
>
> Thanks,
>
> Gr. Stefan
>
> P.s. Kudos to Mark Nelson for figuring it out / testing.
>


[ceph-users] Re: How to solve data fixity

2024-02-09 Thread Casey Bodley
i've cc'ed Matt who's working on the s3 object integrity feature
https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html,
where rgw compares the generated checksum with the client's on ingest,
then stores it with the object so clients can read it back for later
integrity checks. you can track the progress in
https://tracker.ceph.com/issues/63951
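
For reference, the client-side half of that flow computes the checksum locally and sends it base64-encoded so the server can verify it on ingest; a sketch of the computation (the boto3 call in the comment assumes the feature is available on your endpoint):

```python
import base64
import hashlib

def sha256_b64(data: bytes) -> str:
    """Base64-encoded SHA-256 digest, the format S3-style APIs expect
    in the ChecksumSHA256 request field."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode()

# usage, assuming an 's3' boto3 client against a checksum-capable endpoint:
# body = b"..."
# s3.put_object(Bucket="bkt", Key="key", Body=body,
#               ChecksumSHA256=sha256_b64(body))
```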

On Fri, Feb 9, 2024 at 8:49 AM Josh Baergen  wrote:
>
> MPU etags are an MD5-of-MD5s, FWIW. If the users knows how the parts are
> uploaded then it can be used to verify contents, both just after upload and
> then at download time (both need to be validated if you want end-to-end
> validation - but then you're trusting the system to not change the etag
> underneath you).
>
> Josh
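
Josh's MD5-of-MD5s scheme can be sketched as a small helper (hypothetical, not part of any AWS or Ceph tooling; it assumes you know the exact part boundaries used at upload time):

```python
import hashlib

def multipart_etag(parts):
    """Compute the S3 multipart-upload ETag for a list of part payloads:
    the MD5 of the concatenated per-part MD5 digests, with the part
    count appended after a dash."""
    concat = b"".join(hashlib.md5(p).digest() for p in parts)
    return "%s-%d" % (hashlib.md5(concat).hexdigest(), len(parts))

# e.g. an object uploaded as two 5 MiB parts:
# multipart_etag([data[:5 * 1024 * 1024], data[5 * 1024 * 1024:]])
```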
>
> On Fri, Feb 9, 2024, 6:16 a.m. Michal Strnad 
> wrote:
>
> > Thank you for your response.
> >
> > We have already done some Lua scripting in the past, and it wasn't
> > entirely enjoyable :-), but we may have to do it again. Scrubbing is
> > still enabled, and turning it off definitely won't be an option.
> > However, due to the project requirements, it would be great if
> > Ceph could, on upload completion, compute a hash (e.g.
> > md5, sha256) and store it in the object's metadata, so that the user
> > could later validate that the downloaded data is correct.
> >
> > We can't use the ETag for that, as it does not contain the md5 in the
> > case of multipart uploads.
> >
> > Michal
> >
> >
> > On 2/9/24 13:53, Anthony D'Atri wrote:
> > > You could use Lua scripting perhaps to do this at ingest, but I'm very
> > curious about scrubs -- you have them turned off completely?
> > >
> > >
> > >> On Feb 9, 2024, at 04:18, Michal Strnad 
> > wrote:
> > >>
> > >> Hi all!
> > >>
> > >> In the context of a repository-type project, we need to address a
> > situation where we cannot use periodic checks in Ceph (scrubbing) due to
> > the project's nature. Instead, we need the ability to write a checksum into
> > the metadata of the uploaded file via API. In this context, we are not
> > concerned about individual file parts, but rather the file as a whole.
> > Users will calculate the checksum and write it. Based on this hash, we
> > should be able to trigger a check of the given files. We are aware that
> > tools like s3cmd can write MD5 hashes to file metadata, but is there a more
> > general approach? Does anyone have experience with this, or can you suggest
> > a tool that can accomplish this?
> > >>
> > >> Thx
> > >> Michal
> > >
> >
> >


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-21 Thread Casey Bodley
On Tue, Feb 20, 2024 at 10:58 AM Yuri Weinstein  wrote:
>
> We have restarted QE validation after fixing issues and merging several PRs.
> The new Build 3 (rebase of pacific) tests are summarized in the same
> note (see Build 3 runs) https://tracker.ceph.com/issues/64151#note-1
>
> Seeking approvals:
>
> rados - Radek, Junior, Travis, Ernesto, Adam King
> rgw - Casey

rgw approved

> fs - Venky
> rbd - Ilya
> krbd - Ilya
>
> upgrade/octopus-x (pacific) - Adam King, Casey PTL
>
> upgrade/pacific-p2p - Casey PTL

Yuri and i managed to get a green run here, approved

>
> ceph-volume - Guillaume, fixed by
> https://github.com/ceph/ceph/pull/55658 retesting
>
> On Thu, Feb 8, 2024 at 8:43 AM Casey Bodley  wrote:
> >
> > thanks, i've created https://tracker.ceph.com/issues/64360 to track
> > these backports to pacific/quincy/reef
> >
> > On Thu, Feb 8, 2024 at 7:50 AM Stefan Kooman  wrote:
> > >
> > > Hi,
> > >
> > > Is this PR: https://github.com/ceph/ceph/pull/54918 included as well?
> > >
> > > You definitely want to build the Ubuntu / debian packages with the
> > > proper CMAKE_CXX_FLAGS. The performance impact on RocksDB is _HUGE_.
> > >
> > > Thanks,
> > >
> > > Gr. Stefan
> > >
> > > P.s. Kudos to Mark Nelson for figuring it out / testing.
> > >
> >
>


[ceph-users] Ceph Leadership Team Meeting: 2024-2-21 Minutes

2024-02-21 Thread Casey Bodley
Estimate on release timeline for 17.2.8?
- after pacific 16.2.15 and reef 18.2.2 hotfix
(https://tracker.ceph.com/issues/64339,
https://tracker.ceph.com/issues/64406)

Estimate on release timeline for 19.2.0?
- target April, depending on testing and RCs
- Testing plan for Squid beyond dev freeze (regression and upgrade
tests, performance tests, RCs)

Can we fix old.ceph.com?
- continued discussion about the need to revive the pg calc tool

T release name?
- please add and vote for suggestions in https://pad.ceph.com/p/t
- need name before we can open "t kickoff" pr


[ceph-users] Re: list topic shows endpoint url and username and password

2024-02-23 Thread Casey Bodley
thanks Giada, i see that you created
https://tracker.ceph.com/issues/64547 for this

unfortunately, this topic metadata doesn't really have a permission
model at all. topics are shared across the entire tenant, and all
users have access to read/overwrite those topics

a lot of work was done for https://tracker.ceph.com/issues/62727 to
add topic ownership and permission policy, and those changes will be
in the squid release

i've cc'ed Yuval and Krunal who worked on that - could these changes
be reasonably backported to quincy and reef?

On Fri, Feb 23, 2024 at 9:59 AM Giada Malatesta
 wrote:
>
> Hello everyone,
>
> we are facing a problem regarding the topic operations to send
> notification, particularly when using amqp protocol.
>
> We are using Ceph version 18.2.1. We have created a topic by giving as
> attributes all needed information and so the push-endpoint (in our case
> a rabbit endpoint that is used to collect notification messages). Then
> we have configured all the buckets in our cluster Ceph so that it is
> possible to send notification when some changes occur.
>
> The problem particularly regards the list_topics operation: we noticed
> that any authenticated user is able to get a full list of the created
> topics and, with them, all of their information (including the
> endpoint, and thus the username, password, IP and port, visible via
> boto3.set_stream_logger()), which is not good for our goal, since we do
> not want users to know implementation details.
>
> Is there a possibility to solve this problem? Any help would be useful.
>
> Thanks and best regards.
>
> GM.


[ceph-users] Re: Hanging request in S3

2024-03-06 Thread Casey Bodley
hey Christian, i'm guessing this relates to
https://tracker.ceph.com/issues/63373 which tracks a deadlock in s3
DeleteObjects requests when multisite is enabled.
rgw_multi_obj_del_max_aio can be set to 1 as a workaround until the
reef backport lands

On Wed, Mar 6, 2024 at 2:41 PM Christian Kugler  wrote:
>
> Hi,
>
> I am having some trouble with some S3 requests and I am at a loss.
>
> After upgrading to reef a couple of weeks ago some requests get stuck and
> never
> return. The two Ceph clusters are set up to sync the S3 realm
> bidirectionally.
> The bucket has 479 shards (dynamic resharding) at the moment.
>
> Putting an object (/etc/services) into the bucket via s3cmd works, and
> deleting
> it works as well. So I know it is not just the entire bucket that is somehow
> faulty.
>
> When I try to delete a specific prefix, the request for listing all
> objects
> never comes back. In the example below I only included the request in
> question
> which I aborted with ^C.
>
> $ s3cmd rm -r
> s3://sql20/pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/ -d
> [...snip...]
> DEBUG: Canonical Request:
> GET
> /sql20/
> prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F
> host:[...snip...]
> x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> x-amz-date:20240306T183435Z
>
> host;x-amz-content-sha256;x-amz-date
> e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> --
> DEBUG: signature-v4 headers: {'x-amz-date': '20240306T183435Z',
> 'Authorization': 'AWS4-HMAC-SHA256
> Credential=VL0FRB7CYGMHBGCD419M/20240306/[...snip...]/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=45b133675535ab611bbf2b9a7a6e40f9f510c0774bf155091dc9a05b76856cb7',
> 'x-amz-content-sha256':
> 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'}
> DEBUG: Processing request, please wait...
> DEBUG: get_hostname(sql20): [...snip...]
> DEBUG: ConnMan.get(): re-using connection: [...snip...]#1
> DEBUG: format_uri():
> /sql20/?prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F
> DEBUG: Sending request method_string='GET',
> uri='/sql20/?prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F',
> headers={'x-amz-date': '20240306T183435Z', 'Authorization':
> 'AWS4-HMAC-SHA256
> Credential=VL0FRB7CYGMHBGCD419M/20240306/[...snip...]/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=45b133675535ab611bbf2b9a7a6e40f9f510c0774bf155091dc9a05b76856cb7',
> 'x-amz-content-sha256':
> 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'},
> body=(0 bytes)
> ^CDEBUG: Response:
> {}
> See ya!
>
> The request did not show up normally in the logs so I set debug_rgw=20 and
> debug_ms=20 via ceph config set.
>
> I tried to isolate the request and looked for its request id:
> 13321243250692796422
> The following is a grep for the request id:
>
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket verifying op params
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket pre-executing
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket check rate limiting
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket executing
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket list_objects_ordered: starting attempt 1
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.0s
> s3:list_bucket cls_bucket_list_ordered: request from each of 479 shard(s)
> for 8 entries to get 1001 total entries
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket cls_bucket_list_ordered: currently processing
> pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz
> from shard 437
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
> obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz
> state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket cls_bucket_list_ordered: skipping
> pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz[]
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket cls_bucket_list_ordered: currently processing
> pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz
> from shard 202
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
> obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz
> state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
> s3:list_bucket cls_bucket_list_ordered: skippin

[ceph-users] Re: Disable signature url in ceph rgw

2024-03-07 Thread Casey Bodley
anything we can do to narrow down the policy issue here? any of the
Principal, Action, Resource, or Condition matches could be failing
here. you might try replacing each with a wildcard, one at a time,
until you see the policy take effect
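
That elimination test can be mechanized by generating one relaxed copy of the policy per match field; a rough sketch (the field list and wildcard substitutions are assumptions about what is useful to try, not an rgw API):

```python
import copy

MATCH_FIELDS = ["Principal", "Action", "Resource", "Condition"]

def wildcard_variants(policy):
    """Yield (field, variant) pairs, each relaxing one match field in
    every statement: Condition is dropped entirely, Principal becomes
    {"AWS": "*"}, and Action/Resource become "*"."""
    for field in MATCH_FIELDS:
        variant = copy.deepcopy(policy)
        for stmt in variant.get("Statement", []):
            if field not in stmt:
                continue
            if field == "Condition":
                del stmt[field]
            elif field == "Principal":
                stmt[field] = {"AWS": "*"}
            else:
                stmt[field] = "*"
        yield field, variant

# usage: print each candidate, apply it via PutBucketPolicy, and retry
# the presigned request until the failing field is identified:
# for field, variant in wildcard_variants(policy):
#     print(field, variant)
```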

On Wed, Dec 13, 2023 at 5:04 AM Marc Singer  wrote:
>
> Hi
>
> As my attachment is very messy, I cleaned it up and provide a much
> simpler version for your tests bellow.
> These policies seem to get ignored when the URL is presigned.
>
> {
> "Version":"2012-10-17",
> "Id":"userbucket%%%policy",
> "Statement":[
>{
>   "Sid":"username%%%read",
>   "Effect":"Allow",
>   "Principal":{
>  "AWS":"arn:aws:iam:::user/username"
>   },
>   "Action":[
>  "s3:ListBucket",
>  "s3:ListBucketVersions",
>  "s3:GetObject",
>  "s3:GetObjectVersion"
>   ],
>   "Resource":[
>  "arn:aws:s3:::userbucket",
>  "arn:aws:s3:::userbucket/*"
>   ],
>   "Condition":{
>  "IpAddress":{
> "aws:SourceIp":[
>"redacted"
> ]
>  }
>   }
>},
>{
>   "Sid":"username%%%write",
>   "Effect":"Allow",
>   "Principal":{
>  "AWS":"arn:aws:iam:::user/username"
>   },
>   "Action":[
>  "s3:PutObject",
>  "s3:DeleteObject",
>  "s3:DeleteObjectVersion",
>  "s3:ListBucketMultipartUploads",
>  "s3:ListMultipartUploadParts",
>  "s3:AbortMultipartUpload"
>   ],
>   "Resource":[
>  "arn:aws:s3:::userbucket",
>  "arn:aws:s3:::userbucket/*"
>   ],
>   "Condition":{
>  "IpAddress":{
> "aws:SourceIp":[
>"redacted"
> ]
>  }
>   }
>},
>{
>   "Sid":"username%%%policy_control",
>   "Effect":"Deny",
>   "Principal":{
>  "AWS":"arn:aws:iam:::user/username"
>   },
>   "Action":[
>  "s3:PutObjectAcl",
>  "s3:GetObjectAcl",
>  "s3:PutBucketAcl",
>  "s3:GetBucketPolicy",
>  "s3:DeleteBucketPolicy",
>  "s3:PutBucketPolicy"
>   ],
>   "Resource":[
>  "arn:aws:s3:::userbucket",
>  "arn:aws:s3:::userbucket/*"
>   ]
>}
> ]
> }
>
> Thanks and yours sincerely
>
> Marc Singer
>
> On 2023-12-12 10:24, Marc Singer wrote:
> > Hi
> >
> > First, all requests with presigned URLs should be restricted.
> >
> > This is how the request is blocked with the nginx sidecar (it's just a
> > simple parameter in the URL that is forbidden):
> >
> > if ($arg_Signature) { return 403 'Signature parameter forbidden'; }
> >
> > Our bucket policies are created automatically with a custom
> > microservice. You find an example in attachment from a random "managed"
> > bucket. These buckets are affected by the issue.
> >
> > There is a policy that stops users from changing the policy.
> >
> > I might have done a mistake when redacting replacing a user with the
> > same values.
> >
> > Thanks you and have a great day
> >
> > Marc
> >
> > On 12/9/23 00:37, Robin H. Johnson wrote:
> >> On Fri, Dec 08, 2023 at 10:41:59AM +0100,marc@singer.services  wrote:
> >>> Hi Ceph users
> >>>
> >>> We are using Ceph Pacific (16) in this specific deployment.
> >>>
> >>> In our use case we do not want our users to be able to generate
> >>> signature v4 URLs because they bypass the policies that we set on
> >>> buckets (e.g IP restrictions).
> >>> Currently we have a sidecar reverse proxy running that filters
> >>> requests with signature URL specific request parameters.
> >>> This is obviously not very efficient and we are looking to replace
> >>> this somehow in the future.
> >>>
> >>> 1. Is there an option in RGW to disable this signed URLs (e.g
> >>> returning status 403)?
> >>> 2. If not is this planned or would it make sense to add it as a
> >>> configuration option?
> >>> 3. Or is the behaviour of not respecting bucket policies in RGW with
> >>> signature v4 URLs a bug and they should be actually applied?
> >> Trying to clarify your ask:
> >> - you want ALL requests, including presigned URLs, to be subject to
> >> the
> >>IP restrictions encoded in your bucket policy?
> >>e.g. auth (signature AND IP-list)
> >>
> >> That should be possible with bucket policy.
> >>
> >> Can you post the current bucket policy that you have? (redact with
> >> distinct values the IPs, userids, bucket name, any paths, but
> >> otherwise
> >> keep it complete).
> >>
> >> You cannot fundamentally stop anybody from generating presigned URLs,
> >> because that's purely a client-side operation. Generating presigned
> >> URLs
> >> requires an access key and secret key, a

[ceph-users] v17.2.7 Quincy now supports Ubuntu 22.04 (Jammy Jellyfish)

2024-03-29 Thread Casey Bodley
Ubuntu 22.04 packages are now available for the 17.2.7 Quincy release.

The upcoming Squid release will not support Ubuntu 20.04 (Focal
Fossa). Ubuntu users planning to upgrade from Quincy to Squid will
first need to perform a distro upgrade to 22.04.

Getting Ceph

* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-17.2.7.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: b12291d110049b2f35e32e0de30d70e9a4c060d2


[ceph-users] Re: Upgraded to Quincy 17.2.7: some S3 buckets inaccessible

2024-04-03 Thread Casey Bodley
On Wed, Apr 3, 2024 at 11:58 AM Lorenz Bausch  wrote:
>
> Hi everybody,
>
> we upgraded our containerized Red Hat Pacific cluster to the latest
> Quincy release (Community Edition).

i'm afraid this is not an upgrade path that we try to test or support.
Red Hat makes its own decisions about what to backport into its
releases. my understanding is that Red Hat's pacific-based 5.3 release
includes all of the rgw multisite resharding changes which were not
introduced upstream until the Reef release. this includes changes to
data formats that an upstream Quincy release would not understand. in
this case, you might have more luck upgrading to Reef?

> The upgrade itself went fine, the cluster is HEALTH_OK, all daemons run
> the upgraded version:
>
>  %< 
> $ ceph -s
>cluster:
>  id: 68675a58-cf09-4ebd-949c-b9fcc4f2264e
>  health: HEALTH_OK
>
>services:
>  mon: 5 daemons, quorum node02,node03,node04,node05,node01 (age 25h)
>  mgr: node03.ztlair(active, since 25h), standbys: node01.koymku,
> node04.uvxgvp, node02.znqnhg, node05.iifmpc
>  osd: 408 osds: 408 up (since 22h), 408 in (since 7d)
>  rgw: 19 daemons active (19 hosts, 1 zones)
>
>data:
>  pools:   11 pools, 8481 pgs
>  objects: 236.99M objects, 544 TiB
>  usage:   1.6 PiB used, 838 TiB / 2.4 PiB avail
>  pgs: 8385 active+clean
>   79   active+clean+scrubbing+deep
>   17   active+clean+scrubbing
>
>io:
>  client:   42 MiB/s rd, 439 MiB/s wr, 2.15k op/s rd, 1.64k op/s wr
>
> ---
>
> $ ceph versions | jq .overall
> {
>"ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
> (stable)": 437
> }
>  >% 
>
> After all the daemons were upgraded we started noticing some RGW buckets
> which are inaccessible.
> s3cmd failed with NoSuchKey:
>
>  %< 
> $ s3cmd la -l
> ERROR: S3 error: 404 (NoSuchKey)
>  >% 
>
> The buckets still exists according to "radosgw-admin bucket list".
> Out of the ~600 buckets, 13 buckets are inaccessible at the moment:
>
>  %< 
> $ radosgw-admin bucket radoslist --tenant xy --uid xy --bucket xy
> 2024-04-03T12:13:40.607+0200 7f0dbf4c4680  0 int
> RGWRados::cls_bucket_list_ordered(const DoutPrefixProvider*,
> RGWBucketInfo&, int, const rgw_obj_index_key&, const string&, const
> string&, uint32_t, bool, uint16_t, RGWRados::ent_map_t&, bool*, bool*,
> rgw_obj_index_key*, optional_yield, RGWBucketListNameFilter):
> CLSRGWIssueBucketList for
> xy:xy[6955f50e-5b23-4534-9b77-c7078f60f0d0.171713434.3]) failed
> 2024-04-03T12:13:40.609+0200 7f0dbf4c4680  0 int
> RGWRados::cls_bucket_list_ordered(const DoutPrefixProvider*,
> RGWBucketInfo&, int, const rgw_obj_index_key&, const string&, const
> string&, uint32_t, bool, uint16_t, RGWRados::ent_map_t&, bool*, bool*,
> rgw_obj_index_key*, optional_yield, RGWBucketListNameFilter):
> CLSRGWIssueBucketList for
> xy:xy[6955f50e-5b23-4534-9b77-c7078f60f0d0.171713434.3]) failed
>  >% 
>
> The affected buckets are comparatively large, around 4 - 7 TB,
> but not all buckets of that size are affected.
>
> Using "rados -p rgw.buckets.data ls" it seems like all the objects are
> still there,
> although "rados -p rgw.buckets.data get objectname -" only prints
> unusable (?) binary data,
> even for objects of intact buckets.
>
> Overall we're facing around 60 TB of customer data which are just gone
> at the moment.
> Is there a way to recover from this situation or to further narrow down
> the root cause of the problem?
>
> Kind regards,
> Lorenz
>


[ceph-users] Re: Upgraded to Quincy 17.2.7: some S3 buckets inaccessible

2024-04-03 Thread Casey Bodley
to expand on this diagnosis: with multisite resharding, we changed how
buckets name/locate their bucket index shard objects. any buckets that
were resharded under this Red Hat pacific release would be using the
new object names. after upgrading to the Quincy release, rgw would
look at the wrong object names when trying to list those buckets. 404
NoSuchKey is the response i would expect in that case

On Wed, Apr 3, 2024 at 12:20 PM Casey Bodley  wrote:
>
> On Wed, Apr 3, 2024 at 11:58 AM Lorenz Bausch  wrote:
> >
> > Hi everybody,
> >
> > we upgraded our containerized Red Hat Pacific cluster to the latest
> > Quincy release (Community Edition).
>
> i'm afraid this is not an upgrade path that we try to test or support.
> Red Hat makes its own decisions about what to backport into its
> releases. my understanding is that Red Hat's pacific-based 5.3 release
> includes all of the rgw multisite resharding changes which were not
> introduced upstream until the Reef release. this includes changes to
> data formats that an upstream Quincy release would not understand. in
> this case, you might have more luck upgrading to Reef?
>
> > The upgrade itself went fine, the cluster is HEALTH_OK, all daemons run
> > the upgraded version:
> >
> >  %< 
> > $ ceph -s
> >cluster:
> >  id: 68675a58-cf09-4ebd-949c-b9fcc4f2264e
> >  health: HEALTH_OK
> >
> >services:
> >  mon: 5 daemons, quorum node02,node03,node04,node05,node01 (age 25h)
> >  mgr: node03.ztlair(active, since 25h), standbys: node01.koymku,
> > node04.uvxgvp, node02.znqnhg, node05.iifmpc
> >  osd: 408 osds: 408 up (since 22h), 408 in (since 7d)
> >  rgw: 19 daemons active (19 hosts, 1 zones)
> >
> >data:
> >  pools:   11 pools, 8481 pgs
> >  objects: 236.99M objects, 544 TiB
> >  usage:   1.6 PiB used, 838 TiB / 2.4 PiB avail
> >  pgs: 8385 active+clean
> >   79   active+clean+scrubbing+deep
> >   17   active+clean+scrubbing
> >
> >io:
> >  client:   42 MiB/s rd, 439 MiB/s wr, 2.15k op/s rd, 1.64k op/s wr
> >
> > ---
> >
> > $ ceph versions | jq .overall
> > {
> >"ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
> > (stable)": 437
> > }
> >  >% 
> >
> > After all the daemons were upgraded we started noticing some RGW buckets
> > which are inaccessible.
> > s3cmd failed with NoSuchKey:
> >
> >  %< 
> > $ s3cmd la -l
> > ERROR: S3 error: 404 (NoSuchKey)
> >  >% 
> >
> > The buckets still exists according to "radosgw-admin bucket list".
> > Out of the ~600 buckets, 13 buckets are unaccessible at the moment:
> >
> >  %< 
> > $ radosgw-admin bucket radoslist --tenant xy --uid xy --bucket xy
> > 2024-04-03T12:13:40.607+0200 7f0dbf4c4680  0 int
> > RGWRados::cls_bucket_list_ordered(const DoutPrefixProvider*,
> > RGWBucketInfo&, int, const rgw_obj_index_key&, const string&, const
> > string&, uint32_t, bool, uint16_t, RGWRados::ent_map_t&, bool*, bool*,
> > rgw_obj_index_key*, optional_yield, RGWBucketListNameFilter):
> > CLSRGWIssueBucketList for
> > xy:xy[6955f50e-5b23-4534-9b77-c7078f60f0d0.171713434.3]) failed
> > 2024-04-03T12:13:40.609+0200 7f0dbf4c4680  0 int
> > RGWRados::cls_bucket_list_ordered(const DoutPrefixProvider*,
> > RGWBucketInfo&, int, const rgw_obj_index_key&, const string&, const
> > string&, uint32_t, bool, uint16_t, RGWRados::ent_map_t&, bool*, bool*,
> > rgw_obj_index_key*, optional_yield, RGWBucketListNameFilter):
> > CLSRGWIssueBucketList for
> > xy:xy[6955f50e-5b23-4534-9b77-c7078f60f0d0.171713434.3]) failed
> >  >% 
> >
> > The affected buckets are comparatively large, around 4 - 7 TB,
> > but not all buckets of that size are affected.
> >
> > Using "rados -p rgw.buckets.data ls" it seems like all the objects are
> > still there,
> > although "rados -p rgw.buckets.data get objectname -" only prints
> > unusable (?) binary data,
> > even for objects of intact buckets.
> >
> > Overall we're facing around 60 TB of customer data which are just gone
> > at the moment.
> > Is there a way to recover from this situation or further narrowing down
> > the root cause of the problem?
> >
> > Kind regards,
> > Lorenz
> >


[ceph-users] Re: Upgraded to Quincy 17.2.7: some S3 buckets inaccessible

2024-04-03 Thread Casey Bodley
On Wed, Apr 3, 2024 at 3:09 PM Lorenz Bausch  wrote:
>
> Hi Casey,
>
> thank you so much for the analysis! We tested the upgrade intensively, but
> the buckets in our test environment were probably too small to get
> dynamically resharded.
>
> > after upgrading to the Quincy release, rgw would
> > look at the wrong object names when trying to list those buckets.
> As we're currently running Quincy, do you think objects/bucket indexes
> might already be altered in a way which makes them also unusable for
> Reef?

for multisite resharding support, the bucket instance metadata now
stores an additional 'layout' structure which contains all of the
information necessary to locate its bucket index objects. on reshard,
the Red Hat pacific release would have stored that information with
the bucket. the upstream Reef release should be able to interpret that
layout data correctly

however, if the Quincy release overwrites that bucket instance
metadata (via an operation like PutBucketAcl, PutBucketPolicy, etc),
the corresponding layout information would be erased such that an
upgrade to Reef would not be able to find the real bucket index shard
objects
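
as a hedged way to check, the layout section of the bucket instance
metadata can be inspected directly before attempting the Reef upgrade
(bucket/tenant/instance names below are placeholders, not real values):

```shell
# sketch: dump the bucket instance metadata and look for the 'layout'
# section; if it is missing or empty, the index location info may have
# been overwritten as described above
radosgw-admin metadata get \
  "bucket.instance:TENANT/BUCKET:INSTANCE_ID" | grep -A5 '"layout"'
```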

>
> Kind regards,
> Lorenz
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Migrating from S3 to Ceph RGW (Cloud Sync Module)

2024-04-11 Thread Casey Bodley
unfortunately, this cloud sync module only exports data from ceph to a
remote s3 endpoint, not the other way around:

"This module syncs zone data to a remote cloud service. The sync is
unidirectional; data is not synced back from the remote zone."

i believe that rclone supports copying from one s3 endpoint to
another. does anyone have experience with that?
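
for reference, a minimal sketch of what that could look like with
rclone, assuming two s3-type remotes ("aws" and "cephrgw" are
placeholder remote names created beforehand with `rclone config`):

```shell
# sketch: one-way copy of a bucket from an AWS S3 remote to a ceph rgw
# remote; --checksum compares hashes instead of modtimes
rclone sync aws:source-bucket cephrgw:source-bucket --progress --checksum
```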

On Thu, Apr 11, 2024 at 4:45 PM James McClune  wrote:
>
> Hello Ceph User Community,
>
> I currently have a large Amazon S3 environment with terabytes of data
> spread over dozens of buckets. I'm looking to migrate from Amazon S3 to an
> on-site Ceph cluster using the RGW. I'm trying to figure out the
> most efficient way to achieve this. Looking through the documentation, I
> found articles related to the cloud sync module, released in Mimic (
> https://docs.ceph.com/en/latest/radosgw/cloud-sync-module/). I also watched
> a video on the cloud sync module as well. It *sounds* like this is the
> functionality I'm looking for.
>
> Given I'm moving away from Amazon S3, I'm really just looking for a one-way
> replication between the buckets (i.e. Provide an Amazon S3 access
> key/secret which is associated to the buckets and the same for the Ceph
> environment, so object data can be replicated one-to-one, without creating
> ad-hoc tooling). Once the data is replicated from S3 to Ceph, I plan on
> modifying my boto connection objects to use the new Ceph environment. Is
> what I'm describing feasible with the cloud sync module? Just looking for
> some affirmation, given I'm not well versed in Ceph's RGW, especially
> around multi-site configurations.
>
> Thanks,
> Jimmy
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.3 QE validation status

2024-04-12 Thread Casey Bodley
On Fri, Apr 12, 2024 at 2:38 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/65393#note-1
> Release Notes - TBD
> LRC upgrade - TBD
>
> Seeking approvals/reviews for:
>
> smoke - infra issues, still trying, Laura PTL
>
> rados - Radek, Laura approved? Travis?  Nizamudeen?
>
> rgw - Casey approved?

rgw approved

> fs - Venky approved?
> orch - Adam King approved?
>
> krbd - Ilya approved
> powercycle - seems fs related, Venky, Brad PTL
>
> ceph-volume - will require
> https://github.com/ceph/ceph/pull/56857/commits/63fe3921638f1fb7fc065907a9e1a64700f8a600
> Guillaume is fixing it.
>
> TIA
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best practice regarding rgw scaling

2024-05-23 Thread Casey Bodley
On Thu, May 23, 2024 at 11:50 AM Szabo, Istvan (Agoda)
 wrote:
>
> Hi,
>
> Wonder what is the best practice to scale RGW: increase the thread counts or 
> spin up more gateways?
>
>
>   *
> Let's say I have 21000 connections on my haproxy
>   *
> I have 3 physical gateway servers so let's say each of them needs to serve 
> 7000 connections
>
> This means with a 512 thread pool size, each of them needs 13 gateways,
> altogether 39 in the cluster.
> or
> 3 gateway and each 8192 rgw thread?

with the beast frontend, rgw_max_concurrent_requests is the most
relevant config option here. while you might benefit from more than
512 threads at scale, you won't need a thread per connection

i'd also point out the relationship between concurrent requests and
memory usage: with default tunings, each PutObject
(rgw_put_obj_min_window_size) and GetObject (rgw_get_obj_window_size)
request may buffer up to 16MB of object data
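
as a rough sketch of what that means at the scale above (assuming the
default 16MiB windows; the numbers are illustrative, not a sizing
recommendation):

```python
# rough worst-case memory for object-data buffering in one radosgw,
# assuming the default 16 MiB window per in-flight GET/PUT
# (rgw_get_obj_window_size / rgw_put_obj_min_window_size)

def rgw_buffer_estimate_gib(concurrent_requests: int, window_mib: int = 16) -> float:
    """Worst-case object-data buffer memory, in GiB."""
    return concurrent_requests * window_mib / 1024

# 7000 connections per gateway, as in the example above
print(f"{rgw_buffer_estimate_gib(7000):.1f} GiB")
```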

>
> Thank you
>
> 
> This message is confidential and is for the sole use of the intended 
> recipient(s). It may also be privileged or otherwise protected by copyright 
> or other legal rules. If you have received it by mistake please let us know 
> by reply email and delete it from your system. It is prohibited to copy this 
> message or disclose its content to anyone. Any confidentiality or privilege 
> is not waived or lost by any mistaken delivery or unauthorized disclosure of 
> the message. All messages sent to and from Agoda may be monitored to ensure 
> compliance with company policies, to protect the company's interests and to 
> remove potential malware. Electronic messages may be intercepted, amended, 
> lost or deleted, or contain viruses.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Weekly Minutes 2024-06-10

2024-06-10 Thread Casey Bodley
# quincy now past estimated 2024-06-01 end-of-life

will 17.2.8 be the last point release? maybe not, depending on timing

# centos 8 eol

* Casey tried to summarize the fallout in
https://lists.ceph.io/hyperkitty/list/d...@ceph.io/thread/H7I4Q4RAIT6UZQNPPZ5O3YB6AUXLLAFI/
* c8 builds were disabled with https://github.com/ceph/ceph-build/pull/2235
* Patrick points out that we'll no longer be able to test upgrades
from octopus/pacific to quincy/reef

## reef 18.2.3 validation delayed

* need to remove references to centos 8 in the qa suites
** reef backport started in https://github.com/ceph/ceph/pull/57932
** still blocked on fs upgrade suite (for main/squid also)
* the plan is to move upgrade suites to cephadm where possible
** concern about lack of package-based upgrade testing

## alternatives to centos for container base distro?

to be discussed in public on the mailing list

# Cephalocon program committee - volunteers?

* Josh Durgin
* Patrick Donnelly
* Joseph Mundackal (first time - so happy to help in any way I can)
* Matt Benjamin (talk review)

# Crimson Tech lead change

* congrats to Matan Breizman

# docs backports to Reef and to Quincy fail

* Zac would like to learn about the doc build infrastructure in order
to fix issues like this
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw API issues

2022-07-15 Thread Casey Bodley
are you running quincy? it looks like this '/admin/info' API was new
to that release

https://docs.ceph.com/en/quincy/radosgw/adminops/#info

On Fri, Jul 15, 2022 at 7:04 AM Marcus Müller  wrote:
>
> Hi all,
>
> I’ve created a test user on our radosgw to work with the API. I’ve done the 
> following:
>
> ~#radosgw-admin user create --uid=testuser --display-name="testuser"
>
> ~#radosgw-admin caps add --uid=testuser --caps={caps}
> "caps": [
> {
> "type": "amz-cache",
> "perm": "*"
> },
> {
> "type": "bilog",
> "perm": "*"
> },
> {
> "type": "buckets",
> "perm": "*"
> },
> {
> "type": "datalog",
> "perm": "*"
> },
> {
> "type": "mdlog",
> "perm": "*"
> },
> {
> "type": "metadata",
> "perm": "*"
> },
> {
> "type": "oidc-provider",
> "perm": "*"
> },
> {
> "type": "roles",
> "perm": "*"
> },
> {
> "type": "usage",
> "perm": "*"
> },
> {
> "type": "user-policy",
> "perm": "*"
> },
> {
> "type": "users",
> "perm": "*"
> },
> {
> "type": "zone",
> "perm": "*"
> }
> ],
>
>
> But for my GET request (with Authorization Header) I only get a "405 - Method 
> not Allowed" answer. This is my request url: 
> https://s3.example.de/admin/info?format=json 
> 
>
> Where is the issue here?
>
>
> Regards,
> Marcus
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw API issues

2022-07-18 Thread Casey Bodley
there's a shell script in
https://github.com/ceph/ceph/blob/main/examples/rgw_admin_curl.sh.
there are also some client libraries listed in
https://docs.ceph.com/en/pacific/radosgw/adminops/#binding-libraries
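
the admin ops API accepts the same AWS-style signatures as S3; as a
minimal sketch of the v2 ("AWS access:signature") scheme by hand, with
placeholder endpoint and credentials (not real values):

```python
# minimal AWS v2 signer for the rgw admin ops API, matching the
# canonical string visible in radosgw's debug log: method, blank
# content-md5/content-type, HTTP date, resource path
import base64
import hmac
from hashlib import sha1
from email.utils import formatdate

def sign_v2(method: str, resource: str, date: str,
            access_key: str, secret_key: str) -> str:
    """Build the Authorization header value for one request."""
    string_to_sign = f"{method}\n\n\n{date}\n{resource}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), sha1).digest()
    return f"AWS {access_key}:{base64.b64encode(digest).decode()}"

date = formatdate(usegmt=True)  # e.g. "Mon, 18 Jul 2022 11:00:00 GMT"
auth = sign_v2("GET", "/admin/info", date, "ACCESSKEY", "SECRETKEY")
# then send, e.g.:
#   curl -H "Date: <date>" -H "Authorization: <auth>" \
#        "https://s3.example.de/admin/info?format=json"
print(auth)
```

note the Date header sent must be exactly the one that was signed.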

On Mon, Jul 18, 2022 at 7:06 AM Marcus Müller  wrote:
>
> Thank you! We are running Pacific, that was my issue here.
>
> Can someone share a example of a full API request and answer with curl? I’m 
> still having issues, now getting 401 or 403 answers (but providing Auth-User 
> and Auth-Key).
>
> Regards
> Marcus
>
>
>
> Am 15.07.2022 um 15:23 schrieb Casey Bodley :
>
> are you running quincy? it looks like this '/admin/info' API was new
> to that release
>
> https://docs.ceph.com/en/quincy/radosgw/adminops/#info
>
> On Fri, Jul 15, 2022 at 7:04 AM Marcus Müller  
> wrote:
>
>
> Hi all,
>
> I’ve created a test user on our radosgw to work with the API. I’ve done the 
> following:
>
> ~#radosgw-admin user create --uid=testuser --display-name="testuser"
>
> ~#radosgw-admin caps add --uid=testuser --caps={caps}
>"caps": [
>{
>"type": "amz-cache",
>"perm": "*"
>},
>{
>"type": "bilog",
>"perm": "*"
>},
>{
>"type": "buckets",
>"perm": "*"
>},
>{
>"type": "datalog",
>"perm": "*"
>},
>{
>"type": "mdlog",
>"perm": "*"
>},
>{
>"type": "metadata",
>"perm": "*"
>},
>{
>"type": "oidc-provider",
>"perm": "*"
>},
>{
>"type": "roles",
>"perm": "*"
>},
>{
>"type": "usage",
>"perm": "*"
>},
>{
>"type": "user-policy",
>"perm": "*"
>},
>{
>"type": "users",
>"perm": "*"
>},
>{
>"type": "zone",
>"perm": "*"
>}
>],
>
>
> But for my GET request (with Authorization Header) I only get a "405 - Method 
> not Allowed" answer. This is my request url: 
> https://s3.example.de/admin/info?format=json 
>
> Where is the issue here?
>
>
> Regards,
> Marcus
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: RGW Bucket Notifications and MultiPart Uploads

2022-07-20 Thread Casey Bodley
On Wed, Jul 20, 2022 at 12:57 AM Yuval Lifshitz  wrote:
>
> yes, that would work. you would get a "404" until the object is fully
> uploaded.

just note that you won't always get 404 before multipart complete,
because multipart uploads can overwrite existing objects

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: octopus v15.2.17 QE Validation status

2022-07-25 Thread Casey Bodley
On Sun, Jul 24, 2022 at 11:33 AM Yuri Weinstein  wrote:
>
> Still seeking approvals for:
>
> rados - Travis, Ernesto, Adam
> rgw - Casey

rgw approved

> fs, kcephfs, multimds - Venky, Patrick
> ceph-ansible - Brad pls take a look
>
> Josh, upgrade/client-upgrade-nautilus-octopus failed, do we need to fix it, 
> pls take a look/approve.
>
>
> On Fri, Jul 22, 2022 at 10:06 AM Neha Ojha  wrote:
>>
>> On Thu, Jul 21, 2022 at 8:47 AM Ilya Dryomov  wrote:
>> >
>> > On Thu, Jul 21, 2022 at 4:24 PM Yuri Weinstein  wrote:
>> > >
>> > > Details of this release are summarized here:
>> > >
>> > > https://tracker.ceph.com/issues/56484
>> > > Release Notes - https://github.com/ceph/ceph/pull/47198
>> > >
>> > > Seeking approvals for:
>> > >
>> > > rados - Neha, Travis, Ernesto, Adam
>>
>> rados approved!
>> known issue https://tracker.ceph.com/issues/55854
>>
>> Thanks,
>> Neha
>>
>> >
>> > > rgw - Casey
>> > > fs, kcephfs, multimds - Venky, Patrick
>> > > rbd - Ilya, Deepika
>> > > krbd  Ilya, Deepika
>> >
>> > rbd and krbd approved.
>> >
>> > Thanks,
>> >
>> > Ilya
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rgw: considering deprecation of SSE-KMS integration with OpenStack Barbican

2022-08-05 Thread Casey Bodley
Barbican was the first key management server used for rgw's Server
Side Encryption feature. its integration is documented in
https://docs.ceph.com/en/quincy/radosgw/barbican/

we've since added SSE-KMS support for Vault and KMIP, and the SSE-S3
feature (coming soon to quincy) requires Vault

our Barbican tests stopped working about 6 months ago (see
https://tracker.ceph.com/issues/54247), and nobody is familiar enough
with the ecosystem to fix it. these tests are pinned to old versions
of keystone (17.0.0 which was ussuri?) and barbican (5.0.0 which was
pike?), but something changed and they no longer work

rgw can't maintain features that we can't test. if Barbican support is
important to the community, we'd love some assistance in
updating/fixing these tests. if there is no interest, we'll likely
deprecate it in R and remove it in S

our team feels that Vault is a more attractive target for continued
development. does Barbican offer any specific advantages? please let
us know your thoughts!

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Problem adding secondary realm to rados-gw

2022-08-22 Thread Casey Bodley
On Mon, Aug 22, 2022 at 12:37 PM Matt Dunavant
 wrote:
>
> Hello,
>
>
> I'm trying to add a secondary realm to my ceph cluster but I'm getting the 
> following error after running a 'radosgw-admin realm pull --rgw-realm=$REALM 
> --url=http://URL:80 --access-key=$KEY --secret=$SECRET':
>
>
> request failed: (5) Input/output error
>
>
> Nothing on google seems to help with the error happening as the realm pull is 
> attempted. The realm is up and running in our primary site just fine. Here are the 
> logs when adding --debug-rgw=20:
>
>
> 2022-08-22T12:21:36.049-0400 7f46b94a9c80 20 check_secure_mon_conn(): auth 
> registy supported: methods=[2] modes=[2,1]
> 2022-08-22T12:21:36.049-0400 7f46b94a9c80 20 check_secure_mon_conn(): mode 1 
> is insecure
> 2022-08-22T12:21:36.057-0400 7f46b94a9c80 20 > HTTP_DATE -> Mon Aug 22 
> 16:21:36 2022
> 2022-08-22T12:21:36.057-0400 7f46b94a9c80 10 get_canon_resource(): 
> dest=/admin/realm
> 2022-08-22T12:21:36.057-0400 7f46b94a9c80 10 generated canonical header: GET
>
>
> Mon Aug 22 16:21:36 2022
> /admin/realm
> 2022-08-22T12:21:36.057-0400 7f46b94a9c80 15 generated auth header: AWS 
> RA91T371DX79FLGEXX3F:ZaEtUSD7tG/foKjqlaX4FFV/Z60=
> 2022-08-22T12:21:36.057-0400 7f46b94a9c80 20 sending request to 
> http://URL:80/admin/realm?id=26e78bc5-714c-4993-a4bd-b07918bd223a
> 2022-08-22T12:21:36.057-0400 7f46b94a9c80 20 register_request 
> mgr=0x55e4e1b7f8d0 req_data->id=0, curl_handle=0x55e4e1b7f8b0
> 2022-08-22T12:21:36.057-0400 7f468dfdb700 20 reqs_thread_entry: start
> 2022-08-22T12:21:36.057-0400 7f468dfdb700 20 link_request 
> req_data=0x55e4e1a06330 req_data->id=0, curl_handle=0x55e4e1b7f8b0
> request failed: (5) Input/output error
> 2022-08-22T12:21:36.113-0400 7f468dfdb700 10 receive_http_header
> 2022-08-22T12:21:36.113-0400 7f468dfdb700 10 received header:HTTP/1.1 302 
> Moved Temporarily
> 2022-08-22T12:21:36.113-0400 7f468dfdb700 10 receive_http_header
> 2022-08-22T12:21:36.113-0400 7f468dfdb700 10 received header:Date: Mon, 22 
> Aug 2022 16:21:36 GMT
> 2022-08-22T12:21:36.113-0400 7f468dfdb700 10 receive_http_header
> 2022-08-22T12:21:36.113-0400 7f468dfdb700 10 received 
> header:Proxy-Connection: close
> 2022-08-22T12:21:36.113-0400 7f468dfdb700 10 receive_http_header
> 2022-08-22T12:21:36.113-0400 7f468dfdb700 10 received header:Via: 1.1 
> proxy201.convokesystems.com
> 2022-08-22T12:21:36.113-0400 7f468dfdb700 10 receive_http_header
> 2022-08-22T12:21:36.113-0400 7f468dfdb700 10 received header:Location: 
> http://10.2.22.20:15871/cgi-bin/blockpage.cgi?ws-session=4060937779

http://URL:80 appears to be going through a proxy that's trying to
redirect this request

> 2022-08-22T12:21:36.113-0400 7f468dfdb700 10 receive_http_header
> 2022-08-22T12:21:36.113-0400 7f468dfdb700 10 received header:Content-Length: 0
>
> Ceph version is 16.2.10. Any ideas?
>
> Thanks,
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
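
(to confirm the proxy theory noted inline above, a hedged sketch run
from the secondary site; URL is the same placeholder as in the command:)

```shell
# sketch: show the status line and proxy-related headers for the
# realm-pull endpoint; a 302 with Location/Via headers pointing at a
# blockpage confirms a proxy is intercepting the request
curl -sv -o /dev/null "http://URL:80/admin/realm" 2>&1 | grep -iE '^< (HTTP|location|via)'
```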

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.4 QE Validation status

2022-09-13 Thread Casey Bodley
On Tue, Sep 13, 2022 at 4:03 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/57472#note-1
> Release Notes - https://github.com/ceph/ceph/pull/48072
>
> Seeking approvals for:
>
> rados - Neha, Travis, Ernesto, Adam
> rgw - Casey

rgw approved

> fs - Venky
> orch - Adam
> rbd - Ilya, Deepika
> krbd - missing packages, Adam Kr is looking into it
> upgrade/octopus-x - missing packages, Adam Kr is looking into it
> ceph-volume - Guillaume is looking into it
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - LRC upgrade pending major suites approvals.
> RC release - pending major suites approvals.
>
> Thx
> YuriW
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Public RGW access without any LB in front?

2022-09-19 Thread Casey Bodley
hi Boris, it looks like your other questions have been covered but
i'll snipe this one:

On Fri, Sep 16, 2022 at 7:55 AM Boris Behrens  wrote:
>
> How good is it handling bad HTTP request, sent by an attacker?)

rgw relies on the boost.beast library to parse these http requests.
that library has had ongoing security reviews:
https://www.boost.org/doc/libs/1_79_0/libs/beast/doc/html/beast/quick_start/security_review_bishop_fox.html

a strict http parser can protect against a lot of known attacks. that
doesn't mean rgw won't do bad things interpreting valid requests, but
i don't think proxies help with those kinds of bugs either

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy v17.2.4 QE Validation status

2022-09-30 Thread Casey Bodley
On Thu, Sep 29, 2022 at 12:40 PM Neha Ojha  wrote:
>
>
>
> On Mon, Sep 19, 2022 at 9:38 AM Yuri Weinstein  wrote:
>>
>> Update:
>>
>> Remaining =>
>> upgrade/octopus-x - Neha pls review/approve
>
>
> Both the failures in 
> http://pulpito.front.sepia.ceph.com/yuriw-2022-09-16_16:33:35-upgrade:octopus-x-quincy-release-distro-default-smithi/
>  seem related to RGW. Casey, can you please confirm that these are not 
> regressions?

those look like the same failures tracked in
https://tracker.ceph.com/issues/55498, so not a regression

>
> Thanks,
> Neha
>
>>
>>
>> We are in process upgrading the gibba cluster and then LRC and then
>> will make the RC available for users testing
>>
>> On Tue, Sep 13, 2022 at 1:02 PM Yuri Weinstein  wrote:
>> >
>> > Details of this release are summarized here:
>> >
>> > https://tracker.ceph.com/issues/57472#note-1
>> > Release Notes - https://github.com/ceph/ceph/pull/48072
>> >
>> > Seeking approvals for:
>> >
>> > rados - Neha, Travis, Ernesto, Adam
>> > rgw - Casey
>> > fs - Venky
>> > orch - Adam
>> > rbd - Ilya, Deepika
>> > krbd - missing packages, Adam Kr is looking into it
>> > upgrade/octopus-x - missing packages, Adam Kr is looking into it
>> > ceph-volume - Guillaume is looking into it
>> >
>> > Please reply to this email with approval and/or trackers of known
>> > issues/PRs to address them.
>> >
>> > Josh, Neha - LRC upgrade pending major suites approvals.
>> > RC release - pending major suites approvals.
>> >
>> > Thx
>> > YuriW
>>
>> ___
>> Dev mailing list -- d...@ceph.io
>> To unsubscribe send an email to dev-le...@ceph.io
>>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: octopus 15.2.17 RGW daemons begin to crash regularly

2022-10-06 Thread Casey Bodley
hey Boris,

that looks a lot like https://tracker.ceph.com/issues/40018 where an
exception was thrown when trying to read a socket's remote_endpoint().
i didn't think that local_endpoint() could fail the same way, but i've
opened https://tracker.ceph.com/issues/57784 to track this and the fix
should look the same

On Thu, Oct 6, 2022 at 12:12 PM Boris Behrens  wrote:
>
> Any ideas on this?
>
> Am So., 2. Okt. 2022 um 00:44 Uhr schrieb Boris Behrens :
>
> > Hi,
> > we are experiencing that the rgw daemons crash and I don't understand why,
> > Maybe someone here can lead me to a point where I can dig further.
> >
> > {
> > "backtrace": [
> > "(()+0x43090) [0x7f143ca06090]",
> > "(gsignal()+0xcb) [0x7f143ca0600b]",
> > "(abort()+0x12b) [0x7f143c9e5859]",
> > "(()+0x9e911) [0x7f1433441911]",
> > "(()+0xaa38c) [0x7f143344d38c]",
> > "(()+0xaa3f7) [0x7f143344d3f7]",
> > "(()+0xaa6a9) [0x7f143344d6a9]",
> > "(boost::asio::detail::do_throw_error(boost::system::error_code
> > const&, char const*)+0x96) [0x7f143ce73c76]",
> > "(boost::asio::basic_socket > boost::asio::io_context::executor_type>::local_endpoint() const+0x134)
> > [0x7f143cf3d914]",
> > "(()+0x36e355) [0x7f143cf23355]",
> > "(()+0x36fa59) [0x7f143cf24a59]",
> > "(()+0x36fbbc) [0x7f143cf24bbc]",
> > "(make_fcontext()+0x2f) [0x7f143d69958f]"
> > ],
> > "ceph_version": "15.2.17",
> > "crash_id":
> > "2022-10-01T09:55:55.134763Z_dfb496e9-a789-4471-a087-2a6405aa07df",
> > "entity_name": "",
> > "os_id": "ubuntu",
> > "os_name": "Ubuntu",
> > "os_version": "20.04.4 LTS (Focal Fossa)",
> > "os_version_id": "20.04",
> > "process_name": "radosgw",
> > "stack_sig":
> > "29b20e8702f17ff69135a92fc83b17dbee9b12ba5756ad5992c808c783c134ca",
> > "timestamp": "2022-10-01T09:55:55.134763Z",
> > "utsname_hostname": "",
> > "utsname_machine": "x86_64",
> > "utsname_release": "5.4.0-100-generic",
> > "utsname_sysname": "Linux",
> > "utsname_version": "#113-Ubuntu SMP Thu Feb 3 18:43:29 UTC 2022"
> >
> > --
> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> > groüen Saal.
> >
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Rgw compression any experience?

2022-10-17 Thread Casey Bodley
On Mon, Oct 17, 2022 at 6:12 AM Szabo, Istvan (Agoda)
 wrote:
>
> Hi,
>
> I’m looking in ceph octopus in my existing cluster to have object compression.
> Any feedback/experience appreciated.
> Also I’m curious is it possible to set after cluster setup or need to setup 
> at the beginning?

it's fine to enable compression after deployment. existing objects can
still be read, but only the newly-written objects will be compressed
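
for reference, enabling it on the common default placement might look
like this (zone and placement names are the usual defaults, adjust as
needed; running radosgw instances need a restart to pick it up):

```shell
# sketch: enable zlib compression on the default placement target
radosgw-admin zone placement modify \
  --rgw-zone=default \
  --placement-id=default-placement \
  --compression=zlib

# later, compare "size" vs "size_utilized" in bucket stats to see the
# effect on newly-written objects (BUCKET is a placeholder)
radosgw-admin bucket stats --bucket=BUCKET
```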

>
> Thank you
>
> 
> This message is confidential and is for the sole use of the intended 
> recipient(s). It may also be privileged or otherwise protected by copyright 
> or other legal rules. If you have received it by mistake please let us know 
> by reply email and delete it from your system. It is prohibited to copy this 
> message or disclose its content to anyone. Any confidentiality or privilege 
> is not waived or lost by any mistaken delivery or unauthorized disclosure of 
> the message. All messages sent to and from Agoda may be monitored to ensure 
> compliance with company policies, to protect the company's interests and to 
> remove potential malware. Electronic messages may be intercepted, amended, 
> lost or deleted, or contain viruses.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Too strong permission for RGW in OpenStack

2022-10-18 Thread Casey Bodley
On Tue, Oct 18, 2022 at 4:01 AM Michal Strnad  wrote:
>
> Hi.
>
> We have ceph cluster with a lot of users who use S3 and RBD protocols.
> Now we need to give access to one user group with OpenStack, so they run
> RGW on their side, but we have to set "ceph caps" for this RGW. In the
> documentation for OpenStack is following
>
> ceph auth get-or-create client.radosgw osd 'allow rwx' mon 'allow rwx'
> -o /etc/ceph/ceph.client.radosgw.keyring
>
> which means full permission. Can we limit the permission somehow so RGW
> from OpenStack cannot reach the data of other users? Would it be enough
> if RGW has only some swift account?

the radosgw process requires those caps to read and write from the
ceph cluster. the S3 and Swift protocols have their own models for
access control, separate from these ceph caps. by default, buckets are
not shared between rgw users. you can use ACLs or S3 bucket policy to
grant access to other users
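
as a hedged example of the bucket policy route (user and bucket names
below are hypothetical placeholders):

```shell
# sketch: let rgw user "otheruser" read objects in "shared-bucket"
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam:::user/otheruser"]},
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": ["arn:aws:s3:::shared-bucket", "arn:aws:s3:::shared-bucket/*"]
  }]
}
EOF
# apply it as the bucket owner
s3cmd setpolicy policy.json s3://shared-bucket
```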

>
> I would appreciate any advice.
>
> Best regards,
> Michal Strnad
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Meeting Minutes - 2022 Oct 19

2022-10-19 Thread Casey Bodley
only one agenda item discussed today:
* 17.2.5 is almost ready, Upgrade testing has been completed in
upstream gibba and LRC clusters!

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Meeting Minutes - 2022 Oct 26

2022-10-26 Thread Casey Bodley
lab issues blocking centos container builds and teuthology testing:
* https://tracker.ceph.com/issues/57914
* delays testing for 16.2.11

upcoming events:
* Ceph Developer Monthly (APAC) next week, please add topics:
https://tracker.ceph.com/projects/ceph/wiki/CDM_02-NOV-2022
* Ceph Virtual 2022 starts next Thursday:
https://ceph.io/en/community/events/2022/ceph-virtual/

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Configuring rgw connection timeouts

2022-11-16 Thread Casey Bodley
hi Thilo, you can find a 'request_timeout_ms' frontend option
documented in https://docs.ceph.com/en/quincy/radosgw/frontends/

On Wed, Nov 16, 2022 at 12:32 PM Thilo-Alexander Ginkel
 wrote:
>
> Hi there,
>
> we are using Ceph Quincy's rgw S3 API to retrieve one file ("GET") over a
> longer time period (i.e., reads alternate with periods of no activity).
>
> Eventually the connection is closed by the rgw before the file has been
> completely read.
>
> Is there a way to increase the read (?) timeout to keep the connection
> alive despite the intermittent read inactivity?
>
> Thanks & kind regards,
> Thilo
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Configuring rgw connection timeouts

2022-11-17 Thread Casey Bodley
it doesn't look like cephadm supports extra frontend options during
deployment. but these are stored as part of the `rgw_frontends` config
option, so you can use a command like 'ceph config set' after
deployment to add request_timeout_ms
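
for example, something like this (the config section name and port
depend on your deployment, and the timeout value here is illustrative):

```shell
# sketch: check the current frontend config, then re-set it with
# request_timeout_ms appended (120s in this example)
ceph config get client.rgw rgw_frontends
ceph config set client.rgw rgw_frontends \
  "beast port=8080 request_timeout_ms=120000"
```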

On Thu, Nov 17, 2022 at 11:18 AM Thilo-Alexander Ginkel
 wrote:
>
> Hi Casey,
>
> one followup question: We are using cephadm to deploy our Ceph cluster. How 
> would we configure the timeout setting using a service spec through cephadm?
>
> Thanks,
> Thilo

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: failure resharding radosgw bucket

2022-11-23 Thread Casey Bodley
hi Jan,

On Wed, Nov 23, 2022 at 12:45 PM Jan Horstmann  wrote:
>
> Hi list,
> I am completely lost trying to reshard a radosgw bucket which fails
> with the error:
>
> process_single_logshard: Error during resharding bucket
> 68ddc61c613a4e3096ca8c349ee37f56/snapshotnfs:(2) No such file or
> directory
>
> But let me start from the beginning. We are running a ceph cluster
> version 15.2.17. Recently we received a health warning because of
> "large omap objects". So I grepped through the logs to get more
> information about the object and then mapped that to a radosgw bucket
> instance ([1]).
> I believe this should normally be handled by dynamic resharding of the
> bucket, which has already been done 23 times for this bucket ([2]).
> For recent resharding tries the radosgw is logging the error mentioned
> at the beginning. I tried to reshard manually by following the process
> in [3], but that consequently leads to the same error.
> When running the reshard with debug options ( --debug-rgw=20 --debug-
> ms=1) I can get some additional insight on where exactly the failure
> occurs:
>
> 2022-11-23T10:41:20.754+ 7f58cf9d2080  1 --
> 10.38.128.3:0/1221656497 -->
> [v2:10.38.128.6:6880/44286,v1:10.38.128.6:6881/44286] --
> osd_op(unknown.0.0:46 5.6 5:66924383:reshard::reshard.05:head
> [call rgw.reshard_get in=149b] snapc 0=[]
> ondisk+read+known_if_redirected e44374) v8 -- 0x56092dd46a10 con
> 0x56092dcfd7a0
> 2022-11-23T10:41:20.754+ 7f58bb889700  1 --
> 10.38.128.3:0/1221656497 <== osd.210 v2:10.38.128.6:6880/44286 4 
> osd_op_reply(46 reshard.05 [call] v0'0 uv1180019 ondisk = -2
> ((2) No such file or directory)) v8  162+0+0 (crc 0 0 0)
> 0x7f58b00dc020 con 0x56092dcfd7a0
>
>
> I am not sure how to interpret this and how to debug this any further.
> Of course I can provide the full output if that helps.
>
> Thanks and regards,
> Jan
>
> [1]
> root@ceph-mon1:~# grep -r 'Large omap object found. Object'
> /var/log/ceph/ceph.log
> 2022-11-15T14:47:28.900679+ osd.47 (osd.47) 10890 : cluster [WRN]
> Large omap object found. Object: 3:9660022b:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.9:head PG: 3.d4400669 (3.29) Key count:
> 336457 Size (bytes): 117560231
> 2022-11-17T04:51:43.593811+ osd.50 (osd.50) 90 : cluster [WRN]
> Large omap object found. Object: 3:0de49b75:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.10:head PG: 3.aed927b0 (3.30) Key count:
> 205346 Size (bytes): 71669614
> 2022-11-18T02:55:07.182419+ osd.47 (osd.47) 10917 : cluster [WRN]
> Large omap object found. Object: 3:9660022b:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.9:head PG: 3.d4400669 (3.29) Key count:
> 449776 Size (bytes): 157310435
> 2022-11-19T09:56:47.630679+ osd.29 (osd.29) 114 : cluster [WRN]
> Large omap object found. Object: 3:61ad76c5:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.12:head PG: 3.a36eb586 (3.6) Key count:
> 213843 Size (bytes): 74703544
> 2022-11-20T13:04:39.979349+ osd.72 (osd.72) 83 : cluster [WRN]
> Large omap object found. Object: 3:2b3227e7:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.22:head PG: 3.e7e44cd4 (3.14) Key count:
> 326676 Size (bytes): 114453145
> 2022-11-21T02:53:32.410698+ osd.50 (osd.50) 151 : cluster [WRN]
> Large omap object found. Object: 3:0de49b75:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.10:head PG: 3.aed927b0 (3.30) Key count:
> 216764 Size (bytes): 75674839
> 2022-11-22T18:04:09.757825+ osd.47 (osd.47) 10964 : cluster [WRN]
> Large omap object found. Object: 3:9660022b:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.9:head PG: 3.d4400669 (3.29) Key count:
> 449776 Size (bytes): 157310435
> 2022-11-23T00:44:55.316254+ osd.29 (osd.29) 163 : cluster [WRN]
> Large omap object found. Object: 3:61ad76c5:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.12:head PG: 3.a36eb586 (3.6) Key count:
> 213843 Size (bytes): 74703544
> 2022-11-23T09:10:07.842425+ osd.55 (osd.55) 13968 : cluster [WRN]
> Large omap object found. Object: 3:3fa378c9:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.20:head PG: 3.931ec5fc (3.3c) Key count:
> 219204 Size (bytes): 76509687
> 2022-11-23T09:11:15.516973+ osd.72 (osd.72) 112 : cluster [WRN]
> Large omap object found. Object: 3:2b3227e7:::.dir.ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19.22:head PG: 3.e7e44cd4 (3.14) Key count:
> 326676 Size (bytes): 114453145
> root@ceph-mon1:~# radosgw-admin metadata list "bucket.instance" | grep
> ee3fa6a3-4af3-4ac2-86c2-d2c374080b54.63073818.19
> "68ddc61c613a4e3096ca8c349ee37f56/snapshotnfs:ee3fa6a3-4af3-4ac2-
> 86c2-d2c374080b54.63073818.19",
>
> [2]
> root@ceph-mon1:~# radosgw-admin bucket stats --bucket
> 68ddc61c613a4e3096ca8c349ee37f56/snapshotnfs
> {
> "bucket": "snapshotnfs",
> "num_shards": 23,
> "tenant": "68ddc61c613a4e3096ca8c349ee37f56",
> "zonegroup": "bf22bf53-c135-450b-946f-97e16d1bc326",
> "plac

[ceph-users] Re: 16.2.11 pacific QE validation status

2022-12-20 Thread Casey Bodley
thanks Yuri, rgw approved based on today's results from
https://pulpito.ceph.com/yuriw-2022-12-20_15:27:49-rgw-pacific_16.2.11_RC2-distro-default-smithi/

On Mon, Dec 19, 2022 at 12:08 PM Yuri Weinstein  wrote:

> If you look at the pacific 16.2.8 QE validation history (
> https://tracker.ceph.com/issues/55356), we had pacific-x, nautilus-x, and
> pacific-p2p all green with one exception (
> https://tracker.ceph.com/issues/51652)
>
> Now we see so many failures in this point release with references to old
> issues.
>
> Is there anything we can fix to make them less "red"?
>
> Thx
> YuriW
>
> On Thu, Dec 15, 2022 at 2:56 PM Laura Flores  wrote:
>
>> I reviewed the upgrade runs:
>>
>>
>> https://pulpito.ceph.com/yuriw-2022-12-13_15:57:57-upgrade:nautilus-x-pacific_16.2.11_RC-distro-default-smithi/
>>
>> https://pulpito.ceph.com/yuriw-2022-12-13_21:47:46-upgrade:nautilus-x-pacific_16.2.11_RC-distro-default-smithi/
>>
>> https://pulpito.ceph.com/yuriw-2022-12-13_15:58:18-upgrade:octopus-x-pacific_16.2.11_RC-distro-default-smithi/
>>
>> https://pulpito.ceph.com/yuriw-2022-12-14_15:41:10-upgrade:octopus-x-pacific_16.2.11_RC-distro-default-smithi/
>>
>> Failures:
>>   1. https://tracker.ceph.com/issues/50618 -- known bug assigned to
>> Ilya; assuming it's not a big deal since it's been around for over a year
>>
>> Details:
>>   1. qemu_xfstests_luks1 failed on xfstest 168 - Ceph - RBD
>>
>>
>>
>> https://pulpito.ceph.com/yuriw-2022-12-13_15:58:24-upgrade:pacific-p2p-pacific_16.2.11_RC-distro-default-smithi/
>>
>> https://pulpito.ceph.com/yuriw-2022-12-14_15:40:37-upgrade:pacific-p2p-pacific_16.2.11_RC-distro-default-smithi/
>>
>> Failures, unrelated:
>>   1. https://tracker.ceph.com/issues/58223 -- new failure reported by me
>> 7 days ago; seems infrastructure related and not regression-related
>>   2. https://tracker.ceph.com/issues/52590 -- closed by Casey; must not
>> be of importance
>>   3. https://tracker.ceph.com/issues/58289 -- new failure raised by me
>> today; seems related to other "wait_for_recovery" failures, which are
>> generally not cause for concern since they're so infrequent.
>>   4. https://tracker.ceph.com/issues/51652 -- known bug from over a year
>> ago
>>
>> Details:
>>   1. failure on `sudo fuser -v /var/lib/dpkg/lock-frontend` -
>> Infrastructure
>>   2. "[ FAILED ] CmpOmap.cmp_vals_u64_invalid_default" in
>> upgrade:pacific-p2p-pacific - Ceph - RGW
>>   3. "AssertionError: wait_for_recovery: failed before timeout expired"
>> from down pg in pacific-p2p-pacific - Ceph - RADOS
>>   4. heartbeat timeouts on filestore OSDs while deleting objects in
>> upgrade:pacific-p2p-pacific - Ceph - RADOS
>>
>> On Thu, Dec 15, 2022 at 4:34 PM Brad Hubbard  wrote:
>>
>>> On Fri, Dec 16, 2022 at 3:15 AM Yuri Weinstein 
>>> wrote:
>>> >
>>> > Details of this release are summarized here:
>>> >
>>> > https://tracker.ceph.com/issues/58257#note-1
>>> > Release Notes - TBD
>>> >
>>> > Seeking approvals for:
>>> >
>>> > rados - Neha (https://github.com/ceph/ceph/pull/49431 is still being
>>> > tested and will be merged soon)
>>> > rook - Sébastien Han
>>> > cephadm - Adam
>>> > dashboard - Ernesto
>>> > rgw - Casey (rgw will be rerun on the latest SHA1)
>>> > rbd - Ilya, Deepika
>>> > krbd - Ilya, Deepika
>>> > fs - Venky, Patrick
>>> > upgrade/nautilus-x (pacific) - Neha, Laura
>>> > upgrade/octopus-x (pacific) - Neha, Laura
>>> > upgrade/pacific-p2p - Neha, Laura
>>> > powercycle - Brad
>>>
>>> The failure here is due to fallout from the recent lab issues and was
>>> fixed in main by https://github.com/ceph/ceph/pull/49021 I'm waiting
>>> to see if there are plans to backport this to pacific and quincy since
>>> that will be needed.
>>>
>>> > ceph-volume - Guillaume, Adam K
>>> >
>>> > Thx
>>> > YuriW
>>> >
>>> > ___
>>> > Dev mailing list -- d...@ceph.io
>>> > To unsubscribe send an email to dev-le...@ceph.io
>>>
>>>
>>>
>>> --
>>> Cheers,
>>> Brad
>>>
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>
>>
>> --
>>
>> Laura Flores
>>
>> She/Her/Hers
>>
>> Software Engineer, Ceph Storage
>>
>> Red Hat Inc. 
>>
>> Chicago, IL
>>
>> lflo...@redhat.com
>> M: +17087388804
>>
>


[ceph-users] Re: 16.2.11 pacific QE validation status

2023-01-20 Thread Casey Bodley
On Fri, Jan 20, 2023 at 11:39 AM Yuri Weinstein  wrote:
>
> The overall progress on this release is looking much better and if we
> can approve it we can plan to publish it early next week.
>
> Still seeking approvals
>
> rados - Neha, Laura
> rook - Sébastien Han
> cephadm - Adam
> dashboard - Ernesto
> rgw - Casey

+1 rgw still approved

> rbd - Ilya (full rbd run in progress now)
> krbd - Ilya
> fs - Venky, Patrick
> upgrade/nautilus-x (pacific) - passed thx Adam Kraitman!
> upgrade/octopus-x (pacific) - almost passed, still running 1 job
> upgrade/pacific-p2p - Neha (same as in 16.2.8)
> powercycle - Brad (see new SELinux denials)
>
> On Tue, Jan 17, 2023 at 10:45 AM Yuri Weinstein  wrote:
> >
> > OK I will rerun failed jobs filtering rhel in
> >
> > Thx!
> >
> > On Tue, Jan 17, 2023 at 10:43 AM Adam Kraitman  wrote:
> > >
> > > Hey the satellite issue was fixed
> > >
> > > Thanks
> > >
> > > On Tue, Jan 17, 2023 at 7:43 PM Laura Flores  wrote:
> > >>
> > >> This was my summary of rados failures. There was nothing new or amiss,
> > >> although it is important to note that runs were done with filtering out
> > >> rhel 8.
> > >>
> > >> I will leave it to Neha for final approval.
> > >>
> > >> Failures:
> > >> 1. https://tracker.ceph.com/issues/58258
> > >> 2. https://tracker.ceph.com/issues/58146
> > >> 3. https://tracker.ceph.com/issues/58458
> > >> 4. https://tracker.ceph.com/issues/57303
> > >> 5. https://tracker.ceph.com/issues/54071
> > >>
> > >> Details:
> > >> 1. rook: kubelet fails from connection refused - Ceph - Orchestrator
> > >> 2. test_cephadm.sh: Error: Error initializing source docker://
> > >> quay.ceph.io/ceph-ci/ceph:master - Ceph - Orchestrator
> > >> 3. qa/workunits/post-file.sh: postf...@drop.ceph.com: Permission 
> > >> denied
> > >> - Ceph
> > >> 4. rados/cephadm: Failed to fetch package version from
> > >> https://shaman.ceph.com/api/search/?status=ready&project=ceph&flavor=default&distros=ubuntu%2F22.04%2Fx86_64&sha1=b34ca7d1c2becd6090874ccda56ef4cd8dc64bf7
> > >> - Ceph - Orchestrator
> > >> 5. rados/cephadm/osds: Invalid command: missing required parameter
> > >> hostname() - Ceph - Orchestrator
> > >>
> > >> On Tue, Jan 17, 2023 at 9:48 AM Yuri Weinstein  
> > >> wrote:
> > >>
> > >> > Please see the test results on the rebased RC 6.6 in this comment:
> > >> >
> > >> > https://tracker.ceph.com/issues/58257#note-2
> > >> >
> > >> > We're still having infrastructure issues making testing difficult.
> > >> > Therefore all reruns were done excluding the rhel 8 distro
> > >> > ('--filter-out rhel_8')
> > >> >
> > >> > Also, the upgrades failed and Adam is looking into this.
> > >> >
> > >> > Seeking new approvals
> > >> >
> > >> > rados - Neha, Laura
> > >> > rook - Sébastien Han
> > >> > cephadm - Adam
> > >> > dashboard - Ernesto
> > >> > rgw - Casey
> > >> > rbd - Ilya
> > >> > krbd - Ilya
> > >> > fs - Venky, Patrick
> > >> > upgrade/nautilus-x (pacific) - Adam Kraitman
> > >> > upgrade/octopus-x (pacific) - Adam Kraitman
> > >> > upgrade/pacific-p2p - Neha - Adam Kraitman
> > >> > powercycle - Brad
> > >> >
> > >> > Thx
> > >> >
> > >> > On Fri, Jan 6, 2023 at 8:37 AM Yuri Weinstein  
> > >> > wrote:
> > >> > >
> > >> > > Happy New Year all!
> > >> > >
> > >> > > This release remains to be in "progress"/"on hold" status as we are
> > >> > > sorting all infrastructure-related issues.
> > >> > >
> > >> > > Unless I hear objections, I suggest doing a full rebase/retest QE
> > >> > > cycle (adding PRs merged lately) since it's taking much longer than
> > >> > > anticipated when sepia is back online.
> > >> > >
> > >> > > Objections?
> > >> > >
> > >> > > Thx
> > >> > > YuriW
> > >> > >
> > >> > > On Thu, Dec 15, 2022 at 9:14 AM Yuri Weinstein 
> > >> > wrote:
> > >> > > >
> > >> > > > Details of this release are summarized here:
> > >> > > >
> > >> > > > https://tracker.ceph.com/issues/58257#note-1
> > >> > > > Release Notes - TBD
> > >> > > >
> > >> > > > Seeking approvals for:
> > >> > > >
> > >> > > > rados - Neha (https://github.com/ceph/ceph/pull/49431 is still 
> > >> > > > being
> > >> > > > tested and will be merged soon)
> > >> > > > rook - Sébastien Han
> > >> > > > cephadm - Adam
> > >> > > > dashboard - Ernesto
> > >> > > > rgw - Casey (rgw will be rerun on the latest SHA1)
> > >> > > > rbd - Ilya, Deepika
> > >> > > > krbd - Ilya, Deepika
> > >> > > > fs - Venky, Patrick
> > >> > > > upgrade/nautilus-x (pacific) - Neha, Laura
> > >> > > > upgrade/octopus-x (pacific) - Neha, Laura
> > >> > > > upgrade/pacific-p2p - Neha, Laura
> > >> > > > powercycle - Brad
> > >> > > > ceph-volume - Guillaume, Adam K
> > >> > > >
> > >> > > > Thx
> > >> > > > YuriW
> > >> >
> > >>
> > >>
> > >> --
> > >>
> > >> Laura Flores
> > >>
> > >> She/Her/Hers
> > >>
> > >> Software Engineer,

[ceph-users] CLT meeting summary 2023-02-01

2023-02-01 Thread Casey Bodley
distro testing for reef
* https://github.com/ceph/ceph/pull/49443 adds centos9 and ubuntu22 to
supported distros
* centos9 blocked by teuthology bug https://tracker.ceph.com/issues/58491
  - lsb_release command no longer exists, use /etc/os-release instead
  - ceph stopped depending on lsb_release in 2021 with
https://github.com/ceph/ceph/pull/42770
* ubuntu22 not blocked by teuthology, but the new python version
breaks most of the rgw tests
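For context, /etc/os-release is a plain KEY=VALUE file, so replacing lsb_release mostly comes down to parsing that file. A minimal, hypothetical sketch of such a parser (not the actual teuthology change):

```python
def parse_os_release(text):
    """Parse /etc/os-release-style KEY=VALUE content into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')  # values may be quoted
    return info

sample = 'NAME="CentOS Stream"\nVERSION_ID="9"\nID="centos"\n'
print(parse_os_release(sample)["VERSION_ID"])  # -> 9
```

In practice one would read the text from /etc/os-release on the remote host instead of a literal string.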

can we drop centos8 or ubuntu20 support for reef?
* we usually support the latest centos and two ubuntu LTSs
* users need an upgrade path that doesn't require OS and ceph upgrade
at the same time
* we might be able to drop centos8 support for Reef by adding centos9
support to Quincy
* python versioning issues make longer-term support of older distros
problematic. related work:
  - https://github.com/ceph/ceph/pull/41979
  - https://github.com/ceph/ceph/pull/47501

ondisk format changes in minor releases
* https://github.com/ceph/ceph/pull/48915 introduced some BlueFS log
changes in 16.2.11 which makes it incompatible with previous Pacific
releases. Hence no downgrade is permitted any more.
  - doc text tracked in https://tracker.ceph.com/issues/58625
* how do we prevent these issues in the future?
  - better testing of mixed-version rgw/mds/mgr/etc

infrastructure update
* a planned network outage yesterday still affecting LRC


[ceph-users] Re: Migrate a bucket from replicated pool to ec pool

2023-02-11 Thread Casey Bodley
hi Boris,

On Sat, Feb 11, 2023 at 7:07 AM Boris Behrens  wrote:
>
> Hi,
> we use rgw as our backup storage, and it basically holds only compressed
> rbd snapshots.
> I would love to move these out of the replicated into a ec pool.
>
> I've read that I can set a default placement target for a user (
> https://docs.ceph.com/en/octopus/radosgw/placement/). What does happen to
> the existing user data?

changes to the user's default placement target/storage class don't
apply to existing buckets, only newly-created ones. a bucket's default
placement target/storage class can't be changed after creation

>
> How do I move the existing data to the new pool?

you might add the EC pool as a new storage class in the existing
placement target, and use lifecycle transitions to move the objects.
but the bucket's default storage class would still be replicated, so
new uploads would go there unless the client adds a
x-amz-storage-class header to override it. if you want to change those
defaults, you'd need to create a new bucket and copy the objects over
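To make the lifecycle route concrete, here is a rough sketch of the transition rule involved; the storage-class name "CHEAP_EC" and the bucket name are made up for illustration, and the real class name comes from the placement target you configure. With boto3 it would be applied via put_bucket_lifecycle_configuration:

```python
# Hypothetical lifecycle config: transition every object to the
# assumed "CHEAP_EC" storage class as soon as lifecycle processing runs.
lifecycle = {
    "Rules": [
        {
            "ID": "move-to-ec",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix matches all objects
            "Transitions": [
                {"Days": 0, "StorageClass": "CHEAP_EC"},
            ],
        }
    ]
}

# With boto3, assuming an "s3" client pointed at the RGW endpoint:
# s3.put_bucket_lifecycle_configuration(
#     Bucket="backups", LifecycleConfiguration=lifecycle)
print(lifecycle["Rules"][0]["Transitions"][0]["StorageClass"])  # -> CHEAP_EC
```

New uploads would still land in the bucket's default storage class unless the client sends x-amz-storage-class, as noted above.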

> Does it somehow interfere with ongoing data upload (it is one internal
> user, with 800 buckets which constantly get new data and old data removed)?

lifecycle transitions would be transparent to the user, but migrating
to new buckets would not

>
> Cheers
>  Boris
>
> ps: Can't wait to see some of you at the cephalocon :)
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> großen Saal.


[ceph-users] Re: Migrate a bucket from replicated pool to ec pool

2023-02-13 Thread Casey Bodley
On Mon, Feb 13, 2023 at 4:31 AM Boris Behrens  wrote:
>
> Hi Casey,
>
>> changes to the user's default placement target/storage class don't
>> apply to existing buckets, only newly-created ones. a bucket's default
>> placement target/storage class can't be changed after creation
>
>
> so I can easily update the placement rules for this user and can migrate 
> existing buckets one at a time. Very cool. Thanks
>
>>
>> you might add the EC pool as a new storage class in the existing
>> placement target, and use lifecycle transitions to move the objects.
>> but the bucket's default storage class would still be replicated, so
>> new uploads would go there unless the client adds a
>> x-amz-storage-class header to override it. if you want to change those
>> defaults, you'd need to create a new bucket and copy the objects over
>
>
> Can you link me to the documentation? It might be the Monday, but I do not
> totally understand it.

https://docs.ceph.com/en/octopus/radosgw/placement/#adding-a-storage-class
should cover the addition of a new storage class for your EC pool

>
> Do you know how much more CPU/RAM EC takes, and when (putting, reading, 
> deleting objects, recovering OSD failure)?

i don't have any data on that myself. maybe others on the list can share theirs?

>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im 
> großen Saal.


[ceph-users] Re: [RGW - octopus] too many omapkeys on versioned bucket

2023-02-13 Thread Casey Bodley
On Mon, Feb 13, 2023 at 8:41 AM Boris Behrens  wrote:
>
> I've tried it the other way around and let cat give out all escaped chars
> and the did the grep:
>
> # cat -A omapkeys_list | grep -aFn '/'
> 9844:/$
> 9845:/^@v913^@$
> 88010:M-^@1000_/^@$
> 128981:M-^@1001_/$
>
> Did anyone ever saw something like this?
>
> Am Mo., 13. Feb. 2023 um 14:31 Uhr schrieb Boris Behrens :
>
> > So here is some more weirdness:
> > I've piped a list of all omapkeys into a file: (redacted customer data
> > with placeholders in <>)
> >
> > # grep -aFn '//' omapkeys_list
> > 9844://
> > 9845://v913
> > 88010:�1000_//
> > 128981:�1001_//
> >
> > # grep -aFn '/'
> > omapkeys_list
> > 
> >
> > # vim omapkeys_list +88010 (copy pasted from terminal)
> > <80>1000_//^@
> >
> > Any idea what this is?
> >
> > Am Mo., 13. Feb. 2023 um 13:57 Uhr schrieb Boris Behrens :
> >
> >> Hi,
> >> I have one bucket that showed up with a large omap warning, but the
> >> amount of objects in the bucket does not align with the amount of omap
> >> keys. The bucket is sharded to get rid of the "large omapkeys" warning.
> >>
> >> I've counted all the omapkeys of one bucket and it came up with 33.383.622
> >> (rados -p INDEXPOOL listomapkeys INDEXOBJECT | wc -l)
> >> I've checked the amount of actual rados objects and it came up with
> >> 17.095.877
> >> (rados -p DATAPOOL ls | grep BUCKETMARKER | wc -l)
> >> I've checked the bucket index and it came up with 16.738.482
> >> (radosgw-admin bi list --bucket BUCKET | grep -F '"idx":' | wc -l)
> >>
> >> I have tried to fix it with
> >> radosgw-admin bucket check --check-objects --fix --bucket BUCKET
> >> but this did not change anything.
> >>
> >> Is this a known bug or might there be something else going on. How can I
> >> investigate further?
> >>
> >> Cheers
> >>  Boris
> >> --
> >> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> >> großen Saal.
> >>
> >
> >
> > --
> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> > großen Saal.
> >
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> großen Saal.

hi Boris,

the bucket index is more complicated for versioned buckets than normal
ones. i wrote a high-level summary of this in
https://docs.ceph.com/en/latest/dev/radosgw/bucket_index/#s3-object-versioning

each object version may have additional keys starting with 1000_ and
1001_. the keys starting with 1000_ are sorted by time (most recent
version first), and the 1001_ keys correspond to the ‘olh' entry. the
output of `radosgw-admin bi list` should distinguish between these
index entry types using the names "plain", "instance", and "olh"

it's hard to tell from your email whether there's anything wrong, but
i hope this helps with your debugging
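A rough illustration of those key prefixes, matching the escaped keys visible in the `cat -A` output above (where `M-^@` is the 0x80 byte). This is only an approximation of the real index-key encoding:

```python
def classify_index_key(raw: bytes) -> str:
    """Approximate the entry type of a versioned-bucket index omap key.

    Versioned entries are escaped with a leading 0x80 byte followed by
    "1000_" (time-ordered instance entries) or "1001_" (olh entries);
    anything else is treated as a plain entry. `radosgw-admin bi list`
    reports these types as "instance", "olh" and "plain".
    """
    if raw.startswith(b"\x80" + b"1000_"):
        return "instance"
    if raw.startswith(b"\x80" + b"1001_"):
        return "olh"
    return "plain"

# Keys shaped like the ones in the omapkeys listing above:
print(classify_index_key(b"\x80" + b"1000_/\x00"))  # -> instance
print(classify_index_key(b"\x80" + b"1001_/"))      # -> olh
print(classify_index_key(b"some-object-key"))       # -> plain
```

So an omap key count well above the object count is expected for a versioned bucket, since each version can carry plain, instance and olh entries.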


[ceph-users] Re: OpenSSL in librados

2023-02-26 Thread Casey Bodley
On Sun, Feb 26, 2023 at 8:20 AM Ilya Dryomov  wrote:
>
> On Sun, Feb 26, 2023 at 2:15 PM Patrick Schlangen  
> wrote:
> >
> > Hi Ilya,
> >
> > > Am 26.02.2023 um 14:05 schrieb Ilya Dryomov :
> > >
> > > Isn't OpenSSL 1.0 long out of support?  I'm not sure if extending
> > > librados API to support a workaround for something that went EOL over
> > > three years ago is worth it.
> >
> > fair point. However, as long as ceph still supports compiling against 
> > OpenSSL 1.0 and has special code paths to initialize OpenSSL for versions 
> > <= 1.0, I think this should be fixed. The other option would be to remove 
> > OpenSSL 1.0 support completely.
> >
> > What do you think?
>
> Removing OpenSSL 1.0 support is fine with me but it would need a wider
> discussion.  I'm CCing the development list.
>
> Thanks,
>
> Ilya
>

if librados still works with openssl 1.0 when you're not using it
elsewhere in the process, i don't see a compelling reason to break
that. maybe just add a #warning about it to librados.h?


[ceph-users] Re: CompleteMultipartUploadResult has empty ETag response

2023-02-28 Thread Casey Bodley
On Tue, Feb 28, 2023 at 8:19 AM Lars Dunemark  wrote:
>
> Hi,
>
> I notice that CompleteMultipartUploadResult does return an empty ETag
> field when completing an multipart upload in v17.2.3.
>
> I haven't been able to verify in which version this changed, and I can't
> find anything in the changelog indicating it was fixed in a newer version.
>
> The response looks like:
>
> <CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
>   <Location>s3.myceph.com/test-bucket/test.file</Location>
>   <Bucket>test-bucket</Bucket>
>   <Key>test.file</Key>
>   <ETag></ETag>
> </CompleteMultipartUploadResult>
>
> I found an old issue with the same problem that was closed around 9 years
> ago, so I guess this has been fixed before.
> https://tracker.ceph.com/issues/6830
>
> It looks like my account to the tracker is still not activated so I
> can't create or comment on the issue.

thanks Lars, i've opened https://tracker.ceph.com/issues/58879 to
track the regression

>
> Best regards,
> Lars Dunemark
>


[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-22 Thread Casey Bodley
On Tue, Mar 21, 2023 at 4:06 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/59070#note-1
> Release Notes - TBD
>
> The reruns were in the queue for 4 days because of some slowness issues.
> The core team (Neha, Radek, Laura, and others) are trying to narrow
> down the root cause.
>
> Seeking approvals/reviews for:
>
> rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
> and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
> the core)
> rgw - Casey

there were some java_s3test failures related to
https://tracker.ceph.com/issues/58554. i've added the fix to
https://github.com/ceph/java_s3tests/commits/ceph-quincy, so a rerun
should resolve those failures
there were also some 'Failed to fetch package version' failures in the
rerun that warranted another rerun anyway

there's also an urgent priority bug fix in
https://github.com/ceph/ceph/pull/50625 that i'd really like to add to
this release; sorry for the late notice

> fs - Venky (the fs suite has an unusually high amount of failed jobs,
> any reason to suspect it in the observed slowness?)
> orch - Adam King
> rbd - Ilya
> krbd - Ilya
> upgrade/octopus-x - Laura is looking into failures
> upgrade/pacific-x - Laura is looking into failures
> upgrade/quincy-p2p - Laura is looking into failures
> client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
> is looking into it
> powercycle - Brad
> ceph-volume - needs a rerun on merged
> https://github.com/ceph/ceph-ansible/pull/7409
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Also, share any findings or hypotheses about the slowness in the
> execution of the suite.
>
> Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> RC release - pending major suites approvals.
>
> Thx
> YuriW
>


[ceph-users] Re: Ceph Mgr/Dashboard Python depedencies: a new approach

2023-03-23 Thread Casey Bodley
hi Ernesto and lists,

> [1] https://github.com/ceph/ceph/pull/47501

are we planning to backport this to quincy so we can support centos 9
there? enabling that upgrade path on centos 9 was one of the
conditions for dropping centos 8 support in reef, which i'm still keen
to do

if not, can we find another resolution to
https://tracker.ceph.com/issues/58832? as i understand it, all of
those python packages exist in centos 8. do we know why they were
dropped for centos 9? have we looked into making those available in
epel? (cc Ken and Kaleb)

On Fri, Sep 2, 2022 at 12:01 PM Ernesto Puerta  wrote:
>
> Hi Kevin,
>
>>
>> Isn't this one of the reasons containers were pushed, so that the packaging 
>> isn't as big a deal?
>
>
> Yes, but the Ceph community has a strong commitment to provide distro 
> packages for those users who are not interested in moving to containers.
>
>> Is it the continued push to support lots of distros without using containers 
>> that is the problem?
>
>
> If not a problem, it definitely makes it more challenging. Compiled 
> components often sort this out by statically linking deps whose packages are 
> not widely available in distros. The approach we're proposing here would be 
> the closest equivalent to static linking for interpreted code (bundling).
>
> Thanks for sharing your questions!
>
> Kind regards,
> Ernesto


[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-23 Thread Casey Bodley
On Wed, Mar 22, 2023 at 9:27 AM Casey Bodley  wrote:
>
> On Tue, Mar 21, 2023 at 4:06 PM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/59070#note-1
> > Release Notes - TBD
> >
> > The reruns were in the queue for 4 days because of some slowness issues.
> > The core team (Neha, Radek, Laura, and others) are trying to narrow
> > down the root cause.
> >
> > Seeking approvals/reviews for:
> >
> > rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
> > and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
> > the core)
> > rgw - Casey
>
> there were some java_s3test failures related to
> https://tracker.ceph.com/issues/58554. i've added the fix to
> https://github.com/ceph/java_s3tests/commits/ceph-quincy, so a rerun
> should resolve those failures
> there were also some 'Failed to fetch package version' failures in the
> rerun that warranted another rerun anyway
>
> there's also an urgent priority bug fix in
> https://github.com/ceph/ceph/pull/50625 that i'd really like to add to
> this release; sorry for the late notice

this fix merged, so rgw is now approved. thanks Yuri

>
> > fs - Venky (the fs suite has an unusually high amount of failed jobs,
> > any reason to suspect it in the observed slowness?)
> > orch - Adam King
> > rbd - Ilya
> > krbd - Ilya
> > upgrade/octopus-x - Laura is looking into failures
> > upgrade/pacific-x - Laura is looking into failures
> > upgrade/quincy-p2p - Laura is looking into failures
> > client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
> > is looking into it
> > powercycle - Brad
> > ceph-volume - needs a rerun on merged
> > https://github.com/ceph/ceph-ansible/pull/7409
> >
> > Please reply to this email with approval and/or trackers of known
> > issues/PRs to address them.
> >
> > Also, share any findings or hypotheses about the slowness in the
> > execution of the suite.
> >
> > Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> > RC release - pending major suites approvals.
> >
> > Thx
> > YuriW
> >


[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-03-27 Thread Casey Bodley
On Fri, Mar 24, 2023 at 3:46 PM Yuri Weinstein  wrote:
>
> Details of this release are updated here:
>
> https://tracker.ceph.com/issues/59070#note-1
> Release Notes - TBD
>
> The slowness we experienced seems to have resolved itself.
> Neha, Radek, and Laura please provide any findings if you have them.
>
> Seeking approvals/reviews for:
>
> rados - Neha, Radek, Travis, Ernesto, Adam King (rerun on Build 2 with
> PRs merged on top of quincy-release)
> rgw - Casey (rerun on Build 2 with PRs merged on top of quincy-release)

rgw approved

> fs - Venky
>
> upgrade/octopus-x - Neha, Laura (package issue Adam Kraitman any updates?)
> upgrade/pacific-x - Neha, Laura, Ilya see 
> https://tracker.ceph.com/issues/58914
> upgrade/quincy-p2p - Neha, Laura
> client-upgrade-octopus-quincy-quincy - Neha, Laura (package issue Adam
> Kraitman any updates?)
> powercycle - Brad
>
> Please reply to this email with approval and/or trackers of known
> issues/PRs to address them.
>
> Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> RC release - pending major suites approvals.
>
> On Tue, Mar 21, 2023 at 1:04 PM Yuri Weinstein  wrote:
> >
> > Details of this release are summarized here:
> >
> > https://tracker.ceph.com/issues/59070#note-1
> > Release Notes - TBD
> >
> > The reruns were in the queue for 4 days because of some slowness issues.
> > The core team (Neha, Radek, Laura, and others) are trying to narrow
> > down the root cause.
> >
> > Seeking approvals/reviews for:
> >
> > rados - Neha, Radek, Travis, Ernesto, Adam King (we still have to test
> > and merge at least one PR https://github.com/ceph/ceph/pull/50575 for
> > the core)
> > rgw - Casey
> > fs - Venky (the fs suite has an unusually high amount of failed jobs,
> > any reason to suspect it in the observed slowness?)
> > orch - Adam King
> > rbd - Ilya
> > krbd - Ilya
> > upgrade/octopus-x - Laura is looking into failures
> > upgrade/pacific-x - Laura is looking into failures
> > upgrade/quincy-p2p - Laura is looking into failures
> > client-upgrade-octopus-quincy-quincy - missing packages, Adam Kraitman
> > is looking into it
> > powercycle - Brad
> > ceph-volume - needs a rerun on merged
> > https://github.com/ceph/ceph-ansible/pull/7409
> >
> > Please reply to this email with approval and/or trackers of known
> > issues/PRs to address them.
> >
> > >> > > Also, share any findings or hypotheses about the slowness in the
> > execution of the suite.
> >
> > Josh, Neha - gibba and LRC upgrades pending major suites approvals.
> > RC release - pending major suites approvals.
> >
> > Thx
> > YuriW


[ceph-users] Re: Ceph Mgr/Dashboard Python depedencies: a new approach

2023-03-27 Thread Casey Bodley
i would hope that packaging for epel9 would be relatively easy, given
that the epel8 packages already exist. as a first step, we'd need to
build a full list of the missing packages. the tracker issue only
complains about python3-asyncssh python3-pecan and python3-routes, but
some of their dependencies may be missing too
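As a starting point for building that list, a small sketch that checks which import names resolve in a given environment (the import names are guesses based on the package names in the tracker issue):

```python
import importlib.util

def missing_modules(names):
    """Return the subset of module names that cannot be found locally."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Assumed import names behind python3-asyncssh, python3-pecan, python3-routes:
candidates = ["asyncssh", "pecan", "routes"]
print(missing_modules(candidates))  # names missing from this environment
```

Running this on a centos 9 box with the ceph-mgr dependencies installed would show which leaves of the dependency tree still need epel packaging.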

On Mon, Mar 27, 2023 at 3:06 PM Ken Dreyer  wrote:
>
> I hope we don't backport such a big change to Quincy. That will have a
> large impact on how we build in restricted environments with no
> internet access.
>
> We could get the missing packages into EPEL.
>
> - Ken
>
> On Fri, Mar 24, 2023 at 7:32 AM Ernesto Puerta  wrote:
> >
> > Hi Casey,
> >
> > The original idea was to leave this to Reef alone, but given that the 
> > CentOS 9 Quincy release is also blocked by missing Python packages, I think 
> > that it'd make sense to backport it.
> >
> > I'm coordinating with Pere (in CC) to expedite this. We may need help to 
> > troubleshoot Shaman/rpmbuild issues. Who would be the best one to help with 
> > that?
> >
> > Regarding your last question, I don't know who's the maintainer of those 
> > packages in EPEL. There's this BZ (https://bugzilla.redhat.com/2166620) 
> > requesting that specific package, but that's only one out of the dozen of 
> > missing packages (plus transitive dependencies)...
> >
> > Kind Regards,
> > Ernesto
> >
> >
> > On Thu, Mar 23, 2023 at 2:19 PM Casey Bodley  wrote:
> >>
> >> hi Ernesto and lists,
> >>
> >> > [1] https://github.com/ceph/ceph/pull/47501
> >>
> >> are we planning to backport this to quincy so we can support centos 9
> >> there? enabling that upgrade path on centos 9 was one of the
> >> conditions for dropping centos 8 support in reef, which i'm still keen
> >> to do
> >>
> >> if not, can we find another resolution to
> >> https://tracker.ceph.com/issues/58832? as i understand it, all of
> >> those python packages exist in centos 8. do we know why they were
> >> dropped for centos 9? have we looked into making those available in
> >> epel? (cc Ken and Kaleb)
> >>
> >> On Fri, Sep 2, 2022 at 12:01 PM Ernesto Puerta  wrote:
> >> >
> >> > Hi Kevin,
> >> >
> >> >>
> >> >> Isn't this one of the reasons containers were pushed, so that the 
> >> >> packaging isn't as big a deal?
> >> >
> >> >
> >> > Yes, but the Ceph community has a strong commitment to provide distro 
> >> > packages for those users who are not interested in moving to containers.
> >> >
> >> >> Is it the continued push to support lots of distros without using 
> >> >> containers that is the problem?
> >> >
> >> >
> >> > If not a problem, it definitely makes it more challenging. Compiled 
> >> > components often sort this out by statically linking deps whose packages 
> >> > are not widely available in distros. The approach we're proposing here 
> >> > would be the closest equivalent to static linking for interpreted code 
> >> > (bundling).
> >> >
> >> > Thanks for sharing your questions!
> >> >
> >> > Kind regards,
> >> > Ernesto
> >> > ___
> >> > Dev mailing list -- d...@ceph.io
> >> > To unsubscribe send an email to dev-le...@ceph.io
> >>
>


[ceph-users] Re: RGW don't use .rgw.root multisite configuration

2023-04-11 Thread Casey Bodley
there's a rgw_period_root_pool option for the period objects too. but
it shouldn't be necessary to override any of these
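for completeness, these are the defaults in question (illustration only; all four already point at .rgw.root, so there's normally no reason to set any of them):

```ini
# default pools for the multisite config objects; overriding these just
# moves the realm/zonegroup/zone/period objects to a different pool
rgw_realm_root_pool     = .rgw.root
rgw_zonegroup_root_pool = .rgw.root
rgw_zone_root_pool      = .rgw.root
rgw_period_root_pool    = .rgw.root
```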

On Sun, Apr 9, 2023 at 11:26 PM  wrote:
>
> Up :)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


[ceph-users] Re: ceph 17.2.6 and iam roles (pr#48030)

2023-04-11 Thread Casey Bodley
On Tue, Apr 11, 2023 at 3:19 PM Christopher Durham  wrote:
>
>
> Hi,
> I see that this PR: https://github.com/ceph/ceph/pull/48030
> made it into ceph 17.2.6, as per the change log  at: 
> https://docs.ceph.com/en/latest/releases/quincy/  That's great.
> But my scenario is as follows:
> I have two clusters set up as multisite. Because of  the lack of replication 
> for IAM roles, we have set things up so that roles on the primary 'manually' 
> get replicated to the secondary site via a python script. Thus, if I create a 
> role on the primary, add/delete users or buckets from said role, the role, 
> including the AssumeRolePolicyDocument and policies, gets pushed to the 
> replicated site. This has served us well for three years.
> With the advent of this fix, what should I do before I upgrade to 17.2.6 
> (currently on 17.2.5, rocky 8)
>
> I know that in my situation, roles of the same name have different RoleIDs on 
> the two sites. What should I do before I upgrade? Possibilities that *could* 
> happen if I don't rectify things as we upgrade:
> 1. The different RoleIDs lead to two roles of the same name on the replicated 
> site, perhaps with the system unable to address/look at/modify either
> 2. Roles just don't get replicated to the second site

no replication would happen until the metadata changes again on the
primary zone. once that gets triggered, the role metadata would
probably fail to sync due to the name conflicts

>
> or other similar situations, all of which I want to avoid.
> Perhaps the safest thing to do is to remove all roles on the secondary site, 
> upgrade, and then force a replication of roles (How would I *force* that for 
> IAM roles if it is the correct answer?)

this removal will probably be necessary to avoid those conflicts. once
that's done, you can force a metadata full sync on the secondary zone
by running 'radosgw-admin metadata sync init' there, then restarting
its gateways. this will have to resync all of the bucket and user
metadata as well

> Here is the original bug report:
>
> https://tracker.ceph.com/issues/57364
> Thanks!
> -Chris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph 17.2.6 and iam roles (pr#48030)

2023-04-11 Thread Casey Bodley
On Tue, Apr 11, 2023 at 3:53 PM Casey Bodley  wrote:
>
> On Tue, Apr 11, 2023 at 3:19 PM Christopher Durham  wrote:
> >
> >
> > Hi,
> > I see that this PR: https://github.com/ceph/ceph/pull/48030
> > made it into ceph 17.2.6, as per the change log  at: 
> > https://docs.ceph.com/en/latest/releases/quincy/  That's great.
> > But my scenario is as follows:
> > I have two clusters set up as multisite. Because of  the lack of 
> > replication for IAM roles, we have set things up so that roles on the 
> > primary 'manually' get replicated to the secondary site via a python 
> > script. Thus, if I create a role on the primary, add/delete users or 
> > buckets from said role, the role, including the AssumeRolePolicyDocument 
> > and policies, gets pushed to the replicated site. This has served us well 
> > for three years.
> > With the advent of this fix, what should I do before I upgrade to 17.2.6 
> > (currently on 17.2.5, rocky 8)
> >
> > I know that in my situation, roles of the same name have different RoleIDs 
> > on the two sites. What should I do before I upgrade? Possibilities that 
> > *could* happen if I don't rectify things as we upgrade:
> > 1. The different RoleIDs lead to two roles of the same name on the 
> > replicated site, perhaps with the system unable to address/look at/modify 
> > either
> > 2. Roles just don't get replicated to the second site
>
> no replication would happen until the metadata changes again on the
> primary zone. once that gets triggered, the role metadata would
> probably fail to sync due to the name conflicts
>
> >
> > or other similar situations, all of which I want to avoid.
> > Perhaps the safest thing to do is to remove all roles on the secondary 
> > site, upgrade, and then force a replication of roles (How would I *force* 
> > that for IAM roles if it is the correct answer?)
>
> this removal will probably be necessary to avoid those conflicts. once
> that's done, you can force a metadata full sync on the secondary zone
> by running 'radosgw-admin metadata sync init' there, then restarting
> its gateways. this will have to resync all of the bucket and user
> metadata as well

p.s. don't use the DeleteRole rest api on the secondary zone after
upgrading, as the request would get forwarded to the primary zone and
delete it there too. you can use 'radosgw-admin role delete' on the
secondary instead
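taken together, the cleanup and resync might look like this on the secondary (a sketch, not a tested procedure: it assumes `role list` emits a JSON array with RoleName fields, that jq is installed, and that the gateways run under the ceph-radosgw systemd target):

```shell
# 1. before the upgrade: drop the manually-replicated roles so their
#    conflicting RoleIds can't collide with synced ones
for role in $(radosgw-admin role list | jq -r '.[].RoleName'); do
  radosgw-admin role delete --role-name "$role"
done

# 2. after the upgrade: force a full metadata resync (this re-syncs all
#    user and bucket metadata too), then restart the gateways
radosgw-admin metadata sync init
systemctl restart ceph-radosgw.target
```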

>
> > Here is the original bug report:
> >
> > https://tracker.ceph.com/issues/57364
> > Thanks!
> > -Chris
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Rados gateway data-pool replacement.

2023-04-19 Thread Casey Bodley
On Wed, Apr 19, 2023 at 5:13 AM Gaël THEROND  wrote:
>
> Hi everyone, quick question regarding radosgw zone data-pool.
>
> I’m currently planning to migrate an old data-pool that was created with
> inappropriate failure-domain to a newly created pool with appropriate
> failure-domain.
>
> If I’m doing something like:
> radosgw-admin zone modify —rgw-zone default —data-pool 
>
> Will data from the old pool be migrated to the new one or do I need to do
> something else to migrate those data out of the old pool?

radosgw won't migrate anything. you'll need to use rados tools to do
that first. make sure you stop all radosgws in the meantime so it
doesn't write more objects to the old data pool
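a rough outline of that migration (hedged sketch; the pool names, pg count, and crush rule are placeholders, and the placement flags should be double-checked against the nautilus docs):

```shell
# stop every radosgw first so nothing writes to the old data pool mid-copy
systemctl stop ceph-radosgw.target

# create the replacement pool with the correct failure domain and copy
# the objects across (rados cppool is slow and single-threaded, but works)
ceph osd pool create default.rgw.buckets.data.new 128 128 replicated better-rule
ceph osd pool application enable default.rgw.buckets.data.new rgw
rados cppool default.rgw.buckets.data default.rgw.buckets.data.new

# point the zone's placement target at the new pool, commit, restart
radosgw-admin zone placement modify --rgw-zone default \
    --placement-id default-placement \
    --data-pool default.rgw.buckets.data.new
radosgw-admin period update --commit
systemctl start ceph-radosgw.target
```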

> I’ve read a lot
> of mail archive with peoples willing to do that but I can’t get a clear
> answer from those archives.
>
> I’m running on the nautilus release, if it ever helps.
>
> Thanks a lot!
>
> PS: This mail is a redo of the old one as I’m not sure the former one
> worked (missing tags).
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: quincy user metadata constantly changing versions on multisite slave with radosgw roles

2023-04-20 Thread Casey Bodley
On Wed, Apr 19, 2023 at 7:55 PM Christopher Durham  wrote:
>
> Hi,
>
> I am using 17.2.6 on rocky linux for both the master and the slave site
> I noticed that:
> radosgw-admin sync status
> often shows that the metadata sync is behind a minute or two on the slave. 
> This didn't make sense, as the metadata isn't changing as far as I know.
> radosgw-admin mdlog list
>
> (on slave) showed me that there were user changes in metadata very often. 
> After doing a little research, here is a scenario I was able to develop:
>
> 1. user continually writes to a bucket he owns, pointing his aws cli (for 
> this test) to the master side endpoint as specified in  ~/.aws/config
>
> while this is running, do the following on the slave:
> radosgw-admin metadata get user:
> This always shows the same result for the user, no changes. The data gets to 
> the slave side bucket.
>
> 2. Restart the continual copy, but this time use a role that the user is a 
> member of via profile in ~/.aws/credentials and .~/aws/config, again writing 
> to the master endpoint as specified in ~/.aws/config
>
> aws --profile  s3 cp  s3:///file
> where profile is set up to use a role definition. The data gets to the bucket 
> on both sides. I do not have access if I do not use the role  profille (to 
> confirm I set it up right) However, while doing this second test, if I 
> continually do:
> radosgw-admin metadata get user:
> on the slave, I see a definite increase in versions. Here is a section of the 
> json output:
>
> "key": "user:",
> "ver": {   "tag": "somestring",   "ver": 12145}
> the 12145 value increases over and over again, and the mtime value in the 
> json output increases too based on the current date. (not shown here).  The 
> same value, when queried on the master side, remains 1, and the mtime value 
> is the date the user was created or last changed by an admin. If I write a 
> file only once, the vers value increases by 1 too, but not sure if the 
> increase in vers is necessarily 1:1 with the number of writes. This seems to 
> be the source of my continual metadata lag.  Am I missing something? I 
> suspect that this has been happening for awhile and not specific to 17.2.6 as 
> I just upgraded and the ver value is over 12000 for the user that I 
> discovered. (I used a python script to sync roles between master and slave 
> prior to 17.2.6. Now roles are replicated in 17.2.6).
> -Chris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

thanks Chris,

it looks like AssumeRole is writing to the user metadata
unnecessarily. i opened https://tracker.ceph.com/issues/59495 to track
this
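until then, the effect is easy to confirm by sampling the counter on the slave while AssumeRole traffic runs (sketch; the user name is a placeholder and jq is assumed installed):

```shell
# sample the user's metadata version every 10s for 5 minutes; a steadily
# climbing number with no admin changes reproduces the report above
for i in $(seq 1 30); do
  radosgw-admin metadata get "user:someuser" | jq '.ver.ver'
  sleep 10
done
```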


[ceph-users] Re: Can I delete rgw log entries?

2023-04-20 Thread Casey Bodley
On Sun, Apr 16, 2023 at 11:47 PM Richard Bade  wrote:
>
> Hi Everyone,
> I've been having trouble finding an answer to this question. Basically
> I'm wanting to know if stuff in the .log pool is actively used for
> anything or if it's just logs that can be deleted.
> In particular I was wondering about sync logs.
> In my particular situation I have had some tests of zone sync setup,
> but now I've removed the secondary zone and pools. My primary zone is
> filled with thousands of logs like this:
> data_log.71
> data.full-sync.index.e2cf2c3e-7870-4fc4-8ab9-d78a17263b4f.47
> meta.full-sync.index.7
> datalog.sync-status.shard.e2cf2c3e-7870-4fc4-8ab9-d78a17263b4f.13
> bucket.sync-status.f3113d30-ecd3-4873-8537-aa006e54b884:{bucketname}:default.623958784.455
>
> I assume that because I'm not doing any sync anymore I can delete all
> the sync related logs? Is anyone able to confirm this?

yes

> What about if the sync is running? Are these being written and read
> from and therefore must be left alone?

right. while a multisite configuration is operating, the replication
logs will be trimmed in the background. in addition to the replication
logs, the log pool also contains sync status objects. these track the
progress of replication, and removing those objects would generally
cause sync to start over from the beginning

> It seems like these are more of a status than just a log and that
> deleting them might confuse the sync process. If so, does that mean
> that the log pool is not just output that can be removed as needed?
> Are there perhaps other things in there that need to stay?

the log pool is used by several subsystems like multisite sync,
garbage collection, bucket notifications, and lifecycle. those
features won't work reliably if you delete their rados objects
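if you want an inventory of what's in the log pool before removing anything, grouping the object names by prefix is a quick way to see which subsystems own them (sketch; substitute your zone's log pool for the sample names):

```shell
# group log-pool object names by their first dot-separated component;
# on a live cluster, feed `rados -p default.rgw.log ls` into the same pipe
printf '%s\n' \
  'data_log.71' \
  'data_log.12' \
  'meta.full-sync.index.7' \
  'datalog.sync-status.shard.e2cf2c3e-7870-4fc4-8ab9-d78a17263b4f.13' \
  | awk -F. '{print $1}' | sort | uniq -c | sort -rn
```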

>
> Regards,
> Richard
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


[ceph-users] Ceph Leadership Team meeting minutes - 2023 April 26

2023-04-26 Thread Casey Bodley
# ceph windows tests
PR check will be made required once regressions are fixed
windows build currently depends on gcc11 which limits use of c++20
features. investigating newer gcc or clang toolchain

# 16.2.13 release
final testing in progress

# prometheus metric regressions
https://tracker.ceph.com/issues/59505
related to previous discussion on 4/12 about quincy backports
integration test coverage needed for ceph-exporter and the mgr module

# lab update
centos/rhel tests were failing due to problematic mirrorlists
fixed in https://github.com/ceph/ceph-cm-ansible/pull/731
more sanity checks in progress at
https://github.com/ceph/ceph-cm-ansible/pull/733

# cephalocon feedback
dev summit etherpads: https://pad.ceph.com/p/cephalocon-dev-summit-2023
collect more notes here: https://pad.ceph.com/p/cephalocon-2023-brainstorm

request for dev-focused longer term discussion
could have specific user-focused and dev-focused sessions
dense conference, hard to fit everything in 3 days
could have longer component updates during conf, with time for questions
perhaps 3 days of conf, dev-specific discussions a day before (no cfp,
one big room, then option for breakout), user-feedback sessions during
the normal con


[ceph-users] Re: Ceph Mgr/Dashboard Python dependencies: a new approach

2023-04-26 Thread Casey Bodley
are there any volunteers willing to help make these python packages
available upstream?

On Tue, Mar 28, 2023 at 5:34 AM Ernesto Puerta  wrote:
>
> Hey Ken,
>
> This change doesn't involve any further internet access other than the 
> already required for the "make dist" stage (e.g.: npm packages). That said, 
> where feasible, I also prefer to keep the current approach for a minor 
> version.
>
> Kind Regards,
> Ernesto
>
>
> On Mon, Mar 27, 2023 at 9:06 PM Ken Dreyer  wrote:
>>
>> I hope we don't backport such a big change to Quincy. That will have a
>> large impact on how we build in restricted environments with no
>> internet access.
>>
>> We could get the missing packages into EPEL.
>>
>> - Ken
>>
>> On Fri, Mar 24, 2023 at 7:32 AM Ernesto Puerta  wrote:
>> >
>> > Hi Casey,
>> >
>> > The original idea was to leave this to Reef alone, but given that the 
>> > CentOS 9 Quincy release is also blocked by missing Python packages, I 
>> > think that it'd make sense to backport it.
>> >
>> > I'm coordinating with Pere (in CC) to expedite this. We may need help to 
>> > troubleshoot Shaman/rpmbuild issues. Who would be the best one to help 
>> > with that?
>> >
>> > Regarding your last question, I don't know who's the maintainer of those 
>> > packages in EPEL. There's this BZ (https://bugzilla.redhat.com/2166620) 
>> > requesting that specific package, but that's only one out of the dozen of 
>> > missing packages (plus transitive dependencies)...
>> >
>> > Kind Regards,
>> > Ernesto
>> >
>> >
>> > On Thu, Mar 23, 2023 at 2:19 PM Casey Bodley  wrote:
>> >>
>> >> hi Ernesto and lists,
>> >>
>> >> > [1] https://github.com/ceph/ceph/pull/47501
>> >>
>> >> are we planning to backport this to quincy so we can support centos 9
>> >> there? enabling that upgrade path on centos 9 was one of the
>> >> conditions for dropping centos 8 support in reef, which i'm still keen
>> >> to do
>> >>
>> >> if not, can we find another resolution to
>> >> https://tracker.ceph.com/issues/58832? as i understand it, all of
>> >> those python packages exist in centos 8. do we know why they were
>> >> dropped for centos 9? have we looked into making those available in
>> >> epel? (cc Ken and Kaleb)
>> >>
>> >> On Fri, Sep 2, 2022 at 12:01 PM Ernesto Puerta  
>> >> wrote:
>> >> >
>> >> > Hi Kevin,
>> >> >
>> >> >>
>> >> >> Isn't this one of the reasons containers were pushed, so that the 
>> >> >> packaging isn't as big a deal?
>> >> >
>> >> >
>> >> > Yes, but the Ceph community has a strong commitment to provide distro 
>> >> > packages for those users who are not interested in moving to containers.
>> >> >
>> >> >> Is it the continued push to support lots of distros without using 
>> >> >> containers that is the problem?
>> >> >
>> >> >
>> >> > If not a problem, it definitely makes it more challenging. Compiled 
>> >> > components often sort this out by statically linking deps whose 
>> >> > packages are not widely available in distros. The approach we're 
>> >> > proposing here would be the closest equivalent to static linking for 
>> >> > interpreted code (bundling).
>> >> >
>> >> > Thanks for sharing your questions!
>> >> >
>> >> > Kind regards,
>> >> > Ernesto
>> >> > ___
>> >> > Dev mailing list -- d...@ceph.io
>> >> > To unsubscribe send an email to dev-le...@ceph.io
>> >>
>>


[ceph-users] Re: Radosgw multisite replication issues

2023-04-27 Thread Casey Bodley
On Thu, Apr 27, 2023 at 11:36 AM Tarrago, Eli (RIS-BCT)
 wrote:
>
> After working on this issue for a bit.
> The active plan is to fail over master, to the “west” dc. Perform a realm 
> pull from the west so that it forces the failover to occur. Then have the 
> “east” DC, then pull the realm data back. Hopefully will get both sides back 
> in sync..
>
> My concern with this approach is both sides are “active”, meaning the client 
> has been writing data to both endpoints. Will this cause an issue where 
> “west” will have data that the metadata does not have record of, and then 
> delete the data?

no object data would be deleted as a result of metadata failover issues, no
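for reference, the promotion itself usually comes down to a realm pull plus a zone promotion on the surviving site (sketch only; the endpoint url and the system user's keys are placeholders):

```shell
# run on a west-zone host: pull the realm from east, promote rgw-west
radosgw-admin realm pull --url http://east01:8080 \
    --access-key <system-access-key> --secret <system-secret-key>
radosgw-admin zone modify --rgw-zone rgw-west --master --default
radosgw-admin period update --commit
# restart the west gateways so they pick up the new period
systemctl restart ceph-radosgw.target
```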

>
> Thanks
>
> From: Tarrago, Eli (RIS-BCT) 
> Date: Thursday, April 20, 2023 at 3:13 PM
> To: Ceph Users 
> Subject: Radosgw multisite replication issues
> Good Afternoon,
>
> I am experiencing an issue where east-1 is no longer able to replicate from 
> west-1, however, after a realm pull, west-1 is now able to replicate from 
> east-1.
>
> In other words:
> West <- Can Replicate <- East
> West -> Cannot Replicate -> East
>
> After confirming the access and secret keys are identical on both sides, I 
> restarted all radosgw services.
>
> Here is the current status of the cluster below.
>
> Thank you for your help,
>
> Eli Tarrago
>
>
> root@east01:~# radosgw-admin zone get
> {
> "id": "ddd66ab8-0417-46ee-a53b-043352a63f93",
> "name": "rgw-east",
> "domain_root": "rgw-east.rgw.meta:root",
> "control_pool": "rgw-east.rgw.control",
> "gc_pool": "rgw-east.rgw.log:gc",
> "lc_pool": "rgw-east.rgw.log:lc",
> "log_pool": "rgw-east.rgw.log",
> "intent_log_pool": "rgw-east.rgw.log:intent",
> "usage_log_pool": "rgw-east.rgw.log:usage",
> "roles_pool": "rgw-east.rgw.meta:roles",
> "reshard_pool": "rgw-east.rgw.log:reshard",
> "user_keys_pool": "rgw-east.rgw.meta:users.keys",
> "user_email_pool": "rgw-east.rgw.meta:users.email",
> "user_swift_pool": "rgw-east.rgw.meta:users.swift",
> "user_uid_pool": "rgw-east.rgw.meta:users.uid",
> "otp_pool": "rgw-east.rgw.otp",
> "system_key": {
> "access_key": "PW",
> "secret_key": "H6"
> },
> "placement_pools": [
> {
> "key": "default-placement",
> "val": {
> "index_pool": "rgw-east.rgw.buckets.index",
> "storage_classes": {
> "STANDARD": {
> "data_pool": "rgw-east.rgw.buckets.data"
> }
> },
> "data_extra_pool": "rgw-east.rgw.buckets.non-ec",
> "index_type": 0
> }
> }
> ],
> "realm_id": "98e0e391-16fb-48da-80a5-08437fd81789",
> "notif_pool": "rgw-east.rgw.log:notif"
> }
>
> root@west01:~# radosgw-admin zone get
> {
>"id": "b2a4a31c-1505-4fdc-b2e0-ea07d9463da1",
> "name": "rgw-west",
> "domain_root": "rgw-west.rgw.meta:root",
> "control_pool": "rgw-west.rgw.control",
> "gc_pool": "rgw-west.rgw.log:gc",
> "lc_pool": "rgw-west.rgw.log:lc",
> "log_pool": "rgw-west.rgw.log",
> "intent_log_pool": "rgw-west.rgw.log:intent",
> "usage_log_pool": "rgw-west.rgw.log:usage",
> "roles_pool": "rgw-west.rgw.meta:roles",
> "reshard_pool": "rgw-west.rgw.log:reshard",
> "user_keys_pool": "rgw-west.rgw.meta:users.keys",
> "user_email_pool": "rgw-west.rgw.meta:users.email",
> "user_swift_pool": "rgw-west.rgw.meta:users.swift",
> "user_uid_pool": "rgw-west.rgw.meta:users.uid",
> "otp_pool": "rgw-west.rgw.otp",
> "system_key": {
> "access_key": "PxxW",
> "secret_key": "Hxx6"
> },
> "placement_pools": [
> {
> "key": "default-placement",
> "val": {
> "index_pool": "rgw-west.rgw.buckets.index",
> "storage_classes": {
> "STANDARD": {
> "data_pool": "rgw-west.rgw.buckets.data"
> }
> },
> "data_extra_pool": "rgw-west.rgw.buckets.non-ec",
> "index_type": 0
> }
> }
> ],
> "realm_id": "98e0e391-16fb-48da-80a5-08437fd81789",
> "notif_pool": "rgw-west.rgw.log:notif"
> east01:~# radosgw-admin metadata sync status
> {
> "sync_status": {
> "info": {
> "status": "init",
> "num_shards": 0,
> "period": "",
> "realm_epoch": 0
> },
> "markers": []
> },
> "full_sync": {
> "total": 0,
> "complete": 0
> }
> }
>
> west01:~#  radosgw-admin metadata sync status
> {
> "sync_status": {
> "info": {
> "status": "sync",
> "num_shards": 64,
> "period": "44b6b308-e2d8-4835-8518-c90447e7b55c",
> "realm_epoch": 3
> },
> "markers": [
>  

[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-02 Thread Casey Bodley
On Thu, Apr 27, 2023 at 5:21 PM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/59542#note-1
> Release Notes - TBD
>
> Seeking approvals for:
>
> smoke - Radek, Laura
> rados - Radek, Laura
>   rook - Sébastien Han
>   cephadm - Adam K
>   dashboard - Ernesto
>
> rgw - Casey

rgw approved

> rbd - Ilya
> krbd - Ilya
> fs - Venky, Patrick
> upgrade/octopus-x (pacific) - Laura (look the same as in 16.2.8)
> upgrade/pacific-p2p - Laura
> powercycle - Brad (SELinux denials)
> ceph-volume - Guillaume, Adam K
>
> Thx
> YuriW
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io


[ceph-users] Re: 16.2.13 pacific QE validation status

2023-05-08 Thread Casey Bodley
On Sun, May 7, 2023 at 5:25 PM Yuri Weinstein  wrote:
>
> All PRs were cherry-picked and the new RC1 build is:
>
> https://shaman.ceph.com/builds/ceph/pacific-release/8f93a58b82b94b6c9ac48277cc15bd48d4c0a902/
>
> Rados, fs and rgw were rerun and results are summarized here:
> https://tracker.ceph.com/issues/59542#note-1
>
> Seeking final approvals:
>
> rados - Radek
> fs - Venky
> rgw - Casey

rgw approved, thanks

>
> On Fri, May 5, 2023 at 8:27 AM Yuri Weinstein  wrote:
> >
> > I got verbal approvals for the listed PRs:
> >
> > https://github.com/ceph/ceph/pull/51232 -- Venky approved
> > https://github.com/ceph/ceph/pull/51344  -- Venky approved
> > https://github.com/ceph/ceph/pull/51200 -- Casey approved
> > https://github.com/ceph/ceph/pull/50894  -- Radek approved
> >
> > Suites rados and fs will need to be retested on updates pacific-release 
> > branch.
> >
> >
> > On Thu, May 4, 2023 at 9:13 AM Yuri Weinstein  wrote:
> > >
> > > In summary:
> > >
> > > Release Notes:  https://github.com/ceph/ceph/pull/51301
> > >
> > > We plan to finish this release next week and we have the following PRs
> > > planned to be added:
> > >
> > > https://github.com/ceph/ceph/pull/51232 -- Venky approved
> > > https://github.com/ceph/ceph/pull/51344  -- Venky in progress
> > > https://github.com/ceph/ceph/pull/51200 -- Casey approved
> > > https://github.com/ceph/ceph/pull/50894  -- Radek in progress
> > >
> > > As soon as these PRs are finalized, I will cherry-pick them and
> > > rebuild "pacific-release" and rerun appropriate suites.
> > >
> > > On Thu, May 4, 2023 at 9:07 AM Radoslaw Zarzynski  
> > > wrote:
> > > >
> > > > If we get some time, I would like to include:
> > > >
> > > >   https://github.com/ceph/ceph/pull/50894.
> > > >
> > > > Regards,
> > > > Radek
> > > >
> > > > On Thu, May 4, 2023 at 5:56 PM Venky Shankar  
> > > > wrote:
> > > > >
> > > > > Hi Yuri,
> > > > >
> > > > > On Wed, May 3, 2023 at 7:10 PM Venky Shankar  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, May 2, 2023 at 8:25 PM Yuri Weinstein  
> > > > > > wrote:
> > > > > > >
> > > > > > > Venky, I did plan to cherry-pick this PR if you approve this 
> > > > > > > (this PR
> > > > > > > was used for a rerun)
> > > > > >
> > > > > > OK. The fs suite failure is being looked into
> > > > > > (https://tracker.ceph.com/issues/59626).
> > > > >
> > > > > Fix is being tracked by
> > > > >
> > > > > https://github.com/ceph/ceph/pull/51344
> > > > >
> > > > > Once ready, it needs to be included in 16.2.13 and would require a fs
> > > > > suite re-run (although re-renning the failed tests should suffice,
> > > > > however, I'm a bit inclined in putting it through the fs suite).
> > > > >
> > > > > >
> > > > > > >
> > > > > > > On Tue, May 2, 2023 at 7:51 AM Venky Shankar 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi Yuri,
> > > > > > > >
> > > > > > > > On Fri, Apr 28, 2023 at 2:53 AM Yuri Weinstein 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Details of this release are summarized here:
> > > > > > > > >
> > > > > > > > > https://tracker.ceph.com/issues/59542#note-1
> > > > > > > > > Release Notes - TBD
> > > > > > > > >
> > > > > > > > > Seeking approvals for:
> > > > > > > > >
> > > > > > > > > smoke - Radek, Laura
> > > > > > > > > rados - Radek, Laura
> > > > > > > > >   rook - Sébastien Han
> > > > > > > > >   cephadm - Adam K
> > > > > > > > >   dashboard - Ernesto
> > > > > > > > >
> > > > > > > > > rgw - Casey
> > > > > > > > > rbd - Ilya
> > > > > > > > > krbd - Ilya
> > > > > > > > > fs - Venky, Patrick
> > > > > > > >
> > > > > > > > There are a couple of new failures which are qa/test related - 
> > > > > > > > I'll
> > > > > > > > have a look at those (they _do not_ look serious).
> > > > > > > >
> > > > > > > > Also, Yuri, do you plan to merge
> > > > > > > >
> > > > > > > > https://github.com/ceph/ceph/pull/51232
> > > > > > > >
> > > > > > > > into the pacific-release branch although it's tagged with one 
> > > > > > > > of your
> > > > > > > > other pacific runs?
> > > > > > > >
> > > > > > > > > upgrade/octopus-x (pacific) - Laura (look the same as in 
> > > > > > > > > 16.2.8)
> > > > > > > > > upgrade/pacific-p2p - Laura
> > > > > > > > > powercycle - Brad (SELinux denials)
> > > > > > > > > ceph-volume - Guillaume, Adam K
> > > > > > > > >
> > > > > > > > > Thx
> > > > > > > > > YuriW
> > > > > > > > > ___
> > > > > > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Cheers,
> > > > > > > > Venky
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Cheers,
> > > > > > Venky
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Cheers,
> > > > > Venky
> > > > > ___
> > > > > Dev mailing list -- d...@ceph.io
> >

[ceph-users] Re: Radosgw multisite replication issues

2023-05-11 Thread Casey Bodley
> [...] curl operation timed out, network average transfer speed less than 1024 Bytes per second during 
> 300 seconds.
> 2023-05-09T15:46:21.069+ 7f20857f2700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5

these errors would correspond to GetObject requests, and show up as
's3:get_obj' in the radosgw log
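to confirm that on the source side, the matching requests can be pulled out of the gateway log (sketch; the log path varies by deployment, and the op name only appears at sufficient debug levels):

```shell
# show recent replication reads; long-running or aborted s3:get_obj
# requests here line up with the curl timeouts on the fetching zone
grep -h 's3:get_obj' /var/log/ceph/ceph-client.rgw.*.log | tail -n 50
```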


> 2023-05-09T15:46:21.069+ 7f20b12b8700  0 WARNING: curl operation timed 
> out, network average transfer speed less than 1024 Bytes per second during 
> 300 seconds.
> 2023-05-09T15:46:21.069+ 7f20b12b8700  0 WARNING: curl operation timed 
> out, network average transfer speed less than 1024 Bytes per second during 
> 300 seconds.
> 2023-05-09T15:46:21.069+ 7f2092ffd700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
> 2023-05-09T15:46:21.069+ 7f20b12b8700  0 WARNING: curl operation timed 
> out, network average transfer speed less than 1024 Bytes per second during 
> 300 seconds.
> 2023-05-09T15:46:21.069+ 7f2080fe9700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
> 2023-05-09T15:46:21.069+ 7f20b12b8700  0 WARNING: curl operation timed 
> out, network average transfer speed less than 1024 Bytes per second during 
> 300 seconds.
> 2023-05-09T15:46:21.069+ 7f20817ea700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
> 2023-05-09T15:46:21.069+ 7f208b7fe700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
> 2023-05-09T15:46:21.069+ 7f20867f4700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
> 2023-05-09T15:46:21.069+ 7f2086ff5700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
> 2023-05-09T15:46:21.069+ 7f20b12b8700  0 WARNING: curl operation timed 
> out, network average transfer speed less than 1024 Bytes per second during 
> 300 seconds.
> 2023-05-09T15:46:21.069+ 7f20b12b8700  0 WARNING: curl operation timed 
> out, network average transfer speed less than 1024 Bytes per second during 
> 300 seconds.
> 2023-05-09T15:46:21.069+ 7f2085ff3700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
> 2023-05-09T15:46:21.069+ 7f20827ec700  0 rgw async rados processor: 
> store->fetch_remote_obj() returned r=-5
>
>
> From: Casey Bodley 
> Date: Thursday, April 27, 2023 at 12:37 PM
> To: Tarrago, Eli (RIS-BCT) 
> Cc: Ceph Users 
> Subject: Re: [ceph-users] Re: Radosgw multisite replication issues
>
>
>
> On Thu, Apr 27, 2023 at 11:36 AM Tarrago, Eli (RIS-BCT)
>  wrote:
> >
> > After working on this issue for a bit, the active plan is to fail over 
> > the master to the “west” DC, and perform a realm pull from the west so 
> > that it forces the failover to occur. Then have the “east” DC pull the 
> > realm data back. Hopefully that will get both sides back in sync.
> >
> > My concern with this approach is that both sides are “active”, meaning the 
> > client has been writing data to both endpoints. Will this cause an issue 
> > where “west” will have data that the metadata has no record of, and 
> > then delete that data?
>
> no object data would be deleted as a result of metadata failover issues, no
>
> >
> > Thanks
> >
> > From: Tarrago, Eli (RIS-BCT) 
> > Date: Thursday, April 20, 2023 at 3:13 PM
> > To: Ceph Users 
> > Subject: Radosgw multisite replication issues
> > Good Afternoon,
> >
> > I am experiencing an issue where east-1 is no longer able to replicate 
> > from west-1; however, after a realm pull, west-1 is now able to replicate 
> > from east-1.
> >
> > In other words:
> > West <- Can Replicate <- East
> > West -> Cannot Replicate -> East
> >
> > After confirming the access and secret keys are identical on both sides, I 
> > restarted all radosgw services.
> >
> > Here is the current status of the cluster below.
> >
> > Thank you for your help,
> >
> > Eli Tarrago
> >
> >
> > root@east01:~# radosgw-admin zone get
> > {
> > "id": "ddd66ab8-0417-46ee-a53b-043352a63f93",
> > "name": "rgw-east",
> > "domain_root": "rgw-east.rgw.meta:root",
> > "control_pool": "rgw-east.rgw.control",
> > "gc_pool": "rgw-east.rgw.log:gc",
> > "lc_pool": "rgw-east.rgw.log:lc",
> > "log_pool": "rgw-east.rgw.log",
> > "intent_log_pool": "rgw-east.rgw.log:intent",
> > "usage_log_pool": "

[ceph-users] Re: multisite sync and multipart uploads

2023-05-11 Thread Casey Bodley
sync doesn't distinguish between multipart and regular object uploads.
once a multipart upload completes, sync will replicate it as a single
object using an s3 GetObject request

replicating the parts individually would have some benefits. for
example, when sync retries are necessary, we might only have to resend
one part instead of the entire object. but it's far simpler to
replicate objects in a single atomic step
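a side note on telling the two apart after the fact: a completed multipart upload keeps a distinctive ETag, the MD5 of the concatenated part MD5s plus a '-<part count>' suffix, rather than a plain MD5 of the data. a small illustration in Python (the part contents are made up; the ETag convention is the standard S3 one, which rgw follows):

```python
import hashlib

def multipart_etag(parts: list) -> str:
    """Compute an S3-style multipart ETag: MD5 over the concatenation of
    each part's raw MD5 digest, followed by '-<number of parts>'."""
    digests = b"".join(hashlib.md5(p).digest() for p in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"

# a regular (single-part) upload has a plain 32-hex-char MD5 ETag
single = hashlib.md5(b"hello world").hexdigest()

# a multipart upload's ETag carries a '-<parts>' suffix and is longer
multi = multipart_etag([b"part one", b"part two", b"part three"])

print(single)  # 32 hex chars, no dash
print(multi)   # 32 hex chars, a dash, then the part count
```

this is why a HeadObject on a replicated object can still reveal that it was originally a multipart upload, even though sync stored it as one part.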

On Thu, May 11, 2023 at 1:07 PM Yixin Jin  wrote:
>
> Hi guys,
>
> With the Quincy release, does anyone know how multisite sync deals with 
> multipart uploads? I mean the part objects of incomplete multipart uploads. 
> Are those objects also synced over, either with full sync or incremental 
> sync? I did a quick experiment and noticed that these objects are not 
> synced over. Is this intentional, or is it a defect?
>
> Thanks,
> Yixin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


[ceph-users] Re: how to enable multisite resharding feature?

2023-05-17 Thread Casey Bodley
i'm afraid that feature will be new in the reef release. multisite
resharding isn't supported on quincy

On Wed, May 17, 2023 at 11:56 AM Alexander Mamonov  wrote:
>
> https://docs.ceph.com/en/latest/radosgw/multisite/#feature-resharding
> When I try this I get:
> root@ceph-m-02:~# radosgw-admin zone modify --rgw-zone=sel 
> --enable-feature=resharding
> ERROR: invalid flag --enable-feature=resharding
> root@ceph-m-02:~# ceph version
> ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)


[ceph-users] Re: Creating a bucket with bucket constructor in Ceph v16.2.7

2023-05-18 Thread Casey Bodley
On Wed, May 17, 2023 at 11:13 PM Ramin Najjarbashi
 wrote:
>
> Hi
>
> I'm currently using Ceph version 16.2.7 and facing an issue with bucket
> creation in a multi-zone configuration. My setup includes two zone groups:
>
> ZG1 (Master) and ZG2, with one zone in each zone group (zone-1 in ZG1 and
> zone-2 in ZG2).
>
> The objective is to create buckets in a specific zone group (ZG2) using the
> bucket constructor.
> However, despite setting the desired zone group (abrak) in the request, the
> bucket is still being created in the master zone group (ZG1).
> I have defined the following endpoint pattern for each zone group:
>
> s3.{zg}.mydomain.com
>
> I am using the s3cmd client to interact with the Ceph cluster. I have
> ensured that I provide the necessary endpoint and region information while
> executing the bucket creation command. Despite my efforts, the bucket
> consistently gets created in ZG1 instead of ZG2.

this is expected behavior for the metadata consistency model. all
metadata gets created on the metadata master zone first, and syncs to
all other zones in the realm from there. so your buckets will be
visible to every zonegroup

however, ZG2 is still the 'bucket location', and its object data
should only reside in ZG2's zones. any s3 requests on that bucket sent
to ZG1 will get redirected to ZG2 and serviced there

if you don't want any metadata shared between the two zonegroups, you
can put them in separate realms. but that includes user metadata as
well

>
> - Ceph Version: 16.2.7
> - Zone Group 1 (ZG1) Endpoint: http://s3.zonegroup1.mydomain.com
> - Zone Group 2 (ZG2) Endpoint: http://s3.zonegroup2.mydomain.com
> - Desired Bucket Creation Region: zg2-api-name
>
> I have reviewed the Ceph documentation and made the necessary configuration
> changes, but I have not been able to achieve the desired result. I kindly
> request your assistance in understanding why the bucket constructor is not
> honoring the specified region and always defaults to ZG1. I would greatly
> appreciate any insights, recommendations, or potential solutions to resolve
> this issue.
>
>  Thank you for your time and support.
>
> -
> Here are the details of my setup:
> -
>
> ```sh
> s3cmd --region zg2-api-name mb s3://test-zg2
> s3cmd info s3://test-zg2
> s3://test-zg2/ (bucket):
>Location:  zg2-api-name
>Payer: BucketOwner
>Expiration Rule: none
>Policy:none
>CORS:  none
>ACL:   development: FULL_CONTROL
> ```
>
> this is my config file:
>
> ```ini
> [default]
> access_key = 
> secret_key = 
> host_base = s3.zonegroup1.mydomain.com
> host_bucket = s3.%(location)s.mydomain.com
> #host_bucket = %(bucket)s.s3.zonegroup1.mydomain.com
> #host_bucket = s3.%(location)s.mydomain.com
> #host_bucket = s3.%(region)s.mydomain.com
> bucket_location = zg1-api-name
> use_https = False
> ```
>
>
> Zonegroup configuration for the `zonegroup1` region:
>
> ```json
> {
> "id": "fb3f818a-ca9b-4b12-b431-7cdcd80006d",
> "name": "zg1-api-name",
> "api_name": "zg1-api-name",
> "is_master": "false",
> "endpoints": [
> "http://s3.zonegroup1.mydomain.com";,
> ],
> "hostnames": [
> "s3.zonegroup1.mydomain.com",
> ],
> "hostnames_s3website": [
> "s3-website.zonegroup1.mydomain.com",
> ],
> "master_zone": "at2-stg-zone",
> "zones": [
> {
> "id": "at2-stg-zone",
> "name": "at2-stg-zone",
> "endpoints": [
> "http://s3.zonegroup1.mydomain.com";
> ],
> "log_meta": "false",
> "log_data": "true",
> "bucket_index_max_shards": 11,
> "read_only": "false",
> "tier_type": "",
> "sync_from_all": "true",
> "sync_from": [],
> "redirect_zone": ""
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": [],
> "storage_classes": [
> "STANDARD"
> ]
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "fa2f8194-4a9d-4b98-b411-9cdcd1e5506a",
> "sync_policy": {
> "groups": []
> }
> }
> ```
>
> Zonegroup configuration for the `zonegroup2` region:
>
> ```json
> {
> "id": "a513d60c-44a2-4289-a23d-b7a511be6ee4",
> "name": "zg2-api-name",
> "api_name": "zg2-api-name",
> "is_master": "false",
> "endpoints": [
> "http://s3.zonegroup2.mydomain.com";
> ],
> "hostnames": [
> "s3.zonegroup2.mydomain.com"
> ],
> "hostnames_s3website": [],
> "master_zone": "zonegroup2-sh-1",
> "zones": [
> {
> "id": "zonegroup2-sh-1",
> "name": "zonegroup2-sh-1",
> "endpoints": [
> "http://s3.zonegroup2.mydomain.com";
> ],
> "log_meta": "false",
> "log_data": "false",

[ceph-users] Re: Ceph Mgr/Dashboard Python depedencies: a new approach

2023-05-18 Thread Casey Bodley
thanks Ken! using copr sounds like a great way to unblock testing for
reef until everything lands in epel

for the teuthology part, i raised a pull request against teuthology's
install task to add support for copr repositories
(https://github.com/ceph/teuthology/pull/1844) and updated my ceph pr
that adds centos9 as a supported distro
(https://github.com/ceph/ceph/pull/50441) to enable that

i tested that combination in the rgw suite, and all of the packages
were installed successfully:
http://qa-proxy.ceph.com/teuthology/cbodley-2023-05-18_13:12:32-rgw:verify-main-distro-default-smithi/7277538/teuthology.log

for reference, the teuthology-suite command line for that test was:
$ teuthology-suite -s rgw:verify -m smithi --ceph-repo
https://github.com/ceph/ceph.git -S
fb28670387326ed3faf2b9cefac018ca68093364 --suite-repo
https://github.com/cbodley/ceph.git --suite-branch
wip-qa-distros-centos9 --teuthology-branch wip-install-copr -p 75
--limit 1 --seed 0 --filter centos_latest

On Wed, May 17, 2023 at 3:12 PM Ken Dreyer  wrote:
>
> Originally we had about a hundred packages in
> https://copr.fedorainfracloud.org/coprs/ceph/el9/ before they were
> wiped out in rhbz#2143742. I went back over the list of outstanding
> deps today. EPEL lacks only five packages now. I've built those into
> the Copr today.
>
> You can enable it with "dnf copr enable -y ceph/el9" . I think we
> should add this command to the container Dockerfile, Teuthology tasks,
> install-deps.sh, or whatever needs to run on el9 that is missing these
> packages.
>
> These tickets track moving the final five builds from the Copr into EPEL9:
>
> python-asyncssh - https://bugzilla.redhat.com/2196046
> python-pecan - https://bugzilla.redhat.com/2196045
> python-routes - https://bugzilla.redhat.com/2166620
> python-repoze-lru - no BZ yet
> python-logutils - provide karma here:
> https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2023-6baae8389d
>
> I was interested to see almost all of these are already in progress.
> That final one (logutils) should go to EPEL's stable repo in a week
> (faster with karma).
>
> - Ken
>
>
>
>
> On Wed, Apr 26, 2023 at 11:00 AM Casey Bodley  wrote:
> >
> > are there any volunteers willing to help make these python packages
> > available upstream?
> >
> > On Tue, Mar 28, 2023 at 5:34 AM Ernesto Puerta  wrote:
> > >
> > > Hey Ken,
> > >
> > > This change doesn't involve any further internet access other than 
> > > what's already required for the "make dist" stage (e.g. npm packages). That 
> > > said, where feasible, I also prefer to keep the current approach for a 
> > > minor version.
> > >
> > > Kind Regards,
> > > Ernesto
> > >
> > >
> > > On Mon, Mar 27, 2023 at 9:06 PM Ken Dreyer  wrote:
> > >>
> > >> I hope we don't backport such a big change to Quincy. That will have a
> > >> large impact on how we build in restricted environments with no
> > >> internet access.
> > >>
> > >> We could get the missing packages into EPEL.
> > >>
> > >> - Ken
> > >>
> > >> On Fri, Mar 24, 2023 at 7:32 AM Ernesto Puerta  
> > >> wrote:
> > >> >
> > >> > Hi Casey,
> > >> >
> > >> > The original idea was to leave this to Reef alone, but given that the 
> > >> > CentOS 9 Quincy release is also blocked by missing Python packages, I 
> > >> > think that it'd make sense to backport it.
> > >> >
> > >> > I'm coordinating with Pere (in CC) to expedite this. We may need help 
> > >> > to troubleshoot Shaman/rpmbuild issues. Who would be the best one to 
> > >> > help with that?
> > >> >
> > >> > Regarding your last question, I don't know who's the maintainer of 
> > >> > those packages in EPEL. There's this BZ 
> > >> > (https://bugzilla.redhat.com/2166620) requesting that specific 
> > >> > package, but that's only one out of the dozen of missing packages 
> > >> > (plus transitive dependencies)...
> > >> >
> > >> > Kind Regards,
> > >> > Ernesto
> > >> >
> > >> >
> > >> > On Thu, Mar 23, 2023 at 2:19 PM Casey Bodley  
> > >> > wrote:
> > >> >>
> > >> >> hi Ernesto and lists,
> > >> >>
> > >> >> > [1] https://github.com/ceph/ceph/pull/47501
> > >> >>

[ceph-users] Re: Encryption per user Howto

2023-05-22 Thread Casey Bodley
rgw supports the 3 flavors of S3 Server-Side Encryption, along with
the PutBucketEncryption api for per-bucket default encryption. you can
find the docs in https://docs.ceph.com/en/quincy/radosgw/encryption/
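as a concrete sketch of the per-bucket default: PutBucketEncryption takes a small JSON document selecting SSE-S3 ('AES256') or SSE-KMS ('aws:kms'). the helper below just builds that document; the bucket name and KMS key id are placeholders, and this is illustrative rather than rgw-specific code:

```python
from typing import Optional

def sse_config(algorithm: str = "AES256", kms_key_id: Optional[str] = None) -> dict:
    """Build a PutBucketEncryption request body: 'AES256' selects SSE-S3,
    'aws:kms' (optionally with a key id) selects SSE-KMS."""
    default = {"SSEAlgorithm": algorithm}
    if kms_key_id is not None:
        default["KMSMasterKeyID"] = kms_key_id
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": default}]}

# SSE-S3 default for a bucket:
sse_s3 = sse_config()
# SSE-KMS default, naming a key in the external KMS (placeholder key id):
sse_kms = sse_config("aws:kms", "my-vault-key")
```

with boto3 this body would be passed as `s3.put_bucket_encryption(Bucket="mybucket", ServerSideEncryptionConfiguration=sse_config())`; the structure follows the S3 PutBucketEncryption API that rgw implements.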

On Mon, May 22, 2023 at 10:49 AM huxia...@horebdata.cn
 wrote:
>
> Dear Alexander,
>
> Thanks a lot for the helpful comments and insights. Regarding CephFS and 
> RGW, per-user encryption seems daunting and complex.
>
> What about server-side encryption without the per-user requirement? Would 
> it be relatively easy to achieve, and how?
>
> best regards,
>
> Samuel
>
>
>
>
>
> huxia...@horebdata.cn
>
> From: Alexander E. Patrakov
> Date: 2023-05-21 15:44
> To: huxia...@horebdata.cn
> CC: ceph-users
> Subject: Re: [ceph-users] Encryption per user Howto
> Hello Samuel,
>
> On Sun, May 21, 2023 at 3:48 PM huxia...@horebdata.cn
>  wrote:
> >
> > Dear Ceph folks,
> >
> > Recently one of our clients approached us with a request for per-user 
> > encryption, i.e. using an individual encryption key for each user to 
> > encrypt files and the object store.
> >
> > Does anyone know (or have experience) how to do with CephFS and Ceph RGW?
>
> For CephFS, this is unachievable.
>
> For RGW, please use Vault for storing encryption keys. Don't forget
> about the proper high-availability setup. Use an AppRole to manage
> tokens. Use Vault Agent as a proxy that adds the token to requests
> issued by RGWs. Then create a bucket for each user and set the
> encryption policy for this bucket using the PutBucketEncryption API
> that is available through AWS CLI. Either SSE-S3 or SSE-KMS will work
> for you. SSE-S3 is easier to manage. Each object will then be
> encrypted using a different key derived from its name and a per-bucket
> master key which never leaves Vault.
>
> Note that users will be able to create additional buckets by
> themselves, and they won't be encrypted, so tell them either not to do
> that or to encrypt the new buckets similarly.
>
> --
> Alexander E. Patrakov
>


[ceph-users] Important: RGW multisite bug may silently corrupt encrypted objects on replication

2023-05-26 Thread Casey Bodley
Our downstream QE team recently observed an md5 mismatch of replicated
objects when testing rgw's server-side encryption in multisite. This
corruption is specific to s3 multipart uploads, and only affects the
replicated copy - the original object remains intact. The bug likely
affects Ceph releases all the way back to Luminous where server-side
encryption was first introduced.

To expand on the cause of this corruption: Encryption of multipart
uploads requires special handling around the part boundaries, because
each part is uploaded and encrypted separately. In multisite, objects
are replicated in their encrypted form, and multipart uploads are
replicated as a single part. As a result, the replicated copy loses
its knowledge about the original part boundaries required to decrypt
the data correctly.
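To see why losing the part boundaries breaks decryption, here is a deliberately toy model: a simple XOR keystream, not rgw's actual AES-based scheme. Each part's keystream restarts at the part boundary on upload, but a decryptor that treats the replicated object as a single part runs one continuous keystream:

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy counter-mode keystream (NOT rgw's real cipher):
    successive SHA-256(key || counter) blocks, truncated to n bytes."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))

key = b"object-key"
parts = [b"A" * 10, b"B" * 10]  # two multipart parts

# upload: each part is encrypted independently, so the keystream
# restarts at every part boundary
ciphertext = b"".join(xor(p, keystream(key, len(p))) for p in parts)

# replication: the object is stored as ONE part; a decryptor that no
# longer knows the boundaries runs a single continuous keystream
wrong = xor(ciphertext, keystream(key, len(ciphertext)))

print(wrong[:10] == parts[0])  # True: the first part decrypts fine
print(wrong[10:] == parts[1])  # False: everything after the boundary is garbage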

We don't have a fix yet, but we're tracking it in
https://tracker.ceph.com/issues/46062. The fix will only modify the
replication logic, so won't repair any objects that have already
replicated incorrectly. We'll need to develop a radosgw-admin command
to search for affected objects and reschedule their replication.

In the meantime, I can only advise multisite users to avoid using
encryption for multipart uploads. If you'd like to scan your cluster
for existing encrypted multipart uploads, you can identify them with a
s3 HeadObject request. The response would include a
x-amz-server-side-encryption header, and the ETag header value (with
"s removed) would be longer than 32 characters (multipart ETags are in
the special form "-"). Take care not to delete the
corrupted replicas, because an active-active multisite configuration
would go on to delete the original copy.
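a rough sketch of that scan, assuming HeadObject responses reduced to a flat dict of lower-cased headers (real clients expose headers differently, e.g. boto3 nests them under ResponseMetadata, so adapt the extraction accordingly):

```python
def is_encrypted_multipart(head: dict) -> bool:
    """Flag objects that are both server-side encrypted and multipart:
    per the advisory above, a multipart ETag (quotes stripped) exceeds
    32 characters because of its '-<part count>' suffix."""
    if "x-amz-server-side-encryption" not in head:
        return False
    etag = head.get("etag", "").strip('"')
    return len(etag) > 32 and "-" in etag

# sample HeadObject header dicts (made-up values for illustration)
plain = {"etag": '"9a0364b9e99bb480dd25e1f0284c8555"'}
enc_single = {"etag": '"9a0364b9e99bb480dd25e1f0284c8555"',
              "x-amz-server-side-encryption": "AES256"}
enc_multi = {"etag": '"3858f62230ac3c915f300c664312c11f-2"',
             "x-amz-server-side-encryption": "aws:kms"}

print(is_encrypted_multipart(plain))       # False
print(is_encrypted_multipart(enc_single))  # False
print(is_encrypted_multipart(enc_multi))   # True
```

only objects matching both conditions need attention; as noted above, leave any corrupted replicas in place rather than deleting them.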


[ceph-users] Re: Important: RGW multisite bug may silently corrupt encrypted objects on replication

2023-05-30 Thread Casey Bodley
On Tue, May 30, 2023 at 8:22 AM Tobias Urdin  wrote:
>
> Hello Casey,
>
> Thanks for the information!
>
> Can you please confirm that this is only an issue when using 
> “rgw_crypt_default_encryption_key”
> config opt that says “testing only” in the documentation [1] to enable 
> encryption and not when using
> Barbican or Vault as KMS or using SSE-C with the S3 API?

unfortunately, all flavors of server-side encryption (SSE-C, SSE-KMS,
SSE-S3, and rgw_crypt_default_encryption_key) are affected by this
bug, as they share the same encryption logic. the main difference is
where they get the key

>
> [1] 
> https://docs.ceph.com/en/quincy/radosgw/encryption/#automatic-encryption-for-testing-only
>
> > On 26 May 2023, at 22:45, Casey Bodley  wrote:
> >
> > Our downstream QE team recently observed an md5 mismatch of replicated
> > objects when testing rgw's server-side encryption in multisite. This
> > corruption is specific to s3 multipart uploads, and only affects the
> > replicated copy - the original object remains intact. The bug likely
> > affects Ceph releases all the way back to Luminous where server-side
> > encryption was first introduced.
> >
> > To expand on the cause of this corruption: Encryption of multipart
> > uploads requires special handling around the part boundaries, because
> > each part is uploaded and encrypted separately. In multisite, objects
> > are replicated in their encrypted form, and multipart uploads are
> > replicated as a single part. As a result, the replicated copy loses
> > its knowledge about the original part boundaries required to decrypt
> > the data correctly.
> >
> > We don't have a fix yet, but we're tracking it in
> > https://tracker.ceph.com/issues/46062. The fix will only modify the
> > replication logic, so won't repair any objects that have already
> > replicated incorrectly. We'll need to develop a radosgw-admin command
> > to search for affected objects and reschedule their replication.
> >
> > In the meantime, I can only advise multisite users to avoid using
> > encryption for multipart uploads. If you'd like to scan your cluster
> > for existing encrypted multipart uploads, you can identify them with a
> > s3 HeadObject request. The response would include a
> > x-amz-server-side-encryption header, and the ETag header value (with
> > "s removed) would be longer than 32 characters (multipart ETags are in
> > the special form "-"). Take care not to delete the
> > corrupted replicas, because an active-active multisite configuration
> > would go on to delete the original copy.


[ceph-users] Re: Important: RGW multisite bug may silently corrupt encrypted objects on replication

2023-05-31 Thread Casey Bodley
On Wed, May 31, 2023 at 7:24 AM Tobias Urdin  wrote:
>
> Hello Casey,
>
> Understood, thanks!
>
> That means the original copy, in the site it was uploaded to, is still safe
> as long as that copy is not removed, and nothing below RadosGW in the Ceph
> storage layer could corrupt the original copy?

right, the original multipart upload remains intact and can be
decrypted successfully

as i noted above, take care not to delete or modify any replicas that
were corrupted. replication is bidirectional by default, so those
changes would sync back and delete/overwrite the original copy

>
> Best regards
> Tobias
>
> On 30 May 2023, at 14:48, Casey Bodley  wrote:
>
> On Tue, May 30, 2023 at 8:22 AM Tobias Urdin 
> mailto:tobias.ur...@binero.com>> wrote:
>
> Hello Casey,
>
> Thanks for the information!
>
> Can you please confirm that this is only an issue when using 
> “rgw_crypt_default_encryption_key”
> config opt that says “testing only” in the documentation [1] to enable 
> encryption and not when using
> Barbican or Vault as KMS or using SSE-C with the S3 API?
>
> unfortunately, all flavors of server-side encryption (SSE-C, SSE-KMS,
> SSE-S3, and rgw_crypt_default_encryption_key) are affected by this
> bug, as they share the same encryption logic. the main difference is
> where they get the key
>
>
> [1] 
> https://docs.ceph.com/en/quincy/radosgw/encryption/#automatic-encryption-for-testing-only
>
> On 26 May 2023, at 22:45, Casey Bodley  wrote:
>
> Our downstream QE team recently observed an md5 mismatch of replicated
> objects when testing rgw's server-side encryption in multisite. This
> corruption is specific to s3 multipart uploads, and only affects the
> replicated copy - the original object remains intact. The bug likely
> affects Ceph releases all the way back to Luminous where server-side
> encryption was first introduced.
>
> To expand on the cause of this corruption: Encryption of multipart
> uploads requires special handling around the part boundaries, because
> each part is uploaded and encrypted separately. In multisite, objects
> are replicated in their encrypted form, and multipart uploads are
> replicated as a single part. As a result, the replicated copy loses
> its knowledge about the original part boundaries required to decrypt
> the data correctly.
>
> We don't have a fix yet, but we're tracking it in
> https://tracker.ceph.com/issues/46062. The fix will only modify the
> replication logic, so won't repair any objects that have already
> replicated incorrectly. We'll need to develop a radosgw-admin command
> to search for affected objects and reschedule their replication.
>
> In the meantime, I can only advise multisite users to avoid using
> encryption for multipart uploads. If you'd like to scan your cluster
> for existing encrypted multipart uploads, you can identify them with a
> s3 HeadObject request. The response would include a
> x-amz-server-side-encryption header, and the ETag header value (with
> "s removed) would be longer than 32 characters (multipart ETags are in
> the special form "-"). Take care not to delete the
> corrupted replicas, because an active-active multisite configuration
> would go on to delete the original copy.

