"Anthony D'Atri" <a...@dreamsnake.net> schreef op 28 september 2024 16:24:
>> No retries.
>> Is it expected that resharding can take so long?
>> (in a setup with all NVMe drives)
> 
> Which drive SKU(s)? How full are they? Is their firmware up to date? How many 
> RGWs? Have you tuned
> your server network stack? Disabled Nagle? How many bucket OSDs? How many 
> index OSDs? How many PGs
> in the bucket and index pools? How many buckets? Do you have like 200M 
> objects per? Do you have the
> default max objects/shard setting?
> 
> Tiny objects are the devil of many object systems. I can think of cases where 
> the above questions
> could affect this case. I think resharding in advance might help you.

- Drives advertise themselves as “Dell Ent NVMe v2 AGN MU U.2 6.4TB” (think 
that is Samsung under the Dell sticker), running the newest 2.5.0 firmware. 
They are pretty empty, although about 10% of the capacity is being used by 
other stuff (RBD images).

- Single bucket. My import application already errored out after only 72 M 
objects/476 GiB of data, and I need to store a lot more. Objects are between 
0 bytes and 1 MB, 7 KB on average.

- Currently using only 1 RGW during my test run to simplify looking at logs, 
although I have 4.

- I cannot touch the TCP socket options in my Java application.
When you build an S3AsyncClient with the Java AWS SDK using the .crtBuilder(), 
the SDK outsources the communication to the AWS aws-c-s3/aws-c-http/aws-c-io 
CRT libraries written in C, and I never get to see the raw socket in Java.
Looking at the source, I don’t think Amazon is disabling the Nagle algorithm in 
their code. At least I don’t see TCP_NODELAY or similar options being used at 
the place where they seem to set the socket options:
https://github.com/awslabs/aws-c-io/blob/c345d77274db83c0c2e30331814093e7c84c45e2/source/posix/socket.c#L1216
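
For context, a CRT-based client gets constructed roughly like the sketch below 
(endpoint, credentials, region and concurrency values are placeholders, not my 
actual configuration); all socket handling happens inside the native CRT layer:

import java.net.URI;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;

public class RgwCrtClientSketch {
    public static void main(String[] args) {
        // The CRT builder hands HTTP and socket work to the native
        // aws-c-s3/aws-c-http/aws-c-io libraries, so socket options like
        // TCP_NODELAY are out of reach from the Java side.
        S3AsyncClient s3 = S3AsyncClient.crtBuilder()
                .endpointOverride(URI.create("http://rgw.example.local:8080")) // placeholder RGW endpoint
                .region(Region.US_EAST_1) // arbitrary; RGW does not care
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create("ACCESS_KEY", "SECRET_KEY"))) // placeholders
                .forcePathStyle(true)   // path-style addressing for RGW
                .maxConcurrency(250)    // same ballpark as my 250 connections
                .build();
        s3.close();
    }
}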

- Did not tune any network settings, and it is pretty quiet on the network 
side, nowhere near saturating bandwidth because objects are so small.

- Did not really tune anything else either yet. Pretty much a default cephadm 
setup for now.

- See it (automagically) allocated 1024 PGs for the .data pool and 32 for the 
.index pool.

- Think the main delay is just Ceph wanting to make sure everything is synced 
to storage before reporting success, which is why I am making a lot of 
concurrent connections to perform multiple PUT requests simultaneously. But 
even with 250 connections, it only does around 5000 objects per second 
according to the “object ingress/egress” Grafana graph. Can probably raise it 
some more…
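
Simplified sketch of the kind of upload loop I mean (bucket name, key naming 
and the 250 cap below are illustrative, not my actual import code):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import software.amazon.awssdk.core.async.AsyncRequestBody;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class ConcurrentPutSketch {
    // Cap the number of PUT requests in flight at roughly the connection count.
    private static final int MAX_IN_FLIGHT = 250;

    public static void uploadAll(S3AsyncClient s3, String bucket, List<byte[]> objects)
            throws InterruptedException {
        Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);
        List<CompletableFuture<?>> futures = new ArrayList<>();

        int i = 0;
        for (byte[] body : objects) {
            inFlight.acquire();                    // block once MAX_IN_FLIGHT PUTs are pending
            String key = "import/object-" + (i++); // illustrative key naming
            CompletableFuture<?> f = s3.putObject(
                            PutObjectRequest.builder().bucket(bucket).key(key).build(),
                            AsyncRequestBody.fromBytes(body))
                    .whenComplete((resp, err) -> inFlight.release());
            futures.add(f);
        }
        // Wait for all outstanding PUTs to finish (or fail).
        CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
    }
}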


Had the default max objects per shard setting for dynamic resharding, but have 
now manually resharded to 10069 shards and will have a go to see if it works 
better now.
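
For reference, the manual reshard was done with the usual radosgw-admin 
command, along these lines (bucket name is a placeholder):

radosgw-admin bucket reshard --bucket=mybucket --num-shards=10069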


Yours sincerely,

Floris Bos