[ceph-users] RadosGW can't list objects when there are too many of them

2019-10-17 Thread Arash Shams
Dear All, I have a bucket with 5 million objects and I can't list them with radosgw-admin bucket list --bucket=bucket | jq .[].name or list files using boto3: s3 = boto3.client('s3', endpoint_url=credentials['endpoint_url'], aws_access_key_id=cre

[ceph-users] Re: Recovering from a Failed Disk (replication 1)

2019-10-17 Thread Burkhard Linke
Hi, On 10/17/19 5:56 AM, Ashley Merrick wrote: I think you're better off doing the DD method; you can export and import a PG at a time (ceph-objectstore-tool), but if the disk is failing, a DD is probably your best method. In case of hardware problems or broken sectors, I would recommend 'dd_
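
As an illustration of the per-PG export/import approach mentioned above, a minimal ceph-objectstore-tool invocation might look like the following (the OSD data paths, PG id, and file name are placeholders, and the OSDs involved must be stopped first):

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 3.1f --op export --file /tmp/3.1f.export
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-27 \
        --op import --file /tmp/3.1f.export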

[ceph-users] Re: RGW blocking on large objects

2019-10-17 Thread Paul Emmerich
On Thu, Oct 17, 2019 at 12:17 AM Robert LeBlanc wrote: > > On Wed, Oct 16, 2019 at 2:50 PM Paul Emmerich wrote: > > > > On Wed, Oct 16, 2019 at 11:23 PM Robert LeBlanc > > wrote: > > > > > > On Tue, Oct 15, 2019 at 8:05 AM Robert LeBlanc > > > wrote: > > > > > > > > On Mon, Oct 14, 2019 at 2:

[ceph-users] Re: RadosGW can't list objects when there are too many of them

2019-10-17 Thread Paul Emmerich
Listing large buckets is slow due to S3 ordering requirements; it's approximately O(n^2). However, I wouldn't consider 5M to be a large bucket; it should go to only ~50 shards, which should still perform reasonably. How fast are your metadata OSDs? Try --allow-unordered in radosgw-admin to get an u
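
For reference, an unordered listing along the lines suggested here (reusing the bucket name from the original post) would look something like:

    radosgw-admin bucket list --bucket=bucket --allow-unordered | jq .[].name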

[ceph-users] Re: Recovering from a Failed Disk (replication 1)

2019-10-17 Thread Frank Schilder
You probably need to attempt a physical data rescue. Data access will be lost until it is done. The first thing is to shut down the OSD to avoid any further damage to the disk. The second is to try ddrescue, repair the data on a copy if possible, and then create a clone on a new disk from the copy. If this does
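
As a sketch of the ddrescue step (device names and the map file are placeholders; the failing disk is the source and a new disk or image file the destination):

    ddrescue -f -n /dev/sdX /dev/sdY /root/rescue.map   # quick first pass, skip problem areas
    ddrescue -f -r3 /dev/sdX /dev/sdY /root/rescue.map  # retry the remaining bad sectors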

[ceph-users] Re: ceph-users Digest, Vol 81, Issue 39 Re: RadosGW can't list objects when there are too many of them

2019-10-17 Thread Romit Misra
Hi Arash, If the number of objects in a bucket is too large, on the order of millions, a paginated listing approach works better. There are also certain RGW configs that control how big an RGW response can be in terms of number of objects (by default I believe this is 1000). The code for Paginat
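
A minimal boto3 pagination sketch along these lines (endpoint and credentials are placeholders) could look like:

    import boto3

    # Placeholder endpoint/credentials; list the bucket page by page
    # instead of requesting everything at once.
    s3 = boto3.client('s3',
                      endpoint_url='http://rgw.example.com:7480',
                      aws_access_key_id='ACCESS_KEY',
                      aws_secret_access_key='SECRET_KEY')

    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket='bucket',
                                   PaginationConfig={'PageSize': 1000}):
        for obj in page.get('Contents', []):
            print(obj['Key'])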

[ceph-users] Re: RadosGW can't list objects when there are too many of them

2019-10-17 Thread Casey Bodley
When you say that you can't list it with boto or radosgw-admin, what happens? Does it give you an error, or just hang/timeout? How many shards does the bucket have? On 10/17/19 6:00 AM, Paul Emmerich wrote: Listing large buckets is slow due to S3 ordering requirements, it's approximately O(n^2

[ceph-users] Re: RDMA

2019-10-17 Thread Stig Telfer
Hi All - I did some investigation into Ceph RDMA as part of a performance analysis project working with Ceph over Omni-Path and NVMe. I wrote up some of the analysis here: https://www.stackhpc.com/ceph-on-the-brain-a-year-with-the-human-brain-project.html

[ceph-users] Re: RGW blocking on large objects

2019-10-17 Thread Robert LeBlanc
On Thu, Oct 17, 2019 at 2:50 AM Paul Emmerich wrote: > > On Thu, Oct 17, 2019 at 12:17 AM Robert LeBlanc wrote: > > > > On Wed, Oct 16, 2019 at 2:50 PM Paul Emmerich > > wrote: > > > > > > On Wed, Oct 16, 2019 at 11:23 PM Robert LeBlanc > > > wrote: > > > > > > > > On Tue, Oct 15, 2019 at 8:0

[ceph-users] Re: RGW blocking on large objects

2019-10-17 Thread Casey Bodley
On 10/17/19 10:58 AM, Robert LeBlanc wrote: On Thu, Oct 17, 2019 at 2:50 AM Paul Emmerich wrote: On Thu, Oct 17, 2019 at 12:17 AM Robert LeBlanc wrote: On Wed, Oct 16, 2019 at 2:50 PM Paul Emmerich wrote: On Wed, Oct 16, 2019 at 11:23 PM Robert LeBlanc wrote: On Tue, Oct 15, 2019 at 8:0

[ceph-users] Re: RGW blocking on large objects

2019-10-17 Thread Robert LeBlanc
On Thu, Oct 17, 2019 at 9:22 AM Casey Bodley wrote: > With respect to this issue, civetweb and beast should behave the same. > Both frontends have a large thread pool, and their calls to > process_request() run synchronously (including blocking on rados > requests) on a frontend thread. So once t

[ceph-users] Re: Nautilus power outage - 2/3 mons and mgrs dead and no cephfs

2019-10-17 Thread Alex L
Hi, I am still having issues accessing my cephfs and managed to pull out more interesting logs. I have also enabled logging at 20/20, which I intend to upload as soon as my ceph tracker account gets accepted. Oct 17 16:35:22 pve21 kernel: libceph: read_partial_message 8ae0e636 signature chec

[ceph-users] Re: Nautilus power outage - 2/3 mons and mgrs dead and no cephfs

2019-10-17 Thread Alex L
Final update. I switched the settings below from false to true and everything magically started working! cephx_require_signatures = true cephx_cluster_require_signatures = true cephx_sign_messages = true
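
For clarity, these are ceph.conf options; placed as they would typically appear (the [global] section placement is an assumption, the values match the post above):

    [global]
        cephx_require_signatures = true
        cephx_cluster_require_signatures = true
        cephx_sign_messages = true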

[ceph-users] Re: RGW blocking on large objects

2019-10-17 Thread Casey Bodley
On 10/17/19 12:59 PM, Robert LeBlanc wrote: On Thu, Oct 17, 2019 at 9:22 AM Casey Bodley wrote: With respect to this issue, civetweb and beast should behave the same. Both frontends have a large thread pool, and their calls to process_request() run synchronously (including blocking on rados

[ceph-users] Re: RGW blocking on large objects

2019-10-17 Thread Robert LeBlanc
On Thu, Oct 17, 2019 at 11:46 AM Casey Bodley wrote: > > > On 10/17/19 12:59 PM, Robert LeBlanc wrote: > > On Thu, Oct 17, 2019 at 9:22 AM Casey Bodley wrote: > > > >> With respect to this issue, civetweb and beast should behave the same. > >> Both frontends have a large thread pool, and their ca

[ceph-users] Re: RGW blocking on large objects

2019-10-17 Thread Matt Benjamin
My impression is that running a second gateway (assuming 1 at present) on the same host would be preferable to running one with a very high thread count, and that 1024 is a good maximum value for the thread count. Matt On Thu, Oct 17, 2019 at 4:01 PM Robert LeBlanc wrote: > > On Thu, Oct 17, 2019 at
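
For reference, the thread count discussed here maps to the rgw_thread_pool_size option; a hedged ceph.conf sketch (section name and port are illustrative, the 1024 value is the maximum suggested above):

    [client.rgw.gateway1]
        rgw_frontends = beast port=7480
        rgw_thread_pool_size = 1024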

[ceph-users] Re: RGW blocking on large objects

2019-10-17 Thread Robert LeBlanc
On Thu, Oct 17, 2019 at 1:05 PM Matt Benjamin wrote: > > My impression is that running a second gateway (assuming 1 at present) > on the same host would be preferable to running one with very high > thread count, also that 1024 is a good maximum value for thread count. We are running 4 RGW contai

[ceph-users] Re: RGW blocking on large objects

2019-10-17 Thread Casey Bodley
On 10/17/19 4:00 PM, Robert LeBlanc wrote: On Thu, Oct 17, 2019 at 11:46 AM Casey Bodley wrote: On 10/17/19 12:59 PM, Robert LeBlanc wrote: On Thu, Oct 17, 2019 at 9:22 AM Casey Bodley wrote: With respect to this issue, civetweb and beast should behave the same. Both frontends have a lar

[ceph-users] Re: RGW blocking on large objects

2019-10-17 Thread Robert LeBlanc
On Thu, Oct 17, 2019 at 2:03 PM Casey Bodley wrote: > > This is great news. Anything we can do to help in this effort as it is > > very important for us? > > We would love help here. Most of the groundwork is done, so the > remaining work is mostly mechanical. > > To summarize the strategy,

[ceph-users] Re: RGW blocking on large objects

2019-10-17 Thread Matt Benjamin
Thanks very much, Robert. Matt On Thu, Oct 17, 2019 at 5:24 PM Robert LeBlanc wrote: > > On Thu, Oct 17, 2019 at 2:03 PM Casey Bodley wrote: > > > This is great news. Anything we can do to help in this effort as it is > > > very important for us? > > > > We would love help here. While most of t