Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-31 Thread Ben Kerr
"...Dashboard is a dashboard so could not get health thru curl..." If i didn't miss the question, IMHO "dashboard" does this job adequately: curl -s -XGET :7000/health_data | jq -C ".health.status" ceph version 12.2.10 Am Do., 31. Jan. 2019 um 11:02 Uhr schrieb PHARABOT Vincent < vincent.phara.

Re: [ceph-users] Large omap objects - how to fix ?

2018-11-02 Thread Ben Morrice
_large_omap_objects | head -1 | awk '{print $2}'; done | grep ": 1" 137.1b: 1 137.36: 1 # ceph pg deep-scrub 137.1b # ceph pg deep-scrub 137.36 Kind regards, Ben Morrice ______ Ben Morrice | e: ben.morr...
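
A sketch of the sequence quoted above (the PG IDs are the ones from this thread): find the PGs flagged for large omap objects, then deep-scrub them so the warning clears once the index has shrunk:

  ceph health detail | grep -i 'large omap'
  for pg in 137.1b 137.36; do ceph pg deep-scrub "$pg"; done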

[ceph-users] Large omap objects - how to fix ?

2018-10-26 Thread Ben Morrice
ut -d\" -f4`   pg=`ceph osd map .bbp-gva-master.rgw.buckets.index ${id} | awk '{print $11}' | cut -d\( -f2 | cut -d\) -f1`   echo "$i:$id:$pg" done # ./buckets > pglist # egrep '137.1b|137.36' pglist |wc -l 192 The following do

[ceph-users] slow requests and degraded cluster, but not really ?

2018-10-23 Thread Ben Morrice
rove the situation above, I removed several pools that were not used anymore. I assume the PGs that ceph cannot find now are related to this pool deletion. Does anyone have any ideas on how to get out of this state? Details below - and full 'ceph health detail' attached to this email.

Re: [ceph-users] object lifecycle and updating from jewel

2018-01-04 Thread Ben Hines
Yes, it works fine with pre existing buckets. On Thu, Jan 4, 2018 at 8:52 AM, Graham Allan wrote: > I've only done light testing with lifecycle so far, but I'm pretty sure > you can apply it to pre-existing buckets. > > Graham > > > On 01/02/2018 10:42 PM, Robert Stanford wrote: > >> >> I woul

[ceph-users] Extending OSD disk partition size

2017-12-19 Thread Ben pollard
n to the size of the new disk ceph can see the space under size but It shows as used under RAW USED. I can't find any information on how I would go about this. Any help would be greatly appreciated. Best, Ben. ___ ceph-users mailing list ceph-

Re: [ceph-users] Speeding up garbage collection in RGW

2017-10-24 Thread Ben Hines
objs = 2647 rgw lc max objs = 2647 rgw gc obj min wait = 300 rgw gc processor period = 600 rgw gc processor max time = 600 -Ben On Tue, Oct 24, 2017 at 9:25 AM, David Turner wrote: > As I'm looking into this more and more, I'm realizing how big of a problem > garbage collection
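
The settings quoted above, sketched as a ceph.conf fragment (the section name is illustrative; the values are the ones from this thread, not recommendations):

  [client.radosgw.gateway]
  rgw gc max objs = 2647
  rgw lc max objs = 2647
  rgw gc obj min wait = 300
  rgw gc processor period = 600
  rgw gc processor max time = 600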

Re: [ceph-users] installing specific version of ceph-common

2017-10-09 Thread Ben Hines
figure it out properly? -Ben On Tue, Jul 18, 2017 at 1:39 AM, Buyens Niels wrote: > I've been looking into this again and have been able to install it now > (10.2.9 is newest now instead of 10.2.8 when I first asked the question): > > Looking at the dependency resolving, we ca
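
A hedged sketch of installing a pinned ceph-common build with yum (the version string is illustrative; the exact release suffix depends on the repository):

  yum --showduplicates list ceph-common
  yum install ceph-common-10.2.9

Dependencies such as librados2 and librbd1 usually need the same explicit version if the resolver tries to pull a newer build.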

[ceph-users] Kraken bucket index fix failing

2017-09-14 Thread Ben Hines
ucket int8-packages --check-objects 2017-09-14 18:02:53.690107 7f65e33b9c80 0 System already converted [] Currently Kraken 11.2.0. Is it worth going up to Luminous (or Kraken 11.2.1) to fix? Thanks, -Ben ___ ceph-users mailing list ceph-users@lists.cep

Re: [ceph-users] Ceph release cadence

2017-09-11 Thread Ben Hines
ut automation should take care of that... ceph-mgr is also similar. Minor (or even major) updates to the GUI dashboard shouldn't be blocked rolling out to users because we're waiting on a new RBD feature or critical RGW fix. radosgw and mgr are really 'clients', after all. -Ben

Re: [ceph-users] Ceph re-ip of OSD node

2017-09-05 Thread Morrice Ben
iperf results between servers in the OLD and NEW, our network team resolved the issue and then ceph 'just worked'. Cheers, Ben From: ceph-users on behalf of Jake Young Sent: Wednesday, August 30, 2017 11:37 PM To: Jeremy Hanmer; ceph-users S

Re: [ceph-users] Repeated failures in RGW in Ceph 12.1.4

2017-08-30 Thread Ben Hines
The daily log rotation. -Ben On Wed, Aug 30, 2017 at 3:09 PM, Bryan Banister wrote: > Looking at the systemd service it does show that twice, at roughly the > same time and one day apart, the service did receive a HUP signal: > > Aug 29 16:31:02 carf-ceph-osd02 radosgw[130050]: 201

[ceph-users] Ceph re-ip of OSD node

2017-08-30 Thread Ben Morrice
cluster addr = 10.1.1.101 public addr = 10.1.1.101 NEW [osd.0] host = sn01 devs = /dev/sdi cluster addr = 10.1.2.101 public addr = 10.1.2.101 -- Kind regards, Ben Morrice ______ Ben Morrice | e: ben.morr...@epfl.ch
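
The change described above, sketched as a ceph.conf fragment (addresses are the ones quoted in the thread):

  [osd.0]
  host = sn01
  devs = /dev/sdi
  # old network
  #cluster addr = 10.1.1.101
  #public addr = 10.1.1.101
  # new network
  cluster addr = 10.1.2.101
  public addr = 10.1.2.101

The monitor addresses live in the monmap, so an OSD-only re-IP like this still requires the new subnet to reach the mons.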

Re: [ceph-users] Linear space complexity or memory leak in `Radosgw-admin bucket check --fix`

2017-07-26 Thread Ben Hines
Which version of Ceph? On Tue, Jul 25, 2017 at 4:19 AM, Hans van den Bogert wrote: > Hi All, > > I don't seem to be able to fix a bucket, a bucket which has become > inconsistent due to the use of the `inconsistent-index` flag 8). > > My ceph-admin VM has 4GB of RAM, but that doesn't seem to be

Re: [ceph-users] Kraken rgw lifecycle processing nightly crash

2017-07-24 Thread Ben Hines
Looks like wei found and fixed this in https://github.com/ceph/ceph/pull/16495 Thanks Wei! This has been causing crashes for us since May. Guess it shows that not many folks use Kraken with lifecycles yet, but more certainly will with Luminous. -Ben On Fri, Jul 21, 2017 at 7:19 AM, Daniel

[ceph-users] Kraken rgw lifecycle processing nightly crash

2017-07-20 Thread Ben Hines
ceph.com/issues/14> 0x7f6a6cb0fdc5 in start_thread () from /lib64/libpthread.so.0 #15 <http://tracker.ceph.com/issues/15> 0x7f6a6b37073d in clone () from /lib64/libc.so.6 thanks, -Ben ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] RGW: Auth error with hostname instead of IP

2017-06-12 Thread Ben Morrice
w dns name' from your ceph.conf Kind regards, Ben Morrice ______ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL / BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland On 09/06/17 23:50, Eric Choi wrote: W
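
A sketch of the setting referenced above (section name and hostname are illustrative):

  [client.radosgw.gateway]
  rgw dns name = s3.example.com

With this set, a request whose Host header is s3.example.com or bucket.s3.example.com resolves correctly instead of the hostname being treated as a bucket name.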

Re: [ceph-users] RGW lifecycle not expiring objects

2017-06-06 Thread Ben Hines
If you have nothing listed in 'lc list', you probably need to add a lifecycle configuration using the S3 API. It's not automatic and has to be added per-bucket. Here's some sample code for doing so: http://tracker.ceph.com/issues/19587 -Ben On Tue, Jun 6, 2017 at 9:07 AM,
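
A minimal sketch of adding a per-bucket expiration rule against an RGW endpoint with the AWS CLI (endpoint, bucket name and the 30-day expiry are illustrative):

  aws --endpoint-url http://rgw.example.com:8080 s3api put-bucket-lifecycle-configuration \
      --bucket mybucket \
      --lifecycle-configuration '{"Rules":[{"ID":"expire-30d","Prefix":"","Status":"Enabled","Expiration":{"Days":30}}]}'
  radosgw-admin lc list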

Re: [ceph-users] RGW lifecycle not expiring objects

2017-06-05 Thread Ben Hines
the lifecycle processor logs 'DELETED' each time it deletes something: https://github.com/ceph/ceph/blob/master/src/rgw/rgw_lc.cc#L388 grep --text DELETED client..log | wc -l 121853 -Ben On Mon, Jun 5, 2017 at 6:16 AM, Daniel Gryniewicz wrote: > Kraken has lifecycle, Je

Re: [ceph-users] Prometheus RADOSGW usage exporter

2017-05-30 Thread Ben Morrice
your code, as these values are not updated via radosgw-admin either. I think i'm hitting this bug http://tracker.ceph.com/issues/19194 [1] for bucket in entry['buckets']: print bucket bucket_owner = bucket['owner'

Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-05-22 Thread Ben Hines
ould be noted in a more prominent release note. Without the hostname in there, ceph interpreted the hostname as a bucket name if the hostname rgw was being hit with differed from the hostname of the actual server. Pre Kraken, i didn't need that setting at all and it just worked. -Ben On Mon, M

Re: [ceph-users] DNS records for ceph

2017-05-20 Thread Ben Hines
Ceph kraken or later can use SRV records to find the mon servers. It works great and I've found it a bit easier to maintain than the static list in ceph.conf. That would presumably be on the private subnet. On May 20, 2017 7:40 AM, "David Turner" wrote: > The private network is only used by OS
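
A hedged sketch of the DNS-based mon discovery described above (zone and hostnames are illustrative; ceph-mon is the default service name the client looks up):

  _ceph-mon._tcp.example.com. 60 IN SRV 10 60 6789 mon1.example.com.
  _ceph-mon._tcp.example.com. 60 IN SRV 10 60 6789 mon2.example.com.
  _ceph-mon._tcp.example.com. 60 IN SRV 10 60 6789 mon3.example.com.

With records like these published, the static mon host list can be dropped from client ceph.conf files.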

Re: [ceph-users] Changing SSD Landscape

2017-05-17 Thread Ben Hines
is all-spinner performant with Bluestore?) Too early to make that call? -Ben On Wed, May 17, 2017 at 5:30 PM, Christian Balzer wrote: > > Hello, > > On Wed, 17 May 2017 11:28:17 +0200 Eneko Lacunza wrote: > > > Hi Nick, > > > > El 17/05/17 a las 11:12, Nick Fisk

Re: [ceph-users] ceph df space for rgw.buckets.data shows used even when files are deleted

2017-05-11 Thread Ben Hines
hese settings are honored still? (personally i dont want to limit it at all, I would rather it delete as many objects as it can within its runtime) Also curious if lifecycle deleted objects go through the garbage collector, or are they just immediately deleted? -Ben On Mon, Apr 10, 2017 at 2:46

Re: [ceph-users] Read from Replica Osds?

2017-05-08 Thread Ben Hines
We write many millions of keys into RGW which will never be changed (until they are deleted) -- it would be interesting if we could somehow indicate this to RGW and enable reading those from the replicas as well. -Ben On Mon, May 8, 2017 at 10:18 AM, Jason Dillaman wrote: > librbd

Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-04-28 Thread Ben Morrice
Hello again, I can work around this issue. If the host header is an IP address, the request is treated as a virtual: So if I auth to to my backends via IP, things work as expected. Kind regards, Ben Morrice __ Ben Morrice

Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-04-28 Thread Ben Morrice
me terminated by haproxy? Kind regards, Ben Morrice ______ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL / BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland On 27/04/17 13:53, Radoslaw Zarzynski wrote

Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-04-27 Thread Ben Morrice
see the same behavior as 10.2.7, so the bug i'm hitting looks like it was introduced in 10.2.6 Kind regards, Ben Morrice ______ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL / BBP Biotech Campus Chemin des Mi

Re: [ceph-users] Ceph UPDATE (not upgrade)

2017-04-26 Thread Ben Hines
to be taken pre upgrade. The release notes will tell you if that is the case. -Ben On Wed, Apr 26, 2017 at 7:21 AM, Massimiliano Cuttini wrote: > On a Ceph Monitor/OSD server can i run just: > > *yum update -y* > > in order to upgrade system and pac

Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-04-26 Thread Morrice Ben
Hello Radek, Please find attached the failed request for both the admin user and a standard user (backed by keystone). Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL BBP Biotech

Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-04-24 Thread Ben Morrice
I hitting a bug here? Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL / BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland On 21/04/17 09:36, Ben Morrice wrote: Hello

Re: [ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-04-21 Thread Ben Morrice
Hello Orit, Please find attached the output from the radosgw commands and the relevant section from ceph.conf (radosgw) bbp-gva-master is running 10.2.5 bbp-gva-secondary is running 10.2.7 Kind regards, Ben Morrice __ Ben

[ceph-users] RGW 10.2.5->10.2.7 authentication fail?

2017-04-20 Thread Ben Morrice
ff9777e6700 20 RGWEnv::set(): HTTP_DATE: Thu Apr 20 14:43:04 2017 2017-04-20 16:43:04.918390 7ff9777e6700 20 > HTTP_DATE -> Thu Apr 20 14:43:04 2017 2017-04-20 16:43:04.918404 7ff9777e6700 10 get_canon_resource(): dest=/admin/log 2017-04-20 16:43:04.918406 7

Re: [ceph-users] Creating journal on needed partition

2017-04-19 Thread Ben Hines
eploy already handles nicely. -Ben On Tue, Apr 18, 2017 at 6:22 AM, Vincent Godin wrote: > Hi, > > If you're using ceph-deploy, just run the command : > > ceph-deploy osd prepare --overwrite-conf {your_host}:/dev/sdaa:/dev/sdaf2 >

Re: [ceph-users] RGW lifecycle bucket stuck processing?

2017-04-14 Thread Ben Hines
Interesting - the state went back to 'UNINITIAL' eventually, possibly because the first run never finished. Will see if it ever completes during a nightly run. -Ben On Thu, Apr 13, 2017 at 11:10 AM, Ben Hines wrote: > I initiated a manual lifecycle cleanup with: > > rados

Re: [ceph-users] Question about RadosGW subusers

2017-04-13 Thread Ben Hines
Based on past LTS release dates would predict Luminous much sooner than that, possibly even in May... http://docs.ceph.com/docs/master/releases/ The docs also say "Spring" http://docs.ceph.com/docs/master/release-notes/ -Ben On Thu, Apr 13, 2017 at 12:11 PM, wrote: > Thank

[ceph-users] RGW lifecycle bucket stuck processing?

2017-04-13 Thread Ben Hines
7c80 0 System already converted real0m17.785s Is is possible it left behind a stale lock on the bucket due to the control-c? -Ben ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
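
A sketch of the manual lifecycle inspection implied above (command behaviour varies by release):

  radosgw-admin lc list
  radosgw-admin lc process

lc list shows each bucket's state (UNINITIAL / PROCESSING / COMPLETE), and lc process forces a run outside the nightly window.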

Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

2017-04-11 Thread Ben Hines
t the most correct / user friendly result. http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTlifecycle.html specifies 'Prefix' as Optional, so i'll put in a bug for this. -Ben On Mon, Apr 3, 2017 at 12:14 PM, Ben Hines wrote: > Interesting. > I'm wondering

Re: [ceph-users] best way to resolve 'stale+active+clean' after disk failure

2017-04-06 Thread Ben Hines
Personally before extreme measures like marking lost, i would try bringing up the osd, so it's up and out -- i believe the data will still be found and re balanced away from it by Ceph. -Ben On Thu, Apr 6, 2017 at 11:20 AM, David Welch wrote: > Hi, > We had a disk on the cluster t

Re: [ceph-users] ceph df space for rgw.buckets.data shows used even when files are deleted

2017-04-05 Thread Ben Hines
Ceph's RadosGW uses garbage collection by default. Try running 'radosgw-admin gc list' to list the objects to be garbage collected, or 'radosgw-admin gc process' to trigger them to be deleted. -Ben On Wed, Apr 5, 2017 at 12:15 PM, Deepak Naidu wrote: > Folks, &
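
The commands referenced above, sketched with the flag that also lists entries not yet due (flag availability may vary by release):

  radosgw-admin gc list --include-all | head
  radosgw-admin gc process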

[ceph-users] ceph pg inconsistencies - omap data lost

2017-04-04 Thread Ben Morrice
ttributes on the filesystem of each OSD that hosts this file, they are indeed empty (all 3 OSDs are the same, but just listing one for brevity) getfattr /var/lib/ceph/osd/ceph-29/current/4.19_head/DIR_9/DIR_1/DIR_2/rbd\\uheader.08f7fa43a49c7f__head_6C8FC219__4 getfattr: Removing leading 

Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

2017-04-03 Thread Ben Hines
700 2 req 14:0.001034:s3:PUT /bentest/:put_lifecycle:op status=0 2017-04-03 12:07:15.093859 7f5617024700 2 req 14:0.001050:s3:PUT /bentest/:put_lifecycle:http status=501 2017-04-03 12:07:15.093884 7f5617024700 1 == req done req=0x7f561701e340 op status=0 http_status=501 == -Ben On Mon, Apr

Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

2017-04-02 Thread Ben Hines
"type": "zone", "perm": "*" } ], "op_mask": "read, write, delete", "default_placement": "", "placement_tags": [], "bucket_quota": { "enabled":

Re: [ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

2017-03-31 Thread Ben Hines
e:op status=-13 2017-03-31 21:28:18.382620 7f50d0010700 2 req 8:0.001098:s3:PUT /bentest:put_lifecycle:http status=403 2017-03-31 21:28:18.382665 7f50d0010700 1 == req done req=0x7f50d000a340 op status=-13 http_status=403 == -Ben On Tue, Mar 28, 2017 at 6:42 AM, Daniel Gryniewicz

Re: [ceph-users] S3 Multi-part upload broken with newer AWS Java SDK and Kraken RGW

2017-03-30 Thread Ben Hines
Hey Yehuda, Are there plans to port of this fix to Kraken? (or is there even another Kraken release planned? :) thanks! -Ben On Wed, Mar 1, 2017 at 11:33 AM, Yehuda Sadeh-Weinraub wrote: > This sounds like this bug: > http://tracker.ceph.com/issues/17076 > > Will be fixed in

Re: [ceph-users] speed decrease with size

2017-03-14 Thread Ben Erridge
, 2017 at 7:24 PM, Christian Balzer wrote: > > Hello, > > On Mon, 13 Mar 2017 11:25:15 -0400 Ben Erridge wrote: > > > On Sun, Mar 12, 2017 at 8:24 PM, Christian Balzer wrote: > > > > > > > > Hello, > > > > > > On Sun, 12 Mar 2017 19:37:

Re: [ceph-users] speed decrease with size

2017-03-13 Thread Ben Erridge
On Sun, Mar 12, 2017 at 8:24 PM, Christian Balzer wrote: > > Hello, > > On Sun, 12 Mar 2017 19:37:16 -0400 Ben Erridge wrote: > > > I am testing attached volume storage on our openstack cluster which uses > > ceph for block storage. > > our Ceph nodes have lar

[ceph-users] speed decrease with size

2017-03-12 Thread Ben Erridge
I am testing attached volume storage on our openstack cluster which uses ceph for block storage. our Ceph nodes have large SSD's for their journals 50+GB for each OSD. I'm thinking some parameter is a little off because with relatively small writes I am seeing drastically reduced write speeds. we

Re: [ceph-users] Shrinking lab cluster to free hardware for a new deployment

2017-03-09 Thread Ben Hines
AFAIK depending on how many you have, you are likely to end up with 'too many pgs per OSD' warning for your main pool if you do this, because the number of PGs in a pool cannot be reduced and there will be less OSDs to put them on. -Ben On Wed, Mar 8, 2017 at 5:53 AM, Henrik Korkuc wr

Re: [ceph-users] Radosgw scaling recommendation?

2017-02-09 Thread Ben Hines
tweb.access.log -Ben On Thu, Feb 9, 2017 at 12:30 PM, Wido den Hollander wrote: > > > Op 9 februari 2017 om 19:34 schreef Mark Nelson : > > > > > > I'm not really an RGW expert, but I'd suggest increasing the > > "rgw_thread_pool_size" opt

Re: [ceph-users] rgw static website docs 404

2017-01-19 Thread Ben Hines
r new features are added and effectively kept secret. -Ben On Thu, Jan 19, 2017 at 1:56 AM, Wido den Hollander wrote: > > > Op 19 januari 2017 om 2:57 schreef Ben Hines : > > > > > > Aha! Found some docs here in the RHCS site: > > > > https://access.redhat.co

Re: [ceph-users] rgw static website docs 404

2017-01-18 Thread Ben Hines
Aha! Found some docs here in the RHCS site: https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/paged/object-gateway-guide-for-red-hat-enterprise-linux/chapter-2-configuration Really, ceph.com should have all this too... -Ben On Wed, Jan 18, 2017 at 5:15 PM, Ben Hines wrote

[ceph-users] rgw static website docs 404

2017-01-18 Thread Ben Hines
do with it? Does it require using dns based buckets, for example? I'd like to be able to hit a website with http:, ideally. (without the browser forcing it to download) thanks, -Ben ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.c

Re: [ceph-users] Can't create bucket (ERROR: endpoints not configured for upstream zone)

2016-12-22 Thread Ben Hines
but i had forgotten about my monitoring system which runs 'radosgw-admin' -- that part upgraded first, before i'd stopped any of my Infernalis RGW's. -Ben On Thu, Jul 28, 2016 at 7:50 AM, Arvydas Opulskis < arvydas.opuls...@adform.com> wrote: > Hi, > > We solved it by

Re: [ceph-users] v11.1.0 kraken candidate released

2016-12-12 Thread Ben Hines
upgrade notes. (though reading release notes is also required) http://docs.ceph.com/docs/master/install/upgrading-ceph/ -Ben On Mon, Dec 12, 2016 at 6:35 PM, Ben Hines wrote: > Hi! Can you clarify whether this release note applies to Jewel upgrades > only? Ie, can we go Infernalis ->

Re: [ceph-users] v11.1.0 kraken candidate released

2016-12-12 Thread Ben Hines
oes say 'All clusters'. Upgrading from Jewel * All clusters must first be upgraded to Jewel 10.2.z before upgrading to Kraken 11.2.z (or, eventually, Luminous 12.2.z). thanks! -Ben On Mon, Dec 12, 2016 at 6:28 PM, Abhishek L wrote: > Hi everyone, > > This is the

Re: [ceph-users] Kraken 11.x feedback

2016-12-09 Thread Ben Hines
Not particularly, i just never did the Jewel upgrade. (normally like to stay relatively current) -Ben On Fri, Dec 9, 2016 at 11:40 AM, Samuel Just wrote: > Is there a particular reason you are sticking to the versions with > shorter support periods? > -Sam > > On Fri, Dec 9, 2

[ceph-users] Kraken 11.x feedback

2016-12-09 Thread Ben Hines
Anyone have any good / bad experiences with Kraken? I haven't seen much discussion of it. Particularly from the RGW front. I'm still on Infernalis for our cluster, considering going up to K. thanks, -Ben ___ ceph-users mailing list

[ceph-users] Ceph recovery stuck

2016-12-06 Thread Ben Erridge
raded (0.625%) 896 active+clean 64 active+degraded Any idea on what's going on or how we can get the process to resume? -- -----. Ben Erridge Center For Information Management, Inc. (734) 930-0855 3550 West Liberty Road Ste 1 Ann Arbor

Re: [ceph-users] Memory leak in radosgw

2016-10-21 Thread Ben Morrice
What version of libcurl are you using? I was hitting this bug with RHEL7/libcurl 7.29 which could also be your catalyst. http://tracker.ceph.com/issues/15915 Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch

Re: [ceph-users] RGW multisite replication failures

2016-09-28 Thread Ben Morrice
b7fe700 5 Sync:bbp-gva-:data:Object:20160928:bbp-gva-master.106061599.1/20160928-1mb-testfile[null][0]:finish 2016-09-28 16:19:02.163101 7f845b7fe700 5 Sync:bbp-gva-:data:BucketFull:20160928:bbp-gva-master.106061599.1:finish 2016-09-28 16:19:02.163108 7f845b7fe700 5 f

Re: [ceph-users] RGW multisite replication failures

2016-09-27 Thread Ben Morrice
over 500k in size fails :( Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch | t: +41-21-693-9670 EPFL ENT CBS BBP Biotech Campus Chemin des Mines 9 1202 Geneva Switzerland On 23/09/16 16:52, Orit Wasserman wrote

[ceph-users] RGW multisite replication failures

2016-09-23 Thread Ben Morrice
035.4585.2 2016-09-23 09:03:28.829207 7f9a72ffd700 0 ERROR: failed to sync object: bentest1:bbp-gva-master.85732351.16:-1/1m 2016-09-23 09:03:28.834281 7f9a72ffd700 20 store_marker(): updating marker marker_oid=bucket.sync-status.bbp-gva-master:bentest1:bbp-gva-master.85732351.16 marker=000000

Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Ben Hines
Thanks. Will try it out once we get on Jewel. Just curious, does bucket deletion with --purge-objects work via radosgw-admin with the no index option? If not, i imagine rados could be used to delete them manually by prefix. On Sep 21, 2016 6:02 PM, "Stas Starikevich" wrote: > Hi B
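
A sketch of the deletion path discussed above (bucket name is illustrative; on very large buckets this walks the index and can take a long time):

  radosgw-admin bucket rm --bucket=mybucket --purge-objects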

Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Ben Hines
Nice, thanks! Must have missed that one. It might work well for our use case since we don't really need the index. -Ben On Wed, Sep 21, 2016 at 11:23 AM, Gregory Farnum wrote: > On Wednesday, September 21, 2016, Ben Hines wrote: > >> Yes, 200 million is way too big for

Re: [ceph-users] rgw bucket index manual copy

2016-09-21 Thread Ben Hines
very frustrating. If we could retrieve / put objects into RGW without hitting the index at all we would - we don't need to list our buckets. -Ben On Tue, Sep 20, 2016 at 1:57 AM, Wido den Hollander wrote: > > > Op 20 september 2016 om 10:55 schreef Василий Ангапов >: > &g

Re: [ceph-users] RGW multisite - second cluster woes

2016-08-21 Thread Ben Morrice
"name": "default-placement", "tags": [] } ], "default_placement": "default-placement", "realm_id": "b23771d0-6005-41da-8ee0-aec0

[ceph-users] RGW multisite - second cluster woes

2016-08-18 Thread Ben Morrice
se, "max_size_kb": -1, "max_objects": -1 } }, "realm_id": "", "realm_name": "", "realm_epoch": 0 } # radosgw-admin realm default --rgw

Re: [ceph-users] Unknown error (95->500) when creating buckets or putting files to RGW after upgrade from Infernalis to Jewel

2016-07-26 Thread Ben Hines
Fwiw this thread still has me terrified to upgrade my rgw cluster. Just when I thought it was safe. Anyone have any successful problem free rgw infernalis-jewel upgrade reports? On Jul 25, 2016 11:27 PM, "nick" wrote: > Hey Maciej, > I compared the output of your commands with the output on our

Re: [ceph-users] Jewel Multisite RGW Memory Issues

2016-07-08 Thread Ben Agricola
es of the files in one of the buckets - probably 5000 lines total. The OP of the request that generated the bucket list was '25RGWListBucket_ObjStore_S3', and appears to have been made by one of the RGW nodes in the other site. Any ideas? Ben. On Mon, 27 Jun 2016 at 10:47 Ben Agri

Re: [ceph-users] Jewel Multisite RGW Memory Issues

2016-06-27 Thread Ben Agricola
start back up and normal memory usage resumes. Cheers, Ben. On Mon, 27 Jun 2016 at 10:39 Ben Agricola wrote: > Hi Pritha, > > > At the time, the 'primary' cluster (i.e. the one with the active data set) > was receiving backup files from a small number of machines, prio

Re: [ceph-users] Jewel Multisite RGW Memory Issues

2016-06-27 Thread Ben Agricola
de continue to increase in usage at the same rate. There are no further messages in the RadosGW log as this is occurring (since there is no client traffic and no further replication traffic). If I kill the active RadosGW processes then they start back up and normal memory usage resumes. Cheers, B

[ceph-users] Jewel Multisite RGW Memory Issues

2016-06-20 Thread Ben Agricola
I have 2 distinct clusters configured, in 2 different locations, and 1 zonegroup. Cluster 1 has ~11TB of data currently on it, S3 / Swift backups via the duplicity backup tool - each file is 25Mb and probably 20% are multipart uploads from S3 (so 4Mb stripes) - in total 3217kobjects. This cluster

Re: [ceph-users] radosgw hammer -> jewel upgrade (default zone & region config)

2016-05-23 Thread Ben Hines
I for one am terrified of upgrading due to these messages (and indications that the problem still may not be resolved even in 10.2.1) - holding off until a clean upgrade is possible without running any hacky scripts. -Ben On Mon, May 23, 2016 at 2:23 AM, nick wrote: > Hi, > we ran in

[ceph-users] waiting for rw locks on rgw index file during recovery

2016-05-06 Thread Ben Hines
Infernalis 9.2.1, Centos 72. My cluster is in recovery and i've noticed a lot of 'waiting for rw locks'. Some of these can last quite a long time. Any idea what can cause this? Because this is a RGW bucket index file, this causes backup effects -- since the index can't be updated, S3 updates to ot

Re: [ceph-users] Incorrect crush map

2016-05-05 Thread Ben Hines
all. If there's duplicate IDs for example, due to leftover files or somesuch, then a working OSD on another OSD may be forcibly moved in the crush map to another node where it doesn't exist. I would expect OSDs to update their own location in CRUSH, rather than having this be a prestart s
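
A hedged sketch of pinning an OSD's CRUSH placement so a restart cannot move it (option and command as documented for this era of Ceph; IDs and names are illustrative):

  # ceph.conf on the OSD host
  [osd]
  osd crush update on start = false

  # or correct the location once by hand
  ceph osd crush create-or-move osd.42 1.0 host=node13 root=default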

[ceph-users] RGW obj remove cls_xx_remove returned -2

2016-05-04 Thread Ben Hines
Ceph 9.2.1, Centos 7.2 I noticed these errors sometimes when removing objects. It's getting a 'No such file or directory' on the OSD when deleting things sometimes. Any ideas here? Is this expected? (i anonymized the full filename, but it's all the same file) RGW log: 2016-05-04 23:14:32.2163

Re: [ceph-users] Incorrect crush map

2016-05-04 Thread Ben Hines
13 systemd: ceph-osd@42.service failed. -Ben On Tue, May 3, 2016 at 7:16 PM, Wade Holler wrote: > Hi Ben, > > What OS+Version ? > > Best Regards, > Wade > > > On Tue, May 3, 2016 at 2:44 PM Ben Hines wrote: > >> My crush map keeps putting some OSDs

[ceph-users] RGW Jewel upgrade: realms and default .rgw.root pool?

2016-05-04 Thread Ben Morrice
also move this federated configuration to a multisite configuration, however at this point in time I am just focusing on upgrading ceph to Jewel and maintaining the federated configuration. Thanks! Cheers, Ben [root@bbpcb051 ceph]# /usr/bin/radosgw -d --cluster ceph --name client.radosgw.gateway --set

[ceph-users] ceph degraded writes

2016-05-03 Thread Ben Hines
release note just needed for the upgrade? I think we may be encountering problems in our cluster during recovery because we can't write to any object which has less than 3 copies even though we have min_size at 1. thanks, -Ben ___ ceph-users mailing
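
A sketch of inspecting the pool setting mentioned above (pool name is illustrative):

  ceph osd pool get .rgw.buckets min_size
  ceph osd pool set .rgw.buckets min_size 1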

[ceph-users] Incorrect crush map

2016-05-03 Thread Ben Hines
My crush map keeps putting some OSDs on the wrong node. Restarting them fixes it temporarily, but they eventually hop back to the other node that they aren't really on. Is there anything that can cause this to look for? Ceph 9.2.1 -Ben ___ ceph-

Re: [ceph-users] radosgw crash - Infernalis

2016-04-27 Thread Ben Hines
Aha, i see how to use the debuginfo - trying it by running through gdb. On Wed, Apr 27, 2016 at 10:09 PM, Ben Hines wrote: > Got it again - however, the stack is exactly the same, no symbols - > debuginfo didn't resolve. Do i need to do something to enable that? > > The serve

Re: [ceph-users] radosgw crash - Infernalis

2016-04-27 Thread Ben Hines
ght signal (Segmentation fault) ** in thread 7f9e7e7f4700 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd) 1: (()+0x30b0a2) [0x7fa11c5030a2] 2: (()+0xf100) [0x7fa1183fe100] NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this. --- logging levels --- On Wed, Apr 27

Re: [ceph-users] radosgw crash - Infernalis

2016-04-27 Thread Ben Hines
happen. thanks! On Wed, Apr 27, 2016 at 8:40 PM, Brad Hubbard wrote: > - Original Message - > > From: "Karol Mroz" > > To: "Ben Hines" > > Cc: "ceph-users" > > Sent: Wednesday, 27 April, 2016 7:06:56 PM > > Subject: Re: [cep

[ceph-users] radosgw crash - Infernalis

2016-04-26 Thread Ben Hines
Is this a known one? Ceph 9.2.1. Can provide more logs if needed. 2> 2016-04-26 22:07:59.662702 7f49aeffd700 1 == req done req=0x7f49c4138be0 http_status=200 == -11> 2016-04-26 22:07:59.662752 7f49aeffd700 1 civetweb: 0x7f49c4001280: 10.30.1.221 - - [26/Apr/2016:22:07:59 -0700] "HEAD

Re: [ceph-users] Using s3 (radosgw + ceph) like a cache

2016-04-25 Thread Ben Hines
GW because it has to update the index too. -Ben On Mon, Apr 25, 2016 at 2:15 AM, Dominik Mostowiec < dominikmostow...@gmail.com> wrote: > Hi, > I thought that xfs fragmentation or leveldb(gc list growing, locking, > ...) could be a problem. > Do you have any experience w

[ceph-users] Ceph Advice

2016-03-22 Thread Ben Archuleta
NFS/CIFS? What are the chances of data corruption. Also on average how well does CephFS handle variable size files ranging from really small to really large? Regards, Ben ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/li

[ceph-users] CephFS Advice

2016-03-22 Thread Ben Archuleta
NFS/CIFS? What are the chances of data corruption. Also on average how well does CephFS handle variable size files ranging from really small to really large? Regards, Ben ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/li

Re: [ceph-users] rgw bucket deletion woes

2016-03-19 Thread Ben Hines
be expiring, so RGW will need to handle. -Ben On Wed, Mar 16, 2016 at 9:40 AM, Yehuda Sadeh-Weinraub wrote: > On Tue, Mar 15, 2016 at 11:36 PM, Pavan Rallabhandi > wrote: > > Hi, > > > > I find this to be discussed here before, but couldn¹t find any solution >

Re: [ceph-users] Radosgw (civetweb) hangs once around 850 established connections

2016-03-19 Thread Ben Hines
tweb num_threads=125 error_log_file=/var/log/radosgw/civetweb.error.log access_log_file=/var/log/radosgw/civetweb.access.log rgw num rados handles = 32 You can also up civetweb loglevel: debug civetweb = 20 -Ben On Wed, Mar 16, 2016 at 5:03 PM, seapasu...@uchicago.edu < seapasu...@uchicago.edu> wro
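
The tuning knobs quoted above, sketched as a ceph.conf fragment (the port comes from elsewhere in this archive; the values are the ones from this thread, not recommendations):

  [client.radosgw.gateway]
  rgw frontends = civetweb port=8080 num_threads=125 error_log_file=/var/log/radosgw/civetweb.error.log access_log_file=/var/log/radosgw/civetweb.access.log
  rgw num rados handles = 32
  debug civetweb = 20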

Re: [ceph-users] ceph-disk from jewel has issues on redhat 7

2016-03-15 Thread Ben Hines
It seems like ceph-disk is often breaking on centos/redhat systems. Does it have automated tests in the ceph release structure? -Ben On Tue, Mar 15, 2016 at 8:52 AM, Stephen Lord wrote: > > Hi, > > The ceph-disk (10.0.4 version) command seems to have problems operating on > a

Re: [ceph-users] Ceph Recovery Assistance, pgs stuck peering

2016-03-08 Thread Ben Hines
After making that setting, the pg appeared to start peering but then it actually changed the primary OSD to osd.100 - then went incomplete again. Perhaps it did that because another OSD had more data? I presume i need to set that value on each osd where the pg hops to. -Ben On Tue, Mar 8, 2016

[ceph-users] Ceph Recovery Assistance, pgs stuck peering

2016-03-07 Thread Ben Hines
ncorrect crushmaps caused ceph to put some data on the wrong osds, resulting in a peering failure later when the map repaired itself? - How does ceph determine what node an OSD is on? That process may be periodically failing due to some issue. (dns?) - P

Re: [ceph-users] abort slow requests ?

2016-03-04 Thread Ben Hines
Thanks, working on fixing the peering objects. Going to attempt a recovery on the bad pgs tomorrow. The corrupt OSD which they were on was marked 'lost' so i expected it wouldn't try to peer with it anymore. Anyway I do have the data, at least. -Ben On Fri, Mar 4, 2016 a

[ceph-users] abort slow requests ?

2016-03-03 Thread Ben Hines
'slow requests' once they get to certain amount of time? Rather than building up and blocking everything.. -Ben ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[ceph-users] radosgw refuses to initialize / waiting for peered 'notify' object

2016-03-02 Thread Ben Hines
/ceph/osd/ceph-99/current/4.95_head On 64, that dir is empty. We had one osd which went bad and was removed, which was involved in this pg. Any next steps here? Are these 'notify' objects safe to nuke? I tried a repair/scrub on it, didnt seem to have an effect or log anywhere. Any assistance is appreciated... -Ben ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[ceph-users] radosgw flush_read_list(): d->client_c->handle_data() returned -5

2016-02-24 Thread Ben Hines
s log = False rgw frontends = civetweb port=8080 num_threads=75 error_log_file=/var/log/ceph/civetweb.error.log access_log_file=/var/log/ceph/civetweb.access.log rgw num rados handles = 10 rgw cache lru size = 3 debug civetweb = 10 rgw override bucke

[ceph-users] incorrect numbers in ceph osd pool stats

2016-02-18 Thread Ben Hines
{"pool_name":".rgw.buckets","pool_id":12,"recovery":{"degraded_objects":4,"degraded_total":83,"degraded_ratio":0.048193,"misplaced_objects":18446744073709551373,"misplaced_total":83,"misplaced_ratio":-2

Re: [ceph-users] Upgrading Ceph

2016-02-01 Thread Ben Hines
d one once. -Ben On Wed, Jan 27, 2016 at 6:00 AM, Vlad Blando wrote: > Hi, > > I have a production Ceph Cluster > - 3 nodes > - 3 mons on each nodes > - 9 OSD @ 4TB per node > - using ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6) > > ​Now I want to upgrade

[ceph-users] RGW Civetweb + CentOS7 boto errors

2016-01-29 Thread Ben Hines
there is some sysctl setting or other tuning that I should be using. I may try going back to apache + fastcgi as an experiment (if it still works with Infernalis?) thanks, -Ben ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.c
