Re: [ceph-users] mon problem after power failure

2015-01-09 Thread Joao Eduardo Luis
On 01/09/2015 04:31 PM, Jeff wrote: We had a power failure last night and our five node cluster has two nodes with mon's that fail to start. Here's what we see: # /usr/bin/ceph-mon --cluster=ceph -i ceph2 -f 2015-01-09 11:28:45.579267 b6c10740 -1 ERROR: on disk data includes unsupported featu

Re: [ceph-users] Ceph as backend for Swift

2015-01-09 Thread Mark Kirkwood
It is not too difficult to get going, once you add various patches so it works: - missing __init__.py - Allow to set ceph.conf - Fix write issue: ioctx.write() does not return the written length - Add param to async_update call (for swift in Juno) There are a number of forks/pulls etc for these
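The write-length patch can be pictured as a thin wrapper. If `ioctx.write()` reports success without a byte count, the caller computes the length itself. This is a hypothetical sketch: `FakeIoctx` and `write_reporting_length` are illustrative names, not part of librados or the Swift backend. It also assumes that write errors surface as exceptions, as they do in the python-rados bindings.

```python
class FakeIoctx:
    """Hypothetical in-memory stand-in for a RADOS I/O context.

    Matching the behaviour described in the post, write() returns 0 on
    success rather than the number of bytes written.
    """

    def __init__(self):
        self.objects = {}

    def write(self, key, data, offset=0):
        buf = bytearray(self.objects.get(key, b""))
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # gap before offset reads back as zeros
        buf[offset:offset + len(data)] = data  # overwrite or extend in place
        self.objects[key] = bytes(buf)
        return 0


def write_reporting_length(ioctx, key, data, offset=0):
    """Write data and return its length, since the binding does not."""
    ioctx.write(key, data, offset)  # assumed to raise on failure
    return len(data)


if __name__ == "__main__":
    io = FakeIoctx()
    print(write_reporting_length(io, "obj", b"hello"))  # 5
```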

Re: [ceph-users] cephfs modification time

2015-01-09 Thread Lorieri
The first 3 stat commands show blocks and size changing, but not the times. After a touch, it changes and tail works. I saw some cephfs freezes related to it; it came back after touching the files. coreos2 logs: # stat deis-router.log File: 'deis-router.log' Size: 148564 Blocks: 291 IO Blo

[ceph-users] cephfs modification time

2015-01-09 Thread Lorieri
Hi, I have a program that tails a file, and this file is created on another machine. Some tail programs do not work because the modification time is not updated on the remote machines. I've found this old thread http://permalink.gmane.org/gmane.comp.file-systems.ceph.devel/11001; it mentions the pr
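One way to follow a file written on another client, without relying on the modification time that the thread says is not updated remotely, is to poll the file size and read whatever was appended. A minimal sketch under that assumption; `read_new_data` and `demo` are hypothetical names, and truncation handling is simplified to "start over".

```python
import os
import tempfile


def read_new_data(path, offset):
    """Return (new_bytes, new_offset), deciding by file size, not mtime."""
    size = os.path.getsize(path)
    if size < offset:  # file was truncated or replaced: start from the beginning
        offset = 0
    if size == offset:
        return b"", offset
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(size - offset)
    return data, offset + len(data)


def demo():
    """Append to a temp file twice and poll it three times."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "app.log")
        with open(path, "wb") as f:
            f.write(b"first\n")
        chunk1, off = read_new_data(path, 0)
        with open(path, "ab") as f:
            f.write(b"second\n")
        chunk2, off = read_new_data(path, off)
        chunk3, off = read_new_data(path, off)  # nothing new since last poll
        return chunk1, chunk2, chunk3, off


if __name__ == "__main__":
    print(demo())
```

In a real follower, `read_new_data` would be called in a loop with a short `time.sleep()` between polls.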

Re: [ceph-users] RHEL 7 Installs

2015-01-09 Thread Travis Rhoden
Hi John, For the last part, there being two different versions of packages in Giant, I don't think that's the actual problem. What's really happening there is that python-ceph has been obsoleted by other packages that are getting picked up by Yum. See the line that says "Package python-ceph is o

[ceph-users] RHEL 7 Installs

2015-01-09 Thread John Wilkins
Ken, I had a number of issues installing Ceph on RHEL 7, which I think are mostly due to dependencies. I followed the quick start guide, which gets the latest major release--e.g., Firefly, Giant. ceph.conf is here: http://goo.gl/LNjFp3 ceph.log common errors included: http://goo.gl/yL8UsM To res

Re: [ceph-users] backfill_toofull, but OSDs not full

2015-01-09 Thread Udo Lembke
Hi, I had a similar effect two weeks ago: 1 PG in backfill_toofull. Due to reweighting and deletion there was enough free space, but the rebuild process stopped after a while. After stopping and starting ceph on the second node, the rebuild process ran without trouble and the backfill_toofull was gone. Thi

Re: [ceph-users] backfill_toofull, but OSDs not full

2015-01-09 Thread c3
In this case the root cause was half-denied reservations. http://tracker.ceph.com/issues/9626 This stopped backfills, since those listed as backfilling were actually half denied and doing nothing. The toofull status is not checked until a backfill slot becomes free, so everything was just

[ceph-users] Ceph configuration on multiple public networks.

2015-01-09 Thread J-P Methot
Hi, We've set up ceph and openstack on a fairly peculiar network configuration (or at least I think it is) and I'm looking for information on how to make it work properly. Basically, we have 3 networks: a management network, a storage network and a cluster network. The management network is o
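Ceph's `public network` option accepts a comma-separated list of subnets, so one thing worth checking is whether each daemon ends up bound to an address inside one of the listed subnets. The sketch below is a simplified, hypothetical model of that matching step (`pick_public_addr` is an illustrative name, not Ceph's code) using Python's `ipaddress` module; the subnets and addresses are made up.

```python
import ipaddress


def pick_public_addr(public_network, local_addrs):
    """Return the first local address inside any subnet of a comma-separated list."""
    nets = [ipaddress.ip_network(n.strip()) for n in public_network.split(",") if n.strip()]
    for addr in local_addrs:
        ip = ipaddress.ip_address(addr)
        # Membership tests between IPv4 and IPv6 simply return False.
        if any(ip in net for net in nets):
            return addr
    return None


if __name__ == "__main__":
    # Hypothetical host with a management and a storage interface.
    print(pick_public_addr("10.10.0.0/24, 10.20.0.0/24", ["192.168.1.5", "10.20.0.15"]))
```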

Re: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-09 Thread Gregory Farnum
On Fri, Jan 9, 2015 at 2:00 AM, Nico Schottelius wrote: > Lionel, Christian, > > we do have the exactly same trouble as Christian, > namely > > Christian Eichelmann [Fri, Jan 09, 2015 at 10:43:20AM +0100]: >> We still don't know what caused this specific error... > > and > >> ...there is currently

Re: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-09 Thread Robert LeBlanc
On Fri, Jan 9, 2015 at 3:00 AM, Nico Schottelius wrote: > Even though I do not like the fact that we lost a pg for > an unknown reason, I would prefer ceph to handle that case to recover to > the best possible situation. > > Namely I wonder if we can integrate a tool that shows > which (parts of)

Re: [ceph-users] backfill_toofull, but OSDs not full

2015-01-09 Thread Craig Lewis
What was the osd_backfill_full_ratio? That's the config that controls backfill_toofull. By default, it's 85%. The mon_osd_*_ratio settings affect the ceph status. I've noticed that it takes a while for backfilling to restart after changing osd_backfill_full_ratio. Backfilling usually restarts for me in
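To see which OSDs would trip `osd_backfill_full_ratio` (85% by default, per the post), compare each OSD's utilization against the ratio. A minimal sketch: the usage figures are hypothetical, and the strict `>` comparison is an assumption rather than a statement about Ceph's exact check.

```python
def backfill_toofull_osds(osd_usage, backfill_full_ratio=0.85):
    """Return the OSD ids whose utilization exceeds the backfill threshold.

    osd_usage maps osd_id -> (used_bytes, total_bytes). An OSD above the
    ratio would refuse incoming backfill, leaving PGs in backfill_toofull.
    """
    return sorted(
        osd for osd, (used, total) in osd_usage.items()
        if used / total > backfill_full_ratio  # strict '>' is an assumption
    )


if __name__ == "__main__":
    usage = {0: (700, 1000), 1: (870, 1000), 2: (820, 1000)}
    print(backfill_toofull_osds(usage))        # [1]
    print(backfill_toofull_osds(usage, 0.80))  # [1, 2]
```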

Re: [ceph-users] ceph on peta scale

2015-01-09 Thread Gregory Farnum
On Thu, Jan 8, 2015 at 5:46 AM, Zeeshan Ali Shah wrote: > I just finished configuring ceph up to 100 TB with openstack ... Since we > are also using Lustre in our HPC machines , just wondering what is the > bottle neck in ceph going on Peta Scale like Lustre . > > any idea ? or someone tried it I

Re: [ceph-users] Documentation of ceph pg query

2015-01-09 Thread Gregory Farnum
On Fri, Jan 9, 2015 at 1:24 AM, Christian Eichelmann wrote: > Hi all, > > as mentioned last year, our ceph cluster is still broken and unusable. > We are still investigating what has happened and I am taking more deep > looks into the output of ceph pg query. > > The problem is that I can find so

Re: [ceph-users] Uniform distribution

2015-01-09 Thread Gregory Farnum
100GB objects (or ~40 on a hard drive!) are way too large for you to get an effective random distribution. -Greg On Thu, Jan 8, 2015 at 5:25 PM, Mark Nelson wrote: > On 01/08/2015 03:35 PM, Michael J Brewer wrote: >> >> Hi all, >> >> I'm working on filling a cluster to near capacity for testing p
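Greg's point can be checked with a quick simulation: when each OSD holds only a few dozen objects, random placement leaves the fullest OSD well above the average, and the gap shrinks as the object count grows. This sketch uses Python's `random` module as a stand-in for CRUSH's pseudo-random hashing and ignores weights and failure domains; the OSD and object counts are illustrative.

```python
import random


def max_to_mean(num_objects, num_osds, seed=0):
    """Place objects uniformly at random and return max(count) / mean(count).

    A ratio near 1.0 means an even spread; larger values mean the fullest
    OSD holds noticeably more than its share.
    """
    rng = random.Random(seed)
    counts = [0] * num_osds
    for _ in range(num_objects):
        counts[rng.randrange(num_osds)] += 1
    return max(counts) / (num_objects / num_osds)


if __name__ == "__main__":
    osds = 10
    for per_osd in (40, 400, 40_000):
        print(per_osd, "objects/OSD ->", round(max_to_mean(per_osd * osds, osds), 3))
```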

Re: [ceph-users] Documentation of ceph pg query

2015-01-09 Thread John Wilkins
Have you looked at http://ceph.com/docs/master/rados/operations/monitoring-osd-pg/ http://ceph.com/docs/master/rados/operations/pg-states/ http://ceph.com/docs/master/rados/operations/pg-concepts/ On Fri, Jan 9, 2015 at 1:24 AM, Christian Eichelmann wrote: > Hi all, > > as mentioned last year, o

Re: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-09 Thread Jiri Kanicky
Hi Nico, I would probably recommend upgrading to 0.87 (giant). I have been running this version for some time now and it works very well. I also upgraded from firefly and it was easy. The issue you are experiencing seems quite complex and would require debug logs to troubleshoot. Apology that

Re: [ceph-users] rbd directory listing performance issues

2015-01-09 Thread Shain Miley
Although it seems like having a regularly scheduled cron job do a recursive directory listing may be OK for us as a bit of a workaround, I am still in the process of trying to improve performance. A few other questions have come up as a result. a) I am in the process of looking at specs
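The cron-job workaround amounts to periodically walking the whole tree. The assumption behind it (not confirmed in the post) is that visiting every entry keeps directory metadata warm in the client's caches so interactive listings are faster. A minimal sketch; `walk_count` and `demo` are hypothetical names and the mount path is illustrative.

```python
import os
import sys
import tempfile


def walk_count(root):
    """Recursively list root (like `ls -R`) and return (dirs, files) seen.

    Visiting every entry is assumed to refresh the client's metadata caches.
    """
    dirs = files = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return dirs, files


def demo():
    """Build a tiny tree in a temp dir and walk it."""
    with tempfile.TemporaryDirectory() as d:
        os.makedirs(os.path.join(d, "a", "b"))
        for rel in ("top.txt", os.path.join("a", "x.txt"), os.path.join("a", "b", "y.txt")):
            with open(os.path.join(d, rel), "w") as f:
                f.write("data\n")
        return walk_count(d)


if __name__ == "__main__":
    # Example: python3 warm_listing.py /mnt/rbd-volume   (path is hypothetical)
    print(walk_count(sys.argv[1]) if len(sys.argv) > 1 else demo())
```

A crontab entry would then run the script against the mounted volume at whatever interval keeps listings fast.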

Re: [ceph-users] Slow/Hung IOs

2015-01-09 Thread Craig Lewis
It doesn't seem like the problem here, but I've noticed that slow OSDs have a large fan-out. I have fewer than 100 OSDs, so every OSD talks to every other OSD in my cluster. I was getting slow notices from all of my OSDs. Nothing jumped out, so I started looking at disk write latency graphs. I no
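Because every OSD talks to every other OSD in a cluster this size, one slow disk produces slow-request notices everywhere, so the useful signal is which disk's write latency stands out from the rest. A minimal sketch of that comparison: the latency numbers are hypothetical and the 3x-median threshold is an arbitrary choice, not a Ceph setting.

```python
import statistics


def slow_osds(write_latency_ms, factor=3.0):
    """Flag OSDs whose write latency exceeds `factor` times the median.

    write_latency_ms maps osd_id -> recent average write latency in ms.
    The median is used so one very slow OSD does not skew the baseline.
    """
    baseline = statistics.median(write_latency_ms.values())
    return sorted(osd for osd, lat in write_latency_ms.items() if lat > factor * baseline)


if __name__ == "__main__":
    print(slow_osds({0: 5.0, 1: 6.5, 2: 48.0, 3: 5.5, 4: 7.0}))  # [2]
```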

Re: [ceph-users] Uniform distribution

2015-01-09 Thread Mark Nelson
I didn't actually calculate the per-OSD object density but yes, I agree that will hurt. On 01/09/2015 12:09 PM, Gregory Farnum wrote: 100GB objects (or ~40 on a hard drive!) are way too large for you to get an effective random distribution. -Greg On Thu, Jan 8, 2015 at 5:25 PM, Mark Nelson wr

[ceph-users] mon problem after power failure

2015-01-09 Thread Jeff
We had a power failure last night and our five node cluster has two nodes with mon's that fail to start. Here's what we see: # /usr/bin/ceph-mon --cluster=ceph -i ceph2 -f 2015-01-09 11:28:45.579267 b6c10740 -1 ERROR: on disk data includes unsupported features: compat={},rocompat={},incompat={6
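The error lists on-disk features that this ceph-mon binary does not recognize, grouped into compat, rocompat and incompat classes. The sketch below models the usual three-class rule: unknown compat features can be ignored, unknown rocompat features prevent writing, and unknown incompat features prevent opening the store at all. The feature ids are hypothetical and this is an illustration of the idea, not Ceph's `CompatSet` implementation.

```python
FEATURE_CLASSES = ("compat", "rocompat", "incompat")


def unsupported(on_disk, supported):
    """Per feature class, return on-disk feature ids this binary does not know."""
    return {
        cls: sorted(set(on_disk.get(cls, ())) - set(supported.get(cls, ())))
        for cls in FEATURE_CLASSES
    }


def can_open(on_disk, supported, for_write=True):
    missing = unsupported(on_disk, supported)
    if missing["incompat"]:
        return False  # unknown incompat features: refuse to read or write
    if for_write and missing["rocompat"]:
        return False  # unknown rocompat features: read-only access at most
    return True  # unknown compat features are safe to ignore


if __name__ == "__main__":
    disk = {"compat": [1], "incompat": [1, 2, 3]}
    binary = {"compat": [1], "incompat": [1, 2]}
    print(unsupported(disk, binary))  # {'compat': [], 'rocompat': [], 'incompat': [3]}
    print(can_open(disk, binary))     # False
```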

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2015-01-09 Thread Andrey Korolyov
On Fri, Jan 9, 2015 at 7:17 AM, Robert LeBlanc wrote: > Protect against bit rot. Checked on read and on deep scrub. There are still issues (at least in firefly) with FDCache and scrub completion having corrupted on-disk data, so thorough checksumming will not cover every possible corruption cas
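"Checked on read and on deep scrub" describes a store-a-checksum-then-verify pattern, and Andrey's caveat follows from it: a checksum only proves the bytes are unchanged since it was computed. A minimal sketch; Ceph uses CRC32C, while this uses `zlib.crc32` from the Python standard library as a stand-in, and the function names are hypothetical.

```python
import zlib


def write_block(data):
    """Store data together with a CRC computed at write time."""
    return data, zlib.crc32(data)


def read_block(data, stored_crc):
    """Verify the CRC on read (or during a scrub) before returning data."""
    if zlib.crc32(data) != stored_crc:
        raise ValueError("checksum mismatch: on-disk data changed since write")
    return data


if __name__ == "__main__":
    data, crc = write_block(b"object payload")
    print(read_block(data, crc))
    # Caveat: data that was already corrupt when checksummed still verifies.
    bad, bad_crc = write_block(b"already corrupt")
    print(read_block(bad, bad_crc))
```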

[ceph-users] Documentation of ceph pg query

2015-01-09 Thread Christian Eichelmann
Hi all, as mentioned last year, our ceph cluster is still broken and unusable. We are still investigating what has happened and I am taking a deeper look into the output of ceph pg query. The problem is that I can find some information about what some of the sections mean, but mostly I can on
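Since `ceph pg <pgid> query` prints JSON, a first pass can pull out the top-level state, the up and acting OSD sets, and the chain of peering state names under `recovery_state`. A sketch under stated assumptions: the sample below is abridged and hypothetical, real output has many more fields that vary between releases, and `summarize_pg_query` is an illustrative name.

```python
import json

# Abridged, hypothetical output in the shape of `ceph pg <pgid> query`.
SAMPLE = """
{
  "state": "incomplete",
  "up": [12, 40, 7],
  "acting": [12, 40, 7],
  "recovery_state": [
    {"name": "Started/Primary/Peering/Incomplete"},
    {"name": "Started"}
  ]
}
"""


def summarize_pg_query(text):
    """Pull out the fields most useful for a first look at a stuck PG."""
    q = json.loads(text)
    return {
        "state": q.get("state"),
        "up": q.get("up", []),
        "acting": q.get("acting", []),
        # In this sample the most specific state name comes first.
        "recovery_path": [s.get("name") for s in q.get("recovery_state", [])],
    }


if __name__ == "__main__":
    print(summarize_pg_query(SAMPLE))
```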

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2015-01-09 Thread Christian Balzer
On Thu, 8 Jan 2015 21:17:12 -0700 Robert LeBlanc wrote: > On Thu, Jan 8, 2015 at 8:31 PM, Christian Balzer wrote: > > On Thu, 8 Jan 2015 11:41:37 -0700 Robert LeBlanc wrote: > > Which of course currently means a strongly consistent lockup in these > > scenarios. ^o^ > > That is one way of puttin

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2015-01-09 Thread Robert LeBlanc
On Thu, Jan 8, 2015 at 8:31 PM, Christian Balzer wrote: > On Thu, 8 Jan 2015 11:41:37 -0700 Robert LeBlanc wrote: > Which of course currently means a strongly consistent lockup in these > scenarios. ^o^ That is one way of putting it > Slightly off-topic and snarky, that strong consistency is of

Re: [ceph-users] Ceph Minimum Cluster Install (ARM)

2015-01-09 Thread Christian Balzer
On Thu, 8 Jan 2015 01:35:03 + Garg, Pankaj wrote: > Hi, > I am trying to get a very minimal Ceph cluster up and running (on ARM) > and I'm wondering what is the smallest unit that I can run rados-bench > on ? Documentation at > (http://ceph.com/docs/next/start/quick-ceph-deploy/) seems to refe

Re: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-09 Thread Nico Schottelius
Good morning Jiri, sure, let me catch up on this: - Kernel 3.16 - ceph: 0.80.7 - fs: xfs - os: debian (backports) (1x)/ubuntu (2x) Cheers, Nico Jiri Kanicky [Fri, Jan 09, 2015 at 10:44:33AM +1100]: > Hi Nico. > > If you are experiencing such issues it would be good if you provide more info >

Re: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-09 Thread Nico Schottelius
Lionel, Christian, we have exactly the same trouble as Christian, namely Christian Eichelmann [Fri, Jan 09, 2015 at 10:43:20AM +0100]: > We still don't know what caused this specific error... and > ...there is currently no way to make ceph forget about the data of this pg > and create it as

[ceph-users] question about S3 multipart upload ignores request headers

2015-01-09 Thread baijia...@126.com
I applied the patch from http://tracker.ceph.com/issues/8452, ran the s3 test suite, and still get an error; error log: ERROR: failed to get obj attrs, obj=test-client.0-31zepqoawd8dxfa-212:_multipart_mymultipart.2/0IQGoJ7hG8ZtTyfAnglChBO79HUsjeC.meta ret=-2 I found code that may have a problem: when the function executes "re

Re: [ceph-users] PG num calculator live on Ceph.com

2015-01-09 Thread Irek Fasikhov
Very very good :) Fri, 9 Jan 2015, 2:17, William Bloom (wibloom) : > Awesome, thanks Michael. > > > > Regards > > William > > > > *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf > Of *Michael J. Kidd > *Sent:* Wednesday, January 07, 2015 2:09 PM > *To:* ceph-us...@ceph.c

Re: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-09 Thread Christian Eichelmann
Hi Lionel, we have a ceph cluster with about 1PB in total, 12 OSDs with 60 Disks, divided into 4 racks in 2 rooms, all connected with a dedicated 10G cluster network. Of course with a replication level of 3. We did about 9 months of intensive testing. Just like you, we never experienced that kind

Re: [ceph-users] Erasure coded PGs incomplete

2015-01-09 Thread Nick Fisk
Hi Italo, If you check for a post from me from a couple of days back, I have done exactly this. I created a k=5 m=3 over 4 hosts. This ensured that I could lose a whole host and then an OSD on another host and the cluster was still fully operational. I’m not sure if my method I used i
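The k=5 m=3 layout over 4 hosts works out as follows: 8 chunks spread 2 per host, so losing a host removes 2 chunks and one more OSD removes a third, which is exactly m. A worst-case arithmetic sketch; `ec_survives` is a hypothetical helper, and it assumes a CRUSH rule that spreads chunks as evenly as possible (the rule itself is not shown in the post).

```python
import math


def ec_survives(k, m, hosts, hosts_lost=0, extra_osds_lost=0):
    """Check whether a k+m erasure-coded PG still has at least k chunks.

    Assumes at most ceil((k+m)/hosts) chunks per host and, as a worst case,
    that each extra OSD failure on another host removes one more chunk.
    """
    per_host = math.ceil((k + m) / hosts)
    lost = hosts_lost * per_host + extra_osds_lost
    return (k + m) - lost >= k


if __name__ == "__main__":
    print(ec_survives(5, 3, 4, hosts_lost=1, extra_osds_lost=1))  # True: 8 - 3 = 5 chunks left
    print(ec_survives(5, 3, 4, hosts_lost=2))                     # False: 8 - 4 = 4 < k
```

Recoverable data is not quite the same as full operation: whether client I/O continues also depends on the pool's `min_size`.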