Re: [ceph-users] logrotate

2014-07-11 Thread James Eckersall
Upon further investigation, it looks like this part of the ceph logrotate script is causing me the problem: if [ -e "/var/lib/ceph/$daemon/$f/done" ] && [ -e "/var/lib/ceph/$daemon/$f/upstart" ] && [ ! -e "/var/lib/ceph/$daemon/$f/sysvinit" ]; then I don't have a "done" file in the mounted direct
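
For context, a minimal shell sketch (assuming the stock Firefly layout under /var/lib/ceph; the daemon subdirectory names are placeholders) that applies the same test and reports which daemon directories would be skipped:

    # Report which daemon dirs lack the markers the logrotate script checks for.
    for daemon in osd mon mds; do
        for f in /var/lib/ceph/$daemon/*; do
            [ -d "$f" ] || continue
            if [ -e "$f/done" ] && [ -e "$f/upstart" ] && [ ! -e "$f/sysvinit" ]; then
                echo "rotated: $f"
            else
                echo "skipped: $f (missing 'done'/'upstart', or has 'sysvinit')"
            fi
        done
    done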

[ceph-users] question about crushmap

2014-07-11 Thread Simone Spinelli
Dear list, we are new to ceph and we are planning to install a ceph cluster over two datacenters. The situation is: DC1: 2 racks, DC2: 1 rack. We want to have one replica per rack, and more generally two replicas in the first DC and one in the other. So now we are stuck on the crushmap: how to

Re: [ceph-users] question about crushmap

2014-07-11 Thread Robert van Leeuwen
> We want to have one replica per rack and more generally two replicas in > the first DC and one in the other one. > So now we are stuck on the crushmap: how to force the cluster to put two > replicas in the first dc? > Is that related to the bucket's weight? You can fix that in the crush map bucke
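
For anyone hitting the same question, a sketch of the usual approach: decompile the CRUSH map, add a rule that takes two replicas from the first datacenter and one from the second, recompile, and inject it. The bucket names dc1/dc2 and the rack type below are assumptions, not taken from the actual map:

    # Fetch and decompile the current CRUSH map.
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt

    # Edit crush.txt and add a rule along these lines:
    #
    #   rule two_dc1_one_dc2 {
    #           ruleset 2
    #           type replicated
    #           min_size 3
    #           max_size 3
    #           step take dc1
    #           step chooseleaf firstn 2 type rack
    #           step emit
    #           step take dc2
    #           step chooseleaf firstn 1 type rack
    #           step emit
    #   }

    # Recompile, sanity-check the mappings, then inject the new map.
    crushtool -c crush.txt -o crush.new
    crushtool -i crush.new --test --rule 2 --num-rep 3 --show-mappings | head
    ceph osd setcrushmap -i crush.new

Pools then need to be pointed at the new rule, e.g. "ceph osd pool set <pool> crush_ruleset 2" on Firefly; the relative bucket weights only control how data spreads within each take step, not the 2+1 split itself.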

Re: [ceph-users] ceph mount not working anymore

2014-07-11 Thread Alfredo Deza
Joshua, it looks like you got Ceph from EPEL (that version has the '-2' slapped on it). That is why you are seeing this for ceph: ceph-0.80.1-2.el6.x86_64, and this for the others: libcephfs1-0.80.1-0.el6.x86_64. Make sure that you do get Ceph from our repos. Newer versions of ceph-deploy fix this b
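
A quick way to confirm where each package came from before reinstalling (repo ids are assumptions; check yum repolist for the real names):

    # Show installed versions of the ceph stack.
    rpm -qa | grep -Ei 'ceph|rados|rbd'

    # "From repo" in the yum output records which repository installed each one.
    yum info ceph libcephfs1 | grep -Ei '^(name|version|release|from repo)'

    # List the configured repositories (EPEL vs the ceph.com repo).
    yum repolist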

Re: [ceph-users] ceph mount not working anymore

2014-07-11 Thread Sage Weil
On Thu, 10 Jul 2014, Joshua McClintock wrote: >         { "rule_id": 1, >           "rule_name": "erasure-code", >           "ruleset": 1, >           "type": 3, The presence of the erasure code CRUSH rules is what is preventing the kernel client from mounting. Upgrade to a newer kernel (3.14 I
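
A short sketch of checking for (and, once no pool references it, removing) the erasure-code rule so older kernels can mount again; assumes admin CLI access:

    # List the CRUSH rules and dump the erasure-code one.
    ceph osd crush rule ls
    ceph osd crush rule dump erasure-code

    # Confirm no pool still uses the rule, then remove it (irreversible).
    ceph osd dump | grep -i pool
    ceph osd crush rule rm erasure-code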

Re: [ceph-users] logrotate

2014-07-11 Thread Sage Weil
On Fri, 11 Jul 2014, James Eckersall wrote: > Upon further investigation, it looks like this part of the ceph logrotate > script is causing me the problem: > > if [ -e "/var/lib/ceph/$daemon/$f/done" ] && [ -e > "/var/lib/ceph/$daemon/$f/upstart" ] && [ ! -e > "/var/lib/ceph/$daemon/$f/sysvinit" ]

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Randy Smith
On Thu, Jul 10, 2014 at 4:40 PM, Samuel Just wrote: > It could be an indication of a problem on osd 5, but the timing is > worrying. Can you attach your ceph.conf? Attached. > Have there been any osds > going down, new osds added, anything to cause recovery? I upgraded to firefly last week

Re: [ceph-users] logrotate

2014-07-11 Thread James Eckersall
Hi Sage, Many thanks for the info. I have inherited this cluster, but I believe it may have been created with mkcephfs rather than ceph-deploy. I'll touch the done files and see what happens. Looking at the logic in the logrotate script I'm sure this will resolve the problem. Thanks J On 11
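
A minimal sketch of that workaround, assuming the default /var/lib/ceph layout; only create the markers that match how the daemons are really started (upstart-managed daemons want 'upstart', sysvinit-managed ones want 'sysvinit'):

    # Create the missing 'done' marker in each daemon directory so the
    # logrotate test quoted earlier can match.
    for f in /var/lib/ceph/osd/* /var/lib/ceph/mon/* /var/lib/ceph/mds/*; do
        [ -d "$f" ] || continue
        touch "$f/done"
    done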

Re: [ceph-users] ceph mount not working anymore

2014-07-11 Thread Joshua McClintock
Thanks Sage! What had happened prior to me upgrading was that I added an erasure coded pool, but all my OSDs began to crash. The ec profile didn't seem to cause the crash, so I left it, but once I removed the pool, the crashes stopped. Do you guys want any of the core dumps, or is anything short

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Samuel Just
When you get the next inconsistency, can you copy the actual objects from the osd store trees and get them to us? That might provide a clue. -Sam On Fri, Jul 11, 2014 at 6:52 AM, Randy Smith wrote: > > > > On Thu, Jul 10, 2014 at 4:40 PM, Samuel Just wrote: >> >> It could be an indication of a

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Sage Weil
One other thing we might also try is catching this earlier (on first read of corrupt data) instead of waiting for scrub. If you are not super performance sensitive, you can add: filestore sloppy crc = true, filestore sloppy crc block size = 524288. That will track and verify CRCs on any large (
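
For reference, a sketch of where those options go; the [osd] section and the sysvinit restart are assumptions, and a restart is needed for filestore options to take effect:

    # Append to /etc/ceph/ceph.conf on each OSD host:
    #
    #   [osd]
    #       filestore sloppy crc = true
    #       filestore sloppy crc block size = 524288
    #
    # ...then restart the OSDs (upstart equivalent where applicable).
    service ceph restart osd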

Re: [ceph-users] ceph mount not working anymore

2014-07-11 Thread Joshua McClintock
Hello Alfredo, isn't this what the 'ceph-release-1-0.el6.noarch' package is for in my rpm -qa list? Here are the yum repo files I have in /etc/yum.repos.d. I don't see any priorities in the ceph one which is where libcephfs1 comes from I think. I tried to 'yum reinstall ceph-release', but the fi
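
A sketch of the repo pinning being discussed, assuming the ceph.com repo file is /etc/yum.repos.d/ceph.repo and that the priorities plugin is acceptable:

    # Install the priorities plugin and give the ceph.com repo priority 1
    # (lower number wins; EPEL defaults to 99), so its packages beat EPEL's.
    yum install -y yum-plugin-priorities
    sed -i '/^\[/a priority=1' /etc/yum.repos.d/ceph.repo

    # Check which repo now wins for the ceph packages.
    yum list ceph --showduplicates

Newer ceph-deploy releases are understood to set this priority up automatically, which lines up with the note earlier in the thread.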

Re: [ceph-users] ceph mount not working anymore

2014-07-11 Thread Tregaron Bayly
"Ceph is short of Cephalopod, a class of mollusks that includes the octopus, squid, and cuttlefish" http://community.redhat.com/blog/2014/06/ceph-turns-10-a-look-back/ On Fri, 2014-07-11 at 10:48 -0700, Tuite, John E. wrote: > Is Ceph an acronym? If yes, what? > > > > John Tuite > > Corpor

Re: [ceph-users] ceph mount not working anymore

2014-07-11 Thread Tuite, John E.
thanks John Tuite Corporate Global Infrastructure Services Pittsburgh Manager, Information Technology Global Hosting Services Thermo Fisher Scientific 600 Business Center Drive Pittsburgh, Pennsylvania 15205 Office 412-490-7292 Mobile 412-897-3401 Fax 412-490-9401 john.tu...@thermofisher.com http:

[ceph-users] latency for asynchronous read

2014-07-11 Thread Shayan Saeed
Hi, I am using the librados python api to do a lot of rapid reads on objects. I am using a callback function def oncomplete(self, completion,data_read) which is called whenever an object has been read. Is there a way to identify which object has completed the read request? I want to measure the

[ceph-users] Way to lower latencies while recovering?

2014-07-11 Thread Stefan Priebe
Hello, while recovery is running, my virtual machine latencies using qemu rbd go up from 3-7ms to 250ms-350ms, so all applications inside the VMs are slow. I already have these set: osd_recovery_max_active = 1, osd_max_backfills = 1, osd_recovery_op_priority = 5, osd_recover_clone_
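
For comparison, a sketch of pushing those same knobs at runtime; the values are illustrative, and injectargs changes do not survive a daemon restart, so they should be mirrored in ceph.conf:

    # Lower recovery/backfill concurrency and drop recovery op priority further
    # (lower osd_recovery_op_priority means recovery yields more to client I/O).
    ceph tell osd.* injectargs '--osd-recovery-max-active 1 --osd-max-backfills 1 --osd-recovery-op-priority 1'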

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Samuel Just
Also, what filesystem are you using? -Sam On Fri, Jul 11, 2014 at 10:37 AM, Sage Weil wrote: > One other thing we might also try is catching this earlier (on first read > of corrupt data) instead of waiting for scrub. If you are not super > performance sensitive, you can add > > filestore slopp

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Randy Smith
Greetings, I'm using xfs. Also, when, in a previous email, you asked if I could send the object, do you mean the files from each server named something like this: ./3.c6_head/DIR_6/DIR_C/DIR_5/rb.0.b0ce3.238e1f29.000b__head_34DC35C6__3 ? On Fri, Jul 11, 2014 at 2:00 PM, Samuel Just wr

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Samuel Just
Right. -Sam On Fri, Jul 11, 2014 at 2:05 PM, Randy Smith wrote: > Greetings, > > I'm using xfs. > > Also, when, in a previous email, you asked if I could send the object, do > you mean the files from each server named something like this: > ./3.c6_head/DIR_6/DIR_C/DIR_5/rb.0.b0ce3.238e1f29.00

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Samuel Just
And grab the xattrs as well. -Sam On Fri, Jul 11, 2014 at 2:39 PM, Samuel Just wrote: > Right. > -Sam > > On Fri, Jul 11, 2014 at 2:05 PM, Randy Smith wrote: >> Greetings, >> >> I'm using xfs. >> >> Also, when, in a previous email, you asked if I could send the object, do >> you mean the files f
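
A sketch of gathering one replica's copy plus its xattrs; the path reuses the example filename from the previous message, and the osd id and mount point are assumptions:

    # Run as root on each OSD host holding a replica of pg 3.c6.
    obj=/var/lib/ceph/osd/ceph-5/current/3.c6_head/DIR_6/DIR_C/DIR_5/rb.0.b0ce3.238e1f29.000b__head_34DC35C6__3
    cp -a "$obj" /tmp/osd5-object-copy
    getfattr -d -m - "$obj" > /tmp/osd5-object-xattrs   # dump all xattr namespaces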

[ceph-users] Chatty OSD host to monitor

2014-07-11 Thread Tregaron Bayly
I have a four node ceph cluster for testing. As I'm watching the relatively idle cluster I'm seeing quite a bit of traffic from one of the OSD nodes to the monitor. This node has 8 OSDs and each of them is involved in this behavior, but none of the other 24 OSDs located on the other nodes are.

[ceph-users] v0.80.3 released

2014-07-11 Thread Sage Weil
v0.80.3 Firefly: This is the third Firefly point release. It includes a single fix for a radosgw regression that was discovered in v0.80.2 right after it was released. We recommend that all v0.80.x Firefly users upgrade. Notable changes: * radosgw: fix regression
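
A minimal sketch of the point-release upgrade on RPM-based hosts; the package names and sysvinit commands are assumptions, and the usual order for point releases is monitors first, then OSDs, one host at a time:

    # Refresh metadata, pull the 0.80.3 packages, and restart daemons.
    yum clean metadata
    yum update -y ceph ceph-radosgw
    service ceph restart mon
    service ceph restart osd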

Re: [ceph-users] qemu image create failed

2014-07-11 Thread Yonghua Peng
Anybody know about this issue? Thanks. Fri, 11 Jul 2014 10:26:47 +0800 from Yonghua Peng : >Hi, > >I tried to create a qemu image, but it failed. > >ceph@ceph:~/my-cluster$ qemu-img create -f rbd rbd:rbd/qemu 2G >Formatting 'rbd:rbd/qemu', fmt=rbd size=2147483648 cluster_size=0 >qemu-img: error co
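
The error text is cut off above, so only the usual suspects can be checked: whether this qemu-img build has rbd support at all, whether plain librbd access works, and whether qemu is picking up the right client id and ceph.conf (the id/conf values below are assumptions):

    # 1. Does this qemu-img build know about the rbd format?
    qemu-img --help | grep -i rbd

    # 2. Does plain librbd access to the pool work outside qemu?
    rbd -p rbd ls
    rbd create rbd/qemu-test --size 2048

    # 3. Retry with an explicit client id and ceph.conf in the rbd: URI.
    qemu-img create -f rbd 'rbd:rbd/qemu:id=admin:conf=/etc/ceph/ceph.conf' 2G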

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Randy Smith
Greetings, Well it happened again with two pgs this time, still in the same rbd image. They are at http://people.adams.edu/~rbsmith/osd.tar. I think I grabbed the files correctly. If not, let me know and I'll try again on the next failure. It certainly is happening often enough. On Fri, Jul 11,