[ceph-users] ceph durability calculation and test method

2017-06-13 Thread Z Will
Hi all: I have some questions about the durability of Ceph, which I am trying to measure. I know it should be related to host and disk failure probability, failure detection time, when recovery is triggered, and the recovery time. I use it with multiple replication, say k
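A back-of-envelope model along the lines being asked about (a sketch only: it assumes independent disk failures at a constant annualized rate f, ignores detection delay and correlated host failures, and takes N disks, replication factor k, and recovery time t_r in hours as inputs) is:

    P_{\text{loss/yr}} \approx N f \cdot \left(\frac{f\, t_r}{8760}\right)^{k-1} \cdot C

The first factor is the expected number of first-disk failures per year, the second is the chance that a given further replica also fails inside the recovery window (8760 hours per year), and C is a placement-dependent count of how many other disks actually share PGs with the failed one; C is where the CRUSH layout, detection time, and recovery speed enter the estimate.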

[ceph-users] v11.2.0 Disk activation issue while booting

2017-06-13 Thread nokia ceph
Hello, some OSDs are not getting activated after a reboot operation, which causes those OSDs to land in a failed state. Here you can see the mount points were not updated to the osd-num and were mounted at an incorrect mount point, so the OSDs can't be mounted/activated. Env:- RH
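For ceph-disk-managed OSDs of that vintage, activation after boot is driven by udev and the partition type GUIDs, so a manual re-trigger is a reasonable first check (a sketch; /dev/sdb1 is only a placeholder device):

    ceph-disk list                    # how ceph-disk classifies each partition
    ceph-disk activate /dev/sdb1      # re-activate a single data partition
    ceph-disk activate-all            # re-activate everything tagged as ceph data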

[ceph-users] Ceph Jewel XFS calltraces

2017-06-13 Thread list
Hello guys, we currently have an issue with our Ceph setup based on XFS. Sometimes nodes die under high load with this call trace in dmesg: [Tue Jun 13 13:18:48 2017] BUG: unable to handle kernel NULL pointer dereference at 00a0 [Tue Jun 13 13:18:48 2017] IP: [] xfs_da3_

Re: [ceph-users] osd_op_tp timeouts

2017-06-13 Thread Mark Nelson
Hi Tyler, I wanted to make sure you got a reply to this, but unfortunately I don't have much to give you. It sounds like you already took a look at the disk metrics, and Ceph is probably not waiting on disk IO based on your description. If you can easily invoke the problem, you could attach g
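For completeness, the gdb step being suggested usually looks like this (a sketch; 12345 stands in for the ceph-osd PID):

    # dump backtraces of every thread in the stuck OSD, without stopping it for long
    gdb -p 12345 -batch -ex 'thread apply all bt' > osd-threads.txt

The thread backtraces in the output show what the op worker threads are actually blocked on.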

Re: [ceph-users] v11.2.0 Disk activation issue while booting

2017-06-13 Thread David Turner
I came across this a few times. My problem was with journals I set up myself. I didn't give them the proper GUID partition type ID, so the udev rules didn't know how to make sure the partition looked correct. What the udev rules were unable to do was chown the journal block device as ceph:ceph
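Concretely, the fix for a hand-made journal is to stamp the partition with the standard Ceph journal type GUID so the udev rules recognize it on boot (a sketch; /dev/sdc is a placeholder, with the journal on partition 1):

    # 45b0969e-... is the well-known ceph-journal partition type code
    sgdisk --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdc
    # one-off workaround until the udev rules take over
    chown ceph:ceph /dev/sdc1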

Re: [ceph-users] osd_op_tp timeouts

2017-06-13 Thread Bryan Stillwell
Is this on an RGW cluster? If so, you might be running into the same problem I was seeing with large bucket sizes: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-June/018504.html The solution is to shard your buckets so the bucket index doesn't get too big. Bryan From: ceph-users o
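For new buckets, index sharding in that era was set cluster-wide in ceph.conf on the RGW nodes (a sketch; 128 shards is an arbitrary example value, and existing buckets need the separate reshard tooling that arrived around Luminous):

    [client.rgw]
    rgw_override_bucket_index_max_shards = 128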

[ceph-users] ceph pg repair : Error EACCES: access denied

2017-06-13 Thread Jake Grimmett
Dear All, I'm testing Luminous and have a problem repairing inconsistent PGs. This occurs with v12.0.2 and is still present with v12.0.3-1507-g52f0deb # ceph health HEALTH_ERR noout flag(s) set; 2 pgs inconsistent; 2 scrub errors # ceph health detail HEALTH_ERR noout flag(s) set; 2 pgs inconsist
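For readers hitting the same HEALTH_ERR, the standard sequence is to locate the damage and then ask the primary to repair it (a sketch; 1.2f is a placeholder PG id):

    rados list-inconsistent-pg <pool>                      # which PGs in a pool are inconsistent
    rados list-inconsistent-obj 1.2f --format=json-pretty  # which objects, and why
    ceph pg repair 1.2f

The EACCES in the subject occurs on that last command, before any repair work starts.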

Re: [ceph-users] Ceph Jewel XFS calltraces

2017-06-13 Thread Emmanuel Florac
On Tue, 13 Jun 2017 14:30:05 +0200, l...@jonas-server.de wrote: > [Tue Jun 13 13:18:48 2017] CPU: 3 PID: 3844 Comm: tp_fstore_op Not > tainted 4.4.0-75-generic #96-Ubuntu Looks like a kernel bug. However, this kernel isn't completely up to date; 4.4.0-79 is available. You'd probably be better off posting this o

Re: [ceph-users] ceph pg repair : Error EACCES: access denied

2017-06-13 Thread Gregory Farnum
What are the cephx permissions of the key you are using to issue repair commands? On Tue, Jun 13, 2017 at 8:31 AM Jake Grimmett wrote: > Dear All, > > I'm testing Luminous and have a problem repairing inconsistent pgs. This > occurs with v12.0.2 and is still present with v12.0.3-1507-g52f0deb > >
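A quick way to answer that question (client.admin is just the usual key name; substitute whichever key the CLI is using):

    ceph auth get client.admin       # prints the key's mon/mgr/osd caps

On Luminous these commands are dispatched through the mgr, so a key created before the upgrade that lacks mgr caps is a plausible source of the EACCES.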

Re: [ceph-users] Living with huge bucket sizes

2017-06-13 Thread Eric Choi
Hello all, I work on the same team as Tyler, and I can provide more info here. The cluster is indeed an RGW cluster, with many small (100 KB) objects, similar to your use case, Bryan. But we have the blind bucket set up with "index_type": 1 for this particular bucket, as we wanted to avoid
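For reference, the "blind" flavour is configured on the zone's placement target rather than per object; the relevant fragment of radosgw-admin zone get output looks roughly like this (a sketch, with pool names abbreviated):

    "placement_pools": [{
        "key": "default-placement",
        "val": {
            "index_pool": "default.rgw.buckets.index",
            "data_pool": "default.rgw.buckets.data",
            "index_type": 1
        }
    }]

index_type 1 means indexless: the bucket cannot be listed, but there is also no bucket index to contend on, which is the trade-off being described here.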

Re: [ceph-users] Effect of tunables on client system load

2017-06-13 Thread Gregory Farnum
On Thu, Jun 8, 2017 at 11:11 PM Nathanial Byrnes wrote: > Hi All, >First, some background: >I have been running a small (4 compute nodes) Xen server cluster > backed by both a small Ceph cluster (4 other nodes with a total of 18x 1-spindle > osd's) and a small Gluster cluster (2 nodes each with
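For anyone following along, the tunables profile in effect is easy to inspect before experimenting (a sketch; note that switching profiles triggers data movement):

    ceph osd crush show-tunables      # dump the active CRUSH tunables
    ceph osd crush tunables optimal   # switch profile; expect rebalancing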

Re: [ceph-users] osd_op_tp timeouts

2017-06-13 Thread Eric Choi
I realized I sent this under the wrong thread, so here I am sending it again: --- Hello all, I work on the same team as Tyler, and I can provide more info here. The cluster is indeed an RGW cluster, with many small (100 KB) objects, similar to your use case, Bryan. But we have the blind bucket se

Re: [ceph-users] Effect of tunables on client system load

2017-06-13 Thread Nathanial Byrnes
Thanks very much for the insights Greg! My most recent suspicion around the resource consumption is that, with my current configuration, Xen is provisioning rbd-nbd storage for guests, rather than just using the kernel module as I was last time around. And, (while I'm unsure of how this works) b
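For context, the two attachment paths differ exactly where client-side load shows up (a sketch; pool/image names are placeholders):

    rbd-nbd map rbd/guest-disk    # userspace: one rbd-nbd process per image -> /dev/nbd0
    rbd map rbd/guest-disk        # in-kernel krbd client -> /dev/rbd0

With rbd-nbd, all librbd work (caching, striping, checksums) runs in that per-image process, so guest I/O burns client-host CPU in a way the kernel module does not make as visible.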

Re: [ceph-users] Sparse file info in filestore not propagated to other OSDs

2017-06-13 Thread Paweł Sadowski
On 04/13/2017 04:23 PM, Piotr Dałek wrote: > On 04/06/2017 03:25 PM, Sage Weil wrote: >> On Thu, 6 Apr 2017, Piotr Dałek wrote: >>> Hello, >>> >>> We recently had an interesting issue with RBD images and filestore >>> on Jewel >>> 10.2.5: >>> We have a pool with RBD images, all of them mostly unt
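The quickest way to see the effect under discussion is to compare apparent size against allocated blocks for the same object file on two OSDs (a sketch; the path is a placeholder for a filestore object file):

    du -h --apparent-size /var/lib/ceph/osd/ceph-0/current/1.2f_head/<object>
    du -h /var/lib/ceph/osd/ceph-0/current/1.2f_head/<object>

On the OSD that took the sparse write the two numbers diverge; on a replica recovered without sparseness being propagated they match, i.e. the holes were filled with real zeroes.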