Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-10 Thread Somnath Roy
Hi Alexandre, Thanks for sharing the data. I need to try out the performance on qemu soon and may come back to you if I need some qemu setting trick :-) Regards Somnath

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-10 Thread Alexandre DERUMIER
>> I need to try out the performance on qemu soon and may come back to you if I need some qemu setting trick :-) Sure, no problem. (BTW, I can reach around 200k iops in 1 qemu vm with 5 virtio disks, with one iothread per disk.)

Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-10 Thread Dan van der Ster
This is a CRUSH misconception. Triple drive failures only cause data loss when they share a PG (e.g. ceph pg dump .. those [x,y,z] triples of OSDs are the only ones that matter). If you have very few OSDs, then it's possibly true that any combination of disks would lead to failure. But as you increa
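
Dan's point can be checked mechanically: with 3 replicas, a triple failure destroys data only if all three failed OSDs form the acting set of some PG. A minimal sketch in Python, assuming `ceph pg dump --format json` has been saved to a file and that the JSON carries a top-level "pg_stats" list with an "acting" array per PG (the exact layout varies a little between releases); the OSD ids in the example are hypothetical:

    import json

    def failure_loses_data(failed_osds, pg_dump_path="pg_dump.json"):
        # data is gone only if every acting OSD of some PG is among the failed ones
        failed = set(failed_osds)
        with open(pg_dump_path) as f:
            pg_stats = json.load(f)["pg_stats"]
        return any(set(pg["acting"]) <= failed for pg in pg_stats)

    print(failure_loses_data([12, 57, 103]))   # hypothetical OSD ids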

Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-10 Thread Christian Balzer
Hello, As always, this has been discussed in the past, with people taking various bits of "truth" from it. As for a precise failure model, the latest one is here: https://wiki.ceph.com/Development/Reliability_model/Final_report And there is this; last time it came up I felt it didn't take all th

Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-10 Thread Jan Schermer
When you increase the number of OSDs, you generally would (and should) increase the number of PGs. For us, the sweet spot for ~200 OSDs is 16384 PGs. An RBD volume that has xxx GiBs of data gets striped across many PGs, so the probability that the volume loses at least part of its data is very sign

Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-10 Thread Dan van der Ster
I'm not a mathematician, but I'm pretty sure there are 200 choose 3 = 1.3 million ways you can have 3 disks fail out of 200. nPGs = 16384 so that many combinations would cause data loss. So I think 1.2% of triple disk failures would lead to data loss. There might be another factor of 3! that needs
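
Dan's arithmetic, written out (numbers taken from his message; this is an upper bound because it assumes all 16384 acting sets are distinct unordered triples, and no extra 3! factor is needed because both counts are unordered):

    from math import comb        # Python 3.8+

    n_osds, n_pgs = 200, 16384
    triples = comb(n_osds, 3)         # 1,313,400 ways to lose 3 disks out of 200
    print(triples, n_pgs / triples)   # -> 1313400 0.01247..., i.e. roughly 1.2%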

[ceph-users] adding a a monitor wil result in cephx: verify_reply couldn't decrypt with error: error decoding block for decryption

2015-06-10 Thread Makkelie, R (ITCDCC) - KLM
I'm trying to add an extra monitor to my already existing cluster. I do this with ceph-deploy, using the following command: ceph-deploy mon add "mynewhost". ceph-deploy says it's all finished, but when I take a look at the logs on my new monitor host I see the following error: cephx: verify_repl
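
For anyone else hitting this: a verify_reply decrypt error usually points at a key mismatch or clock skew rather than at ceph-deploy itself. A hedged checklist (the paths and host name below are the usual defaults, not necessarily this cluster's):

    # does the keyring on the new mon match the cluster's mon. key?
    ceph auth get mon.
    cat /var/lib/ceph/mon/ceph-mynewhost/keyring
    # cephx is also sensitive to large clock differences with the quorum
    ntpq -p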

Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-10 Thread Jan Schermer
Yeah, I know, but I believe it was fixed so that a single copy is sufficient for recovery now (even with min_size=1)? It depends on what you want to achieve... The point is that even if we lost “just” 1% of data, that’s too much (>0%) when talking about customer data, and I know from experience that

Re: [ceph-users] osd_scrub_sleep, osd_scrub_chunk_{min,max}

2015-06-10 Thread Dan van der Ster
I don't know if/why they're not documented, but we use them plus the scrub stride and iopriority options too: [osd] osd scrub sleep = .1 osd disk thread ioprio class = idle osd disk thread ioprio priority = 0 osd scrub chunk max = 5 osd deep scrub stride = 1048576 Cheers, Dan
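
For readability, the options Dan lists as they would appear in ceph.conf (values exactly as quoted above; note the two ioprio settings only take effect when the OSD disks use the CFQ I/O scheduler):

    [osd]
    osd scrub sleep = .1
    osd disk thread ioprio class = idle
    osd disk thread ioprio priority = 0
    osd scrub chunk max = 5
    osd deep scrub stride = 1048576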

[ceph-users] CEPH on RHEL 7.1

2015-06-10 Thread Varada Kari
Hi, We are trying to build Ceph on RHEL 7.1, but are facing some issues building the "Giant" branch. We enabled the Red Hat server RPMs and Red Hat Ceph Storage RPM channels, along with optional, extras and supplementary, but we are not able to find the gperftools, leveldb and yasm RPMs in the channels
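
For what it's worth, on RHEL 7 those three packages normally come from EPEL rather than from the Red Hat channels; one possible way to pull them in (the EPEL release URL is the usual upstream one, adjust to your mirror):

    yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
    yum install gperftools-devel leveldb-devel yasm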

Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-10 Thread Dan van der Ster
OK, I wrote a quick script to simulate triple failures and count how many would have caused data loss. The script gets your list of OSDs and PGs, then simulates failures and checks if any permutation of that failure matches a PG. Here's an example with 1 simulations on our production cluster:
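
Dan's script itself isn't shown, but the idea is easy to sketch. This is not his code; the OSD ids and acting sets below are random stand-ins for what you would pull from `ceph osd ls` and `ceph pg dump`:

    import random

    def simulate(osd_ids, pg_acting_sets, trials=10000):
        # fraction of random triple-OSD failures that wipe out some PG entirely
        acting = {frozenset(a) for a in pg_acting_sets}
        hits = sum(frozenset(random.sample(osd_ids, 3)) in acting
                   for _ in range(trials))
        return hits / trials

    osds = list(range(200))                                # toy cluster
    pgs = [random.sample(osds, 3) for _ in range(16384)]   # toy acting sets
    print(simulate(osds, pgs))                             # ~0.012 expected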

[ceph-users] How radosgw-admin gets usage information for each user

2015-06-10 Thread Nguyen Hoang Nam
Hi there, I am playing with ceph/radosgw. I want to know how the radosgw-admin command gets usage information for each user. I checked the ".usage" pool and see some objects named "usage.X". I located the object but it has no content (0 KB). Using the command `rados -p .usage listomapkeys usage.30`, it displa
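
A rough sketch of how to poke at this from the command line (the uid and object name are only examples): the per-user counters live in the omap key/value data of those usage.* objects, which is why the objects themselves report 0 bytes, and radosgw-admin reads them for you:

    rados -p .usage listomapkeys usage.30
    rados -p .usage getomapval usage.30 "<one of the listed keys>"
    radosgw-admin usage show --uid=exampleuser --show-log-entries=false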

[ceph-users] kernel: libceph socket closed (con state OPEN)

2015-06-10 Thread Daniel van Ham Colchete
Hello everyone! I have been doing some log analysis on my systems here, trying to detect problems before they affect my users. One thing I have found is that I have been seeing a lot of log lines like this: Jun 10 06:47:09 10.3.1.1 kernel: [2960203.682638] libceph: osd2 10.3.1.2:6800 socket closed (c

[ceph-users] krbd splitting large IO's into smaller IO's

2015-06-10 Thread Nick Fisk
Hi, Using the kernel RBD client with kernel 4.0.3 (I have also tried some older kernels, with the same effect), IO is being split into smaller IOs, which is having a negative impact on performance. cat /sys/block/sdc/queue/max_hw_sectors_kb 4096 cat /sys/block/rbd0/queue/max_sectors_kb 4096 Using

Re: [ceph-users] krbd splitting large IO's into smaller IO's

2015-06-10 Thread Dan van der Ster
Hi, I found something similar a while ago within a VM: http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-November/045034.html I don't know if the change suggested by Ilya ever got applied. Cheers, Dan

Re: [ceph-users] krbd splitting large IO's into smaller IO's

2015-06-10 Thread Ilya Dryomov
On Wed, Jun 10, 2015 at 3:23 PM, Dan van der Ster wrote: > Hi, > > I found something similar awhile ago within a VM. > http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-November/045034.html > I don't know if the change suggested by Ilya ever got applied. Yeah, it got applied. We did

Re: [ceph-users] krbd splitting large IO's into smaller IO's

2015-06-10 Thread Nick Fisk
Hi Dan, I found your post last night; it does indeed look like the default has been set to 4096 for the kernel RBD client in the 4.0 kernel. I also checked a machine running 3.16 and this had 512 as the default. However, in my case there seems to be something else which is affecting the max block
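
For anyone following along, the knob being discussed is the block layer's per-request size cap. A minimal check/bump, assuming the device is rbd0 (it can only be raised up to max_hw_sectors_kb, and on pre-4.0 kernels the default sits well below the 4 MiB RBD object size):

    cat /sys/block/rbd0/queue/max_sectors_kb            # current cap, in KiB
    echo 4096 > /sys/block/rbd0/queue/max_sectors_kb    # allow up to 4 MiB requests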

Re: [ceph-users] krbd splitting large IO's into smaller IO's

2015-06-10 Thread Ilya Dryomov
On Wed, Jun 10, 2015 at 2:47 PM, Nick Fisk wrote: > Hi, > > Using Kernel RBD client with Kernel 4.03 (I have also tried some older > kernels with the same effect) and IO is being split into smaller IO's which > is having a negative impact on performance. > > cat /sys/block/sdc/queue/max_hw_sectors

[ceph-users] clock skew detected

2015-06-10 Thread Pavel V. Kaygorodov
Hi! Immediately after a reboot of the mon.3 host its clock was unsynchronized and a "clock skew detected on mon.3" warning appeared. But now (more than 1 hour of uptime) the clock is synced, yet the warning is still showing. Is this OK, or do I have to restart the monitor after clock synchronization? Pavel.
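
A minimal sketch of what to check here (nothing below is specific to Pavel's cluster): the monitors re-run their time checks periodically, so the warning normally clears by itself once NTP has converged; if it sticks around well after the clocks agree, restarting the affected mon is a harmless way to force a fresh check.

    ceph health detail      # shows which mon is flagged and how large the skew is
    ntpq -p                 # on the mon.3 host: is NTP actually locked to a peer?
    # if needed, restart just that monitor (exact command depends on the init system), e.g.
    # /etc/init.d/ceph restart mon.3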

Re: [ceph-users] clock skew detected

2015-06-10 Thread Andrey Korolyov
On Wed, Jun 10, 2015 at 4:11 PM, Pavel V. Kaygorodov wrote: > Hi! > > Immediately after a reboot of mon.3 host its clock was unsynchronized and > "clock skew detected on mon.3" warning is appeared. > But now (more then 1 hour of uptime) the clock is synced, but the warning > still showing. > Is

[ceph-users] Blueprints

2015-06-10 Thread Patrick McGarry
Hey all, Given that there was some confusion and difficulty surrounding Blueprint submission, I'm going to leave the submission window open until the end of the week. Please get your blueprints added to the new tracker wiki by then so I can build a schedule and release it next week. Many requests

[ceph-users] Speaking opportunity at OpenNebula Cloud Day

2015-06-10 Thread Patrick McGarry
Hey all, Jaime from the OpenNebula team has offered up a speaking slot for Ceph at the upcoming event on 29 June (short notice) in Boston. If anyone is interested in giving a Ceph talk please let me know ASAP and I can help get you set up. Thanks. -- Best Regards, Patrick McGarry Director Cep

Re: [ceph-users] CEPH on RHEL 7.1

2015-06-10 Thread Ken Dreyer
- Original Message - > From: "Varada Kari" > To: "ceph-devel" > Cc: "ceph-users" > Sent: Wednesday, June 10, 2015 3:33:08 AM > Subject: [ceph-users] CEPH on RHEL 7.1 > > Hi, > > We are trying to build CEPH on RHEL7.1. But facing some issues with the build > with "Giant" branch. > Enabl

Re: [ceph-users] krbd splitting large IO's into smaller IO's

2015-06-10 Thread Nick Fisk
> -Original Message- > From: Ilya Dryomov [mailto:idryo...@gmail.com] > Sent: 10 June 2015 14:06 > To: Nick Fisk > Cc: ceph-users > Subject: Re: [ceph-users] krbd splitting large IO's into smaller IO's > > On Wed, Jun 10, 2015 at 2:47 PM, Nick Fisk wrote: > > Hi, > > > > Using Kernel RBD

Re: [ceph-users] krbd splitting large IO's into smaller IO's

2015-06-10 Thread Ilya Dryomov
On Wed, Jun 10, 2015 at 6:18 PM, Nick Fisk wrote: >> -Original Message- >> From: Ilya Dryomov [mailto:idryo...@gmail.com] >> Sent: 10 June 2015 14:06 >> To: Nick Fisk >> Cc: ceph-users >> Subject: Re: [ceph-users] krbd splitting large IO's into smaller IO's >> >> On Wed, Jun 10, 2015 at 2:

Re: [ceph-users] krbd splitting large IO's into smaller IO's

2015-06-10 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Ilya Dryomov > Sent: 10 June 2015 16:23 > To: Nick Fisk > Cc: ceph-users > Subject: Re: [ceph-users] krbd splitting large IO's into smaller IO's > > On Wed, Jun 10, 2015 at 6:18 PM, Nick F

Re: [ceph-users] krbd splitting large IO's into smaller IO's

2015-06-10 Thread Nick Fisk
> > >> -Original Message- > > >> From: Ilya Dryomov [mailto:idryo...@gmail.com] > > >> Sent: 10 June 2015 14:06 > > >> To: Nick Fisk > > >> Cc: ceph-users > > >> Subject: Re: [ceph-users] krbd splitting large IO's into smaller > > >> IO's > > >> > > >> On Wed, Jun 10, 2015 at 2:47 PM, Nick

Re: [ceph-users] krbd splitting large IO's into smaller IO's

2015-06-10 Thread Ilya Dryomov
On Wed, Jun 10, 2015 at 7:04 PM, Nick Fisk wrote: >> > >> -Original Message- >> > >> From: Ilya Dryomov [mailto:idryo...@gmail.com] >> > >> Sent: 10 June 2015 14:06 >> > >> To: Nick Fisk >> > >> Cc: ceph-users >> > >> Subject: Re: [ceph-users] krbd splitting large IO's into smaller >> > >>

Re: [ceph-users] krbd splitting large IO's into smaller IO's

2015-06-10 Thread German Anders
Hi guys, sorry to jump in on this thread. I've got four OSD servers with Ubuntu 14.04.1 LTS, with 9 OSD daemons each (3 TB drives) and 3 SSD journal drives (each journal holds 3 OSD daemons). The kernel version that I'm using is 3.18.3-031803-generic and the Ceph version is 0.82. I would like to know what

[ceph-users] High IO Waits

2015-06-10 Thread Nick Fisk
From the looks of it you are maxing out your OSDs. Some of them are pushing over 500 IOPS, which is a lot for a 7.2k disk, and at high queue depths IOs will have to wait a long time to reach the front of the queue. The only real thing I can suggest is to add more OSDs whi

Re: [ceph-users] High IO Waits

2015-06-10 Thread German Anders
Thanks a lot Nick, I'll try with more PGs and if I don't see any improvement I'll add more OSD servers to the cluster. Best regards, German Anders
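
For reference, the usual pg_num rule of thumb from the Ceph docs, sketched in Python; the 36-OSD figure is German's 4 servers x 9 OSDs from earlier in the thread and is only an example:

    def suggested_pg_num(num_osds, replicas=3, pgs_per_osd=100):
        # ~100 PGs per OSD, divided by the replica count, rounded up to a power of two
        target = num_osds * pgs_per_osd / replicas
        pg_num = 1
        while pg_num < target:
            pg_num *= 2
        return pg_num

    print(suggested_pg_num(36))   # -> 2048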

[ceph-users] 6/10/2015 performance meeting recording

2015-06-10 Thread Mark Nelson
Hi All, A couple of folks have asked for a recording of the performance meeting this week as there was an excellent discussion today regarding simplemessenger optimization with Sage. Here's a link to the recording: https://bluejeans.com/s/8knV/ You can access this recording and all previous

Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-10 Thread Vasiliy Angapov
Hi, I also wrote a simple script which calculates the data loss probabilities for triple disk failure. Here are some numbers: OSDs: 10, Pr: 138.89% OSDs: 20, Pr: 29.24% OSDs: 30, Pr: 12.32% OSDs: 40, Pr: 6.75% OSDs: 50, Pr: 4.25% OSDs: 100, Pr: 1.03% OSDs: 200, Pr: 0.25% OSDs: 500, Pr: 0
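
Those numbers are consistent with the simple estimate P(loss | 3 simultaneous failures) ~= nPGs / C(nOSDs, 3), i.e. the chance that a random triple of failed disks matches one of the PG acting sets. The PG count per OSD below is reverse-engineered from the list and may not match Vasiliy's script exactly; values above 100% just mean "practically certain":

    from math import comb

    def p_loss(n_osds, pgs_per_osd=50, replicas=3):
        n_pgs = n_osds * pgs_per_osd / replicas
        return n_pgs / comb(n_osds, 3)

    for n in (10, 20, 30, 40, 50, 100, 200, 500):
        print(n, round(100 * p_loss(n), 2))   # 138.89, 29.24, 12.32, ... as above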

[ceph-users] S3 - grant user/group access to buckets

2015-06-10 Thread Sean
I was wondering about the best way to accomplish this: we have a set of buckets with thousands of keys inside, and we need to grant and revoke user access to these keys quickly. Is there a way to set this up via radosgw-admin or via S3? I thought that the bucket policy would be enough to stop use
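
As far as I recall, radosgw at that time supported S3 ACLs rather than bucket policies, so per-bucket grants are the mechanism to use. A sketch with boto (v2) against the radosgw S3 endpoint; the endpoint, keys, bucket name and the grantee's canonical user id are placeholders:

    import boto
    import boto.s3.connection

    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
        host='rgw.example.com',
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
    bucket = conn.get_bucket('shared-bucket')
    bucket.add_user_grant('READ', 'other-user-canonical-id')  # grant read on the bucket
    # recursive=True would also touch every existing key (slow with thousands of keys);
    # revoking means rewriting the bucket ACL without that grant (get_acl/set_acl)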

Re: [ceph-users] 6/10/2015 performance meeting recording

2015-06-10 Thread Nick Fisk
Hi Mark, I've just watched the 1st part regarding the cache tiering and found it very interesting. I think you guys have hit the nail on the head regarding the unnecessary promotions: as well as hurting the performance of the current in-flight IOs, they also have an impact on future IOs that need to b

Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-10 Thread Christian Balzer
Hello, On Wed, 10 Jun 2015 23:53:48 +0300 Vasiliy Angapov wrote: > Hi, > > I also wrote a simple script which calculates the data loss probabilities > for triple disk failure. Here are some numbers: > OSDs: 10, Pr: 138.89% > OSDs: 20, Pr: 29.24% > OSDs: 30, Pr: 12.32% > OSDs: 40, Pr: 6.

[ceph-users] S3 expiration

2015-06-10 Thread Arkadi Kizner
Hello, I need to store expirable objects in Ceph for housekeeping purposes. I understand that developers are planning to implement it using Amazon S3 API. Does anybody know what is the status of this, or is there another approach for housekeeping available? Thanks. This email and any files tra