[ceph-users] Beginners ceph journal question

2015-06-09 Thread Vickey Singh
Hello Cephers, a beginner's question on Ceph journal creation; I need answers from experts. - Is it true that by default ceph-deploy creates the journal on a dedicated partition and the data on another partition? It does not create the journal on a file? ceph-deploy osd create ceph-node1:/dev/sdb This command i
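A minimal sketch of the two usual invocations, assuming illustrative host and device names (ceph-node1, /dev/sdb, /dev/sdc):

  # default behaviour: ceph-disk splits /dev/sdb into a data partition and a journal partition
  ceph-deploy osd create ceph-node1:/dev/sdb
  # explicit journal device: data on /dev/sdb, journal partition created on /dev/sdc
  ceph-deploy osd create ceph-node1:/dev/sdb:/dev/sdc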

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread pushpesh sharma
Hi Alexandre, We have also seen something very similar on Hammer (0.94-1). We were doing some benchmarking for VMs hosted on a hypervisor (QEMU-KVM, OpenStack Juno). Each Ubuntu VM has an RBD as root disk, and 1 RBD as additional storage. For some strange reason it was not able to scale 4K-RR iops on

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Alexandre DERUMIER
Hi, >> We tried adding more RBDs to single VM, but no luck. If you want to scale with more disks in a single qemu VM, you need to use the iothread feature from qemu and assign 1 iothread per disk (works with virtio-blk). It's working for me; I can scale by adding more disks. My bench here are do
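A minimal qemu command-line sketch of one iothread per virtio-blk disk, assuming illustrative RBD image names (rbd/vm-disk1, rbd/vm-disk2):

  qemu-system-x86_64 ... \
    -object iothread,id=iothread1 \
    -object iothread,id=iothread2 \
    -drive file=rbd:rbd/vm-disk1,if=none,id=drive1,format=raw,cache=none \
    -device virtio-blk-pci,drive=drive1,iothread=iothread1 \
    -drive file=rbd:rbd/vm-disk2,if=none,id=drive2,format=raw,cache=none \
    -device virtio-blk-pci,drive=drive2,iothread=iothread2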

Re: [ceph-users] Multiple journals and an OSD on one SSD doable?

2015-06-09 Thread koukou73gr
On 06/08/2015 11:54 AM, Jan Schermer wrote: > > This should indicate the real wear: 100 Gigabytes_Erased > 0x0032 000 000 000 Old_age Always - 62936 > Bytes written after compression: 233 SandForce_Internal > 0x 000 000 000 O
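For reference, a hedged sketch of pulling those attributes, assuming an illustrative device name; the attribute names and IDs quoted above are SandForce-specific and vary by vendor:

  smartctl -A /dev/sdb
  smartctl -A /dev/sdb | egrep 'Erased|SandForce|Wear'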

Re: [ceph-users] rbd format v2 support

2015-06-09 Thread Ilya Dryomov
On Tue, Jun 9, 2015 at 5:52 AM, David Z wrote: > Hi Ilya, > > Thanks for the reply. I know that a v2 image can be mapped when using the default > striping parameters, without --stripe-unit or --stripe-count. > > It is just that the rbd performance (IOPS & bandwidth) we tested hasn't met our > goal. We found at
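A minimal sketch of the difference, assuming illustrative pool and image names:

  rbd create rbd/img-default --size 10240 --image-format 2
  rbd map rbd/img-default        # maps: default striping (stripe count 1, stripe unit = object size)
  rbd create rbd/img-striped --size 10240 --image-format 2 --stripe-unit 65536 --stripe-count 16
  rbd map rbd/img-striped        # the kernel client of this era rejects non-default striping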

Re: [ceph-users] rbd cache + libvirt

2015-06-09 Thread Daniel Swarbrick
I presume that since QEMU 1.2+ sets the default cache mode to writeback if not otherwise specified, and since giant sets rbd_cache to true if not otherwise specified, then the result should be to cache? We have a fair number of VMs running on hosts where we don't specify either explicitly, and I'v
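One hedged way to verify what a running guest actually ends up with is the librbd admin socket, assuming it has been enabled on the hypervisor (the socket path below is illustrative):

  # in ceph.conf on the hypervisor:
  #   [client]
  #   admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
  ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.67890.asok config show | grep rbd_cache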

Re: [ceph-users] rbd cache + libvirt

2015-06-09 Thread Alexandre DERUMIER
>>If I understand the QEMU docs correctly, cache=unsafe would immediately >>ack the guest's fsync() - at the risk of data loss if the QEMU process >>crashes. That's not true with the rbd block driver. The qemu cache options only set rbd_cache=true|false (see rbd.c): if (flags & BDRV_O_NOCACHE) {

Re: [ceph-users] rbd cache + libvirt

2015-06-09 Thread Andrey Korolyov
On Tue, Jun 9, 2015 at 7:59 AM, Alexandre DERUMIER wrote: > host conf : rbd_cache=true : guest cache=none : result : cache (wrong) Thanks Alexandre, so you are confirming that this exact case misbehaves?

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Alexandre DERUMIER
It seems that the limit mainly appears at high queue depth (roughly > 16). Here are the results in iops with 1 client, 4k randread, 3 OSDs, with different queue depth sizes. rbd_cache is almost the same as without cache at queue depth < 16. cache - qd1: 1651 qd2: 3482 qd4: 7958 qd8: 17912 qd16: 36020
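A minimal fio sketch for this kind of queue-depth sweep with the rbd engine, assuming illustrative pool, image, and client names:

  fio --name=4k-randread --ioengine=rbd --clientname=admin --pool=rbd --rbdname=fio-test \
      --rw=randread --bs=4k --iodepth=16 --direct=1 --runtime=60 --time_based
  # repeat with --iodepth=1,2,4,8,16,32 to reproduce the qd sweep above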

Re: [ceph-users] rbd cache + libvirt

2015-06-09 Thread Alexandre DERUMIER
>>Thanks Alexandre, so you are confirming that this exact case misbehaves? The rbd_cache value from ceph.conf always overrides the cache value from qemu. My personal opinion is that this is wrong; the qemu value should override the ceph.conf value. I don't know what happens in a live migration, for examp

Re: [ceph-users] rbd cache + libvirt

2015-06-09 Thread Andrey Korolyov
On Tue, Jun 9, 2015 at 11:51 AM, Alexandre DERUMIER wrote: >>>Thanks Alexandre, so you are confirming that this exact case misbehaves? > > The rbd_cache value from ceph.conf always overrides the cache value from qemu. > > My personal opinion is that this is wrong; the qemu value should override the > ceph

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Mark Nelson
Hi All, In the past we've hit some performance issues with RBD cache that we've fixed, but we've never really tried pushing a single VM beyond 40+K read IOPS in testing (or at least I never have). I suspect there's a couple of possibilities as to why it might be slower, but perhaps joshd can

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Alexandre DERUMIER
>>Frankly, I'm a little impressed that without RBD cache we can hit 80K >>IOPS from 1 VM! Note that these results are not in a VM (fio-rbd on the host), so in a VM we'll have overhead. (I'm planning to send results in qemu soon) >>How fast are the SSDs in those 3 OSDs? These results are with dat

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Jason Dillaman
> In the past we've hit some performance issues with RBD cache that we've > fixed, but we've never really tried pushing a single VM beyond 40+K read > IOPS in testing (or at least I never have). I suspect there's a couple > of possibilities as to why it might be slower, but perhaps joshd can > chi

Re: [ceph-users] Beginners ceph journal question

2015-06-09 Thread Michael Kuriger
You could mount /dev/sdb to a filesystem, such as /ceph-disk, and then do this: ceph-deploy osd create ceph-node1:/ceph-disk Your journal would be a file doing it this way.
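A minimal end-to-end sketch of that directory-based variant, assuming illustrative device, mount point, and host names:

  mkfs.xfs /dev/sdb
  mkdir -p /ceph-disk
  mount /dev/sdb /ceph-disk
  ceph-deploy osd create ceph-node1:/ceph-disk   # data and journal both end up as files under /ceph-disk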

Re: [ceph-users] Beginners ceph journal question

2015-06-09 Thread Vickey Singh
Thanks Michael for your response. Could you also please help me understand: #1 On my Ceph cluster, how can I confirm whether the journal is on a block device partition or on a file? #2 Is it true that by default ceph-deploy creates the journal on a dedicated partition and the data on another partition if I use t

[ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-09 Thread kevin parrikar
I have a 4-node cluster, each node with 5 disks (4 OSDs and 1 operating system disk, also hosting 3 monitor processes), with the default replica count of 3. Total OSD disks: 16. Total nodes: 4. How can I calculate the - maximum number of disk failures my cluster can handle without any impact on current data and new

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Robert LeBlanc
I also saw a similar performance increase by using alternative memory allocators. What I found was that Ceph OSDs performed well with either tcmalloc or jemalloc (except when RocksDB was built with jemalloc instead of tcmalloc, I'm still working to d

Re: [ceph-users] Beginners ceph journal question

2015-06-09 Thread Robert LeBlanc
#1 `readlink /var/lib/ceph/osd/-/journal` If it returns nothing, then it is a file; if it returns something, it is a partition. #2 It appears that the default of ceph-deploy is to create and use a partition (I don't use ceph-deploy, so I can't be auth
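A hedged example, assuming an illustrative OSD id (ceph-0):

  readlink /var/lib/ceph/osd/ceph-0/journal
  # partition-backed journal: prints a symlink target such as /dev/disk/by-partuuid/...
  # file-backed journal: prints nothing, because the journal is a regular file rather than a symlink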

Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-09 Thread Nick Fisk
Hi Kevin, Ceph by default will make sure no copies of the data are on the same host. So with a replica count of 3, you could lose 2 hosts without losing any data or operational ability. If by some luck all disk failures were constrained to 2 hosts, you could in theory have up to 8 disks fail
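To confirm the failure domain, one hedged check is to dump the rule the pool uses; 'replicated_ruleset' is the usual default name in this era but may differ:

  ceph osd crush rule dump replicated_ruleset
  # look for a step like  "op": "chooseleaf_firstn" ... "type": "host"
  # "type": "host" is what keeps the 3 replicas on 3 different hosts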

Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-09 Thread Robert LeBlanc
If you are using the default rule set (which I think has min_size 2), you can sustain 1-4 disk failures or one host failure. The reason disk failures vary so wildly is that you can lose all the disks in one host. You can lose up to another 4 disks (in
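A quick hedged way to check those values for a pool, assuming the pool is called rbd:

  ceph osd pool get rbd size
  ceph osd pool get rbd min_size
  # with size=3 and min_size=2, a PG keeps serving I/O as long as at least 2 of its 3 replicas are up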

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Alexandre DERUMIER
Hi Robert, >>What I found was that Ceph OSDs performed well with either >>tcmalloc or jemalloc (except when RocksDB was built with jemalloc >>instead of tcmalloc, I'm still working to dig into why that might be >>the case). Yes, from my tests, for the OSD tcmalloc is a little faster (but very little

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Alexandre DERUMIER
>>At high queue-depths and high IOPS, I would suspect that the bottleneck is >>the single, coarse-grained mutex protecting the cache data structures. It's >>been a back burner item to refactor the current cache mutex into >>finer-grained locks. >> >>Jason Thanks for the explanation, Jason. Anyw

Re: [ceph-users] Complete freeze of a cephfs client (unavoidable hard reboot)

2015-06-09 Thread Gregory Farnum
On Mon, Jun 8, 2015 at 5:20 PM, Francois Lafont wrote: > Hi, > > On 27/05/2015 22:34, Gregory Farnum wrote: > >> Sorry for the delay; I've been traveling. > > No problem, me too, I'm not really fast to answer. ;) > >>> Ok, I see. According to the online documentation, the way to close >>> a cephfs

[ceph-users] .New Ceph cluster - cannot add additional monitor

2015-06-09 Thread Mike Carlson
We have a new ceph cluster, and when I follow the guide ( http://ceph.com/docs/master/start/quick-ceph-deploy/) during the section where you can add additional monitors, it fails, and it almost seems like it's using an improper IP address. We have 4 nodes: - lts-mon - lts-osd1 - lts-osd2
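A hedged sketch of the usual fix, assuming the problem is that ceph.conf lacks a 'public network' covering the new monitor's address (the subnet and target host below are illustrative):

  # ceph.conf on the admin node:
  #   [global]
  #   public network = 10.0.0.0/24
  ceph-deploy --overwrite-conf config push lts-mon lts-osd1 lts-osd2
  ceph-deploy mon add lts-osd1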

[ceph-users] RGW blocked threads/timeouts

2015-06-09 Thread Daniel Maraio
Hello Cephers, I had a question about something we experience in our cluster. When we add new capacity or suffer failures we will often get blocked requests during the rebuilding. This leads to threads from the RGW blocking and eventually no longer serving new requests. I suspect that if we

[ceph-users] cephx error - renew key

2015-06-09 Thread tombo
Hello guys, today we had one storage node (19 OSDs) down for 4 hours and now we are observing different problems. When I tried to restart one OSD, I got an error related to cephx: 2015-06-09 21:09:49.983522 7fded00c7700 0 auth: could not find secret_id=6238 2015-06-09 21:09:49.983585 7fded00c7700
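Two hedged checks that are commonly useful for this class of error (osd.19 below is purely illustrative):

  # compare the keyring the OSD has on disk with the one the monitors hold
  cat /var/lib/ceph/osd/ceph-19/keyring
  ceph auth get osd.19
  # "could not find secret_id" refers to the rotating service keys, so large clock skew
  # between this node and the monitors is also worth ruling out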

Re: [ceph-users] apply/commit latency

2015-06-09 Thread Gregory Farnum
On Thu, Jun 4, 2015 at 3:57 AM, Межов Игорь Александрович wrote: > Hi! > >> My deployments have seen many different versions of ceph. Pre 0.80.7, I've >> seen those numbers being pretty high. After upgrading to 0.80.7, all of a >> sudden, commit latency of all OSDs drop to 0-1ms, and apply latency
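For reference, a quick way to read these per-OSD numbers from a running cluster:

  ceph osd perf
  # prints fs_commit_latency(ms) and fs_apply_latency(ms) for every OSD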

[ceph-users] Nginx access ceph

2015-06-09 Thread Ram Chander
Hi, I am trying to set up nginx to access html files in ceph buckets. I have set up -> https://github.com/anomalizer/ngx_aws_auth Below is the nginx config. When I try to access http://hostname:8080/test/b.html -> it shows a signature mismatch. http://hostname:8080/b.html -> shows a signature mismatch

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Alexandre DERUMIER
Hi, I have tested qemu with the latest tcmalloc (2.4), and the improvement is huge with iothread: 50k iops (+45%)! qemu : no iothread : glibc : iops=33395 qemu : no-iothread : tcmalloc (2.2.1) : iops=34516 (+3%) qemu : no-iothread : jemalloc : iops=42226 (+26%) qemu : no-iothread : tcmalloc (2.4)
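The poster rebuilt qemu against tcmalloc; as a hedged alternative, an allocator can usually be swapped in at load time without rebuilding (the library path below is illustrative and distro-dependent):

  LD_PRELOAD=/usr/lib/libtcmalloc.so.4 qemu-system-x86_64 ...
  # substitutes malloc for the whole process at startup; not identical to a rebuild,
  # but handy for a quick comparison between glibc malloc, tcmalloc, and jemalloc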

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Irek Fasikhov
Hi, Alexandre. Very good work! Do you have a rpm-file? Thanks. 2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER : > Hi, > > I have tested qemu with last tcmalloc 2.4, and the improvement is huge > with iothread: 50k iops (+45%) ! > > > > qemu : no iothread : glibc : iops=33395 > qemu : no-iothread :

Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

2015-06-09 Thread Alexandre DERUMIER
>>Very good work! >>Do you have a rpm-file? >>Thanks. No sorry, I have compiled it manually (and I'm using Debian Jessie as the client).

[ceph-users] osd_scrub_sleep, osd_scrub_chunk_{min,max}

2015-06-09 Thread Paweł Sadowski
Hello Everyone, There are some options[1] that greatly reduce deep-scrub performance impact but they are not documented anywhere. Is there any reason for this? 1: - osd_scrub_sleep - osd_scrub_chunk_min - osd_scrub_chunk_max -- PS
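A hedged sketch of how these are typically applied (the values are illustrative, not recommendations):

  # at runtime, on a running cluster:
  ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'
  ceph tell osd.* injectargs '--osd_scrub_chunk_min 1 --osd_scrub_chunk_max 5'
  # persistently, in ceph.conf:
  #   [osd]
  #   osd scrub sleep = 0.1
  #   osd scrub chunk min = 1
  #   osd scrub chunk max = 5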

Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss

2015-06-09 Thread Jan Schermer
A hidden danger in the default CRUSH rules is that if you lose 3 drives in 3 different hosts at the same time, you _will_ lose data, and not just some data but possibly a piece of every rbd volume you have... And the probability of that happening is sadly nowhere near zero. We had drives drop out