Re: [ceph-users] Memory leak in radosgw

2016-10-21 Thread Ben Morrice
What version of libcurl are you using? I was hitting this bug with RHEL7/libcurl 7.29 which could also be your catalyst. http://tracker.ceph.com/issues/15915 Kind regards, Ben Morrice __ Ben Morrice | e: ben.morr...@epfl.ch |
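A quick way to confirm which libcurl radosgw is actually linked against on RHEL 7 (a sketch; the package name and binary path assume a stock RHEL install):

    # installed libcurl package version (RHEL/CentOS)
    rpm -q libcurl
    # version the curl tooling itself reports
    curl --version
    # which libcurl library the radosgw binary resolves at runtime
    ldd /usr/bin/radosgw | grep -i curl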

Re: [ceph-users] offending shards are crashing osd's

2016-10-21 Thread Ronny Aasen
On 19 Oct 2016 13:00, Ronny Aasen wrote: On 06 Oct 2016 13:41, Ronny Aasen wrote: hello I have a few osd's in my cluster that are regularly crashing. [snip] Of course having 3 osd's dying regularly is not good for my health, so I have set noout to avoid heavy recoveries. Googling thi

[ceph-users] rbd multipath by export iscsi gateway

2016-10-21 Thread tao chang
Hi All, I am trying to configure multipath by exporting two iscsi gateways on two hosts backed by the same rbd volume. The steps are below: on host1: 1) mapped an rbd volume as a local block device: rbd map fastpool/vdisk as /dev/rbd0 2) export /dev/rbd0 as iscsi target: [ro
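A minimal sketch of that kind of gateway setup, assuming targetcli/LIO with a block backstore over the mapped device (the pool, image and IQN are illustrative, not taken from the config above):

    # map the image on the gateway host; it appears as /dev/rbd0
    rbd map fastpool/vdisk
    # export the mapped device via LIO/targetcli (one possible gateway stack)
    targetcli /backstores/block create name=vdisk dev=/dev/rbd0
    targetcli /iscsi create iqn.2016-10.com.example:vdisk
    targetcli /iscsi/iqn.2016-10.com.example:vdisk/tpg1/luns create /backstores/block/vdisk

The same steps repeated on the second gateway give the two paths the initiator's multipathd would then see.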

Re: [ceph-users] rbd multipath by export iscsi gateway

2016-10-21 Thread Iban Cabrillo
Hi tao, I would do something like this: https://support.zadarastorage.com/hc/en-us/articles/213024386-How-To-setup-Multiple-iSCSI-sessions-and-MultiPath-on-your-Linux-Cloud-Server regards, I 2016-10-21 11:33 GMT+02:00 tao chang : > HI All, > > I try to configure multipath by export two is

[ceph-users] rbd cache writethrough until flush

2016-10-21 Thread Pavan Rallabhandi
I see the fix for writeback cache not getting turned on after flush has made it into Jewel 10.2.3 ( http://tracker.ceph.com/issues/17080 ) but our testing says otherwise. The cache is still behaving as if it's writethrough, though the setting is set to true. Wanted to check if it’s still broken i
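For reference, the client-side options being discussed live in ceph.conf on the hypervisor; a sketch of the relevant section (these are the settings as typically left at their Jewel defaults, not a recommendation):

    [client]
    rbd cache = true
    rbd cache writethrough until flush = true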

Re: [ceph-users] rbd cache writethrough until flush

2016-10-21 Thread Jason Dillaman
It's in the build and has tests to verify that it is properly being triggered [1]. $ git tag --contains 5498377205523052476ed81aebb2c2e6973f67ef v10.2.3 What are your tests that say otherwise? [1] https://github.com/ceph/ceph/pull/10797/commits/5498377205523052476ed81aebb2c2e6973f67ef On Fri,

Re: [ceph-users] rbd cache writethrough until flush

2016-10-21 Thread Pavan Rallabhandi
From my VMs that have cinder provisioned volumes, I tried dd / fio (like below) and found the IOPS to be lower; even a sync before the runs didn’t help. The same runs with the option set to false yield better results. Both the clients and the cluster are running 10.2.3; perhaps the only difference i

[ceph-users] Ceph and TCP States

2016-10-21 Thread Nick Fisk
Hi, I'm just testing out using a Ceph client in a DMZ behind a FW from the main Ceph cluster. One thing I have noticed is that if the state table on the FW is emptied, maybe by restarting it or just clearing the state table etc., then the Ceph client will hang for a long time as the TCP session
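One generic mitigation for half-dead TCP sessions behind a stateful firewall is to shorten the kernel keepalive timers on the client so stale connections are torn down sooner; a sketch (the sysctl names are standard Linux, the values are only examples):

    # /etc/sysctl.d/90-ceph-keepalive.conf  (example values, not a recommendation)
    net.ipv4.tcp_keepalive_time = 60       # start probing after 60s idle
    net.ipv4.tcp_keepalive_intvl = 10      # probe every 10s
    net.ipv4.tcp_keepalive_probes = 6      # drop the session after 6 failed probes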

Re: [ceph-users] Ceph and TCP States

2016-10-21 Thread Haomai Wang
On Fri, Oct 21, 2016 at 10:19 PM, Nick Fisk wrote: > Hi, > > I'm just testing out using a Ceph client in a DMZ behind a FW from the > main Ceph cluster. One thing I have noticed is that if the > state table on the FW is emptied maybe by restarting it or just clearing > the state table...etc. Then

Re: [ceph-users] Ceph and TCP States

2016-10-21 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Haomai Wang > Sent: 21 October 2016 15:28 > To: Nick Fisk > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Ceph and TCP States > > > > On Fri, Oct 21, 2016 at 10:19 PM, Nick Fis

Re: [ceph-users] rbd cache writethrough until flush

2016-10-21 Thread Pavan Rallabhandi
And to add, the host running Cinder services is on Hammer 0.94.9, but the rest of them, like Nova, are on Jewel 10.2.3. FWIW, the rbd info for one such image looks like this: rbd image 'volume-f6ec45e2-b644-4b58-b6b5-b3a418c3c5b2': size 2048 MB in 512 objects order 22 (4096 kB ob

Re: [ceph-users] Ceph and TCP States

2016-10-21 Thread Haomai Wang
On Fri, Oct 21, 2016 at 10:31 PM, Nick Fisk wrote: > > -Original Message- > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf > Of Haomai Wang > > Sent: 21 October 2016 15:28 > > To: Nick Fisk > > Cc: ceph-users@lists.ceph.com > > Subject: Re: [ceph-users] Ceph and

Re: [ceph-users] rbd cache writethrough until flush

2016-10-21 Thread Jason Dillaman
I just tested from the v10.2.3 git tag on my local machine and averaged 2912.54 4K writes / second with "rbd_cache_writethrough_until_flush = false" and averaged 3035.09 4K writes / second with "rbd_cache_writethrough_until_flush = true" (queue depth of 1 in both cases). I used new images between e
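A representative fio invocation for that kind of queue-depth-1 4K write test, as a sketch only (the device path and runtime are placeholders, not the exact command used above):

    fio --name=qd1-4k --filename=/dev/vdb --direct=1 --ioengine=libaio \
        --rw=randwrite --bs=4k --iodepth=1 --runtime=60 --time_based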

Re: [ceph-users] Ceph and TCP States

2016-10-21 Thread Nick Fisk
> -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Haomai Wang > Sent: 21 October 2016 15:40 > To: Nick Fisk > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] Ceph and TCP States > > > > On Fri, Oct 21, 2016 at 10:31 PM, Nick Fis

Re: [ceph-users] Crash in ceph_read_iter->__free_pages due to null page

2016-10-21 Thread Markus Blank-Burian
Hi, is there any update regarding this bug? I can easily reproduce this issue on our cluster with the following scenario: - Start a few hundred processes on different nodes, each process writing slowly some text into its own output file - Call: watch -n1 'grep mycustomerrorstring *.out' - Hit CTR

Re: [ceph-users] Ceph and TCP States

2016-10-21 Thread Haomai Wang
On Fri, Oct 21, 2016 at 10:56 PM, Nick Fisk wrote: > > -Original Message- > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf > Of Haomai Wang > > Sent: 21 October 2016 15:40 > > To: Nick Fisk > > Cc: ceph-users@lists.ceph.com > > Subject: Re: [ceph-users] Ceph and

Re: [ceph-users] Memory leak in radosgw

2016-10-21 Thread Trey Palmer
Hi Ben, I previously hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1327142 So I updated from libcurl 7.29.0-25 to the new update package libcurl 7.29.0-32 on RHEL 7, which fixed the deadlock problem. I had not seen the issue you linked. It doesn't seem directly related, since my p

Re: [ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-21 Thread Reed Dier
> On Oct 19, 2016, at 7:54 PM, Christian Balzer wrote: > > > Hello, > > On Wed, 19 Oct 2016 12:28:28 + Jim Kilborn wrote: > >> I have setup a new linux cluster to allow migration from our old SAN based >> cluster to a new cluster with ceph. >> All systems running centos 7.2 with the 3.10

Re: [ceph-users] Crash in ceph_read_iter->__free_pages due to null page

2016-10-21 Thread Nikolay Borisov
On Friday, October 21, 2016, Markus Blank-Burian wrote: > Hi, > > is there any update regarding this bug? I did send a patch and I believe it should find its way into an upstream release rather soon. > > I can easily reproduce this issue on our cluster with the following > scenario: > - Start a f

Re: [ceph-users] Crash in ceph_read_iter->__free_pages due to null page

2016-10-21 Thread Ilya Dryomov
On Fri, Oct 21, 2016 at 5:01 PM, Markus Blank-Burian wrote: > Hi, > > is there any update regarding this bug? Nikolay's patch made mainline yesterday and should show up in various stable kernels in the forthcoming weeks. > > I can easily reproduce this issue on our cluster with the following > s

Re: [ceph-users] Crash in ceph_read_iter->__free_pages due to null page

2016-10-21 Thread Markus Blank-Burian
Thanks for the fix and the quick reply! From: Nikolay Borisov [mailto:ker...@kyup.com] Sent: Friday, 21 October 2016 17:09 To: Markus Blank-Burian Cc: Nikolay Borisov ; Ilya Dryomov ; Yan, Zheng ; ceph-users Subject: Re: Crash in ceph_read_iter->__free_pages due to null page On Friday, Oct

[ceph-users] effect of changing ceph osd primary affinity

2016-10-21 Thread Ridwan Rashid Noel
Hi, While reading about Ceph osd primary affinity in the documentation of Ceph I found that it is mentioned "When the weight is < 1, it is less likely that CRUSH will select the Ceph OSD Daemon to act as a primary". My question is: if the primary affinity of an OSD is set to <1, will there be any
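For context, primary affinity is set per OSD from the CLI; a sketch (the OSD id and weight are placeholders, and on some releases the mons must first have "mon osd allow primary affinity = true" before the command is accepted):

    # values range from 0 to 1; 1 = default, 0 = never primary unless there is no other choice
    ceph osd primary-affinity osd.7 0.5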

[ceph-users] Ceph rbd jewel

2016-10-21 Thread fridifree
Hi everyone, I'm using ceph jewel running on Ubuntu 16.04 (kernel 4.4) and Ubuntu 14.04 clients (kernel 3.13). When trying to map rbd to the clients and to servers I get an error about feature set mismatch which I didn't get on hammer. I tried upgrading my clients to kernel 4.8 and 4.9rc1 and I got an error

Re: [ceph-users] Ceph rbd jewel

2016-10-21 Thread Ilya Dryomov
On Fri, Oct 21, 2016 at 5:50 PM, fridifree wrote: > Hi everyone, > I'm using ceph jewel running on Ubuntu 16.04 (kernel 4.4) and Ubuntu 14.04 > clients (kernel 3.13) > When trying to map rbd to the clients and to servers I get error about > feature set mismatch which I didnt get on hammer. > Tried
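The usual workaround for a krbd feature-set mismatch is to disable the image features the kernel client doesn't support; a sketch (the image name is a placeholder, and the exact feature list depends on the client kernel):

    # for an existing image
    rbd feature disable rbd/myimage deep-flatten fast-diff object-map exclusive-lock
    # for newly created images, set in ceph.conf on the creating client, e.g.
    #   rbd default features = 3     (layering + striping only)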

Re: [ceph-users] rbd cache writethrough until flush

2016-10-21 Thread Pavan Rallabhandi
Thanks for verifying at your end Jason. It’s pretty weird that the difference is >~10X, with "rbd_cache_writethrough_until_flush = true" I see ~400 IOPS vs with "rbd_cache_writethrough_until_flush = false" I see them to be ~6000 IOPS. The QEMU cache is none for all of the rbd drives. On that n

Re: [ceph-users] New cephfs cluster performance issues- Jewel - cache pressure, capability release, poor iostat await avg queue size

2016-10-21 Thread Jim Kilborn
Reed/Christian, So if I put the OSD journals on an SSD that has power loss protection (Samsung SM863), all the writes then go through those journals. Can I then leave write caching turned on for the spinner OSDs, even without a BBU caching controller? In the event of a power outage past our ups tim
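For what it's worth, the drives' own volatile write cache can be inspected and toggled per device; a sketch (the device name is a placeholder, and whether leaving the cache on is safe still depends on the journal and flush behaviour discussed in this thread):

    hdparm -W /dev/sdb       # query the current write-cache setting
    hdparm -W0 /dev/sdb      # disable the volatile write cache
    hdparm -W1 /dev/sdb      # enable it again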

Re: [ceph-users] rbd cache writethrough until flush

2016-10-21 Thread Jason Dillaman
On Fri, Oct 21, 2016 at 1:15 PM, Pavan Rallabhandi wrote: > The QEMU cache is none for all of the rbd drives Hmm -- if you have QEMU cache disabled, I would expect it to disable the librbd cache. I have to ask, but did you (re)start/live-migrate these VMs you are testing against after you upgrad

Re: [ceph-users] ceph on two data centers far away

2016-10-21 Thread Wes Dillingham
What is the use case that requires you to have it in two datacenters? In addition to RBD mirroring already mentioned by others, you can do RBD snapshots and ship those snapshots to a remote location (separate cluster or separate pool). Similar to RBD mirroring, in this situation your client writes
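Shipping snapshots to a remote location as described above is typically done with export-diff/import-diff; a sketch (the pool, image, snapshot names and remote host are placeholders):

    rbd snap create rbd/myimage@snap2
    rbd export-diff --from-snap snap1 rbd/myimage@snap2 - \
        | ssh backup-host rbd import-diff - rbd/myimage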

Re: [ceph-users] rbd cache writethrough until flush

2016-10-21 Thread Pavan Rallabhandi
The VM I am testing against was created after the librbd upgrade. I always had confusion around this bit in the docs here http://docs.ceph.com/docs/jewel/rbd/qemu-rbd/#qemu-cache-options that: “QEMU’s cache settings override Ceph’s default settings (i.e., settings that are not explicitly set i

Re: [ceph-users] rbd cache writethrough until flush

2016-10-21 Thread Jason Dillaman
Thanks for pointing that out, since it is incorrect for (semi-)modern QEMUs. All configuration starts at the Ceph defaults, is overwritten by your ceph.conf, and is then further overwritten by any QEMU-specific override. I would recommend retesting with "cache=writeback" to see if that helps.
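For reference, the override in question is the per-drive QEMU cache mode; a sketch of how it appears on a raw QEMU command line (the pool/volume and user are illustrative, and with libvirt/Cinder the same thing is expressed via the cache attribute on the disk's <driver> element):

    qemu-system-x86_64 ... \
        -drive file=rbd:volumes/myvolume:id=cinder:conf=/etc/ceph/ceph.conf,format=raw,cache=writeback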

[ceph-users] reliable monitor restarts

2016-10-21 Thread Steffen Weißgerber
Hello, we're running a 6 node ceph cluster with 3 mons on Ubuntu (14.04.4). Sometimes it happens that the mon services die and have to be restarted manually. To have reliable service restarts I normally use D.J. Bernstein's daemontools on other Linux distributions. Until now I never did this on Ubu
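A minimal daemontools run script for a mon, as a sketch (it assumes the default cluster name, that the mon id matches the short hostname, and that /etc/service is supervised; note the stock upstart jobs on 14.04 already respawn, so those are worth checking first):

    #!/bin/sh
    # /etc/service/ceph-mon/run  -- keep the mon in the foreground with -f so svscan can supervise it
    exec /usr/bin/ceph-mon -f --cluster ceph --id "$(hostname -s)"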

Re: [ceph-users] Ceph and TCP States

2016-10-21 Thread Gregory Farnum
On Fri, Oct 21, 2016 at 7:56 AM, Nick Fisk wrote: >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Haomai Wang >> Sent: 21 October 2016 15:40 >> To: Nick Fisk >> Cc: ceph-users@lists.ceph.com >> Subject: Re: [ceph-users] Ceph and TCP St

Re: [ceph-users] effect of changing ceph osd primary affinity

2016-10-21 Thread Gregory Farnum
On Fri, Oct 21, 2016 at 8:38 AM, Ridwan Rashid Noel wrote: > Hi, > > While reading about Ceph osd primary affinity in the documentation of Ceph I > found that it is mentioned "When the weight is < 1, it is less likely that > CRUSH will select the Ceph OSD Daemon to act as a primary". My question i

Re: [ceph-users] effect of changing ceph osd primary affinity

2016-10-21 Thread Ridwan Rashid Noel
Thank you for your reply Greg. Is there any detailed resource that describes how changing primary affinity works? All I got from searching was one paragraph in the documentation. Regards, Ridwan Noel On Oct 21, 2016 3:15 PM, "Gregory Farnum" wrote: > On Fri, Oct 21, 2016 at 8:38 AM,

[ceph-users] Three tier cache

2016-10-21 Thread Robert Sanders
Hello, Is it possible to create a three level cache tier? Searching documentation and archives suggests that I’m not the first one to ask about it, but I can’t tell if it is supported yet. Thanks, Rob ___ ceph-users mailing list ceph-users@lists.ce
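For reference, a single cache tier is wired up with the commands below (a sketch with placeholder pool names); whether a second cache layer can be stacked on top of this is exactly the open question above:

    ceph osd tier add coldpool hotpool
    ceph osd tier cache-mode hotpool writeback
    ceph osd tier set-overlay coldpool hotpool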

[ceph-users] tgt with ceph

2016-10-21 Thread Lu Dillon
Hi all, I'm using tgt for iSCSI service. Are there any parameters of tgt to specify the user and keyring used to access the RBD? Right now, I'm using the admin user to do this. Thanks for any advice. Thanks, Dillon ___ ceph-users mailing list ceph-users@lists.
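If tgt was built with the rbd backing store, the CephX user and conf file can usually be passed through the backstore options; a sketch (the target id, pool/image and client name are placeholders, and the exact bsopts syntax may vary by tgt version):

    tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 \
        --bstype rbd --backing-store rbd/myimage \
        --bsopts "conf=/etc/ceph/ceph.conf;id=tgtuser"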

Re: [ceph-users] Ceph rbd jewel

2016-10-21 Thread fridifree
Hi, What are the ceph tunables? How do they affect the cluster? I upgraded my kernel; I do not understand why I have to disable features. On Oct 21, 2016 19:39, "Ilya Dryomov" wrote: > On Fri, Oct 21, 2016 at 5:50 PM, fridifree wrote: > > Hi everyone, > > I'm using ceph jewel running on Ubuntu 16.04
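For what it's worth, the CRUSH tunables are a cluster-wide profile that can be inspected and, if needed, pinned to an older profile that old kernel clients understand; a sketch (changing the profile can trigger data movement, so it is not a no-op):

    ceph osd crush show-tunables        # inspect the current profile
    ceph osd crush tunables hammer      # pin to the hammer profile (example)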

Re: [ceph-users] reliable monitor restarts

2016-10-21 Thread Wido den Hollander
> On 21 October 2016 at 21:31, Steffen Weißgerber wrote: > > > Hello, > > we're running a 6 node ceph cluster with 3 mons on Ubuntu (14.04.4). > > Sometimes it happens that the mon services die and have to be restarted > manually. > That they die is not something that should happen in the first place! MONs ar