[ceph-users] Fwd: Ceph Filesystem - Production?
On Thu, Aug 28, 2014 at 1:30 PM, Gregory Farnum wrote:
> On Thu, Aug 28, 2014 at 10:36 AM, Brian C. Huffman wrote:
> > Is Ceph Filesystem ready for production servers?
> >
> > The documentation says it's not, but I don't see that mentioned anywhere else.
> > http://ceph.com/docs/master/cephfs/
>
> Everybody has their own standards, but Red Hat isn't supporting it for
> general production use at this time. If you're brave you could test it
> under your workload for a while and see how it comes out; the known
> issues are very much workload-dependent (or just general concerns over
> polish).
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

I've been testing it with our webstats since it gets live hits but isn't
customer-affecting. It seems the MDS server has problems every few days,
requiring me to umount and remount the ceph disk to resolve. Not sure if
the issue is resolved in development versions, but as of 0.80.5 we seem to
be hitting it. I set the log verbosity to 20, so there are tons of logs,
but it ends with:

2014-08-24 07:10:19.682015 7f2b575e7700 10 mds.0.14 laggy, deferring client_request(client.92141:6795587 getattr pAsLsXsFs #1026dc1)
2014-08-24 07:10:19.682021 7f2b575e7700 5 mds.0.14 is_laggy 19.324963 > 15 since last acked beacon
2014-08-24 07:10:20.358011 7f2b554e2700 10 mds.0.14 beacon_send up:active seq 127220 (currently up:active)
2014-08-24 07:10:21.515899 7f2b575e7700 5 mds.0.14 is_laggy 21.158841 > 15 since last acked beacon
2014-08-24 07:10:21.515912 7f2b575e7700 10 mds.0.14 laggy, deferring client_session(request_renewcaps seq 26766)
2014-08-24 07:10:21.515915 7f2b575e7700 5 mds.0.14 is_laggy 21.158857 > 15 since last acked beacon
2014-08-24 07:10:21.981148 7f2b575e7700 10 mds.0.snap check_osd_map need_to_purge={}
2014-08-24 07:10:21.981176 7f2b575e7700 5 mds.0.14 is_laggy 21.624117 > 15 since last acked beacon
2014-08-24 07:10:23.170528 7f2b575e7700 5 mds.0.14 handle_mds_map epoch 93 from mon.0
2014-08-24 07:10:23.175367 7f2b532d5700 0 -- 10.251.188.124:6800/985 >> 10.251.188.118:0/2461578479 pipe(0x5588a80 sd=23 :6800 s=2 pgs=91 cs=1 l=0 c=0x2cbfb20).fault with nothing to send, going to standby
2014-08-24 07:10:23.175376 7f2b533d6700 0 -- 10.251.188.124:6800/985 >> 10.251.188.55:0/306923677 pipe(0x5588d00 sd=22 :6800 s=2 pgs=7 cs=1 l=0 c=0x2cbf700).fault with nothing to send, going to standby
2014-08-24 07:10:23.175380 7f2b531d4700 0 -- 10.251.188.124:6800/985 >> 10.251.188.31:0/2854230502 pipe(0x5589480 sd=24 :6800 s=2 pgs=881 cs=1 l=0 c=0x2cbfde0).fault with nothing to send, going to standby
2014-08-24 07:10:23.175438 7f2b534d7700 0 -- 10.251.188.124:6800/985 >> 10.251.188.68:0/2928927296 pipe(0x5588800 sd=21 :6800 s=2 pgs=7 cs=1 l=0 c=0x2cbf5a0).fault with nothing to send, going to standby
2014-08-24 07:10:23.184201 7f2b575e7700 10 mds.0.14 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data}
2014-08-24 07:10:23.184255 7f2b575e7700 10 mds.0.14 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap}
2014-08-24 07:10:23.184264 7f2b575e7700 10 mds.-1.-1 map says i am 10.251.188.124:6800/985 mds.-1.-1 state down:dne
2014-08-24 07:10:23.184275 7f2b575e7700 10 mds.-1.-1 peer mds gid 94665 removed from map
2014-08-24 07:10:23.184282 7f2b575e7700 1 mds.-1.-1 handle_mds_map i (10.251.188.124:6800/985) dne in the mdsmap, respawning myself
2014-08-24 07:10:23.184284 7f2b575e7700 1 mds.-1.-1 respawn
2014-08-24 07:10:23.184286 7f2b575e7700 1 mds.-1.-1 e: '/usr/bin/ceph-mds'
2014-08-24 07:10:23.184288 7f2b575e7700 1 mds.-1.-1 0: '/usr/bin/ceph-mds'
2014-08-24 07:10:23.184289 7f2b575e7700 1 mds.-1.-1 1: '-i'
2014-08-24 07:10:23.184290 7f2b575e7700 1 mds.-1.-1 2: 'ceph-cluster1-mds2'
2014-08-24 07:10:23.184291 7f2b575e7700 1 mds.-1.-1 3: '--pid-file'
2014-08-24 07:10:23.184292 7f2b575e7700 1 mds.-1.-1 4: '/var/run/ceph/mds.ceph-cluster1-mds2.pid'
2014-08-24 07:10:23.184293 7f2b575e7700 1 mds.-1.-1 5: '-c'
2014-08-24 07:10:23.184294 7f2b575e7700 1 mds.-1.-1 6: '/etc/ceph/ceph.conf'
2014-08-24 07:10:23.184295 7f2b575e7700 1 mds.-1.-1 7: '--cluster'
2014-08-24 07:10:23.184296 7f2b575e7700 1 mds.-1.-1 8: 'ceph'
2014-08-24 07:10:23.274640 7f2b575e7700 1 mds.-1.-1 exe_path /usr/bin/ceph-mds
2014-08-24 07:10:23.606875 7f4c55abb800 0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-mds, pid 987
2014-08-24 07:10:49.024862 7f4c506ad700 1 mds.-1.0 handle_mds_map standby
2014-08-24 07:10:49.199676 7f4c506ad700 0 mds.-1.0 handle_mds
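(For reference, the "is_laggy ... > 15 since last acked beacon" lines mean the
MDS has gone longer than mds_beacon_grace seconds (15 by default) without the
monitor acknowledging its beacon, so it defers client requests and is
eventually dropped from the mdsmap and respawns. A quick way to inspect or
loosen that threshold; the admin-socket path and the value 60 are only
illustrative, not a recommendation:

  # Show the current beacon settings on the MDS host (socket path assumed
  # from the -i name in the log above).
  ceph --admin-daemon /var/run/ceph/ceph-mds.ceph-cluster1-mds2.asok config show | grep mds_beacon

  # Temporarily raise the grace period while debugging.
  ceph tell mds.0 injectargs '--mds_beacon_grace 60'

This only hides the symptom; something is still starving the MDS or the
mon-MDS traffic badly enough to stall beacons for 20+ seconds.)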
Re: [ceph-users] Fwd: Ceph Filesystem - Production?
I am running active/standby and it didn't swap over to the standby. If I
shut down the active server it swaps to the standby fine, though. When there
were issues, disk access would back up on the webstats servers and a cat of
/sys/kernel/debug/ceph/*/mdsc would show a list of entries, whereas normally
it would only list one or two, if any. I have 4 cores and 2GB of RAM on the
MDS machines. Watching it right now, it is using most of the RAM and some
swap, although most of the active RAM is disk cache. I lowered the
memory.swappiness value to see if that helps. I'm also logging top output in
case it happens again.

On Thu, Aug 28, 2014 at 8:22 PM, Yan, Zheng wrote:
> On Fri, Aug 29, 2014 at 8:36 AM, James Devine wrote:
> > [snip]
>
> The cephfs client is supposed to be able to handle MDS takeover.
> What symptom makes you umount and remount the cephfs?
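(For anyone wanting to reproduce the client-side check James describes, a
minimal sketch; the swappiness value is an assumption, and debugfs must be
mounted for the ceph directory to exist:

  # List outstanding MDS requests per cephfs client session; a long list
  # that never drains is the hang symptom described above.
  sudo sh -c 'cat /sys/kernel/debug/ceph/*/mdsc'

  # Make the MDS box prefer dropping page cache over swapping out ceph-mds.
  # 10 is only an example value; the per-cgroup equivalent is the
  # memory.swappiness file if the MDS runs inside a memory cgroup.
  sudo sysctl -w vm.swappiness=10
)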
Re: [ceph-users] Fwd: Ceph Filesystem - Production?
It took a week to happen again; I had hopes that it was fixed, but alas it
is not. Looking at top logs on the active MDS server, the load average was
0.00 the whole time and memory usage never changed much: it is using close
to 100% of RAM and some swap, but since I changed memory.swappiness, swap
usage hasn't gone up and has been slowly coming back down. Same symptoms:
the mount on the client is unresponsive and a cat of
/sys/kernel/debug/ceph/*/mdsc had a whole list of entries. A umount and
remount seems to fix it.

On Fri, Aug 29, 2014 at 11:26 AM, James Devine wrote:
> [snip]
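(A sketch of the recovery James is describing, for anyone hitting the same
hang; the mount point, monitor address and secret file below are assumptions,
not taken from the thread:

  # A hung kernel cephfs mount usually won't unmount cleanly, so fall back
  # to a lazy unmount if the normal one blocks.
  sudo umount /mnt/cephfs || sudo umount -l /mnt/cephfs

  # Remount with the kernel client.
  sudo mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs \
      -o name=admin,secretfile=/etc/ceph/admin.secret
)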
Re: [ceph-users] Fwd: Ceph Filesystem - Production?
I'm using 3.13.0-35-generic on Ubuntu 14.04.1.

On Thu, Sep 4, 2014 at 6:08 PM, Yan, Zheng wrote:
> On Fri, Sep 5, 2014 at 3:24 AM, James Devine wrote:
> > [snip]
>
> Which version of kernel do you use?
>
> Yan, Zheng
Re: [ceph-users] Fwd: Ceph Filesystem - Production?
No messages in dmesg. I've updated the two clients to 3.16; we'll see if
that fixes this issue.

On Fri, Sep 5, 2014 at 12:28 AM, Yan, Zheng wrote:
> On Fri, Sep 5, 2014 at 8:42 AM, James Devine wrote:
> > I'm using 3.13.0-35-generic on Ubuntu 14.04.1
>
> Was there any kernel message when the hang happened? We have fixed a
> few bugs since the 3.13 kernel, please use a 3.16 kernel if possible.
>
> Yan, Zheng
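(The check being discussed, for completeness; nothing here is from the
thread beyond the module names:

  # Confirm which kernel the client is actually running after the upgrade.
  uname -r

  # The kernel cephfs/libceph modules log MDS session and OSD errors to the
  # kernel ring buffer; an empty result is what James is reporting.
  dmesg | grep -iE 'ceph|libceph'
)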
Re: [ceph-users] Ceph Filesystem - Production?
The issue isn't so much mounting the ceph client as it is the mounted ceph
client becoming unusable and requiring a remount. So far so good, though.

On Fri, Sep 5, 2014 at 5:53 PM, JIten Shah wrote:
> We ran into the same issue where we could not mount the filesystem on the
> clients because it had 3.9. Once we upgraded the kernel on the client node,
> we were able to mount it fine. FWIW, you need kernel 3.14 and above.
>
> --jiten
>
> On Sep 5, 2014, at 6:55 AM, James Devine wrote:
> > [snip]
[ceph-users] ceph-dokan mount error
So I am trying to get ceph-dokan to work. Upon running it with
./ceph-dokan.exe -c ceph.conf -l e it indicates there was a mount error, and
the monitor it connects to logs:

cephx server client.admin: unexpected key: req.key=0 expected_key=d7901d515f6b0c61

According to the debug output attached, ceph-dokan reads the config file and
keyring fine. Any ideas where I might be going wrong?

[Attachment: ceph-dokan.log]
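(A couple of sanity checks that narrow this kind of cephx mismatch down,
sketched from a working Linux admin host; the keyring path is the stock one
and is an assumption, and nothing here is dokan-specific:

  # Show the key the monitors actually expect for client.admin...
  ceph auth get client.admin

  # ...and compare it with the keyring the client is reading.
  cat /etc/ceph/ceph.client.admin.keyring

req.key=0 in the mon log means the client presented no key at all, so the
question is less "wrong key" than "is the client doing cephx in the first
place".)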
Re: [ceph-users] ceph-dokan mount error
Yup, I think you are correct. I see this listed under issues:
https://github.com/ketor/ceph-dokan/issues/5

On Thu, Apr 30, 2015 at 12:58 PM, Gregory Farnum wrote:
> On Thu, Apr 30, 2015 at 9:49 AM, James Devine wrote:
> >> [snip]
>
> I haven't used or looked at dokan at all, but if the received key is 0
> then I'm guessing dokan isn't using the cephx security that the server
> is expecting. And there was a note from the guys who did rados_dll
> that they had to disable cephx — so perhaps dokan doesn't support
> cephx at all and you need to disable it? I'm not sure, just spitting
> into the wind here... ;)
> -Greg
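(If it does come down to disabling cephx cluster-wide, the usual knobs look
something like the sketch below. This is only illustrative: turning auth off
affects every client, the setting has to be present on all daemon hosts and
clients, and the daemons need a restart afterwards.

  # Append the auth settings to ceph.conf (stock path assumed), then restart
  # the mons, OSDs and MDS.
  sudo tee -a /etc/ceph/ceph.conf <<'EOF'
  [global]
  auth cluster required = none
  auth service required = none
  auth client required = none
  EOF
)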
Re: [ceph-users] Ceph Filesystem - Production?
The issue hasn't popped up since I upgraded the kernel, so the problem I was
experiencing seems to have been addressed.

On Tue, Sep 9, 2014 at 12:13 PM, James Devine wrote:
> The issue isn't so much mounting the ceph client as it is the mounted ceph
> client becoming unusable and requiring a remount. So far so good, though.
>
> [snip]
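(For anyone watching for a recurrence, the cluster-side view of a laggy or
replaced MDS shows up in the standard status commands; no assumptions here
beyond having admin access:

  # One-line MDS state; a healthy single-MDS setup shows the rank up:active
  # plus the standby.
  ceph mds stat

  # Overall cluster health, including MDS-related warnings when beacons stall.
  ceph -s
)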
Re: [ceph-users] Giant or Firefly for production
http://kernel.ubuntu.com/~kernel-ppa/mainline/

I'm running 3.17 on my trusty clients without issue.

On Fri, Dec 5, 2014 at 9:37 AM, Antonio Messina wrote:
> On Fri, Dec 5, 2014 at 4:25 PM, Nick Fisk wrote:
> > This is probably due to the Kernel RBD client not being recent enough.
> > Have you tried upgrading your kernel to a newer version? 3.16 should
> > contain all the relevant features required by Giant.
>
> I would rather tune the tunables, as upgrading the kernel would
> require a reboot of the client.
> Besides, Ubuntu Trusty does not provide a 3.16 kernel, so I would need
> to recompile...
>
> .a.
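(The kernel-ppa route is just a pair of .deb packages, so no recompiling is
needed on Trusty. A sketch of the install; the file names are placeholders,
copy the real ones from the directory listing for the build you want:

  # Download the headers and image .debs for the mainline build, e.g. from
  # the v3.17 directory under the URL above (XXX = actual build string).
  wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.17-utopic/linux-headers-3.17.0-XXX_amd64.deb
  wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.17-utopic/linux-image-3.17.0-XXX_amd64.deb

  # Install both packages and reboot into the new kernel.
  sudo dpkg -i linux-headers-3.17.0-*.deb linux-image-3.17.0-*.deb
  sudo reboot

Antonio's alternative of relaxing the crush tunables avoids the client
reboot, but changing tunables triggers data movement across the cluster, so
it isn't free either.)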