[ceph-users] Fwd: Ceph Filesystem - Production?

2014-08-28 Thread James Devine
On Thu, Aug 28, 2014 at 1:30 PM, Gregory Farnum  wrote:

> On Thu, Aug 28, 2014 at 10:36 AM, Brian C. Huffman
>  wrote:
> > Is Ceph Filesystem ready for production servers?
> >
> > The documentation says it's not, but I don't see that mentioned anywhere
> > else.
> > http://ceph.com/docs/master/cephfs/
>
> Everybody has their own standards, but Red Hat isn't supporting it for
> general production use at this time. If you're brave you could test it
> under your workload for a while and see how it comes out; the known
> issues are very much workload-dependent (or just general concerns over
> polish).
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


I've been testing it with our webstats since it gets live hits but isn't
customer-affecting.  It seems the MDS server has problems every few days,
requiring me to umount and remount the cephfs mount to resolve it.  Not
sure if the issue is resolved in development versions, but as of 0.80.5 we
seem to be hitting it.  I set the log verbosity to 20 so there are tons of
logs, but it ends with:

2014-08-24 07:10:19.682015 7f2b575e7700 10 mds.0.14  laggy, deferring
client_request(client.92141:6795587 getattr pAsLsXsFs #1026dc1)
2014-08-24 07:10:19.682021 7f2b575e7700  5 mds.0.14 is_laggy 19.324963 > 15
since last acked beacon
2014-08-24 07:10:20.358011 7f2b554e2700 10 mds.0.14 beacon_send up:active
seq 127220 (currently up:active)
2014-08-24 07:10:21.515899 7f2b575e7700  5 mds.0.14 is_laggy 21.158841 > 15
since last acked beacon
2014-08-24 07:10:21.515912 7f2b575e7700 10 mds.0.14  laggy, deferring
client_session(request_renewcaps seq 26766)
2014-08-24 07:10:21.515915 7f2b575e7700  5 mds.0.14 is_laggy 21.158857 > 15
since last acked beacon
2014-08-24 07:10:21.981148 7f2b575e7700 10 mds.0.snap check_osd_map
need_to_purge={}
2014-08-24 07:10:21.981176 7f2b575e7700  5 mds.0.14 is_laggy 21.624117 > 15
since last acked beacon
2014-08-24 07:10:23.170528 7f2b575e7700  5 mds.0.14 handle_mds_map epoch 93
from mon.0
2014-08-24 07:10:23.175367 7f2b532d5700  0 -- 10.251.188.124:6800/985 >>
10.251.188.118:0/2461578479 pipe(0x5588a80 sd=23 :6800 s=2 pgs=91 cs=1 l=0
c=0x2cbfb20).fault with nothing to send, going to standby
2014-08-24 07:10:23.175376 7f2b533d6700  0 -- 10.251.188.124:6800/985 >>
10.251.188.55:0/306923677 pipe(0x5588d00 sd=22 :6800 s=2 pgs=7 cs=1 l=0
c=0x2cbf700).fault with nothing to send, going to standby
2014-08-24 07:10:23.175380 7f2b531d4700  0 -- 10.251.188.124:6800/985 >>
10.251.188.31:0/2854230502 pipe(0x5589480 sd=24 :6800 s=2 pgs=881 cs=1 l=0
c=0x2cbfde0).fault with nothing to send, going to standby
2014-08-24 07:10:23.175438 7f2b534d7700  0 -- 10.251.188.124:6800/985 >>
10.251.188.68:0/2928927296 pipe(0x5588800 sd=21 :6800 s=2 pgs=7 cs=1 l=0
c=0x2cbf5a0).fault with nothing to send, going to standby
2014-08-24 07:10:23.184201 7f2b575e7700 10 mds.0.14  my compat
compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data}
2014-08-24 07:10:23.184255 7f2b575e7700 10 mds.0.14  mdsmap compat
compat={},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds
uses versioned encoding,6=dirfrag is stored in omap}
2014-08-24 07:10:23.184264 7f2b575e7700 10 mds.-1.-1 map says i am
10.251.188.124:6800/985 mds.-1.-1 state down:dne
2014-08-24 07:10:23.184275 7f2b575e7700 10 mds.-1.-1  peer mds gid 94665
removed from map
2014-08-24 07:10:23.184282 7f2b575e7700  1 mds.-1.-1 handle_mds_map i (
10.251.188.124:6800/985) dne in the mdsmap, respawning myself
2014-08-24 07:10:23.184284 7f2b575e7700  1 mds.-1.-1 respawn
2014-08-24 07:10:23.184286 7f2b575e7700  1 mds.-1.-1  e: '/usr/bin/ceph-mds'
2014-08-24 07:10:23.184288 7f2b575e7700  1 mds.-1.-1  0: '/usr/bin/ceph-mds'
2014-08-24 07:10:23.184289 7f2b575e7700  1 mds.-1.-1  1: '-i'
2014-08-24 07:10:23.184290 7f2b575e7700  1 mds.-1.-1  2:
'ceph-cluster1-mds2'
2014-08-24 07:10:23.184291 7f2b575e7700  1 mds.-1.-1  3: '--pid-file'
2014-08-24 07:10:23.184292 7f2b575e7700  1 mds.-1.-1  4:
'/var/run/ceph/mds.ceph-cluster1-mds2.pid'
2014-08-24 07:10:23.184293 7f2b575e7700  1 mds.-1.-1  5: '-c'
2014-08-24 07:10:23.184294 7f2b575e7700  1 mds.-1.-1  6:
'/etc/ceph/ceph.conf'
2014-08-24 07:10:23.184295 7f2b575e7700  1 mds.-1.-1  7: '--cluster'
2014-08-24 07:10:23.184296 7f2b575e7700  1 mds.-1.-1  8: 'ceph'
2014-08-24 07:10:23.274640 7f2b575e7700  1 mds.-1.-1  exe_path
/usr/bin/ceph-mds
2014-08-24 07:10:23.606875 7f4c55abb800  0 ceph version 0.80.5
(38b73c67d375a2552d8ed67843c8a65c2c0feba6), process ceph-mds, pid 987
2014-08-24 07:10:49.024862 7f4c506ad700  1 mds.-1.0 handle_mds_map standby
2014-08-24 07:10:49.199676 7f4c506ad700  0 mds.-1.0 handle_mds

Re: [ceph-users] Fwd: Ceph Filesystem - Production?

2014-08-29 Thread James Devine
I am running active/standby and it didn't swap over to the standby.  If I
shut down the active server it swaps to the standby fine, though.  When
there were issues, disk access would back up on the webstats servers and a
cat of /sys/kernel/debug/ceph/*/mdsc would have a list of entries, whereas
normally it would only list one or two, if any.  I have 4 cores and 2GB of
RAM on the MDS machines.  Watching it right now, it is using most of the
RAM and some swap, although most of the active RAM is disk cache.  I
lowered the memory.swappiness value to see if that helps.  I'm also
logging top output in case it happens again.
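
For anyone wanting to reproduce the checks described above, they look
roughly like this (assuming debugfs is mounted at /sys/kernel/debug and
the cephfs kernel client is in use; the swappiness value of 10 is just
an illustrative number, not the one used here):

    # outstanding MDS requests per cephfs mount; normally empty or a
    # couple of entries, a long list means requests are stuck
    cat /sys/kernel/debug/ceph/*/mdsc

    # system-wide equivalent of the memory.swappiness knob mentioned
    # above; add vm.swappiness=10 to /etc/sysctl.conf to persist it
    sysctl -w vm.swappiness=10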


On Thu, Aug 28, 2014 at 8:22 PM, Yan, Zheng  wrote:

> On Fri, Aug 29, 2014 at 8:36 AM, James Devine  wrote:
> >
> > On Thu, Aug 28, 2014 at 1:30 PM, Gregory Farnum 
> wrote:
> >>
> >> On Thu, Aug 28, 2014 at 10:36 AM, Brian C. Huffman
> >>  wrote:
> >> > Is Ceph Filesystem ready for production servers?
> >> >
> >> > The documentation says it's not, but I don't see that mentioned
> anywhere
> >> > else.
> >> > http://ceph.com/docs/master/cephfs/
> >>
> >> Everybody has their own standards, but Red Hat isn't supporting it for
> >> general production use at this time. If you're brave you could test it
> >> under your workload for a while and see how it comes out; the known
> >> issues are very much workload-dependent (or just general concerns over
> >> polish).
> >> -Greg
> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> > I've been testing it with our webstats since it gets live hits but isn't
> > customer affecting.  Seems the MDS server has problems every few days
> > requiring me to umount and remount the ceph disk to resolve.  Not sure if
> > the issue is resolved in development versions but as of 0.80.5 we seem
> to be
> > hitting it.  I set the log verbosity to 20 so there's tons of logs but
> ends
> > with
>
> The cephfs client is supposed to be able to handle MDS takeover.
> What symptom makes you umount and remount the cephfs?
>
> >
> > 2014-08-24 07:10:19.682015 7f2b575e7700 10 mds.0.14  laggy, deferring
> > client_request(client.92141:6795587 getattr pAsLsXsFs #1026dc1)
> > 2014-08-24 07:10:19.682021 7f2b575e7700  5 mds.0.14 is_laggy 19.324963 >
> 15
> > since last acked beacon
> > 2014-08-24 07:10:20.358011 7f2b554e2700 10 mds.0.14 beacon_send up:active
> > seq 127220 (currently up:active)
> > 2014-08-24 07:10:21.515899 7f2b575e7700  5 mds.0.14 is_laggy 21.158841 >
> 15
> > since last acked beacon
> > 2014-08-24 07:10:21.515912 7f2b575e7700 10 mds.0.14  laggy, deferring
> > client_session(request_renewcaps seq 26766)
> > 2014-08-24 07:10:21.515915 7f2b575e7700  5 mds.0.14 is_laggy 21.158857 >
> 15
> > since last acked beacon
> > 2014-08-24 07:10:21.981148 7f2b575e7700 10 mds.0.snap check_osd_map
> > need_to_purge={}
> > 2014-08-24 07:10:21.981176 7f2b575e7700  5 mds.0.14 is_laggy 21.624117 >
> 15
> > since last acked beacon
> > 2014-08-24 07:10:23.170528 7f2b575e7700  5 mds.0.14 handle_mds_map epoch
> 93
> > from mon.0
> > 2014-08-24 07:10:23.175367 7f2b532d5700  0 -- 10.251.188.124:6800/985 >>
> > 10.251.188.118:0/2461578479 pipe(0x5588a80 sd=23 :6800 s=2 pgs=91 cs=1
> l=0
> > c=0x2cbfb20).fault with nothing to send, going to standby
> > 2014-08-24 07:10:23.175376 7f2b533d6700  0 -- 10.251.188.124:6800/985 >>
> > 10.251.188.55:0/306923677 pipe(0x5588d00 sd=22 :6800 s=2 pgs=7 cs=1 l=0
> > c=0x2cbf700).fault with nothing to send, going to standby
> > 2014-08-24 07:10:23.175380 7f2b531d4700  0 -- 10.251.188.124:6800/985 >>
> > 10.251.188.31:0/2854230502 pipe(0x5589480 sd=24 :6800 s=2 pgs=881 cs=1
> l=0
> > c=0x2cbfde0).fault with nothing to send, going to standby
> > 2014-08-24 07:10:23.175438 7f2b534d7700  0 -- 10.251.188.124:6800/985 >>
> > 10.251.188.68:0/2928927296 pipe(0x5588800 sd=21 :6800 s=2 pgs=7 cs=1 l=0
> > c=0x2cbf5a0).fault with nothing to send, going to standby
> > 2014-08-24 07:10:23.184201 7f2b575e7700 10 mds.0.14  my compat
> > compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> > ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds
> > uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline
> data}
> >

Re: [ceph-users] Fwd: Ceph Filesystem - Production?

2014-09-04 Thread James Devine
It took a week to happen again; I had hoped it was fixed, but alas it is
not.  Looking at the top logs on the active MDS server, the load average
was 0.00 the whole time and memory usage never changed much: it is using
close to 100% of RAM and some swap, but since I changed memory.swappiness,
swap usage hasn't gone up and has been slowly coming back down.  Same
symptoms: the mount on the client is unresponsive and a cat of
/sys/kernel/debug/ceph/*/mdsc had a whole list of entries.  A umount and
remount seems to fix it.
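
When this happens, a quick way to tell from a monitor node whether the
MDS actually failed over, or is just being marked laggy, is the stock
status commands (the sample output in the comment is illustrative):

    ceph -s              # overall status, including the mdsmap line
    ceph mds stat        # e.g. "e93: 1/1/1 up {0=mds-a=up:active}, 1 up:standby"
    ceph health detail   # should call out an MDS the mons consider laggy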


On Fri, Aug 29, 2014 at 11:26 AM, James Devine  wrote:

> I am running active/standby and it didn't swap over to the standby.  If I
> shutdown the active server it swaps to the standby fine though.  When there
> were issues, disk access would back up on the webstats servers and a cat of
> /sys/kernel/debug/ceph/*/mdsc would have a list of entries whereas
> normally it would only list one or two if any.  I have 4 cores and 2GB of
> ram on the mds machines.  Watching it right now it is using most of the ram
> and some of swap although most of the active ram is disk cache.  I lowered
> the memory.swappiness value to see if that helps.  I'm also logging top
> output if it happens again.
>
>
> On Thu, Aug 28, 2014 at 8:22 PM, Yan, Zheng  wrote:
>
>> On Fri, Aug 29, 2014 at 8:36 AM, James Devine  wrote:
>> >
>> > On Thu, Aug 28, 2014 at 1:30 PM, Gregory Farnum 
>> wrote:
>> >>
>> >> On Thu, Aug 28, 2014 at 10:36 AM, Brian C. Huffman
>> >>  wrote:
>> >> > Is Ceph Filesystem ready for production servers?
>> >> >
>> >> > The documentation says it's not, but I don't see that mentioned
>> anywhere
>> >> > else.
>> >> > http://ceph.com/docs/master/cephfs/
>> >>
>> >> Everybody has their own standards, but Red Hat isn't supporting it for
>> >> general production use at this time. If you're brave you could test it
>> >> under your workload for a while and see how it comes out; the known
>> >> issues are very much workload-dependent (or just general concerns over
>> >> polish).
>> >> -Greg
>> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> >
>> >
>> > I've been testing it with our webstats since it gets live hits but isn't
>> > customer affecting.  Seems the MDS server has problems every few days
>> > requiring me to umount and remount the ceph disk to resolve.  Not sure
>> if
>> > the issue is resolved in development versions but as of 0.80.5 we seem
>> to be
>> > hitting it.  I set the log verbosity to 20 so there's tons of logs but
>> ends
>> > with
>>
>> The cephfs client is supposed to be able to handle MDS takeover.
>> What symptom makes you umount and remount the cephfs?
>>
>> >
>> > 2014-08-24 07:10:19.682015 7f2b575e7700 10 mds.0.14  laggy, deferring
>> > client_request(client.92141:6795587 getattr pAsLsXsFs #1026dc1)
>> > 2014-08-24 07:10:19.682021 7f2b575e7700  5 mds.0.14 is_laggy 19.324963
>> > 15
>> > since last acked beacon
>> > 2014-08-24 07:10:20.358011 7f2b554e2700 10 mds.0.14 beacon_send
>> up:active
>> > seq 127220 (currently up:active)
>> > 2014-08-24 07:10:21.515899 7f2b575e7700  5 mds.0.14 is_laggy 21.158841
>> > 15
>> > since last acked beacon
>> > 2014-08-24 07:10:21.515912 7f2b575e7700 10 mds.0.14  laggy, deferring
>> > client_session(request_renewcaps seq 26766)
>> > 2014-08-24 07:10:21.515915 7f2b575e7700  5 mds.0.14 is_laggy 21.158857
>> > 15
>> > since last acked beacon
>> > 2014-08-24 07:10:21.981148 7f2b575e7700 10 mds.0.snap check_osd_map
>> > need_to_purge={}
>> > 2014-08-24 07:10:21.981176 7f2b575e7700  5 mds.0.14 is_laggy 21.624117
>> > 15
>> > since last acked beacon
>> > 2014-08-24 07:10:23.170528 7f2b575e7700  5 mds.0.14 handle_mds_map
>> epoch 93
>> > from mon.0
>> > 2014-08-24 07:10:23.175367 7f2b532d5700  0 -- 10.251.188.124:6800/985
>> >>
>> > 10.251.188.118:0/2461578479 pipe(0x5588a80 sd=23 :6800 s=2 pgs=91 cs=1
>> l=0
>> > c=0x2cbfb20).fault with nothing to send, going to standby
>> > 2014-08-24 07:10:23.175376 7f2b533d6700  0 -- 10.251.188.124:6800/985
>> >>
>> > 10.251.1

Re: [ceph-users] Fwd: Ceph Filesystem - Production?

2014-09-04 Thread James Devine
I'm using 3.13.0-35-generic on Ubuntu 14.04.1


On Thu, Sep 4, 2014 at 6:08 PM, Yan, Zheng  wrote:

> On Fri, Sep 5, 2014 at 3:24 AM, James Devine  wrote:
> > It took a week to happen again, I had hopes that it was fixed but alas
> it is
> > not.  Looking at top logs on the active mds server, the load average was
> > 0.00 the whole time and memory usage never changed much, it is using
> close
> > to 100% and some swap but since I changed memory.swappiness swap usage
> > hasn't gone up but has been slowly coming back down.  Same symptoms, the
> > mount on the client is unresponsive and a cat on
> > /sys/kernel/debug/ceph/*/mdsc had a whole list of entries.  A umount and
> > remount seems to fix it.
> >
>
> which version of kernel do you use ?
>
> Yan, Zheng
>
> >
> > On Fri, Aug 29, 2014 at 11:26 AM, James Devine 
> wrote:
> >>
> >> I am running active/standby and it didn't swap over to the standby.  If
> I
> >> shutdown the active server it swaps to the standby fine though.  When
> there
> >> were issues, disk access would back up on the webstats servers and a
> cat of
> >> /sys/kernel/debug/ceph/*/mdsc would have a list of entries whereas
> normally
> >> it would only list one or two if any.  I have 4 cores and 2GB of ram on
> the
> >> mds machines.  Watching it right now it is using most of the ram and
> some of
> >> swap although most of the active ram is disk cache.  I lowered the
> >> memory.swappiness value to see if that helps.  I'm also logging top
> output
> >> if it happens again.
> >>
> >>
> >> On Thu, Aug 28, 2014 at 8:22 PM, Yan, Zheng  wrote:
> >>>
> >>> On Fri, Aug 29, 2014 at 8:36 AM, James Devine 
> wrote:
> >>> >
> >>> > On Thu, Aug 28, 2014 at 1:30 PM, Gregory Farnum 
> >>> > wrote:
> >>> >>
> >>> >> On Thu, Aug 28, 2014 at 10:36 AM, Brian C. Huffman
> >>> >>  wrote:
> >>> >> > Is Ceph Filesystem ready for production servers?
> >>> >> >
> >>> >> > The documentation says it's not, but I don't see that mentioned
> >>> >> > anywhere
> >>> >> > else.
> >>> >> > http://ceph.com/docs/master/cephfs/
> >>> >>
> >>> >> Everybody has their own standards, but Red Hat isn't supporting it
> for
> >>> >> general production use at this time. If you're brave you could test
> it
> >>> >> under your workload for a while and see how it comes out; the known
> >>> >> issues are very much workload-dependent (or just general concerns
> over
> >>> >> polish).
> >>> >> -Greg
> >>> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >>> >> ___
> >>> >> ceph-users mailing list
> >>> >> ceph-users@lists.ceph.com
> >>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>> >
> >>> >
> >>> >
> >>> > I've been testing it with our webstats since it gets live hits but
> >>> > isn't
> >>> > customer affecting.  Seems the MDS server has problems every few days
> >>> > requiring me to umount and remount the ceph disk to resolve.  Not
> sure
> >>> > if
> >>> > the issue is resolved in development versions but as of 0.80.5 we
> seem
> >>> > to be
> >>> > hitting it.  I set the log verbosity to 20 so there's tons of logs
> but
> >>> > ends
> >>> > with
> >>>
> >>> The cephfs client is supposed to be able to handle MDS takeover.
> >>> What symptom makes you umount and remount the cephfs?
> >>>
> >>> >
> >>> > 2014-08-24 07:10:19.682015 7f2b575e7700 10 mds.0.14  laggy, deferring
> >>> > client_request(client.92141:6795587 getattr pAsLsXsFs #1026dc1)
> >>> > 2014-08-24 07:10:19.682021 7f2b575e7700  5 mds.0.14 is_laggy
> 19.324963
> >>> > > 15
> >>> > since last acked beacon
> >>> > 2014-08-24 07:10:20.358011 7f2b554e2700 10 mds.0.14 beacon_send
> >>> > up:active
> >>> > seq 127220 (currently up:active)
> >>> > 2014-08-24 07:10:21.515899 7f2b575e7700  5 mds.0.14 is_laggy
> 21.158841

Re: [ceph-users] Fwd: Ceph Filesystem - Production?

2014-09-05 Thread James Devine
No messages in dmesg.  I've updated the two clients to 3.16; we'll see if
that fixes the issue.


On Fri, Sep 5, 2014 at 12:28 AM, Yan, Zheng  wrote:

> On Fri, Sep 5, 2014 at 8:42 AM, James Devine  wrote:
> > I'm using 3.13.0-35-generic on Ubuntu 14.04.1
> >
>
> Was there any kernel message when the hang happened?  We have fixed a
> few bugs since 3.13 kernel, please use 3.16 kernel if possible.
>
> Yan, Zheng
>
> >
> > On Thu, Sep 4, 2014 at 6:08 PM, Yan, Zheng  wrote:
> >>
> >> On Fri, Sep 5, 2014 at 3:24 AM, James Devine 
> wrote:
> >> > It took a week to happen again, I had hopes that it was fixed but alas
> >> > it is
> >> > not.  Looking at top logs on the active mds server, the load average
> was
> >> > 0.00 the whole time and memory usage never changed much, it is using
> >> > close
> >> > to 100% and some swap but since I changed memory.swappiness swap usage
> >> > hasn't gone up but has been slowly coming back down.  Same symptoms,
> the
> >> > mount on the client is unresponsive and a cat on
> >> > /sys/kernel/debug/ceph/*/mdsc had a whole list of entries.  A umount
> and
> >> > remount seems to fix it.
> >> >
> >>
> >> which version of kernel do you use ?
> >>
> >> Yan, Zheng
> >>
> >> >
> >> > On Fri, Aug 29, 2014 at 11:26 AM, James Devine 
> >> > wrote:
> >> >>
> >> >> I am running active/standby and it didn't swap over to the standby.
> If
> >> >> I
> >> >> shutdown the active server it swaps to the standby fine though.  When
> >> >> there
> >> >> were issues, disk access would back up on the webstats servers and a
> >> >> cat of
> >> >> /sys/kernel/debug/ceph/*/mdsc would have a list of entries whereas
> >> >> normally
> >> >> it would only list one or two if any.  I have 4 cores and 2GB of ram
> on
> >> >> the
> >> >> mds machines.  Watching it right now it is using most of the ram and
> >> >> some of
> >> >> swap although most of the active ram is disk cache.  I lowered the
> >> >> memory.swappiness value to see if that helps.  I'm also logging top
> >> >> output
> >> >> if it happens again.
> >> >>
> >> >>
> >> >> On Thu, Aug 28, 2014 at 8:22 PM, Yan, Zheng 
> wrote:
> >> >>>
> >> >>> On Fri, Aug 29, 2014 at 8:36 AM, James Devine 
> >> >>> wrote:
> >> >>> >
> >> >>> > On Thu, Aug 28, 2014 at 1:30 PM, Gregory Farnum  >
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> On Thu, Aug 28, 2014 at 10:36 AM, Brian C. Huffman
> >> >>> >>  wrote:
> >> >>> >> > Is Ceph Filesystem ready for production servers?
> >> >>> >> >
> >> >>> >> > The documentation says it's not, but I don't see that mentioned
> >> >>> >> > anywhere
> >> >>> >> > else.
> >> >>> >> > http://ceph.com/docs/master/cephfs/
> >> >>> >>
> >> >>> >> Everybody has their own standards, but Red Hat isn't supporting
> it
> >> >>> >> for
> >> >>> >> general production use at this time. If you're brave you could
> test
> >> >>> >> it
> >> >>> >> under your workload for a while and see how it comes out; the
> known
> >> >>> >> issues are very much workload-dependent (or just general concerns
> >> >>> >> over
> >> >>> >> polish).
> >> >>> >> -Greg
> >> >>> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >> >>> >> ___
> >> >>> >> ceph-users mailing list
> >> >>> >> ceph-users@lists.ceph.com
> >> >>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >>> >
> >> >>> >
> >> >>> >
> >> >>> > I've been testing it with our webstats since it gets live hits but
> >> >>> > isn't
> >> >>> > customer affect

Re: [ceph-users] Ceph Filesystem - Production?

2014-09-09 Thread James Devine
The issue isn't so much mounting the ceph client as the mounted cephfs
client becoming unusable and requiring a remount.  So far so good, though.
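
For reference, the remount described in this thread looks roughly like
the following on a client using the kernel cephfs mount (the monitor
address, mount point and secret file are placeholders; umount -l is the
lazy fallback for when a hung mount refuses a normal unmount):

    # unmount the hung cephfs mount, falling back to a lazy unmount
    umount /mnt/cephfs || umount -l /mnt/cephfs

    # remount with the kernel client
    mount -t ceph mon1.example.com:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret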


On Fri, Sep 5, 2014 at 5:53 PM, JIten Shah  wrote:

> We ran into the same issue where we could not mount the filesystem on the
> clients because it had 3.9. Once we upgraded the kernel on the client node,
> we were able to mount it fine. FWIW, you need kernel 3.14 and above.
>
> --jiten
>
> On Sep 5, 2014, at 6:55 AM, James Devine  wrote:
>
> No messages in dmesg, I've updated the two clients to 3.16, we'll see if
> that fixes this issue.
>
>
> On Fri, Sep 5, 2014 at 12:28 AM, Yan, Zheng  wrote:
>
>> On Fri, Sep 5, 2014 at 8:42 AM, James Devine  wrote:
>> > I'm using 3.13.0-35-generic on Ubuntu 14.04.1
>> >
>>
>> Was there any kernel message when the hang happened?  We have fixed a
>> few bugs since 3.13 kernel, please use 3.16 kernel if possible.
>>
>> Yan, Zheng
>>
>> >
>> > On Thu, Sep 4, 2014 at 6:08 PM, Yan, Zheng  wrote:
>> >>
>> >> On Fri, Sep 5, 2014 at 3:24 AM, James Devine 
>> wrote:
>> >> > It took a week to happen again, I had hopes that it was fixed but
>> alas
>> >> > it is
>> >> > not.  Looking at top logs on the active mds server, the load average
>> was
>> >> > 0.00 the whole time and memory usage never changed much, it is using
>> >> > close
>> >> > to 100% and some swap but since I changed memory.swappiness swap
>> usage
>> >> > hasn't gone up but has been slowly coming back down.  Same symptoms,
>> the
>> >> > mount on the client is unresponsive and a cat on
>> >> > /sys/kernel/debug/ceph/*/mdsc had a whole list of entries.  A umount
>> and
>> >> > remount seems to fix it.
>> >> >
>> >>
>> >> which version of kernel do you use ?
>> >>
>> >> Yan, Zheng
>> >>
>> >> >
>> >> > On Fri, Aug 29, 2014 at 11:26 AM, James Devine 
>> >> > wrote:
>> >> >>
>> >> >> I am running active/standby and it didn't swap over to the
>> standby.  If
>> >> >> I
>> >> >> shutdown the active server it swaps to the standby fine though.
>> When
>> >> >> there
>> >> >> were issues, disk access would back up on the webstats servers and a
>> >> >> cat of
>> >> >> /sys/kernel/debug/ceph/*/mdsc would have a list of entries whereas
>> >> >> normally
>> >> >> it would only list one or two if any.  I have 4 cores and 2GB of
>> ram on
>> >> >> the
>> >> >> mds machines.  Watching it right now it is using most of the ram and
>> >> >> some of
>> >> >> swap although most of the active ram is disk cache.  I lowered the
>> >> >> memory.swappiness value to see if that helps.  I'm also logging top
>> >> >> output
>> >> >> if it happens again.
>> >> >>
>> >> >>
>> >> >> On Thu, Aug 28, 2014 at 8:22 PM, Yan, Zheng 
>> wrote:
>> >> >>>
>> >> >>> On Fri, Aug 29, 2014 at 8:36 AM, James Devine 
>> >> >>> wrote:
>> >> >>> >
>> >> >>> > On Thu, Aug 28, 2014 at 1:30 PM, Gregory Farnum <
>> g...@inktank.com>
>> >> >>> > wrote:
>> >> >>> >>
>> >> >>> >> On Thu, Aug 28, 2014 at 10:36 AM, Brian C. Huffman
>> >> >>> >>  wrote:
>> >> >>> >> > Is Ceph Filesystem ready for production servers?
>> >> >>> >> >
>> >> >>> >> > The documentation says it's not, but I don't see that
>> mentioned
>> >> >>> >> > anywhere
>> >> >>> >> > else.
>> >> >>> >> > http://ceph.com/docs/master/cephfs/
>> >> >>> >>
>> >> >>> >> Everybody has their own standards, but Red Hat isn't supporting
>> it
>> >> >>> >> for
>> >> >>> >> general production use at this time. If you're brave you could
>> test
>> >> >>> >> it
>> >> >>> >> under your workload for a while and see how 

[ceph-users] ceph-dokan mount error

2015-04-30 Thread James Devine
So I am trying to get ceph-dokan to work.  Upon running it with
./ceph-dokan.exe -c ceph.conf -l e, it reports a mount error, and the
monitor it connects to logs: cephx server client.admin:
unexpected key: req.key=0 expected_key=d7901d515f6b0c61

According to the attached debug output, ceph-dokan reads the config
file and keyring fine.  Any ideas where I might be going wrong?


[Attachment: ceph-dokan.log (binary data)]
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-dokan mount error

2015-04-30 Thread James Devine
Yup, I think you are correct; I see this listed under the project's issues:
https://github.com/ketor/ceph-dokan/issues/5

On Thu, Apr 30, 2015 at 12:58 PM, Gregory Farnum  wrote:
> On Thu, Apr 30, 2015 at 9:49 AM, James Devine  wrote:
>> So I am trying to get ceph-dokan to work.  Upon running it with
>> ./ceph-dokan.exe -c ceph.conf -l e it indicates there was a mount
>> error and the monitor it connects to logs cephx server client.admin:
>> unexpected key: req.key=0 expected_key=d7901d515f6b0c61
>>
>> According to the debug output attached ceph-dokan reads the config
>> file and keyring fine.  Any ideas where I might be going wrong?
>
> I haven't used or looked at dokan at all, but if the received key is 0
> then I'm guessing dokan isn't using the cephx security that the server
> is expecting. And there was a note from the guys who did rados_dll
> that they had to disable cephx — so perhaps dokan doesn't support
> cephx at all and you need to disable it? I'm not sure, just spitting
> into the wind here... ;)
> -Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
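
If it turns out that ceph-dokan really cannot speak cephx, as Greg
guesses above and the linked github issue suggests, the test-only
workaround is to disable authentication cluster-wide and restart the
daemons.  A sketch of the ceph.conf change, with the obvious caveat
that it turns off authentication for every client and is not something
to do on a cluster you care about:

    [global]
        auth cluster required = none
        auth service required = none
        auth client required = none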


Re: [ceph-users] Ceph Filesystem - Production?

2014-09-29 Thread James Devine
The issue hasn't popped up since I upgraded the kernel, so whatever I was
hitting seems to have been addressed.

On Tue, Sep 9, 2014 at 12:13 PM, James Devine  wrote:

> The issue isn't so much mounting the ceph client as it is the mounted ceph
> client becoming unusable requiring a remount.  So far so good though.
>
>
> On Fri, Sep 5, 2014 at 5:53 PM, JIten Shah  wrote:
>
>> We ran into the same issue where we could not mount the filesystem on the
>> clients because it had 3.9. Once we upgraded the kernel on the client node,
>> we were able to mount it fine. FWIW, you need kernel 3.14 and above.
>>
>> --jiten
>>
>> On Sep 5, 2014, at 6:55 AM, James Devine  wrote:
>>
>> No messages in dmesg, I've updated the two clients to 3.16, we'll see if
>> that fixes this issue.
>>
>>
>> On Fri, Sep 5, 2014 at 12:28 AM, Yan, Zheng  wrote:
>>
>>> On Fri, Sep 5, 2014 at 8:42 AM, James Devine  wrote:
>>> > I'm using 3.13.0-35-generic on Ubuntu 14.04.1
>>> >
>>>
>>> Was there any kernel message when the hang happened?  We have fixed a
>>> few bugs since 3.13 kernel, please use 3.16 kernel if possible.
>>>
>>> Yan, Zheng
>>>
>>> >
>>> > On Thu, Sep 4, 2014 at 6:08 PM, Yan, Zheng  wrote:
>>> >>
>>> >> On Fri, Sep 5, 2014 at 3:24 AM, James Devine 
>>> wrote:
>>> >> > It took a week to happen again, I had hopes that it was fixed but
>>> alas
>>> >> > it is
>>> >> > not.  Looking at top logs on the active mds server, the load
>>> average was
>>> >> > 0.00 the whole time and memory usage never changed much, it is using
>>> >> > close
>>> >> > to 100% and some swap but since I changed memory.swappiness swap
>>> usage
>>> >> > hasn't gone up but has been slowly coming back down.  Same
>>> symptoms, the
>>> >> > mount on the client is unresponsive and a cat on
>>> >> > /sys/kernel/debug/ceph/*/mdsc had a whole list of entries.  A
>>> umount and
>>> >> > remount seems to fix it.
>>> >> >
>>> >>
>>> >> which version of kernel do you use ?
>>> >>
>>> >> Yan, Zheng
>>> >>
>>> >> >
>>> >> > On Fri, Aug 29, 2014 at 11:26 AM, James Devine 
>>> >> > wrote:
>>> >> >>
>>> >> >> I am running active/standby and it didn't swap over to the
>>> standby.  If
>>> >> >> I
>>> >> >> shutdown the active server it swaps to the standby fine though.
>>> When
>>> >> >> there
>>> >> >> were issues, disk access would back up on the webstats servers and
>>> a
>>> >> >> cat of
>>> >> >> /sys/kernel/debug/ceph/*/mdsc would have a list of entries whereas
>>> >> >> normally
>>> >> >> it would only list one or two if any.  I have 4 cores and 2GB of
>>> ram on
>>> >> >> the
>>> >> >> mds machines.  Watching it right now it is using most of the ram
>>> and
>>> >> >> some of
>>> >> >> swap although most of the active ram is disk cache.  I lowered the
>>> >> >> memory.swappiness value to see if that helps.  I'm also logging top
>>> >> >> output
>>> >> >> if it happens again.
>>> >> >>
>>> >> >>
>>> >> >> On Thu, Aug 28, 2014 at 8:22 PM, Yan, Zheng 
>>> wrote:
>>> >> >>>
>>> >> >>> On Fri, Aug 29, 2014 at 8:36 AM, James Devine >> >
>>> >> >>> wrote:
>>> >> >>> >
>>> >> >>> > On Thu, Aug 28, 2014 at 1:30 PM, Gregory Farnum <
>>> g...@inktank.com>
>>> >> >>> > wrote:
>>> >> >>> >>
>>> >> >>> >> On Thu, Aug 28, 2014 at 10:36 AM, Brian C. Huffman
>>> >> >>> >>  wrote:
>>> >> >>> >> > Is Ceph Filesystem ready for production servers?
>>> >> >>> >> >
>>> >> >>> >> > The documentation says it's not, but I don't see that
>>> mentioned
>>> >> >>> >&g

Re: [ceph-users] Giant or Firefly for production

2014-12-05 Thread James Devine
http://kernel.ubuntu.com/~kernel-ppa/mainline/

I'm running 3.17 on my Trusty clients without issue.
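
For anyone else on Trusty wanting to do the same, installing a mainline
build from the URL above is just a matter of downloading the image and
headers .debs for the chosen version and architecture and installing
them; the glob patterns below assume the usual mainline package naming:

    # after downloading the .debs from the mainline page above
    sudo dpkg -i linux-headers-*_all.deb \
                 linux-headers-*-generic_*_amd64.deb \
                 linux-image-*-generic_*_amd64.deb
    sudo reboot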

On Fri, Dec 5, 2014 at 9:37 AM, Antonio Messina  wrote:

> On Fri, Dec 5, 2014 at 4:25 PM, Nick Fisk  wrote:
> > This is probably due to the Kernel RBD client not being recent enough.
> Have
> > you tried upgrading your kernel to a newer version? 3.16 should contain
> all
> > the relevant features required by Giant.
>
> I would rather tune the tunables, as upgrading the kernel would
> require a reboot of the client.
> Besides, Ubuntu Trusty does not provide a 3.16 kernel, so I would need
> to recompile...
>
> .a.
>
> --
> antonio.mess...@s3it.uzh.ch +41 (0)44 635 42 22
> antonio.s.mess...@gmail.com
> S3IT: Service and Support for Science IT   http://www.s3it.uzh.ch/
> University of Zurich
> Winterthurerstrasse 190
> CH-8057 Zurich Switzerland
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
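
On the tunables route Antonio mentions, the relevant commands are the
stock ones below.  Note that switching profiles makes CRUSH recompute
placements, so expect data movement, and which profile an older kernel
client can cope with depends on the kernel version, so check before
changing anything:

    ceph osd crush show-tunables     # what the cluster currently uses
    ceph osd crush tunables bobtail  # example: fall back to an older
                                     # profile for older kernel clients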
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com