[ceph-users] Speeding Up "rbd ls -l " output

2017-02-09 Thread Özhan Rüzgar Karaman
Hi;
I am using the Hammer 0.94.9 release on my Ceph storage; today I noticed that
listing an RBD pool takes much more time than it used to. The more RBD
images a pool has, the longer it takes.

My cluster's health is OK and there is currently no load on the cluster.
Only RBD images are used, serving VMs.

I am sending some information below. My leveldb size was 280 MB; I also
compacted it down to 40 MB, but "rbd ls -l" output is still too slow.

This timing matters for my VM deploy time, because when I refresh a
pool/datastore it takes nearly 20 seconds or more for 350 RBD
images+snapshots.

Thanks for all help

Regards
Ozhan Ruzgar

root@mont3:/var/lib/ceph/mon/ceph-mont3/store.db# ceph -s
cluster 6b1cb3f4-85e6-4b70-b057-ba7716f823cc
 health HEALTH_OK
 monmap e1: 3 mons at
{mont1=172.16.x.x:6789/0,mont2=172.16.x.x:6789/0,mont3=172.16.x.x:6789/0}
election epoch 126, quorum 0,1,2 mont1,mont2,mont3
 osdmap e20509: 40 osds: 40 up, 40 in
  pgmap v20333442: 1536 pgs, 3 pools, 235 GB data, 63442 objects
700 GB used, 3297 GB / 3998 GB avail
1536 active+clean
  client io 0 B/s rd, 3785 kB/s wr, 314 op/s

root@mont1:~# time rbd ls -l cst2|wc -l
278

real 0m11.970s
user 0m0.572s
sys 0m0.316s
root@mont1:~# time rbd ls -l cst3|wc -l
15

real 0m0.396s
user 0m0.020s
sys 0m0.032s
root@mont1:~# time rbd ls -l cst4|wc -l
330

real 0m16.630s
user 0m0.668s
sys 0m0.336s
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Migrating data from a Ceph clusters to another

2017-02-09 Thread 林自均
Hi,

I have 2 Ceph clusters, cluster A and cluster B. I want to move all the
pools on A to B. The pool names don't conflict between clusters. I guess
it's like RBD mirroring, except that it's pool mirroring. Is there any
proper way to do it?

Thanks for any suggestions.

Best,
John Lin
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating data from a Ceph clusters to another

2017-02-09 Thread Irek Fasikhov
Hi.
I recommend using rbd import/export.
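For a single image, a minimal sketch of that approach (the conffile paths and pool/image names are only examples; "-" streams the image over stdout/stdin so no intermediate file is needed):

    rbd -c /etc/ceph/clusterA.conf export rbd/image1 - \
      | rbd -c /etc/ceph/clusterB.conf import - rbd/image1

Snapshots would additionally need rbd export-diff / import-diff, and the loop over all images and pools has to be scripted around this.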

Best regards, Irek Fasikhov (Фасихов Ирек Нургаязович)
Mobile: +79229045757

2017-02-09 11:13 GMT+03:00 林自均 :

> Hi,
>
> I have 2 Ceph clusters, cluster A and cluster B. I want to move all the
> pools on A to B. The pool names don't conflict between clusters. I guess
> it's like RBD mirroring, except that it's pool mirroring. Is there any
> proper ways to do it?
>
> Thanks for any suggestions.
>
> Best,
> John Lin
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Speeding Up "rbd ls -l " output

2017-02-09 Thread Wido den Hollander

> On 9 February 2017 at 9:13, Özhan Rüzgar Karaman wrote:
> 
> 
> Hi;
> I am using Hammer 0.49.9 release on my Ceph Storage, today i noticed that
> listing an rbd pool takes to much time then the old days. If i have more
> rbd images on pool it takes much more time.
> 

It is the -l flag that you are adding. That flag opens each RBD 
image and stats its header to get the size.

A regular 'rbd ls' will only read the RADOS object rbd_directory, but it is the 
-l flag which causes the RBD tool to iterate over all the images and query 
their header.
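As a rough illustration only (using the cst2 pool from the quoted output below), the per-image loop is essentially what -l has to do, one header open per image:

    # one read of the rbd_directory object
    rbd ls cst2

    # roughly what "rbd ls -l" does: open and stat every image header
    for img in $(rbd ls cst2); do
        rbd info cst2/$img >/dev/null
    done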

> My clusters health is ok and currently there is no load on the cluster.
> Only rbd images are used to serve to vm's.
> 
> I am sending some information below. My level.db size is also 280 mb, i
> also compacted level.db to 40 mb size but again "rbd ls -l" output is too
> slow.
> 
> This timing is important for my vm deploy time to complete because when i
> refresh a pool/datastore it takes nearly to 20 seconds or more for 350 rbd
> images+snapshots.
> 
> Thanks for all help
> 
> Regards
> Ozhan Ruzgar
> 
> root@mont3:/var/lib/ceph/mon/ceph-mont3/store.db# ceph -s
> cluster 6b1cb3f4-85e6-4b70-b057-ba7716f823cc
>  health HEALTH_OK
>  monmap e1: 3 mons at
> {mont1=172.16.x.x:6789/0,mont2=172.16.x.x:6789/0,mont3=172.16.x.x:6789/0}
> election epoch 126, quorum 0,1,2 mont1,mont2,mont3
>  osdmap e20509: 40 osds: 40 up, 40 in
>   pgmap v20333442: 1536 pgs, 3 pools, 235 GB data, 63442 objects
> 700 GB used, 3297 GB / 3998 GB avail
> 1536 active+clean
>   client io 0 B/s rd, 3785 kB/s wr, 314 op/s
> 
> root@mont1:~# time rbd ls -l cst2|wc -l
> 278
> 
> real 0m11.970s
> user 0m0.572s
> sys 0m0.316s
> root@mont1:~# time rbd ls -l cst3|wc -l
> 15
> 
> real 0m0.396s
> user 0m0.020s
> sys 0m0.032s
> root@mont1:~# time rbd ls -l cst4|wc -l
> 330
> 
> real 0m16.630s
> user 0m0.668s
> sys 0m0.336s
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating data from a Ceph clusters to another

2017-02-09 Thread Craig Chi
Hi John,

rbd mirroring can be configured per pool:
http://docs.ceph.com/docs/master/rbd/rbd-mirroring/
However, the rbd mirroring method can only be used on RBD images with the layering feature;
it cannot mirror objects other than rbd for you.

Sincerely,
Craig Chi

On 2017-02-09 16:24, Irek Fasikhov wrote:
> Hi.
> I recommend using rbd import/export.
>   
>   
> Best regards, Irek Fasikhov (Фасихов Ирек Нургаязович)
> Mobile: +79229045757
>   
>   
>   
>   
> 2017-02-09 11:13 GMT+03:00 林自均 (mailto:johnl...@gmail.com):
> > Hi,
> >   
> > I have 2 Ceph clusters, cluster A and cluster B. I want to move all the 
> > pools on A to B. The pool names don't conflict between clusters. I guess 
> > it's like RBD mirroring, except that it's pool mirroring. Is there any 
> > proper ways to do it?
> >   
> > Thanks for any suggestions.
> >   
> > Best,
> > John Lin
> >   
> >   
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com (mailto:ceph-users@lists.ceph.com)
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Speeding Up "rbd ls -l " output

2017-02-09 Thread Özhan Rüzgar Karaman
Hi Wido;
Thanks for the fast response. "rbd ls -l" reads every image's header for its
size; yes, that makes sense, you are right.

My main problem is that when I refresh an RBD storage pool using virsh on
KVM (Ubuntu 14.04.5) it takes much longer than it used to, and I suspect
that virsh does an "rbd ls -l" against the Ceph storage, which is why I asked.

Does virsh use the same "rbd ls -l" for a pool refresh?

So in this case, is the 22 seconds below normal for a virsh RBD pool refresh?

root@kvmt1:~# time virsh pool-refresh 01b375db-d3f5-33c1-9389-8bf226c887e8
Pool 01b375db-d3f5-33c1-9389-8bf226c887e8 refreshed


real 0m22.504s
user 0m0.012s
sys 0m0.004s

Thanks
Özhan


On Thu, Feb 9, 2017 at 11:30 AM, Wido den Hollander  wrote:

>
> > On 9 February 2017 at 9:13, Özhan Rüzgar Karaman <oruzgarkara...@gmail.com> wrote:
> >
> >
> > Hi;
> > I am using Hammer 0.49.9 release on my Ceph Storage, today i noticed that
> > listing an rbd pool takes to much time then the old days. If i have more
> > rbd images on pool it takes much more time.
> >
>
> It is the -l flag that you are using in addition. That flag opens each RBD
> image and stats the header of it to get the size.
>
> A regular 'rbd ls' will only read the RADOS object rbd_directory, but it
> is the -l flag which causes the RBD tool to iterate over all the images and
> query their header.
>
> > My clusters health is ok and currently there is no load on the cluster.
> > Only rbd images are used to serve to vm's.
> >
> > I am sending some information below. My level.db size is also 280 mb, i
> > also compacted level.db to 40 mb size but again "rbd ls -l" output is too
> > slow.
> >
> > This timing is important for my vm deploy time to complete because when i
> > refresh a pool/datastore it takes nearly to 20 seconds or more for 350
> rbd
> > images+snapshots.
> >
> > Thanks for all help
> >
> > Regards
> > Ozhan Ruzgar
> >
> > root@mont3:/var/lib/ceph/mon/ceph-mont3/store.db# ceph -s
> > cluster 6b1cb3f4-85e6-4b70-b057-ba7716f823cc
> >  health HEALTH_OK
> >  monmap e1: 3 mons at
> > {mont1=172.16.x.x:6789/0,mont2=172.16.x.x:6789/0,mont3=
> 172.16.x.x:6789/0}
> > election epoch 126, quorum 0,1,2 mont1,mont2,mont3
> >  osdmap e20509: 40 osds: 40 up, 40 in
> >   pgmap v20333442: 1536 pgs, 3 pools, 235 GB data, 63442 objects
> > 700 GB used, 3297 GB / 3998 GB avail
> > 1536 active+clean
> >   client io 0 B/s rd, 3785 kB/s wr, 314 op/s
> >
> > root@mont1:~# time rbd ls -l cst2|wc -l
> > 278
> >
> > real 0m11.970s
> > user 0m0.572s
> > sys 0m0.316s
> > root@mont1:~# time rbd ls -l cst3|wc -l
> > 15
> >
> > real 0m0.396s
> > user 0m0.020s
> > sys 0m0.032s
> > root@mont1:~# time rbd ls -l cst4|wc -l
> > 330
> >
> > real 0m16.630s
> > user 0m0.668s
> > sys 0m0.336s
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] would people mind a slow osd restart during luminous upgrade?

2017-02-09 Thread Henrik Korkuc

On 17-02-09 05:09, Sage Weil wrote:

Hello, ceph operators...

Several times in the past we've had to do some ondisk format conversion
during upgrade which meant that the first time the ceph-osd daemon started
after the upgrade it had to spend a few minutes fixing up its ondisk files.
We haven't had to recently, though, and generally try to avoid such
things.

However, there's a change we'd like to make in FileStore for luminous (*)
and it would save us a lot of time and complexity if it was a one-shot
update during the upgrade.  It would probably take in the neighborhood of
1-5 minutes for a 4-6TB HDD.  That means that when restarting the daemon
during the upgrade the OSD would stay down for that period (vs the usual
<1 minute restart time).

Does this concern anyone?  It probably means the upgrades will take longer
if you're going host by host since the time per host will go up.
In my opinion, if this is clearly communicated (release notes + OSD logs) 
it's fine; otherwise it may feel like something is wrong if an OSD takes a 
long time to start.



sage


* eliminate 'snapdir' objects, replacing them with a head object +
whiteout.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating data from a Ceph clusters to another

2017-02-09 Thread Craig Chi
Hi,

Sorry, I gave the wrong feature.
The rbd mirroring method can only be used on RBD images with the "journaling" feature
(not layering).
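
As a rough sketch of the per-pool setup (assuming a pool named "data" on both clusters and rbd-mirror daemons already running; see the documentation link above for the full procedure):

    # journaling (which depends on exclusive-lock) must be enabled per image
    rbd feature enable data/image1 exclusive-lock journaling

    # enable pool-mode mirroring on both clusters
    rbd mirror pool enable data pool

    # register each cluster as a peer of the other
    rbd mirror pool peer add data client.admin@remote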

Sincerely,
Craig Chi

On 2017-02-09 16:41, Craig Chi wrote:
> Hi John,
>   
> rbd mirroring can be configured per pool: http://docs.ceph.com/docs/master/rbd/rbd-mirroring/
> However the rbd mirroring method can only be used on rbd with layering 
> feature, it can not mirror objects other than rbd for you.
>   
> Sincerely,
> Craig Chi
>   
> On 2017-02-09 16:24, Irek Fasikhov wrote:
> > Hi.
> > I recommend using rbd import/export.
> >   
> >   
> > Best regards, Irek Fasikhov (Фасихов Ирек Нургаязович)
> > Mobile: +79229045757
> >   
> >   
> >   
> >   
> > 2017-02-09 11:13 GMT+03:00 林自均 (mailto:johnl...@gmail.com):
> > > Hi,
> > >   
> > > I have 2 Ceph clusters, cluster A and cluster B. I want to move all the 
> > > pools on A to B. The pool names don't conflict between clusters. I guess 
> > > it's like RBD mirroring, except that it's pool mirroring. Is there any 
> > > proper ways to do it?
> > >   
> > > Thanks for any suggestions.
> > >   
> > > Best,
> > > John Lin
> > >   
> > >   
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com (mailto:ceph-users@lists.ceph.com)
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Speeding Up "rbd ls -l " output

2017-02-09 Thread Wido den Hollander

> On 9 February 2017 at 9:41, Özhan Rüzgar Karaman wrote:
> 
> 
> Hi Wido;
> Thanks for fast response rbd ls -l reads all images header for its sizes
> yes it makes sense you are right.
> 
> My main problem is when i refresh a rbd storage pool using virsh over
> kvm(Ubuntu 14.04.5) it takes too much time then the old days and i suspect
> that virsh makes "rbd ls -l" over Ceph storage so thats why i asked.
> 
> Does virsh use same "rbd ls -l" for pool refresh?
> 

Yes, it does: 
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/storage/storage_backend_rbd.c;h=45beb107aa2a5c85b7d65b8687c2b65751871595;hb=HEAD#l425

In short, this C code does (pseudo):

images = []
for image in rbd_list():
  images.append(rbd_stat(image))

The more images you have, the longer it takes.

> So in this case below 22 second is normal for virsh rbd pool refresh?
> 

Yes. One of my goals for libvirt is still to make this refresh an async 
operation inside libvirt, but that's a bit difficult to do and I have never 
gotten around to actually implementing it.

Wido

> root@kvmt1:~# time virsh pool-refresh 01b375db-d3f5-33c1-9389-8bf226c887e8
> Pool 01b375db-d3f5-33c1-9389-8bf226c887e8 refreshed
> 
> 
> real 0m22.504s
> user 0m0.012s
> sys 0m0.004s
> 
> Thanks
> Özhan
> 
> 
> On Thu, Feb 9, 2017 at 11:30 AM, Wido den Hollander  wrote:
> 
> >
> > > On 9 February 2017 at 9:13, Özhan Rüzgar Karaman <oruzgarkara...@gmail.com> wrote:
> > >
> > >
> > > Hi;
> > > I am using Hammer 0.49.9 release on my Ceph Storage, today i noticed that
> > > listing an rbd pool takes to much time then the old days. If i have more
> > > rbd images on pool it takes much more time.
> > >
> >
> > It is the -l flag that you are using in addition. That flag opens each RBD
> > image and stats the header of it to get the size.
> >
> > A regular 'rbd ls' will only read the RADOS object rbd_directory, but it
> > is the -l flag which causes the RBD tool to iterate over all the images and
> > query their header.
> >
> > > My clusters health is ok and currently there is no load on the cluster.
> > > Only rbd images are used to serve to vm's.
> > >
> > > I am sending some information below. My level.db size is also 280 mb, i
> > > also compacted level.db to 40 mb size but again "rbd ls -l" output is too
> > > slow.
> > >
> > > This timing is important for my vm deploy time to complete because when i
> > > refresh a pool/datastore it takes nearly to 20 seconds or more for 350
> > rbd
> > > images+snapshots.
> > >
> > > Thanks for all help
> > >
> > > Regards
> > > Ozhan Ruzgar
> > >
> > > root@mont3:/var/lib/ceph/mon/ceph-mont3/store.db# ceph -s
> > > cluster 6b1cb3f4-85e6-4b70-b057-ba7716f823cc
> > >  health HEALTH_OK
> > >  monmap e1: 3 mons at
> > > {mont1=172.16.x.x:6789/0,mont2=172.16.x.x:6789/0,mont3=
> > 172.16.x.x:6789/0}
> > > election epoch 126, quorum 0,1,2 mont1,mont2,mont3
> > >  osdmap e20509: 40 osds: 40 up, 40 in
> > >   pgmap v20333442: 1536 pgs, 3 pools, 235 GB data, 63442 objects
> > > 700 GB used, 3297 GB / 3998 GB avail
> > > 1536 active+clean
> > >   client io 0 B/s rd, 3785 kB/s wr, 314 op/s
> > >
> > > root@mont1:~# time rbd ls -l cst2|wc -l
> > > 278
> > >
> > > real 0m11.970s
> > > user 0m0.572s
> > > sys 0m0.316s
> > > root@mont1:~# time rbd ls -l cst3|wc -l
> > > 15
> > >
> > > real 0m0.396s
> > > user 0m0.020s
> > > sys 0m0.032s
> > > root@mont1:~# time rbd ls -l cst4|wc -l
> > > 330
> > >
> > > real 0m16.630s
> > > user 0m0.668s
> > > sys 0m0.336s
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Speeding Up "rbd ls -l " output

2017-02-09 Thread Özhan Rüzgar Karaman
Thanks Wido, you are the best :)

On Thu, Feb 9, 2017 at 11:50 AM, Wido den Hollander  wrote:

>
> > On 9 February 2017 at 9:41, Özhan Rüzgar Karaman <oruzgarkara...@gmail.com> wrote:
> >
> >
> > Hi Wido;
> > Thanks for fast response rbd ls -l reads all images header for its sizes
> > yes it makes sense you are right.
> >
> > My main problem is when i refresh a rbd storage pool using virsh over
> > kvm(Ubuntu 14.04.5) it takes too much time then the old days and i
> suspect
> > that virsh makes "rbd ls -l" over Ceph storage so thats why i asked.
> >
> > Does virsh use same "rbd ls -l" for pool refresh?
> >
>
> Yes, it does: http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/
> storage/storage_backend_rbd.c;h=45beb107aa2a5c85b7d65b8687c2b6
> 5751871595;hb=HEAD#l425
>
> In short, this C code does (pseudo):
>
> images = []
> for image in rbd_list():
>   images.append(rbd_stat(image))
>
> The more images you have, the longer it takes.
>
> > So in this case below 22 second is normal for virsh rbd pool refresh?
> >
>
> Yes. One of my goals for libvirt is still to make this refresh a async
> operation inside libvirt, but that's a bit difficult inside libvirt and
> have never gotten to actually implementing that.
>
> Wido
>
> > root@kvmt1:~# time virsh pool-refresh 01b375db-d3f5-33c1-9389-
> 8bf226c887e8
> > Pool 01b375db-d3f5-33c1-9389-8bf226c887e8 refreshed
> >
> >
> > real 0m22.504s
> > user 0m0.012s
> > sys 0m0.004s
> >
> > Thanks
> > Özhan
> >
> >
> > On Thu, Feb 9, 2017 at 11:30 AM, Wido den Hollander 
> wrote:
> >
> > >
> > > > On 9 February 2017 at 9:13, Özhan Rüzgar Karaman <oruzgarkara...@gmail.com> wrote:
> > > >
> > > >
> > > > Hi;
> > > > I am using Hammer 0.49.9 release on my Ceph Storage, today i noticed
> that
> > > > listing an rbd pool takes to much time then the old days. If i have
> more
> > > > rbd images on pool it takes much more time.
> > > >
> > >
> > > It is the -l flag that you are using in addition. That flag opens each
> RBD
> > > image and stats the header of it to get the size.
> > >
> > > A regular 'rbd ls' will only read the RADOS object rbd_directory, but
> it
> > > is the -l flag which causes the RBD tool to iterate over all the
> images and
> > > query their header.
> > >
> > > > My clusters health is ok and currently there is no load on the
> cluster.
> > > > Only rbd images are used to serve to vm's.
> > > >
> > > > I am sending some information below. My level.db size is also 280
> mb, i
> > > > also compacted level.db to 40 mb size but again "rbd ls -l" output
> is too
> > > > slow.
> > > >
> > > > This timing is important for my vm deploy time to complete because
> when i
> > > > refresh a pool/datastore it takes nearly to 20 seconds or more for
> 350
> > > rbd
> > > > images+snapshots.
> > > >
> > > > Thanks for all help
> > > >
> > > > Regards
> > > > Ozhan Ruzgar
> > > >
> > > > root@mont3:/var/lib/ceph/mon/ceph-mont3/store.db# ceph -s
> > > > cluster 6b1cb3f4-85e6-4b70-b057-ba7716f823cc
> > > >  health HEALTH_OK
> > > >  monmap e1: 3 mons at
> > > > {mont1=172.16.x.x:6789/0,mont2=172.16.x.x:6789/0,mont3=
> > > 172.16.x.x:6789/0}
> > > > election epoch 126, quorum 0,1,2 mont1,mont2,mont3
> > > >  osdmap e20509: 40 osds: 40 up, 40 in
> > > >   pgmap v20333442: 1536 pgs, 3 pools, 235 GB data, 63442 objects
> > > > 700 GB used, 3297 GB / 3998 GB avail
> > > > 1536 active+clean
> > > >   client io 0 B/s rd, 3785 kB/s wr, 314 op/s
> > > >
> > > > root@mont1:~# time rbd ls -l cst2|wc -l
> > > > 278
> > > >
> > > > real 0m11.970s
> > > > user 0m0.572s
> > > > sys 0m0.316s
> > > > root@mont1:~# time rbd ls -l cst3|wc -l
> > > > 15
> > > >
> > > > real 0m0.396s
> > > > user 0m0.020s
> > > > sys 0m0.032s
> > > > root@mont1:~# time rbd ls -l cst4|wc -l
> > > > 330
> > > >
> > > > real 0m16.630s
> > > > user 0m0.668s
> > > > sys 0m0.336s
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mon memory issue jewel 10.2.5 kernel 4.4

2017-02-09 Thread Joao Eduardo Luis

Hi Jim,

On 02/08/2017 07:45 PM, Jim Kilborn wrote:

I have had two ceph monitor nodes generate swap space alerts this week.
Looking at the memory, I see ceph-mon using a lot of memory and most of the 
swap space. My ceph nodes have 128GB mem, with 2GB swap  (I know the 
memory/swap ratio is odd)

When I get the alert, I see the following

[snip]

[root@empire-ceph02 ~]# ps -aux | egrep 'ceph-mon|MEM'

USER       PID  %CPU %MEM      VSZ      RSS TTY  STAT START    TIME COMMAND

ceph    174239   0.3 45.8 62812848 60405112 ?    Ssl  2016   269:08 
/usr/bin/ceph-mon -f --cluster ceph --id empire-ceph02 --setuser ceph 
--setgroup ceph

[snip]


Is this a setting issue? Or Maybe a bug?
When I look at the other ceph-mon processes on other nodes, they aren’t using 
any swap, and only about 500MB of memory.


Can you get us the result of `ceph -s`, of `ceph daemon mon.ID ops`, and 
the size of your monitor's data directory? The latter, ideally, 
recursive with the sizes of all the children in the tree (which, 
assuming they're a lot, would likely be better on a pastebin).


  -Joao
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating data from a Ceph clusters to another

2017-02-09 Thread 林自均
Hi Irek & Craig,

Sorry, I misunderstood "RBD mirroring". What I want to do is not like that.

I just want to move all the data from one cluster to another. It can be
achieved by `rados -p <pool> get <object> <file>` for all objects on
cluster A, and then `rados -p <pool> put <object> <file>` on cluster B.
Is there any tool for that?
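
(A naive sketch of that loop, for illustration only — it copies object data but not omap/xattrs or snapshots, and assumes object names without whitespace:)

    pool=mypool
    for obj in $(rados -c clusterA.conf -p $pool ls); do
        rados -c clusterA.conf -p $pool get "$obj" /tmp/obj.bin
        rados -c clusterB.conf -p $pool put "$obj" /tmp/obj.bin
    done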

Best,
John Lin

Craig Chi wrote on Thursday, 9 February 2017 at 4:43 PM:

> Hi,
>
> Sorry I gave the wrong feature.
> rbd mirroring method can only be used on rbd with "journaling" feature
> (not layering).
>
> Sincerely,
> Craig Chi
>
> On 2017-02-09 16:41, Craig Chi  wrote:
>
> Hi John,
>
> rbd mirroring can configured by pool.
> http://docs.ceph.com/docs/master/rbd/rbd-mirroring/
> However the rbd mirroring method can only be used on rbd with layering
> feature, it can not mirror objects other than rbd for you.
>
> Sincerely,
> Craig Chi
>
> On 2017-02-09 16:24, Irek Fasikhov  wrote:
>
> Hi.
> I recommend using rbd import/export.
>
> Best regards, Irek Fasikhov (Фасихов Ирек Нургаязович)
> Mobile: +79229045757
>
> 2017-02-09 11:13 GMT+03:00 林自均 :
>
> Hi,
>
> I have 2 Ceph clusters, cluster A and cluster B. I want to move all the
> pools on A to B. The pool names don't conflict between clusters. I guess
> it's like RBD mirroring, except that it's pool mirroring. Is there any
> proper ways to do it?
>
> Thanks for any suggestions.
>
> Best,
> John Lin
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___ ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] would people mind a slow osd restart during luminous upgrade?

2017-02-09 Thread Joao Eduardo Luis

On 02/09/2017 04:19 AM, David Turner wrote:

The only issue I can think of is if there isn't a version of the clients
fully tested to work with a partially upgraded cluster or a documented
incompatibility requiring downtime. We've had upgrades where we had to
upgrade clients first and others that we had to do the clients last due
to issues with how the clients interacted with an older cluster,
partially upgraded cluster, or newer cluster.

If the FileStore is changing this much, I can imagine a Jewel client
having a hard time locating the objects it needs from a Luminous cluster.


AFAIU, this would be on the osd side and completely transparent to clients.

This has to do with how the osds keep track of object snapshots (in the 
event of head being deleted?), and clients themselves should have 
nothing to worry about.


  -Joao


On Wed, Feb 8, 2017 at 8:09 PM Sage Weil <sw...@redhat.com> wrote:

Hello, ceph operators...

Several times in the past we've had to do some ondisk format conversion
during upgrade which meant that the first time the ceph-osd daemon started
after the upgrade it had to spend a few minutes fixing up its ondisk files.
We haven't had to recently, though, and generally try to avoid such
things.

However, there's a change we'd like to make in FileStore for luminous (*)
and it would save us a lot of time and complexity if it was a one-shot
update during the upgrade.  It would probably take in the neighborhood of
1-5 minutes for a 4-6TB HDD.  That means that when restarting the daemon
during the upgrade the OSD would stay down for that period (vs the usual
<1 minute restart time).

Does this concern anyone?  It probably means the upgrades will take
longer
if you're going host by host since the time per host will go up.

sage


* eliminate 'snapdir' objects, replacing them with a head object +
whiteout.
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] would people mind a slow osd restart during luminous upgrade?

2017-02-09 Thread Dave Holland
On Thu, Feb 09, 2017 at 10:41:44AM +0200, Henrik Korkuc wrote:
> On 17-02-09 05:09, Sage Weil wrote:
> >Does this concern anyone?  It probably means the upgrades will take longer
> >if you're going host by host since the time per host will go up.
> In my opinion if this is clearly communicated (release notes + OSD logs)

+1 for having the OSD log something when it starts the upgrade
process, so the sysadmin who goes looking will see what's happening.

Cheers,
Dave
-- 
** Dave Holland ** Systems Support -- Informatics Systems Group **
** 01223 496923 ** The Sanger Institute, Hinxton, Cambridge, UK **


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mon memory issue jewel 10.2.5 kernel 4.4

2017-02-09 Thread Jim Kilborn
Joao,

Here is the information requested. Thanks for taking a look. Note that the 
below is after I restarted the ceph-mon processes yesterday. If this is not 
acceptable, I will have to wait until the issue reappears. This is on a small 
cluster. 4 ceph nodes, and 6 ceph kernel clients running over infiniband.



[root@empire-ceph02 log]# ceph -s

cluster 62ed97d6-adf4-12e4-8fd5-3d9701b22b87

 health HEALTH_OK

 monmap e3: 3 mons at 
{empire-ceph01=192.168.20.241:6789/0,empire-ceph02=192.168.20.242:6789/0,empire-ceph03=192.168.20.243:6789/0}

election epoch 56, quorum 0,1,2 
empire-ceph01,empire-ceph02,empire-ceph03

  fsmap e526: 1/1/1 up {0=empire-ceph03=up:active}, 1 up:standby

 osdmap e361: 32 osds: 32 up, 32 in

flags sortbitwise,require_jewel_osds

  pgmap v2427955: 768 pgs, 2 pools, 2370 GB data, 1759 kobjects

7133 GB used, 109 TB / 116 TB avail

 768 active+clean

  client io 256 B/s wr, 0 op/s rd, 0 op/s wr



[root@empire-ceph02 log]# ceph daemon mon.empire-ceph02 ops

{

"ops": [],

"num_ops": 0

}



[root@empire-ceph02 mon]# du -sh ceph-empire-ceph02

30M ceph-empire-ceph02



[root@empire-ceph02 mon]# ls -lR

.:

total 0

drwxr-xr-x. 3 ceph ceph 46 Dec  6 14:26 ceph-empire-ceph02



./ceph-empire-ceph02:

total 8

-rw-r--r--. 1 ceph ceph    0 Dec  6 14:26 done

-rw-------. 1 ceph ceph   77 Dec  6 14:26 keyring

drwxr-xr-x. 2 ceph ceph 4096 Feb  9 06:58 store.db



./ceph-empire-ceph02/store.db:

total 30056

-rw-r--r--. 1 ceph ceph  396167 Feb  9 06:06 510929.sst

-rw-r--r--. 1 ceph ceph  778898 Feb  9 06:56 511298.sst

-rw-r--r--. 1 ceph ceph 5177344 Feb  9 07:01 511301.log

-rw-r--r--. 1 ceph ceph 1491740 Feb  9 06:58 511305.sst

-rw-r--r--. 1 ceph ceph 2162405 Feb  9 06:58 511306.sst

-rw-r--r--. 1 ceph ceph 2162047 Feb  9 06:58 511307.sst

-rw-r--r--. 1 ceph ceph 2104201 Feb  9 06:58 511308.sst

-rw-r--r--. 1 ceph ceph 2146113 Feb  9 06:58 511309.sst

-rw-r--r--. 1 ceph ceph 2123659 Feb  9 06:58 511310.sst

-rw-r--r--. 1 ceph ceph 2162927 Feb  9 06:58 511311.sst

-rw-r--r--. 1 ceph ceph 2129640 Feb  9 06:58 511312.sst

-rw-r--r--. 1 ceph ceph 2133590 Feb  9 06:58 511313.sst

-rw-r--r--. 1 ceph ceph 2143906 Feb  9 06:58 511314.sst

-rw-r--r--. 1 ceph ceph 2158434 Feb  9 06:58 511315.sst

-rw-r--r--. 1 ceph ceph 1649589 Feb  9 06:58 511316.sst

-rw-r--r--. 1 ceph ceph  16 Feb  8 13:42 CURRENT

-rw-r--r--. 1 ceph ceph   0 Dec  6 14:26 LOCK

-rw-r--r--. 1 ceph ceph  983040 Feb  9 06:58 MANIFEST-503363





Sent from Mail for Windows 10



From: Joao Eduardo Luis
Sent: Thursday, February 9, 2017 3:06 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph-mon memory issue jewel 10.2.5 kernel 4.4



Hi Jim,

On 02/08/2017 07:45 PM, Jim Kilborn wrote:
> I have had two ceph monitor nodes generate swap space alerts this week.
> Looking at the memory, I see ceph-mon using a lot of memory and most of the 
> swap space. My ceph nodes have 128GB mem, with 2GB swap  (I know the 
> memory/swap ratio is odd)
>
> When I get the alert, I see the following
[snip]
> root@empire-ceph02 ~]# ps -aux | egrep 'ceph-mon|MEM'
>
> USER       PID  %CPU %MEM      VSZ      RSS TTY  STAT START    TIME COMMAND
>
> ceph 174239  0.3 45.8 62812848 60405112 ?   Ssl   2016 269:08 
> /usr/bin/ceph-mon -f --cluster ceph --id empire-ceph02 --setuser ceph 
> --setgroup ceph
>
> [snip]
>
>
> Is this a setting issue? Or Maybe a bug?
> When I look at the other ceph-mon processes on other nodes, they aren’t using 
> any swap, and only about 500MB of memory.

Can you get us the result of `ceph -s`, of `ceph daemon mon.ID ops`, and
the size of your monitor's data directory? The latter, ideally,
recursive with the sizes of all the children in the tree (which,
assuming they're a lot, would likely be better on a pastebin).

   -Joao
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] would people mind a slow osd restart during luminous upgrade?

2017-02-09 Thread George Mihaiescu
Hi Sage,

Is the update running in parallel for all OSDs being restarted? 

Because 5 min per server is different than 150 min when there are 30 OSDs 
there.

Thank you,
George 

> On Feb 8, 2017, at 22:09, Sage Weil  wrote:
> 
> Hello, ceph operators...
> 
> Several times in the past we've had to do some ondisk format conversion 
> during upgrade which mean that the first time the ceph-osd daemon started 
> after upgrade it had to spend a few minutes fixing up it's ondisk files.  
> We haven't had to recently, though, and generally try to avoid such 
> things.
> 
> However, there's a change we'd like to make in FileStore for luminous (*) 
> and it would save us a lot of time and complexity if it was a one-shot 
> update during the upgrade.  I would probably take in the neighborhood of 
> 1-5 minutes for a 4-6TB HDD.  That means that when restarting the daemon 
> during the upgrade the OSD would stay down for that period (vs the usual 
> <1 restart time).
> 
> Does this concern anyone?  It probably means the upgrades will take longer 
> if you're going host by host since the time per host will go up.
> 
> sage
> 
> 
> * eliminate 'snapdir' objects, replacing them with a head object + 
> whiteout.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] would people mind a slow osd restart during luminous upgrade?

2017-02-09 Thread Sage Weil
On Thu, 9 Feb 2017, George Mihaiescu wrote:
> Hi Sage,
> 
> Is the update running in parallel for all OSDs being restarted? 
> 
> Because 5 min per server is different than 150 min when there are 30 
> OSDs there..

In parallel.

sage

> 
> Thank you,
> George 
> 
> > On Feb 8, 2017, at 22:09, Sage Weil  wrote:
> > 
> > Hello, ceph operators...
> > 
> > Several times in the past we've had to do some ondisk format conversion 
> > during upgrade which mean that the first time the ceph-osd daemon started 
> > after upgrade it had to spend a few minutes fixing up it's ondisk files.  
> > We haven't had to recently, though, and generally try to avoid such 
> > things.
> > 
> > However, there's a change we'd like to make in FileStore for luminous (*) 
> > and it would save us a lot of time and complexity if it was a one-shot 
> > update during the upgrade.  I would probably take in the neighborhood of 
> > 1-5 minutes for a 4-6TB HDD.  That means that when restarting the daemon 
> > during the upgrade the OSD would stay down for that period (vs the usual 
> > <1 restart time).
> > 
> > Does this concern anyone?  It probably means the upgrades will take longer 
> > if you're going host by host since the time per host will go up.
> > 
> > sage
> > 
> > 
> > * eliminate 'snapdir' objects, replacing them with a head object + 
> > whiteout.
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] would people mind a slow osd restart during luminous upgrade?

2017-02-09 Thread Sage Weil
On Thu, 9 Feb 2017, David Turner wrote:
> The only issue I can think of is if there isn't a version of the clients
> fully tested to work with a partially upgraded cluster or a documented
> incompatibility requiring downtime. We've had upgrades where we had to
> upgrade clients first and others that we had to do the clients last due to
> issues with how the clients interacted with an older cluster, partially
> upgraded cluster, or newer cluster.

We maintain client compatibility across *many* releases and several 
years.  In general this is under the control of the administrator via their 
choice of CRUSH tunables, which effectively let you choose the oldest 
client you'd like to support.
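
For example (an illustration of what that looks like in practice, not specific advice for any particular cluster):

    # show the tunables profile currently in effect
    ceph osd crush show-tunables

    # pin the tunables to the oldest client generation you need to support
    ceph osd crush tunables hammer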

I'm curious which upgrade you had problems with?  Generally speaking the 
only "client" upgrade ordering issue is with the radosgw clients, which 
need to be upgraded after the OSDs.

> If the FileStore is changing this much, I can imagine a Jewel client having
> a hard time locating the objects it needs from a Luminous cluster.

In this case the change would be internal to a single OSD and have no 
effect on the client/osd interaction or placement of objects.

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph security hardening

2017-02-09 Thread nigel davies
Hey all,

Does anyone have any advice on hardening my Ceph cluster?

I have already done the cephx auth part, but I am not sure if I can, say, limit
my ceph user's sudo permission to only ceph commands.

Any advice on this would be appreciated.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: Ceph security hardening

2017-02-09 Thread nigel davies
Hey all,

Does anyone have any advice on hardening my Ceph cluster?

I have already done the cephx auth part, but I am not sure if I can, say, limit
my ceph user's sudo permission to only ceph commands.

Any advice on this would be appreciated.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] would people mind a slow osd restart during luminous upgrade?

2017-02-09 Thread David Turner
When we upgraded to Jewel 10.2.3 from Hammer 0.94.7 in our QA cluster we
had issues with client incompatibility.  We first tried upgrading our
clients before upgrading the cluster.  This broke creating RBDs, cloning
RBDs, and probably many other things.  We quickly called that test a wash
and redeployed the cluster back to 0.94.7 and redid the upgrade by
partially upgrading the cluster, testing, fully upgrading the cluster,
testing, and finally upgrading the clients to Jewel.  This worked with no
issues creating RBDs, cloning, snapshots, deleting, etc.

I'm not sure if there was a previous reason that we decided to always
upgrade the clients first.  It might have had to do with the upgrade from
Firefly to Hammer.  It's just something we always test now, especially with
full version upgrades.  That being said, making sure that there is a client
that was regression tested throughout the cluster upgrade would be great to
have in the release notes.

On Thu, Feb 9, 2017 at 7:29 AM Sage Weil  wrote:

> On Thu, 9 Feb 2017, David Turner wrote:
> > The only issue I can think of is if there isn't a version of the clients
> > fully tested to work with a partially upgraded cluster or a documented
> > incompatibility requiring downtime. We've had upgrades where we had to
> > upgrade clients first and others that we had to do the clients last due
> to
> > issues with how the clients interacted with an older cluster, partially
> > upgraded cluster, or newer cluster.
>
> We maintain client compatibiltity across *many* releases and several
> years.  In general this under the control of the administrator via their
> choice of CRUSH tunables, which effectively let you choose the oldest
> client you'd like to support.
>
> I'm curious which upgrade you had problems with?  Generally speaking the
> only "client" upgrade ordering issue is with the radosgw clients, which
> need to be upgraded after the OSDs.
>
> > If the FileStore is changing this much, I can imagine a Jewel client
> having
> > a hard time locating the objects it needs from a Luminous cluster.
>
> In this case the change would be internal to a single OSD and have no
> effect on the client/osd interaction or placement of objects.
>
> sage
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Ceph security hardening

2017-02-09 Thread David Turner
You can change your ceph.conf file's permissions to only be readable by
root so that you have to use sudo to run ceph commands.  Configuring sudo
to only work with certain commands is a simple and common practice which
should be easy to implement.
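
A rough sketch of both pieces (paths are the defaults; "cephadmin" is just a placeholder user name):

    # make ceph.conf and the admin keyring readable by root only
    chmod 600 /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring

    # /etc/sudoers.d/cephadmin -- allow only the ceph CLIs via sudo
    cephadmin ALL=(root) NOPASSWD: /usr/bin/ceph, /usr/bin/rbd, /usr/bin/rados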

On Thu, Feb 9, 2017 at 10:12 AM nigel davies  wrote:

> Hay All
>
> Does any one have an advise on hardening my ceph cluster?
>
> I have or ready doen the cephx auth part, but not sure if i can say limit
> my ceph user sudo permission to use only ceph commands.
>
> Any advise on this would be grateful
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mon memory issue jewel 10.2.5 kernel 4.4

2017-02-09 Thread Graham Allan
I've been trying to figure out the same thing recently - I had the same 
issues as others with jewel 10.2.3 (?) but for my current problem I 
don't think it's a ceph issue.


Specifically, ever since our last maintenance day, some of our OSD nodes 
have been suffering OSDs killed by the OOM killer despite having enough 
memory.


I looked for ages at the discussions about reducing the map cache size 
but it just didn't seem a likely cause.


It looks like a kernel bug. Here for ubuntu:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1655842

I was seeing this OOM issue on kernels 4.4.0.59 and 4.4.0.62. It sounds 
like downgrading into 4.4.0.57 should resolve the issue, and 4.4.0.63 
out shortly should also fix it.


Our unaffected machines in the cluster are running a different release 
and kernel (though same version of ceph).


Haven't actually tested this yet, just found the reference in the last 
hour... could this also be the problem you are seeing?


Graham

On 2/8/2017 6:58 PM, Andrei Mikhailovsky wrote:

+1

Ever since upgrading to 10.2.x I have been seeing a lot of issues with our ceph 
cluster. I have been seeing osds down, osd servers running out of memory and 
killing all ceph-osd processes. Again, 10.2.5 on 4.4.x kernel.

It seems that with every release there are more and more problems with ceph 
(((, which is a shame.

Andrei

- Original Message -

From: "Jim Kilborn" 
To: "ceph-users" 
Sent: Wednesday, 8 February, 2017 19:45:58
Subject: [ceph-users] ceph-mon memory issue jewel 10.2.5 kernel  4.4



I have had two ceph monitor nodes generate swap space alerts this week.
Looking at the memory, I see ceph-mon using a lot of memory and most of the swap
space. My ceph nodes have 128GB mem, with 2GB swap  (I know the memory/swap
ratio is odd)

When I get the alert, I see the following


[root@empire-ceph02 ~]# free

              total        used        free      shared  buff/cache   available

Mem:      131783876    67618000    13383516       53868    50782360    61599096

Swap:       2097148     2097092          56



[root@empire-ceph02 ~]# ps -aux | egrep 'ceph-mon|MEM'

USER       PID  %CPU %MEM      VSZ      RSS TTY  STAT START    TIME COMMAND

ceph    174239   0.3 45.8 62812848 60405112 ?    Ssl  2016   269:08
/usr/bin/ceph-mon -f --cluster ceph --id empire-ceph02 --setuser ceph
--setgroup ceph


In the ceph-mon log, I see the following:

Feb  8 09:31:21 empire-ceph02 ceph-mon: 2017-02-08 09:31:21.211268 7f414d974700
-1 lsb_release_parse - failed to call lsb_release binary with error: (12)
Cannot allocate memory
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012856 7f3dcfe94700
-1 osd.8 344 heartbeat_check: no reply from 0x563e4214f090 osd.1 since back
2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
(cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012900 7f3dcfe94700
-1 osd.8 344 heartbeat_check: no reply from 0x563e4214da10 osd.3 since back
2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
(cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012915 7f3dcfe94700
-1 osd.8 344 heartbeat_check: no reply from 0x563e4214d410 osd.5 since back
2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
(cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012927 7f3dcfe94700
-1 osd.8 344 heartbeat_check: no reply from 0x563e4214e490 osd.6 since back
2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
(cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012934 7f3dcfe94700
-1 osd.8 344 heartbeat_check: no reply from 0x563e42149a10 osd.7 since back
2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
(cutoff 2017-02-08 09:31:04.012854)
Feb  8 09:31:25 empire-ceph02 ceph-osd: 2017-02-08 09:31:25.013038 7f3dcfe94700
-1 osd.8 345 heartbeat_check: no reply from 0x563e4214f090 osd.1 since back
2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
(cutoff 2017-02-08 09:31:05.013020)


Is this a setting issue? Or Maybe a bug?
When I look at the other ceph-mon processes on other nodes, they aren’t using
any swap, and only about 500MB of memory.

When I restart ceph-mds on the server that shows the issue, the swap frees up,
and the memory for the new ceph-mon is 500MB again.

Any ideas would be appreciated.


Sent from Mail for Windows 10

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] would people mind a slow osd restart during luminous upgrade?

2017-02-09 Thread Brian Andrus
On Thu, Feb 9, 2017 at 9:12 AM, David Turner  wrote:

> When we upgraded to Jewel 10.2.3 from Hammer 0.94.7 in our QA cluster we
> had issues with client incompatibility.  We first tried upgrading our
> clients before upgrading the cluster.  This broke creating RBDs, cloning
> RBDs, and probably many other things.  We quickly called that test a wash
> and redeployed the cluster back to 0.94.7 and redid the upgrade by
> partially upgrading the cluster, testing, fully upgrading the cluster,
> testing, and finally upgraded the clients to Jewel.  This worked with no
> issues creating RBDs, cloning, snapshots, deleting, etc.
>
> I'm not sure if there was a previous reason that we decided to always
> upgrade the clients first.  It might have had to do with the upgrade from
> Firefly to Hammer.  It's just something we always test now, especially with
> full version upgrades.  That being said, making sure that there is a client
> that was regression tested throughout the cluster upgrade would be great to
> have in the release notes.
>

I agree - it would have been nice to have this in the release notes,
however we only hit it because we're hyperconverged (clients using Jewel
against a Hammer cluster that hasn't yet had daemons restarted). We are
fixing it by setting rbd_default_features = 3 in our upcoming upgrade. We
will then unset it once the cluster is running Jewel.
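
(For reference, a sketch of what that looks like on the client side — 3 is layering (1) + striping (2), i.e. only Hammer-compatible features:)

    [client]
    rbd default features = 3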


>
> On Thu, Feb 9, 2017 at 7:29 AM Sage Weil  wrote:
>
>> On Thu, 9 Feb 2017, David Turner wrote:
>> > The only issue I can think of is if there isn't a version of the clients
>> > fully tested to work with a partially upgraded cluster or a documented
>> > incompatibility requiring downtime. We've had upgrades where we had to
>> > upgrade clients first and others that we had to do the clients last due
>> to
>> > issues with how the clients interacted with an older cluster, partially
>> > upgraded cluster, or newer cluster.
>>
>> We maintain client compatibiltity across *many* releases and several
>> years.  In general this under the control of the administrator via their
>> choice of CRUSH tunables, which effectively let you choose the oldest
>> client you'd like to support.
>>
>> I'm curious which upgrade you had problems with?  Generally speaking the
>> only "client" upgrade ordering issue is with the radosgw clients, which
>> need to be upgraded after the OSDs.
>>
>> > If the FileStore is changing this much, I can imagine a Jewel client
>> having
>> > a hard time locating the objects it needs from a Luminous cluster.
>>
>> In this case the change would be internal to a single OSD and have no
>> effect on the client/osd interaction or placement of objects.
>>
>> sage
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.and...@dreamhost.com | www.dreamhost.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG stuck peering after host reboot

2017-02-09 Thread george.vasilakakos
OK, I've had a look.

Haven't been able to take a proper look at the network yet but here's what I've 
gathered on other fronts so far:

* Marking either osd.595 or osd.7 out results in this:

$ ceph health detail | grep -v stuck | grep 1.323
pg 1.323 is remapped+peering, acting 
[2147483647,1391,240,127,937,362,267,320,7,634,716]

The only way to fix this is to restart 595 and 1391 a couple times. Then you 
get a proper set with 595(0) and a peering state as opposed to remapped+peering.

* I have looked through the PG mappings and
** PG 1.323 is the only PG which has both 595 and 7 in its acting set.
** there are 218 PGs which have OSDs that live on both the hosts that 595 and 7 
live on

Given the above information I'm not very inclined to think it's a network issue; 
if it were, I'd expect at least one other PG that requires the same network path 
to be failing. 
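
(Generic checks that can help rule that path in or out — the interface and host names below are placeholders, and 8972 assumes a 9000-byte MTU minus IPv4/ICMP overhead:)

    # confirm the MTU on the cluster-network interface on both hosts
    ip link show dev eth1

    # send maximum-size unfragmented packets along the exact path
    ping -M do -s 8972 <other-osd-host>

    # look for NIC-level errors/drops
    ethtool -S eth1 | egrep -i 'err|drop|crc'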

As it stands this persists after:

* having restarted all OSDs in the acting set with 
osd_find_best_info_ignore_history_les = true
* having restarted both hosts that the OSDs failing to talk to each other live 
on
* marking either OSD out and allowing recovery to finish

Also worth noting that after multiple restarts, osd.595 is still not responding 
to `ceph tell osd.595`  and `ceph pg 1.323 query`.


Anybody have any clue as to what this might be?


George



From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of 
george.vasilaka...@stfc.ac.uk [george.vasilaka...@stfc.ac.uk]
Sent: 08 February 2017 18:32
To: gfar...@redhat.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] PG stuck peering after host reboot

Hey Greg,

Thanks for your quick responses. I have to leave the office now but I'll look 
deeper into it tomorrow to try and understand what's the cause of this. I'll 
try to find other peerings between these two hosts and check those OSDs' logs 
for potential anomalies. I'll also have a look at any potential configuration 
changes that might have affected the host post-reboot.

I'll be back here with more info once I have it tomorrow.

Thanks again!

George

From: Gregory Farnum [gfar...@redhat.com]
Sent: 08 February 2017 18:29
To: Vasilakakos, George (STFC,RAL,SC)
Cc: Ceph Users
Subject: Re: [ceph-users] PG stuck peering after host reboot

On Wed, Feb 8, 2017 at 10:25 AM,   wrote:
> Hi Greg,
>
>> Yes, "bad crc" indicates that the checksums on an incoming message did
>> not match what was provided — ie, the message got corrupted. You
>> shouldn't try and fix that by playing around with the peering settings
>> as it's not a peering bug.
>> Unless there's a bug in the messaging layer causing this (very
>> unlikely), you have bad hardware or a bad network configuration
>> (people occasionally talk about MTU settings?). Fix that and things
>> will work; don't and the only software tweaks you could apply are more
>> likely to result in lost data than a happy cluster.
>> -Greg
>
>
> I thought of the network initially but I didn't observe packet loss between 
> the two hosts and neither host is having trouble talking to the rest of its 
> peers. It's these two OSDs that can't talk to each other so I figured it's 
> not likely to be a network issue. Network monitoring does show virtually 
> non-existent inbound traffic over those links compared to the other ports on 
> the switch but no other peerings fail.
>
> Is there something you can suggest to do to drill down deeper?

Sadly no. It being a single route is indeed weird and hopefully
somebody with more networking background can suggest a cause. :)

> Also, am I correct in assuming that I can pull one of these OSDs from the 
> cluster as a last resort to cause a remapping to a different to potentially 
> give this a quick/temp fix and get the cluster serving I/O properly again?

I'd expect so!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Erasure Profile Update

2017-02-09 Thread Lazuardi Nasution
Hi,

I'm looking for a way to update an erasure profile when nodes are added.
Let's say at first I have 5 OSD nodes with a 3+2 erasure profile, so all
chunks, including the coding chunks, are spread across every OSD node. In
the future, let's say I add 2 OSD nodes and I want a 5+2 erasure profile so
that all chunks are again spread across every OSD node. How can I do that
without interrupting Ceph service to clients?

Best regards,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Radosgw scaling recommendation?

2017-02-09 Thread Benjeman Meekhof
Hi all,

We're doing some stress testing with clients hitting our rados gw
nodes with simultaneous connections.  When the number of client
connections exceeds about 5400 we start seeing 403 forbidden errors
and log messages like the following:

2017-02-09 08:53:16.915536 7f8c667bc700 0 NOTICE: request time skew
too big now=2017-02-09 08:53:16.00 req_time=2017-02-09
08:37:18.00

This is version 10.2.5 using embedded civetweb.  There's just one
instance per node, and they all start generating 403 errors and the
above log messages when enough clients start hitting them.  The
hardware is not being taxed at all: negligible load and network
throughput. OSDs don't show any appreciable increase in CPU load or
I/O wait on journal/data devices. Unless I'm missing something, it
looks like RGW is just not scaling to fill out the hardware it is
on.

Does anyone have advice on scaling RGW to fully utilize a host?

thanks,
Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw scaling recommendation?

2017-02-09 Thread Mark Nelson
I'm not really an RGW expert, but I'd suggest increasing the 
"rgw_thread_pool_size" option to something much higher than the default 
100 threads if you haven't already.  RGW requires at least 1 thread per 
client connection, so with many concurrent connections some of them 
might end up timing out.  You can scale the number of threads and even 
the number of RGW instances on a single server, but at some point you'll 
run out of threads at the OS level.  Probably before that actually 
happens though, you'll want to think about multiple RGW gateway nodes 
behind a load balancer.  Afaik that's how the big sites do it.
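
A hedged example of that kind of tuning in ceph.conf (the section name and the numbers are placeholders/starting points, not recommendations):

    [client.rgw.gateway-01]
    rgw thread pool size = 512
    rgw frontends = civetweb port=7480 num_threads=512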


I believe some folks are considering trying to migrate rgw to a 
threadpool/event processing model but it sounds like it would be quite a 
bit of work.


Mark

On 02/09/2017 12:25 PM, Benjeman Meekhof wrote:

Hi all,

We're doing some stress testing with clients hitting our rados gw
nodes with simultaneous connections.  When the number of client
connections exceeds about 5400 we start seeing 403 forbidden errors
and log messages like the following:

2017-02-09 08:53:16.915536 7f8c667bc700 0 NOTICE: request time skew
too big now=2017-02-09 08:53:16.00 req_time=2017-02-09
08:37:18.00

This is version 10.2.5 using embedded civetweb.  There's just one
instance per node, and they all start generating 403 errors and the
above log messages when enough clients start hitting them.  The
hardware is not being taxed at all, negligible load and network
throughput.   OSD don't show any appreciable increase in CPU load or
io wait on journal/data devices.  Unless I'm missing something it
looks like the RGW is just not scaling to fill out the hardware it is
on.

Does anyone have advice on scaling RGW to fully utilize a host?

thanks,
Ben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Erasure Profile Update

2017-02-09 Thread David Turner
The erasure profile cannot be changed on a pool.  If you want to change the 
profile to be 5+2 instead of 3+2, then you need to create a new pool with the 
new profile and migrate your data to it.
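
A rough sketch of that approach (profile name, pool name and PG counts are placeholders and need sizing for the actual cluster):

    ceph osd erasure-code-profile set ec-5-2 k=5 m=2 ruleset-failure-domain=host
    ceph osd pool create ecpool-5-2 256 256 erasure ec-5-2
    # ...then copy the data (e.g. object by object with rados get/put, or at
    # the application level) and point clients at the new pool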



David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.




From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Lazuardi 
Nasution [mrxlazuar...@gmail.com]
Sent: Thursday, February 09, 2017 11:18 AM
To: Ceph Users
Subject: [ceph-users] Erasure Profile Update

Hi,

I'm looking for the way to update an erasure profile when adding nodes. Let's say at 
first I have 5 OSD nodes with a 3+2 erasure profile, so all chunks, including the 
coding chunks, are spread across every OSD node. In the future, let's say I add 2 OSD 
nodes and I want a 5+2 erasure profile so that all chunks are spread across every OSD 
node too. How can I do that without interrupting the Ceph service to clients?

Best regards,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mon memory issue jewel 10.2.5 kernel 4.4

2017-02-09 Thread Jim Kilborn
Graham,

I don’t think this is the issue I’m seeing. I’m running Centos on kernel 
4.4.24-1. My processes aren’t dying.



I have two clusters with 3 mons in each cluster. Over the last 3 months that 
the clusters have been running, this has only happened on two nodes, and only 
once per node.



If I check the other nodes (or any nodes at this point), I see zero swap used, 
as in the example below.



[jkilborn@darkjedi-ceph02 ~]$ free -h

              total        used        free      shared  buff/cache   available
Mem:           125G         10G         85G        129M         28G        108G
Swap:          2.0G          0B        2.0G





These mon nodes are also running 8 osds each with ssd journals.

We have very little load at this point. Even when the ceph-mon process eats all 
the swap, it still shows free memory, and never goes offline.



              total        used        free      shared  buff/cache   available
Mem:      131783876    67618000    13383516       53868    50782360    61599096
Swap:       2097148     2097092          56



Seems like a ceph-mon bug/leak to me.





Sent from Mail for Windows 10



From: Graham Allan
Sent: Thursday, February 9, 2017 11:24 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph-mon memory issue jewel 10.2.5 kernel 4.4



I've been trying to figure out the same thing recently - I had the same
issues as others with jewel 10.2.3 (?) but for my current problem I
don't think it's a ceph issue.

Specifically ever since our last maintenance day, some of our OSD nodes
having been suffering OSDs killed by OOM killer despite having enough
memory.

I looked for ages at the discussions about reducing the map cache size
but it just didn't seem a likely cause.

It looks like a kernel bug. Here for ubuntu:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1655842

I was seeing this OOM issue on kernels 4.4.0.59 and 4.4.0.62. It sounds
like downgrading into 4.4.0.57 should resolve the issue, and 4.4.0.63
out shortly should also fix it.

Our unaffected machines in the cluster are running a different release
and kernel (though same version of ceph).

Haven't actually tested this yet, just found the reference in the last
hour... could this also be the problem you are seeing?

Graham

On 2/8/2017 6:58 PM, Andrei Mikhailovsky wrote:
> +1
>
> Ever since upgrading to 10.2.x I have been seeing a lot of issues with our 
> ceph cluster. I have been seeing osds down, osd servers running out of memory 
> and killing all ceph-osd processes. Again, 10.2.5 on 4.4.x kernel.
>
> It seems what with every release there are more and more problems with ceph 
> (((, which is a shame.
>
> Andrei
>
> - Original Message -
>> From: "Jim Kilborn" 
>> To: "ceph-users" 
>> Sent: Wednesday, 8 February, 2017 19:45:58
>> Subject: [ceph-users] ceph-mon memory issue jewel 10.2.5 kernel  4.4
>
>> I have had two ceph monitor nodes generate swap space alerts this week.
>> Looking at the memory, I see ceph-mon using a lot of memory and most of the 
>> swap
>> space. My ceph nodes have 128GB mem, with 2GB swap  (I know the memory/swap
>> ratio is odd)
>>
>> When I get the alert, I see the following
>>
>>
>> root@empire-ceph02 ~]# free
>>
>>              total        used        free      shared  buff/cache   available
>>
>> Mem:      131783876    67618000    13383516       53868    50782360    61599096
>>
>> Swap:       2097148     2097092          56
>>
>>
>>
>> root@empire-ceph02 ~]# ps -aux | egrep 'ceph-mon|MEM'
>>
>> USERPID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
>>
>> ceph 174239  0.3 45.8 62812848 60405112 ?   Ssl   2016 269:08
>> /usr/bin/ceph-mon -f --cluster ceph --id empire-ceph02 --setuser ceph
>> --setgroup ceph
>>
>>
>> In the ceph-mon log, I see the following:
>>
>> Feb  8 09:31:21 empire-ceph02 ceph-mon: 2017-02-08 09:31:21.211268 
>> 7f414d974700
>> -1 lsb_release_parse - failed to call lsb_release binary with error: (12)
>> Cannot allocate memory
>> Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012856 
>> 7f3dcfe94700
>> -1 osd.8 344 heartbeat_check: no reply from 0x563e4214f090 osd.1 since back
>> 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
>> (cutoff 2017-02-08 09:31:04.012854)
>> Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012900 
>> 7f3dcfe94700
>> -1 osd.8 344 heartbeat_check: no reply from 0x563e4214da10 osd.3 since back
>> 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
>> (cutoff 2017-02-08 09:31:04.012854)
>> Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012915 
>> 7f3dcfe94700
>> -1 osd.8 344 heartbeat_check: no reply from 0x563e4214d410 osd.5 since back
>> 2017-02-08 09:31:03.778901 front 2017-02-08 09:31:03.778901
>> (cutoff 2017-02-08 09:31:04.012854)
>> Feb  8 09:31:24 empire-ceph02 ceph-osd: 2017-02-08 09:31:24.012927 
>> 7f3dcfe9

[ceph-users] CephFS root squash?

2017-02-09 Thread Jim Kilborn
Does cephfs have an option for root squash, like nfs mounts do?
I am trying to figure out how to allow my users to have sudo on their 
workstation, but not have that root access to the ceph kernel mounted volume.

Can’t seem to find anything. I’m using cephx for the mount, but can’t find a “root 
squash” type option for the mount.
sudo still allows them to nuke the whole filesystem from the client.

Sent from Mail for Windows 10

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-mon memory issue jewel 10.2.5 kernel 4.4

2017-02-09 Thread Andrei Mikhailovsky

Hi Jim,

I've got a few questions for you as it looks like we have a similar cluster for 
our ceph infrastructure. A quick overview of what we have. We are also running 
a small cluster  of 3 storage nodes (30 osds in total) and 5 clients over 
40gig/s infiniband link (ipoib). Ever since installing the cluster (back in 
2013) we have had issues with ceph stability. During the upgrade cycles (ceph 
version upgrades were applied to practically all ceph stable releases, 
including major and minor versions) the stability has varied from improving to 
some degree to being poor once again. 

The main problem that we had (up until release 10.2.x) were slow requests and 
osds being marked as down due to heartbeat. I gave up after spending tons of time 
trying to figure out the cause of the problem with folks on irc; they were 
blaming a networking issue. However, I couldn't confirm this and it doesn't 
seem to be the case. I have run about a dozen different networking tests over 
months and none of them showed any degradation in speed, packet loss, etc. I 
even tested the initiation of around 1000 tcp connections per second over the 
course of months and did not have a single packet drop or unusual delay. While the 
network tests were running the ceph cluster was still producing slow requests 
and osds being marked as down due to heartbeats. The quoted figure of 10K+ per 
year for support is not an option for us, so we ended up biting the bullet.

After the recent upgrade to the 10.2.x branch, we started to face additional issues 
of osds either crashing or being killed due to lack of memory. My guess is 
memory leaks. Now, I think we are approaching the limit of our suffering 
with ceph and are currently investigating an alternative solution, as ceph has 
proved to be unstable and, unfortunately, the community support did not help to 
resolve our problems over a 4 year period.

I was hoping to have some insight on your setup and configuration on both the 
client and ceph backend and also learn more about the problems you are having 
or had in the past and managed to address? Would you be willing to discuss this 
further?

Many thanks

Andrei

- Original Message -
> From: "Jim Kilborn" 
> To: "Joao Eduardo Luis" , "ceph-users" 
> 
> Sent: Thursday, 9 February, 2017 13:04:16
> Subject: Re: [ceph-users] ceph-mon memory issue jewel 10.2.5 kernel 4.4

> Joao,
> 
> Here is the information requested. Thanks for taking a look. Note that the 
> below
> is after I restarted the ceph-mon processes yesterday. If this is not
> acceptable, I will have to wait until the issue reappears. This is on a small
> cluster. 4 ceph nodes, and 6 ceph kernel clients running over infiniband.
> 
> 
> 
> [root@empire-ceph02 log]# ceph -s
> 
>cluster 62ed97d6-adf4-12e4-8fd5-3d9701b22b87
> 
> health HEALTH_OK
> 
> monmap e3: 3 mons at
> 
> {empire-ceph01=192.168.20.241:6789/0,empire-ceph02=192.168.20.242:6789/0,empire-ceph03=192.168.20.243:6789/0}
> 
>election epoch 56, quorum 0,1,2 
> empire-ceph01,empire-ceph02,empire-ceph03
> 
>  fsmap e526: 1/1/1 up {0=empire-ceph03=up:active}, 1 up:standby
> 
> osdmap e361: 32 osds: 32 up, 32 in
> 
>flags sortbitwise,require_jewel_osds
> 
>  pgmap v2427955: 768 pgs, 2 pools, 2370 GB data, 1759 kobjects
> 
>7133 GB used, 109 TB / 116 TB avail
> 
> 768 active+clean
> 
>  client io 256 B/s wr, 0 op/s rd, 0 op/s wr
> 
> 
> 
> [root@empire-ceph02 log]# ceph daemon mon.empire-ceph02 ops
> 
> {
> 
>"ops": [],
> 
>"num_ops": 0
> 
> }
> 
> 
> 
> [root@empire-ceph02 mon]# du -sh ceph-empire-ceph02
> 
> 30M ceph-empire-ceph02
> 
> 
> 
> [root@empire-ceph02 mon]# ls -lR
> 
> .:
> 
> total 0
> 
> drwxr-xr-x. 3 ceph ceph 46 Dec  6 14:26 ceph-empire-ceph02
> 
> 
> 
> ./ceph-empire-ceph02:
> 
> total 8
> 
> -rw-r--r--. 1 ceph ceph0 Dec  6 14:26 done
> 
> -rw---. 1 ceph ceph   77 Dec  6 14:26 keyring
> 
> drwxr-xr-x. 2 ceph ceph 4096 Feb  9 06:58 store.db
> 
> 
> 
> ./ceph-empire-ceph02/store.db:
> 
> total 30056
> 
> -rw-r--r--. 1 ceph ceph  396167 Feb  9 06:06 510929.sst
> 
> -rw-r--r--. 1 ceph ceph  778898 Feb  9 06:56 511298.sst
> 
> -rw-r--r--. 1 ceph ceph 5177344 Feb  9 07:01 511301.log
> 
> -rw-r--r--. 1 ceph ceph 1491740 Feb  9 06:58 511305.sst
> 
> -rw-r--r--. 1 ceph ceph 2162405 Feb  9 06:58 511306.sst
> 
> -rw-r--r--. 1 ceph ceph 2162047 Feb  9 06:58 511307.sst
> 
> -rw-r--r--. 1 ceph ceph 2104201 Feb  9 06:58 511308.sst
> 
> -rw-r--r--. 1 ceph ceph 2146113 Feb  9 06:58 511309.sst
> 
> -rw-r--r--. 1 ceph ceph 2123659 Feb  9 06:58 511310.sst
> 
> -rw-r--r--. 1 ceph ceph 2162927 Feb  9 06:58 511311.sst
> 
> -rw-r--r--. 1 ceph ceph 2129640 Feb  9 06:58 511312.sst
> 
> -rw-r--r--. 1 ceph ceph 2133590 Feb  9 06:58 511313.sst
> 
> -rw-r--r--. 1 ceph ceph 2143906 Feb  9 06:58 511314.sst
> 
> -rw-r--r--. 1 ceph ceph 2158434 Feb  9 06:58 511315.sst
> 
> -rw-r--r--. 1 ceph ceph 1649589 Feb  9 06:58 511316.sst
> 
> 

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-02-09 Thread Samuel Just
Ok, https://github.com/athanatos/ceph/tree/wip-snap-trim-sleep (based on
master) passed a rados suite.  It adds a configurable limit to the number
of pgs which can be trimming on any OSD (default: 2).  PGs trimming will be
in snaptrim state, PGs waiting to trim will be in snaptrim_wait state.  I
suspect this'll be adequate to throttle the amount of trimming.  If not, I
can try to add an explicit limit to the rate at which the work items
trickle into the queue.  Can someone test this branch?   Tester beware:
this has not merged into master yet and should only be run on a disposable
cluster.
-Sam

On Tue, Feb 7, 2017 at 1:16 PM, Nick Fisk  wrote:

> Yeah it’s probably just the fact that they have more PG’s so they will
> hold more data and thus serve more IO. As they have a fixed IO limit, they
> will always hit the limit first and become the bottleneck.
>
>
>
> The main problem with reducing the filestore queue is that I believe you
> will start to lose the benefit of having IO’s queued up on the disk, so
> that the scheduler can re-arrange them to action them in the most efficient
> manor as the disk head moves across the platters. You might possibly see up
> to a 20% hit on performance, in exchange for more consistent client
> latency.
>
>
>
> *From:* Steve Taylor [mailto:steve.tay...@storagecraft.com]
> *Sent:* 07 February 2017 20:35
> *To:* n...@fisk.me.uk; ceph-users@lists.ceph.com
>
> *Subject:* RE: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during
> sleep?
>
>
>
> Thanks, Nick.
>
>
>
> One other data point that has come up is that nearly all of the blocked
> requests that are waiting on subops are waiting for OSDs with more PGs than
> the others. My test cluster has 184 OSDs, 177 of which are 3TB, with 7 4TB
> OSDs. The cluster is well balanced based on OSD capacity, so those 7 OSDs
> individually have 33% more PGs than the others and are causing almost all
> of the blocked requests. It appears that maps updates are generally not
> blocking long enough to show up as blocked requests.
>
>
>
> I set the reweight on those 7 OSDs to 0.75 and things are backfilling now.
> I’ll test some more when the PG counts per OSD are more balanced and see
> what I get. I’ll also play with the filestore queue. I was telling some of
> my colleagues yesterday that this looked likely to be related to buffer
> bloat somewhere. I appreciate the suggestion.
>
>
> --
>
>
> 
>
> *Steve* *Taylor* | Senior Software Engineer | StorageCraft Technology
> Corporation
> 
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> *Office: *801.871.2799 |
> --
>
> If you are not the intended recipient of this message or received it
> erroneously, please notify the sender and delete it, together with any
> attachments, and be advised that any dissemination or copying of this
> message is prohibited.
> --
>
> *From:* Nick Fisk [mailto:n...@fisk.me.uk]
> *Sent:* Tuesday, February 7, 2017 10:25 AM
> *To:* Steve Taylor ;
> ceph-users@lists.ceph.com
> *Subject:* RE: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during
> sleep?
>
>
>
> Hi Steve,
>
>
>
> From what I understand, the issue is not with the queueing in Ceph, which
> is correctly moving client IO to the front of the queue. The problem lies
> below what Ceph controls, ie the scheduler and disk layer in Linux. Once
> the IO’s leave Ceph it’s a bit of a free for all and the client IO’s tend
> to get lost in large disk queues surrounded by all the snap trim IO’s.
>
>
>
> The workaround Sam is working on will limit the amount of snap trims that
> are allowed to run, which I believe will have a similar effect to the sleep
> parameters in pre-jewel clusters, but without pausing the whole IO thread.
>
>
>
> Ultimately the solution requires Ceph to be able to control the queuing of
> IO’s at the lower levels of the kernel. Whether this is via some sort of
> tagging per IO (currently CFQ is only per thread/process) or some other
> method, I don’t know. I was speaking to Sage and he thinks the easiest
> method might be to shrink the filestore queue so that you don’t get buffer
> bloat at the disk level. You should be able to test this out pretty easily
> now by changing the parameter, probably around a queue of 5-10 would be
> about right for spinning disks. It’s a trade off of peak throughput vs
> queue latency though.
>
>
>
> Nick
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com
> ] *On Behalf Of *Steve Taylor
> *Sent:* 07 February 2017 17:01
> *To:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] osd_snap_trim_sleep keeps locks PG duri

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-02-09 Thread Nick Fisk
Building now

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Samuel 
Just
Sent: 09 February 2017 19:22
To: Nick Fisk 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

 

Ok, https://github.com/athanatos/ceph/tree/wip-snap-trim-sleep (based on 
master) passed a rados suite.  It adds a configurable limit to the number of 
pgs which can be trimming on any OSD (default: 2).  PGs trimming will be in 
snaptrim state, PGs waiting to trim will be in snaptrim_wait state.  I suspect 
this'll be adequate to throttle the amount of trimming.  If not, I can try to 
add an explicit limit to the rate at which the work items trickle into the 
queue.  Can someone test this branch?   Tester beware: this has not merged into 
master yet and should only be run on a disposable cluster.

-Sam

 

On Tue, Feb 7, 2017 at 1:16 PM, Nick Fisk  wrote:

Yeah it’s probably just the fact that they have more PG’s so they will hold 
more data and thus serve more IO. As they have a fixed IO limit, they will 
always hit the limit first and become the bottleneck.

 

The main problem with reducing the filestore queue is that I believe you will 
start to lose the benefit of having IO’s queued up on the disk, so that the 
scheduler can re-arrange them to action them in the most efficient manner as the 
disk head moves across the platters. You might possibly see up to a 20% hit on 
performance, in exchange for more consistent client latency. 

 

From: Steve Taylor [mailto:steve.tay...@storagecraft.com] 
Sent: 07 February 2017 20:35
To: n...@fisk.me.uk; ceph-users@lists.ceph.com


Subject: RE: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

 

Thanks, Nick.

 

One other data point that has come up is that nearly all of the blocked 
requests that are waiting on subops are waiting for OSDs with more PGs than the 
others. My test cluster has 184 OSDs, 177 of which are 3TB, with 7 4TB OSDs. 
The cluster is well balanced based on OSD capacity, so those 7 OSDs 
individually have 33% more PGs than the others and are causing almost all of 
the blocked requests. It appears that maps updates are generally not blocking 
long enough to show up as blocked requests.

 

I set the reweight on those 7 OSDs to 0.75 and things are backfilling now. I’ll 
test some more when the PG counts per OSD are more balanced and see what I get. 
I’ll also play with the filestore queue. I was telling some of my colleagues 
yesterday that this looked likely to be related to buffer bloat somewhere. I 
appreciate the suggestion.

 

  _  


 

 

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799   | 

  _  


If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

  _  

From: Nick Fisk [mailto:n...@fisk.me.uk] 
Sent: Tuesday, February 7, 2017 10:25 AM
To: Steve Taylor ; ceph-users@lists.ceph.com
Subject: RE: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

 

Hi Steve,

 

From what I understand, the issue is not with the queueing in Ceph, which is 
correctly moving client IO to the front of the queue. The problem lies below 
what Ceph controls, ie the scheduler and disk layer in Linux. Once the IO’s 
leave Ceph it’s a bit of a free for all and the client IO’s tend to get lost 
in large disk queues surrounded by all the snap trim IO’s.

 

The workaround Sam is working on will limit the amount of snap trims that are 
allowed to run, which I believe will have a similar effect to the sleep 
parameters in pre-jewel clusters, but without pausing the whole IO thread.

 

Ultimately the solution requires Ceph to be able to control the queuing of IO’s 
at the lower levels of the kernel. Whether this is via some sort of tagging per 
IO (currently CFQ is only per thread/process) or some other method, I don’t 
know. I was speaking to Sage and he thinks the easiest method might be to 
shrink the filestore queue so that you don’t get buffer bloat at the disk 
level. You should be able to test this out pretty easily now by changing the 
parameter, probably around a queue of 5-10 would be about right for spinning 
disks. It’s a trade of

[ceph-users] OSDs stuck unclean

2017-02-09 Thread Craig Read
We have 4 OSDs in a test environment that are all stuck unclean.

I've tried rebuilding the whole environment with the same result.

OSDs are running on XFS disks; partition 1 is the OSD, partition 2 is the journal.

Also seeing degraded despite having 4 OSDs and a default osd pool size of 2.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw scaling recommendation?

2017-02-09 Thread Wido den Hollander

> Op 9 februari 2017 om 19:34 schreef Mark Nelson :
> 
> 
> I'm not really an RGW expert, but I'd suggest increasing the 
> "rgw_thread_pool_size" option to something much higher than the default 
> 100 threads if you haven't already.  RGW requires at least 1 thread per 
> client connection, so with many concurrent connections some of them 
> might end up timing out.  You can scale the number of threads and even 
> the number of RGW instances on a single server, but at some point you'll 
> run out of threads at the OS level.  Probably before that actually 
> happens though, you'll want to think about multiple RGW gateway nodes 
> behind a load balancer.  Afaik that's how the big sites do it.
> 

In addition, have you tried to use more RADOS handles?

rgw_num_rados_handles = 8

Combine that with more RGW threads, as Mark mentioned.
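
e.g. something along these lines in the RGW client section of ceph.conf (the
numbers are only a starting point to experiment with):

[client.rgw.gateway1]
    rgw num rados handles = 8
    rgw thread pool size = 512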

Wido

> I believe some folks are considering trying to migrate rgw to a 
> threadpool/event processing model but it sounds like it would be quite a 
> bit of work.
> 
> Mark
> 
> On 02/09/2017 12:25 PM, Benjeman Meekhof wrote:
> > Hi all,
> >
> > We're doing some stress testing with clients hitting our rados gw
> > nodes with simultaneous connections.  When the number of client
> > connections exceeds about 5400 we start seeing 403 forbidden errors
> > and log messages like the following:
> >
> > 2017-02-09 08:53:16.915536 7f8c667bc700 0 NOTICE: request time skew
> > too big now=2017-02-09 08:53:16.00 req_time=2017-02-09
> > 08:37:18.00
> >
> > This is version 10.2.5 using embedded civetweb.  There's just one
> > instance per node, and they all start generating 403 errors and the
> > above log messages when enough clients start hitting them.  The
> > hardware is not being taxed at all, negligible load and network
> > throughput.   OSD don't show any appreciable increase in CPU load or
> > io wait on journal/data devices.  Unless I'm missing something it
> > looks like the RGW is just not scaling to fill out the hardware it is
> > on.
> >
> > Does anyone have advice on scaling RGW to fully utilize a host?
> >
> > thanks,
> > Ben
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs stuck unclean

2017-02-09 Thread Shinobu Kinjo
4 OSD nodes or daemons?

please:

 * ceph -v
 * ceph -s
 * ceph osd tree


On Fri, Feb 10, 2017 at 5:26 AM, Craig Read  wrote:
> We have 4 OSDs in test environment that are all stuck unclean
>
>
>
> I’ve tried rebuilding the whole environment with the same result.
>
>
>
> OSDs are running on XFS disk, partition 1 is OSD, partition 2 is journal
>
>
>
> Also seeing degraded despite having 4 OSDs and a default osd pool of 2
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw scaling recommendation?

2017-02-09 Thread Ben Hines
I'm curious how the num_threads option to civetweb relates to the 'rgw
thread pool size'?  Should I make them equal?

ie:

rgw frontends = civetweb enable_keep_alive=yes port=80 num_threads=125
error_log_file=/var/log/ceph/civetweb.error.log
access_log_file=/var/log/ceph/civetweb.access.log


-Ben

On Thu, Feb 9, 2017 at 12:30 PM, Wido den Hollander  wrote:

>
> > Op 9 februari 2017 om 19:34 schreef Mark Nelson :
> >
> >
> > I'm not really an RGW expert, but I'd suggest increasing the
> > "rgw_thread_pool_size" option to something much higher than the default
> > 100 threads if you haven't already.  RGW requires at least 1 thread per
> > client connection, so with many concurrent connections some of them
> > might end up timing out.  You can scale the number of threads and even
> > the number of RGW instances on a single server, but at some point you'll
> > run out of threads at the OS level.  Probably before that actually
> > happens though, you'll want to think about multiple RGW gateway nodes
> > behind a load balancer.  Afaik that's how the big sites do it.
> >
>
> In addition, have you tried to use more RADOS handles?
>
> rgw_num_rados_handles = 8
>
> That with more RGW threads as Mark mentioned.
>
> Wido
>
> > I believe some folks are considering trying to migrate rgw to a
> > threadpool/event processing model but it sounds like it would be quite a
> > bit of work.
> >
> > Mark
> >
> > On 02/09/2017 12:25 PM, Benjeman Meekhof wrote:
> > > Hi all,
> > >
> > > We're doing some stress testing with clients hitting our rados gw
> > > nodes with simultaneous connections.  When the number of client
> > > connections exceeds about 5400 we start seeing 403 forbidden errors
> > > and log messages like the following:
> > >
> > > 2017-02-09 08:53:16.915536 7f8c667bc700 0 NOTICE: request time skew
> > > too big now=2017-02-09 08:53:16.00 req_time=2017-02-09
> > > 08:37:18.00
> > >
> > > This is version 10.2.5 using embedded civetweb.  There's just one
> > > instance per node, and they all start generating 403 errors and the
> > > above log messages when enough clients start hitting them.  The
> > > hardware is not being taxed at all, negligible load and network
> > > throughput.   OSD don't show any appreciable increase in CPU load or
> > > io wait on journal/data devices.  Unless I'm missing something it
> > > looks like the RGW is just not scaling to fill out the hardware it is
> > > on.
> > >
> > > Does anyone have advice on scaling RGW to fully utilize a host?
> > >
> > > thanks,
> > > Ben
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS root squash?

2017-02-09 Thread Gregory Farnum
On Thu, Feb 9, 2017 at 11:11 AM, Jim Kilborn  wrote:
> Does cephfs have an option for root squash, like nfs mounts do?
> I am trying to figure out how to allow my users to have sudo on their 
> workstation, but not have that root access to the ceph kernel mounted volume.
>
> Can’t seem to find anything. Using cephx for the mount, but can’t find a 
> “root squash” type option for mount
> sudo still allows them to nuke the whole filesystem from the client.

The CephX security capabilities let you specify what uid/gid the
client is allowed to operate as. Looks like
http://docs.ceph.com/docs/master/cephfs/client-auth/ doesn't include
that :/ but the syntax would just be
"allow rw path=/foo uid=1 gids=1,2"
That lets a specified client read and write data only within the
"/foo" directory, and only while acting as user 1 with groups 1 and 2.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs stuck unclean

2017-02-09 Thread Shinobu Kinjo
Please provide us with the crushmap:

 * sudo ceph osd getcrushmap -o crushmap.`date +%Y%m%d%H`
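
and, if you want to inspect it yourself first, decompile it with something like
(the filename is whatever the previous command produced):

 * crushtool -d crushmap.<timestamp> -o crushmap.txt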

On Fri, Feb 10, 2017 at 5:46 AM, Craig Read  wrote:
> Sorry, 2 nodes, 6 daemons (forgot I added 2 daemons to see if it made a 
> difference)
>
> On CentOS7
>
> Ceph -v:
>
> 10.2.5
>
> Ceph -s:
>
> Health HEALTH_WARN
> 64 pgs stuck unclean
> Too few PGs per OSD (21 < min 30)
> Monmap e1: 1 mons at {=:6789/0}
> Election epoch 3, quorum 0 
> Osdmap e89: 6 osds: 6 up, 6 in; 64 remapped pgs
> Flags sortbitwise,require_jewel_osds
> Pgmap v263: 64pgs, 1 pools, 0 bytes data, 0 objects
> 209 MB used, 121GB / 121GB avail
> 32 active+remapped
> 32 active
>
> Ceph osd tree:
>
> -1 0.11899 root default
> -2 0.05949  Host 1:
>  0 0.00490  Osd.0   up 1.0  1.0
>  3 0.01070  Osd.3   up 1.0  1.0
>  4 0.04390  Osd.4   up.10   1.0
>
> -3 0.05949  Host 2:
>  1 0.00490  Osd.1   up 1.0  1.0
>  2 0.01070  Osd.2   up 1.0  1.0
>  5 0.04390  Osd.5   up1.0   1.0
>
>
> Appreciate your help
>
> Craig
>
> -Original Message-
> From: Shinobu Kinjo [mailto:ski...@redhat.com]
> Sent: Thursday, February 9, 2017 2:34 PM
> To: Craig Read 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] OSDs stuck unclean
>
> 4 OSD nodes or daemons?
>
> please:
>
>  * ceph -v
>  * ceph -s
>  * ceph osd tree
>
>
> On Fri, Feb 10, 2017 at 5:26 AM, Craig Read  wrote:
>> We have 4 OSDs in test environment that are all stuck unclean
>>
>>
>>
>> I’ve tried rebuilding the whole environment with the same result.
>>
>>
>>
>> OSDs are running on XFS disk, partition 1 is OSD, partition 2 is journal
>>
>>
>>
>> Also seeing degraded despite having 4 OSDs and a default osd pool of 2
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW: No caching when S3 tokens are validated against Keystone?

2017-02-09 Thread Simon Leinen
We're using the Hammer version of RadosGW, with Keystone for authN/Z.
When a user started sending a lot of S3 requests (using rclone), the
load on our Keystone service skyrocketed.

This surprised me because all those requests are from the same user, and
RadosGW has caching for Keystone tokens.  But looking at the code, this
caching only seems to be used by rgw_swift.cc, not by rgw_rest_s3.cc.
That would explain why no caching is going on here.

Can anyone confirm?

And if so, is there a fundamental problem that makes it hard to use
caching when validating S3 tokens against a Keystone backend?

(Otherwise I guess I should write a feature request and/or start coding
this up myself...)

Here are the facts for background:

$ sudo ceph --admin-daemon /var/run/ceph/ceph-radosgw.gateway.asok config 
show | grep keystone
"rgw_keystone_url": "https:\/\/...",
"rgw_keystone_admin_token": "...",
"rgw_keystone_admin_user": "",
"rgw_keystone_admin_password": "",
"rgw_keystone_admin_tenant": "",
"rgw_keystone_accepted_roles": "_member_, ResellerAdmin",
"rgw_keystone_token_cache_size": "1",
"rgw_keystone_revocation_interval": "900",
"rgw_s3_auth_use_keystone": "true",

$ sudo ceph --admin-daemon /var/run/ceph/ceph-radosgw.gateway.asok perf 
dump | grep token_cache
"keystone_token_cache_hit": 0,
"keystone_token_cache_miss": 0

When I turn on debugging (config set debug_rgw 20/20), I get many
messages like these:

2017-02-09 21:50:06.606216 7f6ac5d83700  5 s3 keystone: validated token: 
: expires: 1486680606
2017-02-09 21:50:06.635940 7f6aac550700 20 s3 keystone: trying keystone auth
2017-02-09 21:50:06.747616 7f6aadd53700 20 s3 keystone: trying keystone auth
2017-02-09 21:50:06.818267 7f6ac2d7d700  5 s3 keystone: validated token: 
: expires: 1486680606
2017-02-09 21:50:06.853492 7f6ab3d5f700 20 s3 keystone: trying keystone auth
2017-02-09 21:50:06.895471 7f6ac5582700  5 s3 keystone: validated token: 
: expires: 1486680606
2017-02-09 21:50:06.951734 7f6abf576700  5 s3 keystone: validated token: 
: expires: 1486680606
2017-02-09 21:50:07.016555 7f6ab7566700  5 s3 keystone: validated token: 
: expires: 1486680606
2017-02-09 21:50:07.038997 7f6ab355e700 20 s3 keystone: trying keystone auth
2017-02-09 21:50:07.160196 7f6ac1d7b700  5 s3 keystone: validated token: 
: expires: 1486680606
2017-02-09 21:50:07.189930 7f6aaf556700 20 s3 keystone: trying keystone auth
2017-02-09 21:50:07.233593 7f6aabd4f700  5 s3 keystone: validated token: 
: expires: 1486680607
2017-02-09 21:50:07.263116 7f6abcd71700  5 s3 keystone: validated token: 
: expires: 1486680607
2017-02-09 21:50:07.263915 7f6ab8d69700  5 s3 keystone: validated token: 
: expires: 1486680607
2017-02-09 21:50:07.263990 7f6aae554700 20 s3 keystone: trying keystone auth
2017-02-09 21:50:07.280523 7f6ab2d5d700  5 s3 keystone: validated token: 
: expires: 1486680607
2017-02-09 21:50:07.290892 7f6aa954a700 20 s3 keystone: trying keystone auth
2017-02-09 21:50:07.311201 7f6ab6d65700 20 s3 keystone: trying keystone auth
2017-02-09 21:50:07.317667 7f6aad552700 20 s3 keystone: trying keystone auth
2017-02-09 21:50:07.380957 7f6ab6564700  5 s3 keystone: validated token: 
: expires: 1486680607
2017-02-09 21:50:07.421227 7f6abd572700  5 s3 keystone: validated token: 
: expires: 1486680607
2017-02-09 21:50:07.446867 7f6ab0d59700 20 s3 keystone: trying keystone auth
2017-02-09 21:50:07.459225 7f6aa9d4b700 20 s3 keystone: trying keystone auth

and, as I said, our Keystone service is pretty much DoSed right now...
-- 
Simon.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] trying to test S3 bucket lifecycles in Kraken

2017-02-09 Thread Uwe Mesecke
Hey,

I am trying to do some testing of S3 bucket lifecycles in Kraken but I am 
unable to set up a lifecycle. RGW always returns "501 Not Implemented".

>>>
PUT /pdtest_expire_test?lifecycle HTTP/1.1
[…]


http://s3.amazonaws.com/doc/2006-03-01/";>testEnabled1
>>>
<<<
HTTP/1.1 501 Not Implemented
[…]

NotImplementedtx01e40-00589cff17-6f1d60-default6f1d60-default-default
<<<

The cluster is running version 11.2.0 and was just upgraded from jewel. The 
client used is the PHP aws-sdk. I already double checked the version of the 
running rgw instances and they are all at 11.2.0.

After increasing the log level in one rgw instance I can see following lines:

2017-02-10 01:08:51.225783 7fdc60167700 10 delaying v4 auth
2017-02-10 01:08:51.225785 7fdc60167700 10 ERROR: AWS4 completion for this 
operation NOT IMPLEMENTED
2017-02-10 01:08:51.225788 7fdc60167700 10 failed to authorize request
2017-02-10 01:08:51.225789 7fdc60167700 20 handler->ERRORHANDLER: err_no=-2201 
new_err_no=-2201

Does that mean that we cannot put lifecycles using V4 signature requests? PHP 
aws-sdk seems to have dropped support for earlier signature versions a while 
ago, at least for S3.
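
As a quick cross-check I will probably also try a client that can still sign
requests with v2, to see whether this is specific to v4 signing. Something like
this with s3cmd (assuming it is pointed at the RGW endpoint and lifecycle.xml
contains the rule from above):

s3cmd --signature-v2 setlifecycle lifecycle.xml s3://pdtest_expire_test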

Uwe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 2 of 3 monitors down and to recover

2017-02-09 Thread 云平台事业部
Hey,
I tried to simulate the failure of 2 monitors including their monmap, and to 
bring them up in my testing cluster.
The ceph version is 10.2.5, the OS is REHL7.2, and the testing cluster has 3 
nodes with 3 monitors and 24 osds, each node has 1 monitor and 8 osds.
So, I stopped 2 (mon.a and mon.c) of the 3 ceph-mon daemons and deleted all 
files in the directory /var/lib/ceph/mon/ceph-a/store.db, and then I tried to 
recover the ceph-mon daemons referring to the ceph documents, but something 
unexpected happened. The recovery procedure is as follows:
To grab the monmap from another monitor (stop the monitor daemon before 
extract):
# ceph-mon -i b --extract-monmap /tmp/monmap
The 2 ceph-mon are down, so I do need to stop the monitor.
To inject the monmap:
# ceph-mon -i a --inject-monmap /tmp/monmap
There is an error after that:
Invalid argument: /var/lib/ceph/mon/ceph-a/store.db: does not exist 
(create_if_missing is false)……
Error opening mon data directory at ‘/var/lib/ceph/mon/ceph-a’: (22) Invalid 
argument.

But the directory /var/lib/ceph/mon/ceph-a does exist and the owner is ceph:ceph, 
so why does this happen?
And is my simulation and recovery procedure right?

Best regards,
He taotao
EMAIL:hetaotao...@pingan.com.cn



The information in this email is confidential and may be legally privileged. If 
you have received this email in error or are not the intended recipient, please 
immediately notify the sender and delete this message from your computer. Any 
use, distribution, or copying of this email other than by the intended 
recipient is strictly prohibited. All messages sent to and from us may be 
monitored to ensure compliance with internal policies and to protect our 
business.
Emails are not secure and cannot be guaranteed to be error free as they can be 
intercepted, amended, lost or destroyed, or contain viruses. Anyone who 
communicates with us by email is taken to accept these risks.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] I can't create new pool in my cluster.

2017-02-09 Thread 周威
The version I'm using is 0.94.9

And when I want to create a pool, It shows:

Error EINVAL: error running crushmap through crushtool: (1) Operation
not permitted

What's wrong about this?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] reference documents of cbt(ceph benchmarking tool)

2017-02-09 Thread mazhongming
Hi guys,
So I was investigating benchmark tools for ceph, and CBT seems to be a good 
candidate.
But the documentation on github is limited. Regarding using this tool on an 
existing cluster, are there any specific documents for the procedure?
Also, I'm trying to use it on ubuntu 14.04 and I don't know whether this tool is 
suitable for it.
Has anybody tried it in the above environment? Any advice or suggestions will be 
appreciated. Any document sharing will be awesome.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I can't create new pool in my cluster.

2017-02-09 Thread Shinobu Kinjo
What did you exactly do?

On Fri, Feb 10, 2017 at 11:48 AM, 周威  wrote:
> The version I'm using is 0.94.9
>
> And when I want to create a pool, It shows:
>
> Error EINVAL: error running crushmap through crushtool: (1) Operation
> not permitted
>
> What's wrong about this?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I can't create new pool in my cluster.

2017-02-09 Thread choury
# ceph osd pool create test 128
Error EINVAL: error running crushmap through crushtool: (1) Operation
not permitted

# rados mkpool test
error creating pool test: (22) Invalid argument

2017-02-10 10:53 GMT+08:00 Shinobu Kinjo :
> What did you exactly do?
>
> On Fri, Feb 10, 2017 at 11:48 AM, 周威  wrote:
>> The version I'm using is 0.94.9
>>
>> And when I want to create a pool, It shows:
>>
>> Error EINVAL: error running crushmap through crushtool: (1) Operation
>> not permitted
>>
>> What's wrong about this?
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] I can't create new pool in my cluster.

2017-02-09 Thread choury
I can find some log in ceph-mon.log  about this:

> 2017-02-10 10:47:54.264026 7f6a6eff4700  0 mon.ceph-test2@1(peon) e9 
> handle_command mon_command({"prefix": "osd pool create", "pg_num": 128, 
> "pool": "test"} v 0) v1
> 2017-02-10 10:47:54.264132 7f6a6eff4700  0 log_channel(audit) log [INF] : 
> from='client.? 10.50.83.69:0/2498128365' entity='client.admin' 
> cmd=[{"prefix": "osd pool create", "pg_num": 128, "pool": "test"}]: dispatch

2017-02-10 10:59 GMT+08:00 choury :
> # ceph osd pool create test 128
> Error EINVAL: error running crushmap through crushtool: (1) Operation
> not permitted
>
> # rados mkpool test
> error creating pool test: (22) Invalid argument
>
> 2017-02-10 10:53 GMT+08:00 Shinobu Kinjo :
>> What did you exactly do?
>>
>> On Fri, Feb 10, 2017 at 11:48 AM, 周威  wrote:
>>> The version I'm using is 0.94.9
>>>
>>> And when I want to create a pool, It shows:
>>>
>>> Error EINVAL: error running crushmap through crushtool: (1) Operation
>>> not permitted
>>>
>>> What's wrong about this?
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 2 of 3 monitors down and to recover

2017-02-09 Thread jiajia zhong
hi taotao :)

you can follow
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/ ("Removing
Monitors from an Unhealthy Cluster") and remove the non-surviving or
problematic monitors from the surviving one. For example, see below.

remember to back up the surviving monitor data before going any further.
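
roughly, assuming mon.b is the surviving monitor and mon.a / mon.c are the
dead ones (adjust service names and paths to your install):

# systemctl stop ceph-mon@b
# ceph-mon -i b --extract-monmap /tmp/monmap
# monmaptool /tmp/monmap --rm a --rm c
# ceph-mon -i b --inject-monmap /tmp/monmap
# systemctl start ceph-mon@b

once mon.b is up again with a single-monitor map it should form quorum by
itself, and you can then re-add the other two as brand new monitors.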



2017-02-10 9:52 GMT+08:00 何涛涛(云平台事业部) :

> Hey,
>
> I tried to simulate the failure of 2 monitors including their monmap, and
> to bring them up in my testing cluster.
>
> The ceph version is 10.2.5, the OS is REHL7.2, and the testing cluster has
> 3 nodes with 3 monitors and 24 osds, each node has 1 monitor and 8 osds.
>
> So, I stopped the 2 (the mon.a and mon.c) of 3 ceph-mon daemon and delete
> all file of directory /var/lib/ceph/mon/ceph-a/store.db, and then I tried
> to recover the ceph-mon daemon referring to the ceph documents, but there
> is something unexcepted happened. The recovery procedure is as following:
>
> To grab the monmap from another monitor (stop the monitor daemon before
> extract):
> # ceph-mon -i b --extract-monmap /tmp/monmap
>
> The 2 ceph-mon are down, so I do need to stop the monitor.
>
> To inject the monmap:
>
> # ceph-mon -i a --inject-monmap /tmp/monmap
>
> There is an error after that:
> Invalid argument: /var/lib/ceph/mon/ceph-a/store.db: does not exist
> (create_if_missing is false)……
>
> Error opening mon data directory at ‘/var/lib/ceph/mon/ceph-a’: (22)
> Invalid argument.
>
>
>
> But the directory /var/lib/ceph/ceph-a is exist and the owner is
> ceph:ceph, why does it happen?
>
> And is my simulation and recovery procedure right?
>
>
>
> Best regards,
>
> He taotao
>
> EMAIL:hetaotao...@pingan.com.cn
>
>
>
>
> 
> 
> The information in this email is confidential and may be legally
> privileged. If you have received this email in error or are not the
> intended recipient, please immediately notify the sender and delete this
> message from your computer. Any use, distribution, or copying of this email
> other than by the intended recipient is strictly prohibited. All messages
> sent to and from us may be monitored to ensure compliance with internal
> policies and to protect our business.
> Emails are not secure and cannot be guaranteed to be error free as they
> can be intercepted, amended, lost or destroyed, or contain viruses. Anyone
> who communicates with us by email is taken to accept these risks.
>
> 
> 
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Shrink cache target_max_bytes

2017-02-09 Thread Kees Meijs
Hi Cephers,

Long story short: I'd like to shrink our cache pool a little.

Is it safe to just alter cache target_max_bytes and wait for objects to
get evicted? Anything to take into account?
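
In other words, just something like this (pool name and size made up):

# ceph osd pool set hot-cache target_max_bytes 200000000000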

Thanks!

Regards,
Kees

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com