Re: [ceph-users] bluestore block.db on SSD, where block.wal?

2019-06-05 Thread Félix Barbeira
http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#devices

"The BlueStore journal will always be placed on the fastest device
available, so using a DB device will provide the same benefit that the
WAL device
would while *also* allowing additional metadata to be stored there (if it
will fit)."

So I guess if you only specify block.db (on the faster device), block.wal
will also go onto that LVM volume/partition.
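
For example (a rough sketch, not from the docs -- the device paths are just
placeholders), an OSD created with only a DB device looks like this, and the
WAL simply ends up on that DB device:

# HDD for data, SSD/NVMe partition for block.db; no --block.wal given,
# so BlueStore keeps the WAL on the block.db device as well
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1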

On Sun, 2 Jun 2019 at 18:43, M Ranga Swami Reddy ()
wrote:

> Hello - I planned to use BlueStore's block.db on SSD (with data on
> HDD), sized at 4% of the HDD. I have not specified block.wal; in this
> case, where is block.wal placed?
> Is it on the HDD (i.e. with the data) or in block.db on the SSD?
>
> Thanks
> Swami
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Radosgw in container

2019-06-05 Thread Marc Roos



Has anyone put the radosgw in a container? What files do I need to put 
in the sandbox directory? Are there other things I should consider?



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to remove ceph-mgr from a node

2019-06-05 Thread Vandeir Eduardo
Hi guys,

sorry, but I can't find in the documentation how to remove ceph-mgr
from a node. Is it possible?

Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to remove ceph-mgr from a node

2019-06-05 Thread Marc Roos
 
What is wrong with:

service ceph-mgr@c stop
systemctl disable ceph-mgr@c
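
If you want to remove it completely rather than just disable it, something
along these lines should do (untested sketch; 'c' is the mgr id as above and
the data path may differ on your install):

systemctl stop ceph-mgr@c
systemctl disable ceph-mgr@c
# drop the daemon's cephx key and its data directory
ceph auth del mgr.c
rm -rf /var/lib/ceph/mgr/ceph-c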


-Original Message-
From: Vandeir Eduardo [mailto:vandeir.edua...@gmail.com] 
Sent: Wednesday, 5 June 2019 16:44
To: ceph-users
Subject: [ceph-users] How to remove ceph-mgr from a node

Hi guys,

sorry, but I can't find in the documentation how to remove ceph-mgr from
a node. Is it possible?

Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] Single threaded IOPS on SSD pool.

2019-06-05 Thread vitalif

Ok, average network latency from VM to OSD's ~0.4ms.


It's rather bad; you could improve the latency by ~0.3 ms just by upgrading
the network.



Single threaded performance ~500-600 IOPS - or average latency of 1.6 ms.
Is that comparable to what others are seeing?


Good "reference" numbers are 0.5ms for reads (~2000 iops) and 1ms for 
writes (~1000 iops).


I confirm that the most powerful thing to do is disabling CPU powersave 
(governor=performance + cpupower -D 0). You usually get 2x single thread 
iops at once.
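
For reference, those two settings translate to roughly the following
(a sketch; how to persist them across reboots depends on the distribution):

# pin the CPU frequency governor to performance on all cores
cpupower frequency-set -g performance
# disable all idle states with a wakeup latency greater than 0 us
cpupower idle-set -D 0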

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw in container

2019-06-05 Thread Brett Chancellor
It works okay. You need a ceph.conf and a generic radosgw cephx key. That's
it.
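
Something like this is the whole setup (rough sketch; the key name, paths and
container image are placeholders, not a recommendation):

# create a cephx key for the gateway and write it next to ceph.conf
ceph auth get-or-create client.rgw.gw1 mon 'allow rw' osd 'allow rwx' \
    -o /etc/ceph/ceph.client.rgw.gw1.keyring

# run radosgw with ceph.conf and the keyring bind-mounted read-only
podman run -d --net=host \
    -v /etc/ceph/ceph.conf:/etc/ceph/ceph.conf:ro \
    -v /etc/ceph/ceph.client.rgw.gw1.keyring:/etc/ceph/ceph.client.rgw.gw1.keyring:ro \
    <your-ceph-image> radosgw -f -n client.rgw.gw1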

On Wed, Jun 5, 2019, 5:37 AM Marc Roos  wrote:

>
>
> Has anyone put the radosgw in a container? What files do I need to put
> in the sandbox directory? Are there other things I should consider?
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Single threaded IOPS on SSD pool.

2019-06-05 Thread Marc Roos
 We have this, if it is any help

write-4k-seq: (groupid=0, jobs=1): err= 0: pid=1446964: Fri May 24 
19:41:48 2019
  write: IOPS=760, BW=3042KiB/s (3115kB/s)(535MiB/180001msec)
slat (usec): min=7, max=234, avg=16.59, stdev=13.59
clat (usec): min=786, max=167483, avg=1295.60, stdev=1933.25
 lat (usec): min=810, max=167492, avg=1312.46, stdev=1933.67
clat percentiles (usec):
 |  1.00th=[   914],  5.00th=[   979], 10.00th=[  1020], 20.00th=[  1074],
 | 30.00th=[  1123], 40.00th=[  1172], 50.00th=[  1205], 60.00th=[  1254],
 | 70.00th=[  1319], 80.00th=[  1401], 90.00th=[  1516], 95.00th=[  1631],
 | 99.00th=[  2704], 99.50th=[  3949], 99.90th=[  5145], 99.95th=[  5538],
 | 99.99th=[139461]
   bw (  KiB/s): min=  625, max= 3759, per=80.13%, avg=2436.63, stdev=653.68, samples=359
   iops        : min=  156, max=  939, avg=608.76, stdev=163.41, samples=359
  lat (usec)   : 1000=7.83%
  lat (msec)   : 2=90.27%, 4=1.42%, 10=0.45%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.02%
  cpu  : usr=0.52%, sys=1.55%, ctx=162087, majf=0, minf=28
  IO depths: 1=117.6%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
>=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
 issued rwt: total=0,136889,0, short=0,0,0, dropped=0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=1
randwrite-4k-seq: (groupid=1, jobs=1): err= 0: pid=1448032: Fri May 24 
19:41:48 2019
  write: IOPS=655, BW=2620KiB/s (2683kB/s)(461MiB/180001msec)
slat (usec): min=7, max=120, avg=10.79, stdev= 6.22
clat (usec): min=897, max=77251, avg=1512.76, stdev=368.36
 lat (usec): min=906, max=77262, avg=1523.77, stdev=368.54
clat percentiles (usec):
 |  1.00th=[ 1106],  5.00th=[ 1205], 10.00th=[ 1254], 20.00th=[ 1319],
 | 30.00th=[ 1369], 40.00th=[ 1418], 50.00th=[ 1483], 60.00th=[ 1532],
 | 70.00th=[ 1598], 80.00th=[ 1663], 90.00th=[ 1778], 95.00th=[ 1893],
 | 99.00th=[ 2540], 99.50th=[ 2933], 99.90th=[ 3392], 99.95th=[ 4080],
 | 99.99th=[ 6194]
   bw (  KiB/s): min= 1543, max= 2830, per=79.66%, avg=2087.02, stdev=396.14, samples=359
   iops        : min=  385, max=  707, avg=521.39, stdev=99.06, samples=359
  lat (usec)   : 1000=0.06%
  lat (msec)   : 2=97.19%, 4=2.70%, 10=0.04%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu  : usr=0.39%, sys=1.13%, ctx=118477, majf=0, minf=50
  IO depths: 1=116.6%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
>=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
 issued rwt: total=0,117905,0, short=0,0,0, dropped=0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=1
read-4k-seq: (groupid=2, jobs=1): err= 0: pid=1449103: Fri May 24 
19:41:48 2019
   read: IOPS=2736, BW=10.7MiB/s (11.2MB/s)(1924MiB/180001msec)
slat (usec): min=6, max=142, avg= 9.26, stdev= 5.02
clat (usec): min=152, max=13253, avg=353.73, stdev=98.92
 lat (usec): min=160, max=13262, avg=363.24, stdev=99.15
clat percentiles (usec):
 |  1.00th=[  182],  5.00th=[  215], 10.00th=[  239], 20.00th=[  273],
 | 30.00th=[  306], 40.00th=[  330], 50.00th=[  355], 60.00th=[  375],
 | 70.00th=[  396], 80.00th=[  420], 90.00th=[  461], 95.00th=[  498],
 | 99.00th=[  586], 99.50th=[  635], 99.90th=[  775], 99.95th=[  889],
 | 99.99th=[ 1958]
   bw (  KiB/s): min= 5883, max=13817, per=79.66%, avg=8720.01, stdev=1895.05, samples=359
   iops        : min= 1470, max= 3454, avg=2179.63, stdev=473.78, samples=359
  lat (usec)   : 250=13.13%, 500=82.11%, 750=4.64%, 1000=0.09%
  lat (msec)   : 2=0.02%, 4=0.01%, 10=0.01%, 20=0.01%
  cpu  : usr=1.31%, sys=3.69%, ctx=493433, majf=0, minf=32
  IO depths: 1=115.9%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
>=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
 issued rwt: total=492640,0,0, short=0,0,0, dropped=0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=1
randread-4k-seq: (groupid=3, jobs=1): err= 0: pid=1450173: Fri May 24 
19:41:48 2019
   read: IOPS=1812, BW=7251KiB/s (7425kB/s)(1275MiB/180001msec)
slat (usec): min=6, max=161, avg=10.25, stdev= 6.37
clat (usec): min=182, max=23748, avg=538.35, stdev=136.71
 lat (usec): min=189, max=23758, avg=548.86, stdev=137.19
clat percentiles (usec):
 |  1.00th=[  265],  5.00th=[  310], 10.00th=[  351], 20.00th=[  445],
 | 30.00th=[  494], 40.00th=[  519], 50.00th=[  537], 60.00th=[  562],
 | 70.00th=[  594], 80.00th=[  644], 90.00th=[  701], 95.00th=[  742],
 | 99.00th=[  816], 99.50th=[  840], 99.90th=[  914], 99.95th=[ 1172],
 | 99.99th=[ 2442]
   bw (  KiB/s): min= 4643, 

Re: [ceph-users] rbd.ReadOnlyImage: [errno 30]

2019-06-05 Thread CUZA Frédéric
Thank you all for your quick answers.
I think that will solve our problem.

This is what we came up with:
rbd -c /etc/ceph/Oceph.conf --keyring /etc/ceph/Oceph.client.admin.keyring 
export rbd/disk_test - | rbd -c /etc/ceph/Nceph.conf --keyring 
/etc/ceph/Nceph.client.admin.keyring import - rbd/disk_test

This RBD image is a test image with only 5 GB of data in it.

Unfortunately, the command seems to be stuck and nothing happens on ports
7800 / 6789 / 22.

We can't find any logs on the monitors.

Thanks !

-Original Message-
From: ceph-users  On behalf of Jason Dillaman
Sent: 04 June 2019 14:14
To: 解决 
Cc: ceph-users 
Subject: Re: [ceph-users] rbd.ReadOnlyImage: [errno 30]

On Tue, Jun 4, 2019 at 4:55 AM 解决  wrote:
>
> Hi all,
> We use Ceph (Luminous) + OpenStack (Queens) in my test
> environment. The virtual machine does not start properly after the
> disaster test, and we cannot create a snapshot of the virtual machine's
> image. The procedure is as follows:
> #!/usr/bin/env python
>
> import rados
> import rbd
> with rados.Rados(conffile='/etc/ceph/ceph.conf',rados_id='nova') as cluster:
> with cluster.open_ioctx('vms') as ioctx:
> rbd_inst = rbd.RBD()
> print "start open rbd image"
> with rbd.Image(ioctx, '10df4634-4401-45ca-9c57-f349b78da475_disk') as 
> image:
> print "start create snapshot"
> image.create_snap('myimage_snap1')
>
> When I run it, it shows ReadOnlyImage, as follows:
>
> start open rbd image
> start create snapshot
> Traceback (most recent call last):
>   File "testpool.py", line 17, in 
> image.create_snap('myimage_snap1')
>   File "rbd.pyx", line 1790, in rbd.Image.create_snap 
> (/builddir/build/BUILD/ceph-12.2.5/build/src/pybind/rbd/pyrex/rbd.c:15
> 682)
> rbd.ReadOnlyImage: [errno 30] error creating snapshot myimage_snap1 
> from 10df4634-4401-45ca-9c57-f349b78da475_disk
>
> But when I run it with admin instead of nova, it is OK.
>
> "ceph auth list"  as follow
>
> installed auth entries:
>
> osd.1
> key: AQBL7uRcfuyxEBAAoK8JrQWMU6EEf/g83zKJjg==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.10
> key: AQCV7uRcdsB9IBAAHbHHCaylVUZIPKFX20polQ==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.11
> key: AQCW7uRcRIMRIhAAbXfLbQwijEO5ZQFWFZaO5w==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.2
> key: AQBL7uRcfFMWDBAAo7kjQobGBbIHYfZkx45pOw==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.4
> key: AQBk7uRc97CPOBAAK9IBJICvchZPc5p80bISsg==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.5
> key: AQBk7uRcOdqaORAAkQeEtYsE6rLWLPhYuCTdHA==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.7
> key: AQB97uRc+1eRJxAA34DImQIMFjzHSXZ25djp0Q==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> osd.8
> key: AQB97uRcFilBJhAAXzSzNJsgwpobC8654Xo7Sw==
> caps: [mon] allow profile osd
> caps: [osd] allow *
> client.admin
> key: AQAU7uRcNia+BBAA09mOYdX+yJWbLCjcuMih0A==
> auid: 0
> caps: [mds] allow
> caps: [mgr] allow *
> caps: [mon] allow *
> caps: [osd] allow *
> client.cinder
> key: AQBp7+RcOzPHGxAA7azgyayVu2RRNWJ7JxSJEg==
> caps: [mon] allow r
> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
> pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow 
> rwx pool=vms-cache, allow rx pool=images, allow rx pool=images-cache 
> client.cinder-backup
> key: AQBq7+RcVOwGNRAAiwJ59ZvAUc0H4QkVeN82vA==
> caps: [mon] allow r
> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
> pool=backups, allow rwx pool=backups-cache client.glance
> key: AQDf7uRc32hDBBAAkGucQEVTWqnIpNvihXf/Ng==
> caps: [mon] allow r
> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
> pool=images, allow rwx pool=images-cache client.nova
> key: AQDN7+RcqDABIxAAXnFcVjBp/S5GkgOy0wqB1Q==
> caps: [mon] allow r
> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
> pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow 
> rwx pool=vms-cache, allow rwx pool=images, allow rwx pool=images-cache 
> client.radosgw.gateway
> key: AQAU7uRccP06CBAA6zLFtDQoTstl8CNclYRugQ==
> auid: 0
> caps: [mon] allow rwx
> caps: [osd] allow rwx
> mgr.172.30.126.26
> key: AQAr7uRclc52MhAA+GWCQEVnAHB01tMFpgJtTQ==
> caps: [mds] allow *
> caps: [mon] allow profile mgr
> caps: [osd] allow *
> mgr.172.30.126.27
> key: AQAs7uRclkD2OBAAW/cUhcZEebZnQulqVodiXQ==
> caps: [mds] allow *
> caps: [mon] allow profile mgr
> caps: [osd] allow *
> mgr.172.30.126.28
> key: AQAu7uRcT9OLBBAAZbEjb/N1NnZpIgfaAcThyQ==
> caps: [mds] allow *
> caps: [mon] allow profile mgr
> caps: [osd] allow *
>
>
> Can someone explain it to me?

Your clients don't have the correct caps. See [1] or [2].
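
For example, something along these lines (a sketch based on the 'profile rbd'
caps described in those notes; the pool list is taken from the caps quoted
above and may need adjusting):

ceph auth caps client.nova \
    mon 'profile rbd' \
    osd 'profile rbd pool=volumes, profile rbd pool=volumes-cache, profile rbd pool=vms, profile rbd pool=vms-cache, profile rbd pool=images, profile rbd pool=images-cache'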


> thanks!!
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[1] 
http://docs.ceph.com/docs/mimic/releases/luminous/#upgrade-from-jewel-or-kraken
[2] 
http://docs.ceph

Re: [ceph-users] rbd.ReadOnlyImage: [errno 30]

2019-06-05 Thread Jason Dillaman
On Wed, Jun 5, 2019 at 11:26 AM CUZA Frédéric  wrote:
>
> Thank you all for you quick answer.
> I think that will solve our problem.

You might have hijacked another thread?

> This is what we came up with this :
> rbd -c /etc/ceph/Oceph.conf --keyring /etc/ceph/Oceph.client.admin.keyring 
> export rbd/disk_test - | rbd -c /etc/ceph/Nceph.conf --keyring 
> /etc/ceph/Nceph.client.admin.keyring import - rbd/disk_test
>
> This rbd image is a test with only 5Gb of datas inside of it.
>
> Unfortunately the command seems to be stuck and nothing happens, both ports 
> 7800 / 6789 / 22.
>
> We can't find no logs on any monitors.
>
> Thanks !
>
> -Original Message-
> From: ceph-users  On behalf of Jason
> Dillaman
> Sent: 04 June 2019 14:14
> To: 解决 
> Cc: ceph-users 
> Subject: Re: [ceph-users] rbd.ReadOnlyImage: [errno 30]
>
> On Tue, Jun 4, 2019 at 4:55 AM 解决  wrote:
> >
> > Hi all,
> > We use ceph(luminous) + openstack(queens) in my test
> > environment。The virtual machine does not start properly after the
> > disaster test and the image of virtual machine can not create snap.The
> > procedure is as follows:
> > #!/usr/bin/env python
> >
> > import rados
> > import rbd
> > with rados.Rados(conffile='/etc/ceph/ceph.conf',rados_id='nova') as cluster:
> > with cluster.open_ioctx('vms') as ioctx:
> > rbd_inst = rbd.RBD()
> > print "start open rbd image"
> > with rbd.Image(ioctx, '10df4634-4401-45ca-9c57-f349b78da475_disk') 
> > as image:
> > print "start create snapshot"
> > image.create_snap('myimage_snap1')
> >
> > when i run it ,it show readonlyimage,as follows:
> >
> > start open rbd image
> > start create snapshot
> > Traceback (most recent call last):
> >   File "testpool.py", line 17, in 
> > image.create_snap('myimage_snap1')
> >   File "rbd.pyx", line 1790, in rbd.Image.create_snap
> > (/builddir/build/BUILD/ceph-12.2.5/build/src/pybind/rbd/pyrex/rbd.c:15
> > 682)
> > rbd.ReadOnlyImage: [errno 30] error creating snapshot myimage_snap1
> > from 10df4634-4401-45ca-9c57-f349b78da475_disk
> >
> > but i run it with admin instead of nova,it is ok.
> >
> > "ceph auth list"  as follow
> >
> > installed auth entries:
> >
> > osd.1
> > key: AQBL7uRcfuyxEBAAoK8JrQWMU6EEf/g83zKJjg==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > osd.10
> > key: AQCV7uRcdsB9IBAAHbHHCaylVUZIPKFX20polQ==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > osd.11
> > key: AQCW7uRcRIMRIhAAbXfLbQwijEO5ZQFWFZaO5w==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > osd.2
> > key: AQBL7uRcfFMWDBAAo7kjQobGBbIHYfZkx45pOw==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > osd.4
> > key: AQBk7uRc97CPOBAAK9IBJICvchZPc5p80bISsg==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > osd.5
> > key: AQBk7uRcOdqaORAAkQeEtYsE6rLWLPhYuCTdHA==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > osd.7
> > key: AQB97uRc+1eRJxAA34DImQIMFjzHSXZ25djp0Q==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > osd.8
> > key: AQB97uRcFilBJhAAXzSzNJsgwpobC8654Xo7Sw==
> > caps: [mon] allow profile osd
> > caps: [osd] allow *
> > client.admin
> > key: AQAU7uRcNia+BBAA09mOYdX+yJWbLCjcuMih0A==
> > auid: 0
> > caps: [mds] allow
> > caps: [mgr] allow *
> > caps: [mon] allow *
> > caps: [osd] allow *
> > client.cinder
> > key: AQBp7+RcOzPHGxAA7azgyayVu2RRNWJ7JxSJEg==
> > caps: [mon] allow r
> > caps: [osd] allow class-read object_prefix rbd_children, allow rwx
> > pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow
> > rwx pool=vms-cache, allow rx pool=images, allow rx pool=images-cache
> > client.cinder-backup
> > key: AQBq7+RcVOwGNRAAiwJ59ZvAUc0H4QkVeN82vA==
> > caps: [mon] allow r
> > caps: [osd] allow class-read object_prefix rbd_children, allow rwx
> > pool=backups, allow rwx pool=backups-cache client.glance
> > key: AQDf7uRc32hDBBAAkGucQEVTWqnIpNvihXf/Ng==
> > caps: [mon] allow r
> > caps: [osd] allow class-read object_prefix rbd_children, allow rwx
> > pool=images, allow rwx pool=images-cache client.nova
> > key: AQDN7+RcqDABIxAAXnFcVjBp/S5GkgOy0wqB1Q==
> > caps: [mon] allow r
> > caps: [osd] allow class-read object_prefix rbd_children, allow rwx
> > pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow
> > rwx pool=vms-cache, allow rwx pool=images, allow rwx pool=images-cache
> > client.radosgw.gateway
> > key: AQAU7uRccP06CBAA6zLFtDQoTstl8CNclYRugQ==
> > auid: 0
> > caps: [mon] allow rwx
> > caps: [osd] allow rwx
> > mgr.172.30.126.26
> > key: AQAr7uRclc52MhAA+GWCQEVnAHB01tMFpgJtTQ==
> > caps: [mds] allow *
> > caps: [mon] allow profile mgr
> > caps: [osd] allow *
> > mgr.172.30.126.27
> > key: AQAs7uRclkD2OBAAW/cUhcZEebZnQulqVodiXQ==
> > caps: [mds] allow *
> > caps: [mon] allow profile mgr
> > caps: [osd] allow *
> > mgr.172.30.126.28
> > key: AQAu7uRcT9OLBBAAZbEjb/N1NnZpIgfaAcThyQ==
> > caps: [mds] allow *
> > caps: [mon] allow profile mgr
> > caps: [osd] allow *
> >
> >
> > Can s

Re: [ceph-users] Multiple rbd images from different clusters

2019-06-05 Thread CUZA Frédéric
Hi,

Thank you all for your quick answers.
I think that will solve our problem.

This is what we came up with:
rbd -c /etc/ceph/Oceph.conf --keyring /etc/ceph/Oceph.client.admin.keyring 
export rbd/disk_test - | rbd -c /etc/ceph/Nceph.conf --keyring 
/etc/ceph/Nceph.client.admin.keyring import - rbd/disk_test

This RBD image is a test image with only 5 GB of data in it.

Unfortunately, the command seems to be stuck and nothing happens on ports
7800 / 6789 / 22.

We can't find any logs on the monitors.

Thanks !

-Original Message-
From: ceph-users  On behalf of Jason Dillaman
Sent: 04 June 2019 14:11
To: Burkhard Linke 
Cc: ceph-users 
Subject: Re: [ceph-users] Multiple rbd images from different clusters

On Tue, Jun 4, 2019 at 8:07 AM Jason Dillaman  wrote:
>
> On Tue, Jun 4, 2019 at 4:45 AM Burkhard Linke 
>  wrote:
> >
> > Hi,
> >
> > On 6/4/19 10:12 AM, CUZA Frédéric wrote:
> >
> > Hi everyone,
> >
> >
> >
> > We want to migrate data from one cluster (Hammer) to a new one (Mimic). We
> > do not wish to upgrade the existing cluster, as all the hardware is EOS and
> > we are upgrading the configuration of the servers.
> >
> > We can’t find a “proper” way to mount two rbd images from two different
> > clusters on the same host.
> >
> > Does anyone know what the “good” procedure is to achieve this?
>
> Copy your "/etc/ceph/ceph.conf" and associated keyrings for both 
> clusters to a single machine (preferably running a Mimic "rbd" client) 
> under "/etc/ceph/.conf" and 
> "/etc/ceph/.client..keyring".
>
> You can then use "rbd -c  export --export-format 2 
>  - | rbd -c  import --export-format=2 - 
> ". The "--export-format=2" option will also copy all 
> associated snapshots with the images. If you don't want/need the 
> snapshots, just drop that optional.

That "-c" should be "--cluster" if specifying by name, otherwise with "-c" it's 
the full path to the two different conf files.
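
Put together, the named-cluster variant looks roughly like this (a sketch;
'old' and 'new' are placeholder cluster names whose conf/keyring files live
under /etc/ceph):

rbd --cluster old export --export-format 2 rbd/disk_test - \
    | rbd --cluster new import --export-format 2 - rbd/disk_test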

> >
> > Just my 2 ct:
> >
> > the 'rbd' commands allows specifying a configuration file (-c). You need to 
> > setup two configuration files, one for each cluster. You can also use two 
> > different cluster names (--cluster option). AFAIK the name is only used to 
> > locate the configuration file. I'm not sure how well the kernel works with 
> > mapping RBDs from two different cluster.
> >
> >
> > If you only want to transfer RBDs from one cluster to another, you do not 
> > need to map and mount them; the 'rbd' command has the sub commands 'export' 
> > and 'import'. You can pipe them to avoid writing data to a local disk. This 
> > should be the fastest way to transfer the RBDs.
> >
> >
> > Regards,
> >
> > Burkhard
> >
> > --
> > Dr. rer. nat. Burkhard Linke
> > Bioinformatics and Systems Biology
> > Justus-Liebig-University Giessen
> > 35392 Giessen, Germany
> > Phone: (+49) (0)641 9935810
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Jason



--
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple rbd images from different clusters

2019-06-05 Thread Jason Dillaman
On Wed, Jun 5, 2019 at 11:31 AM CUZA Frédéric  wrote:
>
> Hi,
>
> Thank you all for you quick answer.
> I think that will solve our problem.
>
> This is what we came up with this :
> rbd -c /etc/ceph/Oceph.conf --keyring /etc/ceph/Oceph.client.admin.keyring 
> export rbd/disk_test - | rbd -c /etc/ceph/Nceph.conf --keyring 
> /etc/ceph/Nceph.client.admin.keyring import - rbd/disk_test
>
> This rbd image is a test with only 5Gb of datas inside of it.
>
> Unfortunately the command seems to be stuck and nothing happens, both ports 
> 7800 / 6789 / 22.
>
> We can't find no logs on any monitors.

Try running "rbd -c /path/to/conf --keyring /path/to/keyring ls" or
"ceph -c /path/to/conf --keyring /path/to/keyring health" just to test
connectivity first.

> Thanks !
>
> -Original Message-
> From: ceph-users  On behalf of Jason
> Dillaman
> Sent: 04 June 2019 14:11
> To: Burkhard Linke 
> Cc: ceph-users 
> Subject: Re: [ceph-users] Multiple rbd images from different clusters
>
> On Tue, Jun 4, 2019 at 8:07 AM Jason Dillaman  wrote:
> >
> > On Tue, Jun 4, 2019 at 4:45 AM Burkhard Linke
> >  wrote:
> > >
> > > Hi,
> > >
> > > On 6/4/19 10:12 AM, CUZA Frédéric wrote:
> > >
> > > Hi everyone,
> > >
> > >
> > >
> > > We want to migrate datas from one cluster (Hammer) to a new one (Mimic). 
> > > We do not wish to upgrade the actual cluster as all the hardware is EOS 
> > > and we upgrade the configuration of the servers.
> > >
> > > We can’t find a “proper” way to mount two rbd images from two different 
> > > cluster on the same host.
> > >
> > > Does anyone know what is the “good” procedure to achieve this ?
> >
> > Copy your "/etc/ceph/ceph.conf" and associated keyrings for both
> > clusters to a single machine (preferably running a Mimic "rbd" client)
> > under "/etc/ceph/.conf" and
> > "/etc/ceph/.client..keyring".
> >
> > You can then use "rbd -c  export --export-format 2
> >  - | rbd -c  import --export-format=2 -
> > ". The "--export-format=2" option will also copy all
> > associated snapshots with the images. If you don't want/need the
> > snapshots, just drop that optional.
>
> That "-c" should be "--cluster" if specifying by name, otherwise with "-c" 
> it's the full path to the two different conf files.
>
> > >
> > > Just my 2 ct:
> > >
> > > the 'rbd' commands allows specifying a configuration file (-c). You need 
> > > to setup two configuration files, one for each cluster. You can also use 
> > > two different cluster names (--cluster option). AFAIK the name is only 
> > > used to locate the configuration file. I'm not sure how well the kernel 
> > > works with mapping RBDs from two different cluster.
> > >
> > >
> > > If you only want to transfer RBDs from one cluster to another, you do not 
> > > need to map and mount them; the 'rbd' command has the sub commands 
> > > 'export' and 'import'. You can pipe them to avoid writing data to a local 
> > > disk. This should be the fastest way to transfer the RBDs.
> > >
> > >
> > > Regards,
> > >
> > > Burkhard
> > >
> > > --
> > > Dr. rer. nat. Burkhard Linke
> > > Bioinformatics and Systems Biology
> > > Justus-Liebig-University Giessen
> > > 35392 Giessen, Germany
> > > Phone: (+49) (0)641 9935810
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> > --
> > Jason
>
>
>
> --
> Jason
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Single threaded IOPS on SSD pool.

2019-06-05 Thread Eneko Lacunza

Hi,

On 5/6/19 at 16:53, vita...@yourcmc.ru wrote:

Ok, average network latency from VM to OSD's ~0.4ms.


It's rather bad, you can improve the latency by 0.3ms just by 
upgrading the network.



Single threaded performance ~500-600 IOPS - or average latency of 1.6ms
Is that comparable to what other are seeing?


Good "reference" numbers are 0.5ms for reads (~2000 iops) and 1ms for 
writes (~1000 iops).


I confirm that the most powerful thing to do is disabling CPU 
powersave (governor=performance + cpupower -D 0). You usually get 2x 
single thread iops at once.


We have a small cluster with 4 OSD hosts, each with 1 SSD INTEL 
SSDSC2KB019T8 (D3-S4510 1.8T), connected with a 10G network (shared with 
VMs, not a busy cluster). Volumes are replica 3:


Network latency from one node to the other 3:
10 packets transmitted, 10 received, 0% packet loss, time 9166ms
rtt min/avg/max/mdev = 0.042/0.064/0.088/0.013 ms

10 packets transmitted, 10 received, 0% packet loss, time 9190ms
rtt min/avg/max/mdev = 0.047/0.072/0.110/0.017 ms

10 packets transmitted, 10 received, 0% packet loss, time 9219ms
rtt min/avg/max/mdev = 0.061/0.078/0.099/0.011 ms

Your fio test on a 4-core VM:

$ fio fio-job-randr.ini
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=1

fio-3.12
Starting 1 process
test: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=10.3MiB/s][r=2636 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=4056: Wed Jun  5 17:14:33 2019
  Description  : [fio random 4k reads]
  read: IOPS=2386, BW=9544KiB/s (9773kB/s)(559MiB/60001msec)
    slat (nsec): min=0, max=616576, avg=10847.27, stdev=3253.55
    clat (nsec): min=0, max=10346k, avg=406536.60, stdev=145643.92
 lat (nsec): min=0, max=10354k, avg=417653.11, stdev=145740.26
    clat percentiles (usec):
 |  1.00th=[   37],  5.00th=[  202], 10.00th=[  258], 20.00th=[ 318],
 | 30.00th=[  351], 40.00th=[  383], 50.00th=[  416], 60.00th=[ 445],
 | 70.00th=[  474], 80.00th=[  502], 90.00th=[  545], 95.00th=[ 578],
 | 99.00th=[  701], 99.50th=[  742], 99.90th=[ 1004], 99.95th=[ 1500],
 | 99.99th=[ 3752]
   bw (  KiB/s): min=    0, max=10640, per=100.00%, avg=9544.13, 
stdev=486.02, samples=120
   iops    : min=    0, max= 2660, avg=2386.03, stdev=121.50, 
samples=120

  lat (usec)   : 2=0.01%, 50=2.94%, 100=0.17%, 250=6.20%, 500=70.34%
  lat (usec)   : 750=19.92%, 1000=0.33%
  lat (msec)   : 2=0.07%, 4=0.03%, 10=0.01%, 20=0.01%
  cpu  : usr=1.01%, sys=3.44%, ctx=143387, majf=0, minf=16
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
>=64=0.0%
 submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%

 issued rwts: total=143163,0,0,0 short=0,0,0,0 dropped=0,0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=9544KiB/s (9773kB/s), 9544KiB/s-9544KiB/s 
(9773kB/s-9773kB/s), io=559MiB (586MB), run=60001-60001msec


Disk stats (read/write):
    dm-0: ios=154244/120, merge=0/0, ticks=63120/12, in_queue=63128, 
util=96.98%, aggrios=154244/58, aggrmerge=0/62, aggrticks=63401/40, 
aggrin_queue=62800, aggrutil=96.42%
  sda: ios=154244/58, merge=0/62, ticks=63401/40, in_queue=62800, 
util=96.42%



So if I read correctly, about 2500 read IOPS. I see governor=performance
(out of the box on Proxmox VE, I think). We haven't touched cpupower, at
least not beyond what our distribution (Proxmox VE) does.
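
For completeness, the read job above is roughly the following (reconstructed
from the output; the real fio-job-randr.ini may differ slightly):

cat > fio-job-randr.ini <<'EOF'
[test]
description=fio random 4k reads
rw=randread
bs=4k
ioengine=libaio
iodepth=1
direct=1
size=1024m
runtime=60
time_based=1
EOF
fio fio-job-randr.ini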


For reference, the same test with random write (KVM disk cache is 
write-back):


$ fio fio-job-randw.ini
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=1

fio-3.12
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=35.5MiB/s][w=9077 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=4278: Wed Jun  5 17:35:51 2019
  Description  : [fio random 4k writes]
  write: IOPS=9809, BW=38.3MiB/s (40.2MB/s)(2299MiB/60001msec); 0 zone 
resets

    slat (nsec): min=0, max=856527, avg=13669.16, stdev=5257.21
    clat (nsec): min=0, max=256305k, avg=86123.12, stdev=913448.71
 lat (nsec): min=0, max=256328k, avg=100145.33, stdev=913512.45
    clat percentiles (usec):
 |  1.00th=[   37],  5.00th=[   41], 10.00th=[   46], 20.00th=[   54],
 | 30.00th=[   60], 40.00th=[   65], 50.00th=[   71], 60.00th=[   78],
 | 70.00th=[   86], 80.00th=[   96], 90.00th=[  119], 95.00th=[ 151],
 | 99.00th=[  251], 99.50th=[  297], 99.90th=[  586], 99.95th=[ 857],
 | 99.99th=[ 4490]
   bw (  KiB/s): min=    0, max=52392, per=100.00%, avg=39243.27, 
stdev=3553.88, samples=119
   iops    : min=    0, max=13098, avg=9810.81, stdev=888.47, 
samples=119

  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.02%, 4=0.01%, 10=0.01%, 20=0.01%, 50=15.44%
  lat (usec)   : 100=67.16%, 250=16.36%, 500=0.90%, 750=0.06%, 1000=0.03%
  lat (msec

[ceph-users] Changing the release cadence

2019-06-05 Thread Sage Weil
Hi everyone,

Since luminous, we have had the follow release cadence and policy:   
 - release every 9 months
 - maintain backports for the last two releases
 - enable upgrades to move either 1 or 2 releases heads
   (e.g., luminous -> mimic or nautilus; mimic -> nautilus or octopus; ...)

This has mostly worked out well, except that the mimic release received 
less attention than we wanted due to the fact that multiple downstream 
Ceph products (from Red Hat and SUSE) decided to base their next release 
on nautilus.  Even though upstream every release is an "LTS" release, as a 
practical matter mimic got less attention than luminous or nautilus.

We've had several requests/proposals to shift to a 12 month cadence. This 
has several advantages:

 - Stable/conservative clusters only have to be upgraded every 2 years
   (instead of every 18 months)
 - Yearly releases are more likely to intersect with downstream
   distribution release (e.g., Debian).  In the past there have been 
   problems where the Ceph releases included in consecutive releases of a 
   distro weren't easily upgradeable.
 - Vendors that make downstream Ceph distributions/products tend to
   release yearly.  Aligning with those vendors means they are more likely 
   to productize *every* Ceph release.  This will help make every Ceph 
   release an "LTS" release (not just in name but also in terms of 
   maintenance attention).

So far the balance of opinion seems to favor a shift to a 12 month 
cycle[1], especially among developers, so it seems pretty likely we'll 
make that shift.  (If you do have strong concerns about such a move, now 
is the time to raise them.)

That brings us to an important decision: what time of year should we 
release?  Once we pick the timing, we'll be releasing at that time *every 
year* for each release (barring another schedule shift, which we want to 
avoid), so let's choose carefully!

A few options:

 - November: If we release Octopus 9 months from the Nautilus release
   (planned for Feb, released in Mar) then we'd target this November.  We 
   could shift to a 12 month cadence after that.
 - February: That's 12 months from the Nautilus target.
 - March: That's 12 months from when Nautilus was *actually* released.

November is nice in the sense that we'd wrap things up before the 
holidays.  It's less good in that users may not be inclined to install the 
new release when many developers will be less available in December.

February kind of sucked in that the scramble to get the last few things
done happened during the holidays.  OTOH, we should be doing what we can
to avoid such scrambles, so that might not be something we should factor
in.  March may be a bit more balanced, with a solid 3 months before when
people are productive, and 3 months after before they disappear on holiday
to address any post-release issues.

People tend to be somewhat less available over the summer months due to
holidays etc, so an early or late summer release might also be less than
ideal.

Thoughts?  If we can narrow it down to a few options maybe we could do a
poll to gauge user preferences.

Thanks!
sage


[1] https://twitter.com/larsmb/status/1130010208971952129

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to fix ceph MDS HEALTH_WARN

2019-06-05 Thread Jorge Garcia
We have been testing a new installation of ceph (mimic 13.2.2) mostly 
using cephfs (for now). The current test is just setting up a filesystem 
for backups of our other filesystems. After rsyncing data for a few 
days, we started getting this from ceph -s:


health: HEALTH_WARN
    1 MDSs report slow metadata IOs
    1 MDSs behind on trimming

I have been googling for solutions and reading the docs and the 
ceph-users list, but I haven't found a way to get rid of these messages 
and get back to HEALTH_OK. Some of the things I have tried (from 
suggestions around the internet):


- Increasing the amount of RAM on the MDS server (Currently 192 GB)
- Increasing mds_log_max_segments (Currently 256)
- Increasing mds_cache_memory_limit

The message still reports a HEALTH_WARN. Currently, the filesystem is 
idle, no I/O happening. Not sure what to try next. Any suggestions?
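
For reference, the knobs listed above can be checked and adjusted at runtime
roughly like this (a sketch; '<name>' is the MDS daemon name and the values
are only examples):

# check current values on the running MDS
ceph daemon mds.<name> config get mds_log_max_segments
ceph daemon mds.<name> config get mds_cache_memory_limit
# raise them at runtime (256 segments, 16 GiB cache)
ceph tell mds.<name> injectargs '--mds_log_max_segments=256'
ceph tell mds.<name> injectargs '--mds_cache_memory_limit=17179869184'
# watch whether the journal starts trimming again
ceph daemon mds.<name> perf dump mds_log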


Thanks in advance!

Jorge

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple rbd images from different clusters

2019-06-05 Thread Jordan Share
One thing to keep in mind when pipelining rbd export/import is that the 
default is just a raw image dump.


So if you have a large, but not very full, RBD, you will dump all those 
zeroes into the pipeline.


In our case, it was actually faster to write to a (sparse) temp file and 
read it in again afterwards than to pipeline.


However, we are not using --export-format 2, which I now suspect would 
mitigate this.
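
Roughly, the temp-file variant described above looks like this (a sketch; the
cluster names and image spec are placeholders):

# export writes a sparse file, so the unallocated zeroes cost little space
rbd --cluster old export rbd/disk_test /tmp/disk_test.raw
rbd --cluster new import /tmp/disk_test.raw rbd/disk_test
rm /tmp/disk_test.raw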


Jordan


On 6/5/2019 8:30 AM, CUZA Frédéric wrote:

Hi,

Thank you all for you quick answer.
I think that will solve our problem.

This is what we came up with this :
rbd -c /etc/ceph/Oceph.conf --keyring /etc/ceph/Oceph.client.admin.keyring 
export rbd/disk_test - | rbd -c /etc/ceph/Nceph.conf --keyring 
/etc/ceph/Nceph.client.admin.keyring import - rbd/disk_test

This rbd image is a test with only 5Gb of datas inside of it.

Unfortunately the command seems to be stuck and nothing happens, both ports 
7800 / 6789 / 22.

We can't find no logs on any monitors.

Thanks !

-Message d'origine-
De : ceph-users  De la part de Jason Dillaman
Envoyé : 04 June 2019 14:11
À : Burkhard Linke 
Cc : ceph-users 
Objet : Re: [ceph-users] Multiple rbd images from different clusters

On Tue, Jun 4, 2019 at 8:07 AM Jason Dillaman  wrote:


On Tue, Jun 4, 2019 at 4:45 AM Burkhard Linke
 wrote:


Hi,

On 6/4/19 10:12 AM, CUZA Frédéric wrote:

Hi everyone,



We want to migrate datas from one cluster (Hammer) to a new one (Mimic). We do 
not wish to upgrade the actual cluster as all the hardware is EOS and we 
upgrade the configuration of the servers.

We can’t find a “proper” way to mount two rbd images from two different cluster 
on the same host.

Does anyone know what is the “good” procedure to achieve this ?


Copy your "/etc/ceph/ceph.conf" and associated keyrings for both
clusters to a single machine (preferably running a Mimic "rbd" client)
under "/etc/ceph/.conf" and
"/etc/ceph/.client..keyring".

You can then use "rbd -c  export --export-format 2
 - | rbd -c  import --export-format=2 -
". The "--export-format=2" option will also copy all
associated snapshots with the images. If you don't want/need the
snapshots, just drop that optional.


That "-c" should be "--cluster" if specifying by name, otherwise with "-c" it's 
the full path to the two different conf files.



Just my 2 ct:

the 'rbd' commands allows specifying a configuration file (-c). You need to 
setup two configuration files, one for each cluster. You can also use two 
different cluster names (--cluster option). AFAIK the name is only used to 
locate the configuration file. I'm not sure how well the kernel works with 
mapping RBDs from two different cluster.


If you only want to transfer RBDs from one cluster to another, you do not need 
to map and mount them; the 'rbd' command has the sub commands 'export' and 
'import'. You can pipe them to avoid writing data to a local disk. This should 
be the fastest way to transfer the RBDs.


Regards,

Burkhard

--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Jason




--
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to remove ceph-mgr from a node

2019-06-05 Thread Vandeir Eduardo
I am trying to resolve some kind of inconsistency.

My ceph -s:
  services:
mon: 1 daemons, quorum cephback2 (age 22h)
mgr: cephback2(active, since 28m), standbys: cephback1
osd: 6 osds: 6 up (since 22h), 6 in (since 24h); 125 remapped pgs

But when I do

ceph mgr module enable dashboard

It starts ceph-mgr listening on port 8443 on cephback1, instead of cephback2.

See:
root@cephback1:/etc/ceph# lsof -i -P -n|grep ceph-mgr|grep LISTEN
ceph-mgr  6832ceph   27u  IPv6  54536  0t0  TCP *:8443 (LISTEN)

root@cephback2:/etc/ceph# lsof -i -P -n|grep ceph-mgr|grep LISTEN
ceph-mgr  78871ceph   25u  IPv4 939321  0t0  TCP *:6812 (LISTEN)
ceph-mgr  78871ceph   26u  IPv4 939335  0t0  TCP *:6813 (LISTEN)

Shouldn't ceph-mgr, listening on port 8443, be started on cephback2,
the active one?

Output of 'ceph mgr services'
root@cephback1:/etc/ceph# ceph mgr services
{
"dashboard": "https://cephback2.xxx.xx:8443/";
}

If I try to access https://cephback1.xxx.xx:8443, it redirects the browser to
https://cephback2.xxx.xx:8443, which obviously doesn't work.

Seems like there is some kind of inconsistency between the active
ceph-mgr node and where the dashboard is to be started...
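
A couple of things worth checking/trying here (a sketch; the hostnames are the
ones from the output above and the bind address is only an example):

# see whether the dashboard module is listed as enabled on the active mgr
ceph mgr module ls
# pin the dashboard bind address/port explicitly (Mimic/Nautilus config keys)
ceph config set mgr mgr/dashboard/server_addr 0.0.0.0
ceph config set mgr mgr/dashboard/server_port 8443
# restart the active mgr so the module picks the settings up
systemctl restart ceph-mgr@cephback2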

On Wed, Jun 5, 2019 at 11:47 AM Marc Roos  wrote:
>
>
> What is wrong with?
>
> service ceph-mgr@c stop
> systemctl disable ceph-mgr@c
>
>
> -Original Message-
> From: Vandeir Eduardo [mailto:vandeir.edua...@gmail.com]
> Sent: Wednesday, 5 June 2019 16:44
> To: ceph-users
> Subject: [ceph-users] How to remove ceph-mgr from a node
>
> Hi guys,
>
> sorry, but I'm not finding in documentation how to remove ceph-mgr from
> a node. Is it possible?
>
> Thanks.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG scrub stamps reset to 0.000000 in 14.2.1

2019-06-05 Thread Jonas Jelten
Hi!

I'm also affected by this:

HEALTH_WARN 13 pgs not deep-scrubbed in time; 13 pgs not scrubbed in time
PG_NOT_DEEP_SCRUBBED 13 pgs not deep-scrubbed in time
pg 6.b1 not deep-scrubbed since 0.00
pg 7.ac not deep-scrubbed since 0.00
pg 7.a0 not deep-scrubbed since 0.00
pg 6.96 not deep-scrubbed since 0.00
pg 7.92 not deep-scrubbed since 0.00
pg 6.86 not deep-scrubbed since 0.00
pg 7.74 not deep-scrubbed since 0.00
pg 7.75 not deep-scrubbed since 0.00
pg 7.49 not deep-scrubbed since 0.00
pg 7.47 not deep-scrubbed since 0.00
pg 6.2a not deep-scrubbed since 0.00
pg 6.26 not deep-scrubbed since 0.00
pg 6.b not deep-scrubbed since 0.00
PG_NOT_SCRUBBED 13 pgs not scrubbed in time
pg 6.b1 not scrubbed since 0.00
pg 7.ac not scrubbed since 0.00
pg 7.a0 not scrubbed since 0.00
pg 6.96 not scrubbed since 0.00
pg 7.92 not scrubbed since 0.00
pg 6.86 not scrubbed since 0.00
pg 7.74 not scrubbed since 0.00
pg 7.75 not scrubbed since 0.00
pg 7.49 not scrubbed since 0.00
pg 7.47 not scrubbed since 0.00
pg 6.2a not scrubbed since 0.00
pg 6.26 not scrubbed since 0.00
pg 6.b not scrubbed since 0.00


A week ago this status was:


HEALTH_WARN 6 pgs not deep-scrubbed in time; 6 pgs not scrubbed in time
PG_NOT_DEEP_SCRUBBED 6 pgs not deep-scrubbed in time
pg 7.b1 not deep-scrubbed since 0.00
pg 7.7e not deep-scrubbed since 0.00
pg 6.6e not deep-scrubbed since 0.00
pg 7.8 not deep-scrubbed since 0.00
pg 7.40 not deep-scrubbed since 0.00
pg 6.f5 not deep-scrubbed since 0.00
PG_NOT_SCRUBBED 6 pgs not scrubbed in time
pg 7.b1 not scrubbed since 0.00
pg 7.7e not scrubbed since 0.00
pg 6.6e not scrubbed since 0.00
pg 7.8 not scrubbed since 0.00
pg 7.40 not scrubbed since 0.00
pg 6.f5 not scrubbed since 0.00


Is this a known problem already? I can't find a bug report.


Cheers

-- Jonas



On 16/05/2019 01.13, Brett Chancellor wrote:
> After upgrading from 14.2.0 to 14.2.1, I've noticed PGs are frequently 
> resetting their scrub and deep scrub time stamps
> to 0.00.  It's extra strange because the peers show timestamps for deep 
> scrubs.
> 
> ## First entry from a pg list at 7pm
> $ grep 11.2f2 ~/pgs-active.7pm 
> 11.2f2     691        0         0       0 2897477632           0          0 
> 2091 active+clean    3h  7378'12291 
>  8048:36261    [1,6,37]p1    [1,6,37]p1 2019-05-14 21:01:29.172460 2019-05-14 
> 21:01:29.172460 
> 
> ## Next Entry 3 minutes later
> $ ceph pg ls active |grep 11.2f2
> 11.2f2     695        0         0       0 2914713600           0          0 
> 2091 active+clean    6s  7378'12291 
>  8049:36330    [1,6,37]p1    [1,6,37]p1                   0.00            
>        0.00 
> 
> ## PG Query
> {
>     "state": "active+clean",
>     "snap_trimq": "[]",
>     "snap_trimq_len": 0,
>     "epoch": 8049,
>     "up": [
>         1,
>         6,
>         37
>     ],
>     "acting": [
>         1,
>         6,
>         37
>     ],
>     "acting_recovery_backfill": [
>         "1",
>         "6",
>         "37"
>     ],
>     "info": {
>         "pgid": "11.2f2",
>         "last_update": "7378'12291",
>         "last_complete": "7378'12291",
>         "log_tail": "1087'10200",
>         "last_user_version": 12291,
>         "last_backfill": "MAX",
>         "last_backfill_bitwise": 1,
>         "purged_snaps": [],
>         "history": {
>             "epoch_created": 1549,
>             "epoch_pool_created": 216,
>             "last_epoch_started": 6148,
>             "last_interval_started": 6147,
>             "last_epoch_clean": 6148,
>             "last_interval_clean": 6147,
>             "last_epoch_split": 6147,
>             "last_epoch_marked_full": 0,
>             "same_up_since": 6126,
>             "same_interval_since": 6147,
>             "same_primary_since": 6126,
>             "last_scrub": "7378'12291",
>             "last_scrub_stamp": "0.00",
>             "last_deep_scrub": "6103'12186",
>             "last_deep_scrub_stamp": "0.00",
>             "last_clean_scrub_stamp": "2019-05-15 23:08:17.014575"
>         },
>         "stats": {
>             "version": "7378'12291",
>             "reported_seq": "36700",
>             "reported_epoch": "8049",
>             "state": "active+clean",
>             "last_fresh": "2019-05-15 23:08:17.014609",
>             "last_change": "2019-05-15 23:08:17.014609",
>             "last_active": "2019-05-15 23:08:17.014609",
>             "last_peered": "2019-05-15 23:08:17.014609",
>             "last_clean": "2019-05-15 23:08:17.014609",
>             "last_became_active": "2019-05-15 19:25:01.484322",
>             "last_became_peered": "2019-05-15 19:25:01.484322",
>             "last_unstale": "2019-05-15 23:08:17.014609",
>      

Re: [ceph-users] PG scrub stamps reset to 0.000000 in 14.2.1

2019-06-05 Thread Gregory Farnum
On Wed, Jun 5, 2019 at 10:10 AM Jonas Jelten  wrote:
>
> Hi!
>
> I'm also affected by this:
>
> HEALTH_WARN 13 pgs not deep-scrubbed in time; 13 pgs not scrubbed in time
> PG_NOT_DEEP_SCRUBBED 13 pgs not deep-scrubbed in time
> pg 6.b1 not deep-scrubbed since 0.00
> pg 7.ac not deep-scrubbed since 0.00
> pg 7.a0 not deep-scrubbed since 0.00
> pg 6.96 not deep-scrubbed since 0.00
> pg 7.92 not deep-scrubbed since 0.00
> pg 6.86 not deep-scrubbed since 0.00
> pg 7.74 not deep-scrubbed since 0.00
> pg 7.75 not deep-scrubbed since 0.00
> pg 7.49 not deep-scrubbed since 0.00
> pg 7.47 not deep-scrubbed since 0.00
> pg 6.2a not deep-scrubbed since 0.00
> pg 6.26 not deep-scrubbed since 0.00
> pg 6.b not deep-scrubbed since 0.00
> PG_NOT_SCRUBBED 13 pgs not scrubbed in time
> pg 6.b1 not scrubbed since 0.00
> pg 7.ac not scrubbed since 0.00
> pg 7.a0 not scrubbed since 0.00
> pg 6.96 not scrubbed since 0.00
> pg 7.92 not scrubbed since 0.00
> pg 6.86 not scrubbed since 0.00
> pg 7.74 not scrubbed since 0.00
> pg 7.75 not scrubbed since 0.00
> pg 7.49 not scrubbed since 0.00
> pg 7.47 not scrubbed since 0.00
> pg 6.2a not scrubbed since 0.00
> pg 6.26 not scrubbed since 0.00
> pg 6.b not scrubbed since 0.00
>
>
> A week ago this status was:
>
>
> HEALTH_WARN 6 pgs not deep-scrubbed in time; 6 pgs not scrubbed in time
> PG_NOT_DEEP_SCRUBBED 6 pgs not deep-scrubbed in time
> pg 7.b1 not deep-scrubbed since 0.00
> pg 7.7e not deep-scrubbed since 0.00
> pg 6.6e not deep-scrubbed since 0.00
> pg 7.8 not deep-scrubbed since 0.00
> pg 7.40 not deep-scrubbed since 0.00
> pg 6.f5 not deep-scrubbed since 0.00
> PG_NOT_SCRUBBED 6 pgs not scrubbed in time
> pg 7.b1 not scrubbed since 0.00
> pg 7.7e not scrubbed since 0.00
> pg 6.6e not scrubbed since 0.00
> pg 7.8 not scrubbed since 0.00
> pg 7.40 not scrubbed since 0.00
> pg 6.f5 not scrubbed since 0.00
>
>
> Is this a known problem already? I can't find a bug report.

https://tracker.ceph.com/issues/40073

Fix is in progress!
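
Until the fix lands, manually re-scrubbing the affected PGs should put real
stamps back (a sketch, using the PG IDs from the listing above; they may get
reset again until the fix is in):

for pg in 6.b1 7.ac 7.a0 6.96 7.92 6.86 7.74 7.75 7.49 7.47 6.2a 6.26 6.b; do
    ceph pg deep-scrub "$pg"
done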

>
>
> Cheers
>
> -- Jonas
>
>
>
> On 16/05/2019 01.13, Brett Chancellor wrote:
> > After upgrading from 14.2.0 to 14.2.1, I've noticed PGs are frequently 
> > resetting their scrub and deep scrub time stamps
> > to 0.00.  It's extra strange because the peers show timestamps for deep 
> > scrubs.
> >
> > ## First entry from a pg list at 7pm
> > $ grep 11.2f2 ~/pgs-active.7pm
> > 11.2f2 6910 0   0 2897477632   0  0 
> > 2091 active+clean3h  7378'12291
> >  8048:36261[1,6,37]p1[1,6,37]p1 2019-05-14 21:01:29.172460 
> > 2019-05-14 21:01:29.172460
> >
> > ## Next Entry 3 minutes later
> > $ ceph pg ls active |grep 11.2f2
> > 11.2f2 6950 0   0 2914713600   0  0 
> > 2091 active+clean6s  7378'12291
> >  8049:36330[1,6,37]p1[1,6,37]p1   0.00  
> >  0.00
> >
> > ## PG Query
> > {
> > "state": "active+clean",
> > "snap_trimq": "[]",
> > "snap_trimq_len": 0,
> > "epoch": 8049,
> > "up": [
> > 1,
> > 6,
> > 37
> > ],
> > "acting": [
> > 1,
> > 6,
> > 37
> > ],
> > "acting_recovery_backfill": [
> > "1",
> > "6",
> > "37"
> > ],
> > "info": {
> > "pgid": "11.2f2",
> > "last_update": "7378'12291",
> > "last_complete": "7378'12291",
> > "log_tail": "1087'10200",
> > "last_user_version": 12291,
> > "last_backfill": "MAX",
> > "last_backfill_bitwise": 1,
> > "purged_snaps": [],
> > "history": {
> > "epoch_created": 1549,
> > "epoch_pool_created": 216,
> > "last_epoch_started": 6148,
> > "last_interval_started": 6147,
> > "last_epoch_clean": 6148,
> > "last_interval_clean": 6147,
> > "last_epoch_split": 6147,
> > "last_epoch_marked_full": 0,
> > "same_up_since": 6126,
> > "same_interval_since": 6147,
> > "same_primary_since": 6126,
> > "last_scrub": "7378'12291",
> > "last_scrub_stamp": "0.00",
> > "last_deep_scrub": "6103'12186",
> > "last_deep_scrub_stamp": "0.00",
> > "last_clean_scrub_stamp": "2019-05-15 23:08:17.014575"
> > },
> > "stats": {
> > "version": "7378'12291",
> > "reported_seq": "36700",
> > "reported_epoch": "8049",
> > "state": "active+clean",
> > "last_fresh": "2019-05-15 23:08:17.014609",
> > "last_change": "2019-05-15 23:08:17.014609",
> >

Re: [ceph-users] Multiple rbd images from different clusters

2019-06-05 Thread Jason Dillaman
On Wed, Jun 5, 2019 at 12:59 PM Jordan Share  wrote:
>
> One thing to keep in mind when pipelining rbd export/import is that the
> default is just a raw image dump.
>
> So if you have a large, but not very full, RBD, you will dump all those
> zeroes into the pipeline.
>
> In our case, it was actually faster to write to a (sparse) temp file and
> read it in again afterwards than to pipeline.
>
> However, we are not using --export-format 2, which I now suspect would
> mitigate this.

It's supposed to help since it's only using diffs -- never the full
image export.

> Jordan
>
>
> On 6/5/2019 8:30 AM, CUZA Frédéric wrote:
> > Hi,
> >
> > Thank you all for you quick answer.
> > I think that will solve our problem.
> >
> > This is what we came up with this :
> > rbd -c /etc/ceph/Oceph.conf --keyring /etc/ceph/Oceph.client.admin.keyring 
> > export rbd/disk_test - | rbd -c /etc/ceph/Nceph.conf --keyring 
> > /etc/ceph/Nceph.client.admin.keyring import - rbd/disk_test
> >
> > This rbd image is a test with only 5Gb of datas inside of it.
> >
> > Unfortunately the command seems to be stuck and nothing happens, both ports 
> > 7800 / 6789 / 22.
> >
> > We can't find no logs on any monitors.
> >
> > Thanks !
> >
> > -Message d'origine-
> > De : ceph-users  De la part de Jason 
> > Dillaman
> > Envoyé : 04 June 2019 14:11
> > À : Burkhard Linke 
> > Cc : ceph-users 
> > Objet : Re: [ceph-users] Multiple rbd images from different clusters
> >
> > On Tue, Jun 4, 2019 at 8:07 AM Jason Dillaman  wrote:
> >>
> >> On Tue, Jun 4, 2019 at 4:45 AM Burkhard Linke
> >>  wrote:
> >>>
> >>> Hi,
> >>>
> >>> On 6/4/19 10:12 AM, CUZA Frédéric wrote:
> >>>
> >>> Hi everyone,
> >>>
> >>>
> >>>
> >>> We want to migrate datas from one cluster (Hammer) to a new one (Mimic). 
> >>> We do not wish to upgrade the actual cluster as all the hardware is EOS 
> >>> and we upgrade the configuration of the servers.
> >>>
> >>> We can’t find a “proper” way to mount two rbd images from two different 
> >>> cluster on the same host.
> >>>
> >>> Does anyone know what is the “good” procedure to achieve this ?
> >>
> >> Copy your "/etc/ceph/ceph.conf" and associated keyrings for both
> >> clusters to a single machine (preferably running a Mimic "rbd" client)
> >> under "/etc/ceph/.conf" and
> >> "/etc/ceph/.client..keyring".
> >>
> >> You can then use "rbd -c  export --export-format 2
> >>  - | rbd -c  import --export-format=2 -
> >> ". The "--export-format=2" option will also copy all
> >> associated snapshots with the images. If you don't want/need the
> >> snapshots, just drop that optional.
> >
> > That "-c" should be "--cluster" if specifying by name, otherwise with "-c" 
> > it's the full path to the two different conf files.
> >
> >>>
> >>> Just my 2 ct:
> >>>
> >>> the 'rbd' commands allows specifying a configuration file (-c). You need 
> >>> to setup two configuration files, one for each cluster. You can also use 
> >>> two different cluster names (--cluster option). AFAIK the name is only 
> >>> used to locate the configuration file. I'm not sure how well the kernel 
> >>> works with mapping RBDs from two different cluster.
> >>>
> >>>
> >>> If you only want to transfer RBDs from one cluster to another, you do not 
> >>> need to map and mount them; the 'rbd' command has the sub commands 
> >>> 'export' and 'import'. You can pipe them to avoid writing data to a local 
> >>> disk. This should be the fastest way to transfer the RBDs.
> >>>
> >>>
> >>> Regards,
> >>>
> >>> Burkhard
> >>>
> >>> --
> >>> Dr. rer. nat. Burkhard Linke
> >>> Bioinformatics and Systems Biology
> >>> Justus-Liebig-University Giessen
> >>> 35392 Giessen, Germany
> >>> Phone: (+49) (0)641 9935810
> >>>
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >>
> >>
> >> --
> >> Jason
> >
> >
> >
> > --
> > Jason
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] stuck stale+undersized+degraded PG after removing 3 OSDs

2019-06-05 Thread Sameh
Hello cephers,

I was trying to reproduce a production situation involving a stuck stale PG.

While playing with a test cluster, I aggressively removed 3 OSDs at once
from the cluster. One OSD per host. All pools are size 3.

After re-adding them, I ended up in this situation (PG unfound, or acting on one
OSD, or another, depending on which command you run):
https://rentry.co/6zwof

I am not sure of the next steps to unblock this. Marking OSD 11 down didn't
help.


Cheers,

-- 
Sameh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing the release cadence

2019-06-05 Thread Daniel Baumann
On 6/5/19 5:57 PM, Sage Weil wrote:
> So far the balance of opinion seems to favor a shift to a 12 month 
> cycle [...] it seems pretty likely we'll make that shift.

thanks, much appreciated (from a cluster operating point of view).

> Thoughts?

GNOME and a few others are doing April and October releases, which seems
balanced and good timing for most people; personally I prefer
spring rather than autumn for upgrades, hence I would suggest April.

Regards,
Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] stuck stale+undersized+degraded PG after removing 3 OSDs

2019-06-05 Thread Alex Gorbachev
On Wed, Jun 5, 2019 at 1:36 PM Sameh  wrote:
>
> Hello cephers,
>
> I was trying to reproduce a production situation involving a stuck stale PG.
>
> While playing with a test cluster, I aggressively removed 3 OSDs at once
> from the cluster. One OSD per host. All pools are size 3.
>
> After re-adding them, I ended up in this situation (PG unfound, or acting on 
> one
> OSD, or another, depending on which command you run):
> https://rentry.co/6zwof
>
> I am not sure of the next steps to unblock this. Marking OSD 11 down didn't
> help.
>
>
> Cheers,


I get this in a lab sometimes, and
do

ceph osd set noout

and reboot the node with the stuck PG.

In production, we remove OSDs one by one.


--
Alex Gorbachev
Intelligent Systems Services Inc.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] stuck stale+undersized+degraded PG after removing 3 OSDs

2019-06-05 Thread Sameh
On Wed, Jun 05, 2019 at 01:57:52PM -0400, Alex Gorbachev wrote:
> 
> 
> I get this in a lab sometimes, and
> do
> 
> ceph osd set noout
> 
> and reboot the node with the stuck PG.

Thank you for your feedback.

I tried to do that, even rebooting all the nodes, but nothing changed.


Cheers,

-- 
Sameh Ghane
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] stuck stale+undersized+degraded PG after removing 3 OSDs

2019-06-05 Thread Alex Gorbachev
On Wed, Jun 5, 2019 at 2:30 PM Sameh  wrote:
>
> Le (On) Wed, Jun 05, 2019 at 01:57:52PM -0400, Alex Gorbachev ecrivit (wrote):
> >
> >
> > I get this in a lab sometimes, and
> > do
> >
> > ceph osd set noout
> >
> > and reboot the node with the stuck PG.
>
> Thank you for your feedback.
>
> I tried to do that, even rebooting all the nodes, but nothing changed.
>
>
> Cheers,
>
> --
> Sameh Ghane

How about temporarily setting min_size to 1 on the affected pool?
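
Something along these lines, assuming the affected pool is named "rbd"
(substitute your pool name):

ceph osd pool get rbd min_size     # note the current value
ceph osd pool set rbd min_size 1
# once the PG has peered and recovered:
ceph osd pool set rbd min_size 2

min_size 1 lets the PG go active with a single replica, so it should only stay
in place long enough for recovery to complete.
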
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing the release cadence

2019-06-05 Thread Alexandre DERUMIER
Hi,


>>- November: If we release Octopus 9 months from the Nautilus release 
>>(planned for Feb, released in Mar) then we'd target this November. We 
>>could shift to a 12 month cadence after that. 

For the last 2 Debian releases, the freeze was around January-February,
so November seems to be a good time for a Ceph release.

- Mail original -
De: "Sage Weil" 
À: "ceph-users" , "ceph-devel" 
, d...@ceph.io
Envoyé: Mercredi 5 Juin 2019 17:57:52
Objet: Changing the release cadence

Hi everyone, 

Since luminous, we have had the following release cadence and policy: 
- release every 9 months 
- maintain backports for the last two releases 
- enable upgrades to move either 1 or 2 releases ahead 
(e.g., luminous -> mimic or nautilus; mimic -> nautilus or octopus; ...) 

This has mostly worked out well, except that the mimic release received 
less attention than we wanted due to the fact that multiple downstream 
Ceph products (from Red Hat and SUSE) decided to base their next release 
on nautilus. Even though upstream every release is an "LTS" release, as a 
practical matter mimic got less attention than luminous or nautilus. 

We've had several requests/proposals to shift to a 12 month cadence. This 
has several advantages: 

- Stable/conservative clusters only have to be upgraded every 2 years 
(instead of every 18 months) 
- Yearly releases are more likely to intersect with downstream 
distribution release (e.g., Debian). In the past there have been 
problems where the Ceph releases included in consecutive releases of a 
distro weren't easily upgradeable. 
- Vendors that make downstream Ceph distributions/products tend to 
release yearly. Aligning with those vendors means they are more likely 
to productize *every* Ceph release. This will help make every Ceph 
release an "LTS" release (not just in name but also in terms of 
maintenance attention). 

So far the balance of opinion seems to favor a shift to a 12 month 
cycle[1], especially among developers, so it seems pretty likely we'll 
make that shift. (If you do have strong concerns about such a move, now 
is the time to raise them.) 

That brings us to an important decision: what time of year should we 
release? Once we pick the timing, we'll be releasing at that time *every 
year* for each release (barring another schedule shift, which we want to 
avoid), so let's choose carefully! 

A few options: 

- November: If we release Octopus 9 months from the Nautilus release 
(planned for Feb, released in Mar) then we'd target this November. We 
could shift to a 12 month cadence after that. 
- February: That's 12 months from the Nautilus target. 
- March: That's 12 months from when Nautilus was *actually* released. 

November is nice in the sense that we'd wrap things up before the 
holidays. It's less good in that users may not be inclined to install the 
new release when many developers will be less available in December. 

February kind of sucked in that the scramble to get the last few things 
done happened during the holidays. OTOH, we should be doing what we can 
to avoid such scrambles, so that might not be something we should factor 
in. March may be a bit more balanced, with a solid 3 months before when 
people are productive, and 3 months after before they disappear on holiday 
to address any post-release issues. 

People tend to be somewhat less available over the summer months due to 
holidays etc, so an early or late summer release might also be less than 
ideal. 

Thoughts? If we can narrow it down to a few options maybe we could do a 
poll to gauge user preferences. 

Thanks! 
sage 


[1] https://twitter.com/larsmb/status/1130010208971952129 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] balancer module makes OSD distribution worse

2019-06-05 Thread Josh Haft
Hi everyone,

On my 13.2.5 cluster, I recently enabled the ceph balancer module in
crush-compat mode. A couple manual 'eval' and 'execute' runs showed
the score improving, so I set the following and enabled the auto
balancer.

mgr/balancer/crush_compat_metrics:bytes # from
https://github.com/ceph/ceph/pull/20665
mgr/balancer/max_misplaced:0.01
mgr/balancer/mode:crush-compat
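
For anyone wanting to reproduce, those settings map to roughly the following
commands (the mgr/balancer/* config-key paths are the ones the balancer
module reads):

ceph config-key set mgr/balancer/crush_compat_metrics bytes
ceph config-key set mgr/balancer/max_misplaced 0.01
ceph balancer mode crush-compat
ceph balancer on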

Log messages from the mgr showed lower scores with each iteration, so
I thought things were moving in the right direction.

Initially my highest-utilized OSD was at 79% and MAXVAR was 1.17. I
let the balancer do its thing for 5 days, at which point my highest
utilized OSD was just over 90% and MAXVAR was about 1.28.
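
Figures like these can be re-checked at any time with, e.g.:

ceph osd df        # per-OSD %USE, plus MIN/MAX VAR and STDDEV in the summary
ceph osd df tree   # the same, grouped by the CRUSH tree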

I do have pretty low PG-per-OSD counts (average of about 60 - that's
next on my list), but I explicitly asked the balancer to use the bytes
metric. Was I just being impatient? Is it expected that usage would go
up overall for a time before starting to trend downward? Is my low PG
count affecting this somehow? I would have expected things to move in
the opposite direction pretty quickly as they do with 'ceph osd
reweight-by-utilization'.

Thoughts?

Regards,
Josh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] balancer module makes OSD distribution worse

2019-06-05 Thread Gregory Farnum
I think the mimic balancer doesn't include omap data when trying to
balance the cluster. (Because it doesn't get usable omap stats from
the cluster anyway; in Nautilus I think it does.) Are you using RGW or
CephFS?
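
A quick way to see which potentially omap-heavy pools (RGW index, CephFS
metadata) exist and how much they hold is, e.g.:

ceph osd pool ls detail
ceph df detail
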
-Greg

On Wed, Jun 5, 2019 at 1:01 PM Josh Haft  wrote:
>
> Hi everyone,
>
> On my 13.2.5 cluster, I recently enabled the ceph balancer module in
> crush-compat mode. A couple manual 'eval' and 'execute' runs showed
> the score improving, so I set the following and enabled the auto
> balancer.
>
> mgr/balancer/crush_compat_metrics:bytes # from
> https://github.com/ceph/ceph/pull/20665
> mgr/balancer/max_misplaced:0.01
> mgr/balancer/mode:crush-compat
>
> Log messages from the mgr showed lower scores with each iteration, so
> I thought things were moving in the right direction.
>
> Initially my highest-utilized OSD was at 79% and MAXVAR was 1.17. I
> let the balancer do its thing for 5 days, at which point my highest
> utilized OSD was just over 90% and MAXVAR was about 1.28.
>
> I do have pretty low PG-per-OSD counts (average of about 60 - that's
> next on my list), but I explicitly asked the balancer to use the bytes
> metric. Was I just being impatient? Is it expected that usage would go
> up overall for a time before starting to trend downward? Is my low PG
> count affecting this somehow? I would have expected things to move in
> the opposite direction pretty quickly as they do with 'ceph osd
> reweight-by-utilization'.
>
> Thoughts?
>
> Regards,
> Josh
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing the release cadence

2019-06-05 Thread Chris Taylor


It seems like it has been bumpy for the Debian-based installs since the change 
to the 9-month cadence. Changing to a 12-month cadence sounds like a good 
idea. Perhaps some Debian maintainers can suggest a good month for them to 
get the packages in time for their release cycle.



On 2019-06-05 12:16 pm, Alexandre DERUMIER wrote:

Hi,



- November: If we release Octopus 9 months from the Nautilus release
(planned for Feb, released in Mar) then we'd target this November. We
could shift to a 12 month cadence after that.


For the last 2 Debian releases, the freeze was around January-February,
so November seems to be a good time for a Ceph release.

- Mail original -
De: "Sage Weil" 
À: "ceph-users" , "ceph-devel"
, d...@ceph.io
Envoyé: Mercredi 5 Juin 2019 17:57:52
Objet: Changing the release cadence


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cls_rgw.cc:3461: couldn't find tag in name index tag

2019-06-05 Thread EDH - Manuel Rios Fernandez
Hi,

Checking our cluster logs, we found tons of these lines on the OSDs.

One OSD:

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/cls/rgw/cls_rgw.cc:3461: couldn't find tag in name index tag=48efb8c3-693c-4fe0-bbe4-fdc16f590a82.9710765.5817269

Another OSD:

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.1/rpm/el7/BUILD/ceph-14.2.1/src/cls/rgw/cls_rgw.cc:979: rgw_bucket_complete_op(): entry.name=_multipart_MBS-25c5afb5-f8f1-43cc-91ee-f49a3258012b/CBB_SRVCLASS2/CBB_DiskImage/Disk_----/Volume_NTFS_0000----$/20190605210028/102.cbrevision.2~65Mi-_pt5OPiV6ULDxpScrmPlrD7yEz.208 entry.instance= entry.meta.category=1

All 44 SSD OSDs show lines like these, with different details but always
referring to the same cls_rgw.cc.

This is presumably related to RGW or the RGW bucket index, but we are not sure.

Are these entries OK? If so, how can we disable them?

Best Regards

Manuel

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing the release cadence

2019-06-05 Thread Linh Vu
I think a 12-month cycle is much better from the cluster operations perspective.
I also like March as a release month.

From: ceph-users  on behalf of Sage Weil 

Sent: Thursday, 6 June 2019 1:57 AM
To: ceph-us...@ceph.com; ceph-de...@vger.kernel.org; d...@ceph.io
Subject: [ceph-users] Changing the release cadence

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to fix ceph MDS HEALTH_WARN

2019-06-05 Thread Yan, Zheng
On Thu, Jun 6, 2019 at 6:36 AM Jorge Garcia  wrote:
>
> We have been testing a new installation of ceph (mimic 13.2.2) mostly
> using cephfs (for now). The current test is just setting up a filesystem
> for backups of our other filesystems. After rsyncing data for a few
> days, we started getting this from ceph -s:
>
> health: HEALTH_WARN
>  1 MDSs report slow metadata IOs
>  1 MDSs behind on trimming
>
> I have been googling for solutions and reading the docs and the
> ceph-users list, but I haven't found a way to get rid of these messages
> and get back to HEALTH_OK. Some of the things I have tried (from
> suggestions around the internet):
>
> - Increasing the amount of RAM on the MDS server (Currently 192 GB)
> - Increasing mds_log_max_segments (Currently 256)
> - Increasing mds_cache_memory_limit
>
> The message still reports a HEALTH_WARN. Currently, the filesystem is
> idle, no I/O happening. Not sure what to try next. Any suggestions?
>

Maybe the MDS is trimming its log. Please check the MDS's CPU usage and
the whole cluster's IO stats.
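
For example (mds.<name> standing in for the active MDS id, a placeholder):

ceph status                  # cluster-wide client and recovery IO
ceph fs status               # per-MDS activity summary
ceph daemonperf mds.<name>   # live MDS perf counters, including mds_log
top -p $(pidof ceph-mds)     # MDS CPU usage, run on the MDS host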

> Thanks in advance!
>
> Jorge
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rbd.ReadOnlyImage: [errno 30]

2019-06-05 Thread 解决
Hello CUZA,
Are the source and destination rbd/disk_test in the same Ceph cluster?
Are you exporting rbd/disk_test with one user while importing it with
another one?
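
If these are two different clusters, it may also be worth confirming that each
side is reachable with the exact conf/keyring pair used in the pipe, e.g.:

ceph -c /etc/ceph/Oceph.conf --keyring /etc/ceph/Oceph.client.admin.keyring -s
ceph -c /etc/ceph/Nceph.conf --keyring /etc/ceph/Nceph.client.admin.keyring -s
rbd -c /etc/ceph/Oceph.conf --keyring /etc/ceph/Oceph.client.admin.keyring info rbd/disk_test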


At 2019-06-05 23:25:45, "CUZA Frédéric"  wrote:
>Thank you all for your quick answer.
>I think that will solve our problem.
>
>This is what we came up with this :
>rbd -c /etc/ceph/Oceph.conf --keyring /etc/ceph/Oceph.client.admin.keyring 
>export rbd/disk_test - | rbd -c /etc/ceph/Nceph.conf --keyring 
>/etc/ceph/Nceph.client.admin.keyring import - rbd/disk_test
>
>This rbd image is a test with only 5Gb of datas inside of it.
>
>Unfortunately the command seems to be stuck and nothing happens on ports 
>7800 / 6789 / 22.
>
>We can't find any logs on any of the monitors.
>
>Thanks !
>
>-Message d'origine-
>De : ceph-users  De la part de Jason 
>Dillaman
>Envoyé : 04 June 2019 14:14
>À : 解决 
>Cc : ceph-users 
>Objet : Re: [ceph-users] rbd.ReadOnlyImage: [errno 30]
>
>On Tue, Jun 4, 2019 at 4:55 AM 解决  wrote:
>>
>> Hi all,
>> We use ceph(luminous) + openstack(queens) in my test 
>> environment。The virtual machine does not start properly after the 
>> disaster test and the image of virtual machine can not create snap.The 
>> procedure is as follows:
>> #!/usr/bin/env python
>>
>> import rados
>> import rbd
>> with rados.Rados(conffile='/etc/ceph/ceph.conf',rados_id='nova') as cluster:
>> with cluster.open_ioctx('vms') as ioctx:
>> rbd_inst = rbd.RBD()
>> print "start open rbd image"
>> with rbd.Image(ioctx, '10df4634-4401-45ca-9c57-f349b78da475_disk') 
>> as image:
>> print "start create snapshot"
>> image.create_snap('myimage_snap1')
>>
>> when i run it ,it show readonlyimage,as follows:
>>
>> start open rbd image
>> start create snapshot
>> Traceback (most recent call last):
>>   File "testpool.py", line 17, in 
>> image.create_snap('myimage_snap1')
>>   File "rbd.pyx", line 1790, in rbd.Image.create_snap 
>> (/builddir/build/BUILD/ceph-12.2.5/build/src/pybind/rbd/pyrex/rbd.c:15
>> 682)
>> rbd.ReadOnlyImage: [errno 30] error creating snapshot myimage_snap1 
>> from 10df4634-4401-45ca-9c57-f349b78da475_disk
>>
>> but i run it with admin instead of nova,it is ok.
>>
>> "ceph auth list"  as follow
>>
>> installed auth entries:
>>
>> osd.1
>> key: AQBL7uRcfuyxEBAAoK8JrQWMU6EEf/g83zKJjg==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> osd.10
>> key: AQCV7uRcdsB9IBAAHbHHCaylVUZIPKFX20polQ==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> osd.11
>> key: AQCW7uRcRIMRIhAAbXfLbQwijEO5ZQFWFZaO5w==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> osd.2
>> key: AQBL7uRcfFMWDBAAo7kjQobGBbIHYfZkx45pOw==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> osd.4
>> key: AQBk7uRc97CPOBAAK9IBJICvchZPc5p80bISsg==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> osd.5
>> key: AQBk7uRcOdqaORAAkQeEtYsE6rLWLPhYuCTdHA==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> osd.7
>> key: AQB97uRc+1eRJxAA34DImQIMFjzHSXZ25djp0Q==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> osd.8
>> key: AQB97uRcFilBJhAAXzSzNJsgwpobC8654Xo7Sw==
>> caps: [mon] allow profile osd
>> caps: [osd] allow *
>> client.admin
>> key: AQAU7uRcNia+BBAA09mOYdX+yJWbLCjcuMih0A==
>> auid: 0
>> caps: [mds] allow
>> caps: [mgr] allow *
>> caps: [mon] allow *
>> caps: [osd] allow *
>> client.cinder
>> key: AQBp7+RcOzPHGxAA7azgyayVu2RRNWJ7JxSJEg==
>> caps: [mon] allow r
>> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
>> pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow 
>> rwx pool=vms-cache, allow rx pool=images, allow rx pool=images-cache 
>> client.cinder-backup
>> key: AQBq7+RcVOwGNRAAiwJ59ZvAUc0H4QkVeN82vA==
>> caps: [mon] allow r
>> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
>> pool=backups, allow rwx pool=backups-cache client.glance
>> key: AQDf7uRc32hDBBAAkGucQEVTWqnIpNvihXf/Ng==
>> caps: [mon] allow r
>> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
>> pool=images, allow rwx pool=images-cache client.nova
>> key: AQDN7+RcqDABIxAAXnFcVjBp/S5GkgOy0wqB1Q==
>> caps: [mon] allow r
>> caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
>> pool=volumes, allow rwx pool=volumes-cache, allow rwx pool=vms, allow 
>> rwx pool=vms-cache, allow rwx pool=images, allow rwx pool=images-cache 
>> client.radosgw.gateway
>> key: AQAU7uRccP06CBAA6zLFtDQoTstl8CNclYRugQ==
>> auid: 0
>> caps: [mon] allow rwx
>> caps: [osd] allow rwx
>> mgr.172.30.126.26
>> key: AQAr7uRclc52MhAA+GWCQEVnAHB01tMFpgJtTQ==
>> caps: [mds] allow *
>> caps: [mon] allow profile mgr
>> caps: [osd] allow *
>> mgr.172.30.126.27
>> key: AQAs7uRclkD2OBAAW/cUhcZEebZnQulqVodiXQ==
>> caps: [mds] allow *
>> caps: [mon] allow profile mgr
>> caps: [osd] allow *
>> mgr.172.30.126.28
>> key: AQAu7uRcT9OLBBAAZbEjb/N1NnZpIgfaAcThyQ==
>> caps: [mds] allow *
>> caps: [mon] allow profile mgr
>> caps: [osd] allow *
>>
>>
>> Can someone explain it to m


Re: [ceph-users] Changing the release cadence

2019-06-05 Thread Dietmar Rieder
+1
Operator's view: a 12-month cycle is definitely better than 9. March seems
to be a reasonable compromise.

Best
  Dietmar

On 6/6/19 2:31 AM, Linh Vu wrote:
> I think 12 months cycle is much better from the cluster operations
> perspective. I also like March as a release month as well. 
> 
> *From:* ceph-users  on behalf of Sage
> Weil 
> *Sent:* Thursday, 6 June 2019 1:57 AM
> *To:* ceph-us...@ceph.com; ceph-de...@vger.kernel.org; d...@ceph.io
> *Subject:* [ceph-users] Changing the release cadence
>  



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com