[ceph-users] PG Calculation query

2017-03-27 Thread nokia ceph
Hello,

We are facing a performance issue with rados benchmarking on a 5-node
cluster when comparing pg_num 4096 vs 8192.

As per the PG calculation, below is our specification:

Pool size   OSDs   % Data   Target PGs/OSD   PG count
5           340    100      100              8192
5           340    100      50               4096


We got better performance with a PG count of 4096 than with 8192.

With PG count 4096:

File size (bytes)            256000     512000     1024000    2048000    4096000    12288000
Write bandwidth (MB/s)       1448.38    2503.98    3941.42    5354.7     5333.9     5271.16
Read bandwidth (MB/s)        2924.83    3417.9     4236.65    4469.4     4602.65    4584.6
Write average latency (s)    0.088355   0.102214   0.129855   0.191155   0.377685   1.13953
Write maximum latency (s)    0.280164   0.485391   1.15953    13.5175    27.9876    86.3103
Read average latency (s)     0.0437188  0.0747644  0.120604   0.228535   0.436566   1.30415
Read maximum latency (s)     1.13067    3.21548    2.99734    4.08429    9.0224     16.6047

Average IOPS:

#grep "op/s" cephio_0%.txt | awk 'NF { print $(NF - 1) }' | awk '{ total += $0 } END { print total/NR }'
7517.49


With PG count 8192:

File size (bytes)            256000     512000     1024000    2048000    4096000    12288000
Write bandwidth (MB/s)       534.749    1020.49    1864.58    3100.92    4717.23    5251.76
Read bandwidth (MB/s)        1615.56    2764.25    4061.55    4265.39    4229.38    4042.18
Write average latency (s)    0.239263   0.250769   0.27448    0.328981   0.427056   1.14352
Write maximum latency (s)    9.21752    10.3353    10.8132    11.2135    12.5497    44.8133
Read average latency (s)     0.0791822  0.0925167  0.12583    0.239571   0.475198   1.47916
Read maximum latency (s)     2.01021    2.29139    3.60456    3.8435     7.43755    37.6106

Average IOPS:

#grep "op/s" cephio_0%.txt | awk 'NF { print $(NF - 1) }' | awk '{ total += $0 } END { print total/NR }'
4970.26


With 4096 PG - Average IOPS - 7517
With 8192 PG - Average IOPS - 4970


For smaller object sizes, performance with 8192 PGs is badly affected. As per
our plans we will not be adding any nodes in the future, so we chose a
'Target PGs per OSD' of 100 instead of 200/300.

We would appreciate comments on how to choose the PG count best suited to a
given cluster size.
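
For reference, the pgcalc arithmetic behind the table above is roughly the
sketch below; rounding up to a power of two is what turns targets of 50 and
100 into 4096 and 8192 respectively (340 OSDs, EC 4+1 so pool size 5):

  # assumptions: 340 OSDs, EC 4+1 (pool size 5), target = PGs per OSD
  osds=340; size=5
  for target in 50 100; do
      raw=$(( osds * target / size ))
      pg=1; while [ "$pg" -lt "$raw" ]; do pg=$(( pg * 2 )); done
      echo "target=$target raw=$raw pg_num=$pg"
  done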

ENV:-

Kraken - 11.2.0 - bluestore EC 4+1
RHEL 7.3
3.10.0-514.10.2.el7.x86_64
5 node - 5x68 - 340 OSD

Thanks
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Questions on rbd-mirror

2017-03-27 Thread Dongsheng Yang

Hi Fulvio,

On 03/24/2017 07:19 PM, Fulvio Galeazzi wrote:
Hallo, apologies for my (silly) questions, I did try to find some doc 
on rbd-mirror but was unable to, apart from a number of pages 
explaining how to install it.


My environment is CentOS 7 and Ceph 10.2.5.

Can anyone help me understand a few minor things:

 - is there a cleaner way to configure the user which will be used for
   rbd-mirror, other than editing the ExecStart in file 
/usr/lib/systemd/system/ceph-rbd-mirror@.service ?

   For example some line in ceph.conf... looks like the username
   defaults to the cluster name, am I right?


It should just be "ceph", no matter what the cluster name is, if I read 
the code correctly.


 - is it possible to throttle mirroring? Sure, it's a crazy thing to do
   for "cinder" pools, but may make sense for slowly changing ones, like
   a "glance" pool.


The rbd core team is working on this. Jason, right?


 - is it possible to set per-pool default features? I read about
"rbd default features = ###"
   but this is a global setting. (Ok, I can still restrict pools to be
   mirrored with "ceph auth" for the user doing mirroring)


"per-pool default features" sounds like a reasonable feature request.

About the "ceph auth" for mirroring, I am working on a rbd acl design,
will consider pool-level, namespace-level and image-level. Then I think
we can do a permission check on this.

Thanx
Yang



  Thanks!

Fulvio



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] RBD image perf counters: usage, access

2017-03-27 Thread Masha Atakova

Hi Yang,

Thank you for your reply. It is indeed very useful to know that there are
many ImageCtx objects for one image.


But in my setting, I don't have any particular ceph client connected to
ceph (I could, but this is not the point). I'm trying to get metrics for a
particular image without performing any I/O on it myself.


And I'm trying to get access to the performance counters listed in the
ImageCtx class; they don't seem to be reported by the perf tool.


Thanks!

On 27/03/17 12:29, Dongsheng Yang wrote:

Hi Masha
you can get the counters via the perf dump command on the asok file of
your client, for example:

$ ceph --admin-daemon out/client.admin.9921.asok perf dump|grep rd
"rd": 656754,
"rd_bytes": 656754,
"rd_latency": {
"discard": 0,
"discard_bytes": 0,
"discard_latency": {
"omap_rd": 0,

But note that this is a counter for this one ImageCtx, not the counter for
the image as a whole. There may be several ImageCtxes reading from or writing
to the same image.
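
If it helps, a rough way to pull the same counter from every client admin
socket on a host (assuming the default /var/run/ceph asok location) is:

  # each socket corresponds to one client instance / ImageCtx
  # (requires the admin socket to be enabled for clients in ceph.conf)
  for s in /var/run/ceph/ceph-client.*.asok; do
      echo "== $s"
      ceph --admin-daemon "$s" perf dump | grep '"rd_bytes"'
  done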

Yang

On 03/27/2017 12:23 PM, Masha Atakova wrote:


Hi everyone,

I was going around trying to figure out how to get ceph metrics at a more
detailed level than daemons. Of course, I found and explored the API for
watching rados objects, but I'm more interested in getting metrics about RBD
images. And while I could get the list of objects for a particular image and
then watch all of them, that doesn't seem like a very efficient way to go
about it.


I checked librbd API and there isn't anything helping with my goal.

So I went through the source code and found the list of performance counters
for an image, which are incremented by other parts of ceph when performing
the corresponding operations:
https://github.com/ceph/ceph/blob/master/src/librbd/ImageCtx.cc#L364


I have 2 questions about it:

1) Is there any workaround to use those counters right now, maybe by
compiling the code doing it against ceph? It looks like I need to be able to
access a particular ImageCtx object (instead of creating my own), and I just
can't find the appropriate class / part of librbd that allows me to do so.


2) Are there any plans to make those counters accessible via an API such as
librbd or librados?


I see that these questions might be more appropriate for the devel 
list, but:


- it seems to me that the question of getting ceph metrics is more
interesting to those who use ceph


- I couldn't subscribe to it; the error is provided below.

Thanks!

majord...@vger.kernel.org:
SMTP error from remote server for MAIL FROM command, host: vger.kernel.org 
(209.132.180.67) reason: 553 5.7.1 Hello [74.208.4.201], for your MAIL FROM 
address  policy analysis reported: Your address is not 
liked source for email




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






Re: [ceph-users] RBD image perf counters: usage, access

2017-03-27 Thread Dongsheng Yang


On 03/27/2017 04:06 PM, Masha Atakova wrote:


Hi Yang,


Hi Masha,


Thank you for your reply. This is very useful indeed that there are 
many ImageCtx objects for one image.


But in my setting, I don't have any particular ceph client connected 
to ceph (I could, but this is not the point). I'm trying to get 
metrics for particular image while not performing anything with it 
myself.




The perf counters you mentioned in your first mail are just for one
particular image client; that means these perf counters will disappear once
the client disconnects.


And I'm trying to get access to performance counters listed in the 
ImageCtx class, they don't seem to be reported by the perf tool.




Do you mean getting the perf counters via an API? First of all, this counter
is only for a particular ImageCtx (a connected client), so you can read the
counters with the perf dump command from my last mail, I think.


If you want to get the performance counters for an image as a whole (no
matter how many ImageCtxes, connected or disconnected), you may need to wait
for this one:
http://pad.ceph.com/p/ceph-top

Yang


Thanks!

On 27/03/17 12:29, Dongsheng Yang wrote:

Hi Masha
you can get the counters by perf dump command on the asok file of 
your client. such as that:

$ ceph --admin-daemon out/client.admin.9921.asok perf dump|grep rd
"rd": 656754,
"rd_bytes": 656754,
"rd_latency": {
"discard": 0,
"discard_bytes": 0,
"discard_latency": {
"omap_rd": 0,

But, note that, this is a counter of this one ImageCtx, but not the 
counter for this image. There are

possible several ImageCtxes reading or writing on the same image.

Yang

On 03/27/2017 12:23 PM, Masha Atakova wrote:


Hi everyone,

I was going around trying to figure out how to get ceph metrics on a 
more detailed level than daemons. Of course, I found and explored 
API for watching rados objects, but I'm more interested in getting 
metrics about RBD images. And while I could get list of objects for 
particular image, and then watch all of them, it doesn't seem like 
very efficient way to go about it.


I checked librbd API and there isn't anything helping with my goal.

So I went through the source code and found list of performance 
counters for image which are incremented by other parts of ceph when 
making corresponding operations: 
https://github.com/ceph/ceph/blob/master/src/librbd/ImageCtx.cc#L364


I have 2 questions about it:

1) is there any workaround to use those counters right now? maybe 
when compiling against ceph the code doing it. Looks like I need to 
be able to access particular ImageCtx object (instead of creating my 
own), and I just can't find appropriate class / part of the librbd 
allowing me to do so.


2) are there any plans on making those counters accessible via API 
like librbd or librados?


I see that these questions might be more appropriate for the devel 
list, but:


- it seems to me that question of getting ceph metrics is more 
interesting for those who use ceph


- I couldn't subscribe to it with an error provided below.

Thanks!

majord...@vger.kernel.org:
SMTP error from remote server for MAIL FROM command, host: vger.kernel.org 
(209.132.180.67) reason: 553 5.7.1 Hello [74.208.4.201], for your MAIL FROM 
address  policy analysis reported: Your address is not 
liked source for email



[ceph-users] Kraken + Bluestore

2017-03-27 Thread Ashley Merrick
Hi,

Does anyone have any cluster of a decent scale running on Kraken and bluestore?

How are you finding it? Have you had any big issues arise?

Was it running non-BlueStore before, and have you noticed any improvement?
Read? Write? IOPS?

,Ashley
Sent from my iPhone
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] New hardware for OSDs

2017-03-27 Thread Mattia Belluco
Hello all,
we are currently in the process of buying new hardware to expand an
existing Ceph cluster that already has 1200 osds.
We are currently using 24 * 4 TB SAS drives per host (one OSD each), with an
SSD journal shared among 4 OSDs. For the upcoming expansion we were thinking
of switching to either 6 or 8 TB hard drives (9 or 12 per host) in order to
drive down space and cost requirements.

Has anyone any experience in mid-sized/large-sized deployment using such
hard drives? Our main concern is the rebalance time but we might be
overlooking some other aspects.

We currently use the cluster as storage for openstack services: Glance,
Cinder and VMs' ephemeral disks.

Thanks in advance for any advice.

Mattia
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recompiling source code - to find exact RPM

2017-03-27 Thread nokia ceph
Hey Brad,

Many thanks for the explanation...

> ~~~
> WARNING: the following dangerous and experimental features are enabled:
> ~~~

> Can I ask why you want to disable this warning?

We are using bluestore with kraken and are aware that this is in tech preview.
To hide these warnings we compiled it like this.

Thanks


On Mon, Mar 27, 2017 at 5:04 AM, Brad Hubbard  wrote:

>
>
> On Fri, Mar 24, 2017 at 6:49 PM, nokia ceph 
> wrote:
> > Brad, cool now we are on the same track :)
> >
> > So whatever change we made after this location src/* as it mapped to
> > respective rpm correct?
> >
> > For eg:-
> > src/osd/* -- ceph-osd
> > src/common - ceph-common
> > src/mon  - ceph-mon
> > src/mgr   - ceph-mgr
>
> I think this is true in most, if not all, cases.
>
> >
> > Since we are using bluestore with kraken, I though to disable the below
> > warning while triggering `ceph -s`
> >
> > ~~~
> > WARNING: the following dangerous and experimental features are enabled:
> > ~~~
>
> Can I ask why you want to disable this warning?
>
> >
> > Here I made a comment in this file
> >
> >>vim src/common/ceph_context.cc
> > 307 //  if (!cct->_experimental_features.empty())
> > 308 //  lderr(cct) << "WARNING: the following dangerous and
> experimental
> > features are enabled: "
> > 309 // << cct->_experimental_features << dendl;
>
> Right.
>
> >
> > As per my assumption, the change should reflect in this binary
> > "ceph-common"
>
> libceph-common specifically.
>
> >
> > But when I closely looked on librados library as these warning showing
> here
> > also.
> > #strings -a /usr/lib64/librados.so.2 | grep dangerous
> > WARNING: the following dangerous and experimental features are enabled:
> -->
> >
> > Then I conclude for this change ceph-common and librados were required.
> >
> > Please correct me if I'm wrong.
>
> So I looked at this on current master built on Fedora and see the
> following.
>
> $ for lib in $(find . \! -type l -type f -name lib\*); do strings
> $lib|grep "following dangerous and experimental"; if [ $? -eq 0 ]; then
> echo $lib; fi; done
> WARNING: the following dangerous and experimental features are enabled:
> ./libcephd.a
> WARNING: the following dangerous and experimental features are enabled:
> ./libceph-common.so.0
> WARNING: the following dangerous and experimental features are enabled:
> ./libcommon.a
>
> So in my case the only shared object that has this string is
> libceph-common.
> However, librados is dynamically linked against libceph-common.
>
> $ ldd librados.so.2.0.0|grep libceph-common
> libceph-common.so.0 => 
> /home/brad/working/src/ceph/build/lib/libceph-common.so.0
> (0x7faa2cf42000)
>
> I checked a rhel version and sure enough the string is there, because in
> that
> version on rhel/CentOS we statically linked libcommon.a into librados IIRC.
>
> # ldd librados.so.2.0.0|grep libceph-common
> #
>
> So if the string shows up in your librados then I'd suggest it is also
> statically linked ([1] we only changed this fairly recently) and you will
> need
> to replace it to reflect your change.
>
> [1] https://github.com/ceph/ceph/commit/8f7643792c9e6a3d1ba4a06ca7d09b
> 0de9af1443
>
> >
> > On Fri, Mar 24, 2017 at 5:41 AM, Brad Hubbard 
> wrote:
> >>
> >> Oh wow, I completely misunderstood your question.
> >>
> >> Yes, src/osd/PG.cc and src/osd/PG.h are compiled into the ceph-osd
> binary
> >> which
> >> is included in the ceph-osd rpm as you said in your OP.
> >>
> >> On Fri, Mar 24, 2017 at 3:10 AM, nokia ceph 
> >> wrote:
> >> > Hello Piotr,
> >> >
> >> > I didn't understand, could you please elaborate about this procedure
> as
> >> > mentioned in the last update.  It would be really helpful if you share
> >> > any
> >> > useful link/doc to understand what you actually meant. Yea correct,
> >> > normally
> >> > we do this procedure but it takes more time. But here my intention is
> to
> >> > how
> >> > to find out the rpm which caused the change. I think we are in
> opposite
> >> > direction.
> >> >
> >> >>> But wouldn't be faster and/or more convenient if you would just
> >> >>> recompile
> >> >>> binaries in-place (or use network symlinks)
> >> >
> >> > Thanks
> >> >
> >> >
> >> >
> >> > On Thu, Mar 23, 2017 at 6:47 PM, Piotr Dałek <
> piotr.da...@corp.ovh.com>
> >> > wrote:
> >> >>
> >> >> On 03/23/2017 02:02 PM, nokia ceph wrote:
> >> >>
> >> >>> Hello Piotr,
> >> >>>
> >> >>> We do customizing ceph code for our testing purpose. It's a part of
> >> >>> our
> >> >>> R&D :)
> >> >>>
> >> >>> Recompiling source code will create 38 rpm's out of these I need to
> >> >>> find
> >> >>> which one is the correct rpm which I made change in the source code.
> >> >>> That's
> >> >>> what I'm try to figure out.
> >> >>
> >> >>
> >> >> Yes, I understand that. But wouldn't be faster and/or more convenient
> >> >> if
> >> >> you would just recompile binaries in-place (or use network symlinks)
> >> >> instead
> >> >> of packaging entire Ceph and (re)installing its packages each 

Re: [ceph-users] New hardware for OSDs

2017-03-27 Thread Christian Balzer

Hello,

On Mon, 27 Mar 2017 12:27:40 +0200 Mattia Belluco wrote:

> Hello all,
> we are currently in the process of buying new hardware to expand an
> existing Ceph cluster that already has 1200 osds.

That's quite sizable, is the expansion driven by the need for more space
(big data?) or to increase IOPS (or both)?

> We are currently using 24 * 4 TB SAS drives per osd with an SSD journal
> shared among 4 osds. For the upcoming expansion we were thinking of
> switching to either 6 or 8 TB hard drives (9 or 12 per host) in order to
> drive down space and cost requirements.
> 
> Has anyone any experience in mid-sized/large-sized deployment using such
> hard drives? Our main concern is the rebalance time but we might be
> overlooking some other aspects.
> 

If you researched the ML archives, you should already know to stay well
away from SMR HDDs. 

Both HGST and Seagate have large Enterprise HDDs that have
journals/caches (MediaCache in HGST speak IIRC) that drastically improve
write IOPS compared to plain HDDs.
Even with SSD journals you will want to consider those, as these new HDDs
will see at least twice the action than your current ones. 

Rebalance time is a concern of course, especially if your cluster like
most HDD based ones has these things throttled down to not impede actual
client I/O.

To get a rough idea, take a look at:
https://www.memset.com/tools/raid-calculator/

For Ceph with replication 3 and the typical PG distribution, assume 100
disks and the RAID6 with hotspares numbers are relevant.
For rebuild speed, consult your experience, you must have had a few
failures. ^o^

For example with a recovery speed of 100MB/s, a 1TB disk (used data with
Ceph actually) looks decent at 1:16000 DLO/y. 
At 5TB though it enters scary land

Christian

> We currently use the cluster as storage for openstack services: Glance,
> Cinder and VMs' ephemeral disks.
> 
> Thanks in advance for any advice.
> 
> Mattia
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] leveldb takes a lot of space

2017-03-27 Thread Wido den Hollander

> Op 26 maart 2017 om 9:44 schreef Niv Azriel :
> 
> 
> after network issues, ceph cluster fails.
> leveldb grows and takes a lot of space
> ceph mon cant write to leveldb because there is not enough space on
> filesystem.
> (there is a lot of ldb file on /var/lib/ceph/mon)
> 

It is normal for the database to grow, as the MON keeps all historic OSDMaps
while one or more PGs are not active+clean.

> ceph compact on start is not helping.
> my erasure-code is too big.
> 
> how to fix it?

Make sure you have enough space available on your MONs, that is the main 
advise. Under normal operations <2GB should be enough, but it can grow much 
bigger.

On most clusters I design I make sure there is at least 200GB of space 
available on each MON on a fast DC-grade SSD.
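
For completeness, the knobs usually involved here are checking the store
size, asking a monitor to compact its store, or compacting on startup
('mon.a' below is just a placeholder id):

  # how big is each mon store right now?
  du -sh /var/lib/ceph/mon/*
  # trigger an online compaction of one monitor's store
  ceph tell mon.a compact
  # or compact automatically at daemon start, via ceph.conf:
  # [mon]
  # mon compact on start = true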

Wido

> thanks in advanced
> 
> ceph version: jewel
> os : ubuntu16.04
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New hardware for OSDs

2017-03-27 Thread Wido den Hollander

> Op 27 maart 2017 om 13:22 schreef Christian Balzer :
> 
> 
> 
> Hello,
> 
> On Mon, 27 Mar 2017 12:27:40 +0200 Mattia Belluco wrote:
> 
> > Hello all,
> > we are currently in the process of buying new hardware to expand an
> > existing Ceph cluster that already has 1200 osds.
> 
> That's quite sizable, is the expansion driven by the need for more space
> (big data?) or to increase IOPS (or both)?
> 
> > We are currently using 24 * 4 TB SAS drives per osd with an SSD journal
> > shared among 4 osds. For the upcoming expansion we were thinking of
> > switching to either 6 or 8 TB hard drives (9 or 12 per host) in order to
> > drive down space and cost requirements.
> > 
> > Has anyone any experience in mid-sized/large-sized deployment using such
> > hard drives? Our main concern is the rebalance time but we might be
> > overlooking some other aspects.
> > 
> 
> If you researched the ML archives, you should already know to stay well
> away from SMR HDDs. 
> 

Amen! Just don't. Stay away from SMR with Ceph.

> Both HGST and Seagate have large Enterprise HDDs that have
> journals/caches (MediaCache in HGST speak IIRC) that drastically improve
> write IOPS compared to plain HDDs.
> Even with SSD journals you will want to consider those, as these new HDDs
> will see at least twice the action than your current ones. 
> 

I also have good experiences with bcache on an NVMe device in Ceph clusters:
a single Intel P3600/P3700 acting as the caching device for bcache.

> Rebalance time is a concern of course, especially if your cluster like
> most HDD based ones has these things throttled down to not impede actual
> client I/O.
> 
> To get a rough idea, take a look at:
> https://www.memset.com/tools/raid-calculator/
> 
> For Ceph with replication 3 and the typical PG distribution, assume 100
> disks and the RAID6 with hotspares numbers are relevant.
> For rebuild speed, consult your experience, you must have had a few
> failures. ^o^
> 
> For example with a recovery speed of 100MB/s, a 1TB disk (used data with
> Ceph actually) looks decent at 1:16000 DLO/y. 
> At 5TB though it enters scary land
> 

Yes, those recoveries will take a long time. Let's say your 6TB drive is 80%
full: you need to rebalance 4.8TB.

4.8TB / 100MB/sec = 13 hours rebuild time

13 hours is a long time. And you will probably not have 100MB/sec sustained, I 
think that 50MB/sec is much more realistic.

That means that a single disk failure will take >24 hours to recover from a 
rebuild.

I don't like very big disks that much. Not in RAID, not in Ceph.

Wido

> Christian
> 
> > We currently use the cluster as storage for openstack services: Glance,
> > Cinder and VMs' ephemeral disks.
> > 
> > Thanks in advance for any advice.
> > 
> > Mattia
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > 
> 
> 
> -- 
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] object store backup tool recommendations

2017-03-27 Thread Blair Bethwaite
Thanks for the useful reply Robin and sorry for not getting back sooner...

> On Fri, Mar 03, 2017 at 18:01:00 +, Robin H. Johnson wrote:
> On Fri, Mar 03, 2017 at 10:55:06 +1100, Blair Bethwaite wrote:
>> Does anyone have any recommendations for good tools to perform
>> file-system/tree backups and restores to/from a RGW object store (Swift or
>> S3 APIs)? Happy to hear about both FOSS and commercial options please.
> This isn't Ceph specific, but is something that has come up for me, and
> I did a lot of research into it for the Gentoo distribution to use on
> it's infrastructure.

> The wiki page with all of our needs & contenders is here:
> https://wiki.gentoo.org/wiki/Project:Infrastructure/Backups_v3

That's a useful resource.

> TL;DR: restic is probably the closest fit to your needs, but do evaluate
> it carefully.

Yes I agree, restic does look like a decent fit and we are planning to
trial it soon. Though it took me a while to find that it does in fact
support object storage, as that info is buried in the usage docs and
(I thought somewhat bizarrely) not mentioned as a prominent feature.
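
In case it saves someone else the same digging: the object storage support is
restic's S3 backend, so pointing it at RGW looks roughly like the below
(endpoint, bucket and credentials are placeholders):

  # placeholders: RGW endpoint, bucket name, S3 credentials
  export AWS_ACCESS_KEY_ID=<access-key>
  export AWS_SECRET_ACCESS_KEY=<secret-key>
  export RESTIC_PASSWORD=<repository-password>
  restic -r s3:https://rgw.example.com/backup-bucket init
  restic -r s3:https://rgw.example.com/backup-bucket backup /data
  restic -r s3:https://rgw.example.com/backup-bucket snapshots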

Anybody else have recommendations? I'm surprised there were not more
suggestions, perhaps (OpenStack-)Swift users will have some
suggestions...

-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] object store backup tool recommendations

2017-03-27 Thread Blair Bethwaite
I suppose the other option here, which I initially dismissed because
Red Hat are not supporting it, is to have a CephFS dir/tree bound to a
cache-tier fronted EC pool. Is anyone having luck with such a setup?

On 3 March 2017 at 21:40, Blair Bethwaite  wrote:
> Hi Marc,
>
> Whilst I agree CephFS would probably help compared to your present solution,
> what I'm looking for something that can talk to a the RadosGW restful object
> storage APIs, so that the backing storage can be durable and low-cost, i.e.,
> on an erasure coded pool. In this case we're looking to backup a Lustre
> filesystem.
>
> Cheers,
>
> On 3 March 2017 at 21:29, Marc Roos  wrote:
>>
>>
>> Hi Blair,
>>
>> We are also thinking of using ceph for 'backup'. At the moment we are
>> using rsync and hardlinks on a drbd setup. But I think when using cephfs
>> things could speed up, because file information is gotten from the mds
>> daemon, so this should save on one rsync file lookup, and we expect that
>> we can run more tasks in parallel.
>>
>>
>>
>>
>>
>> -Original Message-
>> From: Blair Bethwaite [mailto:blair.bethwa...@gmail.com]
>> Sent: vrijdag 3 maart 2017 0:55
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] object store backup tool recommendations
>>
>> Hi all,
>>
>> Does anyone have any recommendations for good tools to perform
>> file-system/tree backups and restores to/from a RGW object store (Swift
>> or S3 APIs)? Happy to hear about both FOSS and commercial options
>> please.
>>
>> I'm interested in:
>> 1) tools known to work or not work at all for a basic file-based data
>> backup
>>
>> Plus these extras:
>> 2) preserves/restores correct file metadata (e.g. owner, group, acls
>> etc)
>> 3) preserves/restores xattrs
>> 4) backs up empty directories and files
>> 5) supports some sort of snapshot/versioning/differential functionality,
>> i.e., will keep a copy or diff or last N versions of a file or whole
>> backup set, e.g., so that one can restore yesterday's file/s or last
>> week's but not have to keep two full copies to achieve it
>> 6) is readily able to restore individual files
>> 7) can encrypt/decrypt client side
>>
>> 8) anything else I should be considering
>>
>> --
>>
>> Cheers,
>> ~Blairo
>>
>>
>
>
>
> --
> Cheers,
> ~Blairo



-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs cannot match up with fast OSD map changes (epochs) during recovery

2017-03-27 Thread Wido den Hollander

> Op 27 maart 2017 om 8:41 schreef Muthusamy Muthiah 
> :
> 
> 
> Hi Wido,
> 
> Yes slow map update was happening and CPU hitting 100%.

So it indeed seems you are CPU bound at that moment. That's indeed a problem 
when you have a lot of map changes to work through on the OSDs.

It's recommended to have 1 CPU core per OSD as during recovery/boot this power 
is needed badly by the OSDs.

> We also tried to set the noup flag so that the cluster osdmap remained at
> the same version. This let each OSD update to the current map, slowly. At
> one point we lost patience due to critical timelines and re-installed the
> cluster. However, we plan to do this recovery again and find the optimum
> procedure for recovery.

The noup flag can indeed 'help' here to prevent new maps from being produced.
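
For the record, the flags in question are just:

  # stop booting OSDs from being marked up (and producing new maps) while
  # they catch up on the osdmap backlog
  ceph osd set noup
  # ... once they have caught up:
  ceph osd unset noup
  # 'nodown' works the same way for OSDs that keep getting marked down
  ceph osd set nodown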

> Sage was commenting that there is another solution available in Luminous
> which would recover the OSDs at a much faster rate than the current one, by
> skipping some maps instead of going through them sequentially.

I am not aware of those improvements. Sage (or another dev) would need to 
comment on that.

Wido

> 
> Thanks,
> Muthu
> 
> On 20 March 2017 at 22:13, Wido den Hollander  wrote:
> 
> >
> > > Op 18 maart 2017 om 10:39 schreef Muthusamy Muthiah <
> > muthiah.muthus...@gmail.com>:
> > >
> > >
> > > Hi,
> > >
> > > We had similar issue on one of the 5 node cluster cluster again during
> > > recovery(200/335 OSDs are to be recovered)  , we see a lot of differences
> > > in the OSDmap epocs between OSD which is booting and the current one same
> > > is below,
> > >
> > > -  In the current situation the OSD are trying to register with
> > an
> > > old OSDMAP version *7620 * but he current version in the cluster is
> > > higher  *13102
> > > *version – as a result it takes longer for OSD to update to this version
> > ..
> > >
> >
> > Do you see these OSDs eating 100% CPU at that moment? Eg, could it be that
> > the CPUs are not fast enough to process all the map updates quick enough.
> >
> > iirc map updates are not processed multi-threaded.
> >
> > Wido
> >
> > >
> > > We also see 2017-03-18 09:19:04.628206 7f2056735700 0 --
> > > 10.139.4.69:6836/777372 >> - conn(0x7f20c1bfa800 :6836
> > > s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0 l=0).fault with nothing to
> > > send and in the half accept state just closed messages on many osds which
> > > are recovering.
> > >
> > > Suggestions would be helpful.
> > >
> > >
> > > Thanks,
> > >
> > > Muthu
> > >
> > > On 13 February 2017 at 18:14, Wido den Hollander  wrote:
> > >
> > > >
> > > > > Op 13 februari 2017 om 12:57 schreef Muthusamy Muthiah <
> > > > muthiah.muthus...@gmail.com>:
> > > > >
> > > > >
> > > > > Hi All,
> > > > >
> > > > > We also have same issue on one of our platforms which was upgraded
> > from
> > > > > 11.0.2 to 11.2.0 . The issue occurs on one node alone where CPU hits
> > 100%
> > > > > and OSDs of that node marked down. Issue not seen on cluster which
> > was
> > > > > installed from scratch with 11.2.0.
> > > > >
> > > >
> > > > How many maps is this OSD behind?
> > > >
> > > > Does it help if you set the nodown flag for a moment to let it catch
> > up?
> > > >
> > > > Wido
> > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > *[r...@cn3.c7.vna ~] # systemctl start ceph-osd@315.service
> > > > >  [r...@cn3.c7.vna ~] # cd /var/log/ceph/
> > > > > [r...@cn3.c7.vna ceph] # tail -f *osd*315.log 2017-02-13
> > 11:29:46.752897
> > > > > 7f995c79b940  0 
> > > > > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_
> > > > 64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/
> > > > centos7/MACHINE_SIZE/huge/release/11.2.0/rpm/el7/BUILD/
> > > > ceph-11.2.0/src/cls/hello/cls_hello.cc:296:
> > > > > loading cls_hello 2017-02-13 11:29:46.753065 7f995c79b940  0
> > _get_class
> > > > not
> > > > > permitted to load kvs 2017-02-13 11:29:46.757571 7f995c79b940  0
> > > > _get_class
> > > > > not permitted to load lua 2017-02-13 11:29:47.058720 7f995c79b940  0
> > > > > osd.315 44703 crush map has features 288514119978713088, adjusting
> > msgr
> > > > > requires for clients 2017-02-13 11:29:47.058728 7f995c79b940  0
> > osd.315
> > > > > 44703 crush map has features 288514394856620032 was 8705, adjusting
> > msgr
> > > > > requires for mons 2017-02-13 11:29:47.058732 7f995c79b940  0 osd.315
> > > > 44703
> > > > > crush map has features 288531987042664448, adjusting msgr requires
> > for
> > > > osds
> > > > > 2017-02-13 11:29:48.343979 7f995c79b940  0 osd.315 44703 load_pgs
> > > > > 2017-02-13 11:29:55.913550 7f995c79b940  0 osd.315 44703 load_pgs
> > opened
> > > > > 130 pgs 2017-02-13 11:29:55.913604 7f995c79b940  0 osd.315 44703
> > using 1
> > > > op
> > > > > queue with priority op cut off at 64. 2017-02-13 11:29:55.914102
> > > > > 7f995c79b940 -1 osd.315 44703 log_to_monitors {def

[ceph-users] Re: leveldb takes a lot of space

2017-03-27 Thread Chenyehua
@ Niv Azriel: What is your leveldb version, and has it been fixed now?
@ Wido den Hollander: I have also met a similar problem: the size of my
leveldb is about 17GB (300+ OSDs), and there are a lot of sst files (each
2MB) in /var/lib/ceph/mon. (An abnormal network situation had occurred.)
The leveldb version is 1.2 (Ubuntu 12.04, Ceph 0.94.5).

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido den Hollander
Sent: 27 March 2017 19:30
To: ceph-users@lists.ceph.com; Niv Azriel
Subject: Re: [ceph-users] leveldb takes a lot of space


> Op 26 maart 2017 om 9:44 schreef Niv Azriel :
>
>
> after network issues, ceph cluster fails.
> leveldb grows and takes a lot of space ceph mon cant write to leveldb
> because there is not enough space on filesystem.
> (there is a lot of ldb file on /var/lib/ceph/mon)
>

It is normal that the database will grow as the MON will keep all historic 
OSDMaps when one or more PGs are not active+clean

> ceph compact on start is not helping.
> my erasure-code is too big.
>
> how to fix it?

Make sure you have enough space available on your MONs, that is the main 
advise. Under normal operations <2GB should be enough, but it can grow much 
bigger.

On most clusters I design I make sure there is at least 200GB of space 
available on each MON on a fast DC-grade SSD.

Wido

> thanks in advanced
>
> ceph version: jewel
> os : ubuntu16.04
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-
This e-mail and its attachments contain confidential information from H3C, 
which is
intended only for the person or entity whose address is listed above. Any use 
of the
information contained herein in any way (including, but not limited to, total 
or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify 
the sender
by phone or email immediately and delete it!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Questions on rbd-mirror

2017-03-27 Thread Jason Dillaman
On Mon, Mar 27, 2017 at 4:00 AM, Dongsheng Yang
 wrote:
> Hi Fulvio,
>
> On 03/24/2017 07:19 PM, Fulvio Galeazzi wrote:
>
> Hallo, apologies for my (silly) questions, I did try to find some doc on
> rbd-mirror but was unable to, apart from a number of pages explaining how to
> install it.
>
> My environment is CenOS7 and Ceph 10.2.5.
>
> Can anyone help me understand a few minor things:
>
>  - is there a cleaner way to configure the user which will be used for
>rbd-mirror, other than editing the ExecStart in file
> /usr/lib/systemd/system/ceph-rbd-mirror@.service ?
>For example some line in ceph.conf... looks like the username
>defaults to the cluster name, am I right?
>
>
> It should just be "ceph", no matter what the cluster name is, if I read the
> code correctly.

The user id is passed in via the systemd instance name. For example,
if you wanted to use the "mirror" user id to connect to the local
cluster, you would run "systemctl enable ceph-rbd-mirror@mirror".

>  - is it possible to throttle mirroring? Sure, it's a crazy thing to do
>for "cinder" pools, but may make sense for slowly changing ones, like
>a "glance" pool.
>
>
> The rbd core team is working on this. Jason, right?

This is in our backlog of desired items for the rbd-mirror daemon.
Having different settings for different pools was not in our original
plan, but this is something that also came up during the Vault
conference last week. I've added an additional backlog item to cover
per-pool settings.

>  - is it possible to set per-pool default features? I read about
> "rbd default features = ###"
>but this is a global setting. (Ok, I can still restrict pools to be
>mirrored with "ceph auth" for the user doing mirroring)
>
>
> "per-pool default features" sounds like a reasonable feature request.
>
> About the "ceph auth" for mirroring, I am working on a rbd acl design,
> will consider pool-level, namespace-level and image-level. Then I think
> we can do a permission check on this.

Right now, the best way to achieve that is by using different configs
/ user ids for different services. For example, if OpenStack glance
used "glance" and cinder user "cinder", the ceph.conf's
"[client.glance]" section could have different default features as
compared to a "[client.cinder]" section.

> Thanx
> Yang
>
>
>
>   Thanks!
>
> Fulvio
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Questions on rbd-mirror

2017-03-27 Thread Dongsheng Yang

Jason,
do you think it's a good idea to introduce an rbd_config object to record
some per-pool configuration, such as default_features?

That way we could set some configuration differently for different pools,
and we could also handle the per-pool settings in rbd-mirror this way.

Thanx
Yang

On 27/03/2017, 21:20, Jason Dillaman wrote:

On Mon, Mar 27, 2017 at 4:00 AM, Dongsheng Yang
 wrote:

Hi Fulvio,

On 03/24/2017 07:19 PM, Fulvio Galeazzi wrote:

Hallo, apologies for my (silly) questions, I did try to find some doc on
rbd-mirror but was unable to, apart from a number of pages explaining how to
install it.

My environment is CenOS7 and Ceph 10.2.5.

Can anyone help me understand a few minor things:

  - is there a cleaner way to configure the user which will be used for
rbd-mirror, other than editing the ExecStart in file
/usr/lib/systemd/system/ceph-rbd-mirror@.service ?
For example some line in ceph.conf... looks like the username
defaults to the cluster name, am I right?


It should just be "ceph", no matter what the cluster name is, if I read the
code correctly.

The user id is passed in via the systemd instance name. For example,
if you wanted to use the "mirror" user id to connect to the local
cluster, you would run "systemctl enable ceph-rbd-mirror@mirror".


  - is it possible to throttle mirroring? Sure, it's a crazy thing to do
for "cinder" pools, but may make sense for slowly changing ones, like
a "glance" pool.


The rbd core team is working on this. Jason, right?

This is in our backlog of desired items for the rbd-mirror daemon.
Having different settings for different pools was not in our original
plan, but this is something that also came up during the Vault
conference last week. I've added an additional backlog item to cover
per-pool settings.


  - is it possible to set per-pool default features? I read about
 "rbd default features = ###"
but this is a global setting. (Ok, I can still restrict pools to be
mirrored with "ceph auth" for the user doing mirroring)


"per-pool default features" sounds like a reasonable feature request.

About the "ceph auth" for mirroring, I am working on a rbd acl design,
will consider pool-level, namespace-level and image-level. Then I think
we can do a permission check on this.

Right now, the best way to achieve that is by using different configs
/ user ids for different services. For example, if OpenStack glance
used "glance" and cinder user "cinder", the ceph.conf's
"[client.glance]" section could have different default features as
compared to a "[client.cinder]" section.


Thanx
Yang



   Thanks!

 Fulvio



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com








___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] libjemalloc.so.1 not used?

2017-03-27 Thread Engelmann Florian
Hi,

we are testing Ceph as block storage (XFS-based OSDs) running in a
hyper-converged setup with KVM as hypervisor. We are using NVMe SSDs only
(Intel DC P5320) and I would like to use jemalloc on Ubuntu xenial (current
kernel 4.4.0-64-generic). I tried to use /etc/default/ceph and uncommented:


# /etc/default/ceph
#
# Environment file for ceph daemon systemd unit files.
#

# Increase tcmalloc cache size
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728

## use jemalloc instead of tcmalloc
#
# jemalloc is generally faster for small IO workloads and when
# ceph-osd is backed by SSDs.  However, memory usage is usually
# higher by 200-300mb.
#
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1

and it looks like the OSDs are using jemalloc:

lsof | grep -e "ceph-osd.*8074.*malloc"
ceph-osd   8074        ceph  mem  REG  252,0  294776  659213  /usr/lib/libtcmalloc.so.4.2.6
ceph-osd   8074        ceph  mem  REG  252,0  219816  658861  /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
ceph-osd   8074  8116  ceph  mem  REG  252,0  294776  659213  /usr/lib/libtcmalloc.so.4.2.6
ceph-osd   8074  8116  ceph  mem  REG  252,0  219816  658861  /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
ceph-osd   8074  8117  ceph  mem  REG  252,0  294776  659213  /usr/lib/libtcmalloc.so.4.2.6
ceph-osd   8074  8117  ceph  mem  REG  252,0  219816  658861  /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
ceph-osd   8074  8118  ceph  mem  REG  252,0  294776  659213  /usr/lib/libtcmalloc.so.4.2.6
ceph-osd   8074  8118  ceph  mem  REG  252,0  219816  658861  /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
[...]

But perf top shows something different:

Samples: 11M of event 'cycles:pp', Event count (approx.): 603904862529620

Overhead  Shared Object         Symbol
   1.86%  libtcmalloc.so.4.2.6  [.] operator new[]
   1.73%  [kernel]              [k] mem_cgroup_iter
   1.34%  libstdc++.so.6.0.21   [.] std::__ostream_insert
   1.29%  libpthread-2.23.so    [.] pthread_mutex_lock
   1.10%  [kernel]              [k] __switch_to
   0.97%  libpthread-2.23.so    [.] pthread_mutex_unlock
   0.94%  [kernel]              [k] native_queued_spin_lock_slowpath
   0.92%  [kernel]              [k] update_cfs_shares
   0.90%  libc-2.23.so          [.] __memcpy_avx_unaligned
   0.87%  libtcmalloc.so.4.2.6  [.] operator delete[]
   0.80%  ceph-osd              [.] ceph::buffer::ptr::release
   0.80%  [kernel]              [k] mem_cgroup_zone_lruvec


Do my OSDs use jemalloc or don't they?
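
For what it's worth, here is what I can check locally (pid 8074 taken from
the lsof output above); note that both libraries may legitimately be mapped,
since LD_PRELOAD pulls in jemalloc while tcmalloc can remain a link-time
dependency, but tcmalloc symbols doing the new/delete work in perf top
usually suggests tcmalloc is still servicing the allocations:

  # which allocator libraries are mapped into the running OSD?
  grep -E 'jemalloc|tcmalloc' /proc/8074/maps | awk '{print $NF}' | sort -u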

All the best,
Florian




EveryWare AG
Florian Engelmann
Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

T  +41 44 466 60 00
F  +41 44 466 60 10

florian.engelm...@everyware.ch
www.everyware.ch


smime.p7s
Description: S/MIME cryptographic signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New hardware for OSDs

2017-03-27 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Wido den Hollander
> Sent: 27 March 2017 12:35
> To: ceph-users@lists.ceph.com; Christian Balzer 
> Subject: Re: [ceph-users] New hardware for OSDs
> 
> 
> > Op 27 maart 2017 om 13:22 schreef Christian Balzer :
> >
> >
> >
> > Hello,
> >
> > On Mon, 27 Mar 2017 12:27:40 +0200 Mattia Belluco wrote:
> >
> > > Hello all,
> > > we are currently in the process of buying new hardware to expand an
> > > existing Ceph cluster that already has 1200 osds.
> >
> > That's quite sizable, is the expansion driven by the need for more
> > space (big data?) or to increase IOPS (or both)?
> >
> > > We are currently using 24 * 4 TB SAS drives per osd with an SSD
> > > journal shared among 4 osds. For the upcoming expansion we were
> > > thinking of switching to either 6 or 8 TB hard drives (9 or 12 per
> > > host) in order to drive down space and cost requirements.
> > >
> > > Has anyone any experience in mid-sized/large-sized deployment using
> > > such hard drives? Our main concern is the rebalance time but we
> > > might be overlooking some other aspects.
> > >
> >
> > If you researched the ML archives, you should already know to stay
> > well away from SMR HDDs.
> >
> 
> Amen! Just don't. Stay away from SMR with Ceph.
> 
> > Both HGST and Seagate have large Enterprise HDDs that have
> > journals/caches (MediaCache in HGST speak IIRC) that drastically
> > improve write IOPS compared to plain HDDs.
> > Even with SSD journals you will want to consider those, as these new
> > HDDs will see at least twice the action than your current ones.
> >

I've got a mixture of WD Red Pro 6TB and HGST He8 8TB drives. Recovery for
~70% full disks takes around 3-4 hours, this is for a cluster containing 60
OSD's. I'm usually seeing recovery speeds up around 1GB/s or more.

Depends on your workload, mine is for archiving/backups so big disks are a
must. I wouldn't recommend using them for more active workloads unless you
are planning a beefy cache tier or some other sort of caching solution.

The He8 (and He10) drives also use a fair bit less power due to less
friction, but I think this only applies to the sata model. My 12x3.5 8TB
node with CPU...etc uses ~140W at idle. Hoping to get this down further with
a new Xeon-D design on next expansion phase.

The only thing I will say about big disks is beware of cold FS
inodes/dentries and PG splitting. The former isn't a problem if you will
only be actively accessing a small portion of your data, but I see increases
in latency if I access cold data even with VFS cache pressure set to 1.
Currently investigating using bcache under the OSD to try and cache this.

PG splitting becomes a problem when the disks start to fill up, playing with
the split/merge thresholds may help, but you have to be careful you don't
end up with massive splits when they do finally happen, as otherwise OSD's
start timing out.
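
For reference, the thresholds in question are these filestore options (the
values here are only an example, not a recommendation):

  # ceph.conf, [osd] section
  filestore merge threshold = 40
  filestore split multiple = 8
  # a subdirectory splits at roughly
  #   split_multiple * abs(merge_threshold) * 16 objects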

> 
> I also have good experiences with bcache on NVM-E device in Ceph clusters.
> A single Intel P3600/P3700 which is the caching device for bcache.
> 
> > Rebalance time is a concern of course, especially if your cluster like
> > most HDD based ones has these things throttled down to not impede
> > actual client I/O.
> >
> > To get a rough idea, take a look at:
> > https://www.memset.com/tools/raid-calculator/
> >
> > For Ceph with replication 3 and the typical PG distribution, assume
> > 100 disks and the RAID6 with hotspares numbers are relevant.
> > For rebuild speed, consult your experience, you must have had a few
> > failures. ^o^
> >
> > For example with a recovery speed of 100MB/s, a 1TB disk (used data
> > with Ceph actually) looks decent at 1:16000 DLO/y.
> > At 5TB though it enters scary land
> >
> 
> Yes, those recoveries will take a long time. Let's say your 6TB drive is
filled for
> 80% you need to rebalance 4.8TB
> 
> 4.8TB / 100MB/sec = 13 hours rebuild time
> 
> 13 hours is a long time. And you will probably not have 100MB/sec
> sustained, I think that 50MB/sec is much more realistic.

Are we talking backfill or recovery here? Recovery will go at the combined
speed of all the disks in the cluster. If the OP's cluster is already at
1200 OSD's, a single disk will be a tiny percentage per OSD to recover. But
yes, backfill will probably crawl along at 50MB/s, but is this a problem?

> 
> That means that a single disk failure will take >24 hours to recover from
a
> rebuild.
> 
> I don't like very big disks that much. Not in RAID, not in Ceph.
> 
> Wido
> 
> > Christian
> >
> > > We currently use the cluster as storage for openstack services:
> > > Glance, Cinder and VMs' ephemeral disks.
> > >
> > > Thanks in advance for any advice.
> > >
> > > Mattia
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> >
> >
> > --
> > Christian BalzerNetwork/S

Re: [ceph-users] New hardware for OSDs

2017-03-27 Thread Mattia Belluco
I mistakenly replied to Wido instead of the whole mailing list (weird ML
settings, I suppose).

Here it is my message:


Thanks for replying so quickly. I commented inline.

On 03/27/2017 01:34 PM, Wido den Hollander wrote:
> 
>> Op 27 maart 2017 om 13:22 schreef Christian Balzer :
>>
>>
>>
>> Hello,
>>
>> On Mon, 27 Mar 2017 12:27:40 +0200 Mattia Belluco wrote:
>>
>>> Hello all,
>>> we are currently in the process of buying new hardware to expand an
>>> existing Ceph cluster that already has 1200 osds.
>>
>> That's quite sizable, is the expansion driven by the need for more space
>> (big data?) or to increase IOPS (or both)?
>>
>>> We are currently using 24 * 4 TB SAS drives per osd with an SSD journal
>>> shared among 4 osds. For the upcoming expansion we were thinking of
>>> switching to either 6 or 8 TB hard drives (9 or 12 per host) in order to
>>> drive down space and cost requirements.
>>>
>>> Has anyone any experience in mid-sized/large-sized deployment using such
>>> hard drives? Our main concern is the rebalance time but we might be
>>> overlooking some other aspects.
>>>
>>
>> If you researched the ML archives, you should already know to stay well
>> away from SMR HDDs. 
>>
> 
> Amen! Just don't. Stay away from SMR with Ceph.
> 
We were planning on using regular enterprise disks. No SMR :)
We are a bit puzzled about the possible performance gain of the 4k-native
ones, but that's about it.

>> Both HGST and Seagate have large Enterprise HDDs that have
>> journals/caches (MediaCache in HGST speak IIRC) that drastically improve
>> write IOPS compared to plain HDDs.
>> Even with SSD journals you will want to consider those, as these new HDDs
>> will see at least twice the action than your current ones. 
>>
> 
> I also have good experiences with bcache on NVM-E device in Ceph clusters. A 
> single Intel P3600/P3700 which is the caching device for bcache.
> 
No experience with those but I am a bit skeptical in including new
solutions in the current cluster as the current setup seems to work
quite well (no IOPS problem).
Those could be a nice solution for a new cluster, though.


>> Rebalance time is a concern of course, especially if your cluster like
>> most HDD based ones has these things throttled down to not impede actual
>> client I/O.
>>
>> To get a rough idea, take a look at:
>> https://www.memset.com/tools/raid-calculator/
>>
>> For Ceph with replication 3 and the typical PG distribution, assume 100
>> disks and the RAID6 with hotspares numbers are relevant.
>> For rebuild speed, consult your experience, you must have had a few
>> failures. ^o^
>>
>> For example with a recovery speed of 100MB/s, a 1TB disk (used data with
>> Ceph actually) looks decent at 1:16000 DLO/y. 
>> At 5TB though it enters scary land
>>
> 
> Yes, those recoveries will take a long time. Let's say your 6TB drive is 
> filled for 80% you need to rebalance 4.8TB
> 
> 4.8TB / 100MB/sec = 13 hours rebuild time
> 
> 13 hours is a long time. And you will probably not have 100MB/sec sustained, 
> I think that 50MB/sec is much more realistic.
> 
> That means that a single disk failure will take >24 hours to recover from a 
> rebuild.
> 
> I don't like very big disks that much. Not in RAID, not in Ceph.
I don't think I am following the calculations. Maybe I need to provide a few
more details on our current network configuration: each host (24 disks/OSDs)
has 4 * 10 Gbit interfaces, 2 for client I/O and 2 for the recovery network.
Rebalancing an OSD that was 50% full (2000GB) with the current setup took a
little less than 30 minutes. It would still take about 1.5 hours to rebalance
6 TB of data, but that should still be reasonable, no?
What am I overlooking here?
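
Rough arithmetic behind that estimate:

  # ~2000 GB rebalanced in ~30 minutes => implied aggregate rate
  $ echo "scale=2; 2000/(30*60)" | bc
  1.11            # ~1.1 GB/s across the cluster
  # time for 6 TB at the same rate
  $ echo "scale=0; 6000/1.11/60" | bc
  90              # ~90 minutes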

From our perspective, having 9 * 8TB nodes should provide a better recovery
time than the current 24 * 4TB ones if a whole node goes down, provided the
rebalance is shared among several hundred OSDs.

Thanks for any additional input.
Mattia


> 
> Wido
> 
>> Christian
>>
[snip]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osds down after upgrade hammer to jewel

2017-03-27 Thread Jaime Ibar

Hi all,

I'm upgrading a ceph cluster from Hammer 0.94.9 to Jewel 10.2.6.

The ceph cluster has 3 servers (one mon and one mds each) and another 6
servers with 12 osds each.
The monitors and mds have been successfully upgraded to the latest Jewel
release; however, after upgrading the first osd server (12 osds), ceph is not
aware of them and they are marked as down:

ceph -s

 cluster 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
 health HEALTH_WARN
[...]
12/72 in osds are down
noout flag(s) set
 osdmap e14010: 72 osds: 60 up, 72 in; 14641 remapped pgs
flags noout
[...]

ceph osd tree

3   3.64000 osd.3  down  1.0 1.0
 8   3.64000 osd.8  down  1.0 1.0
14   3.64000 osd.14 down  1.0 1.0
18   3.64000 osd.18 down  1.0  1.0
21   3.64000 osd.21 down  1.0  1.0
28   3.64000 osd.28 down  1.0  1.0
31   3.64000 osd.31 down  1.0  1.0
37   3.64000 osd.37 down  1.0  1.0
42   3.64000 osd.42 down  1.0  1.0
47   3.64000 osd.47 down  1.0  1.0
51   3.64000 osd.51 down  1.0  1.0
56   3.64000 osd.56 down  1.0  1.0

If I run this command with one of the down osd
ceph osd in 14
osd.14 is already in.
however ceph doesn't mark it as up and the cluster health remains
in a degraded state.

Do I have to upgrade all the osds to jewel first?
Any help would be appreciated, as I'm running out of ideas.

Thanks
Jaime

--

Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | ja...@tchpc.tcd.ie
Tel: +353-1-896-3725

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw global quotas - how to set in jewel?

2017-03-27 Thread Graham Allan
I'm following up to myself here, but I'd love to hear if anyone knows 
how the global quotas can be set in jewel's radosgw. I haven't found 
anything which has an effect - the documentation says to use:


radosgw-admin region-map get > regionmap.json
...edit the json file
radosgw-admin region-map set < regionmap.json

but this has no effect on jewel. There doesn't seem to be any analogous 
function in the "period"-related commands which I think would be the 
right place to look for jewel.
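For reference, the closest Jewel-flavoured sequence I can guess at is the one
below; the zonegroup-map spelling and the final period commit are assumptions
on my part rather than anything the docs confirm:

radosgw-admin zonegroup-map get > zonegroupmap.json
...edit the bucket_quota / user_quota entries in the json...
radosgw-admin zonegroup-map set < zonegroupmap.json
radosgw-admin period update --commit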


Am I missing something, or should I open a bug?

Graham

On 03/21/2017 03:18 PM, Graham Allan wrote:

On 03/17/2017 11:47 AM, Casey Bodley wrote:


On 03/16/2017 03:47 PM, Graham Allan wrote:

This might be a dumb question, but I'm not at all sure what the
"global quotas" in the radosgw region map actually do.

Is it like a default quota which is applied to all users or buckets,
without having to set them individually, or is it a blanket/aggregate
quota applied across all users and buckets in the region/zonegroup?

Graham


They're defaults that are applied in the absence of quota settings on
specific users/buckets, not aggregate quotas. I agree that the
documentation in http://docs.ceph.com/docs/master/radosgw/admin/ is not
clear about the relationship between 'default quotas' and 'global
quotas' - they're basically the same thing, except for their scope.


Thanks, that's great to know, and exactly what I hoped it would do. It
seemed most likely but not 100% obvious!

My next question is how to set/enable the master quota, since I'm not
sure that the documented procedure still works for jewel. Although
radosgw-admin doesn't acknowledge the "region-map" command in its help
output any more, it does accept it; however, the "region-map set" appears
to have no effect.

I think I should be using the radosgw-admin period commands, but it's
not clear to me how I can update the quotas within the period_config

G.


--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-27 Thread Hall, Eric
In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel), using 
libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and ceph hosts, 
we occasionally see hung processes (usually during boot, but otherwise as 
well), with errors reported in the instance logs as shown below.  Configuration 
is vanilla, based on openstack/ceph docs.

Neither the compute hosts nor the ceph hosts appear to be overloaded in terms 
of memory or network bandwidth, none of the 67 osds are over 80% full, nor do 
any of them appear to be overwhelmed in terms of IO.  Compute hosts and ceph 
cluster are connected via a relatively quiet 1Gb network, with an IBoE net 
between the ceph nodes.  Neither network appears overloaded.

I don’t see any related (to my eye) errors in client or server logs, even with 
20/20 logging from various components (rbd, rados, client, objectcacher, etc.)  
I’ve increased the qemu file descriptor limit (currently 64k... overkill for 
sure.)

It “feels” like a performance problem, but I can’t find any capacity issues or 
constraining bottlenecks. 

Any suggestions or insights into this situation are appreciated.  Thank you for 
your time,
--
Eric


[Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more than 120 
seconds.
[Fri Mar 24 20:30:40 2017]       Not tainted 3.13.0-52-generic #85-Ubuntu
[Fri Mar 24 20:30:40 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[Fri Mar 24 20:30:40 2017] jbd2/vda1-8     D 88043fd13180     0   226      
2 0x
[Fri Mar 24 20:30:40 2017]  88003728bbd8 0046 88042690 
88003728bfd8
[Fri Mar 24 20:30:40 2017]  00013180 00013180 88042690 
88043fd13a18
[Fri Mar 24 20:30:40 2017]  88043ffb9478 0002 811ef7c0 
88003728bc50
[Fri Mar 24 20:30:40 2017] Call Trace:
[Fri Mar 24 20:30:40 2017]  [] ? generic_block_bmap+0x50/0x50
[Fri Mar 24 20:30:40 2017]  [] io_schedule+0x9d/0x140
[Fri Mar 24 20:30:40 2017]  [] sleep_on_buffer+0xe/0x20
[Fri Mar 24 20:30:40 2017]  [] __wait_on_bit+0x62/0x90
[Fri Mar 24 20:30:40 2017]  [] ? generic_block_bmap+0x50/0x50
[Fri Mar 24 20:30:40 2017]  [] 
out_of_line_wait_on_bit+0x77/0x90
[Fri Mar 24 20:30:40 2017]  [] ? 
autoremove_wake_function+0x40/0x40
[Fri Mar 24 20:30:40 2017]  [] __wait_on_buffer+0x2a/0x30
[Fri Mar 24 20:30:40 2017]  [] 
jbd2_journal_commit_transaction+0x185d/0x1ab0
[Fri Mar 24 20:30:40 2017]  [] ? 
try_to_del_timer_sync+0x4f/0x70
[Fri Mar 24 20:30:40 2017]  [] kjournald2+0xbd/0x250
[Fri Mar 24 20:30:40 2017]  [] ? 
prepare_to_wait_event+0x100/0x100
[Fri Mar 24 20:30:40 2017]  [] ? commit_timeout+0x10/0x10
[Fri Mar 24 20:30:40 2017]  [] kthread+0xd2/0xf0
[Fri Mar 24 20:30:40 2017]  [] ? 
kthread_create_on_node+0x1c0/0x1c0
[Fri Mar 24 20:30:40 2017]  [] ret_from_fork+0x7c/0xb0
[Fri Mar 24 20:30:40 2017]  [] ? 
kthread_create_on_node+0x1c0/0x1c0



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph OSD network with IPv6 SLAAC networks?

2017-03-27 Thread Richard Hesse
Has anyone run their Ceph OSD cluster network on IPv6 using SLAAC? I know
that ceph supports IPv6, but I'm not sure how it would deal with the
address rotation in SLAAC, permanent vs outgoing address, etc. It would be
very nice for me, as I wouldn't have to run any kind of DHCP server or use
static addressing -- just configure RA's and go.
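For context, the sort of minimal IPv6-only config I have in mind (prefixes
are placeholders); my worry is exactly that the daemons bind whatever SLAAC
address happens to match these prefixes at start-up, so that address would
need to stay stable for the daemon's lifetime:

[global]
ms bind ipv6 = true
public network  = 2001:db8:1::/64
cluster network = 2001:db8:2::/64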

On that note, does anyone have any experience with running ceph in a mixed
v4 and v6 environment?

Thanks,
-richard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] disk timeouts in libvirt/qemu VMs...

2017-03-27 Thread Peter Maloney
I can't guarantee it's the same as my issue, but from that it sounds the
same.

Jewel 10.2.4, 10.2.5 tested
hypervisors are proxmox qemu-kvm, using librbd
3 ceph nodes with mon+osd on each

-faster journals, more disks, bcache, rbd_cache, fewer VMs on ceph, iops
and bw limits on client side, jumbo frames, etc. all improve/smooth out
performance and mitigate the hangs, but don't prevent it.
-hangs are usually associated with blocked requests (I set the complaint
time to 5s to see them)
-hangs are very easily caused by rbd snapshot + rbd export-diff to do
incremental backup (one snap persistent, plus one more during backup)
-when qemu VM io hangs, I have to kill -9 the qemu process for it to
stop. Some broken VMs don't appear to be hung until I try to live
migrate them (live migrating all VMs helped test solutions)

Finally I have a workaround... disable exclusive-lock, object-map, and
fast-diff rbd features (and restart clients via live migrate).
(object-map and fast-diff appear to have no effect on diff or export-diff
... so I don't miss them). I'll file a bug at some point (after I move
all VMs back and see if it is still stable). And one other user on IRC
said this solved the same problem (also using rbd snapshots).
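Spelled out, the workaround looks roughly like this (pool/image name is a
placeholder; the features have to be disabled in dependency order, fast-diff
before object-map before exclusive-lock):

rbd feature disable rbd/vm-100-disk-1 fast-diff
rbd feature disable rbd/vm-100-disk-1 object-map
rbd feature disable rbd/vm-100-disk-1 exclusive-lock
rbd info rbd/vm-100-disk-1 | grep features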

And strangely, they don't seem to hang if I put back those features,
until a few days later (making testing much less easy...but now I'm very
sure removing them prevents the issue)

I hope this works for you (and maybe gets some attention from devs too),
so you don't waste months like me.

On 03/27/17 19:31, Hall, Eric wrote:
> In an OpenStack (mitaka) cloud, backed by a ceph cluster (10.2.6 jewel), 
> using libvirt/qemu (1.3.1/2.5) hypervisors on Ubuntu 14.04.5 compute and ceph 
> hosts, we occasionally see hung processes (usually during boot, but otherwise 
> as well), with errors reported in the instance logs as shown below.  
> Configuration is vanilla, based on openstack/ceph docs.
>
> Neither the compute hosts nor the ceph hosts appear to be overloaded in terms 
> of memory or network bandwidth, none of the 67 osds are over 80% full, nor do 
> any of them appear to be overwhelmed in terms of IO.  Compute hosts and ceph 
> cluster are connected via a relatively quiet 1Gb network, with an IBoE net 
> between the ceph nodes.  Neither network appears overloaded.
>
> I don’t see any related (to my eye) errors in client or server logs, even 
> with 20/20 logging from various components (rbd, rados, client, objectcacher, 
> etc.)  I’ve increased the qemu file descriptor limit (currently 64k... 
> overkill for sure.)
>
> I “feels” like a performance problem, but I can’t find any capacity issues or 
> constraining bottlenecks. 
>
> Any suggestions or insights into this situation are appreciated.  Thank you 
> for your time,
> --
> Eric
>
>
> [Fri Mar 24 20:30:40 2017] INFO: task jbd2/vda1-8:226 blocked for more than 
> 120 seconds.
> [Fri Mar 24 20:30:40 2017]   Not tainted 3.13.0-52-generic #85-Ubuntu
> [Fri Mar 24 20:30:40 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
> [Fri Mar 24 20:30:40 2017] jbd2/vda1-8 D 88043fd13180 0   226 
>  2 0x
> [Fri Mar 24 20:30:40 2017]  88003728bbd8 0046 
> 88042690 88003728bfd8
> [Fri Mar 24 20:30:40 2017]  00013180 00013180 
> 88042690 88043fd13a18
> [Fri Mar 24 20:30:40 2017]  88043ffb9478 0002 
> 811ef7c0 88003728bc50
> [Fri Mar 24 20:30:40 2017] Call Trace:
> [Fri Mar 24 20:30:40 2017]  [] ? 
> generic_block_bmap+0x50/0x50
> [Fri Mar 24 20:30:40 2017]  [] io_schedule+0x9d/0x140
> [Fri Mar 24 20:30:40 2017]  [] sleep_on_buffer+0xe/0x20
> [Fri Mar 24 20:30:40 2017]  [] __wait_on_bit+0x62/0x90
> [Fri Mar 24 20:30:40 2017]  [] ? 
> generic_block_bmap+0x50/0x50
> [Fri Mar 24 20:30:40 2017]  [] 
> out_of_line_wait_on_bit+0x77/0x90
> [Fri Mar 24 20:30:40 2017]  [] ? 
> autoremove_wake_function+0x40/0x40
> [Fri Mar 24 20:30:40 2017]  [] __wait_on_buffer+0x2a/0x30
> [Fri Mar 24 20:30:40 2017]  [] 
> jbd2_journal_commit_transaction+0x185d/0x1ab0
> [Fri Mar 24 20:30:40 2017]  [] ? 
> try_to_del_timer_sync+0x4f/0x70
> [Fri Mar 24 20:30:40 2017]  [] kjournald2+0xbd/0x250
> [Fri Mar 24 20:30:40 2017]  [] ? 
> prepare_to_wait_event+0x100/0x100
> [Fri Mar 24 20:30:40 2017]  [] ? commit_timeout+0x10/0x10
> [Fri Mar 24 20:30:40 2017]  [] kthread+0xd2/0xf0
> [Fri Mar 24 20:30:40 2017]  [] ? 
> kthread_create_on_node+0x1c0/0x1c0
> [Fri Mar 24 20:30:40 2017]  [] ret_from_fork+0x7c/0xb0
> [Fri Mar 24 20:30:40 2017]  [] ? 
> kthread_create_on_node+0x1c0/0x1c0
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Kraken release and RGW --> "S3 bucket lifecycle API has been added. Note that currently it only supports object expiration."

2017-03-27 Thread ceph . novice
Hi Cephers.

 

Couldn't find any special documentation about "S3 object expiration", so I assume it should work like AWS S3 (?!?) ... BUT ...

we have a test cluster based on 11.2.0 - Kraken, and I set some object expiration dates via CyberDuck and DragonDisk, but the objects are still there, days after the applied date/time. Am I missing something?
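For completeness, the rule I set via those clients amounts to something like
the one below when expressed through the plain S3 API (bucket name and
endpoint are placeholders, and whether the radosgw-admin "lc" commands exist
in exactly this form on 11.2.0 is an assumption on my part). Also note that
RGW only evaluates lifecycle rules periodically, so expiry would not be
immediate even when the rule is accepted:

cat > lc.json <<'EOF'
{ "Rules": [ { "ID": "expire-all", "Status": "Enabled", "Prefix": "",
               "Expiration": { "Days": 1 } } ] }
EOF
aws --endpoint-url http://rgw.example.com:7480 s3api put-bucket-lifecycle-configuration \
    --bucket testbucket --lifecycle-configuration file://lc.json
radosgw-admin lc list        # per-bucket lifecycle processing state, if supported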

 

Thanks & regards
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds down after upgrade hammer to jewel

2017-03-27 Thread George Mihaiescu
Make sure the OSD processes on the Jewel node are running. If you didn't change 
the ownership to user ceph, they won't start.
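In other words, something along these lines on the upgraded OSD host (a
minimal sketch assuming the stock systemd units and the default /var/lib/ceph
layout; adjust to your deployment):

systemctl stop ceph-osd.target            # or the individual ceph-osd@N units
chown -R ceph:ceph /var/lib/ceph /var/log/ceph
systemctl start ceph-osd.target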


> On Mar 27, 2017, at 11:53, Jaime Ibar  wrote:
> 
> Hi all,
> 
> I'm upgrading ceph cluster from Hammer 0.94.9 to jewel 10.2.6.
> 
> The ceph cluster has 3 servers (one mon and one mds each) and another 6 
> servers with
> 12 osds each.
> The monitoring and mds have been succesfully upgraded to latest jewel 
> release, however
> after upgrade the first osd server(12 osds), ceph is not aware of them and
> are marked as down
> 
> ceph -s
> 
> cluster 4a158d27-f750-41d5-9e7f-26ce4c9d2d45
> health HEALTH_WARN
> [...]
>12/72 in osds are down
>noout flag(s) set
> osdmap e14010: 72 osds: 60 up, 72 in; 14641 remapped pgs
>flags noout
> [...]
> 
> ceph osd tree
> 
> 3   3.64000 osd.3  down  1.0 1.0
> 8   3.64000 osd.8  down  1.0 1.0
> 14   3.64000 osd.14 down  1.0 1.0
> 18   3.64000 osd.18 down  1.0  1.0
> 21   3.64000 osd.21 down  1.0  1.0
> 28   3.64000 osd.28 down  1.0  1.0
> 31   3.64000 osd.31 down  1.0  1.0
> 37   3.64000 osd.37 down  1.0  1.0
> 42   3.64000 osd.42 down  1.0  1.0
> 47   3.64000 osd.47 down  1.0  1.0
> 51   3.64000 osd.51 down  1.0  1.0
> 56   3.64000 osd.56 down  1.0  1.0
> 
> If I run this command with one of the down osd
> ceph osd in 14
> osd.14 is already in.
> however ceph doesn't mark it as up and the cluster health remains
> in degraded state.
> 
> Do I have to upgrade all the osds to jewel first?
> Any help as I'm running out of ideas?
> 
> Thanks
> Jaime
> 
> -- 
> 
> Jaime Ibar
> High Performance & Research Computing, IS Services
> Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
> http://www.tchpc.tcd.ie/ | ja...@tchpc.tcd.ie
> Tel: +353-1-896-3725
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] libjemalloc.so.1 not used?

2017-03-27 Thread Alexandre DERUMIER
you need to recompile ceph with jemalloc, without the tcmalloc dev libraries installed.

LD_PRELOAD has never worked for jemalloc and ceph
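Roughly like this for the Jewel-era autotools build (flag spellings are from
memory and may need adjusting; on the newer CMake-based releases the
equivalent is -DALLOCATOR=jemalloc):

apt-get remove libgoogle-perftools-dev
apt-get install libjemalloc-dev
./autogen.sh
./configure --with-jemalloc --without-tcmalloc
make -j"$(nproc)"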


- Mail original -
De: "Engelmann Florian" 
À: "ceph-users" 
Envoyé: Lundi 27 Mars 2017 16:54:33
Objet: [ceph-users] libjemalloc.so.1 not used?

Hi, 

we are testing Ceph as block storage (XFS based OSDs) running in a hyper 
converged setup with KVM as hypervisor. We are using NVMe SSD only (Intel DC 
P5320) and I would like to use jemalloc on Ubuntu xenial (current kernel 
4.4.0-64-generic). I tried to use /etc/default/ceph and uncommented: 


# /etc/default/ceph 
# 
# Environment file for ceph daemon systemd unit files. 
# 

# Increase tcmalloc cache size 
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 

## use jemalloc instead of tcmalloc 
# 
# jemalloc is generally faster for small IO workloads and when 
# ceph-osd is backed by SSDs. However, memory usage is usually 
# higher by 200-300mb. 
# 
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 

and it looks like the OSDs are using jemalloc: 

lsof |grep -e "ceph-osd.*8074.*malloc" 
ceph-osd 8074 ceph mem REG 252,0 294776 659213 /usr/lib/libtcmalloc.so.4.2.6 
ceph-osd 8074 ceph mem REG 252,0 219816 658861 
/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 
ceph-osd 8074 8116 ceph mem REG 252,0 294776 659213 
/usr/lib/libtcmalloc.so.4.2.6 
ceph-osd 8074 8116 ceph mem REG 252,0 219816 658861 
/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 
ceph-osd 8074 8117 ceph mem REG 252,0 294776 659213 
/usr/lib/libtcmalloc.so.4.2.6 
ceph-osd 8074 8117 ceph mem REG 252,0 219816 658861 
/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 
ceph-osd 8074 8118 ceph mem REG 252,0 294776 659213 
/usr/lib/libtcmalloc.so.4.2.6 
ceph-osd 8074 8118 ceph mem REG 252,0 219816 658861 
/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 
[...] 

But perf top shows something different: 

Samples: 11M of event 'cycles:pp', Event count (approx.): 603904862529620 
Overhead Shared Object Symbol 
1.86% libtcmalloc.so.4.2.6 [.] operator new[] 
1.73% [kernel] [k] mem_cgroup_iter 
1.34% libstdc++.so.6.0.21 [.] std::__ostream_insert > 
1.29% libpthread-2.23.so [.] pthread_mutex_lock 
1.10% [kernel] [k] __switch_to 
0.97% libpthread-2.23.so [.] pthread_mutex_unlock 
0.94% [kernel] [k] native_queued_spin_lock_slowpath 
0.92% [kernel] [k] update_cfs_shares 
0.90% libc-2.23.so [.] __memcpy_avx_unaligned 
0.87% libtcmalloc.so.4.2.6 [.] operator delete[] 
0.80% ceph-osd [.] ceph::buffer::ptr::release 
0.80% [kernel] [k] mem_cgroup_zone_lruvec 


Do my OSDs use jemalloc or don't they? 

All the best, 
Florian 




EveryWare AG 
Florian Engelmann 
Systems Engineer 
Zurlindenstrasse 52a 
CH-8003 Zürich 

T +41 44 466 60 00 
F +41 44 466 60 10 

florian.engelm...@everyware.ch 
www.everyware.ch 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to check SMR vs PMR before buying disks?

2017-03-27 Thread Adam Carheden
What's the biggest PMR disk I can buy, and how do I tell if a disk is PMR?

I'm well aware that I shouldn't use SMR disks:
http://ceph.com/planet/do-not-use-smr-disks-with-ceph/

But newegg and the like don't seem to advertise SMR vs PMR and I can't
even find it on manufacturer's websites (at least not from Seagate).

Is there any way to tell? Is there a rule of thumb, such as "4T+ is
probably SMR" or "enterprise usually means PMR"?

Thanks
-- 
Adam Carheden

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to check SMR vs PMR before buying disks?

2017-03-27 Thread Christian Balzer
On Mon, 27 Mar 2017 17:32:53 -0600 Adam Carheden wrote:

> What's the biggest PMR disk I can buy, and how do I tell if a disk is PMR?
> 
> I'm well aware that I shouldn't use SMR disks:
> http://ceph.com/planet/do-not-use-smr-disks-with-ceph/
> 
> But newegg and the like don't seem to advertise SMR vs PMR and I can't
> even find it on manufacturer's websites (at least not from Seagate).
> 
You need to work on your google/website scouring foo.

http://www.seagate.com/enterprise-storage/hard-disk-drives/archive-hdd/#features

Clearly says SMR there, I would assume "archive" is a good hint, too.

> Is there any way to tell? Is there a rule of thumb, such "as 4T+ is
> probably SMR" or "enterprise usually means PMR"?
> 
Size isn't conclusive, enterprise and non-archive more so.

http://www.seagate.com/enterprise-storage/hard-disk-drives/enterprise-capacity-3-5-hdd/

"Proven conventional PMR technology backed by highest field reliability
ratings and an MTBF of 2M hours"

HTH,

Christian
-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Rakuten Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New hardware for OSDs

2017-03-27 Thread Christian Balzer

Hello,

On Mon, 27 Mar 2017 16:09:09 +0100 Nick Fisk wrote:

> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> > Wido den Hollander
> > Sent: 27 March 2017 12:35
> > To: ceph-users@lists.ceph.com; Christian Balzer 
> > Subject: Re: [ceph-users] New hardware for OSDs
> > 
> >   
> > > Op 27 maart 2017 om 13:22 schreef Christian Balzer :
> > >
> > >
> > >
> > > Hello,
> > >
> > > On Mon, 27 Mar 2017 12:27:40 +0200 Mattia Belluco wrote:
> > >  
> > > > Hello all,
> > > > we are currently in the process of buying new hardware to expand an
> > > > existing Ceph cluster that already has 1200 osds.  
> > >
> > > That's quite sizable, is the expansion driven by the need for more
> > > space (big data?) or to increase IOPS (or both)?
> > >  
> > > > We are currently using 24 * 4 TB SAS drives per osd with an SSD
> > > > journal shared among 4 osds. For the upcoming expansion we were
> > > > thinking of switching to either 6 or 8 TB hard drives (9 or 12 per
> > > > host) in order to drive down space and cost requirements.
> > > >
> > > > Has anyone any experience in mid-sized/large-sized deployment using
> > > > such hard drives? Our main concern is the rebalance time but we
> > > > might be overlooking some other aspects.
> > > >  
> > >
> > > If you researched the ML archives, you should already know to stay
> > > well away from SMR HDDs.
> > >  
> > 
> > Amen! Just don't. Stay away from SMR with Ceph.
> >   
> > > Both HGST and Seagate have large Enterprise HDDs that have
> > > journals/caches (MediaCache in HGST speak IIRC) that drastically
> > > improve write IOPS compared to plain HDDs.
> > > Even with SSD journals you will want to consider those, as these new
> > > HDDs will see at least twice the action than your current ones.
> > >  
> 
> I've got a mixture of WD Red Pro 6TB and HGST He8 8TB drives. Recovery for
> ~70% full disks takes around 3-4 hours, this is for a cluster containing 60
> OSD's. I'm usually seeing recovery speeds up around 1GB/s or more.
> 
Good data point.

How busy is your cluster at those times, client I/O impact?

> Depends on your workload, mine is for archiving/backups so big disks are a
> must. I wouldn't recommend using them for more active workloads unless you
> are planning a beefy cache tier or some other sort of caching solution.
> 
> The He8 (and He10) drives also use a fair bit less power due to less
> friction, but I think this only applies to the sata model. My 12x3.5 8TB
> node with CPU...etc uses ~140W at idle. Hoping to get this down further with
> a new Xeon-D design on next expansion phase.
> 
> The only thing I will say about big disks is beware of cold FS
> inodes/dentry's and PG splitting. The former isn't a problem if you will
> only be actively accessing a small portion of your data, but I see increases
> in latency if I access cold data even with VFS cache pressure set to 1.
> Currently investigating using bcache under the OSD to try and cache this.
> 

I've seen this kind of behavior on my (non-Ceph) mailbox servers. 
As in, the maximum SLAB space may not be large enough to hold all inodes
or the pagecache will eat into it over time when not constantly
referenced, despite cache pressure settings.
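A quick way to watch this, purely illustrative (the thresholds that matter
depend on RAM and workload):

slabtop -o -s c | head -n 15      # xfs_inode / dentry cache sizes
sysctl vm.vfs_cache_pressure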

> PG splitting becomes a problem when the disks start to fill up, playing with
> the split/merge thresholds may help, but you have to be careful you don't
> end up with massive splits when they do finally happen, as otherwise OSD's
> start timing out.
> 
Getting this right (and predictable) is one of the darker arts with Ceph.
OTOH it will go away with Bluestore (just to be replaced by other oddities
no doubt).
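For reference, the thresholds Nick refers to are these filestore options
(values below are only an example, not a recommendation; setting the split
threshold too high is exactly how you end up with the massive,
timeout-inducing splits mentioned above):

[osd]
filestore merge threshold = 40
filestore split multiple = 8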

> > 
> > I also have good experiences with bcache on NVM-E device in Ceph clusters.
> > A single Intel P3600/P3700 which is the caching device for bcache.
> >   
> > > Rebalance time is a concern of course, especially if your cluster like
> > > most HDD based ones has these things throttled down to not impede
> > > actual client I/O.
> > >
> > > To get a rough idea, take a look at:
> > > https://www.memset.com/tools/raid-calculator/
> > >
> > > For Ceph with replication 3 and the typical PG distribution, assume
> > > 100 disks and the RAID6 with hotspares numbers are relevant.
> > > For rebuild speed, consult your experience, you must have had a few
> > > failures. ^o^
> > >
> > > For example with a recovery speed of 100MB/s, a 1TB disk (used data
> > > with Ceph actually) looks decent at 1:16000 DLO/y.
> > > At 5TB though it enters scary land
> > >  
> > 
> > Yes, those recoveries will take a long time. Let's say your 6TB drive is  
> filled for
> > 80% you need to rebalance 4.8TB
> > 
> > 4.8TB / 100MB/sec = 13 hours rebuild time
> > 
> > 13 hours is a long time. And you will probably not have 100MB/sec
> > sustained, I think that 50MB/sec is much more realistic.  
> 
> Are we talking backfill or recovery here? Recovery will go at the combined
> speed of all the disks in the cluster. If the OP's cl

Re: [ceph-users] New hardware for OSDs

2017-03-27 Thread Christian Balzer

Hello,

On Mon, 27 Mar 2017 17:48:38 +0200 Mattia Belluco wrote:

I mistakenly replied to Wido instead of the whole mailing list (weird
ML settings, I suppose)
> 
> Here it is my message:
> 
> 
> Thanks for replying so quickly. I commented inline.
> 
> On 03/27/2017 01:34 PM, Wido den Hollander wrote:
> >   
> >> Op 27 maart 2017 om 13:22 schreef Christian Balzer :
> >>
> >>
> >>
> >> Hello,
> >>
> >> On Mon, 27 Mar 2017 12:27:40 +0200 Mattia Belluco wrote:
> >>  
> >>> Hello all,
> >>> we are currently in the process of buying new hardware to expand an
> >>> existing Ceph cluster that already has 1200 osds.  
> >>
> >> That's quite sizable, is the expansion driven by the need for more space
> >> (big data?) or to increase IOPS (or both)?
> >>  
> >>> We are currently using 24 * 4 TB SAS drives per osd with an SSD journal
> >>> shared among 4 osds. For the upcoming expansion we were thinking of
> >>> switching to either 6 or 8 TB hard drives (9 or 12 per host) in order to
> >>> drive down space and cost requirements.
> >>>
> >>> Has anyone any experience in mid-sized/large-sized deployment using such
> >>> hard drives? Our main concern is the rebalance time but we might be
> >>> overlooking some other aspects.
> >>>  
> >>
> >> If you researched the ML archives, you should already know to stay well
> >> away from SMR HDDs. 
> >>  
> > 
> > Amen! Just don't. Stay away from SMR with Ceph.
> >   
> We were planning on using regular enterprise disks. No SMR :)
> We are bit puzzled about the possible performance gain of the 4k native
> ones but that's about it.
> 
AFAIK Linux will do the right thing [TM] even with 512e (4K native, 512B
emulation) drives.

> >> Both HGST and Seagate have large Enterprise HDDs that have
> >> journals/caches (MediaCache in HGST speak IIRC) that drastically improve
> >> write IOPS compared to plain HDDs.
> >> Even with SSD journals you will want to consider those, as these new HDDs
> >> will see at least twice the action than your current ones. 
> >>  
> > 
> > I also have good experiences with bcache on NVM-E device in Ceph clusters. 
> > A single Intel P3600/P3700 which is the caching device for bcache.
> >   
> No experience with those but I am a bit skeptical in including new
> solutions in the current cluster as the current setup seems to work
> quite well (no IOPS problem).
> Those could be a nice solution for a new cluster, though.
> 
I have no experience (or no current experience, at least) with those either, and a
new cluster (as in late this year or early next year) would likely be
Bluestore based and thus have different needs, tuning knobs, etc.

> 
> >> Rebalance time is a concern of course, especially if your cluster like
> >> most HDD based ones has these things throttled down to not impede actual
> >> client I/O.
> >>
> >> To get a rough idea, take a look at:
> >> https://www.memset.com/tools/raid-calculator/
> >>
> >> For Ceph with replication 3 and the typical PG distribution, assume 100
> >> disks and the RAID6 with hotspares numbers are relevant.
> >> For rebuild speed, consult your experience, you must have had a few
> >> failures. ^o^
> >>
> >> For example with a recovery speed of 100MB/s, a 1TB disk (used data with
> >> Ceph actually) looks decent at 1:16000 DLO/y. 
> >> At 5TB though it enters scary land
> >>  
> > 
> > Yes, those recoveries will take a long time. Let's say your 6TB drive is 
> > filled for 80% you need to rebalance 4.8TB
> > 
> > 4.8TB / 100MB/sec = 13 hours rebuild time
> > 
> > 13 hours is a long time. And you will probably not have 100MB/sec 
> > sustained, I think that 50MB/sec is much more realistic.
> > 
> > That means that a single disk failure will take >24 hours to recover from a 
> > rebuild.
> > 
> > I don't like very big disks that much. Not in RAID, not in Ceph.  
> I don't think I am followinj the calculations. Maybe I need to provide a
> few more details on our current network configuration:
> each host (24 disks/osds) has 4 * 10 Gbit interfaces, 2 for client I/O
> and 2 for the recovery network.
> Rebalancing an OSD that was 50% full (2000GB) with the current setup
> tool a little less than 30 mins. It would still take 1.5 hour to
> rebalance 6 TB of data but that should still be reasonable,no?
> What am I overlooking here?
> 
We're playing devil's advocate here, not knowing your configuration.
And most of all, if your cluster is busy or busier than usual, those times
will go up.

Your numbers suggest a recovery speed of around 1GB/s, which is very nice
and something I'd expect (hope) to see from such a large cluster. 

Plugging that into the calculator above with 5TB gives us a 1:6500 DLO/y,
not utterly frightening but also quite a bit lower than your current
example with 2TB at 1:4.

> From our perspective having 9 * 8TB noded should provide a better
> recovery time than the current 24 * 4TB ones if a whole node goes down
> provide the rebalance is shared among several hundreds osds.
> 
You'll have 25% less data per nod

Re: [ceph-users] RBD image perf counters: usage, access

2017-03-27 Thread Masha Atakova

Hi Yang,

> Do you mean getting the perf counters via API? This counter is only for a
> particular ImageCtx (connected client), so you can read the counters with
> the perf dump command from my last mail, I think.

Yes, I did mean to get counters via API. And it looks like I can adapt this
admin-daemon command for my purposes. Thanks!
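What I have in mind is roughly the following (the socket path and the grep
pattern are illustrative, and the client needs an "admin socket = ..." entry
in its [client] section for the socket to exist at all):

ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok perf dump \
    | python -m json.tool | grep -A 20 '"librbd'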


Having ceph-top would be just great and much more useful for me, yes. 
I'm glad there are some discussions about that and I didn't know about 
them. So thanks for pointing me out :)



On 27/03/17 15:38, Dongsheng Yang wrote:


On 03/27/2017 04:06 PM, Masha Atakova wrote:


Hi Yang,


Hi Masha,


Thank you for your reply. It is indeed very useful to know that there are
many ImageCtx objects for one image.


But in my setting, I don't have any particular ceph client connected
to ceph (I could, but this is not the point). I'm trying to get
metrics for a particular image without performing anything on it
myself.




The perf counter you mentioned in your first mail is just for one
particular image client; that means these perf counters will disappear
when the client disconnects.


And I'm trying to get access to the performance counters listed in the
ImageCtx class; they don't seem to be reported by the perf tool.




Do you mean getting the perf counters via API? This counter is only for a
particular ImageCtx (connected client), so you can read the counters with
the perf dump command from my last mail, I think.


If you want to get the performance counter for an image (no matter how
many ImageCtxes, connected or disconnected), maybe you need to wait for
this one:
http://pad.ceph.com/p/ceph-top

Yang


Thanks!

On 27/03/17 12:29, Dongsheng Yang wrote:

Hi Masha
you can get the counters by perf dump command on the asok file 
of your client. such as that:

$ ceph --admin-daemon out/client.admin.9921.asok perf dump|grep rd
"rd": 656754,
"rd_bytes": 656754,
"rd_latency": {
"discard": 0,
"discard_bytes": 0,
"discard_latency": {
"omap_rd": 0,

But note that this is a counter for this one ImageCtx, not the counter
for the image. There may be several ImageCtxes reading or writing on the
same image.

Yang

On 03/27/2017 12:23 PM, Masha Atakova wrote:


Hi everyone,

I was going around trying to figure out how to get ceph metrics at a
more detailed level than daemons. Of course, I found and explored the
API for watching rados objects, but I'm more interested in getting
metrics about RBD images. And while I could get the list of objects for
a particular image and then watch all of them, that doesn't seem like a
very efficient way to go about it.


I checked librbd API and there isn't anything helping with my goal.

So I went through the source code and found the list of performance
counters for an image, which are incremented by other parts of ceph
when the corresponding operations are performed:
https://github.com/ceph/ceph/blob/master/src/librbd/ImageCtx.cc#L364


I have 2 questions about it:

1) Is there any workaround to use those counters right now, maybe by
compiling the code that does it against ceph? It looks like I need to be
able to access a particular ImageCtx object (instead of creating my
own), and I just can't find the appropriate class / part of librbd that
allows me to do so.


2) Are there any plans to make those counters accessible via an API
like librbd or librados?


I see that these questions might be more appropriate for the devel 
list, but:


- it seems to me that the question of getting ceph metrics is more
interesting to those who use ceph


- I couldn't subscribe to it; the error is provided below.

Thanks!

majord...@vger.kernel.org:
SMTP error from remote server for MAIL FROM command, host: vger.kernel.org 
(209.132.180.67) reason: 553 5.7.1 Hello [74.208.4.201], for your MAIL FROM 
address  policy analysis reported: Your address is not 
liked source for email



Re: [ceph-users] Ceph OSD network with IPv6 SLAAC networks?

2017-03-27 Thread Richard Hesse
Nix the second question: as I understand it, ceph doesn't work in mixed
IPv6 and legacy IPv4 environments.

Still, would like to hear from people running it in SLAAC environments.

On Mon, Mar 27, 2017 at 12:49 PM, Richard Hesse 
wrote:

> Has anyone run their Ceph OSD cluster network on IPv6 using SLAAC? I know
> that ceph supports IPv6, but I'm not sure how it would deal with the
> address rotation in SLAAC, permanent vs outgoing address, etc. It would be
> very nice for me, as I wouldn't have to run any kind of DHCP server or use
> static addressing -- just configure RA's and go.
>
> On that note, does anyone have any experience with running ceph in a mixed
> v4 and v6 environment?
>
> Thanks,
> -richard
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-rest-api's behavior

2017-03-27 Thread Mika c
Hi Brad,
   Thanks for your help. I found that was my problem: I had forgotten to end the
file name with the word "keyring".

And sorry to bother you again. Is it possible to create a minimum-privilege
client for the API to run?



Best wishes,
Mika


2017-03-24 19:32 GMT+08:00 Brad Hubbard :

> On Fri, Mar 24, 2017 at 8:20 PM, Mika c  wrote:
> > Hi Brad,
> >  Thanks for your reply. The environment already created keyring file
> and
> > put it in /etc/ceph but not working.
>
> What was it called?
>
> > I have to write config into ceph.conf like below.
> >
> > ---ceph.conf start---
> > [client.symphony]
> > log_file = /var/log/ceph/rest-api.log
> > keyring = /etc/ceph/ceph.client.symphony
> > public addr = 0.0.0.0:5000
> > restapi base url = /api/v0.1
> > ---ceph.conf end---
> >
> >
> > Another question: must I set capabilities for this client like admin?
> > But I just want to get some information like health or df.
> >
> > If this client is set up with particular capabilities like the following...
> > ------
> > client.symphony
> >    key: AQBP8NRYGehDKRAAzyChAvAivydLqRBsHeTPjg==
> >    caps: [mon] allow r
> >    caps: [osd] allow rx
> > ------
> > Error list:
> > Traceback (most recent call last):
> >  File "/usr/bin/ceph-rest-api", line 59, in 
> >rest,
> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 495, in
> > generate_a
> > pp
> >addr, port = api_setup(app, conf, cluster, clientname, clientid, args)
> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 146, in
> > api_setup
> >target=('osd', int(osdid)))
> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 84, in
> > get_command
> > _descriptions
> >raise EnvironmentError(ret, err)
> > EnvironmentError: [Errno -1] Can't get command descriptions:
> >
> >
> >
> >
> > Best wishes,
> > Mika
> >
> >
> > 2017-03-24 16:21 GMT+08:00 Brad Hubbard :
> >>
> >> On Fri, Mar 24, 2017 at 4:06 PM, Mika c  wrote:
> >> > Hi all,
> >> >  Same question with CEPH 10.2.3 and 11.2.0.
> >> >   Is this command only for client.admin ?
> >> >
> >> > client.symphony
> >> >key: AQD0tdRYjhABEhAAaG49VhVXBTw0MxltAiuvgg==
> >> >caps: [mon] allow *
> >> >caps: [osd] allow *
> >> >
> >> > Traceback (most recent call last):
> >> >  File "/usr/bin/ceph-rest-api", line 43, in 
> >> >rest,
> >> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 504,
> in
> >> > generate_a
> >> > pp
> >> >addr, port = api_setup(app, conf, cluster, clientname, clientid,
> >> > args)
> >> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 106,
> in
> >> > api_setup
> >> >app.ceph_cluster.connect()
> >> >  File "rados.pyx", line 811, in rados.Rados.connect
> >> > (/tmp/buildd/ceph-11.2.0/obj-x
> >> > 86_64-linux-gnu/src/pybind/rados/pyrex/rados.c:10178)
> >> > rados.ObjectNotFound: error connecting to the cluster
> >>
> >> # strace -eopen /bin/ceph-rest-api |& grep keyring
> >> open("/etc/ceph/ceph.client.restapi.keyring", O_RDONLY) = -1 ENOENT
> >> (No such file or directory)
> >> open("/etc/ceph/ceph.keyring", O_RDONLY) = -1 ENOENT (No such file or
> >> directory)
> >> open("/etc/ceph/keyring", O_RDONLY) = -1 ENOENT (No such file or
> >> directory)
> >> open("/etc/ceph/keyring.bin", O_RDONLY) = -1 ENOENT (No such file or
> >> directory)
> >>
> >> # ceph auth get-or-create client.restapi mon 'allow *' mds 'allow *'
> >> osd 'allow *' >/etc/ceph/ceph.client.restapi.keyring
> >>
> >> # /bin/ceph-rest-api
> >>  * Running on http://0.0.0.0:5000/
> >>
> >> >
> >> >
> >> >
> >> > Best wishes,
> >> > Mika
> >> >
> >> >
> >> > 2016-03-03 12:25 GMT+08:00 Shinobu Kinjo :
> >> >>
> >> >> Yes.
> >> >>
> >> >> On Wed, Jan 27, 2016 at 1:10 PM, Dan Mick  wrote:
> >> >> > Is the client.test-admin key in the keyring read by ceph-rest-api?
> >> >> >
> >> >> > On 01/22/2016 04:05 PM, Shinobu Kinjo wrote:
> >> >> >> Does anyone have any idea about that?
> >> >> >>
> >> >> >> Rgds,
> >> >> >> Shinobu
> >> >> >>
> >> >> >> - Original Message -
> >> >> >> From: "Shinobu Kinjo" 
> >> >> >> To: "ceph-users" 
> >> >> >> Sent: Friday, January 22, 2016 7:15:36 AM
> >> >> >> Subject: ceph-rest-api's behavior
> >> >> >>
> >> >> >> Hello,
> >> >> >>
> >> >> >> "ceph-rest-api" works greatly with client.admin.
> >> >> >> But with client.test-admin which I created just after building the
> >> >> >> Ceph
> >> >> >> cluster , it does not work.
> >> >> >>
> >> >> >>  ~$ ceph auth get-or-create client.test-admin mon 'allow *' mds
> >> >> >> 'allow
> >> >> >> *' osd 'allow *'
> >> >> >>
> >> >> >>  ~$ sudo ceph auth list
> >> >> >>  installed auth entries:
> >> >> >>...
> >> >> >>  client.test-admin
> >> >> >>   key: AQCOVaFWTYr2ORAAKwruANTLXqdHOchkVvRApg==
> >> >> >>   caps: [mds] allow *
> >> >> >>   caps: [mon] allow *
> 

Re: [ceph-users] ceph-rest-api's behavior

2017-03-27 Thread Brad Hubbard
I've copied Dan who may have some thoughts on this and has been
involved with this code.
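In the meantime, a rough way to probe for the minimum caps (the exact minimum
is the open question here, and the earlier traceback suggests mon 'allow r'
plus osd 'allow rx' is not enough, so treat this as a test loop rather than a
known-good answer):

ceph auth get-or-create client.restapi mon 'allow r' osd 'allow r' \
    > /etc/ceph/ceph.client.restapi.keyring
ceph-rest-api -n client.restapi        # watch for "Can't get command descriptions"
ceph auth caps client.restapi mon 'allow *' osd 'allow *'   # widen until it starts cleanly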

On Tue, Mar 28, 2017 at 3:58 PM, Mika c  wrote:
> Hi Brad,
>    Thanks for your help. I found that was my problem: I had forgotten to end
> the file name with the word "keyring".
>
> And sorry to bother you again. Is it possible to create a minimum privilege
> client for the api to run?
>
>
>
> Best wishes,
> Mika
>
>
> 2017-03-24 19:32 GMT+08:00 Brad Hubbard :
>>
>> On Fri, Mar 24, 2017 at 8:20 PM, Mika c  wrote:
>> > Hi Brad,
>> >  Thanks for your reply. The environment already created keyring file
>> > and
>> > put it in /etc/ceph but not working.
>>
>> What was it called?
>>
>> > I have to write config into ceph.conf like below.
>> >
>> > ---ceph.conf start---
>> > [client.symphony]
>> > log_file = /var/log/ceph/rest-api.log
>> > keyring = /etc/ceph/ceph.client.symphony
>> > public addr = 0.0.0.0:5000
>> > restapi base url = /api/v0.1
>> > ---ceph.conf end---
>> >
>> >
>> > Another question: must I set capabilities for this client like admin?
>> > But I just want to get some information like health or df.
>> >
>> > If this client is set up with particular capabilities like the following...
>> > ------
>> > client.symphony
>> >    key: AQBP8NRYGehDKRAAzyChAvAivydLqRBsHeTPjg==
>> >    caps: [mon] allow r
>> >    caps: [osd] allow rx
>> > ------
>> > Error list:
>> > Traceback (most recent call last):
>> >  File "/usr/bin/ceph-rest-api", line 59, in 
>> >rest,
>> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 495, in
>> > generate_a
>> > pp
>> >addr, port = api_setup(app, conf, cluster, clientname, clientid,
>> > args)
>> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 146, in
>> > api_setup
>> >target=('osd', int(osdid)))
>> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 84, in
>> > get_command
>> > _descriptions
>> >raise EnvironmentError(ret, err)
>> > EnvironmentError: [Errno -1] Can't get command descriptions:
>> >
>> >
>> >
>> >
>> > Best wishes,
>> > Mika
>> >
>> >
>> > 2017-03-24 16:21 GMT+08:00 Brad Hubbard :
>> >>
>> >> On Fri, Mar 24, 2017 at 4:06 PM, Mika c  wrote:
>> >> > Hi all,
>> >> >  Same question with CEPH 10.2.3 and 11.2.0.
>> >> >   Is this command only for client.admin ?
>> >> >
>> >> > client.symphony
>> >> >key: AQD0tdRYjhABEhAAaG49VhVXBTw0MxltAiuvgg==
>> >> >caps: [mon] allow *
>> >> >caps: [osd] allow *
>> >> >
>> >> > Traceback (most recent call last):
>> >> >  File "/usr/bin/ceph-rest-api", line 43, in 
>> >> >rest,
>> >> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 504,
>> >> > in
>> >> > generate_a
>> >> > pp
>> >> >addr, port = api_setup(app, conf, cluster, clientname, clientid,
>> >> > args)
>> >> >  File "/usr/lib/python2.7/dist-packages/ceph_rest_api.py", line 106,
>> >> > in
>> >> > api_setup
>> >> >app.ceph_cluster.connect()
>> >> >  File "rados.pyx", line 811, in rados.Rados.connect
>> >> > (/tmp/buildd/ceph-11.2.0/obj-x
>> >> > 86_64-linux-gnu/src/pybind/rados/pyrex/rados.c:10178)
>> >> > rados.ObjectNotFound: error connecting to the cluster
>> >>
>> >> # strace -eopen /bin/ceph-rest-api |& grep keyring
>> >> open("/etc/ceph/ceph.client.restapi.keyring", O_RDONLY) = -1 ENOENT
>> >> (No such file or directory)
>> >> open("/etc/ceph/ceph.keyring", O_RDONLY) = -1 ENOENT (No such file or
>> >> directory)
>> >> open("/etc/ceph/keyring", O_RDONLY) = -1 ENOENT (No such file or
>> >> directory)
>> >> open("/etc/ceph/keyring.bin", O_RDONLY) = -1 ENOENT (No such file or
>> >> directory)
>> >>
>> >> # ceph auth get-or-create client.restapi mon 'allow *' mds 'allow *'
>> >> osd 'allow *' >/etc/ceph/ceph.client.restapi.keyring
>> >>
>> >> # /bin/ceph-rest-api
>> >>  * Running on http://0.0.0.0:5000/
>> >>
>> >> >
>> >> >
>> >> >
>> >> > Best wishes,
>> >> > Mika
>> >> >
>> >> >
>> >> > 2016-03-03 12:25 GMT+08:00 Shinobu Kinjo :
>> >> >>
>> >> >> Yes.
>> >> >>
>> >> >> On Wed, Jan 27, 2016 at 1:10 PM, Dan Mick  wrote:
>> >> >> > Is the client.test-admin key in the keyring read by ceph-rest-api?
>> >> >> >
>> >> >> > On 01/22/2016 04:05 PM, Shinobu Kinjo wrote:
>> >> >> >> Does anyone have any idea about that?
>> >> >> >>
>> >> >> >> Rgds,
>> >> >> >> Shinobu
>> >> >> >>
>> >> >> >> - Original Message -
>> >> >> >> From: "Shinobu Kinjo" 
>> >> >> >> To: "ceph-users" 
>> >> >> >> Sent: Friday, January 22, 2016 7:15:36 AM
>> >> >> >> Subject: ceph-rest-api's behavior
>> >> >> >>
>> >> >> >> Hello,
>> >> >> >>
>> >> >> >> "ceph-rest-api" works greatly with client.admin.
>> >> >> >> But with client.test-admin which I created just after building
>> >> >> >> the
>> >> >> >> Ceph
>> >> >> >> cluster , it does not work.
>> >> >> >>
>> >> >> >>  ~$ ceph auth get-or-create client.test

Re: [ceph-users] XFS attempt to access beyond end of device

2017-03-27 Thread Marcus Furlong
On 22 March 2017 at 19:36, Brad Hubbard  wrote:
> On Wed, Mar 22, 2017 at 5:24 PM, Marcus Furlong  wrote:

>> [435339.965817] [ cut here ]
>> [435339.965874] WARNING: at fs/xfs/xfs_aops.c:1244
>> xfs_vm_releasepage+0xcb/0x100 [xfs]()
>> [435339.965876] Modules linked in: vfat fat uas usb_storage mpt3sas
>> mpt2sas raid_class scsi_transport_sas mptctl mptbase iptable_filter
>> dell_rbu team_mode_loadbalance team rpcrdma ib_isert iscsi_target_mod
>> ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp
>> scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
>> rdma_cm ib_cm iw_cm mlx5_ib ib_core intel_powerclamp coretemp
>> intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul
>> ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper
>> cryptd ipmi_devintf iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas pcspkr
>> ipmi_ssif sb_edac edac_core sg mei_me mei lpc_ich shpchp ipmi_si
>> ipmi_msghandler wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd
>> grace sunrpc ip_tables xfs sd_mod crc_t10dif crct10dif_generic mgag200
>> i2c_algo_bit
>> [435339.965942]  crct10dif_pclmul crct10dif_common drm_kms_helper
>> crc32c_intel syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm
>> bnx2x ahci libahci mlx5_core i2c_core libata mdio ptp megaraid_sas
>> nvme pps_core libcrc32c fjes dm_mirror dm_region_hash dm_log dm_mod
>> [435339.965991] CPU: 8 PID: 223 Comm: kswapd0 Not tainted
>> 3.10.0-514.10.2.el7.x86_64 #1
>> [435339.965993] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS
>> 2.3.4 11/08/2016
>> [435339.965994]   6ea9561d 881ffc2c7aa0
>> 816863ef
>> [435339.965998]  881ffc2c7ad8 81085940 ea00015d4e20
>> ea00015d4e00
>> [435339.966000]  880f4d7c5af8 881ffc2c7da0 ea00015d4e00
>> 881ffc2c7ae8
>> [435339.966003] Call Trace:
>> [435339.966010]  [] dump_stack+0x19/0x1b
>> [435339.966015]  [] warn_slowpath_common+0x70/0xb0
>> [435339.966018]  [] warn_slowpath_null+0x1a/0x20
>> [435339.966060]  [] xfs_vm_releasepage+0xcb/0x100 [xfs]
>> [435339.966120]  [] try_to_release_page+0x32/0x50
>> [435339.966128]  [] shrink_active_list+0x3d6/0x3e0
>> [435339.966133]  [] shrink_lruvec+0x3f1/0x770
>> [435339.966138]  [] shrink_zone+0x76/0x1a0
>> [435339.966143]  [] balance_pgdat+0x48c/0x5e0
>> [435339.966147]  [] kswapd+0x173/0x450
>> [435339.966155]  [] ? wake_up_atomic_t+0x30/0x30
>> [435339.966158]  [] ? balance_pgdat+0x5e0/0x5e0
>> [435339.966161]  [] kthread+0xcf/0xe0
>> [435339.966165]  [] ? kthread_create_on_node+0x140/0x140
>> [435339.966170]  [] ret_from_fork+0x58/0x90
>> [435339.966173]  [] ? kthread_create_on_node+0x140/0x140
>> [435339.966175] ---[ end trace 58233bbca77fd5e2 ]---
>
> With regards to the above stack trace,
> https://bugzilla.redhat.com/show_bug.cgi?id=1079818 was opened, and
> remains open, for the same stack. I would suggest discussing this
> issue with your kernel support organisation as it is likely unrelated
> to the sizing issue IIUC.

Hi Brad,

Thanks for clarifying that. That bug is not public. Is there any
workaround mentioned in it?

Cheers,
Marcus.

-- 
Marcus Furlong
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] XFS attempt to access beyond end of device

2017-03-27 Thread Brad Hubbard
On Tue, Mar 28, 2017 at 4:22 PM, Marcus Furlong  wrote:
> On 22 March 2017 at 19:36, Brad Hubbard  wrote:
>> On Wed, Mar 22, 2017 at 5:24 PM, Marcus Furlong  wrote:
>
>>> [435339.965817] [ cut here ]
>>> [435339.965874] WARNING: at fs/xfs/xfs_aops.c:1244
>>> xfs_vm_releasepage+0xcb/0x100 [xfs]()
>>> [435339.965876] Modules linked in: vfat fat uas usb_storage mpt3sas
>>> mpt2sas raid_class scsi_transport_sas mptctl mptbase iptable_filter
>>> dell_rbu team_mode_loadbalance team rpcrdma ib_isert iscsi_target_mod
>>> ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp
>>> scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
>>> rdma_cm ib_cm iw_cm mlx5_ib ib_core intel_powerclamp coretemp
>>> intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul
>>> ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper
>>> cryptd ipmi_devintf iTCO_wdt iTCO_vendor_support mxm_wmi dcdbas pcspkr
>>> ipmi_ssif sb_edac edac_core sg mei_me mei lpc_ich shpchp ipmi_si
>>> ipmi_msghandler wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd
>>> grace sunrpc ip_tables xfs sd_mod crc_t10dif crct10dif_generic mgag200
>>> i2c_algo_bit
>>> [435339.965942]  crct10dif_pclmul crct10dif_common drm_kms_helper
>>> crc32c_intel syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm
>>> bnx2x ahci libahci mlx5_core i2c_core libata mdio ptp megaraid_sas
>>> nvme pps_core libcrc32c fjes dm_mirror dm_region_hash dm_log dm_mod
>>> [435339.965991] CPU: 8 PID: 223 Comm: kswapd0 Not tainted
>>> 3.10.0-514.10.2.el7.x86_64 #1
>>> [435339.965993] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS
>>> 2.3.4 11/08/2016
>>> [435339.965994]   6ea9561d 881ffc2c7aa0
>>> 816863ef
>>> [435339.965998]  881ffc2c7ad8 81085940 ea00015d4e20
>>> ea00015d4e00
>>> [435339.966000]  880f4d7c5af8 881ffc2c7da0 ea00015d4e00
>>> 881ffc2c7ae8
>>> [435339.966003] Call Trace:
>>> [435339.966010]  [] dump_stack+0x19/0x1b
>>> [435339.966015]  [] warn_slowpath_common+0x70/0xb0
>>> [435339.966018]  [] warn_slowpath_null+0x1a/0x20
>>> [435339.966060]  [] xfs_vm_releasepage+0xcb/0x100 [xfs]
>>> [435339.966120]  [] try_to_release_page+0x32/0x50
>>> [435339.966128]  [] shrink_active_list+0x3d6/0x3e0
>>> [435339.966133]  [] shrink_lruvec+0x3f1/0x770
>>> [435339.966138]  [] shrink_zone+0x76/0x1a0
>>> [435339.966143]  [] balance_pgdat+0x48c/0x5e0
>>> [435339.966147]  [] kswapd+0x173/0x450
>>> [435339.966155]  [] ? wake_up_atomic_t+0x30/0x30
>>> [435339.966158]  [] ? balance_pgdat+0x5e0/0x5e0
>>> [435339.966161]  [] kthread+0xcf/0xe0
>>> [435339.966165]  [] ? kthread_create_on_node+0x140/0x140
>>> [435339.966170]  [] ret_from_fork+0x58/0x90
>>> [435339.966173]  [] ? kthread_create_on_node+0x140/0x140
>>> [435339.966175] ---[ end trace 58233bbca77fd5e2 ]---
>>
>> With regards to the above stack trace,
>> https://bugzilla.redhat.com/show_bug.cgi?id=1079818 was opened, and
>> remains open, for the same stack. I would suggest discussing this
>> issue with your kernel support organisation as it is likely unrelated
>> to the sizing issue IIUC.
>
> Hi Brad,
>
> Thanks for clarifying that. That bug is not public. Is there any
> workaround mentioned in it?

No, there isn't. The upstream fix is
http://oss.sgi.com/pipermail/xfs/2016-July/050281.html

>
> Cheers,
> Marcus.
>
> --
> Marcus Furlong



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com