[ceph-users] Missing field "host" in logs sent to Graylog

2019-09-30 Thread CUZA Frédéric
Hi everyone,

We are facing a problem where we cannot read logs sent to Graylog because one
mandatory field is missing.

GELF message (received from ) has empty mandatory "host" field.

Does anyone know what we are missing?
I know someone else was facing the same issue, but it seems they never got an
answer.

We are running:
Ceph: 12.2.12
Graylog: 3.0
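
For reference, the Graylog-related ceph.conf options look roughly like this (a
sketch only; host and port are illustrative):

[global]
log_to_graylog = true
err_to_graylog = true
log_graylog_host = graylog.example.com
log_graylog_port = 12201
mon_cluster_log_to_graylog = true
mon_cluster_log_to_graylog_host = graylog.example.com
mon_cluster_log_to_graylog_port = 12201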

Thanks !

Regards,
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Object Size for BlueStore OSD

2019-09-30 Thread Paul Emmerich
It's sometimes faster if you reduce the object size, but I wouldn't go
below 1 MB. It depends on your hardware and use case; 4 MB is a very good
default, though.
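
If you want to experiment, the object size is a per-image setting chosen at
creation time (a sketch; pool and image names are illustrative):

# create a test image with 1 MiB objects instead of the 4 MiB default
rbd create --size 10G --object-size 1M rbd/objsize-test
# confirm the resulting object size
rbd info rbd/objsize-test | grep order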


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Sep 30, 2019 at 6:44 AM Lazuardi Nasution
 wrote:
>
> Hi,
>
> Is 4MB default RBD object size still relevant for BlueStore OSD? Any 
> guideline for best RBD object size for BlueStore OSD especially on high 
> performance media (SSD, NVME)?
>
> Best regards,
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Multisite not deleting old data

2019-09-30 Thread Enrico Kern
Hello,

we run a multisite setup between Berlin (master) and Amsterdam (slave) on
Mimic. We had a huge bucket of around 40 TB which was deleted a while
ago. However, the data does not seem to have been deleted on the slave:

from rados df:

berlin.rgw.buckets.data     32 TiB  31638448  0  94915344  0  0  0  4936153274  989 TiB  644842251  153 TiB

amsterdam.rgw.buckets.data  70 TiB  28887118  0  86661354  0  0  0   275985124  203 TiB  232226203   90 TiB

The bucket itself doesn't exist anymore on either the slave or the master. Any
idea what to do? Syncing of new data seems to work. I tried a manual resync of
everything, but it reports that everything is in sync; it just never gets rid
of the data.
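
For reference, this is roughly what I plan to look at next (a sketch; I'm not
sure it is the right lever):

# overall multisite sync state, run against the slave zone
radosgw-admin sync status

# objects from deleted buckets wait in the RGW garbage collector
radosgw-admin gc list --include-all | head

# trigger a garbage-collection pass manually
radosgw-admin gc process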

-- 

*Enrico Kern*
Chief Information Officer

enrico.k...@glispa.com
+49 (0) 30 555713017 / +49 (0) 152 26814501
skype: flyersa
LinkedIn Profile 


 

*Glispa GmbH* | Berlin Office
Stromstr. 11-17  
Berlin, Germany, 10551  

Managing Director Or Ifrah
Registered in Berlin
AG Charlottenburg |

HRB
114678B
–
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Nautilus Ceph Status Pools & Usage

2019-09-30 Thread Paul Emmerich
It's just a display bug in ceph -s:

https://tracker.ceph.com/issues/40011

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Sun, Sep 29, 2019 at 4:41 PM Lazuardi Nasution
 wrote:
>
> Hi,
>
> I'm starting with Nautilus and have created and deleted some pools. When I
> check with "ceph status" I find something weird with the "pools" number
> when all pools have been deleted. Is the meaning of the "pools" number
> different than in Luminous? As there is no pool and no PG, why is there
> usage in "ceph status"?
>
> Best regards,
>
>   cluster:
> id: e53af8e4-8ef7-48ad-ae4e-3d0486ba0d72
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum c08-ctrl,c09-ctrl,c10-ctrl (age 3m)
> mgr: c08-ctrl(active, since 7d), standbys: c09-ctrl, c10-ctrl
> osd: 88 osds: 88 up (since 7d), 88 in (since 7d)
>
>   data:
> pools:   7 pools, 0 pgs
> objects: 0 objects, 0 B
> usage:   1.6 TiB used, 262 TiB / 264 TiB avail
> pgs:
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Is it possible not to list rgw names in ceph status output?

2019-09-30 Thread Aleksey Gutikov
In Nautilus, ceph status writes "rgw: 50 daemons active" and then lists
all 50 names of the rgw daemons.

This takes up significant space in the terminal.
Is it possible to disable the list of names and make the output like in
Luminous: only the number of active daemons?



Thanks
Aleksei
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to limit radosgw user privilege to read only mode?

2019-09-30 Thread Charles Alva
Update, I managed to limit the user privilege by modifying the user's
op-mask to read as follows:
```
radosgw-admin user modify --uid= --op-mask=read
```

And to roll back to its default privileges:
```
radosgw-admin user modify --uid= --op-mask="read,write,delete"
```
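
To verify the current setting, the user's metadata can be inspected (a sketch;
the uid is a placeholder):
```
radosgw-admin user info --uid=someuser | grep op_mask
```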


Kind regards,

Charles Alva
Sent from Gmail Mobile


On Sun, Sep 29, 2019 at 5:00 PM Charles Alva  wrote:

> Hi Cephalopods,
>
> I'm in the process of migrating radosgw Erasure Code pool from old cluster
> to Replica pool on new cluster. To avoid user write new object to old pool,
> I want to set the radosgw user privilege to read only.
>
> Could you guys please share how to limit radosgw user privilege to read
> only?
>
> I could not find any clear explanation and example in the Ceph
> radosgw-admin docs. Is it by changing the user's caps or op_mask? Or
> setting the civetweb option to only allow HTTP HEAD and GET methods?
>
> Kind regards,
>
> Charles Alva
> Sent from Gmail Mobile
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 3,30,300 GB constraint of block.db size on SSD

2019-09-30 Thread Igor Fedotov

Hi Massimo,

On 9/29/2019 9:13 AM, Massimo Sgaravatto wrote:

> In my ceph cluster I use spinning disks for BlueStore OSDs and SSDs
> just for the block.db.
>
> If I have got it right, right now:
>
> a) only 3/30/300 GB can be used on the SSD before RocksDB spills over to
> the slow device, so you don't have any benefit with e.g. 250 GB reserved
> on the SSD for block.db compared to a configuration with only 30 GB on
> the SSD.

Generally this is correct, except for peak points when the DB might
temporarily need some extra space, for compaction or other interim
purposes. I've observed up to a 2x increase in the lab, so allocating
some extra space might be useful.

> b) because of a), the recommendation in the docs saying that the
> block.db size should not be smaller than 4% of block is basically
> wrong.

I'd say this is a very conservative estimate, IMO.
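
As a side note, whether an OSD's DB has already spilled over to the slow
device is easy to check (a sketch; osd.0 is illustrative, the commands run on
the OSD's host, and the health warning requires a recent release):

ceph health detail | grep -i spillover
ceph daemon osd.0 perf dump bluefs | grep -E 'db_total_bytes|db_used_bytes|slow_used_bytes'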


> Are there plans to change that in the next releases?
> I am asking because I am going to buy new hardware and I'd like to
> understand if I should keep considering this 'constraint' when
> choosing the size of the SSD disks.

Yes, we're working on a more intelligent DB space utilization scheme
which will allow interim volume sizes to be useful.


Here is a PR which is pending final (hopefully) review: 
https://github.com/ceph/ceph/pull/29687




> Thanks, Massimo


Thanks,

Igor


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Nautilus Ceph Status Pools & Usage

2019-09-30 Thread Lazuardi Nasution
Hi Paul,

Thank you for this straightforward explanation. It is very helpful while
waiting for the fix.

Best regards,

On Mon, Sep 30, 2019, 16:38 Paul Emmerich  wrote:

> It's just a display bug in ceph -s:
>
> https://tracker.ceph.com/issues/40011
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Sun, Sep 29, 2019 at 4:41 PM Lazuardi Nasution
>  wrote:
> >
> > Hi,
> >
> > I'm starting with Nautilus and have created and deleted some pools. When I
> > check with "ceph status" I find something weird with the "pools" number
> > when all pools have been deleted. Is the meaning of the "pools" number
> > different than in Luminous? As there is no pool and no PG, why is there
> > usage in "ceph status"?
> >
> > Best regards,
> >
> >   cluster:
> > id: e53af8e4-8ef7-48ad-ae4e-3d0486ba0d72
> > health: HEALTH_OK
> >
> >   services:
> > mon: 3 daemons, quorum c08-ctrl,c09-ctrl,c10-ctrl (age 3m)
> > mgr: c08-ctrl(active, since 7d), standbys: c09-ctrl, c10-ctrl
> > osd: 88 osds: 88 up (since 7d), 88 in (since 7d)
> >
> >   data:
> > pools:   7 pools, 0 pgs
> > objects: 0 objects, 0 B
> > usage:   1.6 TiB used, 262 TiB / 264 TiB avail
> > pgs:
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cluster network down

2019-09-30 Thread Lars Täuber
Hi!

What happens when the cluster network goes down completely?
Does the cluster silently use the public network without interruption, or does
the admin have to act?

Thanks
Lars
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] please fix ceph-iscsi yum repo

2019-09-30 Thread Jason Dillaman
On Fri, Sep 27, 2019 at 5:18 AM Matthias Leopold
 wrote:
>
>
> Hi,
>
> I was positively surprised to see ceph-iscsi-3.3 available today.
> Unfortunately there's an error when trying to install it from yum repo:
>
> ceph-iscsi-3.3-1.el7.noarch.rp FAILED
> 100%
> [==]
>   0.0 B/s | 200 kB  --:--:-- ETA
> http://download.ceph.com/ceph-iscsi/3/rpm/el7/noarch/ceph-iscsi-3.3-1.el7.noarch.rpm:
> [Errno -1] Package does not match intended download. Suggestion: run yum
> --enablerepo=ceph-iscsi clean metadata
>
> "yum --enablerepo=ceph-iscsi clean metadata" does not fix it
>
> I know there are other ways to install it, but since I'm close to
> putting my iscsi gateway into production I want to be "clean" (and I'm a
> bit impatient, sorry...)

This should hopefully be fixed already. Let us know if you are still
having issues.

> thx
> matthias
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster network down

2019-09-30 Thread Burkhard Linke

Hi,

On 9/30/19 2:46 PM, Lars Täuber wrote:

> Hi!
>
> What happens when the cluster network goes down completely?
> Does the cluster silently use the public network without interruption, or does
> the admin have to act?


The cluster network is used for OSD heartbeats and backfilling/recovery 
traffic. If the heartbeats do not work anymore, the OSDs will start to 
report the other OSDs as down, resulting in a completely confused cluster...
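
For context, the two networks are declared in ceph.conf roughly like this (a
sketch; the addresses are illustrative). Leaving out the second line simply
puts all traffic on the public network:

[global]
# client, MON and MDS traffic
public network = 192.0.2.0/24
# OSD replication and heartbeat traffic
cluster network = 198.51.100.0/24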



I would avoid an extra cluster network unless it is absolutely necessary.


Regards,

Burkhard


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster network down

2019-09-30 Thread Lars Täuber
Mon, 30 Sep 2019 14:49:48 +0200
Burkhard Linke  ==> 
ceph-users@lists.ceph.com :
> Hi,
> 
> On 9/30/19 2:46 PM, Lars Täuber wrote:
> > Hi!
> >
> > What happens when the cluster network goes down completely?
> > Does the cluster silently use the public network without interruption, or
> > does the admin have to act?
> 
> The cluster network is used for OSD heartbeats and backfilling/recovery 
> traffic. If the heartbeats do not work anymore, the OSDs will start to 
> report the other OSDs as down, resulting in a completely confused cluster...
> 
> 
> I would avoid an extra cluster network unless it is absolutely necessary.
> 
> 
> Regards,
> 
> Burkhard

I don't remember where I read it, but it was said that the cluster migrates
its complete traffic over to the public network when the cluster network goes
down. So this seems not to be the case?

Thanks
Lars
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD Object Size for BlueStore OSD

2019-09-30 Thread Lazuardi Nasution
Hi Paul,

I have done some RBD benchmarks on 7 nodes with 10 SATA HDDs each and 3 nodes
with SATA SSDs, using various object sizes; the results are shared at the URL below.

https://drive.google.com/drive/folders/1tTqCR9Tu-jSjVDl1Ls4rTev6gQlT8-03?usp=sharing

Any thoughts?

Best regards,

On Mon, Sep 30, 2019 at 4:10 PM Paul Emmerich 
wrote:

> It's sometimes faster if you reduce the object size, but I wouldn't go
> below 1 MB. Depends on your hardware and use case, 4 MB is a very good
> default, though.
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Mon, Sep 30, 2019 at 6:44 AM Lazuardi Nasution
>  wrote:
> >
> > Hi,
> >
> > Is 4MB default RBD object size still relevant for BlueStore OSD? Any
> guideline for best RBD object size for BlueStore OSD especially on high
> performance media (SSD, NVME)?
> >
> > Best regards,
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cluster network down

2019-09-30 Thread Janne Johansson
>
> I don't remember where I read it, but it was told that the cluster is
> migrating its complete traffic over to the public network when the cluster
> networks goes down. So this seems not to be the case?
>

Be careful with generalizations like "when a network acts up, it will be
completely down and noticeably unreachable for all parts", since networks
can break in thousands of not-very-obvious ways which are not 0%-vs-100%
but somewhere in between.

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Commit and Apply latency on nautilus

2019-09-30 Thread Marc Roos


What parameters exactly are you using? I want to do a similar test on 
Luminous before I upgrade to Nautilus. I have quite a lot (74+):

type_instance=Osd.opBeforeDequeueOpLat
type_instance=Osd.opBeforeQueueOpLat
type_instance=Osd.opLatency
type_instance=Osd.opPrepareLatency
type_instance=Osd.opProcessLatency
type_instance=Osd.opRLatency
type_instance=Osd.opRPrepareLatency
type_instance=Osd.opRProcessLatency
type_instance=Osd.opRwLatency
type_instance=Osd.opRwPrepareLatency
type_instance=Osd.opRwProcessLatency
type_instance=Osd.opWLatency
type_instance=Osd.opWPrepareLatency
type_instance=Osd.opWProcessLatency
type_instance=Osd.subopLatency
type_instance=Osd.subopWLatency
...
...





-Original Message-
From: Alex Litvak [mailto:alexander.v.lit...@gmail.com] 
Sent: zondag 29 september 2019 13:06
To: ceph-users@lists.ceph.com
Cc: ceph-de...@vger.kernel.org
Subject: [ceph-users] Commit and Apply latency on nautilus

Hello everyone,

I am running a number of parallel benchmark tests against the cluster 
that should be ready to go to production.
I enabled prometheus to monitor various information and while cluster 
stays healthy through the tests with no errors or slow requests,
I noticed an apply / commit latency jumping between 40 - 600 ms on 
multiple SSDs.  At the same time op_read and op_write are on average 
below 0.25 ms in the worst case scenario.

I am running nautilus 14.2.2, all bluestore, no separate NVME devices 
for WAL/DB, 6 SSDs per node(Dell PowerEdge R440) with all drives Seagate 
Nytro 1551, osd spread across 6 nodes, running in 
containers.  Each node has plenty of RAM with utilization ~ 25 GB during 
the benchmark runs.

Here are benchmarks being run from 6 client systems in parallel, 
repeating the test for each block size in <4k,16k,128k,4M>.

On rbd mapped partition local to each client:

fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw 
--bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300 
--group_reporting --time_based --rwmixread=70

On mounted cephfs volume with each client storing test file(s) in own 
sub-directory:

fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw 
--bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300 
--group_reporting --time_based --rwmixread=70

dbench -t 30 30

Could you please let me know if huge jump in applied and committed 
latency is justified in my case and whether I can do anything to improve 
/ fix it.  Below is some additional cluster info.

Thank you,

root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZERAW USE DATAOMAPMETA AVAIL 
  %USE VAR  PGS STATUS
  6   ssd 1.74609  1.0 1.7 TiB  93 GiB  92 GiB 240 MiB  784 MiB 1.7 
TiB 5.21 0.90  44 up
12   ssd 1.74609  1.0 1.7 TiB  98 GiB  97 GiB 118 MiB  906 MiB 1.7 
TiB 5.47 0.95  40 up
18   ssd 1.74609  1.0 1.7 TiB 102 GiB 101 GiB 123 MiB  901 MiB 1.6 
TiB 5.73 0.99  47 up
24   ssd 3.49219  1.0 3.5 TiB 222 GiB 221 GiB 134 MiB  890 MiB 3.3 
TiB 6.20 1.07  96 up
30   ssd 3.49219  1.0 3.5 TiB 213 GiB 212 GiB 151 MiB  873 MiB 3.3 
TiB 5.95 1.03  93 up
35   ssd 3.49219  1.0 3.5 TiB 203 GiB 202 GiB 301 MiB  723 MiB 3.3 
TiB 5.67 0.98 100 up
  5   ssd 1.74609  1.0 1.7 TiB 103 GiB 102 GiB 123 MiB  901 MiB 1.6 
TiB 5.78 1.00  49 up
11   ssd 1.74609  1.0 1.7 TiB 109 GiB 108 GiB  63 MiB  961 MiB 1.6 
TiB 6.09 1.05  46 up
17   ssd 1.74609  1.0 1.7 TiB 104 GiB 103 GiB 205 MiB  819 MiB 1.6 
TiB 5.81 1.01  50 up
23   ssd 3.49219  1.0 3.5 TiB 210 GiB 209 GiB 168 MiB  856 MiB 3.3 
TiB 5.86 1.01  86 up
29   ssd 3.49219  1.0 3.5 TiB 204 GiB 203 GiB 272 MiB  752 MiB 3.3 
TiB 5.69 0.98  92 up
34   ssd 3.49219  1.0 3.5 TiB 198 GiB 197 GiB 295 MiB  729 MiB 3.3 
TiB 5.54 0.96  85 up
  4   ssd 1.74609  1.0 1.7 TiB 119 GiB 118 GiB  16 KiB 1024 MiB 1.6 
TiB 6.67 1.15  50 up
10   ssd 1.74609  1.0 1.7 TiB  95 GiB  94 GiB 183 MiB  841 MiB 1.7 
TiB 5.31 0.92  46 up
16   ssd 1.74609  1.0 1.7 TiB 102 GiB 101 GiB 122 MiB  902 MiB 1.6 
TiB 5.72 0.99  50 up
22   ssd 3.49219  1.0 3.5 TiB 218 GiB 217 GiB 109 MiB  915 MiB 3.3 
TiB 6.11 1.06  91 up
28   ssd 3.49219  1.0 3.5 TiB 198 GiB 197 GiB 343 MiB  681 MiB 3.3 
TiB 5.54 0.96  95 up
33   ssd 3.49219  1.0 3.5 TiB 198 GiB 196 GiB 297 MiB 1019 MiB 3.3 
TiB 5.53 0.96  85 up
  1   ssd 1.74609  1.0 1.7 TiB 101 GiB 100 GiB 222 MiB  802 MiB 1.6 
TiB 5.63 0.97  49 up
  7   ssd 1.74609  1.0 1.7 TiB 102 GiB 101 GiB 153 MiB  871 MiB 1.6 
TiB 5.69 0.99  46 up
13   ssd 1.74609  1.0 1.7 TiB 106 GiB 105 GiB  67 MiB  957 MiB 1.6 
TiB 5.96 1.03  42 up
19   ssd 3.49219  1.0 3.5 TiB 206 GiB 205 GiB 179 MiB  845 MiB 3.3 
TiB 5.77 1.00  83 up
25   ssd 3.49219  1.0 3.5 TiB 195 GiB 194 GiB 352 MiB  672 MiB 3.3 
TiB 5.45 0.94  97 up
31   ssd 3.49219  1.0 3.5 TiB 201 GiB 200 GiB 305 MiB  719 MiB 3.3 
TiB 5.6

Re: [ceph-users] Commit and Apply latency on nautilus

2019-09-30 Thread Sasha Litvak
In my case, I am using premade Prometheus-sourced dashboards in Grafana.

For individual OSD latency, the queries look like this:

 irate(ceph_osd_op_r_latency_sum{ceph_daemon=~"$osd"}[1m]) / on
(ceph_daemon) irate(ceph_osd_op_r_latency_count[1m])
irate(ceph_osd_op_w_latency_sum{ceph_daemon=~"$osd"}[1m]) / on
(ceph_daemon) irate(ceph_osd_op_w_latency_count[1m])

The other ones use

ceph_osd_commit_latency_ms
ceph_osd_apply_latency_ms

and graph the distribution of it over time

Also, average OSD op latency

avg(rate(ceph_osd_op_r_latency_sum{cluster="$cluster"}[5m]) /
rate(ceph_osd_op_r_latency_count{cluster="$cluster"}[5m]) >= 0)
avg(rate(ceph_osd_op_w_latency_sum{cluster="$cluster"}[5m]) /
rate(ceph_osd_op_w_latency_count{cluster="$cluster"}[5m]) >= 0)

Average OSD apply + commit latency
avg(ceph_osd_apply_latency_ms{cluster="$cluster"})
avg(ceph_osd_commit_latency_ms{cluster="$cluster"})


On Mon, Sep 30, 2019 at 11:13 AM Marc Roos  wrote:

>
> What parameters are you exactly using? I want to do a similar test on
> luminous, before I upgrade to Nautilus. I have quite a lot (74+)
>
> type_instance=Osd.opBeforeDequeueOpLat
> type_instance=Osd.opBeforeQueueOpLat
> type_instance=Osd.opLatency
> type_instance=Osd.opPrepareLatency
> type_instance=Osd.opProcessLatency
> type_instance=Osd.opRLatency
> type_instance=Osd.opRPrepareLatency
> type_instance=Osd.opRProcessLatency
> type_instance=Osd.opRwLatency
> type_instance=Osd.opRwPrepareLatency
> type_instance=Osd.opRwProcessLatency
> type_instance=Osd.opWLatency
> type_instance=Osd.opWPrepareLatency
> type_instance=Osd.opWProcessLatency
> type_instance=Osd.subopLatency
> type_instance=Osd.subopWLatency
> ...
> ...
>
>
>
>
>
> -Original Message-
> From: Alex Litvak [mailto:alexander.v.lit...@gmail.com]
> Sent: zondag 29 september 2019 13:06
> To: ceph-users@lists.ceph.com
> Cc: ceph-de...@vger.kernel.org
> Subject: [ceph-users] Commit and Apply latency on nautilus
>
> Hello everyone,
>
> I am running a number of parallel benchmark tests against the cluster
> that should be ready to go to production.
> I enabled prometheus to monitor various information and while cluster
> stays healthy through the tests with no errors or slow requests,
> I noticed an apply / commit latency jumping between 40 - 600 ms on
> multiple SSDs.  At the same time op_read and op_write are on average
> below 0.25 ms in the worst case scenario.
>
> I am running nautilus 14.2.2, all bluestore, no separate NVME devices
> for WAL/DB, 6 SSDs per node(Dell PowerEdge R440) with all drives Seagate
> Nytro 1551, osd spread across 6 nodes, running in
> containers.  Each node has plenty of RAM with utilization ~ 25 GB during
> the benchmark runs.
>
> Here are benchmarks being run from 6 client systems in parallel,
> repeating the test for each block size in <4k,16k,128k,4M>.
>
> On rbd mapped partition local to each client:
>
> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw
> --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300
> --group_reporting --time_based --rwmixread=70
>
> On mounted cephfs volume with each client storing test file(s) in own
> sub-directory:
>
> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw
> --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300
> --group_reporting --time_based --rwmixread=70
>
> dbench -t 30 30
>
> Could you please let me know if huge jump in applied and committed
> latency is justified in my case and whether I can do anything to improve
> / fix it.  Below is some additional cluster info.
>
> Thank you,
>
> root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph osd df
> ID CLASS WEIGHT  REWEIGHT SIZERAW USE DATAOMAPMETA AVAIL
>   %USE VAR  PGS STATUS
>   6   ssd 1.74609  1.0 1.7 TiB  93 GiB  92 GiB 240 MiB  784 MiB 1.7
> TiB 5.21 0.90  44 up
> 12   ssd 1.74609  1.0 1.7 TiB  98 GiB  97 GiB 118 MiB  906 MiB 1.7
> TiB 5.47 0.95  40 up
> 18   ssd 1.74609  1.0 1.7 TiB 102 GiB 101 GiB 123 MiB  901 MiB 1.6
> TiB 5.73 0.99  47 up
> 24   ssd 3.49219  1.0 3.5 TiB 222 GiB 221 GiB 134 MiB  890 MiB 3.3
> TiB 6.20 1.07  96 up
> 30   ssd 3.49219  1.0 3.5 TiB 213 GiB 212 GiB 151 MiB  873 MiB 3.3
> TiB 5.95 1.03  93 up
> 35   ssd 3.49219  1.0 3.5 TiB 203 GiB 202 GiB 301 MiB  723 MiB 3.3
> TiB 5.67 0.98 100 up
>   5   ssd 1.74609  1.0 1.7 TiB 103 GiB 102 GiB 123 MiB  901 MiB 1.6
> TiB 5.78 1.00  49 up
> 11   ssd 1.74609  1.0 1.7 TiB 109 GiB 108 GiB  63 MiB  961 MiB 1.6
> TiB 6.09 1.05  46 up
> 17   ssd 1.74609  1.0 1.7 TiB 104 GiB 103 GiB 205 MiB  819 MiB 1.6
> TiB 5.81 1.01  50 up
> 23   ssd 3.49219  1.0 3.5 TiB 210 GiB 209 GiB 168 MiB  856 MiB 3.3
> TiB 5.86 1.01  86 up
> 29   ssd 3.49219  1.0 3.5 TiB 204 GiB 203 GiB 272 MiB  752 MiB 3.3
> TiB 5.69 0.98  92 up
> 34   ssd 3.49219  1.0 3.5 TiB 198 GiB 197 GiB 295 MiB  729 MiB 3.3
> TiB 5.54 0.96  85 up
>   4   ssd 1.74609  1.0 1.7 TiB 119 GiB 1

[ceph-users] NFS

2019-09-30 Thread Brent Kennedy
Wondering if there are any documents for standing up NFS with an existing
ceph cluster.  We don't use ceph-ansible or any other tools besides
ceph-deploy.  The iscsi directions were pretty good once I got past the
dependencies.  

 

I saw the one based on Rook, but it doesn't seem to apply to our setup of
ceph vms with physical hosts doing OSDs.  The official Ceph documents talk
about using Ganesha but don't seem to dive into the details of the
process for getting it online.  We don't use CephFS, so that's not set up
either; the basic docs seem to note it is required.  Seems my google-fu
is failing me when I try to find a more definitive guide.

 

The servers are all centos 7 with the latest updates.

 

Any guidance would be greatly appreciated!

 

Regards,

-Brent

 

Existing Clusters:

Test: Nautilus 14.2.2 with 3 osd servers, 1 mon/man, 1 gateway, 2 iscsi
gateways ( all virtual on nvme )

US Production(HDD): Nautilus 14.2.2 with 13 osd servers, 3 mons, 4 gateways,
2 iscsi gateways

UK Production(HDD): Nautilus 14.2.2 with 25 osd servers, 3 mons/man, 3
gateways behind

US Production(SSD): Nautilus 14.2.2 with 6 osd servers, 3 mons/man, 3
gateways, 2 iscsi gateways

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] NFS

2019-09-30 Thread Marc Roos
 
Just install these

http://download.ceph.com/nfs-ganesha/
nfs-ganesha-rgw-2.7.1-0.1.el7.x86_64
nfs-ganesha-vfs-2.7.1-0.1.el7.x86_64
libnfsidmap-0.25-19.el7.x86_64
nfs-ganesha-mem-2.7.1-0.1.el7.x86_64
nfs-ganesha-xfs-2.7.1-0.1.el7.x86_64
nfs-ganesha-2.7.1-0.1.el7.x86_64
nfs-ganesha-ceph-2.7.1-0.1.el7.x86_64


And export your cephfs like this:
EXPORT {
Export_Id = 10;
Path = /nfs/cblr-repos;
Pseudo = /cblr-repos;
FSAL { Name = CEPH; User_Id = "cephfs.nfs.cblr"; 
Secret_Access_Key = "xxx"; }
Disable_ACL = FALSE;
CLIENT { Clients = 192.168.10.2; access_type = "RW"; }
CLIENT { Clients = 192.168.10.253; }
}
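
That goes into /etc/ganesha/ganesha.conf. Starting the daemon and mounting
from a client is then roughly (a sketch; the gateway host name is
illustrative):

systemctl enable nfs-ganesha
systemctl start nfs-ganesha

# on an NFS client, mount the pseudo path defined above
mount -t nfs4 ganesha-gw:/cblr-repos /mnt/cblr-repos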


-Original Message-
From: Brent Kennedy [mailto:bkenn...@cfl.rr.com] 
Sent: maandag 30 september 2019 20:56
To: 'ceph-users'
Subject: [ceph-users] NFS

Wondering if there are any documents for standing up NFS with an 
existing ceph cluster.  We don’t use ceph-ansible or any other tools 
besides ceph-deploy.  The iscsi directions were pretty good once I got 
past the dependencies.  

 

I saw the one based on Rook, but it doesn’t seem to apply to our setup 
of ceph vms with physical hosts doing OSDs.  The official ceph documents 
talk about using ganesha but doesn’t seem to dive into the details of 
what the process is for getting it online.  We don’t use cephfs, so 
that’s not setup either.  The basic docs seem to note this is required. 
 Seems my google-fu is failing me when I try to find a more definitive 
guide.

 

The servers are all centos 7 with the latest updates.

 

Any guidance would be greatly appreciated!

 

Regards,

-Brent

 

Existing Clusters:

Test: Nautilus 14.2.2 with 3 osd servers, 1 mon/man, 1 gateway, 2 iscsi 
gateways ( all virtual on nvme )

US Production(HDD): Nautilus 14.2.2 with 13 osd servers, 3 mons, 4 
gateways, 2 iscsi gateways

UK Production(HDD): Nautilus 14.2.2 with 25 osd servers, 3 mons/man, 3 
gateways behind

US Production(SSD): Nautilus 14.2.2 with 6 osd servers, 3 mons/man, 3 
gateways, 2 iscsi gateways

 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Commit and Apply latency on nautilus

2019-09-30 Thread Paul Emmerich
BTW: commit and apply latency are the exact same thing since
BlueStore, so don't bother looking at both.

In fact, you should mostly be looking at the op_*_latency counters.
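
For reference, those counters can be read straight from a running OSD via its
admin socket (a sketch; osd.0 is illustrative, run on the host where osd.0
lives):

ceph daemon osd.0 perf dump osd | grep -A 3 '_latency'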


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Mon, Sep 30, 2019 at 8:46 PM Sasha Litvak
 wrote:
>
> In my case, I am using premade Prometheus sourced dashboards in grafana.
>
> For individual latency, the query looks like that
>
>  irate(ceph_osd_op_r_latency_sum{ceph_daemon=~"$osd"}[1m]) / on (ceph_daemon) 
> irate(ceph_osd_op_r_latency_count[1m])
> irate(ceph_osd_op_w_latency_sum{ceph_daemon=~"$osd"}[1m]) / on (ceph_daemon) 
> irate(ceph_osd_op_w_latency_count[1m])
>
> The other ones use
>
> ceph_osd_commit_latency_ms
> ceph_osd_apply_latency_ms
>
> and graph the distribution of it over time
>
> Also, average OSD op latency
>
> avg(rate(ceph_osd_op_r_latency_sum{cluster="$cluster"}[5m]) / 
> rate(ceph_osd_op_r_latency_count{cluster="$cluster"}[5m]) >= 0)
> avg(rate(ceph_osd_op_w_latency_sum{cluster="$cluster"}[5m]) / 
> rate(ceph_osd_op_w_latency_count{cluster="$cluster"}[5m]) >= 0)
>
> Average OSD apply + commit latency
> avg(ceph_osd_apply_latency_ms{cluster="$cluster"})
> avg(ceph_osd_commit_latency_ms{cluster="$cluster"})
>
>
> On Mon, Sep 30, 2019 at 11:13 AM Marc Roos  wrote:
>>
>>
>> What parameters are you exactly using? I want to do a similar test on
>> luminous, before I upgrade to Nautilus. I have quite a lot (74+)
>>
>> type_instance=Osd.opBeforeDequeueOpLat
>> type_instance=Osd.opBeforeQueueOpLat
>> type_instance=Osd.opLatency
>> type_instance=Osd.opPrepareLatency
>> type_instance=Osd.opProcessLatency
>> type_instance=Osd.opRLatency
>> type_instance=Osd.opRPrepareLatency
>> type_instance=Osd.opRProcessLatency
>> type_instance=Osd.opRwLatency
>> type_instance=Osd.opRwPrepareLatency
>> type_instance=Osd.opRwProcessLatency
>> type_instance=Osd.opWLatency
>> type_instance=Osd.opWPrepareLatency
>> type_instance=Osd.opWProcessLatency
>> type_instance=Osd.subopLatency
>> type_instance=Osd.subopWLatency
>> ...
>> ...
>>
>>
>>
>>
>>
>> -Original Message-
>> From: Alex Litvak [mailto:alexander.v.lit...@gmail.com]
>> Sent: zondag 29 september 2019 13:06
>> To: ceph-users@lists.ceph.com
>> Cc: ceph-de...@vger.kernel.org
>> Subject: [ceph-users] Commit and Apply latency on nautilus
>>
>> Hello everyone,
>>
>> I am running a number of parallel benchmark tests against the cluster
>> that should be ready to go to production.
>> I enabled prometheus to monitor various information and while cluster
>> stays healthy through the tests with no errors or slow requests,
>> I noticed an apply / commit latency jumping between 40 - 600 ms on
>> multiple SSDs.  At the same time op_read and op_write are on average
>> below 0.25 ms in the worst case scenario.
>>
>> I am running nautilus 14.2.2, all bluestore, no separate NVME devices
>> for WAL/DB, 6 SSDs per node(Dell PowerEdge R440) with all drives Seagate
>> Nytro 1551, osd spread across 6 nodes, running in
>> containers.  Each node has plenty of RAM with utilization ~ 25 GB during
>> the benchmark runs.
>>
>> Here are benchmarks being run from 6 client systems in parallel,
>> repeating the test for each block size in <4k,16k,128k,4M>.
>>
>> On rbd mapped partition local to each client:
>>
>> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw
>> --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300
>> --group_reporting --time_based --rwmixread=70
>>
>> On mounted cephfs volume with each client storing test file(s) in own
>> sub-directory:
>>
>> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw
>> --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300
>> --group_reporting --time_based --rwmixread=70
>>
>> dbench -t 30 30
>>
>> Could you please let me know if huge jump in applied and committed
>> latency is justified in my case and whether I can do anything to improve
>> / fix it.  Below is some additional cluster info.
>>
>> Thank you,
>>
>> root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph osd df
>> ID CLASS WEIGHT  REWEIGHT SIZERAW USE DATAOMAPMETA AVAIL
>>   %USE VAR  PGS STATUS
>>   6   ssd 1.74609  1.0 1.7 TiB  93 GiB  92 GiB 240 MiB  784 MiB 1.7
>> TiB 5.21 0.90  44 up
>> 12   ssd 1.74609  1.0 1.7 TiB  98 GiB  97 GiB 118 MiB  906 MiB 1.7
>> TiB 5.47 0.95  40 up
>> 18   ssd 1.74609  1.0 1.7 TiB 102 GiB 101 GiB 123 MiB  901 MiB 1.6
>> TiB 5.73 0.99  47 up
>> 24   ssd 3.49219  1.0 3.5 TiB 222 GiB 221 GiB 134 MiB  890 MiB 3.3
>> TiB 6.20 1.07  96 up
>> 30   ssd 3.49219  1.0 3.5 TiB 213 GiB 212 GiB 151 MiB  873 MiB 3.3
>> TiB 5.95 1.03  93 up
>> 35   ssd 3.49219  1.0 3.5 TiB 203 GiB 202 GiB 301 MiB  723 MiB 3.3
>> TiB 5.67 0.98 100 up
>>   5   ssd 1.74609  1.0 1.7 TiB 103 GiB 102 GiB 123 MiB  901 MiB 1.6
>> TiB 5.78 1.00  49  

[ceph-users] best way to delete all OSDs and start over

2019-09-30 Thread Shawn A Kwang
I am wondering what the best way is to delete a cluster, remove all
the OSDs, and basically start over. I plan to create a few Ceph test
clusters to determine what works best in our use case. There is no real
data being stored, so I don't care about data loss.

I have a CephFS set up on top of two pools: data and metadata. Presumably
I can remove this easily with 'ceph fs rm'.

1. Do I need to delete the OSD pools?
2. How do I remove the OSDs from the cluster without Ceph doing what it
does and rebalancing data onto the remaining OSDs?

I read the Manually Remove OSD documentation page,
https://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual,
but I want to remove ALL OSDs from the cluster. Is this still the right
set of steps/commands?
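
For what it's worth, this is roughly the sequence I assume is involved (a
sketch only; pool names are placeholders, pool deletion needs
mon_allow_pool_delete=true, and the MDS and OSD daemons have to be stopped
first):

# keep the cluster from rebalancing while tearing things down
ceph osd set norebalance
ceph osd set norecover

# remove the filesystem and its pools
ceph fs rm cephfs --yes-i-really-mean-it
ceph osd pool rm cephfs_data cephfs_data --yes-i-really-really-mean-it
ceph osd pool rm cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it

# purge every OSD id from the CRUSH map, auth and OSD map
for id in $(ceph osd ls); do
    ceph osd purge "$id" --yes-i-really-mean-it
done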

Thanks for any insight for a ceph newbie.

PS - If it matters the servers running ceph-mon and ceph-mgr are on
separate computers than the servers running ceph-osd.

Sincerely,
Shawn Kwang
-- 
Associate Scientist
Center for Gravitation, Cosmology, and Astrophysics
University of Wisconsin-Milwaukee
office: +1 414 229 4960



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Commit and Apply latency on nautilus

2019-09-30 Thread Sasha Litvak
At this point, I have run out of ideas.  I changed the nr_requests and readahead
parameters from 128 to 1024 and from 128 to 4096, and tuned the nodes for
throughput performance.  However, I still get high latency during benchmark
testing.  I attempted to disable the cache on the SSDs:

for i in {a..f}; do hdparm -W 0 -A 0 /dev/sd$i; done

and I think it did not make things any better.  I have H740 and H730
controllers with drives in HBA mode.

Other than converting them one by one to RAID0, I am not sure what else I
can try.
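
For completeness, the queue tuning mentioned above was applied roughly like
this (a sketch; device letters are illustrative):

for i in {a..f}; do
    echo 1024 > /sys/block/sd$i/queue/nr_requests
    echo 4096 > /sys/block/sd$i/queue/read_ahead_kb
done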

Any suggestions?


On Mon, Sep 30, 2019 at 2:45 PM Paul Emmerich 
wrote:

> BTW: commit and apply latency are the exact same thing since
> BlueStore, so don't bother looking at both.
>
> In fact you should mostly be looking at the op_*_latency counters
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Mon, Sep 30, 2019 at 8:46 PM Sasha Litvak
>  wrote:
> >
> > In my case, I am using premade Prometheus sourced dashboards in grafana.
> >
> > For individual latency, the query looks like that
> >
> >  irate(ceph_osd_op_r_latency_sum{ceph_daemon=~"$osd"}[1m]) / on
> (ceph_daemon) irate(ceph_osd_op_r_latency_count[1m])
> > irate(ceph_osd_op_w_latency_sum{ceph_daemon=~"$osd"}[1m]) / on
> (ceph_daemon) irate(ceph_osd_op_w_latency_count[1m])
> >
> > The other ones use
> >
> > ceph_osd_commit_latency_ms
> > ceph_osd_apply_latency_ms
> >
> > and graph the distribution of it over time
> >
> > Also, average OSD op latency
> >
> > avg(rate(ceph_osd_op_r_latency_sum{cluster="$cluster"}[5m]) /
> rate(ceph_osd_op_r_latency_count{cluster="$cluster"}[5m]) >= 0)
> > avg(rate(ceph_osd_op_w_latency_sum{cluster="$cluster"}[5m]) /
> rate(ceph_osd_op_w_latency_count{cluster="$cluster"}[5m]) >= 0)
> >
> > Average OSD apply + commit latency
> > avg(ceph_osd_apply_latency_ms{cluster="$cluster"})
> > avg(ceph_osd_commit_latency_ms{cluster="$cluster"})
> >
> >
> > On Mon, Sep 30, 2019 at 11:13 AM Marc Roos 
> wrote:
> >>
> >>
> >> What parameters are you exactly using? I want to do a similar test on
> >> luminous, before I upgrade to Nautilus. I have quite a lot (74+)
> >>
> >> type_instance=Osd.opBeforeDequeueOpLat
> >> type_instance=Osd.opBeforeQueueOpLat
> >> type_instance=Osd.opLatency
> >> type_instance=Osd.opPrepareLatency
> >> type_instance=Osd.opProcessLatency
> >> type_instance=Osd.opRLatency
> >> type_instance=Osd.opRPrepareLatency
> >> type_instance=Osd.opRProcessLatency
> >> type_instance=Osd.opRwLatency
> >> type_instance=Osd.opRwPrepareLatency
> >> type_instance=Osd.opRwProcessLatency
> >> type_instance=Osd.opWLatency
> >> type_instance=Osd.opWPrepareLatency
> >> type_instance=Osd.opWProcessLatency
> >> type_instance=Osd.subopLatency
> >> type_instance=Osd.subopWLatency
> >> ...
> >> ...
> >>
> >>
> >>
> >>
> >>
> >> -Original Message-
> >> From: Alex Litvak [mailto:alexander.v.lit...@gmail.com]
> >> Sent: zondag 29 september 2019 13:06
> >> To: ceph-users@lists.ceph.com
> >> Cc: ceph-de...@vger.kernel.org
> >> Subject: [ceph-users] Commit and Apply latency on nautilus
> >>
> >> Hello everyone,
> >>
> >> I am running a number of parallel benchmark tests against the cluster
> >> that should be ready to go to production.
> >> I enabled prometheus to monitor various information and while cluster
> >> stays healthy through the tests with no errors or slow requests,
> >> I noticed an apply / commit latency jumping between 40 - 600 ms on
> >> multiple SSDs.  At the same time op_read and op_write are on average
> >> below 0.25 ms in the worst case scenario.
> >>
> >> I am running nautilus 14.2.2, all bluestore, no separate NVME devices
> >> for WAL/DB, 6 SSDs per node(Dell PowerEdge R440) with all drives Seagate
> >> Nytro 1551, osd spread across 6 nodes, running in
> >> containers.  Each node has plenty of RAM with utilization ~ 25 GB during
> >> the benchmark runs.
> >>
> >> Here are benchmarks being run from 6 client systems in parallel,
> >> repeating the test for each block size in <4k,16k,128k,4M>.
> >>
> >> On rbd mapped partition local to each client:
> >>
> >> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw
> >> --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300
> >> --group_reporting --time_based --rwmixread=70
> >>
> >> On mounted cephfs volume with each client storing test file(s) in own
> >> sub-directory:
> >>
> >> fio --name=randrw --ioengine=libaio --iodepth=4 --rw=randrw
> >> --bs=<4k,16k,128k,4M> --direct=1 --size=2G --numjobs=8 --runtime=300
> >> --group_reporting --time_based --rwmixread=70
> >>
> >> dbench -t 30 30
> >>
> >> Could you please let me know if huge jump in applied and committed
> >> latency is justified in my case and whether I can do anything to improve
> >> / fix it.  Below is some additional cluster info.
> >>
> >> Thank you,
> >>
> >> root@storage2n2-la:~# podman exec -it ceph-mon-storage2n2-la ceph osd
> df
> >> ID 

Re: [ceph-users] cluster network down

2019-09-30 Thread Lars Täuber
Mon, 30 Sep 2019 15:21:18 +0200
Janne Johansson  ==> Lars Täuber  :
> >
> > I don't remember where I read it, but it was told that the cluster is
> > migrating its complete traffic over to the public network when the cluster
> > networks goes down. So this seems not to be the case?
> >  
> 
> Be careful with generalizations like "when a network acts up, it will be
> completely down and noticeably unreachable for all parts", since networks
> can break in thousands of not-very-obvious ways which are not 0%-vs-100%
> but somewhere in between.
> 

OK, let me ask my question in a different way.
What does Ceph do when I switch off all switches of the cluster network?
Does Ceph handle this silently without interruption? Does the heartbeat system
use the public network as a failover automatically?

Thanks
Lars
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD crashed during the fio test

2019-09-30 Thread Alex Litvak

Hello everyone,

Can you shed some light on the cause of this crash? Could a client request
actually trigger it?

Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 
7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block) aio_submit 
retries 16
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.867 
7f093d71e700 -1 bdev(0x55b72c156000 /var/lib/ceph/osd/ceph-17/block)  aio 
submit got (11) Resource temporarily unavailable
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: 
In fun
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: 
757: F

Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  ceph version 14.2.2 
(4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  1: 
(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) 
[0x55b71f668cf4]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  2: 
(ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char 
const*, ...)+0) [0x55b71f668ec2]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  3: 
(KernelDevice::aio_submit(IOContext*)+0x701) [0x55b71fd61ca1]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  4: 
(BlueStore::_txc_aio_submit(BlueStore::TransContext*)+0x42) [0x55b71fc29892]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  5: 
(BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x42b) [0x55b71fc496ab]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  6: (BlueStore::queue_transactions(boost::intrusive_ptr&, std::vectorstd::allocator >&, boost::intrusive_ptr, ThreadPool::T
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  7: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector >&, 
boost::intrusive_ptr)+0x54) [0x55b71f9b1b84]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  8: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptrstd::default_delete >&&, eversion_t const&, eversion_t const&, s

Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  9: 
(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, 
PrimaryLogPG::OpContext*)+0xf12) [0x55b71f90e322]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  10: 
(PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0xfae) [0x55b71f969b7e]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  11: 
(PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x3965) [0x55b71f96de15]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  12: 
(PrimaryLogPG::do_request(boost::intrusive_ptr&, 
ThreadPool::TPHandle&)+0xbd4) [0x55b71f96f8a4]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  13: 
(OSD::dequeue_op(boost::intrusive_ptr, boost::intrusive_ptr, 
ThreadPool::TPHandle&)+0x1a9) [0x55b71f7a9ea9]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  14: (PGOpItem::run(OSD*, OSDShard*, 
boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x62) [0x55b71fa475d2]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  15: 
(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x9f4) 
[0x55b71f7c6ef4]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  16: 
(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) 
[0x55b71fdc5ce3]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  17: 
(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55b71fdc8d80]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  18: (()+0x7dd5) 
[0x7f0971da9dd5]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  19: (clone()+0x6d) 
[0x7f0970c7002d]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 2019-09-30 22:52:58.879 7f093d71e700 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/os/bluestore/KernelDevice.cc: 
757: F

Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  ceph version 14.2.2 
(4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  1: 
(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) 
[0x55b71f668cf4]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  2: 
(ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char 
const*, ...)+0) [0x55b71f668ec2]
Sep 30 22:52:58 storage2n2-la ceph-osd-17[10770]:  3: 
(KernelDevice::aio_submit(IOContext*)+0x701) [0