Re: [ceph-users] Insane CPU utilization in ceph.fuse

2018-07-24 Thread Daniel Carrasco
This is what i get:




:/# ceph tell mds.kavehome-mgto-pro-fs01 heap dump
2018-07-24 09:05:19.350720 7fc562ffd700  0 client.1452545 ms_handle_reset
on 10.22.0.168:6800/1685786126
2018-07-24 09:05:29.103903 7fc563fff700  0 client.1452548 ms_handle_reset
on 10.22.0.168:6800/1685786126
mds.kavehome-mgto-pro-fs01 dumping heap profile now.

MALLOC:  760199640 (  725.0 MiB) Bytes in use by application
MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
MALLOC: +246962320 (  235.5 MiB) Bytes in central cache freelist
MALLOC: + 43933664 (   41.9 MiB) Bytes in transfer cache freelist
MALLOC: + 41012664 (   39.1 MiB) Bytes in thread cache freelists
MALLOC: + 10186912 (9.7 MiB) Bytes in malloc metadata
MALLOC:   
MALLOC: =   1102295200 ( 1051.2 MiB) Actual memory used (physical + swap)
MALLOC: +   4268335104 ( 4070.6 MiB) Bytes released to OS (aka unmapped)
MALLOC:   
MALLOC: =   5370630304 ( 5121.8 MiB) Virtual address space used
MALLOC:
MALLOC:  33027  Spans in use
MALLOC: 19  Thread heaps in use
MALLOC:   8192  Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via
madvise()).
Bytes released to the OS take up virtual address space but no physical
memory.





:/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
2018-07-24 09:14:25.747706 7f94f700  0 client.1452578 ms_handle_reset
on 10.22.0.168:6800/1685786126
2018-07-24 09:14:25.754034 7f95057fa700  0 client.1452581 ms_handle_reset
on 10.22.0.168:6800/1685786126
mds.kavehome-mgto-pro-fs01 tcmalloc heap
stats:
MALLOC:  960649328 (  916.1 MiB) Bytes in use by application
MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
MALLOC: +108867288 (  103.8 MiB) Bytes in central cache freelist
MALLOC: + 37179424 (   35.5 MiB) Bytes in transfer cache freelist
MALLOC: + 40143000 (   38.3 MiB) Bytes in thread cache freelists
MALLOC: + 10186912 (9.7 MiB) Bytes in malloc metadata
MALLOC:   
MALLOC: =   1157025952 ( 1103.4 MiB) Actual memory used (physical + swap)
MALLOC: +   4213604352 ( 4018.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:   
MALLOC: =   5370630304 ( 5121.8 MiB) Virtual address space used
MALLOC:
MALLOC:  33028  Spans in use
MALLOC: 19  Thread heaps in use
MALLOC:   8192  Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via
madvise()).
Bytes released to the OS take up virtual address space but no physical
memory.
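
(For reference, the release shown further below was triggered with the MDS
heap release command, along these lines:)

:/# ceph tell mds.kavehome-mgto-pro-fs01 heap release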




After heap release:
:/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
2018-07-24 09:15:28.540203 7f2f7affd700  0 client.1443339 ms_handle_reset
on 10.22.0.168:6800/1685786126
2018-07-24 09:15:28.547153 7f2f7bfff700  0 client.1443342 ms_handle_reset
on 10.22.0.168:6800/1685786126
mds.kavehome-mgto-pro-fs01 tcmalloc heap
stats:
MALLOC:  710315776 (  677.4 MiB) Bytes in use by application
MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
MALLOC: +246471880 (  235.1 MiB) Bytes in central cache freelist
MALLOC: + 40802848 (   38.9 MiB) Bytes in transfer cache freelist
MALLOC: + 38689304 (   36.9 MiB) Bytes in thread cache freelists
MALLOC: + 10186912 (9.7 MiB) Bytes in malloc metadata
MALLOC:   
MALLOC: =   1046466720 (  998.0 MiB) Actual memory used (physical + swap)
MALLOC: +   4324163584 ( 4123.8 MiB) Bytes released to OS (aka unmapped)
MALLOC:   
MALLOC: =   5370630304 ( 5121.8 MiB) Virtual address space used
MALLOC:
MALLOC:  33177  Spans in use
MALLOC: 19  Thread heaps in use
MALLOC:   8192  Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via
madvise()).
Bytes released to the OS take up virtual address space but no physical
memory.


The other commands fail with a curl error:
Failed to get profile: curl 'http:///pprof/profile?seconds=30' >
/root/pprof/.tmp.ceph-mds.1532416424.:


Greetings!!

2018-07-24 5:35 GMT+02:00 Yan, Zheng :

> could you profile memory allocation of mds
>
> http://docs.c

Re: [ceph-users] ceph cluster monitoring tool

2018-07-24 Thread Robert Sander
On 24.07.2018 07:02, Satish Patel wrote:
> My 5-node Ceph cluster is ready for production; now I am looking for a
> good open-source monitoring tool. What are the majority of folks using in
> their production setups?

Some people already use Prometheus and the exporter from the Ceph Mgr.
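
If you go that route, enabling the exporter and scraping it looks roughly
like this (the mgr module listens on port 9283 by default; the target host
below is a placeholder):

ceph mgr module enable prometheus

# prometheus.yml
scrape_configs:
  - job_name: 'ceph'
    static_configs:
      - targets: ['ceph-mgr-host:9283']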

Some use more traditional monitoring systems (like me). I have written a
Ceph plugin for the Check_MK monitoring system:

https://github.com/HeinleinSupport/check_mk/tree/master/ceph

Caution: It will not scale to hundreds of OSDs as it invokes the Ceph
CLI tools to gather monitoring data on every node. This takes some time.

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 93818 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Inconsistent PG could not be repaired

2018-07-24 Thread Arvydas Opulskis
Hello, Cephers,

After trying different repair approaches, I am out of ideas on how to repair this
inconsistent PG. I hope someone's sharp eye will notice what I overlooked.

Some info about cluster:
Centos 7.4
Jewel 10.2.10
Pool size 2 (yes, I know it's a very bad choice)
Pool with inconsistent PG: .rgw.buckets

After a routine deep-scrub I found PG 26.c3f in an inconsistent state. While
running the "ceph pg repair 26.c3f" command and monitoring the "ceph -w" log, I
noticed these errors:

2018-07-24 08:28:06.517042 osd.36 [ERR] 26.c3f shard 30: soid
26:fc32a1f1:::default.142609570.87_20180206.093111%2frepositories%2fnuget-local%2fApplication%2fCompany.Application.Api%2fCompany.Application.Api.1.1.1.nupkg.artifactory-metadata%2fproperties.xml:head
data_digest 0x540e4f8b != data_digest 0x49a34c1f from auth oi
26:e261561a:::default.168602061.10_team-xxx.xxx-jobs.H6.HADOOP.data-segmentation.application.131.xxx-jvm.cpu.load%2f2018-05-05T03%3a51%3a39+00%3a00.sha1:head(167828'216051
client.179334015.0:1847715760 dirty|data_digest|omap_digest s 40 uv 216051
dd 49a34c1f od  alloc_hint [0 0])

2018-07-24 08:28:06.517118 osd.36 [ERR] 26.c3f shard 36: soid
26:fc32a1f1:::default.142609570.87_20180206.093111%2frepositories%2fnuget-local%2fApplication%2fCompany.Application.Api%2fCompany.Application.Api.1.1.1.nupkg.artifactory-metadata%2fproperties.xml:head
data_digest 0x540e4f8b != data_digest 0x49a34c1f from auth oi
26:e261561a:::default.168602061.10_team-xxx.xxx-jobs.H6.HADOOP.data-segmentation.application.131.xxx-jvm.cpu.load%2f2018-05-05T03%3a51%3a39+00%3a00.sha1:head(167828'216051
client.179334015.0:1847715760 dirty|data_digest|omap_digest s 40 uv 216051
dd 49a34c1f od  alloc_hint [0 0])

2018-07-24 08:28:06.517122 osd.36 [ERR] 26.c3f soid
26:fc32a1f1:::default.142609570.87_20180206.093111%2frepositories%2fnuget-local%2fApplication%2fCompany.Application.Api%2fCompany.Application.Api.1.1.1.nupkg.artifactory-metadata%2fproperties.xml:head:
failed to pick suitable auth object

...and the same errors about another object in the same PG.

The repair failed, so I checked the inconsistencies with "rados
list-inconsistent-obj 26.c3f --format=json-pretty":

{
"epoch": 178403,
"inconsistents": [
{
"object": {
"name":
"default.142609570.87_20180203.020047\/repositories\/docker-local\/yyy\/company.yyy.api.assets\/1.2.4\/sha256__ce41e5246ead8bddd2a2b5bbb863db250f328be9dc5c3041481d778a32f8130d",
"nspace": "",
"locator": "",
"snap": "head",
"version": 217749
},
"errors": [],
"union_shard_errors": [
"data_digest_mismatch_oi"
],
"selected_object_info":
"26:f4ce1748:::default.168602061.10_team-xxx.xxx-jobs.H6.HADOOP.data-segmentation.application.131.xxx-jvm.cpu.load%2f2018-05-08T03%3a45%3a15+00%3a00.sha1:head(167944'217749
client.177936559.0:1884719302 dirty|data_digest|omap_digest s 40 uv 217749
dd 422f251b od  alloc_hint [0 0])",
"shards": [
{
"osd": 30,
"errors": [
"data_digest_mismatch_oi"
],
"size": 40,
"omap_digest": "0x",
"data_digest": "0x551c282f"
},
{
"osd": 36,
"errors": [
"data_digest_mismatch_oi"
],
"size": 40,
"omap_digest": "0x",
"data_digest": "0x551c282f"
}
]
},
{
"object": {
"name":
"default.142609570.87_20180206.093111\/repositories\/nuget-local\/Application\/Company.Application.Api\/Company.Application.Api.1.1.1.nupkg.artifactory-metadata\/properties.xml",
"nspace": "",
"locator": "",
"snap": "head",
"version": 216051
},
"errors": [],
"union_shard_errors": [
"data_digest_mismatch_oi"
],
"selected_object_info":
"26:e261561a:::default.168602061.10_team-xxx.xxx-jobs.H6.HADOOP.data-segmentation.application.131.xxx-jvm.cpu.load%2f2018-05-05T03%3a51%3a39+00%3a00.sha1:head(167828'216051
client.179334015.0:1847715760 dirty|data_digest|omap_digest s 40 uv 216051
dd 49a34c1f od  alloc_hint [0 0])",
"shards": [
{
"osd": 30,
"errors": [
"data_digest_mismatch_oi"
],
"size": 40,
"omap_digest": "0x",
"data_digest": "0x540e4f8b"
},
{
"osd": 36,
"errors": [
"data_digest_mismatch_oi"
],
   

Re: [ceph-users] Read/write statistics per RBD image

2018-07-24 Thread Mateusz Skala (UST, POL)
Thank you for the help, it is exactly what I need.

Regards

Mateusz



From: Jason Dillaman [mailto:jdill...@redhat.com]
Sent: Wednesday, July 18, 2018 1:28 PM
To: Mateusz Skala (UST, POL) 
Cc: dillaman ; ceph-users 
Subject: Re: [ceph-users] Read/write statistics per RBD image



Yup, on the host running librbd, you just need to enable the "admin socket" in 
your ceph.conf and then use "ceph --admin-daemon 
/path/to/image/admin/socket.asok perf dump" (i.e. not "ceph perf dump").



See the example in this tip window [1] for how to configure for a "libvirt" 
CephX user.



[1] http://docs.ceph.com/docs/mimic/rbd/libvirt/#configuring-ceph



On Wed, Jul 18, 2018 at 4:02 AM Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>> wrote:

   Thanks  for response.

   In ‘ceph perf dump’ there is no statistics for read/write operations on 
specific RBD image, only for osd and total client operations. I need to get 
statistics on one specific RBD image, to get top used images. It is possible?

   Regards

   Mateusz



   From: Jason Dillaman [mailto:jdill...@redhat.com]
   Sent: Tuesday, July 17, 2018 3:29 PM
   To: Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>>
   Cc: ceph-users mailto:ceph-users@lists.ceph.com>>
   Subject: Re: [ceph-users] Read/write statistics per RBD image



   Yes, you just need to enable the "admin socket" in your ceph.conf and then 
use "ceph --admin-daemon /path/to/image/admin/socket.asok perf dump".



   On Tue, Jul 17, 2018 at 8:53 AM Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>> wrote:

  Hi,

  It is possible to get statistics of issued reads/writes to specific RBD 
image? Best will be statistics like in /proc/diskstats in linux.

  Regards

  Mateusz

  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






   --

   Jason






   --

   Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster monitoring tool

2018-07-24 Thread Marc Roos


Just use collectd to start with. That is easiest with InfluxDB. However,
do not expect too much of the support on InfluxDB.
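
If you try the collectd route, its Ceph plugin is configured against the
daemons' admin sockets, roughly like this (daemon names and socket paths are
examples; adjust per node):

LoadPlugin ceph
<Plugin ceph>
  <Daemon "osd.0">
    SocketPath "/var/run/ceph/ceph-osd.0.asok"
  </Daemon>
  <Daemon "mon.a">
    SocketPath "/var/run/ceph/ceph-mon.a.asok"
  </Daemon>
</Plugin>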


-Original Message-
From: Satish Patel [mailto:satish@gmail.com] 
Sent: Tuesday, 24 July 2018 07:02
To: ceph-users
Subject: [ceph-users] ceph cluster monitoring tool

My 5-node Ceph cluster is ready for production; now I am looking for a
good open-source monitoring tool. What are the majority of folks using in
their production setups?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Insane CPU utilization in ceph.fuse

2018-07-24 Thread Yan, Zheng
I mean:

ceph tell mds.x heap start_profiler

... wait for some time

ceph tell mds.x heap stop_profiler

pprof --text  /usr/bin/ceph-mds
/var/log/ceph/ceph-mds.x.profile..heap




On Tue, Jul 24, 2018 at 3:18 PM Daniel Carrasco  wrote:
>
> This is what i get:
>
> 
> 
> 
> :/# ceph tell mds.kavehome-mgto-pro-fs01 heap dump
> 2018-07-24 09:05:19.350720 7fc562ffd700  0 client.1452545 ms_handle_reset on 
> 10.22.0.168:6800/1685786126
> 2018-07-24 09:05:29.103903 7fc563fff700  0 client.1452548 ms_handle_reset on 
> 10.22.0.168:6800/1685786126
> mds.kavehome-mgto-pro-fs01 dumping heap profile now.
> 
> MALLOC:  760199640 (  725.0 MiB) Bytes in use by application
> MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
> MALLOC: +246962320 (  235.5 MiB) Bytes in central cache freelist
> MALLOC: + 43933664 (   41.9 MiB) Bytes in transfer cache freelist
> MALLOC: + 41012664 (   39.1 MiB) Bytes in thread cache freelists
> MALLOC: + 10186912 (9.7 MiB) Bytes in malloc metadata
> MALLOC:   
> MALLOC: =   1102295200 ( 1051.2 MiB) Actual memory used (physical + swap)
> MALLOC: +   4268335104 ( 4070.6 MiB) Bytes released to OS (aka unmapped)
> MALLOC:   
> MALLOC: =   5370630304 ( 5121.8 MiB) Virtual address space used
> MALLOC:
> MALLOC:  33027  Spans in use
> MALLOC: 19  Thread heaps in use
> MALLOC:   8192  Tcmalloc page size
> 
> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> Bytes released to the OS take up virtual address space but no physical memory.
>
>
> 
> 
> 
> :/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
> 2018-07-24 09:14:25.747706 7f94f700  0 client.1452578 ms_handle_reset on 
> 10.22.0.168:6800/1685786126
> 2018-07-24 09:14:25.754034 7f95057fa700  0 client.1452581 ms_handle_reset on 
> 10.22.0.168:6800/1685786126
> mds.kavehome-mgto-pro-fs01 tcmalloc heap 
> stats:
> MALLOC:  960649328 (  916.1 MiB) Bytes in use by application
> MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
> MALLOC: +108867288 (  103.8 MiB) Bytes in central cache freelist
> MALLOC: + 37179424 (   35.5 MiB) Bytes in transfer cache freelist
> MALLOC: + 40143000 (   38.3 MiB) Bytes in thread cache freelists
> MALLOC: + 10186912 (9.7 MiB) Bytes in malloc metadata
> MALLOC:   
> MALLOC: =   1157025952 ( 1103.4 MiB) Actual memory used (physical + swap)
> MALLOC: +   4213604352 ( 4018.4 MiB) Bytes released to OS (aka unmapped)
> MALLOC:   
> MALLOC: =   5370630304 ( 5121.8 MiB) Virtual address space used
> MALLOC:
> MALLOC:  33028  Spans in use
> MALLOC: 19  Thread heaps in use
> MALLOC:   8192  Tcmalloc page size
> 
> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> Bytes released to the OS take up virtual address space but no physical memory.
>
> 
> 
> 
> After heap release:
> :/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
> 2018-07-24 09:15:28.540203 7f2f7affd700  0 client.1443339 ms_handle_reset on 
> 10.22.0.168:6800/1685786126
> 2018-07-24 09:15:28.547153 7f2f7bfff700  0 client.1443342 ms_handle_reset on 
> 10.22.0.168:6800/1685786126
> mds.kavehome-mgto-pro-fs01 tcmalloc heap 
> stats:
> MALLOC:  710315776 (  677.4 MiB) Bytes in use by application
> MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
> MALLOC: +246471880 (  235.1 MiB) Bytes in central cache freelist
> MALLOC: + 40802848 (   38.9 MiB) Bytes in transfer cache freelist
> MALLOC: + 38689304 (   36.9 MiB) Bytes in thread cache freelists
> MALLOC: + 10186912 (9.7 MiB) Bytes in malloc metadata
> MALLOC:   
> MALLOC: =   1046466720 (  998.0 MiB) Actual memory used (physical + swap)
> MALLOC: +   4324163584 ( 4123.8 MiB) Bytes released to OS (aka unmapped)
> MALLOC:   
> MALLOC: =   5370630304 ( 5121.8 MiB) Virtual address space used
> MALLOC:
> MALLOC:  33177  Spans in use
> MALLOC: 19  Thread heaps in use
> MALLOC:   8192  Tcmalloc page size
> 
> Call

Re: [ceph-users] Insane CPU utilization in ceph.fuse

2018-07-24 Thread Daniel Carrasco
Hello,

How much time is necessary? This is a production environment, and the memory
profiler plus the low cache size (set because of the problem) cause a lot of CPU
usage on the OSDs and MDS, which makes them fail while the profiler is running.
Is there any problem with doing it at a low-traffic time? (With less usage it may
not fail, but it may also give less info about usage.)

Greetings!

2018-07-24 10:21 GMT+02:00 Yan, Zheng :

> I mean:
>
> ceph tell mds.x heap start_profiler
>
> ... wait for some time
>
> ceph tell mds.x heap stop_profiler
>
> pprof --text  /usr/bin/ceph-mds
> /var/log/ceph/ceph-mds.x.profile..heap
>
>
>
>
> On Tue, Jul 24, 2018 at 3:18 PM Daniel Carrasco 
> wrote:
> >
> > This is what i get:
> >
> > 
> > 
> > 
> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap dump
> > 2018-07-24 09:05:19.350720 7fc562ffd700  0 client.1452545
> ms_handle_reset on 10.22.0.168:6800/1685786126
> > 2018-07-24 09:05:29.103903 7fc563fff700  0 client.1452548
> ms_handle_reset on 10.22.0.168:6800/1685786126
> > mds.kavehome-mgto-pro-fs01 dumping heap profile now.
> > 
> > MALLOC:  760199640 (  725.0 MiB) Bytes in use by application
> > MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
> > MALLOC: +246962320 (  235.5 MiB) Bytes in central cache freelist
> > MALLOC: + 43933664 (   41.9 MiB) Bytes in transfer cache freelist
> > MALLOC: + 41012664 (   39.1 MiB) Bytes in thread cache freelists
> > MALLOC: + 10186912 (9.7 MiB) Bytes in malloc metadata
> > MALLOC:   
> > MALLOC: =   1102295200 ( 1051.2 MiB) Actual memory used (physical + swap)
> > MALLOC: +   4268335104 ( 4070.6 MiB) Bytes released to OS (aka unmapped)
> > MALLOC:   
> > MALLOC: =   5370630304 ( 5121.8 MiB) Virtual address space used
> > MALLOC:
> > MALLOC:  33027  Spans in use
> > MALLOC: 19  Thread heaps in use
> > MALLOC:   8192  Tcmalloc page size
> > 
> > Call ReleaseFreeMemory() to release freelist memory to the OS (via
> madvise()).
> > Bytes released to the OS take up virtual address space but no physical
> memory.
> >
> >
> > 
> > 
> > 
> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
> > 2018-07-24 09:14:25.747706 7f94f700  0 client.1452578
> ms_handle_reset on 10.22.0.168:6800/1685786126
> > 2018-07-24 09:14:25.754034 7f95057fa700  0 client.1452581
> ms_handle_reset on 10.22.0.168:6800/1685786126
> > mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
> 
> > MALLOC:  960649328 (  916.1 MiB) Bytes in use by application
> > MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
> > MALLOC: +108867288 (  103.8 MiB) Bytes in central cache freelist
> > MALLOC: + 37179424 (   35.5 MiB) Bytes in transfer cache freelist
> > MALLOC: + 40143000 (   38.3 MiB) Bytes in thread cache freelists
> > MALLOC: + 10186912 (9.7 MiB) Bytes in malloc metadata
> > MALLOC:   
> > MALLOC: =   1157025952 ( 1103.4 MiB) Actual memory used (physical + swap)
> > MALLOC: +   4213604352 ( 4018.4 MiB) Bytes released to OS (aka unmapped)
> > MALLOC:   
> > MALLOC: =   5370630304 ( 5121.8 MiB) Virtual address space used
> > MALLOC:
> > MALLOC:  33028  Spans in use
> > MALLOC: 19  Thread heaps in use
> > MALLOC:   8192  Tcmalloc page size
> > 
> > Call ReleaseFreeMemory() to release freelist memory to the OS (via
> madvise()).
> > Bytes released to the OS take up virtual address space but no physical
> memory.
> >
> > 
> > 
> > 
> > After heap release:
> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
> > 2018-07-24 09:15:28.540203 7f2f7affd700  0 client.1443339
> ms_handle_reset on 10.22.0.168:6800/1685786126
> > 2018-07-24 09:15:28.547153 7f2f7bfff700  0 client.1443342
> ms_handle_reset on 10.22.0.168:6800/1685786126
> > mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
> 
> > MALLOC:  710315776 (  677.4 MiB) Bytes in use by application
> > MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
> > MALLOC: +246471880 (  235.1 MiB) Bytes in central cache freelist
> > MALLOC: + 40802848 (   38.9 MiB) Bytes in transfer cache freelist
> > MALLOC: + 38689304 (   36.9 MiB) Bytes in thread cache freelis

Re: [ceph-users] RDMA question for ceph

2018-07-24 Thread Will Zhao
OK, thank you very much. I will try to contact them and report back on the
problem. In the meantime, I will try to debug it by just setting up one
mon and one osd. Thanks again.

On Mon, Jul 23, 2018 at 3:49 PM John Hearns  wrote:

> Will, looking at the logs which you sent, the connection cannot be set up.
> I did try Googling for these error messages, and I could not find anything
> definite.
> As an aside QP = Queue Pair which is the structure set up to transfer
> information across an IB network.
> Think of it like a TCP connection.
>
> I think you should contact Mellanox support over this one. They are really
> good guys.
>
>
>
> On 23 July 2018 at 08:14, Will Zhao  wrote:
>
>> Hi John:
>>this is the information  ibv_devinfo   gives .
>>
>> hca_id: mlx4_0
>> transport: InfiniBand (0)
>> fw_ver: 2.35.5100
>> node_guid: e41d:2d03:0072:ed70
>> sys_image_guid: e41d:2d03:0072:ed73
>> vendor_id: 0x02c9
>> vendor_part_id: 4099
>> hw_ver: 0x1
>> board_id: MT_1090110019
>> phys_port_cnt: 2
>> Device ports:
>> port: 1
>> state: PORT_DOWN (1)
>> max_mtu: 4096 (5)
>> active_mtu: 4096 (5)
>> sm_lid: 0
>> port_lid: 0
>> port_lmc: 0x00
>> link_layer: InfiniBand
>>
>> port: 2
>> state: PORT_ACTIVE (4)
>> max_mtu: 4096 (5)
>> active_mtu: 4096 (5)
>> sm_lid: 2
>> port_lid: 11
>> port_lmc: 0x00
>> link_layer: InfiniBand
>>
>>
>>
On Fri, Jul 20, 2018 at 7:09 PM John Hearns  wrote:
What does ibv_devinfo  give you?


On 20 July 2018 at 12:13, Will Zhao  wrote:
Now I have added the option "debug ms = 20/20" to the ceph.conf global section to
see more details about the errors. This time "ceph -s" shows thousands of lines;
here is some of the log I pasted from the results:


2018-07-20 16:12:49.994715 7f3a3be8e700 20 Infiniband verify_prereq 
ms_async_rdma_enable_hugepage value is: 0

2018-07-20 16:12:49.994723 7f3a3be8e700 20 Infiniband Infiniband constructing 
Infiniband...

2018-07-20 16:12:49.994748 7f3a3be8e700 20 RDMAStack RDMAStack constructing 
RDMAStack...

2018-07-20 16:12:49.994750 7f3a3be8e700 20 RDMAStack  creating 
RDMAStack:0x7f3a340b5448 with dispatcher:0x7f3a340b5558

2018-07-20 16:12:49.994924 7f3a3970e700  2 Event(0x7f3a340e2fe0 nevent=5000 
time_id=1).set_owner idx=1 owner=139888048531200

2018-07-20 16:12:49.994990 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 
time_id=1).create_file_event create event started fd=7 mask=1 original mask is 0

2018-07-20 16:12:49.994990 7f3a38f0d700  2 Event(0x7f3a34110850 nevent=5000 
time_id=1).set_owner idx=2 owner=139888040138496

2018-07-20 16:12:49.994999 7f3a3970e700 20 EpollDriver.add_event add event fd=7 
cur_mask=0 add_mask=1 to 6

2018-07-20 16:12:49.994991 7f3a39f0f700  2 Event(0x7f3a340b5770 nevent=5000 
time_id=1).set_owner idx=0 owner=139888056923904

2018-07-20 16:12:49.995009 7f3a3970e700 20 Event(0x7f3a340e2fe0 nevent=5000 
time_id=1).create_file_event create event end fd=7 mask=1 original mask is 1

2018-07-20 16:12:49.995011 7f3a38f0d700 20 Event(0x7f3a34110850 nevent=5000 
time_id=1).create_file_event create event started fd=11 mask=1 original mask is 0

2018-07-20 16:12:49.995013 7f3a39f0f700 20 Event(0x7f3a340b5770 nevent=5000 
time_id=1).create_file_event create event started fd=4 mask=1 original mask is 0

2018-07-20 16:12:49.995016 7f3a38f0d700 20 EpollDriver.add_event add event 
fd=11 cur_mask=0 add_mask=1 to 10

2018-07-20 16:12:49.995017 7f3a39f0f700 20 EpollDriver.add_event add event fd=4 
cur_mask=0 add_mask=1 to 3

2018-07-20 16:12:49.995018 7f3a3970e700 10 stack operator() starting

2018-07-20 16:12:49.995022 7f3a38f0d700 20 Event(0x7f3a34110850 nevent=5000 
time_id=1).create_file_event create event end fd=11 mask=1 original mask is 1

2018-07-20 16:12:49.995022 7f3a39f0f700 20 Event(0x7f3a340b5770 nevent=5000 
time_id=1).create_file_event create event end fd=4 mask=1 original mask is 1

2018-07-20 16:12:49.995026 7f3a38f0d700 10 stack operator() starting

2018-07-20 16:12:49.995027 7f3a39f0f700 10 stack operator() starting

2018-07-20 16:12:49.995938 7f3a3be8e700 10 -- - ready -

2018-07-20 16:12:49.995946 7f3a3be8e700  1  Processor -- start

2018-07-20 16:12:49.995996 7f3a3be8e700  1 -- - start start

2018-07-20 16:12:49.996535 7f3a3be8e700 10 -- - create_connect 
10.10.121.25:6789/0, creating connection and registering

2018-07-20 16:12:49.996574 7f3a3be8e700 10 -- - >> 10.10.121.25:6789/0 
conn(0x7f3a34150270 :-1 s=STATE_NONE pgs=0 cs=0 l=1)._connect csq=0

2018-07-20 16:12:49.996594 7f3a3be8e700 20 Event(0x7f3a340e2fe0 nevent=5000 
time_id=1).wakeup

2018-07-20 16:12:49.996608 7f3a3be8e700 10 -- - get_connection mon.0 
10.10.121.25:6789/0 new 0x7f3a34150270

2018-07-20 16:12:49.99 7f3a3970e700 20 -- - >> 10.10.121.25:6789/0 
conn(0x7f3a34150270 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).process prev state 
is STATE_CONNECTING

2018-07-20 16:12:49.996693 7f3a3be8e700 10 -- - >> 10.10.121.25:6789/0 
conn(0x7f3a34150270 :-1 s=STATE_CONNECTING pgs=0 cs=0 l=1).send_keepalive

2018-07-20 16:12:49.996700 7f3a3be8e700 20 Event(0x7f3a340e2fe0 nevent=50

Re: [ceph-users] Insane CPU utilization in ceph.fuse

2018-07-24 Thread Yan, Zheng
On Tue, Jul 24, 2018 at 4:59 PM Daniel Carrasco  wrote:
>
> Hello,
>
> How many time is neccesary?, because is a production environment and memory 
> profiler + low cache size because the problem, gives a lot of CPU usage from 
> OSD and MDS that makes it fails while profiler is running. Is there any 
> problem if is done in a low traffic time? (less usage and maybe it don't 
> fails, but maybe less info about usage).
>

just one time,  wait a few minutes between start_profiler and stop_profiler

> Greetings!
>
> 2018-07-24 10:21 GMT+02:00 Yan, Zheng :
>>
>> I mean:
>>
>> ceph tell mds.x heap start_profiler
>>
>> ... wait for some time
>>
>> ceph tell mds.x heap stop_profiler
>>
>> pprof --text  /usr/bin/ceph-mds
>> /var/log/ceph/ceph-mds.x.profile..heap
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 3:18 PM Daniel Carrasco  wrote:
>> >
>> > This is what i get:
>> >
>> > 
>> > 
>> > 
>> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap dump
>> > 2018-07-24 09:05:19.350720 7fc562ffd700  0 client.1452545 ms_handle_reset 
>> > on 10.22.0.168:6800/1685786126
>> > 2018-07-24 09:05:29.103903 7fc563fff700  0 client.1452548 ms_handle_reset 
>> > on 10.22.0.168:6800/1685786126
>> > mds.kavehome-mgto-pro-fs01 dumping heap profile now.
>> > 
>> > MALLOC:  760199640 (  725.0 MiB) Bytes in use by application
>> > MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
>> > MALLOC: +246962320 (  235.5 MiB) Bytes in central cache freelist
>> > MALLOC: + 43933664 (   41.9 MiB) Bytes in transfer cache freelist
>> > MALLOC: + 41012664 (   39.1 MiB) Bytes in thread cache freelists
>> > MALLOC: + 10186912 (9.7 MiB) Bytes in malloc metadata
>> > MALLOC:   
>> > MALLOC: =   1102295200 ( 1051.2 MiB) Actual memory used (physical + swap)
>> > MALLOC: +   4268335104 ( 4070.6 MiB) Bytes released to OS (aka unmapped)
>> > MALLOC:   
>> > MALLOC: =   5370630304 ( 5121.8 MiB) Virtual address space used
>> > MALLOC:
>> > MALLOC:  33027  Spans in use
>> > MALLOC: 19  Thread heaps in use
>> > MALLOC:   8192  Tcmalloc page size
>> > 
>> > Call ReleaseFreeMemory() to release freelist memory to the OS (via 
>> > madvise()).
>> > Bytes released to the OS take up virtual address space but no physical 
>> > memory.
>> >
>> >
>> > 
>> > 
>> > 
>> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
>> > 2018-07-24 09:14:25.747706 7f94f700  0 client.1452578 ms_handle_reset 
>> > on 10.22.0.168:6800/1685786126
>> > 2018-07-24 09:14:25.754034 7f95057fa700  0 client.1452581 ms_handle_reset 
>> > on 10.22.0.168:6800/1685786126
>> > mds.kavehome-mgto-pro-fs01 tcmalloc heap 
>> > stats:
>> > MALLOC:  960649328 (  916.1 MiB) Bytes in use by application
>> > MALLOC: +0 (0.0 MiB) Bytes in page heap freelist
>> > MALLOC: +108867288 (  103.8 MiB) Bytes in central cache freelist
>> > MALLOC: + 37179424 (   35.5 MiB) Bytes in transfer cache freelist
>> > MALLOC: + 40143000 (   38.3 MiB) Bytes in thread cache freelists
>> > MALLOC: + 10186912 (9.7 MiB) Bytes in malloc metadata
>> > MALLOC:   
>> > MALLOC: =   1157025952 ( 1103.4 MiB) Actual memory used (physical + swap)
>> > MALLOC: +   4213604352 ( 4018.4 MiB) Bytes released to OS (aka unmapped)
>> > MALLOC:   
>> > MALLOC: =   5370630304 ( 5121.8 MiB) Virtual address space used
>> > MALLOC:
>> > MALLOC:  33028  Spans in use
>> > MALLOC: 19  Thread heaps in use
>> > MALLOC:   8192  Tcmalloc page size
>> > 
>> > Call ReleaseFreeMemory() to release freelist memory to the OS (via 
>> > madvise()).
>> > Bytes released to the OS take up virtual address space but no physical 
>> > memory.
>> >
>> > 
>> > 
>> > 
>> > After heap release:
>> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
>> > 2018-07-24 09:15:28.540203 7f2f7affd700  0 client.1443339 ms_handle_reset 
>> > on 10.22.0.168:6800/1685786126
>> > 2018-07-24 09:15:28.547153 7f2f7bfff700  0 client.1443342 ms_handle_reset 
>> > on 10.22.0.168:6800/1685786126
>> > mds.kavehome-mgto-pro-fs01 tcmalloc heap 
>> > stats:
>> > MALLOC:  710315776 (  677.4 MiB) Bytes in use by application

Re: [ceph-users] Self shutdown of 1 whole system: Oops, it did it again

2018-07-24 Thread Nicolas Huillard
Hi all,

The same server did it again with the same CATERR exactly 3 days after
rebooting (+/- 30 seconds).
If it weren't for the exact +3 days, I would think it's a random event.
But exactly 3 days after reboot does not seem random.

Nothing I added got me more information (mcelog, pstore, BMC video
record, etc.)...

Thanks in advance for any hint ;-)

Le samedi 21 juillet 2018 à 10:31 +0200, Nicolas Huillard a écrit :
> Hi all,
> 
> One of my server silently shutdown last night, with no explanation
> whatsoever in any logs. According to the existing logs, the shutdown
> (without reboot) happened between 03:58:20.061452 (last timestamp
> from
> /var/log/ceph/ceph-mgr.oxygene.log) and 03:59:01.515308 (new MON
> election called, for which oxygene didn't answer).
> 
> Is there any way in which Ceph could silently shutdown a server?
> Can SMART self-test influence scrubbing or compaction?
> 
> The only thing I have is that smartd stated a long self-test on both
> OSD spinning drives on that host:
> Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sda [SAT], starting
> scheduled Long Self-Test.
> Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sdb [SAT], starting
> scheduled Long Self-Test.
> Jul 21 03:21:35 oxygene smartd[712]: Device: /dev/sdc [SAT], starting
> scheduled Long Self-Test.
> Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sda [SAT], self-
> test in progress, 90% remaining
> Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sdb [SAT], self-
> test in progress, 90% remaining
> Jul 21 03:51:35 oxygene smartd[712]: Device: /dev/sdc [SAT], previous
> self-test completed without error
> 
> ...and smartctl now says that the self-tests didn't finish (on both
> drives) :
> # 1  Extended offline    Interrupted (host reset)      00%     10636         -
> 
> MON logs on oxygene talks about rockdb compaction a few minutes
> before
> the shutdown, and a deep-scrub finished earlier:
> /var/log/ceph/ceph-osd.6.log
> 2018-07-21 03:32:54.086021 7fd15d82c700  0 log_channel(cluster) log
> [DBG] : 6.1d deep-scrub starts
> 2018-07-21 03:34:31.185549 7fd15d82c700  0 log_channel(cluster) log
> [DBG] : 6.1d deep-scrub ok
> 2018-07-21 03:43:36.720707 7fd178082700  0 -- 172.22.0.16:6801/478362
> >> 172.21.0.16:6800/1459922146 conn(0x556f0642b800 :6801
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0
> l=1).handle_connect_msg: challenging authorizer
> 
> /var/log/ceph/ceph-mgr.oxygene.log
> 2018-07-21 03:58:16.060137 7fbcd300  1 mgr send_beacon standby
> 2018-07-21 03:58:18.060733 7fbcd300  1 mgr send_beacon standby
> 2018-07-21 03:58:20.061452 7fbcd300  1 mgr send_beacon standby
> 
> /var/log/ceph/ceph-mon.oxygene.log
> 2018-07-21 03:52:27.702314 7f25b5406700  4 rocksdb: (Original Log
> Time 2018/07/21-03:52:27.702302) [/build/ceph-
> 12.2.7/src/rocksdb/db/db_impl_compaction_flush.cc:1392] [default]
> Manual compaction from level-0 to level-1 from 'mgrstat .. '
> 2018-07-21 03:52:27.702321 7f25b5406700  4 rocksdb: [/build/ceph-
> 12.2.7/src/rocksdb/db/compaction_job.cc:1403] [default] [JOB 1746]
> Compacting 1@0 + 1@1 files to L1, score -1.00
> 2018-07-21 03:52:27.702329 7f25b5406700  4 rocksdb: [/build/ceph-
> 12.2.7/src/rocksdb/db/compaction_job.cc:1407] [default] Compaction
> start summary: Base version 1745 Base level 0, inputs:
> [149507(602KB)], [149505(13MB)]
> 2018-07-21 03:52:27.702348 7f25b5406700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1532137947702334, "job": 1746, "event":
> "compaction_started", "files_L0": [149507], "files_L1": [149505],
> "score": -1, "input_data_size": 14916379}
> 2018-07-21 03:52:27.785532 7f25b5406700  4 rocksdb: [/build/ceph-
> 12.2.7/src/rocksdb/db/compaction_job.cc:1116] [default] [JOB 1746]
> Generated table #149508: 4904 keys, 14808953 bytes
> 2018-07-21 03:52:27.785587 7f25b5406700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1532137947785565, "cf_name": "default", "job": 1746,
> "event": "table_file_creation", "file_number": 149508, "file_size":
> 14808953, "table_properties": {"data
> 2018-07-21 03:52:27.785627 7f25b5406700  4 rocksdb: [/build/ceph-
> 12.2.7/src/rocksdb/db/compaction_job.cc:1173] [default] [JOB 1746]
> Compacted 1@0 + 1@1 files to L1 => 14808953 bytes
> 2018-07-21 03:52:27.785656 7f25b5406700  3 rocksdb: [/build/ceph-
> 12.2.7/src/rocksdb/db/version_set.cc:2087] More existing levels in DB
> than needed. max_bytes_for_level_multiplier may not be guaranteed.
> 2018-07-21 03:52:27.791640 7f25b5406700  4 rocksdb: (Original Log
> Time 2018/07/21-03:52:27.791526) [/build/ceph-
> 12.2.7/src/rocksdb/db/compaction_job.cc:621] [default] compacted to:
> base level 1 max bytes base 26843546 files[0 1 0 0 0 0 0]
> 2018-07-21 03:52:27.791657 7f25b5406700  4 rocksdb: (Original Log
> Time 2018/07/21-03:52:27.791563) EVENT_LOG_v1 {"time_micros":
> 1532137947791548, "job": 1746, "event": "compaction_finished",
> "compaction_time_micros": 83261, "output_level"
> 2018-07-21 03:52:27.792024 7f25b5406700  4 rocksdb: EVENT_LOG_v1
> {"time_micros": 153213794779

Re: [ceph-users] Read/write statistics per RBD image

2018-07-24 Thread Mateusz Skala (UST, POL)
Hello again,

How can I determine $cctid for specific rbd name? Or is there any good way to 
map admin-socket with rbd?

Regards

Mateusz



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Mateusz Skala (UST, POL)
Sent: Tuesday, July 24, 2018 9:49 AM
To: dilla...@redhat.com
Cc: ceph-users@lists.ceph.com
Subject: [Possibly Forged Email] Re: [ceph-users] Read/write statistics per RBD 
image



Thank You for help, it is exactly that I need.

Regards

Mateusz



From: Jason Dillaman [mailto:jdill...@redhat.com]
Sent: Wednesday, July 18, 2018 1:28 PM
To: Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>>
Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] Read/write statistics per RBD image



Yup, on the host running librbd, you just need to enable the "admin socket" in 
your ceph.conf and then use "ceph --admin-daemon 
/path/to/image/admin/socket.asok perf dump" (i.e. not "ceph perf dump").



See the example in this tip window [1] for how to configure for a "libvirt" 
CephX user.



[1] http://docs.ceph.com/docs/mimic/rbd/libvirt/#configuring-ceph



On Wed, Jul 18, 2018 at 4:02 AM Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>> wrote:

   Thanks  for response.

   In ‘ceph perf dump’ there is no statistics for read/write operations on 
specific RBD image, only for osd and total client operations. I need to get 
statistics on one specific RBD image, to get top used images. It is possible?

   Regards

   Mateusz



   From: Jason Dillaman [mailto:jdill...@redhat.com]
   Sent: Tuesday, July 17, 2018 3:29 PM
   To: Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>>
   Cc: ceph-users mailto:ceph-users@lists.ceph.com>>
   Subject: Re: [ceph-users] Read/write statistics per RBD image



   Yes, you just need to enable the "admin socket" in your ceph.conf and then 
use "ceph --admin-daemon /path/to/image/admin/socket.asok perf dump".



   On Tue, Jul 17, 2018 at 8:53 AM Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>> wrote:

  Hi,

  It is possible to get statistics of issued reads/writes to specific RBD 
image? Best will be statistics like in /proc/diskstats in linux.

  Regards

  Mateusz

  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






   --

   Jason






   --

   Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Error creating compat weight-set with mgr balancer plugin

2018-07-24 Thread Martin Overgaard Hansen
Hi,

I'm having an issue with enabling the mgr balancer plugin, probably because of a
misunderstanding of the fundamentals of the CRUSH algorithm. I hope the list can
help, thanks.

I've enabled the plugin itself and automatic balancing. The mode is set to 
crush-compat and my minimum compatible client is set to jewel.
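
For reference, those steps correspond to commands along these lines:

ceph mgr module enable balancer
ceph balancer mode crush-compat
ceph balancer on
ceph osd set-require-min-compat-client jewel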

The mgr daemon throws me the following error: mgr[balancer] Error creating 
compat weight-set

Creating a compat weight set manually with 'ceph osd crush weight-set 
create-compat' gives me: Error EPERM: crush map contains one or more bucket(s) 
that are not straw2

What changes do I need to implement to get the mgr balancer plugin working?
Thanks.

Please let me know if I need to elaborate.

Med venlig hilsen / Best regards

Martin Overgaard Hansen
System Consultant

Rudolfgårdsvej 1 B
DK-8260 Viby J
T: (+45) 8734 1334 
M: (+45) 3021 4430
www.multihouse.dk

MultiHouse Hosting A/S is a part of ecit.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error creating compat weight-set with mgr balancer plugin

2018-07-24 Thread Lothar Gesslein
On 07/24/2018 12:58 PM, Martin Overgaard Hansen wrote:
> Creating a compat weight set manually with 'ceph osd crush weight-set
> create-compat' gives me: Error EPERM: crush map contains one or more
> bucket(s) that are not straw2
> 
> What changes do I need to implement to get the mgr balancer plugin
> working? Thank.

You will need to run

ceph osd crush set-all-straw-buckets-to-straw2

which has existed since Ceph Mimic v13.0.1 as a handy shortcut to upgrade to
straw2.

The switch from the straw algorithm to the improved straw2 was
introduced with Hammer, but before this command you would have had to
edit the CRUSH map by hand.
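
On pre-Mimic clusters the manual route looks roughly like this (a sketch;
back up the map first and expect some data movement, see below):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt: change every "alg straw" to "alg straw2"
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin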


http://docs.ceph.com/docs/master/rados/operations/crush-map/

There is a new bucket type (straw2) supported. The new straw2
bucket type fixes several limitations in the original straw bucket.
Specifically, the old straw buckets would change some mappings that
should have changed when a weight was adjusted, while straw2 achieves
the original goal of only changing mappings to or from the bucket item
whose weight has changed.
straw2 is the default for any newly created buckets.

Migration impact:

Changing a bucket type from straw to straw2 will result in a
reasonably small amount of data movement, depending on how much the
bucket item weights vary from each other. When the weights are all the
same no data will move, and when item weights vary significantly there
will be more movement.

Best,
Lothar

-- 
Lothar Gesslein
Linux Consultant

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de
GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Implementing multi-site on an existing cluster

2018-07-24 Thread Robert Stanford
 I have a Luminous Ceph cluster that uses just rgw.  We want to turn it
into a multi-site installation.  Are there instructions online for this?
I've been unable to find them.

 -R
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Implementing multi-site on an existing cluster

2018-07-24 Thread Shilpa Manjarabad Jagannath
On Tue, Jul 24, 2018 at 4:56 PM, Robert Stanford 
wrote:

>
>  I have a Luminous Ceph cluster that uses just rgw.  We want to turn it
> into a multi-site installation.  Are there instructions online for this?
> I've been unable to find them.
>
>  -R
>
>
http://docs.ceph.com/docs/luminous/radosgw/multisite/#migrating-a-single-site-system-to-multi-site

This should help.
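
Roughly, the migration in that document boils down to commands along these
lines (realm/zonegroup/zone names are placeholders; see the link for the
endpoint and system-user steps that come in between):

radosgw-admin realm create --rgw-realm=myrealm --default
radosgw-admin zonegroup rename --rgw-zonegroup default --zonegroup-new-name=us
radosgw-admin zone rename --rgw-zone default --zone-new-name=us-east-1 --rgw-zonegroup=us
radosgw-admin period update --commit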

-Shilpa
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Switch yum repos from CentOS to ceph?

2018-07-24 Thread Drew Weaver
Is there any way to safely switch the yum repo I am using from the CentOS 
Storage repo to the official ceph repo for RPMs or should I just rebuild it?

Thanks,
-Drew

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] New cluster issue - poor performance inside guests

2018-07-24 Thread Nick A
Hello,

We've got a 3 node cluster online as part of an openstack ansible installation:

Ceph Mimic

3x OSD nodes:
3x 800GB Intel S3710 SSD OSD's using whole device bluestore (node 3
has 4 for 10 total)
40Gbit networking
96GB Ram
2x E5-2680v2 CPU

Compute nodes are similar but with 192GB ram

Performance using rados bench is fantastic, over 2GB/sec read, 800MB write etc.
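
(For context, those numbers come from plain rados bench runs along these
lines, the pool name being a placeholder:)

rados bench -p testpool 60 write --no-cleanup
rados bench -p testpool 60 seq
rados bench -p testpool 60 rand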

However, inside a VM on the compute nodes performance is poor on the
root disk, 70MB read, 15-20k IOPS.

Interestingly, if I create a volume and attach it to the instance,
those numbers exactly double when testing the new volume, to 140MB/sec
read, 30-40k IOPs.

We've tried luminous, same thing. We're only getting about 26Gbit on
the 40G links but that's fine for now until we go to production, it's
hardly the limiting factor here.

Any ideas please?

Regards,
Nick
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read/write statistics per RBD image

2018-07-24 Thread sinan
Hi,

On which node should we add the "admin socket" parameter to ceph.conf? On
the MON, the OSD, or on what node?

One of my clients (which is the Ansible node in this case) has the
following:
[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be
writable by QEMU and allowed by SELinux or AppArmor

Permissions on the folder:
0 drwxrwx---.  2 ceph   ceph 40 23 jul 11:27 ceph

But /var/run/ceph is empty.

Thanks!
Sinan

> Hello again,
>
> How can I determine $cctid for specific rbd name? Or is there any good way
> to map admin-socket with rbd?
>
> Regards
>
> Mateusz
>
>
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Mateusz Skala (UST, POL)
> Sent: Tuesday, July 24, 2018 9:49 AM
> To: dilla...@redhat.com
> Cc: ceph-users@lists.ceph.com
> Subject: [Possibly Forged Email] Re: [ceph-users] Read/write statistics
> per RBD image
>
>
>
> Thank You for help, it is exactly that I need.
>
> Regards
>
> Mateusz
>
>
>
> From: Jason Dillaman [mailto:jdill...@redhat.com]
> Sent: Wednesday, July 18, 2018 1:28 PM
> To: Mateusz Skala (UST, POL)
> mailto:mateusz.sk...@ust-global.com>>
> Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users
> mailto:ceph-users@lists.ceph.com>>
> Subject: Re: [ceph-users] Read/write statistics per RBD image
>
>
>
> Yup, on the host running librbd, you just need to enable the "admin
> socket" in your ceph.conf and then use "ceph --admin-daemon
> /path/to/image/admin/socket.asok perf dump" (i.e. not "ceph perf dump").
>
>
>
> See the example in this tip window [1] for how to configure for a
> "libvirt" CephX user.
>
>
>
> [1] http://docs.ceph.com/docs/mimic/rbd/libvirt/#configuring-ceph
>
>
>
> On Wed, Jul 18, 2018 at 4:02 AM Mateusz Skala (UST, POL)
> mailto:mateusz.sk...@ust-global.com>>
> wrote:
>
>Thanks  for response.
>
>In ‘ceph perf dump’ there is no statistics for read/write
> operations on specific RBD image, only for osd and total client
> operations. I need to get statistics on one specific RBD image, to get
> top used images. It is possible?
>
>Regards
>
>Mateusz
>
>
>
>From: Jason Dillaman
> [mailto:jdill...@redhat.com]
>Sent: Tuesday, July 17, 2018 3:29 PM
>To: Mateusz Skala (UST, POL)
> mailto:mateusz.sk...@ust-global.com>>
>Cc: ceph-users
> mailto:ceph-users@lists.ceph.com>>
>Subject: Re: [ceph-users] Read/write statistics per RBD image
>
>
>
>Yes, you just need to enable the "admin socket" in your ceph.conf and
> then use "ceph --admin-daemon /path/to/image/admin/socket.asok perf
> dump".
>
>
>
>On Tue, Jul 17, 2018 at 8:53 AM Mateusz Skala (UST, POL)
> mailto:mateusz.sk...@ust-global.com>>
> wrote:
>
>   Hi,
>
>   It is possible to get statistics of issued reads/writes to specific
> RBD image? Best will be statistics like in /proc/diskstats in
> linux.
>
>   Regards
>
>   Mateusz
>
>   ___
>   ceph-users mailing list
>   ceph-users@lists.ceph.com
>   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
>
>
>--
>
>Jason
>
>
>
>
>
>
>--
>
>Jason
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why lvm is recommended method for bleustore

2018-07-24 Thread Alfredo Deza
On Mon, Jul 23, 2018 at 2:33 PM, Satish Patel  wrote:
> Alfredo,
>
> Thanks, I think i should go with LVM then :)
>
> I have question here, I have 4 physical SSD per server, some reason i
> am using ceph-ansible 3.0.8 version which doesn't create LVM volume
> itself so i have to create LVM volume manually.
>
> I am using bluestore  ( want to keep WAL/DB on same DATA disk), How do
> i create lvm manually on single physical disk? Do i need to create two
> logical volume (1 for journal & 1 for Data )?
>
> I am reading this
> http://docs.ceph.com/ceph-ansible/master/osds/scenarios.html (at
> bottom)
>
> lvm_volumes:
>   - data: data-lv1
> data_vg: vg1
> crush_device_class: foo

For a raw device (e.g. /dev/sda) you can do:

lvm_volumes:
  - data: /dev/sda

The LV gets created for you in this one case
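
If you do end up creating the LV yourself, a minimal sketch for one SSD
(device and names below are examples) would be:

pvcreate /dev/sda
vgcreate ceph-vg1 /dev/sda
lvcreate -l 100%FREE -n data-lv1 ceph-vg1

lvm_volumes:
  - data: data-lv1
    data_vg: ceph-vg1

With bluestore keeping WAL/DB on the same device, that single data LV is
enough; no separate journal LV is required.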

>
>
> In above example, did they create vg1 (volume group)  and created
> data-lv1 (logical volume)? If i want to add journal then do i need to
> create one more logical volume?  I am confused in that document so
> need some clarification
>
> On Mon, Jul 23, 2018 at 2:06 PM, Alfredo Deza  wrote:
>> On Mon, Jul 23, 2018 at 1:56 PM, Satish Patel  wrote:
>>> This is great explanation, based on your details look like when reboot
>>> machine (OSD node) it will take longer time to initialize all number
>>> of OSDs but if we use LVM in that case it shorten that time.
>>
>> That is one aspect, yes. Most importantly: all OSDs will consistently
>> come up with ceph-volume. This wasn't the case with ceph-disk and it
>> was impossible to
>> replicate or understand why (hence the 3 hour timeout)
>>
>>>
>>> There is a good chance that LVM impact some performance because of
>>> extra layer, Does anyone has any data which can provide some inside
>>> about good or bad performance. It would be great if your share so it
>>> will help us to understand impact.
>>
>> There isn't performance impact, and if there is, it is negligible.
>>
>>>
>>>
>>>
>>> On Mon, Jul 23, 2018 at 8:37 AM, Alfredo Deza  wrote:
 On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard  
 wrote:
> Le dimanche 22 juillet 2018 à 09:51 -0400, Satish Patel a écrit :
>> I read that post and that's why I open this thread for few more
>> questions and clearence,
>>
>> When you said OSD doesn't come up what actually that means?  After
>> reboot of node or after service restart or installation of new disk?
>>
>> You said we are using manual method what is that?
>>
>> I'm building new cluster and had zero prior experience so how can I
>> produce this error to see lvm is really life saving tool here? I'm
>> sure there are plenty of people using but I didn't find and good
>> document except that mailing list which raising more questions in my
>> mind.
>
> When I had to change a few drives manually, copying the old contents
> over, I noticed that the logical volumes are tagged with lots of
> information related to how they should be handled at boot time by the
> OSD startup system.
> These LVM tags are a good standard way to add that meta-data within the
> volumes themselves. Apparently, there is no other way to add these tags
> that allow for bluestore/filestore, SATA/SAS/NVMe, whole drive or
> partition, etc.
> They are easy to manage and fail-safe in many configurations.

 This is spot on. To clarify even further, let me give a brief overview
 of how that worked with ceph-disk and GPT GUID:

 * at creation time, ceph-disk would add a GUID to the partitions so
 that it would later be recognized. These GUID were unique so they
 would ensure accuracy
 * a set of udev rules would be in place to detect when these GUID
 would become available in the system
 * at boot time, udev would start detecting devices coming online, and
 the rules would call out to ceph-disk (the executable)
 * the ceph-disk executable would then call out to the ceph-disk
 systemd unit, with a timeout of three hours the device to which it was
 assigned (e.g. ceph-disk@/dev/sda )
 * the previous step would be done *per device*, waiting for all
 devices associated with the OSD to become available (hence the 3 hour
 timeout)
 * the ceph-disk systemd unit would call back again to the ceph-disk
 command line tool signaling devices are ready (with --sync)
 * the ceph-disk command line tool would call *the ceph-disk command
 line tool again* to "activate" the OSD, having detected (finally) the
 device type (encrypted, partially prepared, etc...)

 The above workflow worked for pre-systemd systems, it could've
 probably be streamlined better, but it was what allowed to "discover"
 devices at boot time. The 3 hour timeout was there because
 udev would find these devices being active asynchronously, and
 ceph-disk was trying to coerce a more synchronous behavior to get all
 devices needed. In a d

Re: [ceph-users] Read/write statistics per RBD image

2018-07-24 Thread Jason Dillaman
On Tue, Jul 24, 2018 at 6:51 AM Mateusz Skala (UST, POL) <
mateusz.sk...@ust-global.com> wrote:

> Hello again,
>
> How can I determine $cctid for specific rbd name? Or is there any good way
> to map admin-socket with rbd?
>

The $cctid is effectively pseudo-random (it's a memory location within the
process). Your best bet is just a $pid mapping.
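
For example, a client-side ceph.conf along these lines (a sketch; the socket
directory must be writable by the client process) gives one socket per
process, which you can then tie back to an image via that process:

[client]
    admin socket = /var/run/ceph/$cluster-$type.$id.$pid.asok

ceph --admin-daemon /var/run/ceph/ceph-client.libvirt.12345.asok perf dump

(12345 is a hypothetical QEMU/librbd PID.)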


> Regards
>
> Mateusz
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Mateusz Skala (UST, POL)
> *Sent:* Tuesday, July 24, 2018 9:49 AM
> *To:* dilla...@redhat.com
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* [Possibly Forged Email] Re: [ceph-users] Read/write statistics
> per RBD image
>
>
>
> Thank You for help, it is exactly that I need.
>
> Regards
>
> Mateusz
>
>
>
> *From:* Jason Dillaman [mailto:jdill...@redhat.com ]
> *Sent:* Wednesday, July 18, 2018 1:28 PM
> *To:* Mateusz Skala (UST, POL) 
> *Cc:* dillaman ; ceph-users <
> ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] Read/write statistics per RBD image
>
>
>
> Yup, on the host running librbd, you just need to enable the "admin
> socket" in your ceph.conf and then use "ceph --admin-daemon
> /path/to/image/admin/socket.asok perf dump" (i.e. not "ceph perf dump").
>
>
>
> See the example in this tip window [1] for how to configure for a
> "libvirt" CephX user.
>
>
>
> [1] http://docs.ceph.com/docs/mimic/rbd/libvirt/#configuring-ceph
>
>
>
> On Wed, Jul 18, 2018 at 4:02 AM Mateusz Skala (UST, POL) <
> mateusz.sk...@ust-global.com> wrote:
>
> Thanks  for response.
>
> In ‘ceph perf dump’ there is no statistics for read/write operations on
> specific RBD image, only for osd and total client operations. I need to get
> statistics on one specific RBD image, to get top used images. It is
> possible?
>
> Regards
>
> Mateusz
>
>
>
> *From:* Jason Dillaman [mailto:jdill...@redhat.com]
> *Sent:* Tuesday, July 17, 2018 3:29 PM
> *To:* Mateusz Skala (UST, POL) 
> *Cc:* ceph-users 
> *Subject:* Re: [ceph-users] Read/write statistics per RBD image
>
>
>
> Yes, you just need to enable the "admin socket" in your ceph.conf and then
> use "ceph --admin-daemon /path/to/image/admin/socket.asok perf dump".
>
>
>
> On Tue, Jul 17, 2018 at 8:53 AM Mateusz Skala (UST, POL) <
> mateusz.sk...@ust-global.com> wrote:
>
> Hi,
>
> It is possible to get statistics of issued reads/writes to specific RBD
> image? Best will be statistics like in /proc/diskstats in linux.
>
> Regards
>
> Mateusz
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
>
> Jason
>
>
>
>
> --
>
> Jason
>


-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read/write statistics per RBD image

2018-07-24 Thread Mateusz Skala (UST, POL)
If one VM is using multiple RBDs then using just $pid is not enough. The socket
shows statistics for only one (the first) RBD.

Regards

Mateusz



From: Jason Dillaman [mailto:jdill...@redhat.com]
Sent: Tuesday, July 24, 2018 2:39 PM
To: Mateusz Skala (UST, POL) 
Cc: ceph-users 
Subject: Re: [ceph-users] Read/write statistics per RBD image



On Tue, Jul 24, 2018 at 6:51 AM Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>> wrote:

   Hello again,

   How can I determine $cctid for specific rbd name? Or is there any good way 
to map admin-socket with rbd?



   The $cctid is effectively pseudo-random (it's a memory location within the 
process). Your best best is just a $pid mapping.



   Regards

   Mateusz



   From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.com]
 On Behalf Of Mateusz Skala (UST, POL)
   Sent: Tuesday, July 24, 2018 9:49 AM
   To: dilla...@redhat.com
   Cc: ceph-users@lists.ceph.com
   Subject: [Possibly Forged Email] Re: [ceph-users] Read/write statistics per 
RBD image



   Thank You for help, it is exactly that I need.

   Regards

   Mateusz



   From: Jason Dillaman [mailto:jdill...@redhat.com]
   Sent: Wednesday, July 18, 2018 1:28 PM
   To: Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>>
   Cc: dillaman mailto:dilla...@redhat.com>>; ceph-users 
mailto:ceph-users@lists.ceph.com>>
   Subject: Re: [ceph-users] Read/write statistics per RBD image



   Yup, on the host running librbd, you just need to enable the "admin socket" 
in your ceph.conf and then use "ceph --admin-daemon 
/path/to/image/admin/socket.asok perf dump" (i.e. not "ceph perf dump").



   See the example in this tip window [1] for how to configure for a "libvirt" 
CephX user.



   [1] http://docs.ceph.com/docs/mimic/rbd/libvirt/#configuring-ceph



   On Wed, Jul 18, 2018 at 4:02 AM Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>> wrote:

  Thanks  for response.

  In ‘ceph perf dump’ there is no statistics for read/write operations on 
specific RBD image, only for osd and total client operations. I need to get 
statistics on one specific RBD image, to get top used images. It is possible?

  Regards

  Mateusz



  From: Jason Dillaman 
[mailto:jdill...@redhat.com]
  Sent: Tuesday, July 17, 2018 3:29 PM
  To: Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>>
  Cc: ceph-users 
mailto:ceph-users@lists.ceph.com>>
  Subject: Re: [ceph-users] Read/write statistics per RBD image



  Yes, you just need to enable the "admin socket" in your ceph.conf and 
then use "ceph --admin-daemon /path/to/image/admin/socket.asok perf dump".



  On Tue, Jul 17, 2018 at 8:53 AM Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>> wrote:

 Hi,

 It is possible to get statistics of issued reads/writes to specific 
RBD image? Best will be statistics like in /proc/diskstats in linux.

 Regards

 Mateusz

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






  --

  Jason






   --

   Jason






   --

   Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Why lvm is recommended method for bleustore

2018-07-24 Thread Satish Patel
I did that, but I am using ceph-ansible version 3.0.8, which doesn't
support auto-creation of LVM :(  I think version 3.1 has LVM support.

For some reason I have to stick to 3.0.8, so I need to create it manually.

On Tue, Jul 24, 2018 at 8:34 AM, Alfredo Deza  wrote:
> On Mon, Jul 23, 2018 at 2:33 PM, Satish Patel  wrote:
>> Alfredo,
>>
>> Thanks, I think i should go with LVM then :)
>>
>> I have question here, I have 4 physical SSD per server, some reason i
>> am using ceph-ansible 3.0.8 version which doesn't create LVM volume
>> itself so i have to create LVM volume manually.
>>
>> I am using bluestore  ( want to keep WAL/DB on same DATA disk), How do
>> i create lvm manually on single physical disk? Do i need to create two
>> logical volume (1 for journal & 1 for Data )?
>>
>> I am reading this
>> http://docs.ceph.com/ceph-ansible/master/osds/scenarios.html (at
>> bottom)
>>
>> lvm_volumes:
>>   - data: data-lv1
>> data_vg: vg1
>> crush_device_class: foo
>
> For a raw device (e.g. /dev/sda) you can do:
>
> lvm_volumes:
>   - data: /dev/sda
>
> The LV gets created for you in this one case
>
>>
>>
>> In above example, did they create vg1 (volume group)  and created
>> data-lv1 (logical volume)? If i want to add journal then do i need to
>> create one more logical volume?  I am confused in that document so
>> need some clarification
>>
>> On Mon, Jul 23, 2018 at 2:06 PM, Alfredo Deza  wrote:
>>> On Mon, Jul 23, 2018 at 1:56 PM, Satish Patel  wrote:
 This is a great explanation. Based on your details, it looks like when the
 machine (OSD node) reboots it can take a long time to initialize all of the
 OSDs, but if we use LVM it shortens that time.
>>>
>>> That is one aspect, yes. Most importantly: all OSDs will consistently
>>> come up with ceph-volume. This wasn't the case with ceph-disk and it
>>> was impossible to
>>> replicate or understand why (hence the 3 hour timeout)
>>>

 There is a good chance that LVM impacts performance because of the
 extra layer. Does anyone have any data which can provide some insight
 into good or bad performance? It would be great if you could share it, so it
 will help us understand the impact.
>>>
>>> There isn't performance impact, and if there is, it is negligible.
>>>



 On Mon, Jul 23, 2018 at 8:37 AM, Alfredo Deza  wrote:
> On Mon, Jul 23, 2018 at 6:09 AM, Nicolas Huillard  
> wrote:
>> Le dimanche 22 juillet 2018 à 09:51 -0400, Satish Patel a écrit :
>>> I read that post and that's why I open this thread for few more
>>> questions and clearence,
>>>
>>> When you said OSD doesn't come up what actually that means?  After
>>> reboot of node or after service restart or installation of new disk?
>>>
>>> You said we are using manual method what is that?
>>>
>>> I'm building new cluster and had zero prior experience so how can I
>>> produce this error to see lvm is really life saving tool here? I'm
>>> sure there are plenty of people using but I didn't find and good
>>> document except that mailing list which raising more questions in my
>>> mind.
>>
>> When I had to change a few drives manually, copying the old contents
>> over, I noticed that the logical volumes are tagged with lots of
>> information related to how they should be handled at boot time by the
>> OSD startup system.
>> These LVM tags are a good standard way to add that meta-data within the
>> volumes themselves. Apparently, there is no other way to add these tags
>> that allow for bluestore/filestore, SATA/SAS/NVMe, whole drive or
>> partition, etc.
>> They are easy to manage and fail-safe in many configurations.
>
> This is spot on. To clarify even further, let me give a brief overview
> of how that worked with ceph-disk and GPT GUID:
>
> * at creation time, ceph-disk would add a GUID to the partitions so
> that it would later be recognized. These GUID were unique so they
> would ensure accuracy
> * a set of udev rules would be in place to detect when these GUID
> would become available in the system
> * at boot time, udev would start detecting devices coming online, and
> the rules would call out to ceph-disk (the executable)
> * the ceph-disk executable would then call out to the ceph-disk
> systemd unit, with a timeout of three hours, for the device to which it was
> assigned (e.g. ceph-disk@/dev/sda )
> * the previous step would be done *per device*, waiting for all
> devices associated with the OSD to become available (hence the 3 hour
> timeout)
> * the ceph-disk systemd unit would call back again to the ceph-disk
> command line tool signaling devices are ready (with --sync)
> * the ceph-disk command line tool would call *the ceph-disk command
> line tool again* to "activate" the OSD, having detected (finally) the
> device type (encrypted, partially prepared, etc...)
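
(For reference, those LVM tags can be inspected directly on any host with
ceph-volume-created OSDs; both commands below are read-only and shown purely as
an illustration:)

# ceph-volume's own view of the OSDs and their tagged logical volumes
ceph-volume lvm list

# or the raw tags straight from LVM
lvs -o lv_name,vg_name,lv_tags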

Re: [ceph-users] Read/write statistics per RBD image

2018-07-24 Thread Jason Dillaman
On Tue, Jul 24, 2018 at 8:48 AM Mateusz Skala (UST, POL) <
mateusz.sk...@ust-global.com> wrote:

> If one VM is using multiple RBDs then using just $pid is not enough. The
> socket shows only one (the first) RBD image's statistics.
>

Yup, that's why $cctid was added. In your case, you would need to scrape all
of them. The librbd JSON dictionary key contains the image name, so you can
determine which is which after you dump the perf counters.
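
For example, something along these lines works (a rough sketch; the socket
directory and the use of jq are assumptions, adapt them to your setup):

# walk every admin socket on the hypervisor, dump its perf counters and
# print the librbd section keys, which embed the image name
for sock in /var/run/ceph/*.asok; do
    echo "== ${sock}"
    ceph --admin-daemon "${sock}" perf dump | jq -r 'keys[] | select(startswith("librbd"))'
done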


> Regards
>
> Mateusz
>
>
>
> *From:* Jason Dillaman [mailto:jdill...@redhat.com]
> *Sent:* Tuesday, July 24, 2018 2:39 PM
> *To:* Mateusz Skala (UST, POL) 
> *Cc:* ceph-users 
> *Subject:* Re: [ceph-users] Read/write statistics per RBD image
>
>
>
> On Tue, Jul 24, 2018 at 6:51 AM Mateusz Skala (UST, POL) <
> mateusz.sk...@ust-global.com> wrote:
>
> Hello again,
>
> How can I determine $cctid for specific rbd name? Or is there any good way
> to map admin-socket with rbd?
>
>
>
> The $cctid is effectively pseudo-random (it's a memory location within the
> process). Your best bet is just a $pid mapping.
>
>
>
> Regards
>
> Mateusz
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Mateusz Skala (UST, POL)
> *Sent:* Tuesday, July 24, 2018 9:49 AM
> *To:* dilla...@redhat.com
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* [Possibly Forged Email] Re: [ceph-users] Read/write statistics
> per RBD image
>
>
>
> Thank You for help, it is exactly that I need.
>
> Regards
>
> Mateusz
>
>
>
> *From:* Jason Dillaman [mailto:jdill...@redhat.com ]
> *Sent:* Wednesday, July 18, 2018 1:28 PM
> *To:* Mateusz Skala (UST, POL) 
> *Cc:* dillaman ; ceph-users <
> ceph-users@lists.ceph.com>
> *Subject:* Re: [ceph-users] Read/write statistics per RBD image
>
>
>
> Yup, on the host running librbd, you just need to enable the "admin
> socket" in your ceph.conf and then use "ceph --admin-daemon
> /path/to/image/admin/socket.asok perf dump" (i.e. not "ceph perf dump").
>
>
>
> See the example in this tip window [1] for how to configure for a
> "libvirt" CephX user.
>
>
>
> [1] http://docs.ceph.com/docs/mimic/rbd/libvirt/#configuring-ceph
>
>
>
> On Wed, Jul 18, 2018 at 4:02 AM Mateusz Skala (UST, POL) <
> mateusz.sk...@ust-global.com> wrote:
>
> Thanks  for response.
>
> In ‘ceph perf dump’ there is no statistics for read/write operations on
> specific RBD image, only for osd and total client operations. I need to get
> statistics on one specific RBD image, to get top used images. It is
> possible?
>
> Regards
>
> Mateusz
>
>
>
> *From:* Jason Dillaman [mailto:jdill...@redhat.com]
> *Sent:* Tuesday, July 17, 2018 3:29 PM
> *To:* Mateusz Skala (UST, POL) 
> *Cc:* ceph-users 
> *Subject:* Re: [ceph-users] Read/write statistics per RBD image
>
>
>
> Yes, you just need to enable the "admin socket" in your ceph.conf and then
> use "ceph --admin-daemon /path/to/image/admin/socket.asok perf dump".
>
>
>
> On Tue, Jul 17, 2018 at 8:53 AM Mateusz Skala (UST, POL) <
> mateusz.sk...@ust-global.com> wrote:
>
> Hi,
>
> It is possible to get statistics of issued reads/writes to specific RBD
> image? Best will be statistics like in /proc/diskstats in linux.
>
> Regards
>
> Mateusz
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
>
> Jason
>
>
>
>
> --
>
> Jason
>
>
>
>
> --
>
> Jason
>


-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read/write statistics per RBD image

2018-07-24 Thread Wido den Hollander


On 07/24/2018 12:51 PM, Mateusz Skala (UST, POL) wrote:
> Hello again,
> 
> How can I determine $cctid for specific rbd name? Or is there any good
> way to map admin-socket with rbd?
> 

Yes, check the output of 'perf dump', you can fetch the RBD image
information from that JSON output.

Wido

> Regards
> 
> Mateusz
> 
>  
> 
> *From:*ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Mateusz Skala (UST, POL)
> *Sent:* Tuesday, July 24, 2018 9:49 AM
> *To:* dilla...@redhat.com
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* [Possibly Forged Email] Re: [ceph-users] Read/write
> statistics per RBD image
> 
>  
> 
> Thank You for help, it is exactly that I need.
> 
> Regards
> 
> Mateusz
> 
>  
> 
> *From:*Jason Dillaman [mailto:jdill...@redhat.com]
> *Sent:* Wednesday, July 18, 2018 1:28 PM
> *To:* Mateusz Skala (UST, POL)  >
> *Cc:* dillaman mailto:dilla...@redhat.com>>;
> ceph-users mailto:ceph-users@lists.ceph.com>>
> *Subject:* Re: [ceph-users] Read/write statistics per RBD image
> 
>  
> 
> Yup, on the host running librbd, you just need to enable the "admin
> socket" in your ceph.conf and then use "ceph --admin-daemon
> /path/to/image/admin/socket.asok perf dump" (i.e. not "ceph perf dump").
> 
>  
> 
> See the example in this tip window [1] for how to configure for a
> "libvirt" CephX user.
> 
>  
> 
> [1] http://docs.ceph.com/docs/mimic/rbd/libvirt/#configuring-ceph
> 
>  
> 
> On Wed, Jul 18, 2018 at 4:02 AM Mateusz Skala (UST, POL)
> mailto:mateusz.sk...@ust-global.com>> wrote:
> 
> Thanks  for response.
> 
> In ‘ceph perf dump’ there is no statistics for read/write operations
> on specific RBD image, only for osd and total client operations. I
> need to get statistics on one specific RBD image, to get top used
> images. It is possible?
> 
> Regards
> 
> Mateusz
> 
>  
> 
> *From:*Jason Dillaman [mailto:jdill...@redhat.com
> ]
> *Sent:* Tuesday, July 17, 2018 3:29 PM
> *To:* Mateusz Skala (UST, POL)  >
> *Cc:* ceph-users  >
> *Subject:* Re: [ceph-users] Read/write statistics per RBD image
> 
>  
> 
> Yes, you just need to enable the "admin socket" in your ceph.conf
> and then use "ceph --admin-daemon /path/to/image/admin/socket.asok
> perf dump".
> 
>  
> 
> On Tue, Jul 17, 2018 at 8:53 AM Mateusz Skala (UST, POL)
> mailto:mateusz.sk...@ust-global.com>>
> wrote:
> 
> Hi,
> 
> It is possible to get statistics of issued reads/writes to
> specific RBD image? Best will be statistics like in
> /proc/diskstats in linux.
> 
> Regards
> 
> Mateusz
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
>  
> 
> -- 
> 
> Jason
> 
> 
>  
> 
> -- 
> 
> Jason
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 12.2.7 + osd skip data digest + bluestore + I/O errors

2018-07-24 Thread SCHAER Frederic
Hi,

I read the 12.2.7 upgrade notes, and set "osd skip data digest = true" before I 
started upgrading from 12.2.6 on my Bluestore-only cluster.
As far as I can tell, my OSDs all got restarted during the upgrade and all got 
the option enabled :

This is what I see for a specific OSD taken at random:
# ceph --admin-daemon /var/run/ceph/ceph-osd.68.asok config show|grep 
data_digest
"osd_skip_data_digest": "true",

This is what I see when I try to injectargs the data digest ignore option:

# ceph tell osd.* injectargs '--osd_skip_data_digest=true' 2>&1|head
osd.0: osd_skip_data_digest = 'true' (not observed, change may require restart)
osd.1: osd_skip_data_digest = 'true' (not observed, change may require restart)
osd.2: osd_skip_data_digest = 'true' (not observed, change may require restart)
osd.3: osd_skip_data_digest = 'true' (not observed, change may require restart)
(...)

This has been like that since I upgraded to 12.2.7.
I read in the release notes that the skip_data_digest option should be sufficient 
to ignore the 12.2.6 corruptions and that objects should auto-heal on rewrite...

However...

My config :

-  Using tiering with an SSD hot storage tier

-  HDDs for cold storage

And... I get I/O errors on some VMs when running some commands as simple as 
"yum check-update".

The qemu/kvm/libvirt logs show me these (in /var/log/libvirt/qemu):


block I/O error in device 'drive-virtio-disk0': Input/output error (5)

In the ceph logs, I can see these errors :


2018-07-24 11:17:56.420391 osd.71 [ERR] 1.23 copy from 
1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head to 
1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head data digest 
0x3bb26e16 != source 0xec476c54

2018-07-24 11:17:56.429936 osd.71 [ERR] 1.23 copy from 
1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head to 
1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head data digest 
0x3bb26e16 != source 0xec476c54

(yes, my cluster is seen as healthy)

On the affected OSDs, I can see these errors :

2018-07-24 11:17:56.420349 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 
182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 
n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 
182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 
182367'46340723 mlcod 182367'46340723 active+clean] process_copy_chunk data 
digest 0x3bb26e16 != source 0xec476c54
2018-07-24 11:17:56.420388 7f034642a700 -1 log_channel(cluster) log [ERR] : 
1.23 copy from 1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head to 
1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head data digest 
0x3bb26e16 != source 0xec476c54
2018-07-24 11:17:56.420395 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 
182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 
n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 
182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 
182367'46340723 mlcod 182367'46340723 active+clean] finish_promote unexpected 
promote error (5) Input/output error
2018-07-24 11:17:56.429900 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 
182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 
n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 
182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 
182367'46340723 mlcod 182367'46340723 active+clean] process_copy_chunk data 
digest 0x3bb26e16 != source 0xec476c54
2018-07-24 11:17:56.429934 7f034642a700 -1 log_channel(cluster) log [ERR] : 
1.23 copy from 1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head to 
1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head data digest 
0x3bb26e16 != source 0xec476c54
2018-07-24 11:17:56.429939 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 
182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 
n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 
182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 
182367'46340723 mlcod 182367'46340723 active+clean] finish_promote unexpected 
promote error (5) Input/output error

And I don't know how to recover from that.
Pool #1 is my SSD cache tier, hence pg 1.23 is on the SSD side.

I've tried setting the cache pool to "readforward" despite the "not well 
supported" warning and could immediately get back working VMs (no more I/O 
errors).
But with no SSD tiering : not really useful.
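
(For reference, the mode switches described here are of the following form;
"hot-ssd" is a made-up pool name, and luminous asks for the extra flag on the
'not well supported' modes:)

# forward reads to the base pool instead of promoting objects into the SSD tier
ceph osd tier cache-mode hot-ssd readforward --yes-i-really-mean-it

# back to normal writeback caching
ceph osd tier cache-mode hot-ssd writeback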

As soon as I tried setting the cache tier to writeback again, I got those 
I/O errors again... (not on the yum command, but in the meantime I've stopped 
and set out, then unset out osd.71 to check it with badblocks just in case...)
I still have to find out how to reproduce the I/O error on an affected host to 
further try to debug/fix that issue...

Any ideas ?

Thanks && regards

___
ceph-users mailing list
ceph-u

Re: [ceph-users] ceph cluster monitoring tool

2018-07-24 Thread Matthew Vernon
Hi,

On 24/07/18 06:02, Satish Patel wrote:
> My 5 node ceph cluster is ready for production, now i am looking for
> good monitoring tool (Open source), what majority of folks using in
> their production?

This does come up from time to time, so it's worth checking the list
archives.

We use collectd to collect metrics, graphite to store them (we've found
it much easier to look after than influxdb), and grafana to plot them, e.g.

https://cog.sanger.ac.uk/ceph_dashboard/ceph-dashboard-may2018.png

Regards,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read/write statistics per RBD image

2018-07-24 Thread Mateusz Skala (UST, POL)
OK, it would be a nice feature if we could get the name of the RBD from the admin 
socket; for now I'm doing it the way you wrote.

Thanks for help,

Mateusz



From: Jason Dillaman [mailto:jdill...@redhat.com]
Sent: Tuesday, July 24, 2018 2:52 PM
To: Mateusz Skala (UST, POL) 
Cc: ceph-users 
Subject: Re: [ceph-users] Read/write statistics per RBD image



On Tue, Jul 24, 2018 at 8:48 AM Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>> wrote:

   If one VM is using multiple rbd’s then using just $pid is not enough. Socket 
shows only one (first) rbd statistics.



   Yup, that's why $cctid was added. In your case, you would need to scrap all 
of them. The librbd json dictionary key for librbd contains the image name so 
you can determine which is which after you dump the perf counters.



   Regards

   Mateusz



   From: Jason Dillaman [mailto:jdill...@redhat.com]
   Sent: Tuesday, July 24, 2018 2:39 PM
   To: Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>>
   Cc: ceph-users mailto:ceph-users@lists.ceph.com>>
   Subject: Re: [ceph-users] Read/write statistics per RBD image



   On Tue, Jul 24, 2018 at 6:51 AM Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>> wrote:

  Hello again,

  How can I determine $cctid for specific rbd name? Or is there any good 
way to map admin-socket with rbd?



   The $cctid is effectively pseudo-random (it's a memory location within the 
process). Your best bet is just a $pid mapping.



  Regards

  Mateusz



  From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.com]
 On Behalf Of Mateusz Skala (UST, POL)
  Sent: Tuesday, July 24, 2018 9:49 AM
  To: dilla...@redhat.com
  Cc: ceph-users@lists.ceph.com
  Subject: [Possibly Forged Email] Re: [ceph-users] Read/write statistics 
per RBD image



  Thank You for help, it is exactly that I need.

  Regards

  Mateusz



  From: Jason Dillaman [mailto:jdill...@redhat.com]
  Sent: Wednesday, July 18, 2018 1:28 PM
  To: Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>>
  Cc: dillaman mailto:dilla...@redhat.com>>; 
ceph-users mailto:ceph-users@lists.ceph.com>>
  Subject: Re: [ceph-users] Read/write statistics per RBD image



  Yup, on the host running librbd, you just need to enable the "admin 
socket" in your ceph.conf and then use "ceph --admin-daemon 
/path/to/image/admin/socket.asok perf dump" (i.e. not "ceph perf dump").



  See the example in this tip window [1] for how to configure for a 
"libvirt" CephX user.



  [1] http://docs.ceph.com/docs/mimic/rbd/libvirt/#configuring-ceph



  On Wed, Jul 18, 2018 at 4:02 AM Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>> wrote:

 Thanks  for response.

 In ‘ceph perf dump’ there is no statistics for read/write operations 
on specific RBD image, only for osd and total client operations. I need to get 
statistics on one specific RBD image, to get top used images. It is possible?

 Regards

 Mateusz



 From: Jason Dillaman 
[mailto:jdill...@redhat.com]
 Sent: Tuesday, July 17, 2018 3:29 PM
 To: Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>>
 Cc: ceph-users 
mailto:ceph-users@lists.ceph.com>>
 Subject: Re: [ceph-users] Read/write statistics per RBD image



 Yes, you just need to enable the "admin socket" in your ceph.conf and 
then use "ceph --admin-daemon /path/to/image/admin/socket.asok perf dump".



 On Tue, Jul 17, 2018 at 8:53 AM Mateusz Skala (UST, POL) 
mailto:mateusz.sk...@ust-global.com>> wrote:

Hi,

It is possible to get statistics of issued reads/writes to specific 
RBD image? Best will be statistics like in /proc/diskstats in linux.

Regards

Mateusz

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com






 --

 Jason






  --

  Jason






   --

   Jason






   --

   Jason

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 12.2.7 + osd skip data digest + bluestore + I/O errors

2018-07-24 Thread SCHAER Frederic
Oh my...

Tried to yum upgrade in writeback mode and noticed the syslogs on the VM :

Jul 24 15:16:57 dev7240 kernel: end_request: I/O error, dev vda, sector 1896024
Jul 24 15:16:57 dev7240 kernel: end_request: I/O error, dev vda, sector 1896064
Jul 24 15:16:57 dev7240 kernel: end_request: I/O error, dev vda, sector 1895552
Jul 24 15:16:57 dev7240 kernel: end_request: I/O error, dev vda, sector 1895536
Jul 24 15:16:57 dev7240 kernel: end_request: I/O error, dev vda, sector 1895520
(...)

Ceph is also lgging many errors :

2018-07-24 15:20:24.893872 osd.74 [ERR] 1.33 copy from 
1:cd70e921:::rbd_data.21e0fe2ae8944a.:head to 
1:cd70e921:::rbd_data.21e0fe2ae8944a.:head data digest 
0x1480c7a1 != source 0xe1e7591b
[root@ceph0 ~]# egrep 'copy from.*to.*data digest' /var/log/ceph/ceph.log |wc -l
928

Setting the cache tier again to forward mode prevents the IO errors again :

In writeback mode :

# yum update 2>&1|tail
---> Package glibc-headers.x86_64 0:2.12-1.209.el6_9.2 will be updated
---> Package glibc-headers.x86_64 0:2.12-1.212.el6 will be an update
---> Package gmp.x86_64 0:4.3.1-12.el6 will be updated
---> Package gmp.x86_64 0:4.3.1-13.el6 will be an update
---> Package gnupg2.x86_64 0:2.0.14-8.el6 will be updated
---> Package gnupg2.x86_64 0:2.0.14-9.el6_10 will be an update
---> Package gnutls.x86_64 0:2.12.23-21.el6 will be updated
---> Package gnutls.x86_64 0:2.12.23-22.el6 will be an update
---> Package httpd.x86_64 0:2.2.15-60.sl6.6 will be updated
Error: disk I/O error


=> Each time I run a yum update, I get a bit farther in the yum update process.

In forward mode : works as expected
I haven't tried to flush the cache pool while in forward mode... yet...

Ugh :/

Regards


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of SCHAER 
Frederic
Sent: Tuesday, July 24, 2018 3:01 PM
To: ceph-users 
Subject: [PROVENANCE INTERNET] [ceph-users] 12.2.7 + osd skip data digest + 
bluestore + I/O errors

Hi,

I read the 12.2.7 upgrade notes, and set "osd skip data digest = true" before I 
started upgrading from 12.2.6 on my Bluestore-only cluster.
As far as I can tell, my OSDs all got restarted during the upgrade and all got 
the option enabled :

This is what I see for a specific OSD taken at random:
# ceph --admin-daemon /var/run/ceph/ceph-osd.68.asok config show|grep 
data_digest
"osd_skip_data_digest": "true",

This is what I see when I try to injectargs the data digest ignore option:

# ceph tell osd.* injectargs '--osd_skip_data_digest=true' 2>&1|head
osd.0: osd_skip_data_digest = 'true' (not observed, change may require restart)
osd.1: osd_skip_data_digest = 'true' (not observed, change may require restart)
osd.2: osd_skip_data_digest = 'true' (not observed, change may require restart)
osd.3: osd_skip_data_digest = 'true' (not observed, change may require restart)
(...)

This has been like that since I upgraded to 12.2.7.
I read in the release notes that the skip_data_digest option should be sufficient 
to ignore the 12.2.6 corruptions and that objects should auto-heal on rewrite...

However...

My config :

-  Using tiering with an SSD hot storage tier

-  HDDs for cold storage

And... I get I/O errors on some VMs when running some commands as simple as 
"yum check-update".

The qemu/kvm/libvirt logs show me these (in /var/log/libvirt/qemu):


block I/O error in device 'drive-virtio-disk0': Input/output error (5)

In the ceph logs, I can see these errors :


2018-07-24 11:17:56.420391 osd.71 [ERR] 1.23 copy from 
1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head to 
1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head data digest 
0x3bb26e16 != source 0xec476c54

2018-07-24 11:17:56.429936 osd.71 [ERR] 1.23 copy from 
1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head to 
1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head data digest 
0x3bb26e16 != source 0xec476c54

(yes, my cluster is seen as healthy)

On the affected OSDs, I can see these errors :

2018-07-24 11:17:56.420349 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 
182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 
n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 
182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 
182367'46340723 mlcod 182367'46340723 active+clean] process_copy_chunk data 
digest 0x3bb26e16 != source 0xec476c54
2018-07-24 11:17:56.420388 7f034642a700 -1 log_channel(cluster) log [ERR] : 
1.23 copy from 1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head to 
1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head data digest 
0x3bb26e16 != source 0xec476c54
2018-07-24 11:17:56.420395 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 
182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 
n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 
182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'463407

Re: [ceph-users] Read/write statistics per RBD image

2018-07-24 Thread Mateusz Skala (UST, POL)
You must add this on the node where you run the VMs, and [client.libvirt] is the 
name of the CephX user configured for the VM. Additionally, if you run VMs as a 
standard (non-root) user, that user should have write permissions on the 
/var/run/ceph/ directory.
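
For example, on the hypervisor (a sketch only; the "qemu" user/group name is
distro-dependent and an assumption here):

# let the user QEMU runs as write into the socket directory
usermod -a -G ceph qemu
chmod 770 /var/run/ceph

# the .asok files only appear once a VM authenticating as client.libvirt
# is (re)started
ls /var/run/ceph/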

Regards,
Mateusz
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
si...@turka.nl
Sent: Tuesday, July 24, 2018 2:30 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Read/write statistics per RBD image

Hi,

On which node should we add the "admin socket" parameter to ceph.conf? On the 
MON, the OSD, or on which node?

One of my clients (which is the Ansible node in this case) has the
following:
[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # must be 
writable by QEMU and allowed by SELinux or AppArmor

Permissions on the folder:
0 drwxrwx---.  2 ceph   ceph 40 23 jul 11:27 ceph

But /var/run/ceph is empty.

Thanks!
Sinan

> Hello again,
>
> How can I determine $cctid for specific rbd name? Or is there any good 
> way to map admin-socket with rbd?
>
> Regards
>
> Mateusz
>
>
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
> Of Mateusz Skala (UST, POL)
> Sent: Tuesday, July 24, 2018 9:49 AM
> To: dilla...@redhat.com
> Cc: ceph-users@lists.ceph.com
> Subject: [Possibly Forged Email] Re: [ceph-users] Read/write 
> statistics per RBD image
>
>
>
> Thank You for help, it is exactly that I need.
>
> Regards
>
> Mateusz
>
>
>
> From: Jason Dillaman [mailto:jdill...@redhat.com]
> Sent: Wednesday, July 18, 2018 1:28 PM
> To: Mateusz Skala (UST, POL)
> mailto:mateusz.sk...@ust-global.com>>
> Cc: dillaman mailto:dilla...@redhat.com>>; 
> ceph-users 
> mailto:ceph-users@lists.ceph.com>>
> Subject: Re: [ceph-users] Read/write statistics per RBD image
>
>
>
> Yup, on the host running librbd, you just need to enable the "admin 
> socket" in your ceph.conf and then use "ceph --admin-daemon 
> /path/to/image/admin/socket.asok perf dump" (i.e. not "ceph perf dump").
>
>
>
> See the example in this tip window [1] for how to configure for a 
> "libvirt" CephX user.
>
>
>
> [1] http://docs.ceph.com/docs/mimic/rbd/libvirt/#configuring-ceph
>
>
>
> On Wed, Jul 18, 2018 at 4:02 AM Mateusz Skala (UST, POL) 
> mailto:mateusz.sk...@ust-global.com>>
> wrote:
>
>Thanks  for response.
>
>In ‘ceph perf dump’ there is no statistics for read/write 
> operations on specific RBD image, only for osd and total client 
> operations. I need to get statistics on one specific RBD image, to get 
> top used images. It is possible?
>
>Regards
>
>Mateusz
>
>
>
>From: Jason Dillaman
> [mailto:jdill...@redhat.com]
>Sent: Tuesday, July 17, 2018 3:29 PM
>To: Mateusz Skala (UST, POL)
> mailto:mateusz.sk...@ust-global.com>>
>Cc: ceph-users
> mailto:ceph-users@lists.ceph.com>>
>Subject: Re: [ceph-users] Read/write statistics per RBD image
>
>
>
>Yes, you just need to enable the "admin socket" in your ceph.conf 
> and then use "ceph --admin-daemon /path/to/image/admin/socket.asok 
> perf dump".
>
>
>
>On Tue, Jul 17, 2018 at 8:53 AM Mateusz Skala (UST, POL) 
> mailto:mateusz.sk...@ust-global.com>>
> wrote:
>
>   Hi,
>
>   It is possible to get statistics of issued reads/writes to 
> specific RBD image? Best will be statistics like in /proc/diskstats in 
> linux.
>
>   Regards
>
>   Mateusz
>
>   ___
>   ceph-users mailing list
>   ceph-users@lists.ceph.com
>   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
>
>
>--
>
>Jason
>
>
>
>
>
>
>--
>
>Jason
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph cluster monitoring tool

2018-07-24 Thread Guilherme Steinmüller
Satish,

I'm currently working on monasca's roles for openstack-ansible.

We have plugins that monitor Ceph as well, which I use in production. Below
you can see an example:

https://imgur.com/a/6l6Q2K6



Em ter, 24 de jul de 2018 às 02:02, Satish Patel 
escreveu:

> My 5 node ceph cluster is ready for production, now i am looking for
> good monitoring tool (Open source), what majority of folks using in
> their production?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] download.ceph.com repository changes

2018-07-24 Thread Alfredo Deza
Hi all,

After the 12.2.6 release went out, we've been thinking about better ways
to remove a version from our repositories to prevent users from
upgrading/installing a known bad release.

The way our repos are structured today means every single version of
the release is included in the repository. That is, for Luminous,
every 12.x.x version of the binaries is in the same repo. This is true
for both RPM and DEB repositories.

However, the DEB repos don't allow pinning to a given version because
our tooling (namely reprepro) doesn't construct the repositories in a
way that this is allowed. For RPM repos this is fine, and version
pinning works.
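
For example (illustrative only; exact package names and release suffixes may differ):

# every 12.x.x build lives in the same repo, so a specific version can be
# requested directly
yum install ceph-osd-12.2.5

# and a known-bad build can be kept out by adding, in the [ceph] section of
# /etc/yum.repos.d/ceph.repo:
#   exclude=ceph-*-12.2.6*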

To remove a bad version we have two proposals (and would like to hear
ideas on other possibilities): one that would involve symlinks and the
other one, which purges the known bad version from our repos.

*Symlinking*
When releasing we would have a "previous" and "latest" symlink that
would get updated as versions move forward. It would require
separation of versions at the URL level (all versions would no longer
be available in one repo).

The URL structure would then look like:

debian/luminous/12.2.3/
debian/luminous/previous/  (points to 12.2.5)
debian/luminous/latest/   (points to 12.2.7)

Caveats: the url structure would change from debian-luminous/ to
prevent breakage, and the versions would be split. For RPMs it would
mean a regression if someone is used to pinning, for example pinning
to 12.2.2 wouldn't be possible using the same url.

Pros: Faster release times, less need to move packages around, and
easier to remove a bad version


*Single version removal*
Our tooling would need to go and remove the known bad version from the
repository, which would require to rebuild the repository again, so
that the metadata is updated with the difference in the binaries.

Caveats: time intensive process, almost like cutting a new release
which takes about a day (and sometimes longer). Error prone since the
process wouldn't be the same (one off, just when a version needs to be
removed)

Pros: all urls for download.ceph.com and its structure are kept the same.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 12.2.7 + osd skip data digest + bluestore + I/O errors

2018-07-24 Thread Dan van der Ster
`ceph versions` -- you're sure all the osds are running 12.2.7 ?

osd_skip_data_digest = true is supposed to skip any crc checks during reads.
But maybe the cache tiering IO path is different and checks the crc anyway?
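
For example (read-only checks, assuming an admin keyring is available):

# per-daemon-type summary of the versions the cluster is actually running
ceph versions

# the same restricted to OSDs; every entry should report 12.2.7 here
ceph osd versions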

-- dan


On Tue, Jul 24, 2018 at 3:01 PM SCHAER Frederic  wrote:
>
> Hi,
>
>
>
> I read the 12.2.7 upgrade notes, and set “osd skip data digest = true” before 
> I started upgrading from 12.2.6 on my Bluestore-only cluster.
>
> As far as I can tell, my OSDs all got restarted during the upgrade and all 
> got the option enabled :
>
>
>
> This is what I see for a specific OSD taken at random:
>
> # ceph --admin-daemon /var/run/ceph/ceph-osd.68.asok config show|grep 
> data_digest
>
> "osd_skip_data_digest": "true",
>
>
>
> This is what I see when I try to injectarg the option data digest ignore 
> option :
>
>
>
> # ceph tell osd.* injectargs '--osd_skip_data_digest=true' 2>&1|head
>
> osd.0: osd_skip_data_digest = 'true' (not observed, change may require 
> restart)
>
> osd.1: osd_skip_data_digest = 'true' (not observed, change may require 
> restart)
>
> osd.2: osd_skip_data_digest = 'true' (not observed, change may require 
> restart)
>
> osd.3: osd_skip_data_digest = 'true' (not observed, change may require 
> restart)
>
> (…)
>
>
>
> This has been like that since I upgraded to 12.2.7.
>
> I read in the releanotes that the skip_data_digest  option should be 
> sufficient to ignore the 12.2.6 corruptions and that objects should auto-heal 
> on rewrite…
>
>
>
> However…
>
>
>
> My config :
>
> -  Using tiering with an SSD hot storage tier
>
> -  HDDs for cold storage
>
>
>
> And… I get I/O errors on some VMs when running some commands as simple as 
> “yum check-update”.
>
>
>
> The qemu/kvm/libirt logs show me these (in : /var/log/libvirt/qemu) :
>
>
>
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
>
>
>
> In the ceph logs, I can see these errors :
>
>
>
> 2018-07-24 11:17:56.420391 osd.71 [ERR] 1.23 copy from 
> 1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head to 
> 1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head data digest 
> 0x3bb26e16 != source 0xec476c54
>
> 2018-07-24 11:17:56.429936 osd.71 [ERR] 1.23 copy from 
> 1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head to 
> 1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head data digest 
> 0x3bb26e16 != source 0xec476c54
>
>
>
> (yes, my cluster is seen as healthy)
>
>
>
> On the affected OSDs, I can see these errors :
>
>
>
> 2018-07-24 11:17:56.420349 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 
> 182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 
> n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 
> 182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 
> 182367'46340723 mlcod 182367'46340723 active+clean] process_copy_chunk data 
> digest 0x3bb26e16 != source 0xec476c54
>
> 2018-07-24 11:17:56.420388 7f034642a700 -1 log_channel(cluster) log [ERR] : 
> 1.23 copy from 1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head to 
> 1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head data digest 
> 0x3bb26e16 != source 0xec476c54
>
> 2018-07-24 11:17:56.420395 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 
> 182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 
> n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 
> 182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 
> 182367'46340723 mlcod 182367'46340723 active+clean] finish_promote unexpected 
> promote error (5) Input/output error
>
> 2018-07-24 11:17:56.429900 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 
> 182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 
> n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 
> 182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 
> 182367'46340723 mlcod 182367'46340723 active+clean] process_copy_chunk data 
> digest 0x3bb26e16 != source 0xec476c54
>
> 2018-07-24 11:17:56.429934 7f034642a700 -1 log_channel(cluster) log [ERR] : 
> 1.23 copy from 1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head to 
> 1:c590b9d7:::rbd_data.1920e2238e1f29.00e7:head data digest 
> 0x3bb26e16 != source 0xec476c54
>
> 2018-07-24 11:17:56.429939 7f034642a700 -1 osd.71 pg_epoch: 182367 pg[1.23( v 
> 182367'46340724 (182367'46339152,182367'46340724] local-lis/les=182298/182299 
> n=344 ec=2726/2726 lis/c 182298/182298 les/c/f 182299/182299/0 
> 182298/182298/43896) [71,101,74] r=0 lpr=182298 crt=182367'46340724 lcod 
> 182367'46340723 mlcod 182367'46340723 active+clean] finish_promote unexpected 
> promote error (5) Input/output error
>
>
>
> And…. I don’t know how to recover from that.
>
> Pool #1 is my SSD cache tier, hence pg 1.23 is on the SSD side.
>
>
>
> I’ve tried setting the cache pool to “readforward” despite the “not well 
> supported” warning and could immedi

Re: [ceph-users] ceph cluster monitoring tool

2018-07-24 Thread Lenz Grimmer
On 07/24/2018 07:02 AM, Satish Patel wrote:

> My 5 node ceph cluster is ready for production, now i am looking for
> good monitoring tool (Open source), what majority of folks using in
> their production?

There are several; using Prometheus with the prometheus exporter module of
the Ceph manager (ceph-mgr) is a popular choice for collecting the metrics.
The ceph-metrics project provides an exhaustive collection of dashboards for
Grafana that will help with the visualization and some alerting based on these metrics.
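
If you go the Prometheus route, the exporter is built into ceph-mgr; a minimal
sketch (the target host name is an assumption, 9283 is the module's default port):

# enable the built-in exporter on the mgr
ceph mgr module enable prometheus

# prometheus.yml scrape_configs entry (adjust the target to your active mgr):
#   - job_name: 'ceph'
#     static_configs:
#       - targets: ['ceph-mgr-host:9283']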

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] download.ceph.com repository changes

2018-07-24 Thread Dan van der Ster
On Tue, Jul 24, 2018 at 4:38 PM Alfredo Deza  wrote:
>
> Hi all,
>
> After the 12.2.6 release went out, we've been thinking on better ways
> to remove a version from our repositories to prevent users from
> upgrading/installing a known bad release.
>
> The way our repos are structured today means every single version of
> the release is included in the repository. That is, for Luminous,
> every 12.x.x version of the binaries is in the same repo. This is true
> for both RPM and DEB repositories.
>
> However, the DEB repos don't allow pinning to a given version because
> our tooling (namely reprepro) doesn't construct the repositories in a
> way that this is allowed. For RPM repos this is fine, and version
> pinning works.
>
> To remove a bad version we have to proposals (and would like to hear
> ideas on other possibilities), one that would involve symlinks and the
> other one which purges the known bad version from our repos.

What we did with our mirror was: `rm -f *12.2.6*; createrepo --update
.` Took a few seconds. Then disabled the mirror cron.

-- Dan

>
> *Symlinking*
> When releasing we would have a "previous" and "latest" symlink that
> would get updated as versions move forward. It would require
> separation of versions at the URL level (all versions would no longer
> be available in one repo).
>
> The URL structure would then look like:
>
> debian/luminous/12.2.3/
> debian/luminous/previous/  (points to 12.2.5)
> debian/luminous/latest/   (points to 12.2.7)
>
> Caveats: the url structure would change from debian-luminous/ to
> prevent breakage, and the versions would be split. For RPMs it would
> mean a regression if someone is used to pinning, for example pinning
> to 12.2.2 wouldn't be possible using the same url.
>
> Pros: Faster release times, less need to move packages around, and
> easier to remove a bad version
>
>
> *Single version removal*
> Our tooling would need to go and remove the known bad version from the
> repository, which would require to rebuild the repository again, so
> that the metadata is updated with the difference in the binaries.
>
> Caveats: time intensive process, almost like cutting a new release
> which takes about a day (and sometimes longer). Error prone since the
> process wouldn't be the same (one off, just when a version needs to be
> removed)
>
> Pros: all urls for download.ceph.com and its structure are kept the same.
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] download.ceph.com repository changes

2018-07-24 Thread Alfredo Deza
On Tue, Jul 24, 2018 at 10:54 AM, Dan van der Ster  wrote:
> On Tue, Jul 24, 2018 at 4:38 PM Alfredo Deza  wrote:
>>
>> Hi all,
>>
>> After the 12.2.6 release went out, we've been thinking on better ways
>> to remove a version from our repositories to prevent users from
>> upgrading/installing a known bad release.
>>
>> The way our repos are structured today means every single version of
>> the release is included in the repository. That is, for Luminous,
>> every 12.x.x version of the binaries is in the same repo. This is true
>> for both RPM and DEB repositories.
>>
>> However, the DEB repos don't allow pinning to a given version because
>> our tooling (namely reprepro) doesn't construct the repositories in a
>> way that this is allowed. For RPM repos this is fine, and version
>> pinning works.
>>
>> To remove a bad version we have to proposals (and would like to hear
>> ideas on other possibilities), one that would involve symlinks and the
>> other one which purges the known bad version from our repos.
>
> What we did with our mirror was: `rm -f *12.2.6*; createrepo --update
> .` Took a few seconds. Then disabled the mirror cron.

Up until next time when we cut another release and you have to
re-enable the mirror with 12.2.6 in it :(

This is also fast for RPM repos, but not quite as fast for DEB repos.
Finally, *if* you are doing this, the metadata changes, and the repos
need to
be signed again. I am curious how that --update operation didn't make
installations complain

>
> -- Dan
>
>>
>> *Symlinking*
>> When releasing we would have a "previous" and "latest" symlink that
>> would get updated as versions move forward. It would require
>> separation of versions at the URL level (all versions would no longer
>> be available in one repo).
>>
>> The URL structure would then look like:
>>
>> debian/luminous/12.2.3/
>> debian/luminous/previous/  (points to 12.2.5)
>> debian/luminous/latest/   (points to 12.2.7)
>>
>> Caveats: the url structure would change from debian-luminous/ to
>> prevent breakage, and the versions would be split. For RPMs it would
>> mean a regression if someone is used to pinning, for example pinning
>> to 12.2.2 wouldn't be possible using the same url.
>>
>> Pros: Faster release times, less need to move packages around, and
>> easier to remove a bad version
>>
>>
>> *Single version removal*
>> Our tooling would need to go and remove the known bad version from the
>> repository, which would require to rebuild the repository again, so
>> that the metadata is updated with the difference in the binaries.
>>
>> Caveats: time intensive process, almost like cutting a new release
>> which takes about a day (and sometimes longer). Error prone since the
>> process wouldn't be the same (one off, just when a version needs to be
>> removed)
>>
>> Pros: all urls for download.ceph.com and its structure are kept the same.
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] download.ceph.com repository changes

2018-07-24 Thread Ken Dreyer
On Tue, Jul 24, 2018 at 8:54 AM, Dan van der Ster  wrote:
> On Tue, Jul 24, 2018 at 4:38 PM Alfredo Deza  wrote:
>>
>> Hi all,
>>
>> After the 12.2.6 release went out, we've been thinking on better ways
>> to remove a version from our repositories to prevent users from
>> upgrading/installing a known bad release.
>>
>> The way our repos are structured today means every single version of
>> the release is included in the repository. That is, for Luminous,
>> every 12.x.x version of the binaries is in the same repo. This is true
>> for both RPM and DEB repositories.
>>
>> However, the DEB repos don't allow pinning to a given version because
>> our tooling (namely reprepro) doesn't construct the repositories in a
>> way that this is allowed. For RPM repos this is fine, and version
>> pinning works.
>>
>> To remove a bad version we have to proposals (and would like to hear
>> ideas on other possibilities), one that would involve symlinks and the
>> other one which purges the known bad version from our repos.
>
> What we did with our mirror was: `rm -f *12.2.6*; createrepo --update
> .` Took a few seconds. Then disabled the mirror cron.

Unfortunately with Debian repositories, reprepro is a lot more
complicated, and then we have to re-sign the new repository metadata,
so it's a little more involved there.

BUT perfect is the enemy of the good so maybe we should have just done
your suggestion for RPMs at least.

- Ken
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] download.ceph.com repository changes

2018-07-24 Thread Dan van der Ster
On Tue, Jul 24, 2018 at 4:59 PM Alfredo Deza  wrote:
>
> On Tue, Jul 24, 2018 at 10:54 AM, Dan van der Ster  
> wrote:
> > On Tue, Jul 24, 2018 at 4:38 PM Alfredo Deza  wrote:
> >>
> >> Hi all,
> >>
> >> After the 12.2.6 release went out, we've been thinking on better ways
> >> to remove a version from our repositories to prevent users from
> >> upgrading/installing a known bad release.
> >>
> >> The way our repos are structured today means every single version of
> >> the release is included in the repository. That is, for Luminous,
> >> every 12.x.x version of the binaries is in the same repo. This is true
> >> for both RPM and DEB repositories.
> >>
> >> However, the DEB repos don't allow pinning to a given version because
> >> our tooling (namely reprepro) doesn't construct the repositories in a
> >> way that this is allowed. For RPM repos this is fine, and version
> >> pinning works.
> >>
> >> To remove a bad version we have to proposals (and would like to hear
> >> ideas on other possibilities), one that would involve symlinks and the
> >> other one which purges the known bad version from our repos.
> >
> > What we did with our mirror was: `rm -f *12.2.6*; createrepo --update
> > .` Took a few seconds. Then disabled the mirror cron.
>
> Up until next time when we cut another release and you have to
> re-enable the mirror with 12.2.6 in it :(
>

Right... we re-sync'd 12.2.6 along with 12.2.7 -- but people here
mostly grab the highest version.

> This is also fast for RPM repos, but not quite fast for DEB repos.
> Finally, *if* you are doing this, the metadata changes, and the repos
> need to
> be signed again. I am curious how that --update operation didn't make
> installations complain

Good question.. I don't know enough about the repo signatures to
comment on this.
I do know that all clients who had distro-sync'd up to 12.2.6
successfully distro-sync'd back to 12.2.5.
(Our client machines yum distro-sync daily).

-- Dan

>
> >
> > -- Dan
> >
> >>
> >> *Symlinking*
> >> When releasing we would have a "previous" and "latest" symlink that
> >> would get updated as versions move forward. It would require
> >> separation of versions at the URL level (all versions would no longer
> >> be available in one repo).
> >>
> >> The URL structure would then look like:
> >>
> >> debian/luminous/12.2.3/
> >> debian/luminous/previous/  (points to 12.2.5)
> >> debian/luminous/latest/   (points to 12.2.7)
> >>
> >> Caveats: the url structure would change from debian-luminous/ to
> >> prevent breakage, and the versions would be split. For RPMs it would
> >> mean a regression if someone is used to pinning, for example pinning
> >> to 12.2.2 wouldn't be possible using the same url.
> >>
> >> Pros: Faster release times, less need to move packages around, and
> >> easier to remove a bad version
> >>
> >>
> >> *Single version removal*
> >> Our tooling would need to go and remove the known bad version from the
> >> repository, which would require to rebuild the repository again, so
> >> that the metadata is updated with the difference in the binaries.
> >>
> >> Caveats: time intensive process, almost like cutting a new release
> >> which takes about a day (and sometimes longer). Error prone since the
> >> process wouldn't be the same (one off, just when a version needs to be
> >> removed)
> >>
> >> Pros: all urls for download.ceph.com and its structure are kept the same.
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majord...@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] download.ceph.com repository changes

2018-07-24 Thread Dan van der Ster
On Tue, Jul 24, 2018 at 5:08 PM Dan van der Ster  wrote:
>
> On Tue, Jul 24, 2018 at 4:59 PM Alfredo Deza  wrote:
> >
> > On Tue, Jul 24, 2018 at 10:54 AM, Dan van der Ster  
> > wrote:
> > > On Tue, Jul 24, 2018 at 4:38 PM Alfredo Deza  wrote:
> > >>
> > >> Hi all,
> > >>
> > >> After the 12.2.6 release went out, we've been thinking on better ways
> > >> to remove a version from our repositories to prevent users from
> > >> upgrading/installing a known bad release.
> > >>
> > >> The way our repos are structured today means every single version of
> > >> the release is included in the repository. That is, for Luminous,
> > >> every 12.x.x version of the binaries is in the same repo. This is true
> > >> for both RPM and DEB repositories.
> > >>
> > >> However, the DEB repos don't allow pinning to a given version because
> > >> our tooling (namely reprepro) doesn't construct the repositories in a
> > >> way that this is allowed. For RPM repos this is fine, and version
> > >> pinning works.
> > >>
> > >> To remove a bad version we have to proposals (and would like to hear
> > >> ideas on other possibilities), one that would involve symlinks and the
> > >> other one which purges the known bad version from our repos.
> > >
> > > What we did with our mirror was: `rm -f *12.2.6*; createrepo --update
> > > .` Took a few seconds. Then disabled the mirror cron.
> >
> > Up until next time when we cut another release and you have to
> > re-enable the mirror with 12.2.6 in it :(
> >
>
> Right... we re-sync'd 12.2.6 along with 12.2.7 -- but people here
> mostly grab the highest version.
>
> > This is also fast for RPM repos, but not quite fast for DEB repos.
> > Finally, *if* you are doing this, the metadata changes, and the repos
> > need to
> > be signed again. I am curious how that --update operation didn't make
> > installations complain
>
> Good question.. I don't know enough about the repo signatures to
> comment on this.

I asked our mirror man. Apparently we don't sign the repo, only the
rpms. So not applicable in general I suppose.

Another completely different (and not my) idea, how about we retag the
last good release with z+1. In this case we had 12.2.5 as the last
good, and 12.2.6 broken, so we add the v12.2.7 tag on v12.2.5,
effectively re-pushing 12.2.5 to the top.

-- dan

> I do know that all clients who had distro-sync'd up to 12.2.6
> successfully distro-sync'd back to 12.2.5.
> (Our client machines yum distro-sync daily).
>
> -- Dan
>
> >
> > >
> > > -- Dan
> > >
> > >>
> > >> *Symlinking*
> > >> When releasing we would have a "previous" and "latest" symlink that
> > >> would get updated as versions move forward. It would require
> > >> separation of versions at the URL level (all versions would no longer
> > >> be available in one repo).
> > >>
> > >> The URL structure would then look like:
> > >>
> > >> debian/luminous/12.2.3/
> > >> debian/luminous/previous/  (points to 12.2.5)
> > >> debian/luminous/latest/   (points to 12.2.7)
> > >>
> > >> Caveats: the url structure would change from debian-luminous/ to
> > >> prevent breakage, and the versions would be split. For RPMs it would
> > >> mean a regression if someone is used to pinning, for example pinning
> > >> to 12.2.2 wouldn't be possible using the same url.
> > >>
> > >> Pros: Faster release times, less need to move packages around, and
> > >> easier to remove a bad version
> > >>
> > >>
> > >> *Single version removal*
> > >> Our tooling would need to go and remove the known bad version from the
> > >> repository, which would require to rebuild the repository again, so
> > >> that the metadata is updated with the difference in the binaries.
> > >>
> > >> Caveats: time intensive process, almost like cutting a new release
> > >> which takes about a day (and sometimes longer). Error prone since the
> > >> process wouldn't be the same (one off, just when a version needs to be
> > >> removed)
> > >>
> > >> Pros: all urls for download.ceph.com and its structure are kept the same.
> > >> --
> > >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > >> the body of a message to majord...@vger.kernel.org
> > >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] download.ceph.com repository changes

2018-07-24 Thread Brent Kennedy
It would be nice if ceph-deploy could select the version as well as the
release.  E.G:  --release luminous --version 12.2.7   

Otherwise, I deploy the newest release to a new OSD server and then have to
upgrade the rest of the cluster (unless the cluster is already on the highest
version of a previous release).

Not sure if this adds to this particular discussion though :)

-Brent

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Dan
van der Ster
Sent: Tuesday, July 24, 2018 11:26 AM
To: Alfredo Deza 
Cc: ceph-users ; ceph-de...@vger.kernel.org;
ceph-maintain...@ceph.com
Subject: Re: [ceph-users] download.ceph.com repository changes

On Tue, Jul 24, 2018 at 5:08 PM Dan van der Ster  wrote:
>
> On Tue, Jul 24, 2018 at 4:59 PM Alfredo Deza  wrote:
> >
> > On Tue, Jul 24, 2018 at 10:54 AM, Dan van der Ster 
wrote:
> > > On Tue, Jul 24, 2018 at 4:38 PM Alfredo Deza  wrote:
> > >>
> > >> Hi all,
> > >>
> > >> After the 12.2.6 release went out, we've been thinking on better 
> > >> ways to remove a version from our repositories to prevent users 
> > >> from upgrading/installing a known bad release.
> > >>
> > >> The way our repos are structured today means every single version 
> > >> of the release is included in the repository. That is, for 
> > >> Luminous, every 12.x.x version of the binaries is in the same 
> > >> repo. This is true for both RPM and DEB repositories.
> > >>
> > >> However, the DEB repos don't allow pinning to a given version 
> > >> because our tooling (namely reprepro) doesn't construct the 
> > >> repositories in a way that this is allowed. For RPM repos this is 
> > >> fine, and version pinning works.
> > >>
> > >> To remove a bad version we have to proposals (and would like to 
> > >> hear ideas on other possibilities), one that would involve 
> > >> symlinks and the other one which purges the known bad version from
our repos.
> > >
> > > What we did with our mirror was: `rm -f *12.2.6*; createrepo 
> > > --update .` Took a few seconds. Then disabled the mirror cron.
> >
> > Up until next time when we cut another release and you have to 
> > re-enable the mirror with 12.2.6 in it :(
> >
>
> Right... we re-sync'd 12.2.6 along with 12.2.7 -- but people here 
> mostly grab the highest version.
>
> > This is also fast for RPM repos, but not quite fast for DEB repos.
> > Finally, *if* you are doing this, the metadata changes, and the 
> > repos need to be signed again. I am curious how that --update 
> > operation didn't make installations complain
>
> Good question.. I don't know enough about the repo signatures to 
> comment on this.

I asked our mirror man. Apparently we don't sign the repo, only the rpms. So
not applicable in general I suppose.

Another completely different (and not my) idea: how about we retag the last
good release as z+1? In this case we had 12.2.5 as the last good release and
12.2.6 was broken, so we would add the v12.2.7 tag on v12.2.5, effectively
re-pushing 12.2.5 to the top.
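
In git terms that retag would amount to roughly this (a sketch; only the
vX.Y.Z tag naming is taken from the thread, the rest is an assumption about
the workflow):

# point a new v12.2.7 tag at the commit already tagged v12.2.5
git tag -a v12.2.7 v12.2.5^{} -m "12.2.7: re-release of 12.2.5"
git push origin v12.2.7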

-- dan

> I do know that all clients who had distro-sync'd up to 12.2.6 
> successfully distro-sync'd back to 12.2.5.
> (Our client machines yum distro-sync daily).
>
> -- Dan
>
> >
> > >
> > > -- Dan
> > >
> > >>
> > >> *Symlinking*
> > >> When releasing we would have a "previous" and "latest" symlink 
> > >> that would get updated as versions move forward. It would require 
> > >> separation of versions at the URL level (all versions would no 
> > >> longer be available in one repo).
> > >>
> > >> The URL structure would then look like:
> > >>
> > >> debian/luminous/12.2.3/
> > >> debian/luminous/previous/  (points to 12.2.5)
> > >> debian/luminous/latest/   (points to 12.2.7)
> > >>
> > >> Caveats: the url structure would change from debian-luminous/ to 
> > >> prevent breakage, and the versions would be split. For RPMs it 
> > >> would mean a regression if someone is used to pinning, for 
> > >> example pinning to 12.2.2 wouldn't be possible using the same url.
> > >>
> > >> Pros: Faster release times, less need to move packages around, 
> > >> and easier to remove a bad version
> > >>
> > >>
> > >> *Single version removal*
> > >> Our tooling would need to go and remove the known bad version 
> > >> from the repository, which would require to rebuild the 
> > >> repository again, so that the metadata is updated with the difference
in the binaries.
> > >>
> > >> Caveats: time intensive process, almost like cutting a new 
> > >> release which takes about a day (and sometimes longer). Error 
> > >> prone since the process wouldn't be the same (one off, just when 
> > >> a version needs to be
> > >> removed)
> > >>
> > >> Pros: all urls for download.ceph.com and its structure are kept the
same.

Re: [ceph-users] download.ceph.com repository changes

2018-07-24 Thread Alfredo Deza
On Tue, Jul 24, 2018 at 1:19 PM, Brent Kennedy  wrote:
> It would be nice if ceph-deploy could select the version as well as the
> release, e.g.: --release luminous --version 12.2.7
>
> Otherwise, I deploy the newest release to a new OSD server and then have to
> upgrade the rest of the cluster (unless the cluster is on a previous
> release at the highest level).
>
> Not sure if this adds to this particular discussion though :)

That might work for RPMs, but it doesn't work for DEB repos, because
our repos aren't built in a way that allows selecting a version.
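
For reference, this is the kind of pinning that would normally do it on the
DEB side -- a sketch of a standard apt preferences file; with the repos as
reprepro currently builds them there is nothing older for the pin to match:

# /etc/apt/preferences.d/ceph (illustrative)
Package: ceph*
Pin: version 12.2.5*
Pin-Priority: 1001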
>
> -Brent
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Dan
> van der Ster
> Sent: Tuesday, July 24, 2018 11:26 AM
> To: Alfredo Deza 
> Cc: ceph-users ; ceph-de...@vger.kernel.org;
> ceph-maintain...@ceph.com
> Subject: Re: [ceph-users] download.ceph.com repository changes
>
> On Tue, Jul 24, 2018 at 5:08 PM Dan van der Ster  wrote:
>>
>> On Tue, Jul 24, 2018 at 4:59 PM Alfredo Deza  wrote:
>> >
>> > On Tue, Jul 24, 2018 at 10:54 AM, Dan van der Ster 
> wrote:
>> > > On Tue, Jul 24, 2018 at 4:38 PM Alfredo Deza  wrote:
>> > >>
>> > >> Hi all,
>> > >>
>> > >> After the 12.2.6 release went out, we've been thinking on better
>> > >> ways to remove a version from our repositories to prevent users
>> > >> from upgrading/installing a known bad release.
>> > >>
>> > >> The way our repos are structured today means every single version
>> > >> of the release is included in the repository. That is, for
>> > >> Luminous, every 12.x.x version of the binaries is in the same
>> > >> repo. This is true for both RPM and DEB repositories.
>> > >>
>> > >> However, the DEB repos don't allow pinning to a given version
>> > >> because our tooling (namely reprepro) doesn't construct the
>> > >> repositories in a way that this is allowed. For RPM repos this is
>> > >> fine, and version pinning works.
>> > >>
>> > >> To remove a bad version we have two proposals (and would like to
>> > >> hear ideas on other possibilities), one that would involve
>> > >> symlinks and the other one which purges the known bad version from
> our repos.
>> > >
>> > > What we did with our mirror was: `rm -f *12.2.6*; createrepo
>> > > --update .` Took a few seconds. Then disabled the mirror cron.
>> >
>> > Up until next time when we cut another release and you have to
>> > re-enable the mirror with 12.2.6 in it :(
>> >
>>
>> Right... we re-sync'd 12.2.6 along with 12.2.7 -- but people here
>> mostly grab the highest version.
>>
>> > This is also fast for RPM repos, but not quite fast for DEB repos.
>> > Finally, *if* you are doing this, the metadata changes, and the
>> > repos need to be signed again. I am curious how that --update
>> > operation didn't make installations complain
>>
>> Good question.. I don't know enough about the repo signatures to
>> comment on this.
>
> I asked our mirror man. Apparently we don't sign the repo, only the rpms. So
> not applicable in general I suppose.
>
> Another completely different (and not my) idea: how about we retag the last
> good release as z+1? In this case we had 12.2.5 as the last good release and
> 12.2.6 was broken, so we would add the v12.2.7 tag on v12.2.5, effectively
> re-pushing 12.2.5 to the top.
>
> -- dan
>
>> I do know that all clients who had distro-sync'd up to 12.2.6
>> successfully distro-sync'd back to 12.2.5.
>> (Our client machines yum distro-sync daily).
>>
>> -- Dan
>>
>> >
>> > >
>> > > -- Dan
>> > >
>> > >>
>> > >> *Symlinking*
>> > >> When releasing we would have a "previous" and "latest" symlink
>> > >> that would get updated as versions move forward. It would require
>> > >> separation of versions at the URL level (all versions would no
>> > >> longer be available in one repo).
>> > >>
>> > >> The URL structure would then look like:
>> > >>
>> > >> debian/luminous/12.2.3/
>> > >> debian/luminous/previous/  (points to 12.2.5)
>> > >> debian/luminous/latest/   (points to 12.2.7)
>> > >>
>> > >> Caveats: the url structure would change from debian-luminous/ to
>> > >> prevent breakage, and the versions would be split. For RPMs it
>> > >> would mean a regression if someone is used to pinning, for
>> > >> example pinning to 12.2.2 wouldn't be possible using the same url.
>> > >>
>> > >> Pros: Faster release times, less need to move packages around,
>> > >> and easier to remove a bad version
>> > >>
>> > >>
>> > >> *Single version removal*
>> > >> Our tooling would need to go and remove the known bad version
>> > >> from the repository, which would require to rebuild the
>> > >> repository again, so that the metadata is updated with the difference
> in the binaries.
>> > >>
>> > >> Caveats: time intensive process, almost like cutting a new
>> > >> release which takes about a day (and sometimes longer). Error
>> > >> prone since the process wouldn't be the same (one off, just when
>> > >> a version needs to be
>> > >> removed)
>> > >>
>> > >> Pros: all urls for download.ceph.com and its structure are kept the
> same.

Re: [ceph-users] Insane CPU utilization in ceph.fuse

2018-07-24 Thread Daniel Carrasco
Hello,

I've run the profiler for about 5-6 minutes and this is what I've got:
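
(For reference, a dump like the one below is typically captured and rendered
roughly like this -- a sketch; the pprof binary name varies by distro, and the
profile filename is taken from the output itself:

ceph tell mds.kavehome-mgto-pro-fs01 heap start_profiler
# ...wait a few minutes, then render the newest .heap file as text
google-pprof --text /usr/bin/ceph-mds \
    /var/log/ceph/mds.kavehome-mgto-pro-fs01.profile.0009.heap
)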




Using local file /usr/bin/ceph-mds.
Using local file /var/log/ceph/mds.kavehome-mgto-pro-fs01.profile.0009.heap.
Total: 400.0 MB
   362.5  90.6%  90.6%    362.5  90.6% ceph::buffer::create_aligned_in_mempool
20.4   5.1%  95.7% 29.8   7.5% CDir::_load_dentry
 5.9   1.5%  97.2%  6.9   1.7% CDir::add_primary_dentry
 4.7   1.2%  98.4%  4.7   1.2% ceph::logging::Log::create_entry
 1.8   0.5%  98.8%  1.8   0.5% std::_Rb_tree::_M_emplace_hint_unique
 1.8   0.5%  99.3%  2.2   0.5% compact_map_base::decode
 0.6   0.1%  99.4%  0.7   0.2% CInode::add_client_cap
 0.5   0.1%  99.5%  0.5   0.1% std::__cxx11::basic_string::_M_mutate
 0.4   0.1%  99.6%  0.4   0.1% SimpleLock::more
 0.4   0.1%  99.7%  0.4   0.1% MDCache::add_inode
 0.3   0.1%  99.8%  0.3   0.1% CDir::add_to_bloom
 0.2   0.1%  99.9%  0.2   0.1% CDir::steal_dentry
 0.2   0.0%  99.9%  0.2   0.0% CInode::get_or_open_dirfrag
 0.1   0.0%  99.9%  0.8   0.2% std::enable_if::type decode
 0.1   0.0% 100.0%  0.1   0.0% ceph::buffer::list::crc32c
 0.1   0.0% 100.0%  0.1   0.0% decode_message
 0.0   0.0% 100.0%  0.0   0.0% OpTracker::create_request
 0.0   0.0% 100.0%  0.0   0.0% TrackedOp::TrackedOp
 0.0   0.0% 100.0%  0.0   0.0% std::vector::_M_emplace_back_aux
 0.0   0.0% 100.0%  0.0   0.0% std::_Rb_tree::_M_insert_unique
 0.0   0.0% 100.0%  0.0   0.0% CInode::add_dirfrag
 0.0   0.0% 100.0%  0.0   0.0% MDLog::_prepare_new_segment
 0.0   0.0% 100.0%  0.0   0.0% DispatchQueue::enqueue
 0.0   0.0% 100.0%  0.0   0.0% ceph::buffer::list::push_back
 0.0   0.0% 100.0%  0.0   0.0% Server::prepare_new_inode
 0.0   0.0% 100.0%  365.6  91.4% EventCenter::process_events
 0.0   0.0% 100.0%  0.0   0.0% std::_Rb_tree::_M_copy
 0.0   0.0% 100.0%  0.0   0.0% CDir::add_null_dentry
 0.0   0.0% 100.0%  0.0   0.0% Locker::check_inode_max_size
 0.0   0.0% 100.0%  0.0   0.0% CDentry::add_client_lease
 0.0   0.0% 100.0%  0.0   0.0% CInode::project_inode
 0.0   0.0% 100.0%  0.0   0.0% std::__cxx11::list::_M_insert
 0.0   0.0% 100.0%  0.0   0.0% MDBalancer::handle_heartbeat
 0.0   0.0% 100.0%  0.0   0.0% MDBalancer::send_heartbeat
 0.0   0.0% 100.0%  0.0   0.0% C_GatherBase::C_GatherSub::complete
 0.0   0.0% 100.0%  0.0   0.0% EventCenter::create_time_event
 0.0   0.0% 100.0%  0.0   0.0% CDir::_omap_fetch
 0.0   0.0% 100.0%  0.0   0.0% Locker::handle_inode_file_caps
 0.0   0.0% 100.0%  0.0   0.0% std::_Rb_tree::_M_insert_equal
 0.0   0.0% 100.0%  0.0   0.0% Locker::issue_caps
 0.0   0.0% 100.0%  0.1   0.0% MDLog::_submit_thread
 0.0   0.0% 100.0%  0.0   0.0% Journaler::_wait_for_flush
 0.0   0.0% 100.0%  0.0   0.0% Journaler::wrap_finisher
 0.0   0.0% 100.0%  0.0   0.0% MDSCacheObject::add_waiter
 0.0   0.0% 100.0%  0.0   0.0% std::__cxx11::list::insert
 0.0   0.0% 100.0%  0.0   0.0% std::__detail::_Map_base::operator[]
 0.0   0.0% 100.0%  0.0   0.0% Locker::mark_updated_scatterlock
 0.0   0.0% 100.0%  0.0   0.0% std::_Rb_tree::_M_insert_
 0.0   0.0% 100.0%  0.0   0.0% alloc_ptr::operator->
 0.0   0.0% 100.0%  0.0   0.0% ceph::buffer::list::append@5c1560
 0.0   0.0% 100.0%  0.0   0.0% ceph::buffer::malformed_input::~malformed_input
 0.0   0.0% 100.0%  0.0   0.0% compact_set_base::insert
 0.0   0.0% 100.0%  0.0   0.0% CDir::add_waiter
 0.0   0.0% 100.0%  0.0   0.0% InoTable::apply_release_ids
 0.0   0.0% 100.0%  0.0   0.0% InoTable::project_release_ids
 0.0   0.0% 100.0%  2.2   0.5% InodeStoreBase::decode_bare
 0.0   0.0% 100.0%  0.0   0.0% interval_set::erase
 0.0   0.0% 100.0%  1.1   0.3% std::map::operator[]
 0.0   0.0% 100.0%  0.0   0.0% Beacon::_send
 0.0   0.0% 100.0%  0.0   0.0% MDSDaemon::reset_tick
 0.0   0.0% 100.0%  0.0   0.0% MgrClient::send_report
 0.0   0.0% 100.0%  0.0   0.0% Journaler::_do_flush
 0.0   0.0% 100.0%  0.1   0.0% Locker::rdlock_start
 0.0   0.0% 100.0%  0.0   0.0% MDCache::_get_waiter
 0.0   0.0% 100.0%  0.0   0.0% CDentry::~CDentry
 0.0   0.0% 100.0%  0.0   0.0% MonClient::schedule_tick
 0.0   0.0% 100.0%  0.1   0.0% AsyncConnection::handle_write
 0.0   0.0% 100.0%  0.1   0.0% AsyncConnection::prepare_send_message
 0.0   0.0% 100.0%  365.5  91.4% AsyncConnection::process
 0.0