[ceph-users] Re: High CPU usage by ceph-mgr in 14.2.6

2020-03-07 Thread danjou . philippe
I'm having the same issue on 14.2.4. Have you fixed it?
I disabled all modules apart from the pg balancer (which can't be disabled).

I opened a report with wallclock profiler output on the tracker:
https://tracker.ceph.com/issues/44496
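
For reference, disabling the optional mgr modules goes roughly like this (a
sketch; the module names depend on what is enabled, so check the list first):

    ceph mgr module ls                    # lists enabled and always-on modules
    ceph mgr module disable dashboard     # repeat per optional module
    ceph mgr module disable prometheus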
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Disabling Telemetry

2020-03-07 Thread m
Is there another way to disable telemetry than using:

> ceph telemetry off
> Error EIO: Module 'telemetry' has experienced an error and cannot handle 
> commands: cannot concatenate 'str' and 'UUID' objects

I'm attempting to get all my clusters out of a constant HEALTH_ERR state caused 
by either the above error or the telemetry endpoint being down.
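
The only alternative I can think of is disabling the mgr module outright
(untested, and presumably only possible if telemetry is not in this release's
always_on module list):

    ceph mgr module disable telemetry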
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Disabling Telemetry

2020-03-07 Thread Sage Weil
On Sat, 7 Mar 2020, m...@silvenga.com wrote:
> Is there another way to disable telemetry then using:
> 
> > ceph telemetry off
> > Error EIO: Module 'telemetry' has experienced an error and cannot handle 
> > commands: cannot concatenate 'str' and 'UUID' objects
> 
> I'm attempting to get all my clusters out of a constant HEALTH_ERR state 
> caused by either the above error or the telemetry endpoint being down.

Restart the mgr daemon and the problem will go away (until the next time 
the telemetry server is unavailable).  The endpoint went down because we 
were updating the backend VM; hopefully it won't happen again!
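
Something along these lines, depending on your deployment (a sketch assuming
systemd units; substitute the name of the active mgr):

    systemctl restart ceph-mgr@$(hostname -s)   # on the node running the active mgr
    # or, from any admin node, fail over to a standby:
    ceph mgr fail <active-mgr-name>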

Unfortunately the fix to avoid these errors[1] didn't merge in time for 
the latest nautilus (14.2.8), but it will be in the next point release.

sage

[1] https://github.com/ceph/ceph/pull/33141
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Disabling Telemetry

2020-03-07 Thread m
Thanks! I should have tried that. Upgrading the clusters to 14.2.8 while the
endpoint was down made the issue hard to track down.

I'll make sure to re-enable telemetry when that PR makes it into the next point release.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph rbd volumes/images IO details

2020-03-07 Thread M Ranga Swami Reddy
On Fri, Mar 6, 2020 at 1:06 AM M Ranga Swami Reddy 
wrote:

> Hello,
> Can we get the IOPs of any rbd image/volume?
>
> For ex: I have created volumes via OpenStack Cinder. Want to know
> the IOPs of these volumes.
>
> In general we can get pool stats, but I haven't seen per-volume stats.
>
> Any hint here? Appreciated.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Identify slow ops

2020-03-07 Thread ceph
Hi Thomas,

I would first try to get more space, as Ceph will block IO when your disks are
full - perhaps your PGs are unbalanced.

Does ceph osd df tree give any hint?
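
For example (a sketch; dry-run first, and only apply the reweight if the
result looks sane):

    ceph osd df tree                           # per-OSD utilisation in CRUSH order
    ceph osd test-reweight-by-utilization 110  # dry run of a reweight
    ceph osd reweight-by-utilization 110       # apply it if the dry run looks OK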

Or is this already resolved?

Hth
Mehmet 

Am 5. März 2020 09:26:13 MEZ schrieb Thomas Schneider <74cmo...@gmail.com>:
>Hi,
>
>I have stopped all 3 MON services sequentially.
>After starting the 3 MON services again, the slow ops where gone.
>However, just after 1 min. of MON service uptime, the slow ops are back
>again, and the blocked time is increasing constantly.
>
>root@ld3955:/home/ceph-scripts
># ceph -w
>  cluster:
>    id: 6b1b5117-6e08-4843-93d6-2da3cf8a6bae
>    health: HEALTH_WARN
>    17 nearfull osd(s)
>    1 pool(s) nearfull
>    2 slow ops, oldest one blocked for 63 sec, mon.ld5505 has
>slow ops
>
>  services:
>    mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 67s)
>    mgr: ld5505(active, since 11d), standbys: ld5506, ld5507
>    mds: cephfs:2 {0=ld5507=up:active,1=ld5505=up:active} 2
>up:standby-replay 3 up:standby
>    osd: 442 osds: 442 up (since 4w), 442 in (since 4w); 10 remapped
>pgs
>
>  data:
>    pools:   7 pools, 19628 pgs
>    objects: 72.14M objects, 275 TiB
>    usage:   826 TiB used, 705 TiB / 1.5 PiB avail
>    pgs: 16920/216422157 objects misplaced (0.008%)
> 19618 active+clean
> 10    active+remapped+backfilling
>
>  io:
>    client:   454 KiB/s rd, 15 MiB/s wr, 905 op/s rd, 463 op/s wr
>    recovery: 125 MiB/s, 31 objects/s
>
>
>2020-03-05 09:21:48.647440 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 63 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:21:53.648708 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 68 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:21:58.650186 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 73 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:22:03.651447 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 78 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:22:08.653066 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 83 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:22:13.654699 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 88 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:22:18.655912 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 93 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:22:23.657263 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 98 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:22:28.658514 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 103 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:22:33.659965 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 108 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:22:38.661360 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 113 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:22:43.662727 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 118 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:22:48.663940 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 123 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:22:53.685451 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 128 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:22:58.691603 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 133 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:23:03.692841 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 138 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:23:08.694502 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 143 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:23:13.695991 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 148 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:23:18.697689 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 153 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:23:23.698945 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 158 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:23:28.700331 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 163 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:23:33.701754 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 168 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:23:38.703021 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 173 sec, mon.ld5505 has slow ops (SLOW_OPS)
>2020-03-05 09:23:43.704396 mon.ld5505 [WRN] Health check update: 2 slow
>ops, oldest one blocked for 178 sec, mon.ld5505 has slow ops (

[ceph-users] Welcome to the "ceph-users" mailing list

2020-03-07 Thread Abhinav Singh
singhabhinav9051571...@gmail.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph rbd volumes/images IO details

2020-03-07 Thread XuYun
You can enable the prometheus module of the mgr if you are running Nautilus.
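
Roughly (a sketch; mgr/prometheus/rbd_stats_pools controls which pools get
per-image counters, and the pool name and host below are placeholders):

    ceph mgr module enable prometheus
    ceph config set mgr mgr/prometheus/rbd_stats_pools "volumes"
    curl http://<active-mgr-host>:9283/metrics | grep ceph_rbd   # per-image ops/bytes

If you only need an ad-hoc view, Nautilus should also have rbd perf image
iostat / rbd perf image iotop.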

> 2020年3月8日 上午2:15,M Ranga Swami Reddy  写道:
> 
> On Fri, Mar 6, 2020 at 1:06 AM M Ranga Swami Reddy 
> wrote:
> 
>> Hello,
>> Can we get the IOPs of any rbd image/volume?
>> 
>> For ex: I have created volumes via OpenStack Cinder. Want to know
>> the IOPs of these volumes.
>> 
>> In general we can get pool stats, but I haven't seen per-volume stats.
>> 
>> Any hint here? Appreciated.
>> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW jaegerTracing

2020-03-07 Thread Abhinav Singh
I am trying to implement Jaeger tracing in RGW. I need some advice on which
functions I should trace to get a good picture of actual cluster performance.

Till now I have been able to deduce the following:
1. I think we need to add tracing where rgw communicates with librados
(particularly in librgw, where the communication actually happens); HTTP
requests and responses should not be traced, because their latency depends on
the client's internet speed.
2. In librgw, functions like this one here and its corresponding overloaded
methods, and also this function here and its corresponding overloaded
functions.
3. I see that pools are ultimately used to enter the CRUSH algorithm for
writing data, so I think the creation of pools should also be taken into
account while tracing (creation of a pool should be the main span, and these
functions should be its child spans).


Functionality of buckets like this does not require tracing because they are
HTTP requests.

Any kind of guidance will be of great help.

Thank You.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: log_latency_fn slow operation

2020-03-07 Thread XuYun
I finally figured out this problem: swap memory had been assigned to the OSD
processes for some reason (vm.swappiness was already set to 0), which degraded
KV performance. I restarted the OSDs and switched swap off, and the warnings
now seem to have disappeared from the OSD logs.
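
For anyone hitting the same thing, checking and clearing the swap usage goes
roughly like this (a sketch; run on the OSD host and make sure there is enough
free RAM before swapping off):

    # how much of each ceph-osd process is sitting in swap
    for pid in $(pgrep ceph-osd); do grep VmSwap /proc/$pid/status; done
    swapoff -a               # push swapped pages back into RAM
    sysctl vm.swappiness=0   # already 0 here, but worth confirming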

> 2020年3月4日 上午11:08,徐蕴  写道:
> 
> Hi,
> 
> Our cluster (14.2.6) has sporadic slow ops warnings since upgrading from 
> Jewel 1 month ago. Today I checked the OSD log files and found out a lot of 
> entries like:
> 
> ceph-osd.5.log:2020-03-04 10:33:31.592 7f18ca41f700  0 
> bluestore(/var/lib/ceph/osd/ceph-5) log_latency_fn slow operation observed 
> for _txc_committed_kv, latency = 5.16871s, txc = 0x55e33ae41b80
> ceph-osd.5.log:2020-03-04 10:33:31.592 7f18ca41f700  0 
> bluestore(/var/lib/ceph/osd/ceph-5) log_latency_fn slow operation observed 
> for _txc_committed_kv, latency = 5.15158s, txc = 0x55e3639b3340
> ceph-osd.5.log:2020-03-04 10:33:31.592 7f18ca41f700  0 
> bluestore(/var/lib/ceph/osd/ceph-5) log_latency_fn slow operation observed 
> for _txc_committed_kv, latency = 6.77361s, txc = 0x55e3379cc840
> ceph-osd.5.log:2020-03-04 10:33:52.666 7f18ca41f700  0 
> bluestore(/var/lib/ceph/osd/ceph-5) log_latency_fn slow operation observed 
> for _txc_committed_kv, latency = 5.42519s, txc = 0x55e33722d600
> 
> or 
> /var/log/kolla/ceph/ceph-osd.7.log:2020-03-04 00:41:31.110 7f3dc0bc8700  0 
> bluestore(/var/lib/ceph/osd/ceph-7) log_latency slow operation observed for 
> submit_transact, latency = 8.1279s
> /var/log/kolla/ceph/ceph-osd.7.log:2020-03-04 00:41:31.110 7f3dd1bea700  0 
> bluestore(/var/lib/ceph/osd/ceph-7) log_latency slow operation observed for 
> kv_final, latency = 7.88786s
> /var/log/kolla/ceph/ceph-osd.7.log:2020-03-04 02:21:35.180 7f3dd1bea700  0 
> bluestore(/var/lib/ceph/osd/ceph-7) log_latency slow operation observed for 
> kv_final, latency = 6.06171s
> /var/log/kolla/ceph/ceph-osd.7.log:2020-03-04 05:31:30.298 7f3dc1bca700  0 
> bluestore(/var/lib/ceph/osd/ceph-7) log_latency slow operation observed for 
> submit_transact, latency = 5.34228s
> 
> The cluster setup is: SATA SSD (as DB) + SATA HDD 1:3.
> Any suggest how to debug this problem? Thank you!
> 
> 
> br,
> Xu Yun
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io