On 08/08/17 10:50 AM, David Turner wrote:
> Are you also seeing osds marking themselves down for a little bit and
> then coming back up?  There are 2 very likely problems
> causing/contributing to this.  The first is if you are using a lot of
> snapshots.  Deleting snapshots is a very expensive operation for your
> cluster and can cause a lot of slowness.  The second is PG subfolder
> splitting.  This will show as blocked requests and osds marking
> themselves down and coming back up a little later without any errors in
> the log.  I linked a previous thread where someone was having these
> problems where both causes were investigated.
> 
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg36923.html  

We are not seeing OSDs marking themselves down and coming back up, as far
as we can tell. We will do some more investigation into this.

We are creating and deleting quite a few snapshots; is there anything we
can do to make this less expensive? We are going to try to create fewer
snapshots in our systems, but unfortunately we have to create a fair
number due to our use case.
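
One thing we are planning to experiment with, to spread the trim work out
over time, is the snap trim sleep throttle. This is only a sketch based on
the docs and we have not validated the value yet:

    # throttle the snap trimmer; the value is seconds of sleep between
    # trim operations (default is 0, i.e. no throttling)
    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'

    # equivalent in ceph.conf so it survives OSD restarts
    [osd]
    osd snap trim sleep = 0.1

If it turns out the option is not changeable at runtime on 10.2.x we will
just set it in ceph.conf and restart the OSDs.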

Is slow snapshot deletion likely to cause a backlog of snaps waiting to be
purged? In some cases we are seeing ~40k snaps still in
cached_removed_snaps.
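
For anyone following along, we are counting those by looking at the
per-pool removed_snaps intervals in the osdmap, roughly:

    # the interval set printed per pool here is what the OSDs and clients
    # end up carrying around as cached_removed_snaps
    ceph osd dump | grep removed_snaps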

> If you have 0.94.9 or 10.2.5 or later, then you can split your PG
> subfolders sanely while your osds are temporarily turned off using the
> 'ceph-objectstore-tool apply-layout-settings'.  There are a lot of ways
> to skin the cat of snap trimming, but it depends greatly on your use case.

We are currently running 10.2.5 and are planning to update to 10.2.9 at
some point soon. Our clients are using the 4.9 kernel RBD driver (which
more or less forces us to keep our snapshot count below 510); we are
currently testing rbd-nbd as an alternative.
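
Since we are on 10.2.5 we should be able to try the offline subfolder
split you mention the next time we can take OSDs down for maintenance. Our
(so far untested) understanding of the invocation is roughly the
following, with the OSD id and pool name as placeholders:

    # stop the OSD, then apply the filestore_merge_threshold /
    # filestore_split_multiple layout currently set in ceph.conf
    systemctl stop ceph-osd@<id>
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
        --op apply-layout-settings --pool <poolname>
    systemctl start ceph-osd@<id>

Please correct us if we have that wrong.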

> On Mon, Aug 7, 2017 at 11:49 PM Mclean, Patrick <patrick.mcl...@sony.com
> <mailto:patrick.mcl...@sony.com>> wrote:
> 
>     High CPU utilization and inexplicably slow I/O requests
> 
>     We have been having similar performance issues across several ceph
>     clusters. When all the OSDs are up in the cluster, it can stay HEALTH_OK
>     for a while, but eventually performance worsens and the cluster goes
>     (at first intermittently, but eventually continually) HEALTH_WARN due to
>     slow I/O requests blocked for longer than 32 sec. These slow requests
>     are accompanied by "currently waiting for rw locks", but we have not
>     found any of the network issues that are normally responsible for this
>     warning.
> 
>     Examining the individual slow OSDs listed in `ceph health detail` has
>     been unproductive; there don't seem to be any slow disks, and if we stop
>     the OSD the problem just moves somewhere else.
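
[To add some detail for this thread: what we have been using to look at
the individual slow OSDs is the admin socket on the host that owns them,
e.g.:

    # run on the host that owns the slow OSD
    ceph daemon osd.<id> dump_ops_in_flight
    ceph daemon osd.<id> dump_historic_ops

So far the blocked ops there mostly just show "waiting for rw locks".]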
> 
>     We also think this trends with the number of RBDs on the clusters
>     rather than with the amount of Ceph I/O. At the same time, user %CPU
>     time spikes to 95-100%, at first frequently and then consistently,
>     simultaneously across all cores. We are running 12 OSDs per node, with
>     each node having a 6-core 2.2 GHz CPU and 64 GiB of RAM.
> 
>     ceph1 ~ $ sudo ceph status
>         cluster XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
>          health HEALTH_WARN
>                 547 requests are blocked > 32 sec
>          monmap e1: 3 mons at
>     
> {cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XXX:XXXX/0,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XX:XXXX/0,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XXX:XXXX/0}
>                 election epoch 16, quorum 0,1,2
>     
> cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX
>          osdmap e577122: 72 osds: 68 up, 68 in
>                 flags sortbitwise,require_jewel_osds
>           pgmap v6799002: 4096 pgs, 4 pools, 13266 GB data, 11091 kobjects
>                 126 TB used, 368 TB / 494 TB avail
>                     4084 active+clean
>                       12 active+clean+scrubbing+deep
>       client io 113 kB/s rd, 11486 B/s wr, 135 op/s rd, 7 op/s wr
> 
>     ceph1 ~ $ vmstat 5 5
>     procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>      r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>     27  1      0 3112660 165544 36261692    0    0   472  1274    0    1 22  1 76  1  0
>     25  0      0 3126176 165544 36246508    0    0   858 12692 12122 110478 97  2  1  0  0
>     22  0      0 3114284 165544 36258136    0    0     1  6118  9586 118625 97  2  1  0  0
>     11  0      0 3096508 165544 36276244    0    0     8  6762 10047 188618 89  3  8  0  0
>     18  0      0 2990452 165544 36384048    0    0  1209 21170 11179 179878 85  4 11  0  0
> 
>     There is no apparent memory shortage, and none of the HDDs or SSDs show
>     consistently high utilization, slow service times, or any other sign of
>     hardware saturation; the only resource that looks saturated is user CPU.
>     Can CPU starvation be responsible for "waiting for rw locks"?
> 
>     Our main pool (the one with all the data) currently has 1024 PGs,
>     leaving us room to add more PGs if needed, but we're concerned that
>     doing so would consume even more CPU.
> 
>     We have moved to running Ceph with jemalloc instead of tcmalloc, and
>     that has helped with CPU utilization somewhat, but we still see
>     occurrences of 95-100% CPU even when the Ceph workload is not terribly
>     high.
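
[For reference, the way we are preloading jemalloc is roughly the
following; the exact library path depends on the distro package, and this
assumes the packaged systemd units read /etc/default/ceph:

    # /etc/default/ceph
    # preload jemalloc instead of the default tcmalloc for the ceph daemons
    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1

followed by restarting the OSDs.]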
> 
>     Any suggestions of what else to look at? We have a peculiar use case
>     where we have many RBDs but only about 1-5% of them are active at the
>     same time, and we're constantly making and expiring RBD snapshots. Could
>     this lead to aberrant performance? For instance, is it normal to have
>     ~40k snaps still in cached_removed_snaps?
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
