Hi Chris,

Yes, all pools have size=3 and min_size=2. The clients are only RBD.

I shut the node down to perform a firmware upgrade.

Kr.
Luis

On 15/07/16 09:05, Christian Balzer wrote:
Hello,

On Fri, 15 Jul 2016 00:28:37 +0200 Luis Ramirez wrote:

Hi,

      I have a cluster with 3 MON nodes and 5 OSD nodes. If I reboot one
of the OSD nodes I get slow requests waiting for active.

2016-07-14 19:39:07.996942 osd.33 10.255.128.32:6824/7404 888 : cluster
[WRN] slow request 60.627789 seconds old, received at 2016-07-14
19:38:07.369009: osd_op(client.593241.0:3283308 3.d8215fdb (undecoded)
ondisk+write+known_if_redirected e11409) currently waiting for active
2016-07-14 19:39:07.996950 osd.33 10.255.128.32:6824/7404 889 : cluster
[WRN] slow request 60.623972 seconds old, received at 2016-07-14
19:38:07.372826: osd_op(client.593241.0:3283309 3.d8215fdb (undecoded)
ondisk+write+known_if_redirected e11411) currently waiting for active
2016-07-14 19:39:07.996958 osd.33 10.255.128.32:6824/7404 890 : cluster
[WRN] slow request 240.631544 seconds old, received at 2016-07-14
19:35:07.365255: osd_op(client.593241.0:3283269 3.d8215fdb (undecoded)
ondisk+write+known_if_redirected e11384) currently waiting for active
2016-07-14 19:39:07.996965 osd.33 10.255.128.32:6824/7404 891 : cluster
[WRN] slow request 30.625102 seconds old, received at 2016-07-14
19:38:37.371697: osd_op(client.593241.0:3283315 3.d8215fdb (undecoded)
ondisk+write+known_if_redirected e11433) currently waiting for active
2016-07-14 19:39:12.997985 osd.33 10.255.128.32:6824/7404 893 : cluster
[WRN] 83 slow requests, 4 included below; oldest blocked for >
395.971587 secs

And the service will not recover until the node has restarted successfully.
Could anyone shed some light on what I'm doing wrong?

First of all, do all your pools have a size of 3 and a min_size of 2?
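
(If you are not sure, a quick way to check is the standard CLI; this is just
a sketch, looping over whatever pools exist:)

---
# print size and min_size for every pool
for pool in $(ceph osd pool ls); do
    echo "== $pool =="
    ceph osd pool get "$pool" size
    ceph osd pool get "$pool" min_size
done
---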

What kind of clients does your cluster have (RBD images, CephFS, RGW)?

How do you reboot that OSD node?

Normally, when you stop OSDs via their initscript or systemd unit, they are
removed from the cluster gracefully and re-peering starts right away, so
clients never hit any lengthy timeouts.
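
As a rough sketch of what I mean (exact service names depend on your distro
and Ceph release, and the "noout" flag is optional but common practice for
short maintenance windows):

---
# optional: keep CRUSH from rebalancing while the node is briefly down
ceph osd set noout

# sysvinit style (what I use below):
service ceph stop osd

# or per-OSD with systemd, e.g. for osd.33:
systemctl stop ceph-osd@33

# ... reboot / firmware upgrade ...

# once the OSDs are back up and in:
ceph osd unset noout
---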

See this example from my test cluster: the output is from "rados bench", and
I stopped all OSDs on one node (via "service ceph stop osd") starting at
second 58.
Note that shutting down all 4 OSDs on that node took about 1-2 seconds
each.

Then we get about 10 seconds of things sorting themselves out, after which
things continue normally.
No slow request warnings.
---
    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     55      31      1022       991   72.0578       116    2.6782   1.74562
     56      31      1041      1010    72.128        76  0.955143   1.73901
     57      31      1066      1035   72.6166       100  0.972699   1.72883
     58      31      1084      1053   72.6058        72  0.549388   1.72471
     59      31      1100      1069   72.4597        64   0.75425   1.72927
    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     60      31      1118      1087   72.4519        72    2.2628   1.72937
     61      31      1131      1100   72.1164        52   2.92359    1.7259
     62      31      1141      1110   71.5983        40   1.68941    1.7285
     63      31      1149      1118   70.9697        32   1.30379   1.73533
     64      31      1153      1122   70.1108        16   3.05046   1.73568
     65      31      1156      1125   69.2167        12   2.82071   1.73744
     66      31      1158      1127   68.2892         8   3.01163   1.73965
     67      31      1158      1127     67.27         0         -   1.73965
     68      31      1159      1128   66.3396         2   5.11638   1.74264
     69      31      1161      1130   65.4941         8   8.64385   1.75326
     70      31      1161      1130   64.5585         0         -   1.75326
     71      31      1161      1130   63.6492         0         -   1.75326
     72      31      1161      1130   62.7652         0         -   1.75326
     73      31      1163      1132    62.015         2   13.7002   1.77289
     74      31      1163      1132   61.1769         0         -   1.77289
     75      31      1163      1132   60.3613         0         -   1.77289
     76      31      1163      1132   59.5671         0         -   1.77289
     77      31      1163      1132   58.7935         0         -   1.77289
     78      31      1163      1132   58.0397         0         -   1.77289
     79      31      1163      1132   57.3051         0         -   1.77289
    sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     80      31      1163      1132   56.5888         0         -   1.77289
     81      31      1163      1132   55.8901         0         -   1.77289
     82      31      1163      1132   55.2086         0         -   1.77289
     83      31      1163      1132   54.5434         0         -   1.77289
     84      31      1163      1132   53.8941         0         -   1.77289
     85      31      1164      1133   53.3071  0.333333   22.5502   1.79123
     86      31      1170      1139   52.9663        24   21.7306   1.90575
     87      31      1174      1143   52.5414        16   26.7337   1.98175
     88      31      1184      1153   52.3988        40   1.92565   2.07644
     89      31      1189      1158   52.0347        20   1.12557   2.10756
     90      31      1201      1170   51.9898        48  0.767024    2.1907
     91      31      1214      1183   51.9898        52  0.652047   2.24676
     92      31      1227      1196   51.9898        52   28.9226   2.28787
     93      31      1240      1209   51.9898        52   32.7307   2.35555
     94      31      1261      1230   52.3302        84  0.482482   2.40575
     95      31      1283      1252   52.7054        88   1.31267   2.39677
     96      31      1300      1269   52.8647        68  0.796716   2.38455
---

Note that with another test via CephFS and a different "rados bench" I was
able to create some slow requests, but they cleared up very quickly and
definitely did not require any of the OSDs to be brought back up.
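
(For reference, the kind of benchmark I am talking about is just plain
"rados bench" against a test pool; the pool name below is only a
placeholder:)

---
# 120 second write benchmark, 16 concurrent ops, keep the objects
rados bench -p testpool 120 write -t 16 --no-cleanup

# remove the benchmark objects afterwards
rados -p testpool cleanup
---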

Christian

--
---------------------------------------------------------

Luis Ramírez Viejo <luis.rami...@opencloud.es>


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
