Apologies for replying to myself; I thought I'd add a bit more information.  We 
don't think the Ceph cluster itself is the issue, but maybe something on the 
clients (a bad configuration setting?  A bug in our older version of 
ceph-client?).  I've attached our CRUSH map and OSD tree as well.  Neither 
/var/log/messages nor dmesg shows anything interesting.

Below is typical of what I'll see.  This doesn't happen every time the system 
is active, but if I perform several tests during the day I will certainly see 
it.  During the tests we also often find service times (svctm) pretty high:

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    46.50    0.00   17.20     0.00   509.60    29.63     1.74  101.19    1.33   2.28
rbd2              0.00     0.00    0.00    0.40     0.00    37.40    93.50     1.01   36.00 2500.00 100.00  <-----
rbd3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00    0.00 100.00
rbd5              0.00     0.00    0.00    0.20     0.00     5.00    25.00     0.00   18.00   18.00   0.36
rbd6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     2.00    0.00    0.00 100.00
rbd7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     2.00    0.00    0.00 100.00
rbd8              0.00     0.00    0.00    0.10     0.00     5.00    50.00     0.00   20.00   20.00   0.20
rbd10             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00    0.00 100.00
rbd12             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00    0.00 100.00
rbd13             0.00     0.00    0.70    0.70    41.10    18.20    42.36     0.03   24.86   24.86   3.48
rbd14             0.00     0.00    0.00    0.50     0.00    30.60    61.20     1.02   48.00 2000.00 100.00  <-----
rbd16             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00    0.00 100.00
rbd18             0.00     0.00    0.40    1.10     0.80    41.20    28.00     1.04   29.33  666.67 100.00
rbd19             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00    0.00 100.00
rbd21             0.00     0.00    0.40    0.80     2.00    28.80    25.67     2.04   31.67  833.33 100.00
rbd22             0.00     0.00    0.00    0.60     0.00    30.60    51.00     0.02   32.67   32.67   1.96
rbd23             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00    0.00 100.00
rbd24             0.00     0.00    0.00    0.00     0.00     0.00     0.00     1.00    0.00    0.00 100.00

And from 'ceph -w' around that same time:

2014-12-09 11:43:07.446952 osd.14 [WRN] 22 slow requests, 4 included below; 
oldest blocked for > 43.561110 secs
2014-12-09 11:43:07.446958 osd.14 [WRN] slow request 30.322788 seconds old, 
received at 2014-12-09 11:42:37.124101: osd_op(client.8232.1:18380981 
rb.0.195f.238e1f29.000000000d62 [write 0~12288] 3.c7335bdd ondisk+write e499) 
v4 currently waiting for subops from 0,33
2014-12-09 11:43:07.446966 osd.14 [WRN] slow request 30.321805 seconds old, 
received at 2014-12-09 11:42:37.125084: osd_op(client.8232.1:18380985 
rb.0.195f.238e1f29.000000000d62 [write 12288~507904] 3.c7335bdd ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:07.446972 osd.14 [WRN] slow request 30.321003 seconds old, 
received at 2014-12-09 11:42:37.125886: osd_op(client.8232.1:18380986 
rb.0.195f.238e1f29.000000000d62 [write 520192~507904] 3.c7335bdd ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:07.446977 osd.14 [WRN] slow request 30.320158 seconds old, 
received at 2014-12-09 11:42:37.126731: osd_op(client.8232.1:18380987 
rb.0.195f.238e1f29.000000000d62 [write 1028096~507904] 3.c7335bdd ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:08.447263 osd.14 [WRN] 26 slow requests, 4 included below; 
oldest blocked for > 44.561424 secs
2014-12-09 11:43:08.447271 osd.14 [WRN] slow request 30.995124 seconds old, 
received at 2014-12-09 11:42:37.452079: osd_op(client.8232.1:18381339 
rb.0.195f.238e1f29.000000000d62 [write 1536000~507904] 3.c7335bdd ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:08.447276 osd.14 [WRN] slow request 30.994609 seconds old, 
received at 2014-12-09 11:42:37.452594: osd_op(client.8232.1:18381340 
rb.0.195f.238e1f29.000000000d62 [write 2043904~507904] 3.c7335bdd ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:08.447281 osd.14 [WRN] slow request 30.993274 seconds old, 
received at 2014-12-09 11:42:37.453929: osd_op(client.8232.1:18381342 
rb.0.195f.238e1f29.000000000d62 [write 2551808~507904] 3.c7335bdd ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:08.447286 osd.14 [WRN] slow request 30.878378 seconds old, 
received at 2014-12-09 11:42:37.568825: osd_op(client.8232.1:18381520 
rb.0.195f.238e1f29.000000000d62 [write 3059712~507904] 3.c7335bdd ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:09.447485 osd.14 [WRN] 28 slow requests, 2 included below; 
oldest blocked for > 45.561633 secs
2014-12-09 11:43:09.447491 osd.14 [WRN] slow request 30.963411 seconds old, 
received at 2014-12-09 11:42:38.484001: osd_op(client.8232.1:18382150 
rb.0.195f.238e1f29.000000000d62 [write 3567616~507904] 3.c7335bdd ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:09.447498 osd.14 [WRN] slow request 30.855218 seconds old, 
received at 2014-12-09 11:42:38.592194: osd_op(client.8232.1:18382263 
rb.0.195f.238e1f29.000000000d62 [write 4075520~118784] 3.c7335bdd ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:10.447678 osd.14 [WRN] 34 slow requests, 6 included below; 
oldest blocked for > 46.561824 secs
2014-12-09 11:43:10.447685 osd.14 [WRN] slow request 30.379322 seconds old, 
received at 2014-12-09 11:42:40.068281: osd_op(client.8232.1:18383364 
rb.0.196e.238e1f29.000000000d67 [write 0~290816] 3.9208bbf0 ondisk+write e499) 
v4 currently waiting for subops from 33,40
2014-12-09 11:43:10.447691 osd.14 [WRN] slow request 30.319496 seconds old, 
received at 2014-12-09 11:42:40.128107: osd_op(client.8232.1:18383434 
rb.0.196e.238e1f29.000000000d67 [write 290816~507904] 3.9208bbf0 ondisk+write 
e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:10.447696 osd.14 [WRN] slow request 30.295299 seconds old, 
received at 2014-12-09 11:42:40.152304: osd_op(client.8232.1:18383463 
rb.0.196e.238e1f29.000000000d67 [write 798720~507904] 3.9208bbf0 ondisk+write 
e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:10.447701 osd.14 [WRN] slow request 30.294923 seconds old, 
received at 2014-12-09 11:42:40.152680: osd_op(client.8232.1:18383464 
rb.0.196e.238e1f29.000000000d67 [write 1306624~507904] 3.9208bbf0 ondisk+write 
e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:10.447707 osd.14 [WRN] slow request 30.291513 seconds old, 
received at 2014-12-09 11:42:40.156090: osd_op(client.8232.1:18383471 
rb.0.196e.238e1f29.000000000d67 [write 1814528~507904] 3.9208bbf0 ondisk+write 
e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:11.447889 osd.14 [WRN] 39 slow requests, 6 included below; 
oldest blocked for > 47.562034 secs
2014-12-09 11:43:11.447897 osd.14 [WRN] slow request 31.226020 seconds old, 
received at 2014-12-09 11:42:40.221793: osd_op(client.8232.1:18383506 
rb.0.196e.238e1f29.000000000d67 [write 2322432~507904] 3.9208bbf0 ondisk+write 
e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:11.447901 osd.14 [WRN] slow request 31.191602 seconds old, 
received at 2014-12-09 11:42:40.256211: osd_op(client.8232.1:18383522 
rb.0.196e.238e1f29.000000000d67 [write 2830336~507904] 3.9208bbf0 ondisk+write 
e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:11.447906 osd.14 [WRN] slow request 30.369915 seconds old, 
received at 2014-12-09 11:42:41.077898: osd_op(client.8229.1:43030061 
rb.0.1917.238e1f29.000000000d6e [write 0~307200] 3.785054cc ondisk+write e499) 
v4 currently waiting for subops from 0,33
2014-12-09 11:43:11.447910 osd.14 [WRN] slow request 30.369655 seconds old, 
received at 2014-12-09 11:42:41.078158: osd_op(client.8229.1:43030063 
rb.0.1917.238e1f29.000000000d6e [write 307200~507904] 3.785054cc ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:11.447916 osd.14 [WRN] slow request 30.369362 seconds old, 
received at 2014-12-09 11:42:41.078451: osd_op(client.8229.1:43030064 
rb.0.1917.238e1f29.000000000d6e [write 815104~507904] 3.785054cc ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:12.448075 osd.14 [WRN] 44 slow requests, 6 included below; 
oldest blocked for > 48.562228 secs
2014-12-09 11:43:12.448082 osd.14 [WRN] slow request 31.369335 seconds old, 
received at 2014-12-09 11:42:41.078672: osd_op(client.8229.1:43030065 
rb.0.1917.238e1f29.000000000d6e [write 1323008~507904] 3.785054cc ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:12.448087 osd.14 [WRN] slow request 31.358194 seconds old, 
received at 2014-12-09 11:42:41.089813: osd_op(client.8229.1:43030097 
rb.0.1917.238e1f29.000000000d6e [write 1830912~507904] 3.785054cc ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:12.448093 osd.14 [WRN] slow request 31.357607 seconds old, 
received at 2014-12-09 11:42:41.090400: osd_op(client.8229.1:43030098 
rb.0.1917.238e1f29.000000000d6e [write 2338816~507904] 3.785054cc ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:12.448097 osd.14 [WRN] slow request 31.357285 seconds old, 
received at 2014-12-09 11:42:41.090722: osd_op(client.8229.1:43030099 
rb.0.1917.238e1f29.000000000d6e [write 2846720~507904] 3.785054cc ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:12.448102 osd.14 [WRN] slow request 31.295790 seconds old, 
received at 2014-12-09 11:42:41.152217: osd_op(client.8232.1:18384206 
rb.0.196e.238e1f29.000000000d67 [write 3338240~507904] 3.9208bbf0 ondisk+write 
e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:13.448356 osd.14 [WRN] 46 slow requests, 3 included below; 
oldest blocked for > 49.562511 secs
2014-12-09 11:43:13.448364 osd.14 [WRN] slow request 32.188093 seconds old, 
received at 2014-12-09 11:42:41.260197: osd_op(client.8232.1:18384295 
rb.0.196e.238e1f29.000000000d67 [write 3846144~348160] 3.9208bbf0 ondisk+write 
e499) v4 currently waiting for subops from 33,40
2014-12-09 11:43:13.448369 osd.14 [WRN] slow request 32.138680 seconds old, 
received at 2014-12-09 11:42:41.309610: osd_op(client.8229.1:43030315 
rb.0.1917.238e1f29.000000000d6e [write 3354624~507904] 3.785054cc ondisk+write 
e499) v4 currently waiting for subops from 0,33
2014-12-09 11:43:13.448374 osd.14 [WRN] slow request 32.090464 seconds old, 
received at 2014-12-09 11:42:41.357826: osd_op(client.8229.1:43030374 
rb.0.1917.238e1f29.000000000d6e [write 3862528~331776] 3.785054cc ondisk+write 
e499) v4 currently waiting for subops from 0,33
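One thing that jumps out of the log above: osd.33 appears in every single 
"waiting for subops" pair (0,33 and 33,40).  A quick tally, just a hypothetical 
helper that assumes the exact [WRN] line format shown, makes that kind of 
pattern obvious across a longer log:

```python
import re
from collections import Counter

# Count how often each OSD id appears in "waiting for subops from X,Y" lines.
def tally_subops(log_lines):
    counts = Counter()
    for line in log_lines:
        m = re.search(r"waiting for subops from ([\d,]+)", line)
        if m:
            counts.update(m.group(1).split(","))
    return counts

sample = [
    "v4 currently waiting for subops from 0,33",
    "v4 currently waiting for subops from 33,40",
    "v4 currently waiting for subops from 0,33",
]
print(tally_subops(sample).most_common(1))  # [('33', 3)]
```

If one OSD id dominates the tally, the disk or host behind it is the first 
place to look.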
__________________
From: Sanders, Bill
Sent: Tuesday, December 09, 2014 9:19 AM
To: Rodrigo Severo
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] Monitors repeatedly calling for new elections

Thanks for the response.  I did forget to mention that NTP is set up and does 
appear to be running (just double-checked).

Is this good enough resolution?

$ for node in $nodes; do ssh tvsa${node} sudo date --rfc-3339=ns; done
2014-12-09 09:15:39.404292557-08:00
2014-12-09 09:15:39.521762397-08:00
2014-12-09 09:15:39.641491188-08:00
2014-12-09 09:15:39.761937524-08:00
2014-12-09 09:15:39.911416676-08:00
2014-12-09 09:15:40.029777457-08:00
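(Worth noting: each ssh round trip adds latency, so a loop like the above 
overstates the real skew; `ntpq -p` offsets are a more direct check.  Still, as 
a rough upper bound, a hypothetical helper can compute the spread of those 
timestamps directly:)

```python
from datetime import datetime

# Rough skew estimate from the RFC 3339 timestamps above.  ssh round-trip
# latency is included in the spread, so this only bounds the skew from above.
def max_spread_seconds(stamps):
    def parse(s):
        # fromisoformat (pre-3.11) wants at most microseconds: trim the
        # nanosecond fraction to 6 digits, keeping the -08:00 offset.
        head, frac_tz = s.split(".")
        frac, tz = frac_tz[:-6], frac_tz[-6:]
        return datetime.fromisoformat(f"{head}.{frac[:6]}{tz}")
    times = [parse(s) for s in stamps]
    return (max(times) - min(times)).total_seconds()

stamps = [
    "2014-12-09 09:15:39.404292557-08:00",
    "2014-12-09 09:15:40.029777457-08:00",
]
print(round(max_spread_seconds(stamps), 3))  # 0.625
```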

Bill
________________________________________
From: Rodrigo Severo [rodr...@fabricadeideias.com]
Sent: Tuesday, December 09, 2014 4:02 AM
To: Sanders, Bill
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Monitors repeatedly calling for new elections

On Mon, Dec 8, 2014 at 5:23 PM, Sanders, Bill <bill.sand...@teradata.com> wrote:

> Under activity, we'll get monitors going into election cycles repeatedly,
> OSD's being "wrongly marked down", as well as slow requests "osd.11
> 39.7.48.6:6833/21938 failed (3 reports from 1 peers after 52.914693 >= grace
> 20.000000)" .  During this, ceph -w shows the cluster essentially idle.
> None of the network, disks, or cpu's ever appear to max out.  It also
> doesn't appear to be the same OSD's, MON's, or node causing the problem.
> Top reports all 128 GB RAM (negligible swap) in use on the storage nodes.
> Only Ceph is running on the storage nodes.


I'm really new to Ceph, but my first bet is that your computers' clocks aren't
synchronized. Do all of them have working ntpds?


Regards,

Rodrigo Severo
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
device 21 osd.21
device 22 osd.22
device 23 osd.23
device 24 osd.24
device 25 osd.25
device 26 osd.26
device 27 osd.27
device 28 osd.28
device 29 osd.29
device 30 osd.30
device 31 osd.31
device 32 osd.32
device 33 osd.33
device 34 osd.34
device 35 osd.35
device 36 osd.36
device 37 osd.37
device 38 osd.38
device 39 osd.39
device 40 osd.40
device 41 osd.41
device 42 osd.42
device 43 osd.43
device 44 osd.44
device 45 osd.45
device 46 osd.46
device 47 osd.47

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host tvsaq1 {
        id -2           # do not change unnecessarily
        # weight 41.400
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 3.450
        item osd.1 weight 3.450
        item osd.2 weight 3.450
        item osd.3 weight 3.450
        item osd.4 weight 3.450
        item osd.5 weight 3.450
        item osd.6 weight 3.450
        item osd.7 weight 3.450
        item osd.8 weight 3.450
        item osd.9 weight 3.450
        item osd.10 weight 3.450
        item osd.11 weight 3.450
}
host tvsaq2 {
        id -3           # do not change unnecessarily
        # weight 41.400
        alg straw
        hash 0  # rjenkins1
        item osd.12 weight 3.450
        item osd.13 weight 3.450
        item osd.14 weight 3.450
        item osd.15 weight 3.450
        item osd.16 weight 3.450
        item osd.17 weight 3.450
        item osd.18 weight 3.450
        item osd.19 weight 3.450
        item osd.20 weight 3.450
        item osd.21 weight 3.450
        item osd.22 weight 3.450
        item osd.23 weight 3.450
}
host tvsar1 {
        id -4           # do not change unnecessarily
        # weight 41.400
        alg straw
        hash 0  # rjenkins1
        item osd.24 weight 3.450
        item osd.25 weight 3.450
        item osd.26 weight 3.450
        item osd.27 weight 3.450
        item osd.28 weight 3.450
        item osd.29 weight 3.450
        item osd.30 weight 3.450
        item osd.31 weight 3.450
        item osd.32 weight 3.450
        item osd.33 weight 3.450
        item osd.34 weight 3.450
        item osd.35 weight 3.450
}
host tvsar2 {
        id -5           # do not change unnecessarily
        # weight 41.400
        alg straw
        hash 0  # rjenkins1
        item osd.36 weight 3.450
        item osd.37 weight 3.450
        item osd.38 weight 3.450
        item osd.39 weight 3.450
        item osd.40 weight 3.450
        item osd.41 weight 3.450
        item osd.42 weight 3.450
        item osd.43 weight 3.450
        item osd.44 weight 3.450
        item osd.45 weight 3.450
        item osd.46 weight 3.450
        item osd.47 weight 3.450
}
root default {
        id -1           # do not change unnecessarily
        # weight 165.600
        alg straw
        hash 0  # rjenkins1
        item tvsaq1 weight 41.400
        item tvsaq2 weight 41.400
        item tvsar1 weight 41.400
        item tvsar2 weight 41.400
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
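(Cross-referencing the slow requests against the map above: a tiny lookup, a 
hypothetical sketch with host membership transcribed from the CRUSH buckets, 
confirms that the subop OSDs 0, 33, and 40 live on three different hosts, so 
the stalls aren't confined to one box:)

```python
# Host membership transcribed from the CRUSH buckets above.
HOSTS = {
    "tvsaq1": range(0, 12),
    "tvsaq2": range(12, 24),
    "tvsar1": range(24, 36),
    "tvsar2": range(36, 48),
}

def host_of(osd_id):
    """Return the host bucket containing the given OSD id."""
    for host, osds in HOSTS.items():
        if osd_id in osds:
            return host
    raise ValueError(f"osd.{osd_id} not in map")

print([host_of(n) for n in (0, 33, 40)])  # ['tvsaq1', 'tvsar1', 'tvsar2']
```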
# id    weight  type name       up/down reweight
-1      165.6   root default
-2      41.4            host tvsaq1
0       3.45                    osd.0   up      1
1       3.45                    osd.1   up      1
2       3.45                    osd.2   up      1
3       3.45                    osd.3   up      1
4       3.45                    osd.4   up      1
5       3.45                    osd.5   up      1
6       3.45                    osd.6   up      1
7       3.45                    osd.7   up      1
8       3.45                    osd.8   up      1
9       3.45                    osd.9   up      1
10      3.45                    osd.10  up      1
11      3.45                    osd.11  up      1
-3      41.4            host tvsaq2
12      3.45                    osd.12  up      1
13      3.45                    osd.13  up      1
14      3.45                    osd.14  up      1
15      3.45                    osd.15  up      1
16      3.45                    osd.16  up      1
17      3.45                    osd.17  up      1
18      3.45                    osd.18  up      1
19      3.45                    osd.19  up      1
20      3.45                    osd.20  up      1
21      3.45                    osd.21  up      1
22      3.45                    osd.22  up      1
23      3.45                    osd.23  up      1
-4      41.4            host tvsar1
24      3.45                    osd.24  up      1
25      3.45                    osd.25  up      1
26      3.45                    osd.26  up      1
27      3.45                    osd.27  up      1
28      3.45                    osd.28  up      1
29      3.45                    osd.29  up      1
30      3.45                    osd.30  up      1
31      3.45                    osd.31  up      1
32      3.45                    osd.32  up      1
33      3.45                    osd.33  up      1
34      3.45                    osd.34  up      1
35      3.45                    osd.35  up      1
-5      41.4            host tvsar2
36      3.45                    osd.36  up      1
37      3.45                    osd.37  up      1
38      3.45                    osd.38  up      1
39      3.45                    osd.39  up      1
40      3.45                    osd.40  up      1
41      3.45                    osd.41  up      1
42      3.45                    osd.42  up      1
43      3.45                    osd.43  up      1
44      3.45                    osd.44  up      1
45      3.45                    osd.45  up      1
46      3.45                    osd.46  up      1
47      3.45                    osd.47  up      1

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
