On 08/05/2014 08:42 AM, Mariusz Gronczewski wrote:
On Mon, 04 Aug 2014 15:32:50 -0500, Mark Nelson
<mark.nel...@inktank.com> wrote:
On 08/04/2014 03:28 PM, Chris Kitzmiller wrote:
On Aug 1, 2014, at 1:31 PM, Mariusz Gronczewski wrote:
I'm getting weird stalling during writes: sometimes I get the same write speed
for a few minutes, and then it starts stalling at 0 MB/s for minutes.
I'm getting very similar behavior on my cluster. My writes start well but then just kinda
stop for a while and then bump along slowly until the bench finishes. I've got a thread
about it going here called "Ceph runs great then falters".
This kind of behaviour often results when the journal can write much
faster than the OSD data disks. Initially the journals will be able to
absorb the data and things will run along well, but eventually ceph will
need to stall writes if things get too out of sync. You may want to
take a look at what's happening on the data disks during your tests to
see if there's anything that looks suspect. Checking the admin socket
for dump_historic_ops might provide some clues as well.
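For example, on the node hosting a given OSD (osd.2 and the default admin
socket path here are just examples):

  # recent ops that took unusually long, with per-step timestamps
  ceph daemon osd.2 dump_historic_ops

  # the same query via an explicit socket path
  ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok dump_historic_ops

The JSON output lists each op's events with timestamps, so you can see whether
the time went into the journal write, waiting for subops from replicas, or
something else.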
Mark
I did check the journals already; they are on the same disk as the data
(separate partition), and during stalls there is no traffic to either of them
(around 8 IOPS on average with 0% iowait).
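(Roughly the per-device view I was watching on an OSD node while the bench
runs; this is plain iostat from sysstat:)

  # extended per-device stats, refreshed every second
  iostat -x 1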
This may indicate that 1 OSD could be backing up with possibly most if
not all IOs waiting on it. The idea here is that because data placement
is deterministic, if 1 OSD is slow, over time just by random chance all
outstanding client operations will back up on it. Having more
concurrency gives you more wiggle room but may not ultimately solve it.
It's also possible that something else may be causing the OSDs to wait.
dump_historic_ops might help.
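A couple of quick ways to look for a lagging OSD (osd.2 is just an example id,
and ceph osd perf assumes a reasonably recent release):

  # per-OSD commit/apply latency as reported by the cluster
  ceph osd perf

  # on the suspect OSD's node: what is queued right now and what it is waiting on
  ceph daemon osd.2 dump_ops_in_flight

If one OSD consistently shows much higher latency than its peers, or its
in-flight list keeps growing while the others stay empty, that's a good
candidate.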
I'm 99.99% sure they are OK, as they ran for a few months in a Cassandra
cluster before, and when I was doing some Ceph rebalancing and adding/removing
OSDs they worked constantly with no stalls.
If I look into the logs of the OSDs that have slow ops, I get something like
this:
2014-08-05 15:31:33.461481 7fbff4fd3700 0 log [WRN] : slow request
30.147566 seconds old, received at 2014-08-05 15:31:03.313858:
osd_op(client.190830.0:176
benchmark_data_blade103.non.3dart.com_31565_object175 [write 0~4194304]
7.16da2754 ack+ondisk+write e864) v4 currently waiting for subops from 2,6
2014-08-05 15:32:03.467775 7fbff4fd3700 0 log [WRN] : 1 slow requests, 1
included below; oldest blocked for > 60.153871 secs
2014-08-05 15:32:03.467794 7fbff4fd3700 0 log [WRN] : slow request
60.153871 seconds old, received at 2014-08-05 15:31:03.313858:
osd_op(client.190830.0:176
benchmark_data_blade103.non.3dart.com_31565_object175 [write 0~4194304]
7.16da2754 ack+ondisk+write e864) v4 currently waiting for subops from 2,6
2014-08-05 15:33:03.481163 7fbff4fd3700 0 log [WRN] : 1 slow requests, 1
included below; oldest blocked for > 120.167272 secs
2014-08-05 15:33:03.481170 7fbff4fd3700 0 log [WRN] : slow request
120.167272 seconds old, received at 2014-08-05 15:31:03.313858:
osd_op(client.190830.0:176
benchmark_data_blade103.non.3dart.com_31565_object175 [write 0~4194304]
7.16da2754 ack+ondisk+write e864) v4 currently waiting for subops from 2,6
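(To double-check which OSDs that object actually lands on, something along
these lines should work; the pool name below is just a placeholder for whatever
pool rados bench was writing to:)

  # show the PG and acting set of OSDs for the benchmark object
  ceph osd map <pool> benchmark_data_blade103.non.3dart.com_31565_object175

  # then, on the nodes holding the listed replicas:
  ceph daemon osd.2 dump_ops_in_flight
  ceph daemon osd.6 dump_ops_in_flight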
But when I look at osd.2, all I get is cryptic messages like:
2014-08-05 14:39:34.708788 7ff5d36f4700 0 -- 10.100.245.22:6800/3540 >>
10.100.245.24:6800/3540 pipe(0x54ada00 sd=157 :6800 s=2 pgs=26 cs=5 l=0
c=0x5409340).fault with nothing to send, going to standby
2014-08-05 14:39:35.594447 7ff5d0ecc700 0 -- 10.100.245.22:6800/3540 >>
10.100.245.25:6800/3551 pipe(0x41fd500 sd=141 :60790 s=2 pgs=21 cs=5 l=0
c=0x51bd960).fault with nothing to send, going to standby
2014-08-05 14:39:37.594901 7ff5d40fe700 0 -- 10.100.245.22:6800/3540 >>
10.100.245.25:6802/3709 pipe(0x54adc80 sd=149 :35693 s=2 pgs=24 cs=5 l=0
c=0x51bee00).fault with nothing to send, going to standby
2014-08-05 14:39:53.891172 7ff5d15d3700 0 -- 10.100.245.22:6800/3540 >>
10.100.245.24:6802/3694 pipe(0x54a8500 sd=137 :60823 s=2 pgs=24 cs=5 l=0
c=0x5409080).fault with nothing to send, going to standby
2014-08-05 14:40:01.410307 7ff5d4905700 0 -- 10.100.245.22:6800/3540 >>
10.100.245.23:6802/3689 pipe(0x41f9180 sd=133 :39288 s=2 pgs=28 cs=5 l=0
c=0x4956f60).fault with nothing to send, going to standby
2014-08-05 14:40:02.428402 7ff5d1ddb700 0 -- 10.100.245.22:6800/3540 >>
10.100.245.23:6800/3535 pipe(0x41fee00 sd=145 :34278 s=2 pgs=23 cs=5 l=0
c=0x51bad60).fault with nothing to send, going to standby
2014-08-05 15:30:50.752795 7ff5d29e7700 0 -- 10.100.245.22:6800/3540 >>
10.100.245.25:6800/3551 pipe(0x54ad780 sd=153 :6800 s=0 pgs=0 cs=0 l=0
c=0x51ba260).accept connect_seq 6 vs existing 5 state standby
2014-08-05 15:30:51.475488 7ff5d2cea700 0 -- 10.100.245.22:6800/3540 >>
10.100.245.25:6802/3709 pipe(0x6002080 sd=158 :6800 s=0 pgs=0 cs=0 l=0
c=0x51b94a0).accept connect_seq 6 vs existing 5 state standby
2014-08-05 15:30:51.600661 7ff5d32f0700 0 -- 10.100.245.22:6800/3540 >>
10.100.245.24:6802/3694 pipe(0x6001e00 sd=149 :6800 s=0 pgs=0 cs=0 l=0
c=0x4955280).accept connect_seq 6 vs existing 5 state standby
2014-08-05 15:30:51.901849 7ff5d18d6700 0 -- 10.100.245.22:6800/3540 >>
10.100.245.23:6800/3535 pipe(0x6001180 sd=22 :6800 s=0 pgs=0 cs=0 l=0
c=0x4956720).accept connect_seq 6 vs existing 5 state standby
2014-08-05 15:30:54.529900 7ff5d0cca700 0 -- 10.100.245.22:6800/3540 >>
10.100.245.23:6802/3689 pipe(0x54ae400 sd=42 :6800 s=0 pgs=0 cs=0 l=0
c=0x51b91e0).accept connect_seq 6 vs existing 5 state standby
and similar on osd.2:
2014-08-05 14:39:13.606907 7f34d2140700 0 -- 10.100.245.24:6800/3540 >>
10.100.245.21:6802/3686 pipe(0x4c4a800 sd=183 :6800 s=2 pgs=32 cs=5 l=0
c=0x3848580).fault with nothing to send, going to standby
2014-08-05 14:39:34.709150 7f34d3958700 0 -- 10.100.245.24:6800/3540 >>
10.100.245.22:6800/3540 pipe(0x39f9b80 sd=166 :43213 s=2 pgs=27 cs=5 l=0
c=0x38494a0).fault with nothing to send, going to standby
2014-08-05 14:39:38.693139 7f34cee0e700 0 -- 10.100.245.24:6800/3540 >>
10.100.245.25:6800/3551 pipe(0x3291b80 sd=167 :51687 s=2 pgs=23 cs=5 l=0
c=0x384a680).fault with nothing to send, going to standby
2014-08-05 14:39:38.712965 7f34d2b4a700 0 -- 10.100.245.24:6800/3540 >>
10.100.245.21:6800/3532 pipe(0x4c4ad00 sd=195 :6800 s=2 pgs=31 cs=5 l=0
c=0x3848160).fault with nothing to send, going to standby
2014-08-05 14:39:42.694641 7f34d4961700 0 -- 10.100.245.24:6800/3540 >>
10.100.245.25:6802/3709 pipe(0x4c4aa80 sd=179 :6800 s=2 pgs=21 cs=5 l=0
c=0x3848b00).fault with nothing to send, going to standby
2014-08-05 14:39:42.712340 7f34d3b5a700 0 -- 10.100.245.24:6800/3540 >>
10.100.245.22:6802/3694 pipe(0x39f9680 sd=33 :47495 s=2 pgs=31 cs=5 l=0
c=0x384ee00).fault with nothing to send, going to standby
2014-08-05 14:39:44.493155 7f34d3453700 0 -- 10.100.245.24:6800/3540 >>
10.100.245.23:6800/3535 pipe(0x4c48c80 sd=202 :6800 s=2 pgs=29 cs=5 l=0
c=0x3848f20).fault with nothing to send, going to standby
2014-08-05 14:39:59.387037 7f34d4c64700 0 -- 10.100.245.24:6800/3540 >>
10.100.245.23:6802/3689 pipe(0x4c49b80 sd=34 :6800 s=2 pgs=25 cs=5 l=0
c=0x3849600).fault with nothing to send, going to standby
2014-08-05 15:30:51.911827 7f34cec0c700 0 -- 10.100.245.24:6800/3540 >>
10.100.245.22:6802/3694 pipe(0x39fc880 sd=116 :6800 s=0 pgs=0 cs=0 l=0
c=0x3b714a0).accept connect_seq 6 vs existing 5 state standby
2014-08-05 15:30:52.276484 7f34d2241700 0 -- 10.100.245.24:6800/3540 >>
10.100.245.22:6800/3540 pipe(0x39fc100 sd=118 :6800 s=0 pgs=0 cs=0 l=0
c=0x3b73b20).accept connect_seq 6 vs existing 5 state standby
And the 'bad' OSDs change when I rerun the benchmark, so it's not like there
are one or two bad nodes.
I've checked for network and IO load on every node and they are just not doing
anything; there are no kernel errors, and those nodes worked fine under load
when they were in use before.
I'm guessing the issue is probably going to be more subtle than that,
unfortunately. Based on prior issues, it often seems like something is causing
latency in some part of the system, and when that happens it can have very
far-reaching effects.
--
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczew...@efigence.com