Hello,

On Wed, 4 Jun 2014 23:46:33 +0800 Indra Pramana wrote:

> Hi Christian,
>
> In addition to my previous email, I realised that if I use dd with 4M
> block size, I can get higher speed.
>
> root@Ubuntu-12043-64bit:/data# dd bs=4M count=128 if=/dev/zero of=test4 conv=fdatasync oflag=direct
> 128+0 records in
> 128+0 records out
> 536870912 bytes (537 MB) copied, 5.68378 s, 94.5 MB/s
>
> compared to:
>
> root@Ubuntu-12043-64bit:/data# dd bs=1M count=512 if=/dev/zero of=test8 conv=fdatasync oflag=direct
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 8.91133 s, 60.2 MB/s
>
That's what I told you. An even bigger impact than I saw here.

> But still, the difference is still very big. With 4M block size, I can
> get 400 MB/s average I/O speed (max 1,000 MB/s) using rados bench, but
> only 90 MB/s average using dd on guest VM. I am wondering if there are
> any "throttling" settings which prevent the guest VM to get the full I/O
> speed the Ceph cluster provides.
>
Not really, no.

However, despite the identical block size now, you are still using 2
different tools and thus comparing apples to oranges.
rados bench by default starts 16 threads, doesn't have to deal with any
inefficiencies of the VM layers, nor with a filesystem.
The dd on the other hand runs inside the VM, writes to a filesystem and,
most of all, is single threaded.
If I run a dd I get about half the speed of rados bench; running 2 in
parallel on different VMs gets things to 80%, etc.

> With regards to the VM user space, kernel space that you mentioned, can
> you elaborate more on what do you mean by that? We are using CloudStack
> and KVM hypervisor, using libvirt to connect to Ceph RBD.
>
So probably userspace RBD; I don't really know CloudStack though.

What I was suggesting is mapping and then mounting a (new) RBD image on a
host (kernelspace), formatting it with the same FS type as your VM and
then running the dd on it.
Not a perfect match due to kernel versus user space, but a lot closer
than bench versus dd.

Christian

> Looking forward to your reply, thank you.
>
> Cheers.
>
>
> On Wed, Jun 4, 2014 at 10:36 PM, Indra Pramana <in...@sg.or.id> wrote:
>
> > Hi Christian,
> >
> > Good day to you, and thank you for your reply.
> >
> > Just now I managed to identify 3 more OSDs which were slow and needed
> > to be trimmed. Here is a longer (1 minute) result of rados bench after
> > the trimming:
> >
> > http://pastebin.com/YFTbLyHA
> >
> > ====
> > Total time run:         69.441936
> > Total writes made:      3773
> > Write size:             4096000
> > Bandwidth (MB/sec):     212.239
> >
> > Stddev Bandwidth:       247.672
> > Max bandwidth (MB/sec): 921.875
> > Min bandwidth (MB/sec): 0
> > Average Latency:        0.58602
> > Stddev Latency:         2.39341
> > Max latency:            32.1121
> > Min latency:            0.04847
> > ====
> >
> > When I run this for 60 seconds, I noted some slow requests message
> > when I monitor using ceph -w, near the end of the 60-second period.
> >
> > I have verified that all OSDs have I/O speed of > 220 MB/s after I
> > trimmed the remaining slow ones just now. I noted that some SSDs are
> > having 250 MB/s of I/O speed when I take it out of cluster, but then
> > drop to 150 MB/s -ish after I put back into the cluster.
> >
> > Could it be due to the latency? You mentioned that average latency of
> > 0.5 is pretty horrible. How can I find what contributes to the latency
> > and how to fix the problem? Really at loss now. :(
> >
> > Looking forward to your reply, thank you.
> >
> > Cheers.
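
Coming back to the kernelspace test suggested above, it could look
something like the lines below. The image and pool names are only
placeholders and the size is arbitrary; use the same filesystem your VMs
use (XFS in this sketch):
---
rbd create --size 10240 rbd/dd-test    # 10 GB scratch image, any name will do
rbd map rbd/dd-test                    # kernel RBD, appears as e.g. /dev/rbd0
mkfs.xfs /dev/rbd0                     # same FS type as inside the VM
mount /dev/rbd0 /mnt
dd bs=4M count=128 if=/dev/zero of=/mnt/test conv=fdatasync oflag=direct
umount /mnt
rbd unmap /dev/rbd0
rbd rm rbd/dd-test                     # clean up the scratch image
---
That takes the VM and librbd out of the picture while still going through
a filesystem, so it sits between rados bench and the in-guest dd.
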
> >
> > On Mon, Jun 2, 2014 at 4:56 PM, Christian Balzer <ch...@gol.com> wrote:
> >
> >>
> >> Hello,
> >>
> >> On Mon, 2 Jun 2014 16:15:22 +0800 Indra Pramana wrote:
> >>
> >> > Dear all,
> >> >
> >> > I have managed to identify some slow OSDs and journals and have
> >> > since replaced them. RADOS benchmark of the whole cluster is now
> >> > fast, much improved from last time, showing the cluster can go up
> >> > to 700+ MB/s.
> >> >
> >> > =====
> >> > Maintaining 16 concurrent writes of 4194304 bytes for up to 10
> >> > seconds or 0 objects
> >> > Object prefix: benchmark_data_hv-kvm-01_6931
> >> >   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat    avg lat
> >> >     0       0         0         0         0         0         -          0
> >> >     1      16       214       198   791.387       792  0.260687   0.074689
> >> >     2      16       275       259   517.721       244  0.079697  0.0861397
> >> >     3      16       317       301   401.174       168  0.209022   0.115348
> >> >     4      16       317       301   300.902         0         -   0.115348
> >> >     5      16       356       340   271.924        78  0.040032   0.172452
> >> >     6      16       389       373   248.604       132  0.038983   0.221213
> >> >     7      16       411       395   225.662        88  0.048462   0.211686
> >> >     8      16       441       425   212.454       120  0.048722   0.237671
> >> >     9      16       474       458   203.513       132  0.041285   0.226825
> >> >    10      16       504       488   195.161       120  0.041899   0.224044
> >> >    11      16       505       489   177.784         4  0.622238   0.224858
> >> >    12      16       505       489    162.97         0         -   0.224858
> >> > Total time run:         12.142654
> >> > Total writes made:      505
> >> > Write size:             4194304
> >> > Bandwidth (MB/sec):     166.356
> >> >
> >> > Stddev Bandwidth:       208.41
> >> > Max bandwidth (MB/sec): 792
> >> > Min bandwidth (MB/sec): 0
> >> > Average Latency:        0.384178
> >> > Stddev Latency:         1.10504
> >> > Max latency:            9.64224
> >> > Min latency:            0.031679
> >> > =====
> >> >
> >> This might be better than the last result, but it still shows the same
> >> massive variance in latency and a pretty horrible average latency.
> >>
> >> Also you want to run this test for a lot longer, looking at the
> >> bandwidth progression it seems to drop over time.
> >> I'd expect the sustained bandwidth over a minute or so be below
> >> 100MB/s.
> >>
> >> > However, dd test result on guest VM is still slow.
> >> >
> >> > =====
> >> > root@test1# dd bs=1M count=256 if=/dev/zero of=test conv=fdatasync oflag=direct
> >> > 256+0 records in
> >> > 256+0 records out
> >> > 268435456 bytes (268 MB) copied, 17.1829 s, 15.6 MB/s
> >> > =====
> >> >
> >> You're kinda comparing apples to oranges here.
> >> Firstly, the block size isn't the same, running a rbd bench with 1MB
> >> blocks shows about a 25% decrease in bandwidth.
> >>
> >> Secondly, is the VM user space, kernel space, what FS, etc.
> >> Mounting a RBD image formatted the same way in kernelspace on a host
> >> and doing the the dd test there would be a better comparison.
> >>
> >> Christian
> >>
> >> > I thought I have fixed the problem by replacing all those bad OSDs
> >> > and journals but apparently it doesn't resolve the problem.
> >> >
> >> > Is there any throttling settings which prevents the guest VMs to
> >> > get the I/O write speed that it's entitled to?
> >> >
> >> > Looking forward to your reply, thank you.
> >> >
> >> > Cheers.
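
For reference, the longer run with 1MB blocks suggested back then boils
down to something along these lines (the pool name is just a placeholder,
-b is in bytes):
---
rados bench -p <your-pool> 60 write -t 16 -b 1048576
---
Comparing that against a single dd in a guest is still not apples to
apples, but at least the block size and duration match.
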
> >> >
> >> >
> >> > On Tue, Apr 29, 2014 at 8:54 PM, Christian Balzer <ch...@gol.com> wrote:
> >> >
> >> > > On Thu, 24 Apr 2014 13:51:49 +0800 Indra Pramana wrote:
> >> > >
> >> > > > Hi Christian,
> >> > > >
> >> > > > Good day to you, and thank you for your reply.
> >> > > >
> >> > > > On Wed, Apr 23, 2014 at 11:41 PM, Christian Balzer <ch...@gol.com> wrote:
> >> > > >
> >> > > > > > > > Using 32 concurrent writes, result is below. The speed
> >> > > > > > > > really fluctuates.
> >> > > > > > > >
> >> > > > > > > > Total time run:         64.317049
> >> > > > > > > > Total writes made:      1095
> >> > > > > > > > Write size:             4194304
> >> > > > > > > > Bandwidth (MB/sec):     68.100
> >> > > > > > > >
> >> > > > > > > > Stddev Bandwidth:       44.6773
> >> > > > > > > > Max bandwidth (MB/sec): 184
> >> > > > > > > > Min bandwidth (MB/sec): 0
> >> > > > > > > > Average Latency:        1.87761
> >> > > > > > > > Stddev Latency:         1.90906
> >> > > > > > > > Max latency:            9.99347
> >> > > > > > > > Min latency:            0.075849
> >> > > > > > > >
> >> > > > > > > That is really weird, it should get faster, not slower.
> >> > > > > > > ^o^ I assume you've run this a number of times?
> >> > > > > > >
> >> > > > > > > Also my apologies, the default is 16 threads, not 1, but
> >> > > > > > > that still isn't enough to get my cluster to full speed:
> >> > > > > > > ---
> >> > > > > > > Bandwidth (MB/sec):     349.044
> >> > > > > > >
> >> > > > > > > Stddev Bandwidth:       107.582
> >> > > > > > > Max bandwidth (MB/sec): 408
> >> > > > > > > ---
> >> > > > > > > at 64 threads it will ramp up from a slow start to:
> >> > > > > > > ---
> >> > > > > > > Bandwidth (MB/sec):     406.967
> >> > > > > > >
> >> > > > > > > Stddev Bandwidth:       114.015
> >> > > > > > > Max bandwidth (MB/sec): 452
> >> > > > > > > ---
> >> > > > > > >
> >> > > > > > > But what stands out is your latency. I don't have a 10GBE
> >> > > > > > > network to compare, but my Infiniband based cluster (going
> >> > > > > > > through at least one switch) gives me values like this:
> >> > > > > > > ---
> >> > > > > > > Average Latency:        0.335519
> >> > > > > > > Stddev Latency:         0.177663
> >> > > > > > > Max latency:            1.37517
> >> > > > > > > Min latency:            0.1017
> >> > > > > > > ---
> >> > > > > > >
> >> > > > > > > Of course that latency is not just the network.
> >> > > > > > >
> >> > > > > >
> >> > > > > > What else can contribute to this latency? Storage node load,
> >> > > > > > disk speed, anything else?
> >> > > > > >
> >> > > > > That and the network itself are pretty much it, you should
> >> > > > > know once you've run those test with atop or iostat on the
> >> > > > > storage nodes.
> >> > > > >
> >> > > > > >
> >> > > > > > > I would suggest running atop (gives you more information
> >> > > > > > > at one glance) or "iostat -x 3" on all your storage nodes
> >> > > > > > > during these tests to identify any node or OSD that is
> >> > > > > > > overloaded in some way.
> >> > > > > > >
> >> > > > > >
> >> > > > > > Will try.
> >> > > > > >
> >> > > > > Do that and let us know about the results.
> >> > > > >
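
Related to the question above about what contributes to the latency: if
your Ceph version has it, "ceph osd perf" gives a quick per-OSD view of
commit and apply latencies, which makes outliers easy to spot without
staring at iostat on every node:
---
ceph osd perf                 # look for OSDs whose latencies are far above the rest
watch -n 5 'ceph osd perf'    # keep an eye on it while the bench runs
---
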
> >> > > >
> >> > > > I have done some tests using iostat and noted some OSDs on a
> >> > > > particular storage node going up to the 100% limit when I run
> >> > > > the rados bench test.
> >> > > >
> >> > > Dumping lots of text will make people skip over your mails, you
> >> > > need to summarize and preferably understand yourself what these
> >> > > numbers mean.
> >> > >
> >> > > The iostat output is not too conclusive, as the numbers when
> >> > > reaching 100% utilization are not particular impressive.
> >> > > The fact that it happens though should make you look for anything
> >> > > different with these OSDs, from smartctl checks to PG
> >> > > distribution, as in "ceph pg dump" and then tallying up each PG.
> >> > > Also look at "ceph osd tree" and see if those OSDs or node have a
> >> > > higher weight than others.
> >> > >
> >> > > The atop line indicates that sdb was being read at a rate of
> >> > > 100MB/s and assuming that your benchmark was more or less the
> >> > > only thing running at that time this would mean something very
> >> > > odd is going on, as all the other OSDs were have no significant
> >> > > reads going on and all were being written at about the same speed.
> >> > >
> >> > > Christian
> >> > >
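
The PG tally mentioned above can be scripted; a rough, untested sketch
that counts how many PGs each OSD carries (the column position of the
acting set in "ceph pg dump" differs between versions, so this just grabs
the last bracketed set on each PG line):
---
ceph pg dump 2>/dev/null | awk '
  $1 ~ /^[0-9]+\.[0-9a-f]+$/ {          # PG rows start with a pgid like 4.1f
      set = ""
      for (i = 1; i <= NF; i++)
          if ($i ~ /^\[[0-9,]+\]$/) set = $i   # keep the last [..] set (acting)
      gsub(/[\[\]]/, "", set)
      n = split(set, osds, ",")
      for (j = 1; j <= n; j++) pgs[osds[j]]++
  }
  END { for (o in pgs) printf "osd.%s %d\n", o, pgs[o] }' | sort -n -k 2
---
A markedly higher count on one OSD would explain it getting hammered
harder than its neighbours.
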
> >> > > > ====
> >> > > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >> > > >            1.09    0.00    0.92   21.74    0.00   76.25
> >> > > >
> >> > > > Device:  rrqm/s  wrqm/s    r/s     w/s   rkB/s     wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm    %util
> >> > > > sda        0.00    0.00   4.33   42.00   73.33   6980.00   304.46     0.29     6.22    0.00     6.86   1.50     6.93
> >> > > > sdb        0.00    0.00   0.00   17.67    0.00   6344.00   718.19    59.64   854.26    0.00   854.26  56.60  *100.00*
> >> > > > sdc        0.00    0.00  12.33   59.33   70.67  18882.33   528.92    36.54   509.80   64.76   602.31  10.51    75.33
> >> > > > sdd        0.00    0.00   3.33   54.33   24.00  15249.17   529.71     1.29    22.45    3.20    23.63   1.64     9.47
> >> > > > sde        0.00    0.33   0.00    0.67    0.00      4.00    12.00     0.30   450.00    0.00   450.00 450.00    30.00
> >> > > >
> >> > > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >> > > >            1.38    0.00    1.13    7.75    0.00   89.74
> >> > > >
> >> > > > Device:  rrqm/s  wrqm/s    r/s     w/s   rkB/s     wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm    %util
> >> > > > sda        0.00    0.00   5.00   69.00   30.67  19408.50   525.38     4.29    58.02    0.53    62.18   2.00    14.80
> >> > > > sdb        0.00    0.00   7.00   63.33   41.33  20911.50   595.82    13.09   826.96   88.57   908.57   5.48    38.53
> >> > > > sdc        0.00    0.00   2.67   30.00   17.33   6945.33   426.29     0.21     6.53    0.50     7.07   1.59     5.20
> >> > > > sdd        0.00    0.00   2.67   58.67   16.00  20661.33   674.26     4.89    79.54   41.00    81.30   2.70    16.53
> >> > > > sde        0.00    0.00   0.00    1.67    0.00      6.67     8.00     0.01     3.20    0.00     3.20   1.60     0.27
> >> > > >
> >> > > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >> > > >            0.97    0.00    0.55    6.73    0.00   91.75
> >> > > >
> >> > > > Device:  rrqm/s  wrqm/s    r/s     w/s   rkB/s     wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm    %util
> >> > > > sda        0.00    0.00   1.67   15.33   21.33    120.00    16.63     0.02     1.18    0.00     1.30   0.63     1.07
> >> > > > sdb        0.00    0.00   4.33   62.33   24.00  13299.17   399.69     2.68    11.18    1.23    11.87   1.94    12.93
> >> > > > sdc        0.00    0.00   0.67   38.33   70.67   7881.33   407.79    37.66   202.15    0.00   205.67  13.61    53.07
> >> > > > sdd        0.00    0.00   3.00   17.33   12.00    166.00    17.51     0.05     2.89    3.11     2.85   0.98     2.00
> >> > > > sde        0.00    0.00   0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00     0.00   0.00     0.00
> >> > > >
> >> > > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >> > > >            1.29    0.00    0.92   24.10    0.00   73.68
> >> > > >
> >> > > > Device:  rrqm/s  wrqm/s    r/s     w/s   rkB/s     wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm    %util
> >> > > > sda        0.00    0.00   0.00   45.33    0.00   4392.50   193.79     0.62    13.62    0.00    13.62   1.09     4.93
> >> > > > sdb        0.00    0.00   0.00    8.67    0.00   3600.00   830.77    63.87  1605.54    0.00  1605.54 115.38  *100.00*
> >> > > > sdc        0.00    0.33   8.67   42.67   37.33   5672.33   222.45    16.88   908.78    1.38  1093.09   7.06    36.27
> >> > > > sdd        0.00    0.00   0.33   31.00    1.33    629.83    40.29     0.06     1.91    0.00     1.94   0.94     2.93
> >> > > > sde        0.00    0.00   0.00    0.33    0.00      1.33     8.00     0.12   368.00    0.00   368.00 368.00    12.27
> >> > > >
> >> > > > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >> > > >            1.59    0.00    0.88    4.82    0.00   92.70
> >> > > >
> >> > > > Device:  rrqm/s  wrqm/s    r/s     w/s   rkB/s     wkB/s avgrq-sz avgqu-sz    await r_await  w_await  svctm    %util
> >> > > > sda        0.00    0.00   0.00   29.00    0.00    235.00    16.21     0.06     1.98    0.00     1.98   0.97     2.80
> >> > > > sdb        0.00    6.00   4.33  114.67   38.67   6422.33   108.59     9.19   513.19  265.23   522.56   2.08    24.80
> >> > > > sdc        0.00    0.00   0.00   20.67    0.00    124.00    12.00     0.04     2.00    0.00     2.00   1.03     2.13
> >> > > > sdd        0.00    5.00   1.67   81.00   12.00    546.17    13.50     0.10     1.21    0.80     1.22   0.39     3.20
> >> > > > sde        0.00    0.00   0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00     0.00   0.00     0.00
> >> > > > ====
> >> > > >
> >> > > > And the high utilisation is randomly affecting other OSDs as
> >> > > > well within the same node, and not only affecting one
> >> > > > particular OSD.
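
When one device stands out the way sdb does here, it is worth pinning
down which OSD daemon sits on it and checking the drive itself. A couple
of commands that should help (assuming the default data dir layout of
/var/lib/ceph/osd/ceph-<id>):
---
mount | grep /var/lib/ceph/osd   # which partition backs which OSD data dir
ceph-disk list                   # if available: disk -> osd.N mapping
smartctl -a /dev/sdb             # reallocated sectors, wear level, error counts
---
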
> >> > > >
> >> > > > atop result on the node:
> >> > > >
> >> > > > ====
> >> > > > ATOP - ceph-osd-07      2014/04/24  13:49:12      ------      10s elapsed
> >> > > > PRC | sys 1.77s | user 2.11s | #proc 164 | #trun 2 | #tslpi 2817 | #tslpu 0 | #zombie 0 | clones 4 | #exit 0 |
> >> > > > CPU | sys 14% | user 20% | irq 1% | idle 632% | wait 133%    | steal 0% | guest 0% | avgf 1.79GHz | avgscal 54% |
> >> > > > cpu | sys  6% | user  7% | irq 0% | idle  19% | cpu006 w 68% | steal 0% | guest 0% | avgf 2.42GHz | avgscal 73% |
> >> > > > cpu | sys  2% | user  3% | irq 0% | idle  88% | cpu002 w  7% | steal 0% | guest 0% | avgf 1.68GHz | avgscal 50% |
> >> > > > cpu | sys  2% | user  2% | irq 0% | idle  86% | cpu003 w 10% | steal 0% | guest 0% | avgf 1.67GHz | avgscal 50% |
> >> > > > cpu | sys  2% | user  2% | irq 0% | idle  75% | cpu001 w 21% | steal 0% | guest 0% | avgf 1.83GHz | avgscal 55% |
> >> > > > cpu | sys  1% | user  2% | irq 1% | idle  70% | cpu000 w 26% | steal 0% | guest 0% | avgf 1.85GHz | avgscal 56% |
> >> > > > cpu | sys  1% | user  2% | irq 0% | idle  97% | cpu004 w  1% | steal 0% | guest 0% | avgf 1.64GHz | avgscal 49% |
> >> > > > cpu | sys  1% | user  1% | irq 0% | idle  98% | cpu005 w  0% | steal 0% | guest 0% | avgf 1.60GHz | avgscal 48% |
> >> > > > cpu | sys  0% | user  1% | irq 0% | idle  98% | cpu007 w  0% | steal 0% | guest 0% | avgf 1.60GHz | avgscal 48% |
> >> > > > CPL | avg1 1.12 | avg5 0.90 | avg15 0.72 | csw 103682 | intr 34330 | numcpu 8 |
> >> > > > MEM | tot 15.6G | free 158.2M | cache 13.7G | dirty 101.4M | buff 18.2M | slab 574.6M |
> >> > > > SWP | tot 518.0M | free 489.6M | vmcom 5.2G | vmlim 8.3G |
> >> > > > PAG | scan 327450 | stall 0 | swin 0 | swout 0 |
> >> > > > DSK | sdb | busy 90% | read 8115 | write 695 | KiB/r 130 | KiB/w 194 | MBr/s 103.34 | MBw/s 13.22 | avq  4.61 | avio 1.01 ms |
> >> > > > DSK | sdc | busy 32% | read   23 | write 431 | KiB/r   6 | KiB/w 318 | MBr/s   0.02 | MBw/s 13.41 | avq 34.86 | avio 6.95 ms |
> >> > > > DSK | sda | busy 32% | read   25 | write 674 | KiB/r   6 | KiB/w 193 | MBr/s   0.02 | MBw/s 12.76 | avq 41.00 | avio 4.48 ms |
> >> > > > DSK | sdd | busy  7% | read   26 | write 473 | KiB/r   7 | KiB/w 223 | MBr/s   0.02 | MBw/s 10.31 | avq 14.29 | avio 1.45 ms |
> >> > > > DSK | sde | busy  2% | read    0 | write   5 | KiB/r   0 | KiB/w   5 | MBr/s   0.00 | MBw/s  0.00 | avq  1.00 | avio 44.8 ms |
> >> > > > NET | transport | tcpi 21326 | tcpo 27479 | udpi 0 | udpo 0 | tcpao 0 | tcppo 2 | tcprs 3 | tcpie 0 | tcpor 0 | udpnp 0 | udpip 0 |
> >> > > > NET | network   | ipi 21326 | ipo 14340 | ipfrw 0 | deliv 21326 | icmpi 0 | icmpo 0 |
> >> > > > NET | p2p2 ---- | pcki 12659 | pcko 20931 | si 124 Mbps | so  107 Mbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
> >> > > > NET | p2p1 ---- | pcki  8565 | pcko  6443 | si 106 Mbps | so 7911 Kbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
> >> > > > NET | lo   ---- | pcki   108 | pcko   108 | si   8 Kbps | so    8 Kbps | coll 0 | mlti 0 | erri 0 | erro 0 | drpi 0 | drpo 0 |
> >> > > >
> >> > > >   PID  RUID  EUID  THR  SYSCPU  USRCPU  VGROW   RGROW  RDDSK   WRDSK  ST EXC S CPUNR  CPU CMD          1/1
> >> > > >  6881  root  root  538   0.74s   0.94s     0K    256K   1.0G  121.3M  --   - S     3  17% ceph-osd
> >> > > > 28708  root  root  720   0.30s   0.69s   512K     -8K   160K  157.7M  --   - S     3  10% ceph-osd
> >> > > > 31569  root  root  678   0.21s   0.30s   512K   -584K   156K  162.7M  --   - S     0   5% ceph-osd
> >> > > > 32095  root  root  654   0.14s   0.16s     0K      0K    60K  105.9M  --   - S     0   3% ceph-osd
> >> > > >    61  root  root    1   0.20s   0.00s     0K      0K     0K      0K  --   - S     3   2% kswapd0
> >> > > > 10584  root  root    1   0.03s   0.02s   112K    112K     0K      0K  --   - R     4   1% atop
> >> > > > 11618  root  root    1   0.03s   0.00s     0K      0K     0K      0K  --   - S     6   0% kworker/6:2
> >> > > >    10  root  root    1   0.02s   0.00s     0K      0K     0K      0K  --   - S     0   0% rcu_sched
> >> > > >    38  root  root    1   0.01s   0.00s     0K      0K     0K      0K  --   - S     6   0% ksoftirqd/6
> >> > > >  1623  root  root    1   0.01s   0.00s     0K      0K     0K      0K  --   - S     6   0% kworker/6:1H
> >> > > >  1993  root  root    1   0.01s   0.00s     0K      0K     0K      0K  --   - S     2   0% flush-8:48
> >> > > >  2031  root  root    1   0.01s   0.00s     0K      0K     0K      0K  --   - S     2   0% flush-8:0
> >> > > >  2032  root  root    1   0.01s   0.00s     0K      0K     0K      0K  --   - S     0   0% flush-8:16
> >> > > >  2033  root  root    1   0.01s   0.00s     0K      0K     0K      0K  --   - S     2   0% flush-8:32
> >> > > >  5787  root  root    1   0.01s   0.00s     0K      0K     4K      0K  --   - S     3   0% kworker/3:0
> >> > > > 27605  root  root    1   0.01s   0.00s     0K      0K     0K      0K  --   - S     1   0% kworker/1:2
> >> > > > 27823  root  root    1   0.01s   0.00s     0K      0K     0K      0K  --   - S     0   0% kworker/0:2
> >> > > > 32511  root  root    1   0.01s   0.00s     0K      0K     0K      0K  --   - S     2   0% kworker/2:0
> >> > > >  1536  root  root    1   0.00s   0.00s     0K      0K     0K      0K  --   - S     2   0% irqbalance
> >> > > >   478  root  root    1   0.00s   0.00s     0K      0K     0K      0K  --   - S     3   0% usb-storage
> >> > > >   494  root  root    1   0.00s   0.00s     0K      0K     0K      0K  --   - S     1   0% jbd2/sde1-8
> >> > > >  1550  root  root    1   0.00s   0.00s     0K      0K   400K      0K  --   - S     1   0% xfsaild/sdb1
> >> > > >  1750  root  root    1   0.00s   0.00s     0K      0K   128K      0K  --   - S     2   0% xfsaild/sdd1
> >> > > >  1994  root  root    1   0.00s   0.00s     0K      0K     0K      0K  --   - S     1   0% flush-8:64
> >> > > > ====
> >> > > >
> >> > > > I have tried to trim the SSD drives but the problem seems to
> >> > > > persist. Last time trimming the SSD drives can help to improve
> >> > > > the performance.
> >> > > >
> >> > > > Any advice is greatly appreciated.
> >> > > >
> >> > > > Thank you.
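
On the trimming: assuming the OSD data filesystems are mounted in the
usual place and the SSDs support discard, a manual trim of every OSD on a
node can be done with something like:
---
for d in /var/lib/ceph/osd/ceph-*; do fstrim -v "$d"; done
---
Running it during quiet hours is advisable, since the trim itself can
stall I/O on some SSD models.
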
> >> > >
> >> > > --
> >> > > Christian Balzer           Network/Systems Engineer
> >> > > ch...@gol.com              Global OnLine Japan/Fusion Communications
> >> > > http://www.gol.com/
> >>
> >>
> >> --
> >> Christian Balzer           Network/Systems Engineer
> >> ch...@gol.com              Global OnLine Japan/Fusion Communications
> >> http://www.gol.com/
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>


--
Christian Balzer           Network/Systems Engineer
ch...@gol.com              Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com