yeah, 3TB SAS disks

*German Anders*
Storage System Engineer Leader
*Despegar* | IT Team
*office* +54 11 4894 3500 x3408
*mobile* +54 911 3493 7262
*mail* gand...@despegar.com
2015-07-02 9:04 GMT-03:00 Jan Schermer <j...@schermer.cz>:

> And those disks are spindles?
> Looks like there are simply too few of them…
>
> Jan
>
> On 02 Jul 2015, at 13:49, German Anders <gand...@despegar.com> wrote:
>
> output from iostat:
>
> *CEPHOSD01:*
>
> Device:        rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
> sdc(ceph-0)      0.00    0.00   1.00  389.00    0.00   35.98    188.96     60.32  120.12    16.00   120.39   1.26  49.20
> sdd(ceph-1)      0.00    0.00   0.00    0.00    0.00    0.00      0.00      0.00    0.00     0.00     0.00   0.00   0.00
> sdf(ceph-2)      0.00    1.00   6.00  521.00    0.02   60.72    236.05    143.10  309.75   484.00   307.74   1.90 100.00
> sdg(ceph-3)      0.00    0.00  11.00  535.00    0.04   42.41    159.22    139.25  279.72   394.18   277.37   1.83 100.00
> sdi(ceph-4)      0.00    1.00   4.00  560.00    0.02   54.87    199.32    125.96  187.07   562.00   184.39   1.65  93.20
> sdj(ceph-5)      0.00    0.00   0.00  566.00    0.00   61.41    222.19    109.13  169.62     0.00   169.62   1.53  86.40
> sdl(ceph-6)      0.00    0.00   8.00    0.00    0.09    0.00     23.00      0.12   12.00    12.00     0.00   2.50   2.00
> sdm(ceph-7)      0.00    0.00   2.00  481.00    0.01   44.59    189.12    116.64  241.41   268.00   241.30   2.05  99.20
> sdn(ceph-8)      0.00    0.00   1.00    0.00    0.00    0.00      8.00      0.01    8.00     8.00     0.00   8.00   0.80
> fioa             0.00    0.00   0.00 1016.00    0.00   19.09     38.47      0.00    0.06     0.00     0.06   0.00   0.00
>
> Device:        rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
> sdc(ceph-0)      0.00    1.00  10.00  278.00    0.04   26.07    185.69     60.82  257.97   309.60   256.12   2.83  81.60
> sdd(ceph-1)      0.00    0.00   2.00    0.00    0.02    0.00     20.00      0.02   10.00    10.00     0.00  10.00   2.00
> sdf(ceph-2)      0.00    1.00   6.00  579.00    0.02   54.16    189.68    142.78  246.55   328.67   245.70   1.71 100.00
> sdg(ceph-3)      0.00    0.00  10.00   75.00    0.05    5.32    129.41      4.94  185.08    11.20   208.27   4.05  34.40
> sdi(ceph-4)      0.00    0.00  19.00  147.00    0.09   12.61    156.63     17.88  230.89   114.32   245.96   3.37  56.00
> sdj(ceph-5)      0.00    1.00   2.00  629.00    0.01   43.66    141.72    143.00  223.35   426.00   222.71   1.58 100.00
> sdl(ceph-6)      0.00    0.00  10.00    0.00    0.04    0.00      8.00      0.16   18.40    18.40     0.00   5.60   5.60
> sdm(ceph-7)      0.00    0.00  11.00    4.00    0.05    0.01      8.00      0.48   35.20    25.82    61.00  14.13  21.20
> sdn(ceph-8)      0.00    0.00   9.00    0.00    0.07    0.00     15.11      0.07    8.00     8.00     0.00   4.89   4.40
> fioa             0.00    0.00   0.00 6415.00    0.00  125.81     40.16      0.00    0.14     0.00     0.14   0.00   0.00
>
> *CEPHOSD02:*
>
> Device:        rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
> sdc1(ceph-9)     0.00    0.00  13.00    0.00    0.11    0.00     16.62      0.17   13.23    13.23     0.00   4.92   6.40
> sdd1(ceph-10)    0.00    0.00  15.00    0.00    0.13    0.00     18.13      0.26   17.33    17.33     0.00   1.87   2.80
> sdf1(ceph-11)    0.00    0.00  22.00  650.00    0.11   51.75    158.04    143.27  212.07   308.55   208.81   1.49 100.00
> sdg1(ceph-12)    0.00    0.00  12.00  282.00    0.05   54.60    380.68     13.16  120.52   352.00   110.67   2.91  85.60
> sdi1(ceph-13)    0.00    0.00   1.00    0.00    0.00    0.00      8.00      0.01    8.00     8.00     0.00   8.00   0.80
> sdj1(ceph-14)    0.00    0.00  20.00    0.00    0.08    0.00      8.00      0.26   12.80    12.80     0.00   3.60   7.20
> sdl1(ceph-15)    0.00    0.00   0.00    0.00    0.00    0.00      0.00      0.00    0.00     0.00     0.00   0.00   0.00
> sdm1(ceph-16)    0.00    0.00  20.00  424.00    0.11   32.20    149.05     89.69  235.30   243.00   234.93   2.14  95.20
> sdn1(ceph-17)    0.00    0.00   5.00  411.00    0.02   45.47    223.94     98.32  182.28  1057.60   171.63   2.40 100.00
>
> Device:        rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm  %util
> sdc1(ceph-9)     0.00    0.00  26.00  383.00    0.11   34.32    172.44     86.92  258.64   297.08   256.03   2.29  93.60
> sdd1(ceph-10)    0.00    0.00   8.00   31.00    0.09    1.86    101.95      0.84  178.15    94.00   199.87   6.46  25.20
> sdf1(ceph-11)    0.00    1.00   5.00  409.00    0.05   48.34    239.34     90.94  219.43   383.20   217.43   2.34  96.80
> sdg1(ceph-12)    0.00    0.00   0.00  238.00    0.00    1.64     14.12     58.34  143.60     0.00   143.60   1.83  43.60
> sdi1(ceph-13)    0.00    0.00  11.00    0.00    0.05    0.00     10.18      0.16   14.18    14.18     0.00   5.09   5.60
> sdj1(ceph-14)    0.00    0.00   1.00    0.00    0.00    0.00      8.00      0.02   16.00    16.00     0.00  16.00   1.60
> sdl1(ceph-15)    0.00    0.00   1.00    0.00    0.03    0.00     64.00      0.01   12.00    12.00     0.00  12.00   1.20
> sdm1(ceph-16)    0.00    1.00   4.00  587.00    0.03   50.09    173.69    143.32  244.97   296.00   244.62   1.69 100.00
> sdn1(ceph-17)    0.00    0.00   0.00  375.00    0.00   23.68    129.34     69.76  182.51     0.00   182.51   2.47  92.80
>
> The other OSD server had pretty much the same load.
>
> The config of the OSD servers is the following:
>
> - 2x Intel Xeon E5-2609 v2 @ 2.50GHz (4C)
> - 128G RAM
> - 2x 120G SSD Intel SSDSC2BB12 (RAID-1) for OS
> - 2x 10GbE ADPT DP
> - Journals run on RAMDISK (tmpfs), but on the first OSD server the
>   journals go to a FusionIO adapter (/dev/fioa) with battery backup.
>
> The CRUSH map is the following:
>
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
>
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 osd.5
> device 6 osd.6
> device 7 osd.7
> device 8 osd.8
> device 9 osd.9
> device 10 osd.10
> device 11 osd.11
> device 12 osd.12
> device 13 osd.13
> device 14 osd.14
> device 15 osd.15
> device 16 osd.16
> device 17 osd.17
> device 18 osd.18
> device 19 osd.19
> device 20 osd.20
> device 21 osd.21
> device 22 osd.22
> device 23 osd.23
> device 24 osd.24
> device 25 osd.25
> device 26 osd.26
> device 27 osd.27
> device 28 osd.28
> device 29 osd.29
> device 30 osd.30
> device 31 osd.31
> device 32 osd.32
> device 33 osd.33
> device 34 osd.34
> device 35 osd.35
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
>
> # buckets
> host cephosd03 {
>         id -4           # do not change unnecessarily
>         # weight 24.570
>         alg straw
>         hash 0  # rjenkins1
>         item osd.18 weight 2.730
>         item osd.19 weight 2.730
>         item osd.20 weight 2.730
>         item osd.21 weight 2.730
>         item osd.22 weight 2.730
>         item osd.23 weight 2.730
>         item osd.24 weight 2.730
>         item osd.25 weight 2.730
>         item osd.26 weight 2.730
> }
> host cephosd04 {
>         id -5           # do not change unnecessarily
>         # weight 24.570
>         alg straw
>         hash 0  # rjenkins1
>         item osd.27 weight 2.730
>         item osd.28 weight 2.730
>         item osd.29 weight 2.730
>         item osd.30 weight 2.730
>         item osd.31 weight 2.730
>         item osd.32 weight 2.730
>         item osd.33 weight 2.730
>         item osd.34 weight 2.730
>         item osd.35 weight 2.730
> }
> root default {
>         id -1           # do not change unnecessarily
>         # weight 49.140
>         alg straw
>         hash 0  # rjenkins1
>         item cephosd03 weight 24.570
>         item cephosd04 weight 24.570
> }
> host cephosd01 {
>         id -2           # do not change unnecessarily
>         # weight 24.570
>         alg straw
>         hash 0  # rjenkins1
>         item osd.0 weight 2.730
>         item osd.1 weight 2.730
>         item osd.2 weight 2.730
>         item osd.3 weight 2.730
>         item osd.4 weight 2.730
>         item osd.5 weight 2.730
>         item osd.6 weight 2.730
>         item osd.7 weight 2.730
>         item osd.8 weight 2.730
> }
> host cephosd02 {
>         id -3           # do not change unnecessarily
>         # weight 24.570
>         alg straw
>         hash 0  # rjenkins1
>         item osd.9 weight 2.730
>         item osd.10 weight 2.730
>         item osd.11 weight 2.730
>         item osd.12 weight 2.730
>         item osd.13 weight 2.730
>         item osd.14 weight 2.730
>         item osd.15 weight 2.730
>         item osd.16 weight 2.730
>         item osd.17 weight 2.730
> }
> root fusionio {
>         id -6           # do not change unnecessarily
>         # weight 49.140
>         alg straw
>         hash 0  # rjenkins1
>         item cephosd01 weight 24.570
>         item cephosd02 weight 24.570
> }
>
> # rules
> rule replicated_ruleset {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
> rule fusionio_ruleset {
>         ruleset 1
>         type replicated
>         min_size 0
>         max_size 10
>         step take fusionio
>         step chooseleaf firstn 1 type host
>         step emit
>         step take default
>         step chooseleaf firstn -1 type host
>         step emit
> }
>
> # end crush map
>
> *German*
>
> 2015-07-02 8:15 GMT-03:00 Lionel Bouton <lionel+c...@bouton.name>:
>
>> On 07/02/15 12:48, German Anders wrote:
>> > The idea is to cache RBD at the host level; it could also be possible
>> > to cache at the OSD level. We have high iowait and need to lower it a
>> > bit, since we are already getting the maximum out of our SAS disks,
>> > 100-110 IOPS per disk (3TB OSDs). Any advice? Flashcache?
>>
>> It's hard to suggest anything without knowing more about your setup. Are
>> your I/O mostly reads or writes? Reads: can you add enough RAM on your
>> guests or on your OSDs to cache your working set? Writes: do you use SSDs
>> for journals already?
>>
>> Lionel
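
On the reads-vs-writes question, the r/s vs. w/s columns in the iostat samples above already suggest the load is heavily write-dominated. A quick way to confirm this per OSD is the admin-socket perf counters; a minimal sketch, run on an OSD host (osd.0 is just an example id):

    ceph daemon osd.0 perf dump | grep -E '"op_r"|"op_w"'   # cumulative client read vs write ops served by this OSD
    iostat -xm 10 2                                         # re-sample the disks over a 10-second window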
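For reference, a pool is pointed at a rule like fusionio_ruleset by recompiling and injecting the edited map, then setting the pool's ruleset id. A rough sketch, assuming the decompiled map above is saved as crushmap.txt and using a placeholder pool name "volumes":

    crushtool -c crushmap.txt -o crushmap.bin                                  # compile the edited map
    crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-mappings      # sanity-check placements for ruleset 1
    ceph osd setcrushmap -i crushmap.bin                                       # inject the map into the cluster
    ceph osd pool set volumes crush_ruleset 1                                  # point the pool at fusionio_ruleset

As written, fusionio_ruleset takes the first replica from a host under the fusionio root and the remaining replicas (firstn -1) from hosts under the default root.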
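If flashcache is the route taken, each OSD data disk gets fronted by a cache device carved from the flash card, and the OSD filesystem lives on the resulting device-mapper target instead of the raw disk. A rough sketch only, assuming flashcache is built for the running kernel and using placeholder devices (/dev/fioa1 as the cache partition, /dev/sdc as the backing disk):

    flashcache_create -p back cachedev_sdc /dev/fioa1 /dev/sdc   # writeback cache in front of sdc
    # the OSD would then be (re)created on /dev/mapper/cachedev_sdc

Writeback mode keeps dirty data on the flash device until it is destaged, so the battery-backed card matters here.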
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com