Thanks Nick, it seems Ceph has a big performance gap on an all-SSD setup. Software latency can be a bottleneck.
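Back-of-envelope, using the numbers further down this thread: at iodepth=1, IOPS is roughly 1 / (per-write latency), so the ~200 IOPS seen on the rbd device implies about 5ms per 4k sync write, while the raw journal SSD completes the same write in ~77us. In other words, well over 95% of the time is spent in the OSD code path and the network rather than in the drive, which is why allocator and logging tuning looks worth trying.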
https://ceph.com/planet/the-ceph-and-tcmalloc-performance-story/
http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2015/20150813_S303E_Zhang.pdf
http://events.linuxfoundation.org/sites/events/files/slides/optimizing_ceph_flash.pdf

Build with jemalloc and try again...

2016-02-12 20:57 GMT+08:00 Nick Fisk <n...@fisk.me.uk>:
> I will do my best to answer, but some of the questions are starting to
> stretch the limit of my knowledge.
>
> > -----Original Message-----
> > From: Huan Zhang [mailto:huan.zhang...@gmail.com]
> > Sent: 12 February 2016 12:15
> > To: Nick Fisk <n...@fisk.me.uk>
> > Cc: Irek Fasikhov <malm...@gmail.com>; ceph-users <ceph-us...@ceph.com>
> > Subject: Re: [ceph-users] ceph 9.2.0 SAMSUNG ssd performance issue?
> >
> > My environment:
> > 32 cores, Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
> > 10Gb NICs
> > 4 OSDs/host
> >
> > My client is a database (MySQL) doing a direct/sync write per transaction,
> > so it is a little bit sensitive to IO latency (sync/direct).
>
> OK, yes, write latency is important here if your DBs will be doing lots
> of small inserts/updates.
>
> > I used SATA disks for the OSD backends and got ~100 IOPS at 4k, iodepth=1,
> > with ~10ms IO latency, similar to a single SATA disk (fio direct=1 sync=1
> > bs=4k).
> >
> > To improve MySQL write performance I switched to SSDs, since SSD latency
> > is over 100 times better than SATA, but the result is disappointing.
>
> Yes, there is an inherent performance cap in software-defined storage,
> mainly due to the fact that you are swapping a SAS cable for networking +
> code. You will never get raw SSD performance at low queue depth because of
> this, although I hope that at some point in the future Ceph should be able
> to hit about 1000 IOPS with replication.
>
> > There are two things that still seem strange to me.
> >
> > 1. fio on the journal partition shows ~77us latency, so why is
> > filestore->journal_latency ~1.1ms?
>
> This is most likely due to Ceph not just doing a straight single write;
> there is also other processing happening as well. I'm sure someone a bit
> more knowledgeable could elaborate a bit more.
>
> > fio --filename=/dev/sda2 --direct=1 --sync=1 --rw=write --bs=4k
> > --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting
> > --name=journal-test
> >
> > lat (usec): min=43, max=1503, avg=77.75, stdev=17.42
> >
> > 2. 1.1ms journal_latency is far better than the SATA disks (5-10ms) I used
> > before, so why is the end-to-end Ceph latency not improved (SSD ~7ms,
> > SATA ~10ms)?
>
> The journal write is just a small part of the write process, i.e. check the
> crush map, send replica requests... and lots more.
>
> > 2ms would seem to make sense to me. Is there a way to calculate the total
> > latency, like journal_latency + ... = total latency?
>
> Possibly, but I couldn't even attempt to answer this. If you find out,
> please let me know, as I would also find this very useful :-)
>
> One thing you can do is turn the debug logging right up; then in the logs
> you can see the steps that each IO takes and how long each took.
>
> Which brings me on to my next point: turn all logging down to 0/0
> (http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/).
> At 4k IOs the overhead of logging is significant.
>
> Other things to try are setting the kernel parameter idle=poll (at the risk
> of increased power usage) and seeing if you can stop your CPUs going into
> power-saving states.
>
> If anybody else has any other good ideas, please step in.
>
> Nick
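Side note, in case it saves someone the manual divisions further down this
thread (214.0/180611 and so on): the avgcount/sum pairs in a perf dump can be
averaged with a few lines of Python. This is only a rough sketch, assuming the
counter layout shown in the dumps quoted below (the 9.2.x filestore/osd
schema) and that the admin socket is reachable on the OSD host; pipe the
output of "ceph daemon osd.0 perf dump" into it and adjust the counter names
if your version differs.

    # Sketch: print average latencies from a "ceph daemon osd.N perf dump"
    # read on stdin; counter names assume the schema quoted in this thread.
    import json
    import sys

    dump = json.load(sys.stdin)

    def avg_ms(section, counter):
        c = dump.get(section, {}).get(counter, {})
        n = c.get("avgcount", 0)
        s = c.get("sum", 0.0)
        return (s / n) * 1000.0 if n else None

    for section, counter in [("filestore", "journal_latency"),
                             ("osd", "op_w_latency"),
                             ("osd", "op_w_process_latency")]:
        ms = avg_ms(section, counter)
        if ms is not None:
            count = dump[section][counter]["avgcount"]
            print("%s/%s: avg %.3f ms over %d ops" % (section, counter, ms, count))

On this cluster it should print about 1.2 ms for journal_latency and about
9.4 ms for op_w_latency, matching the hand calculations quoted below.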
>
> > 2016-02-12 19:28 GMT+08:00 Nick Fisk <n...@fisk.me.uk>:
> > Write latency of 1.1ms is OK, but not brilliant. What IO size are you
> > testing with?
> >
> > Don't forget that if you have a journal latency of 1.1ms, then excluding
> > all the other latency introduced by networking, replication and processing
> > in the OSD code, you won't get more than about 900 IOPS. All the things I
> > mention add latency, so you often see 2-3ms of latency for a replicated
> > write. This in turn will limit you to 300-500 IOPS for directio writes.
> >
> > The fact you are seeing around 200 could be about right depending on IO
> > size, CPU speed and network speed.
> >
> > Also, what is your end use/requirement? This may or may not matter.
> >
> > Nick
> >
> > > -----Original Message-----
> > > From: Huan Zhang [mailto:huan.zhang...@gmail.com]
> > > Sent: 12 February 2016 11:00
> > > To: Nick Fisk <n...@fisk.me.uk>
> > > Cc: Irek Fasikhov <malm...@gmail.com>; ceph-users <ceph-us...@ceph.com>
> > > Subject: Re: [ceph-users] ceph 9.2.0 SAMSUNG ssd performance issue?
> > >
> > > Thanks Nick,
> > > filestore->journal_latency: ~1.1ms
> > > 214.0/180611 = 0.0011848669239415096
> > >
> > > It seems the SSD write is OK; any other ideas are highly appreciated!
> > >
> > >     "filestore": {
> > >         "journal_queue_max_ops": 300,
> > >         "journal_queue_ops": 0,
> > >         "journal_ops": 180611,
> > >         "journal_queue_max_bytes": 33554432,
> > >         "journal_queue_bytes": 0,
> > >         "journal_bytes": 32637888155,
> > >         "journal_latency": {
> > >             "avgcount": 180611,
> > >             "sum": 214.095788552
> > >         },
> > >         "journal_wr": 176801,
> > >         "journal_wr_bytes": {
> > >             "avgcount": 176801,
> > >             "sum": 33122885632
> > >         },
> > >         "journal_full": 0,
> > >         "committing": 0,
> > >         "commitcycle": 14648,
> > >         "commitcycle_interval": {
> > >             "avgcount": 14648,
> > >             "sum": 73299.187956076
> > >         },
> > >
> > > 2016-02-12 18:04 GMT+08:00 Nick Fisk <n...@fisk.me.uk>:
> > >
> > > > -----Original Message-----
> > > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> > > > Of Huan Zhang
> > > > Sent: 12 February 2016 10:00
> > > > To: Irek Fasikhov <malm...@gmail.com>
> > > > Cc: ceph-users <ceph-us...@ceph.com>
> > > > Subject: Re: [ceph-users] ceph 9.2.0 SAMSUNG ssd performance issue?
> > > >
> > > >     "op_w_latency":
> > > >         "avgcount": 42991,
> > > >         "sum": 402.804741329
> > > >
> > > > 402.0/42991 = 0.009350794352306296
> > > >
> > > > ~9ms latency; does that mean this SSD is not suitable as a journal
> > > > device?
> > >
> > > I believe that counter includes lots of other operations in the OSD,
> > > including the journal write.
> > > If you want pure journal stats, I would look under the
> > > Filestore->journal_latency counter.
> > >
> > > >     "osd": {
> > > >         "op_wip": 0,
> > > >         "op": 58683,
> > > >         "op_in_bytes": 7309042294,
> > > >         "op_out_bytes": 507137488,
> > > >         "op_latency": {
> > > >             "avgcount": 58683,
> > > >             "sum": 484.302231121
> > > >         },
> > > >         "op_process_latency": {
> > > >             "avgcount": 58683,
> > > >             "sum": 323.332046552
> > > >         },
> > > >         "op_r": 902,
> > > >         "op_r_out_bytes": 507137488,
> > > >         "op_r_latency": {
> > > >             "avgcount": 902,
> > > >             "sum": 0.793759596
> > > >         },
> > > >         "op_r_process_latency": {
> > > >             "avgcount": 902,
> > > >             "sum": 0.619918138
> > > >         },
> > > >         "op_w": 42991,
> > > >         "op_w_in_bytes": 7092142080,
> > > >         "op_w_rlat": {
> > > >             "avgcount": 38738,
> > > >             "sum": 334.643717526
> > > >         },
> > > >         "op_w_latency": {
> > > >             "avgcount": 42991,
> > > >             "sum": 402.804741329
> > > >         },
> > > >         "op_w_process_latency": {
> > > >             "avgcount": 42991,
> > > >             "sum": 260.489972416
> > > >         },
> > > >         ...
> > > >
> > > > 2016-02-12 15:56 GMT+08:00 Irek Fasikhov <malm...@gmail.com>:
> > > > Hi.
> > > > You need to read:
> > > > https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> > > >
> > > > Best regards, Фасихов Ирек Нургаязович
> > > > Mob.: +79229045757
> > > >
> > > > 2016-02-12 10:41 GMT+03:00 Huan Zhang <huan.zhang...@gmail.com>:
> > > > Hi,
> > > >
> > > > Ceph is VERY SLOW with 24 OSDs (SAMSUNG SSDs).
> > > > fio /dev/rbd0 iodepth=1 direct=1: only ~200 IOPS
> > > > fio /dev/rbd0 iodepth=32 direct=1: only ~3000 IOPS
> > > >
> > > > But testing a single SSD device with fio:
> > > > fio iodepth=1 direct=1: ~15000 IOPS
> > > > fio iodepth=32 direct=1: ~30000 IOPS
> > > >
> > > > Why is Ceph SO SLOW? Could you give me some help?
> > > > Appreciated!
> > > >
> > > > My environment:
> > > > [root@szcrh-controller ~]# ceph -s
> > > >     cluster eb26a8b9-e937-4e56-a273-7166ffaa832e
> > > >      health HEALTH_WARN
> > > >             1 mons down, quorum 0,1,2,3,4 ceph01,ceph02,ceph03,ceph04,ceph05
> > > >      monmap e1: 6 mons at {ceph01=10.10.204.144:6789/0,ceph02=10.10.204.145:6789/0,ceph03=10.10.204.146:6789/0,ceph04=10.10.204.147:6789/0,ceph05=10.10.204.148:6789/0,ceph06=0.0.0.0:0/5}
> > > >             election epoch 6, quorum 0,1,2,3,4 ceph01,ceph02,ceph03,ceph04,ceph05
> > > >      osdmap e114: 24 osds: 24 up, 24 in
> > > >             flags sortbitwise
> > > >       pgmap v2213: 1864 pgs, 3 pools, 49181 MB data, 4485 objects
> > > >             144 GB used, 42638 GB / 42782 GB avail
> > > >                 1864 active+clean
> > > >
> > > > [root@ceph03 ~]# lsscsi
> > > > [0:0:6:0]  disk  ATA  SAMSUNG MZ7KM1T9  003Q  /dev/sda
> > > > [0:0:7:0]  disk  ATA  SAMSUNG MZ7KM1T9  003Q  /dev/sdb
> > > > [0:0:8:0]  disk  ATA  SAMSUNG MZ7KM1T9  003Q  /dev/sdc
> > > > [0:0:9:0]  disk  ATA  SAMSUNG MZ7KM1T9  003Q  /dev/sdd
> > > >
> > > > _______________________________________________
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
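To tie Nick's numbers above together: at iodepth=1 a client cannot exceed
roughly 1 / (per-write latency) IOPS, no matter how fast the drive underneath
is. A tiny illustrative sketch (the latencies are just the figures quoted in
this thread, not new measurements):

    # Illustrative only: iodepth=1 IOPS ceiling implied by a per-write latency,
    # using figures from this thread (journal ~1.1ms, replicated write 2-3ms,
    # and the ~5ms implied by the observed ~200 IOPS on the rbd device).
    latencies_ms = {
        "journal write only": 1.1,
        "replicated write (low)": 2.0,
        "replicated write (high)": 3.0,
        "observed rbd 4k sync write": 5.0,
    }
    for name, ms in sorted(latencies_ms.items()):
        print("%-28s ~%4.0f IOPS at iodepth=1" % (name, 1000.0 / ms))

That works out to roughly 900, 500, 330 and 200 IOPS respectively, which lines
up with the figures Nick quotes above and with the fio results at iodepth=1.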
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com