Thanks Nick,
it seems Ceph has a big performance gap on all-SSD setups; software latency
can be the bottleneck.

https://ceph.com/planet/the-ceph-and-tcmalloc-performance-story/
http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2015/20150813_S303E_Zhang.pdf
http://events.linuxfoundation.org/sites/events/files/slides/optimizing_ceph_flash.pdf

Build with jemalloc and try again...
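For reference, a rough sketch of the two allocator-related options discussed in those links, for the Infernalis-era (9.x) autotools tree. The exact configure flag and the sysconfig path are assumptions; check your distro and release notes:

```shell
# Option 1: rebuild Ceph against jemalloc instead of tcmalloc
# (autotools build, as used by the 9.x source tree).
./autogen.sh
./configure --with-jemalloc
make -j"$(nproc)"

# Option 2: keep tcmalloc but enlarge its thread cache, which the
# linked benchmarks report recovers most of the gap. On RPM-based
# systems this variable is typically read from /etc/sysconfig/ceph:
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
```

Option 2 avoids a rebuild, so it is usually worth trying first.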



2016-02-12 20:57 GMT+08:00 Nick Fisk <n...@fisk.me.uk>:

> I will do my best to answer, but some of the questions are starting to
> stretch the limit of my knowledge
>
> > -----Original Message-----
> > From: Huan Zhang [mailto:huan.zhang...@gmail.com]
> > Sent: 12 February 2016 12:15
> > To: Nick Fisk <n...@fisk.me.uk>
> > Cc: Irek Fasikhov <malm...@gmail.com>; ceph-users <ceph-
> > us...@ceph.com>
> > Subject: Re: [ceph-users] ceph 9.2.0 SAMSUNG ssd performance issue?
> >
> > My environment:
> > 32 cores Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
> > 10GiB NICS
> > 4 osds/host
> >
> > My client is a database (MySQL) doing a direct/sync write per transaction,
> > so it is a little sensitive to IO latency (sync/direct).
>
> Ok, yes, write latency is important here if your DBs will be doing lots
> of small inserts/updates.
>
> > I used SATA disks for the OSD backends and got ~100 IOPS (4k, iodepth=1)
> > with ~10ms IO latency, similar to a single SATA disk
> > (fio direct=1 sync=1 bs=4k).
> >
> > To improve MySQL write performance I switched to SSDs, since SSD latency
> > is over 100 times better than SATA, but the result is disappointing.
>
> Yes, there is an inherent performance cap in software-defined storage,
> mainly due to the fact that you are swapping a SAS cable for networking +
> code. You will never get raw SSD performance at low queue depth because of
> this, although I hope that at some point in the future Ceph will be able to
> hit about 1000 IOPS with replication.
>
> >
> > There are two things that still seem strange to me.
> > 1. fio on the journal partition shows ~77us latency; why is
> > filestore->journal_latency ~1.1ms?
>
> This is most likely because Ceph is not just doing a straight single write;
> there is likely other processing happening as well. I'm sure someone a bit
> more knowledgeable could elaborate.
>
> > fio --filename=/dev/sda2 --direct=1 --sync=1 --rw=write --bs=4k \
> >     --numjobs=1 --iodepth=1 --runtime=60 --time_based \
> >     --group_reporting --name=journal-test
> >
> > lat (usec): min=43, max=1503, avg=77.75, stdev=17.42
> >
> > 2. The 1.1ms journal_latency is far better than the SATA disks (5-10ms) I
> > used before; why is Ceph's end-to-end latency not improved (SSD ~7ms,
> > SATA ~10ms)?
>
> The journal write is just a small part of the write process; there is also
> checking the CRUSH map, sending the replica requests... and lots more.
>
> >      2ms would make sense to me. Is there a way to calculate the total
> > latency, like journal_latency + ... = total latency?
> >
>
> Possibly, but I couldn't even attempt to answer this. If you find out,
> please let me know, as I would also find this very useful :-)
>
> One thing you can do is turn the debug logging right up and then in the
> logs you can see the steps that each IO takes and how long it took.
>
> Which brings me to my next point: turn all logging down to 0/0 (
> http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/) .
> At 4k IOs the overhead of logging is significant.
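As a sketch, debug levels can be zeroed on running OSDs without a restart via injectargs; the subsystem list below is a common starting set for write-path testing, not an exhaustive one:

```shell
# Inject 0/0 debug levels into every running OSD at once.
ceph tell osd.* injectargs '--debug_ms 0/0 --debug_osd 0/0 --debug_filestore 0/0 --debug_journal 0/0'
```

To make the change permanent, set the same values under [osd] in ceph.conf.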
>
> Other things to try are setting the kernel parameter idle=poll, at the
> risk of increased power usage, and seeing if you can stop your CPUs from
> going into power-saving states.
>
> If anybody else has any other good ideas, please step in.
>
> Nick
>
>
> >
> > 2016-02-12 19:28 GMT+08:00 Nick Fisk <n...@fisk.me.uk>:
> > Write latency of 1.1ms is OK, but not brilliant. What IO size are you
> > testing with?
> >
> > Don't forget that if you have a journal latency of 1.1ms then, excluding
> > all other latency introduced by networking, replication and processing in
> > the OSD code, you won't get more than about 900 IOPS. All the things I
> > mention add latency, so you often see 2-3ms of latency for a replicated
> > write. This in turn will limit you to 300-500 IOPS for directio writes.
> >
> > The fact that you are seeing around 200 could be about right depending on
> > IO size, CPU speed and network speed.
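The arithmetic behind those figures is just the reciprocal of the per-op latency at queue depth 1; a quick sketch (the helper name is mine):

```python
def max_iops(latency_s: float, iodepth: int = 1) -> int:
    """Upper bound on IOPS when each IO must wait latency_s seconds."""
    return int(iodepth / latency_s)

# 1.1ms journal latency alone caps queue-depth-1 writes at ~900 IOPS.
print(max_iops(0.0011))
# A 2-3ms replicated write lands in the 300-500 IOPS range.
print(max_iops(0.0025))
```

Raising iodepth hides the latency by overlapping IOs, which is why the iodepth=32 numbers scale while iodepth=1 does not.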
> >
> > Also what is your end use/requirement? This may or may not matter.
> >
> > Nick
> >
> > > -----Original Message-----
> > > From: Huan Zhang [mailto:huan.zhang...@gmail.com]
> > > Sent: 12 February 2016 11:00
> > > To: Nick Fisk <n...@fisk.me.uk>
> > > Cc: Irek Fasikhov <malm...@gmail.com>; ceph-users <ceph-
> > > us...@ceph.com>
> > > Subject: Re: [ceph-users] ceph 9.2.0 SAMSUNG ssd performance issue?
> > >
> > > Thanks Nick,
> > > filestore->journal_latency: ~1.1ms
> > > 214.0 / 180611
> > > 0.0011848669239415096
> > >
> > > The SSD journal write seems OK; any other ideas are highly appreciated!
> > >
> > >  "filestore": {
> > >         "journal_queue_max_ops": 300,
> > >         "journal_queue_ops": 0,
> > >         "journal_ops": 180611,
> > >         "journal_queue_max_bytes": 33554432,
> > >         "journal_queue_bytes": 0,
> > >         "journal_bytes": 32637888155,
> > >         "journal_latency": {
> > >             "avgcount": 180611,
> > >             "sum": 214.095788552
> > >         },
> > >         "journal_wr": 176801,
> > >         "journal_wr_bytes": {
> > >             "avgcount": 176801,
> > >             "sum": 33122885632
> > >         },
> > >         "journal_full": 0,
> > >         "committing": 0,
> > >         "commitcycle": 14648,
> > >         "commitcycle_interval": {
> > >             "avgcount": 14648,
> > >             "sum": 73299.187956076
> > >         },
> > >
> > >
> > > 2016-02-12 18:04 GMT+08:00 Nick Fisk <n...@fisk.me.uk>:
> > >
> > >
> > > > -----Original Message-----
> > > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
> > Behalf
> > > Of
> > > > Huan Zhang
> > > > Sent: 12 February 2016 10:00
> > > > To: Irek Fasikhov <malm...@gmail.com>
> > > > Cc: ceph-users <ceph-us...@ceph.com>
> > > > Subject: Re: [ceph-users] ceph 9.2.0 SAMSUNG ssd performance issue?
> > > >
> > > > "op_w_latency":
> > > >      "avgcount": 42991,
> > > >       "sum": 402.804741329
> > > >
> > > > 402.0/42991
> > > > 0.009350794352306296
> > > >
> > > > ~9ms latency; does that mean this SSD is not suitable as a journal device?
> > >
> > > I believe that counter includes lots of other operations in the OSD,
> > > including the journal write. If you want pure journal stats, I would
> > > look under the filestore->journal_latency counter.
> > >
> > > >
> > > >
> > > >  "osd": {
> > > >         "op_wip": 0,
> > > >         "op": 58683,
> > > >         "op_in_bytes": 7309042294,
> > > >         "op_out_bytes": 507137488,
> > > >         "op_latency": {
> > > >             "avgcount": 58683,
> > > >             "sum": 484.302231121
> > > >         },
> > > >         "op_process_latency": {
> > > >             "avgcount": 58683,
> > > >             "sum": 323.332046552
> > > >         },
> > > >         "op_r": 902,
> > > >         "op_r_out_bytes": 507137488,
> > > >         "op_r_latency": {
> > > >             "avgcount": 902,
> > > >             "sum": 0.793759596
> > > >         },
> > > >         "op_r_process_latency": {
> > > >             "avgcount": 902,
> > > >             "sum": 0.619918138
> > > >         },
> > > >         "op_w": 42991,
> > > >         "op_w_in_bytes": 7092142080,
> > > >         "op_w_rlat": {
> > > >             "avgcount": 38738,
> > > >             "sum": 334.643717526
> > > >         },
> > > >         "op_w_latency": {
> > > >             "avgcount": 42991,
> > > >             "sum": 402.804741329
> > > >         },
> > > >         "op_w_process_latency": {
> > > >             "avgcount": 42991,
> > > >             "sum": 260.489972416
> > > >         },
> > > > ...
> > > >
> > > >
> > > > 2016-02-12 15:56 GMT+08:00 Irek Fasikhov <malm...@gmail.com>:
> > > > Hi.
> > > > You need to read:
> > > > https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> > > >
> > > >
> > > > Best regards, Fasikhov Irek Nurgayazovich
> > > > Mob.: +79229045757
> > > >
> > > > 2016-02-12 10:41 GMT+03:00 Huan Zhang <huan.zhang...@gmail.com>:
> > > > Hi,
> > > >
> > > > Ceph is VERY SLOW with 24 OSDs (SAMSUNG SSDs).
> > > > fio /dev/rbd0 iodepth=1 direct=1: only ~200 IOPS
> > > > fio /dev/rbd0 iodepth=32 direct=1: only ~3000 IOPS
> > > >
> > > > But testing a single SSD device with fio:
> > > > fio iodepth=1 direct=1: ~15000 IOPS
> > > > fio iodepth=32 direct=1: ~30000 IOPS
> > > >
> > > > Why is Ceph SO SLOW? Could you give me some help?
> > > > Appreciated!
> > > >
> > > >
> > > > My Environment:
> > > > [root@szcrh-controller ~]# ceph -s
> > > >     cluster eb26a8b9-e937-4e56-a273-7166ffaa832e
> > > >      health HEALTH_WARN
> > > >             1 mons down, quorum 0,1,2,3,4
> > > ceph01,ceph02,ceph03,ceph04,ceph05
> > > >      monmap e1: 6 mons at {ceph01=10.10.204.144:6789/0,ceph02=10.10.204.145:6789/0,ceph03=10.10.204.146:6789/0,ceph04=10.10.204.147:6789/0,ceph05=10.10.204.148:6789/0,ceph06=0.0.0.0:0/5}
> > > >             election epoch 6, quorum 0,1,2,3,4
> > > > ceph01,ceph02,ceph03,ceph04,ceph05
> > > >      osdmap e114: 24 osds: 24 up, 24 in
> > > >             flags sortbitwise
> > > >       pgmap v2213: 1864 pgs, 3 pools, 49181 MB data, 4485 objects
> > > >             144 GB used, 42638 GB / 42782 GB avail
> > > >                 1864 active+clean
> > > >
> > > > [root@ceph03 ~]# lsscsi
> > > > [0:0:6:0]    disk    ATA      SAMSUNG MZ7KM1T9 003Q  /dev/sda
> > > > [0:0:7:0]    disk    ATA      SAMSUNG MZ7KM1T9 003Q  /dev/sdb
> > > > [0:0:8:0]    disk    ATA      SAMSUNG MZ7KM1T9 003Q  /dev/sdc
> > > > [0:0:9:0]    disk    ATA      SAMSUNG MZ7KM1T9 003Q  /dev/sdd
> > > >
> > > > _______________________________________________
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > >
> > >
> >
>
>
>