72 OSDs: 60 HDD, 12 SSD.
Primary workload: RBD under KVM.

On Friday, 14 August 2015, Ben Hines wrote:

> Nice to hear that you have no SSD failures yet in 10 months.
>
> How many OSDs are you running, and what is your primary ceph workload?
> (RBD, rgw, etc?)
>
> -Ben
>
> On Fri, Aug 14, 2015 at 2:23 AM, Межов Игорь Александрович
> > <me...@yuterra.ru> wrote:
> > Hi!
> >
> >
> > Of course, it isn't cheap at all, but we use the Intel DC S3700 200GB for
> > Ceph journals and the DC S3700 400GB in the SSD pool: same hosts,
> > separate root in the crushmap.
> >
> > The SSD pool is not yet in production; the journaling SSDs have been
> > under production load for 10 months. They're in good condition - no
> > faults, no degradation.
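> >
> > (For reference, wear on these drives can be checked with smartctl; the
> > device path below and the Intel "Media_Wearout_Indicator" attribute are
> > assumptions for a DC S3700, not output from our hosts:
> >
> > smartctl -A /dev/sdb | grep -i media_wearout
> >
> > The normalized value starts at 100 and drops as the flash wears, so a
> > value still near 100 after 10 months means very little wear.)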
> >
> > We deliberately chose 200GB SSDs for the journals to reduce costs, and we
> > also run a higher than recommended OSD/SSD ratio: 1 SSD per 10-12 OSDs,
> > while the recommendation is 1 per 3 to 1 per 6.
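> >
> > (To illustrate the ratio: with ceph-disk, pointing several data disks at
> > one SSD creates one journal partition per OSD on that SSD. A sketch -
> > the device names are placeholders, not our layout:
> >
> > ceph-disk prepare /dev/sdc /dev/sda   # HDD data, journal partition on SSD sda
> > ceph-disk prepare /dev/sdd /dev/sda   # second OSD, second journal partition on sda
> >
> > A 200GB SSD easily holds 10-12 journals at the usual 5-10GB journal size.)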
> >
> > So, in conclusion: I'd recommend getting a bigger budget and buying
> > durable, fast SSDs for Ceph.
> >
> > Megov Igor
> > CIO, Yuterra
> >
> > ________________________________
> > From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of
> > Voloshanenko Igor <igor.voloshane...@gmail.com>
> > Sent: 13 August 2015, 15:54
> > To: Jan Schermer
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] CEPH cache layer. Very slow
> >
> > So, good, but the price for the 845 DC PRO 400GB is about 2x higher than
> > the Intel S3500 240GB (((
> >
> > Any other models? (((
> >
> > 2015-08-13 15:45 GMT+03:00 Jan Schermer <j...@schermer.cz>:
> >>
> >> I tested and can recommend the Samsung 845 DC PRO (make sure it is the
> >> DC PRO and not just the "PRO" or "DC EVO"!).
> >> Those were very cheap but are out of stock at the moment (here).
> >> Faster than the Intels, cheaper, and a slightly different technology
> >> (3D V-NAND), which IMO makes them superior without needing many tricks
> >> to do their job.
> >>
> >> Jan
> >>
> >> On 13 Aug 2015, at 14:40, Voloshanenko Igor
> >> <igor.voloshane...@gmail.com> wrote:
> >>
> >> Thanks, Irek! Will try!
> >>
> >> But another question to all: which SSDs are good enough for Ceph now?
> >>
> >> I'm looking at the S3500 240GB (I have some S3500 120GB drives which
> >> show great results - around 8x better than the Samsungs).
> >>
> >> Could you give advice about other vendors/models at the same or lower
> >> price level as the S3500 240GB?
> >>
> >> 2015-08-13 12:11 GMT+03:00 Irek Fasikhov <malm...@gmail.com>:
> >>>
> >>> Hi, Igor.
> >>> Try applying the patch from here:
> >>>
> >>>
> http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov
> >>>
> >>> P.S. I no longer track changes in this direction (kernel), because we
> >>> already use the recommended SSDs.
> >>>
> >>> Best regards, Irek Fasikhov
> >>> Mobile: +79229045757
> >>>
> >>> 2015-08-13 11:56 GMT+03:00 Voloshanenko Igor
> >>> <igor.voloshane...@gmail.com>:
> >>>>
> >>>> So, after testing an SSD (I wiped 1 SSD and used it for tests):
> >>>>
> >>>> root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1
> >>>> --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based
> >>>> --group_reporting --name=journal-test
> >>>> journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync,
> >>>> iodepth=1
> >>>> fio-2.1.3
> >>>> Starting 1 process
> >>>> Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops]
> >>>> [eta 00m:00s]
> >>>> journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13
> >>>> 10:46:42 2015
> >>>>   write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
> >>>>     clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
> >>>>      lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
> >>>>     clat percentiles (usec):
> >>>>      |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[
> >>>> 2928],
> >>>>      | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[
> >>>> 3408],
> >>>>      | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[
> >>>> 4016],
> >>>>      | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792],
> >>>> 99.95th=[10048],
> >>>>      | 99.99th=[14912]
> >>>>     bw (KB  /s): min= 1064, max= 1213, per=100.00%, avg=1150.07,
> >>>> stdev=34.31
> >>>>     lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
> >>>>   cpu          : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
> >>>>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> >>>> >=64=0.0%
> >>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >>>> >=64=0.0%
> >>>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >>>> >=64=0.0%
> >>>>      issued    : total=r=0/w=17243/d=0, short=r=0/w=0/d=0
> >>>>
> >>>> Run status group 0 (all jobs):
> >>>>   WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s,
> >>>> mint=60001msec, maxt=60001msec
> >>>>
> >>>> Disk stats (read/write):
> >>>>   sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576,
> >>>> util=99.30%
> >>>>
> >>>> So, it's painful... the SSD does only 287 IOPS at 4K... 1.1 MB/s
> >>>>
> >>>> I tried to change the cache mode:
> >>>> echo temporary write through > /sys/class/scsi_disk/2:0:0:0/cache_type
> >>>> echo temporary write through > /sys/class/scsi_disk/3:0:0:0/cache_type
> >>>>
> >>>> No luck, still the same bad results. I also found this article:
> >>>> https://lkml.org/lkml/2013/11/20/264, which points to an old, very
> >>>> simple patch that disables CMD_FLUSH:
> >>>> https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba
> >>>>
> >>>> Does anybody have better ideas on how to improve this (or how to
> >>>> disable CMD_FLUSH without recompiling the kernel)? I use Ubuntu with
> >>>> kernel 4.0.4 for now (the 4.x branch, because the SSD 850 Pro has an
> >>>> issue with NCQ TRIM, and before 4.0.4 this exception was not yet
> >>>> included in libata).
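> >>>>
> >>>> (Not a fix for CMD_FLUSH itself, but a related experiment - just a
> >>>> sketch, the device name is a placeholder - is to turn off the drive's
> >>>> volatile write cache with hdparm and re-run the fio test:
> >>>>
> >>>> hdparm -W 0 /dev/sda    # disable the volatile write cache
> >>>> hdparm -W /dev/sda      # confirm the current write-cache setting
> >>>>
> >>>> On consumer drives this usually costs throughput, so treat it only as
> >>>> a diagnostic, not a tuning step.)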
> >>>>
> >>>> 2015-08-12 19:17 GMT+03:00 Pieter Koorts <pieter.koo...@me.com>:
> >>>>>
> >>>>> Hi Igor
> >>>>>
> >>>>> I suspect you have very much the same problem as me.
> >>>>>
> >>>>> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22260.html
> >>>>>
> >>>>> Basically, Samsung drives (like many SATA SSDs) are very hit and
> >>>>> miss, so you will need to test them as described here to see if they
> >>>>> are any good:
> >>>>>
> >>>>> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> >>>>>
> >>>>> To give you an idea, my average write performance went from 11MB/s
> >>>>> (with Samsung SSDs) to 30MB/s (without any SSDs). This is a very
> >>>>> small cluster.
> >>>>>
> >>>>> Pieter
> >>>>>
> >>>>> On Aug 12, 2015, at 04:33 PM, Voloshanenko Igor
> >>>>> <igor.voloshane...@gmail.com> wrote:
> >>>>>
> >>>>> Hi all, we have set up a Ceph cluster with 60 OSDs of 2 different
> >>>>> types (5 nodes, 12 disks on each: 10 HDD, 2 SSD).
> >>>>>
> >>>>> We also cover this with a custom crushmap with 2 roots (rule sketch
> >>>>> after the tree):
> >>>>>
> >>>>> ID   WEIGHT  TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY
> >>>>> -100 5.00000 root ssd
> >>>>> -102 1.00000     host ix-s2-ssd
> >>>>>    2 1.00000         osd.2               up  1.00000          1.00000
> >>>>>    9 1.00000         osd.9               up  1.00000          1.00000
> >>>>> -103 1.00000     host ix-s3-ssd
> >>>>>    3 1.00000         osd.3               up  1.00000          1.00000
> >>>>>    7 1.00000         osd.7               up  1.00000          1.00000
> >>>>> -104 1.00000     host ix-s5-ssd
> >>>>>    1 1.00000         osd.1               up  1.00000          1.00000
> >>>>>    6 1.00000         osd.6               up  1.00000          1.00000
> >>>>> -105 1.00000     host ix-s6-ssd
> >>>>>    4 1.00000         osd.4               up  1.00000          1.00000
> >>>>>    8 1.00000         osd.8               up  1.00000          1.00000
> >>>>> -106 1.00000     host ix-s7-ssd
> >>>>>    0 1.00000         osd.0               up  1.00000          1.00000
> >>>>>    5 1.00000         osd.5               up  1.00000          1.00000
> >>>>>   -1 5.00000 root platter
> >>>>>   -2 1.00000     host ix-s2-platter
> >>>>>   13 1.00000         osd.13              up  1.00000          1.00000
> >>>>>   17 1.00000         osd.17              up  1.00000          1.00000
> >>>>>   21 1.00000         osd.21              up  1.00000          1.00000
> >>>>>   27 1.00000         osd.27              up  1.00000          1.00000
> >>>>>   32 1.00000         osd.32              up  1.00000          1.00000
> >>>>>   37 1.00000         osd.37              up  1.00000          1.00000
> >>>>>   44 1.00000         osd.44              up  1.00000          1.00000
> >>>>>   48 1.00000         osd.48              up  1.00000          1.00000
> >>>>>   55 1.00000         osd.55              up  1.00000          1.00000
> >>>>>   59 1.00000         osd.59              up  1.00000          1.00000
> >>>>>   -3 1.00000     host ix-s3-platter
> >>>>>   14 1.00000         osd.14              up  1.00000          1.00000
> >>>>>   18 1.00000         osd.18              up  1.00000          1.00000
> >>>>>   23 1.00000         osd.23              up  1.00000          1.00000
> >>>>>   28 1.00000         osd.28              up  1.00000          1.00000
> >>>>>   33 1.00000         osd.33              up  1.00000          1.00000
> >>>>>   39 1.00000         osd.39              up  1.00000          1.00000
> >>>>>   43 1.00000         osd.43              up  1.00000          1.00000
> >>>>>   47 1.00000         osd.47              up  1.00000          1.00000
> >>>>>   54 1.00000         osd.54              up  1.00000          1.00000
> >>>>>   58 1.00000         osd.58              up  1.00000          1.00000
> >>>>>   -4 1.00000     host ix-s5-platter
> >>>>>   11 1.00000         osd.11              up  1.00000          1.00000
> >>>>>   16 1.00000         osd.16              up  1.00000          1.00000
> >>>>>   22 1.00000         osd.22              up  1.00000          1.00000
> >>>>>   26 1.00000         osd.26              up  1.00000          1.00000
> >>>>>   31 1.00000         osd.31              up  1.00000          1.00000
> >>>>>   36 1.00000         osd.36              up  1.00000          1.00000
> >>>>>   41 1.00000         osd.41              up  1.00000          1.00000
> >>>>>   46 1.00000         osd.46              up  1.00000          1.00000
> >>>>>   51 1.00000         osd.51              up  1.00000          1.00000
> >>>>>   56 1.00000         osd.56              up  1.00000          1.00000
> >>>>>   -5 1.00000     host ix-s6-platter
> >>>>>   12 1.00000         osd.12              up  1.00000          1.00000
> >>>>>   19 1.00000         osd.19              up  1.00000          1.00000
> >>>>>   24 1.00000         osd.24              up  1.00000          1.00000
> >>>>>   29 1.00000         osd.29              up  1.00000          1.00000
> >>>>>   34 1.00000         osd.34              up  1.00000          1.00000
> >>>>>   38 1.00000         osd.38              up  1.00000          1.00000
> >>>>>   42 1.00000         osd.42              up  1.00000          1.00000
> >>>>>   50 1.00000         osd.50              up  1.00000          1.00000
> >>>>>   53 1.00000         osd.53              up  1.00000          1.00000
> >>>>>   57 1.00000         osd.57              up  1.00000          1.00000
> >>>>>   -6 1.00000     host ix-s7-platter
> >>>>>   10 1.00000         osd.10              up  1.00000          1.00000
> >>>>>   15 1.00000         osd.15              up  1.00000          1.00000
> >>>>>   20 1.00000         osd.20              up  1.00000          1.00000
> >>>>>   25 1.00000         osd.25              up  1.00000          1.00000
> >>>>>   30 1.00000         osd.30              up  1.00000          1.00000
> >>>>>   35 1.00000         osd.35              up  1.00000          1.00000
> >>>>>   40 1.00000         osd.40              up  1.00000          1.00000
> >>>>>   45 1.00000         osd.45              up  1.00000          1.00000
> >>>>>   49 1.00000         osd.49              up  1.00000          1.00000
> >>>>>   52 1.00000         osd.52              up  1.00000          1.00000
> >>>>>
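> >>>>>
> >>>>> (The rules that select those two roots can be created roughly like
> >>>>> this - a sketch, the rule names are assumptions, not our exact ones:
> >>>>>
> >>>>> ceph osd crush rule create-simple ssd_rule ssd host
> >>>>> ceph osd crush rule create-simple platter_rule platter host
> >>>>>
> >>>>> and then each pool is pointed at its rule with
> >>>>> "ceph osd pool set <pool> crush_ruleset <id>".)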
> >>>>>
> >>>>> Then we created 2 pools, 1 on HDDs (platters) and 1 on SSDs,
> >>>>> and put the SSD pool in front of the HDD pool (cache tier).
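> >>>>>
> >>>>> (A minimal sketch of that setup - the pool names "platter-pool" and
> >>>>> "ssd-cache" and the PG counts are assumptions, not our exact commands:
> >>>>>
> >>>>> ceph osd pool create platter-pool 1024
> >>>>> ceph osd pool create ssd-cache 512
> >>>>> ceph osd tier add platter-pool ssd-cache
> >>>>> ceph osd tier cache-mode ssd-cache writeback
> >>>>> ceph osd tier set-overlay platter-pool ssd-cache
> >>>>> )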
> >>>>>
> >>>>> Now we get very bad performance results from the cluster.
> >>>>> Even with rados bench we see very unstable performance, sometimes even
> >>>>> zero speed. This creates very big issues for our clients.
> >>>>>
> >>>>> I have tried to tune all possible values, including the OSD settings,
> >>>>> but still no luck.
> >>>>>
> >>>>> Also a very unbelievable situation: when I run "ceph tell ... bench"
> >>>>> against an SSD OSD, I get about 20MB/s;
> >>>>> for an HDD OSD it's 67MB/s...
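> >>>>>
> >>>>> (That is just "ceph tell osd.<id> bench" - e.g. osd.2 from the ssd
> >>>>> root and osd.13 from the platter root in the tree above, picked here
> >>>>> only as examples; by default it writes 1GB in 4MB chunks to that
> >>>>> single OSD:
> >>>>>
> >>>>> ceph tell osd.2 bench
> >>>>> ceph tell osd.13 bench
> >>>>> )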
> >>>>>
> >>>>> I don't understand why a cache pool consisting of SSDs works so
> >>>>> badly... We used Samsung 850 Pro 256GB drives as the SSDs.
> >>>>>
> >>>>> Can you guys give me advice please...
> >>>>>
> >>>>> Also a strange thing: when I set the cache-mode to forward and try to
> >>>>> flush-evict all objects (not all objects get evicted, some are busy -
> >>>>> locked on the KVM side), I then get quite stable results from rados
> >>>>> bench:
> >>>>>
> >>>>>  Total time run:         30.275871
> >>>>> Total writes made:      2076
> >>>>> Write size:             4194304
> >>>>> Bandwidth (MB/sec):     274.278
> >>>>>
> >>>>> Stddev Bandwidth:       75.1445
> >>>>> Max bandwidth (MB/sec): 368
> >>>>> Min bandwidth (MB/sec): 0
> >>>>> Average Latency:        0.232892
> >>>>> Stddev Latency:         0.240356
> >>>>> Max latency:            2.01436
> >>>>> Min latency:            0.0716344
> >>>>>
> >>>>> Without the zeros, etc... So I don't understand how this is possible.
> >>>>>
> >>>>> Another interesting thing: when I disable the overlay for the pool,
> >>>>> rados bench goes to around 70MB/s, as for ordinary HDDs, but at the
> >>>>> same time rados bench against the SSD pool, which is not used anymore,
> >>>>> shows the same bad results...
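> >>>>>
> >>>>> (For clarity, the tiering operations mentioned above, as a sketch
> >>>>> using the same assumed pool names as before:
> >>>>>
> >>>>> ceph osd tier cache-mode ssd-cache forward
> >>>>> rados -p ssd-cache cache-flush-evict-all
> >>>>> ceph osd tier remove-overlay platter-pool
> >>>>> )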
> >>>>>
> >>>>> So please, give me some direction to dig into...
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >
> >
> >
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
