72 osd, 60 hdd, 12 ssd
Primary workload - rbd, kvm

On Friday, 14 August 2015, Ben Hines wrote:
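(A side note on SSD wear, since it comes up below: a quick way to sanity-check journal SSDs is to read their SMART wear attributes from time to time. This is only a rough sketch - the device path is an example and the attribute names differ by vendor; Intel DC drives usually report Media_Wearout_Indicator, Samsung drives report Wear_Leveling_Count:

  # print the wear-related SMART attributes for one drive
  smartctl -a /dev/sda | egrep -i 'Media_Wearout_Indicator|Wear_Leveling_Count'
)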
> Nice to hear that you have no SSD failures yet in 10 months.
>
> How many OSDs are you running, and what is your primary ceph workload?
> (RBD, rgw, etc?)
>
> -Ben
>
> On Fri, Aug 14, 2015 at 2:23 AM, Межов Игорь Александрович
> <me...@yuterra.ru> wrote:
> > Hi!
> >
> > Of course, it isn't cheap at all, but we use Intel DC S3700 200Gb for ceph
> > journals and DC S3700 400Gb in the SSD pool: same hosts, separate root in
> > the crushmap.
> >
> > The SSD pool is not yet in production; the journalling SSDs have been working
> > under production load for 10 months. They're in good condition - no faults,
> > no degradation.
> >
> > We deliberately took 200Gb SSDs for the journals to reduce costs, and also
> > run a higher than recommended OSD/SSD ratio: 1 SSD per 10-12 OSDs, while the
> > recommended ratio is 1:3 to 1:6.
> >
> > So, as a conclusion - I'd recommend you get a bigger budget and buy durable
> > and fast SSDs for Ceph.
> >
> > Megov Igor
> > CIO, Yuterra
> >
> > ________________________________
> > From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of
> > Voloshanenko Igor <igor.voloshane...@gmail.com>
> > Sent: 13 August 2015 15:54
> > To: Jan Schermer
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] CEPH cache layer. Very slow
> >
> > So, good, but the price of the 845 DC PRO 400 GB is about 2x higher than
> > that of the Intel S3500 240G (((
> >
> > Any other models? (((
> >
> > 2015-08-13 15:45 GMT+03:00 Jan Schermer <j...@schermer.cz>:
> >>
> >> I tested and can recommend the Samsung 845 DC PRO (make sure it is DC PRO
> >> and not just "PRO" or "DC EVO"!).
> >> Those were very cheap but are out of stock at the moment (here).
> >> Faster than the Intels, cheaper, and a slightly different technology
> >> (3D V-NAND), which IMO makes them superior without needing many tricks to
> >> do their job.
> >>
> >> Jan
> >>
> >> On 13 Aug 2015, at 14:40, Voloshanenko Igor <igor.voloshane...@gmail.com>
> >> wrote:
> >>
> >> Tnx, Irek! Will try!
> >>
> >> But another question to all: which SSDs are good enough for Ceph now?
> >>
> >> I'm looking at the S3500 240G (I have some S3500 120G which show great
> >> results - around 8x better than the Samsungs).
> >>
> >> Can you give advice on other vendors/models at the same or lower price
> >> level than the S3500 240G?
> >>
> >> 2015-08-13 12:11 GMT+03:00 Irek Fasikhov <malm...@gmail.com>:
> >>>
> >>> Hi, Igor.
> >>> Try to roll in the patch from here:
> >>> http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov
> >>>
> >>> P.S. I no longer track changes in this area (the kernel), because we
> >>> already use the recommended SSDs.
> >>>
> >>> Best regards,
> >>> Фасихов Ирек Нургаязович
> >>> Mob.: +79229045757
> >>>
> >>> 2015-08-13 11:56 GMT+03:00 Voloshanenko Igor <igor.voloshane...@gmail.com>:
> >>>>
> >>>> So, after testing an SSD (I wiped 1 SSD and used it for the tests):
> >>>>
> >>>> root@ix-s2:~# sudo fio --filename=/dev/sda --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
> >>>> journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
> >>>> fio-2.1.3
> >>>> Starting 1 process
> >>>> Jobs: 1 (f=1): [W] [100.0% done] [0KB/1152KB/0KB /s] [0/288/0 iops] [eta 00m:00s]
> >>>> journal-test: (groupid=0, jobs=1): err= 0: pid=2849460: Thu Aug 13 10:46:42 2015
> >>>>   write: io=68972KB, bw=1149.6KB/s, iops=287, runt= 60001msec
> >>>>     clat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
> >>>>      lat (msec): min=2, max=15, avg= 3.48, stdev= 1.08
> >>>>     clat percentiles (usec):
> >>>>      |  1.00th=[ 2704],  5.00th=[ 2800], 10.00th=[ 2864], 20.00th=[ 2928],
> >>>>      | 30.00th=[ 3024], 40.00th=[ 3088], 50.00th=[ 3280], 60.00th=[ 3408],
> >>>>      | 70.00th=[ 3504], 80.00th=[ 3728], 90.00th=[ 3856], 95.00th=[ 4016],
> >>>>      | 99.00th=[ 9024], 99.50th=[ 9280], 99.90th=[ 9792], 99.95th=[10048],
> >>>>      | 99.99th=[14912]
> >>>>     bw (KB /s): min= 1064, max= 1213, per=100.00%, avg=1150.07, stdev=34.31
> >>>>     lat (msec) : 4=94.99%, 10=4.96%, 20=0.05%
> >>>>   cpu          : usr=0.13%, sys=0.57%, ctx=17248, majf=0, minf=7
> >>>>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> >>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> >>>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> >>>>      issued    : total=r=0/w=17243/d=0, short=r=0/w=0/d=0
> >>>>
> >>>> Run status group 0 (all jobs):
> >>>>   WRITE: io=68972KB, aggrb=1149KB/s, minb=1149KB/s, maxb=1149KB/s,
> >>>>   mint=60001msec, maxt=60001msec
> >>>>
> >>>> Disk stats (read/write):
> >>>>   sda: ios=0/17224, merge=0/0, ticks=0/59584, in_queue=59576, util=99.30%
> >>>>
> >>>> So, it's painful... the SSD does only 287 iops at 4K... 1.1 MB/s.
> >>>>
> >>>> I tried to change the cache mode:
> >>>> echo temporary write through > /sys/class/scsi_disk/2:0:0:0/cache_type
> >>>> echo temporary write through > /sys/class/scsi_disk/3:0:0:0/cache_type
> >>>>
> >>>> No luck, still the same poor results. I also found this article:
> >>>> https://lkml.org/lkml/2013/11/20/264
> >>>> which points to an old, very simple patch that disables CMD_FLUSH:
> >>>> https://gist.github.com/TheCodeArtist/93dddcd6a21dc81414ba
> >>>>
> >>>> Does anybody have better ideas on how to improve this (or how to disable
> >>>> CMD_FLUSH without recompiling the kernel)? I use Ubuntu with kernel 4.0.4
> >>>> for now (the 4.x branch, because the SSD 850 Pro has an issue with NCQ
> >>>> TRIM, and before 4.0.4 this exception was not included in libata).
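(Side note on the cache_type trick quoted above: as far as I understand, writing "temporary write through" only changes what the kernel believes about the drive's cache, so it stops issuing flushes for that disk without reconfiguring the drive itself, and the setting is lost on reboot. A small sketch to check the current value on all SCSI disks - the device name below is just an example:

  # show the current cache_type for every SCSI disk
  for f in /sys/class/scsi_disk/*/cache_type; do
      printf '%s: %s\n' "$f" "$(cat "$f")"
  done

  # hdparm can also query (or toggle) the drive's volatile write cache
  hdparm -W /dev/sda
)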
> >>>> 2015-08-12 19:17 GMT+03:00 Pieter Koorts <pieter.koo...@me.com>:
> >>>>>
> >>>>> Hi Igor
> >>>>>
> >>>>> I suspect you have very much the same problem as me.
> >>>>>
> >>>>> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg22260.html
> >>>>>
> >>>>> Basically Samsung drives (like many SATA SSDs) are very much hit and
> >>>>> miss, so you will need to test them as described here to see if they
> >>>>> are any good:
> >>>>> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
> >>>>>
> >>>>> To give you an idea, my average write performance went from 11MB/s
> >>>>> (with the Samsung SSDs) to 30MB/s (without any SSD). This is a very
> >>>>> small cluster.
> >>>>>
> >>>>> Pieter
> >>>>>
> >>>>> On Aug 12, 2015, at 04:33 PM, Voloshanenko Igor
> >>>>> <igor.voloshane...@gmail.com> wrote:
> >>>>>
> >>>>> Hi all, we have set up a CEPH cluster with 60 OSDs of 2 different types
> >>>>> (5 nodes, 12 disks each: 10 HDD, 2 SSD).
> >>>>>
> >>>>> We also cover this with a custom crushmap containing 2 roots:
> >>>>>
> >>>>> ID   WEIGHT  TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY
> >>>>> -100 5.00000 root ssd
> >>>>> -102 1.00000     host ix-s2-ssd
> >>>>>    2 1.00000         osd.2               up  1.00000          1.00000
> >>>>>    9 1.00000         osd.9               up  1.00000          1.00000
> >>>>> -103 1.00000     host ix-s3-ssd
> >>>>>    3 1.00000         osd.3               up  1.00000          1.00000
> >>>>>    7 1.00000         osd.7               up  1.00000          1.00000
> >>>>> -104 1.00000     host ix-s5-ssd
> >>>>>    1 1.00000         osd.1               up  1.00000          1.00000
> >>>>>    6 1.00000         osd.6               up  1.00000          1.00000
> >>>>> -105 1.00000     host ix-s6-ssd
> >>>>>    4 1.00000         osd.4               up  1.00000          1.00000
> >>>>>    8 1.00000         osd.8               up  1.00000          1.00000
> >>>>> -106 1.00000     host ix-s7-ssd
> >>>>>    0 1.00000         osd.0               up  1.00000          1.00000
> >>>>>    5 1.00000         osd.5               up  1.00000          1.00000
> >>>>>   -1 5.00000 root platter
> >>>>>   -2 1.00000     host ix-s2-platter
> >>>>>   13 1.00000         osd.13              up  1.00000          1.00000
> >>>>>   17 1.00000         osd.17              up  1.00000          1.00000
> >>>>>   21 1.00000         osd.21              up  1.00000          1.00000
> >>>>>   27 1.00000         osd.27              up  1.00000          1.00000
> >>>>>   32 1.00000         osd.32              up  1.00000          1.00000
> >>>>>   37 1.00000         osd.37              up  1.00000          1.00000
> >>>>>   44 1.00000         osd.44              up  1.00000          1.00000
> >>>>>   48 1.00000         osd.48              up  1.00000          1.00000
> >>>>>   55 1.00000         osd.55              up  1.00000          1.00000
> >>>>>   59 1.00000         osd.59              up  1.00000          1.00000
> >>>>>   -3 1.00000     host ix-s3-platter
> >>>>>   14 1.00000         osd.14              up  1.00000          1.00000
> >>>>>   18 1.00000         osd.18              up  1.00000          1.00000
> >>>>>   23 1.00000         osd.23              up  1.00000          1.00000
> >>>>>   28 1.00000         osd.28              up  1.00000          1.00000
> >>>>>   33 1.00000         osd.33              up  1.00000          1.00000
> >>>>>   39 1.00000         osd.39              up  1.00000          1.00000
> >>>>>   43 1.00000         osd.43              up  1.00000          1.00000
> >>>>>   47 1.00000         osd.47              up  1.00000          1.00000
> >>>>>   54 1.00000         osd.54              up  1.00000          1.00000
> >>>>>   58 1.00000         osd.58              up  1.00000          1.00000
> >>>>>   -4 1.00000     host ix-s5-platter
> >>>>>   11 1.00000         osd.11              up  1.00000          1.00000
> >>>>>   16 1.00000         osd.16              up  1.00000          1.00000
> >>>>>   22 1.00000         osd.22              up  1.00000          1.00000
> >>>>>   26 1.00000         osd.26              up  1.00000          1.00000
> >>>>>   31 1.00000         osd.31              up  1.00000          1.00000
> >>>>>   36 1.00000         osd.36              up  1.00000          1.00000
> >>>>>   41 1.00000         osd.41              up  1.00000          1.00000
> >>>>>   46 1.00000         osd.46              up  1.00000          1.00000
> >>>>>   51 1.00000         osd.51              up  1.00000          1.00000
> >>>>>   56 1.00000         osd.56              up  1.00000          1.00000
> >>>>>   -5 1.00000     host ix-s6-platter
> >>>>>   12 1.00000         osd.12              up  1.00000          1.00000
> >>>>>   19 1.00000         osd.19              up  1.00000          1.00000
> >>>>>   24 1.00000         osd.24              up  1.00000          1.00000
> >>>>>   29 1.00000         osd.29              up  1.00000          1.00000
> >>>>>   34 1.00000         osd.34              up  1.00000          1.00000
> >>>>>   38 1.00000         osd.38              up  1.00000          1.00000
> >>>>>   42 1.00000         osd.42              up  1.00000          1.00000
> >>>>>   50 1.00000         osd.50              up  1.00000          1.00000
> >>>>>   53 1.00000         osd.53              up  1.00000          1.00000
> >>>>>   57 1.00000         osd.57              up  1.00000          1.00000
> >>>>>   -6 1.00000     host ix-s7-platter
> >>>>>   10 1.00000         osd.10              up  1.00000          1.00000
> >>>>>   15 1.00000         osd.15              up  1.00000          1.00000
> >>>>>   20 1.00000         osd.20              up  1.00000          1.00000
> >>>>>   25 1.00000         osd.25              up  1.00000          1.00000
> >>>>>   30 1.00000         osd.30              up  1.00000          1.00000
> >>>>>   35 1.00000         osd.35              up  1.00000          1.00000
> >>>>>   40 1.00000         osd.40              up  1.00000          1.00000
> >>>>>   45 1.00000         osd.45              up  1.00000          1.00000
> >>>>>   49 1.00000         osd.49              up  1.00000          1.00000
> >>>>>   52 1.00000         osd.52              up  1.00000          1.00000
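(A quick sketch of how pools are usually bound to two CRUSH roots like these, in case it helps anyone reading along - the rule names, pool names and PG counts below are only examples, not taken from this thread:

  # one replicated rule per root, with host as the failure domain
  ceph osd crush rule create-simple ssd-rule ssd host
  ceph osd crush rule create-simple platter-rule platter host

  # bind one pool to each rule (PG counts are placeholders)
  ceph osd pool create cache-ssd 512 512 replicated ssd-rule
  ceph osd pool create data-hdd 2048 2048 replicated platter-rule
)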
> >>>>> Then we create 2 pools, 1 on the HDDs (platters), 1 on the SSDs,
> >>>>> and put the SSD pool in front of the HDD pool (cache tier).
> >>>>>
> >>>>> Now we receive very bad performance results from the cluster. Even with
> >>>>> rados bench we get very unstable performance, at times dropping to zero
> >>>>> speed. This creates very big issues for our clients.
> >>>>>
> >>>>> I have tried to tune all possible values, including the OSDs, but still
> >>>>> no luck.
> >>>>>
> >>>>> Also a very strange situation: when I run "ceph tell ... bench" on an
> >>>>> SSD OSD, I get about 20 MB/s; for an HDD it's 67 MB/s...
> >>>>>
> >>>>> I don't understand why the cache pool, which consists of SSDs, works so
> >>>>> badly... We use the Samsung 850 Pro 256 Gb as the SSDs.
> >>>>>
> >>>>> Can you guys give me advice please...
> >>>>>
> >>>>> Another odd thing: when I set the cache-mode to forward and try to
> >>>>> flush-evict all objects (not all objects get evicted, some are busy -
> >>>>> locked on the KVM side), I then get quite stable results from rados bench:
> >>>>>
> >>>>> Total time run:         30.275871
> >>>>> Total writes made:      2076
> >>>>> Write size:             4194304
> >>>>> Bandwidth (MB/sec):     274.278
> >>>>>
> >>>>> Stddev Bandwidth:       75.1445
> >>>>> Max bandwidth (MB/sec): 368
> >>>>> Min bandwidth (MB/sec): 0
> >>>>> Average Latency:        0.232892
> >>>>> Stddev Latency:         0.240356
> >>>>> Max latency:            2.01436
> >>>>> Min latency:            0.0716344
> >>>>>
> >>>>> Without zeros, etc... So I don't understand how that is possible.
> >>>>>
> >>>>> Also interesting: when I disable the overlay for the pool, rados bench
> >>>>> goes to around 70 MB/s, as for an ordinary HDD, but at the same time
> >>>>> rados bench against the SSD pool, which is no longer used, still shows
> >>>>> the same bad results...
> >>>>>
> >>>>> So please, give me some direction to dig in...
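(For reference, the cache-tier operations described in the quoted message map roughly onto the commands below; the pool names are only examples, not taken from the thread:

  # attach the SSD pool as a writeback cache in front of the HDD pool
  ceph osd tier add data-hdd cache-ssd
  ceph osd tier cache-mode cache-ssd writeback
  ceph osd tier set-overlay data-hdd cache-ssd

  # to drain the cache tier ("forward" mode plus flush/evict)
  ceph osd tier cache-mode cache-ssd forward
  rados -p cache-ssd cache-flush-evict-all

  # to stop redirecting client I/O through the cache ("disable overlay")
  ceph osd tier remove-overlay data-hdd
)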
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com