On Mon, May 11, 2015 at 05:20:25AM +0000, Somnath Roy wrote:
> Two things..
> 
> 1. You should always precondition SSD drives before benchmarking them.
well, I don't really understand what you mean here... ?
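If you mean doing a full sequential write pass over each SSD before measuring,
I'd guess something along these lines (the device path is just a placeholder):

   fio --name=precondition --ioengine=libaio --direct=1 --rw=write \
       --bs=1M --iodepth=32 --filename=/dev/sdX

or is there more to it?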

> 
> 2. After creating and mapping the rbd image, you need to write data to it 
> first and read it afterwards; otherwise the fio output will be misleading. 
> In fact, I think you will see IO is not even hitting the cluster (check 
> with ceph -s)
yes, so this confirms my conjecture. ok.
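In my case I filled the volume by mapping it and running dd over the whole
device; a sequential write pass through fio's rbd engine should work just as
well, something roughly like this (reusing the pool/image names from the
benchmark below):

   fio --ioengine=rbd --direct=1 --name=prefill --pool=ssd3r \
       --rbdname=${rbdname} --bs=4M --iodepth=16 --rw=write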


> 
> Now, if you are saying it's a 3-OSD setup, then yes, ~23K IOPS is pretty low. 
> Check the following.
> 
> 1. Check whether the client or OSD node CPU is saturating or not.
On the OSD nodes, I can see ceph-osd CPU utilisation of ~110%. On the client
node (which is one of the OSD nodes as well), I can see fio eating quite a
lot of CPU cycles. I tried stopping ceph-osd on this node (so only two nodes
are serving data) and performance got a bit higher, to ~33k IOPS. But I still
think that's not very good..
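For per-thread detail, something like this should show where ceph-osd and fio
are burning the cycles (assuming sysstat is installed; the pgrep/pidstat
invocation is just a sketch):

   pidstat -t -p $(pgrep -d, ceph-osd) 1
   pidstat -t -p $(pgrep -d, fio) 1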


> 
> 2. With 4K blocks, I hope the network BW is fine
I think it's ok.. it's 10Gb ethernet with jumbo frames, and ~23k IOPS at 4k
is only about 90 MB/s, nowhere near the link capacity.
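I can double-check with a quick iperf run between the nodes, and verify that
jumbo frames actually make it end to end (hostnames below are placeholders):

   iperf -s                          # on one node
   iperf -c <other-node> -P 4        # on the other node
   ping -M do -s 8972 <other-node>   # 9000-byte MTU minus IP/ICMP headers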


> 
> 3. Number of PGs/pool should be ~128 or so.
I'm using pg_num 128
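For the pool I'm benchmarking against, that can be confirmed with:

   ceph osd pool get ssd3r pg_num
   ceph osd pool get ssd3r pgp_num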


> 
> 4. If you are using krbd, you might want to try the latest krbd module, 
> where the TCP_NODELAY problem is fixed. If you don't want that complexity, 
> try fio-rbd.
I'm using krbd only for writing data to the volume; for the benchmarking
itself I'm using fio-rbd.

anything else I could check?


> 
> Hope this helps,
> 
> Thanks & Regards
> Somnath
> 
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Nikola Ciprich
> Sent: Sunday, May 10, 2015 9:43 PM
> To: ceph-users
> Cc: n...@linuxbox.cz
> Subject: [ceph-users] very different performance on two volumes in the same 
> pool #2
> 
> Hello ceph developers and users,
> 
> some time ago, I posted here a question regarding very different performance 
> for two volumes in one pool (backed by SSD drives).
> 
> After some examination, I probably got to the root of the problem..
> 
> When I create a fresh volume (i.e. rbd create --image-format 2 --size 51200 
> ssd/test) and run a random IO fio benchmark
> 
> fio --randrepeat=1 --ioengine=rbd --direct=1 --gtod_reduce=1 --name=test \
>     --pool=ssd3r --rbdname=${rbdname} --invalidate=1 --bs=4k --iodepth=64 \
>     --readwrite=randread
> 
> I get very nice performance of up to 200k IOPS. However, once the volume has 
> been written to (i.e. when I map it using rbd map and dd the whole volume 
> full of random data) and I repeat the benchmark, random read performance 
> drops to ~23k IOPS.
> 
> This leads me to the conjecture that for an unwritten (sparse) volume, a 
> read is effectively a no-op, simply returning zeroes without really having 
> to read data from physical storage, and thus showing nice performance; once 
> the volume has been written, performance drops because the data actually 
> has to be read. Right?
> 
> However, I'm a bit unhappy about the size of the performance drop. The pool 
> is backed by 3 SSD drives (each with random IO performance of ~100k IOPS) on 
> three nodes, and the pool replica size is set to 3. The cluster is 
> completely idle; the nodes are quad-core Xeon E3-1220 v3 @ 3.10GHz with 32GB 
> RAM each, CentOS 6, kernel 3.18.12, ceph 0.94.1. I'm using libtcmalloc (I 
> even tried upgrading gperftools-libs to 2.4). The nodes are connected via 
> 10Gb ethernet with jumbo frames enabled.
> 
> 
> I tried tuning the following values:
> 
> osd_op_threads = 5
> filestore_op_threads = 4
> osd_op_num_threads_per_shard = 1
> osd_op_num_shards = 25
> filestore_fd_cache_size = 64
> filestore_fd_cache_shards = 32
> 
> I don't see anything special in perf:
> 
>   5.43%  [kernel]              [k] acpi_processor_ffh_cstate_enter
>   2.93%  libtcmalloc.so.4.2.6  [.] 0x0000000000017d2c
>   2.45%  libpthread-2.12.so    [.] pthread_mutex_lock
>   2.37%  libpthread-2.12.so    [.] pthread_mutex_unlock
>   2.33%  [kernel]              [k] do_raw_spin_lock
>   2.00%  libsoftokn3.so        [.] 0x000000000001f455
>   1.96%  [kernel]              [k] __switch_to
>   1.32%  [kernel]              [k] __schedule
>   1.24%  libstdc++.so.6.0.13   [.] std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char
>   1.24%  libc-2.12.so          [.] memcpy
>   1.19%  libtcmalloc.so.4.2.6  [.] operator delete(void*)
>   1.16%  [kernel]              [k] __d_lookup_rcu
>   1.09%  libstdc++.so.6.0.13   [.] 0x000000000007d6be
>   0.93%  libstdc++.so.6.0.13   [.] std::basic_streambuf<char, std::char_traits<char> >::xsputn(char const*, long)
>   0.93%  ceph-osd              [.] crush_hash32_3
>   0.85%  libc-2.12.so          [.] vfprintf
>   0.84%  libc-2.12.so          [.] __strlen_sse42
>   0.80%  [kernel]              [k] get_futex_key_refs
>   0.80%  libpthread-2.12.so    [.] pthread_mutex_trylock
>   0.78%  libtcmalloc.so.4.2.6  [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
>   0.71%  libstdc++.so.6.0.13   [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
>   0.68%  ceph-osd              [.] ceph::log::Log::flush()
>   0.66%  libtcmalloc.so.4.2.6  [.] tc_free
>   0.63%  [kernel]              [k] resched_curr
>   0.63%  [kernel]              [k] page_fault
>   0.62%  libstdc++.so.6.0.13   [.] std::string::reserve(unsigned long)
> 
> I'm running the benchmark directly on one of the nodes, which I know is not 
> optimal, but it's still able to give those 200k IOPS for an empty volume, so 
> I guess that shouldn't be the problem..
> 
> Random write performance is another story (it's totally poor), but I'd like 
> to deal with read performance first..
> 
> 
> so my question is, are those numbers normal? If not, what should I check?
> 
> I'll be very grateful for any hints I can get..
> 
> thanks a lot in advance
> 
> nik
> 

-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-------------------------------------
