Hi,
if you look in the archive you'll see I posted something similar about 2
months ago.

You can try experimenting with:
1) stock binaries - tcmalloc
2) LD_PRELOADed jemalloc (sketch below)
3) ceph recompiled with neither (glibc malloc)
4) ceph recompiled with jemalloc (?)
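
For option 2, a minimal sketch of what I mean. The package name and library
path are from a Debian Jessie box, so adjust for your distro, and wire the
variable into your init scripts for a real deployment:

    # install jemalloc and preload it into a manually started OSD
    apt-get install libjemalloc1
    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 \
        ceph-osd -i 0 --cluster ceph

    # verify the preload actually took effect
    grep jemalloc /proc/$(pidof -s ceph-osd)/maps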

We simply recompiled the ceph binaries without tcmalloc, and CPU usage went
down considerably and latencies improved. I can't vouch for there being no
adverse effects in the long run, though. We went back to tcmalloc a while ago
while hunting down a problem (to eliminate variables), but that's only
temporary and we are going to switch back. Disabling tcmalloc saved us a lot
of cores.
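
For reference, the rebuild was roughly like this. Hammer still uses
autotools; I believe the relevant switches are --without-tcmalloc and
--with-jemalloc, but double-check them against your source tree:

    # option 3: rebuild Hammer against glibc malloc (no tcmalloc)
    ./autogen.sh
    ./configure --without-tcmalloc
    make -j$(nproc)

    # option 4 would be linking jemalloc in instead:
    # ./configure --with-jemalloc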

There is also a variable:
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES

Some people reported that raising this variable alleviates the issue, but it
didn't work for us; that might be a bug in our version of tcmalloc, or a Ceph
bug. Touching it caused serious performance problems for us right away. It's
not the ideal solution anyway.
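
If you still want to try it, it's just an environment variable that tcmalloc
reads at startup; something like this (128 MB here is an arbitrary example,
and if I remember right the default in this tcmalloc generation is 32 MB):

    # raise tcmalloc's total thread-cache limit to 128 MB for one OSD
    TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 \
        ceph-osd -i 0 --cluster ceph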

Some references:
http://tracker.ceph.com/issues/12516
http://events.linuxfoundation.org/sites/events/files/slides/optimizing_ceph_flash.pdf
https://www.mail-archive.com/search?l=ceph-de...@vger.kernel.org&q=subject:%22Re%3A+Performance+variation+across+RBD+clients+on+different+pools+in+all+SSD+setup+-+tcmalloc+issue%22&o=newest&f=1

Jan


> On 11 Aug 2015, at 17:13, Межов Игорь Александрович <me...@yuterra.ru> wrote:
> 
> Hi!
> 
> We got some strange performance results when running a random read fio test
> on our test Hammer cluster.
> 
> When we run fio-rbd (4k, randread, 8 jobs, QD=32, 500Gb rbd image) for the
> first time (pagecache is cold/empty), we get ~12 kiops sustained. That is
> quite a reasonable value, as 12kiops/34osd = 352 iops per disk, which is
> normal for a 10k SAS disk. As most of the data really has to be read from
> the platters, we also see high iowait (~45%) and moderate user CPU
> activity (~35%).
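> 
> The fio invocation was roughly the following (from memory; the pool and
> image names are anonymized):
> 
>   fio --ioengine=rbd --clientname=admin --pool=rbd --rbdname=test500g \
>       --rw=randread --bs=4k --numjobs=8 --iodepth=32 --direct=1 \
>       --name=randread-test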
> 
> But when we run the same test a second time, some data is already in the
> pagecache and can be accessed faster, and indeed we get ~25 kiops. We see
> low iowait (~1-3%) but surprisingly high user CPU activity, >70%.
> 
> Perf top shows us that most of the calls are in the tcmalloc library:
>  19,61%  libtcmalloc.so.4.2.2  [.] tcmalloc::CentralFreeList::FetchFromOneSpans(int, void**, void**)
>  15,53%  libtcmalloc.so.4.2.2  [.] tcmalloc::SLL_Next(void*)
>   9,03%  libtcmalloc.so.4.2.2  [.] TCMalloc_PageMap3<35>::get(unsigned long) const
>   6,71%  libtcmalloc.so.4.2.2  [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
>   1,59%  libtcmalloc.so.4.2.2  [.] tcmalloc::CentralFreeList::ReleaseListToSpans(void*)
>   1,58%  libtcmalloc.so.4.2.2  [.] tcmalloc::SLL_PopRange(void**, int, void**, void**)
>   1,42%  libtcmalloc.so.4.2.2  [.] tcmalloc::PageHeap::GetDescriptor(unsigned long) const
>   1,03%  libtcmalloc.so.4.2.2  [.] 0x0000000000060589
>   0,91%  libtcmalloc.so.4.2.2  [.] tcmalloc::ThreadCache::Scavenge()
>   0,82%  libtcmalloc.so.4.2.2  [.] tcmalloc::DLL_Remove(tcmalloc::Span*)
>   0,80%  libtcmalloc.so.4.2.2  [.] tcmalloc::ThreadCache::IncreaseCacheLimitLocked()
>   0,75%  libtcmalloc.so.4.2.2  [.] tcmalloc::Static::pageheap()
>   0,69%  libtcmalloc.so.4.2.2  [.] PackedCache<35, unsigned long>::GetOrDefault(unsigned long, unsigned long) const
>   0,51%  libpthread-2.19.so    [.] __pthread_mutex_unlock_usercnt
> 
> 
> Running the same test over an RBD image in the SSD pool gives the same
> 25-30 kiops, while every DC S3700 SSD we used in the SSD pool easily does
> >50k iops. I think the 25-30 kiops limit we hit is due to tcmalloc
> inefficiency.
> 
> What can we do to improve our results? Is there some tcmalloc tuning, or
> would compiling ceph with jemalloc give better results? Do you have any
> thoughts?
> 
> Our small test Hammer install:
> - Debian Jessie;
> - Ceph Hammer 0.94.2 self-built from sources (tcmalloc)
> - 1xE5-2670 + 128Gb RAM
> - 2 nodes, shared with mons; system and mon DB are on a separate SAS mirror;
> - 17 OSD on each node, SAS 10k;
> - 2 Intel DC S3700 200Gb SSD for journalling on each node
> - 2 Intel DC S3700 400Gb SSD for separate SSD pool
> - 10Gbit interconnect, shared public and cluster network, MTU 9100
> - 10Gbit client host, fio 2.2.7 compiled with RBD engine
> 
> Megov Igor
> CIO, Yuterra
> 
