Re: [ceph-users] Ceph, SSD, and NVMe

Somnath Roy Wed, 30 Sep 2015 12:34:30 -0700

David,
You should move to Hammer to get all of the performance benefits. They were
all added in Giant and carried over to the present Hammer LTS release.
FYI, the focus so far has been on read performance improvement. In our
environment with 6Gb SAS SSDs, we are able to saturate the drives BW-wise at
block sizes of 64K and larger. But with smaller blocks like 4K, we are not yet
able to saturate the SAS SSDs.
But considering Ceph's scale-out nature, you can get some very good numbers out
of a cluster. For example, with 8 SAS SSDs (in a JBOF) and 2 heads in front
(so, a 2-node Ceph cluster), we are able to hit ~300K random read IOPS, while
the aggregate performance of the 8 SSDs would be ~400K. Not too bad. At this
point we are saturating the host CPUs.
We have seen almost linear scaling as you add similar setups, i.e. with ~3 of
the above setups you could hit ~900K RR IOPS. So I would say it is definitely
there in terms of read IOPS, and more improvements are coming.
But the write path is awful compared to read, and that's where the problem is,
because in the mainstream no workload is 100% RR (IMO). So even if you have,
say, a 90/10 read/write mix, the performance numbers would be ~6-7X slower.
So it depends very much on your workload/application access pattern and,
obviously, on how much you are willing to spend.
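
To make the arithmetic behind those figures explicit, here is a minimal sketch
in Python. The numbers are the approximate ones quoted above from one
environment; the function names are hypothetical, and the linear-scaling and
slowdown factors are idealizations rather than measurements of your cluster.

```python
# Approximate figures quoted in the message above (one environment only).
RAW_IOPS_8_SSD = 400_000   # aggregate random-read capability of the 8 SAS SSDs
CLUSTER_IOPS = 300_000     # random-read IOPS delivered by the 2-node cluster


def efficiency(delivered, raw):
    """Fraction of the raw drive capability that the cluster delivers."""
    return delivered / raw


def scaled_iops(per_setup_iops, setups):
    """Projected IOPS assuming perfectly linear scale-out (an idealization)."""
    return per_setup_iops * setups


def mixed_iops(read_only_iops, slowdown):
    """Rough mixed-workload estimate for a given observed slowdown factor."""
    return read_only_iops / slowdown


if __name__ == "__main__":
    print(f"efficiency: {efficiency(CLUSTER_IOPS, RAW_IOPS_8_SSD):.0%}")
    print(f"3 setups:   {scaled_iops(CLUSTER_IOPS, 3):,} IOPS")
    for factor in (6, 7):
        estimate = mixed_iops(CLUSTER_IOPS, factor)
        print(f"90/10 mix at {factor}X slower: ~{estimate:,.0f} IOPS")
```

Plugging in your own measured per-setup IOPS and slowdown factor gives a quick
feel for whether a mixed workload would come anywhere near the read-only
numbers.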

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark 
Nelson
Sent: Wednesday, September 30, 2015 12:04 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph, SSD, and NVMe

On 09/30/2015 09:34 AM, J David wrote:
> Because we have a good thing going, our Ceph clusters are still
> running Firefly on all of our clusters including our largest, all-SSD
> cluster.
>
> If I understand right, newer versions of Ceph make much better use of
> SSDs and give overall much higher performance on the same equipment.
> However, the impression I get of newer versions is that they are also
> not as stable as Firefly and should only be used with caution.
>
> Given our storage consumers have an effectively unlimited appetite for
> IOPs and throughput, more performance would be very welcome.  But not
> if it leads to cluster crashes and lost data.
>
> What really prompts this is that we are starting to see large-scale
> NVMe equipment appearing in the channel ( e.g.
> http://www.supermicro.com/products/system/1U/1028/SYS-1028U-TN10RT_.cfm
> ).  The cost is significantly higher with commensurately higher
> theoretical performance.  But if we're already not pushing our SSDs to
> the max over SAS, the added benefit of NVMe would largely be lost.
>
> On the other hand, if we could safely upgrade to a more recent version
> that is as stable and bulletproof as Firefly has been for us, but has
> better performance with SSDs, that would not only benefit our current
> setup, it would be a necessary first step for moving onto NVMe.
>
> So this raises three questions:
>
> 1) Have I correctly understood that one or more post-Firefly releases
> exist that (c.p.) perform significantly better with all-SSD setups?
>
> 2) Is there any such release that (generally) is as rock-solid as
> Firefly?  Of course this is somewhat situationally dependent, so I
> would settle for: is there any such release that doesn't have any
> known minding-my-own-business-suddenly-lost-data bugs in a 100% RBD
> use case?
>
> 3) Has anyone done anything with NVMe as storage (not just journals)
> who would care to share what kind of performance they experienced?
>
> (Of course if we do upgrade we will do so carefully, do a test cluster
> first, have backups standing by, etc.  But if it's already known that
> doing so will either not improve anything or is likely to blow up in
> our faces, it would be better to leave well enough alone.  The current
> performance is by no means bad, we're just always greedy for more. :)
> )
>
> Thanks for any advice/suggestions!

Hi David,

The single biggest performance improvement we've seen for SSDs has come from 
the memory allocator investigation that Chaitanya Hulgol and Somnath Roy 
spearheaded at SanDisk, which others, including myself, have since followed up 
on and tried to expand.

See:

http://www.spinics.net/lists/ceph-devel/msg25823.html
https://www.mail-archive.com/ceph-devel@vger.kernel.org/msg23100.html
http://www.spinics.net/lists/ceph-devel/msg21582.html

I haven't tested Firefly, but there's a good chance that you will see a 
significant performance improvement simply by upgrading your systems to 
tcmalloc 2.4 and running the OSDs with 128MB of thread cache, or by using 
LD_PRELOAD to load jemalloc.  This isn't something we officially support in 
RHCS yet, but we'll likely be moving toward it for future releases based on 
the very positive results we are seeing.  The biggest thing to keep in mind is 
that this does increase per-OSD memory usage by several hundred MB, so the 
3-4X IOPS increase does come with a cost.  On the plus side, it also reduces 
CPU usage, sometimes dramatically.  You may be able to offset the increased 
memory usage somewhat by disabling transparent huge pages (especially with 
jemalloc).
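
As a rough illustration of the settings mentioned above, here is a minimal 
sketch in Python that prepares the environment an OSD process would be started 
with. TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES and LD_PRELOAD are real 
environment variables, but the helper names are hypothetical, the jemalloc 
library path is distro-specific and must be supplied by you, and how the 
environment actually reaches your OSDs depends on your init scripts.

```python
import os

# 128 MB in bytes, matching the thread-cache size mentioned above.
THREAD_CACHE_BYTES = 128 * 1024 * 1024


def tcmalloc_env(base=None, cache_bytes=THREAD_CACHE_BYTES):
    """Return a copy of the environment with a larger tcmalloc thread cache."""
    env = dict(os.environ if base is None else base)
    # gperftools tcmalloc reads this variable when the process starts.
    env["TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES"] = str(cache_bytes)
    return env


def jemalloc_env(lib_path, base=None):
    """Return a copy of the environment that preloads jemalloc.

    lib_path is an assumption the caller must fill in; it varies by distro.
    """
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_PRELOAD", "")
    # Put jemalloc first and keep anything that was already preloaded.
    env["LD_PRELOAD"] = f"{lib_path} {existing}".strip()
    return env


def active_thp_mode(sysfs_text):
    """Parse text such as 'always madvise [never]'; return the active mode."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None
```

The resulting dictionary could be passed as env= to subprocess.Popen or 
written into whatever file your init system sources before starting the OSDs. 
On Linux, the text for active_thp_mode comes from 
/sys/kernel/mm/transparent_hugepage/enabled.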

See:

http://www.spinics.net/lists/ceph-devel/msg26483.html

FWIW, between Sage's newstore work and recent work by Somnath Roy to optimize 
the write path, we may see further improvement, but neither of those is ready 
for production yet.

> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>