hi Moïn,

a few suggestions, based on my experience:

1. The max size for a GOOD QUALITY 7200 RPM spinning SATA/SAS HDD is 4 TB.
Anything larger will ruin your performance, unless you do pure archiving of
files (written once, "never" touched again). If you have 8 TB HDDs, fill
them to at most 50%.

2. Use an SSD cache tier, with SSDs that can sustain continuous IO
operations. Depending on the size of that cache tier, you might be able to
use more than 4 TB per 7200 RPM spinning HDD.

3. Of course, though that is already quite standard: use SSDs for the
journals and metadata (take care to use the right SSDs for that). Look at
https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
to get an idea of what I mean.
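The idea behind that post is simply to measure how many synchronous 4k
writes a candidate SSD can sustain before you buy a stack of them. A sketch
with fio (replace /dev/sdX with a blank test device; this writes to it
directly):

  fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --group_reporting --name=journal-test

Datacenter-grade SSDs keep their IOPS up in this test, while many consumer
drives collapse, which is exactly what you want to find out before using
them as journals.

For the cache tier in point 2, the wiring itself is only a handful of
commands once you have an SSD-backed pool; the real work is the sizing and
eviction tuning. A minimal sketch, with made-up pool names:

  # put the SSD pool "cache" in front of the HDD-backed pool "data"
  ceph osd tier add data cache
  ceph osd tier cache-mode cache writeback
  ceph osd tier set-overlay data cache

  # give it a hit set and a hard size limit, so a single huge file
  # cannot churn the whole tier
  ceph osd pool set cache hit_set_type bloom
  ceph osd pool set cache target_max_bytes 1099511627776   # 1 TiB, adjust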
Good luck!

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Address:
IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402, Amtsgericht Hanau
Managing director: Oliver Dzombic

Tax no.: 35 236 3622 1
VAT ID: DE274086107

On 30.06.2016 at 10:34, m.da...@bluewin.ch wrote:
> Thank you all for your prompt answers.
>
>> firstly, wall of text, makes things incredibly hard to read.
>> Use paragraphs/returns liberally.
>
> I actually made sure to use paragraphs. For some reason, the formatting
> was removed.
>
>> Is that your entire experience with Ceph, ML archives and docs?
>
> Of course not, I have already been through the whole documentation many
> times. It's just that I couldn't really decide between the choices I was
> given.
>
>> What's an "online storage"?
>> I assume you're talking about what is commonly referred to as "cloud
>> storage".
>
> I try not to use the term "cloud", but if you must, then yes, that's the
> idea behind it. Basically an online hard disk.
>
>> 10MB is not a small file in my book, 1-4KB (your typical mail) are
>> small files.
>> How much data (volume/space) are you looking at initially and within a
>> year of deployment?
>
> 10MB is small compared to the larger files, but it is indeed bigger than
> the smaller, IOPS-intensive files (like the emails you pointed out).
>
> Right now there are two servers, each with 12x8TB. I expect growth of
> about the same size every 2-3 months.
>
>> What usage patterns are you looking at, expecting?
>
> Since my customers will put their files on this "cloud", it's generally
> write once, read many (or at least more reads than writes).
> As they will most likely store private documents, with some bigger files
> too, the smaller files are predominant.
>
>> That's quite the blanket statement and sounds like it's from a sales
>> brochure.
>> SSDs for OSD journals are always a good idea.
>> Ceph scales first and foremost by adding more storage nodes and OSDs.
>
> What I meant by scaling is that as the number of customers grows, so does
> the number of small files, and in order to have decent performance at
> that point, SSDs are a must. I can add many OSDs, but if they are all
> struggling with IOPS then it's no use (except for having more space).
>
>> Are we talking about existing HW or what you're planning?
>
> That is existing hardware. Given the high capacity of the drives, I went
> with a more powerful CPU to avoid future headaches.
>
>> Also, avoid large variations in your storage nodes if at all possible,
>> especially in your OSD sizes.
>
> Say I have two nodes, one with 12 OSDs and the other with 24. All drives
> are the same size. Would that cause any issue (except for the failure
> domain)?
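A quick way to see what such a mixed 12/24-OSD layout does in practice is
to compare the CRUSH weights and per-OSD utilisation once both nodes are
in (read-only commands):

  ceph osd tree      # weight hierarchy per host and OSD
  ceph osd df tree   # utilisation and PG count per OSD, grouped by host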
> I think it is clear that native calls are the way to go, even the docs
> point you in that direction. Now the issue is that the clients need to
> have a file/directory structure.
>
> The access topology is as follows:
>
> Customer <-> customer application <-> server application <-> Ceph cluster
>
> The customer has to be able to create directories, as with an FTP server
> for example. Using CephFS would make this task very easy, though at the
> expense of some performance.
> With native calls, since everything is considered an object, it gets
> trickier to provide this feature. Perhaps some naming scheme would make
> this possible.
>
> Kind regards,
>
> Moïn Danai.
>
> ----Original Message----
> From: ch...@gol.com
> Date: 27/06/2016 - 02:45 (CEST)
> To: ceph-users@lists.ceph.com
> Cc: m.da...@bluewin.ch
> Subject: Re: [ceph-users] Ceph for online file storage
>
> Hello,
>
> firstly, wall of text, makes things incredibly hard to read.
> Use paragraphs/returns liberally.
>
> Secondly, what Yang wrote.
>
> More inline.
>
> On Sun, 26 Jun 2016 18:30:35 +0000 (GMT+00:00) m.da...@bluewin.ch wrote:
>
>> Hi all,
>> After a quick review of the mailing list archive, I have a question
>> that is left unanswered:
>
> Is that your entire experience with Ceph, ML archives and docs?
>
>> Is Ceph suitable for online file storage, and if yes, shall I use
>> RGW/librados or CephFS?
>
> What's an "online storage"?
> I assume you're talking about what is commonly referred to as "cloud
> storage".
> Which also typically tends to use HTTP and S3, and thus RGW would be the
> classic fit.
>
> But that's up to you really.
>
> For example, OwnCloud (and thus NextCloud) can use Ceph RGW as a storage
> backend.
>
>> The typical workload here is mostly small files, 50kB-10MB, and some
>> bigger ones, 100MB+ up to 4TB max (roughly a 70/30 split).
>
> 10MB is not a small file in my book, 1-4KB (your typical mail) are small
> files.
> How much data (volume/space) are you looking at initially and within a
> year of deployment?
>
> What usage patterns are you looking at, expecting?
>
>> Caching with SSDs is critical in achieving scalable performance as OSD
>> hosts increase (and files as well).
>
> That's quite the blanket statement and sounds like it's from a sales
> brochure.
> SSDs for OSD journals are always a good idea.
> Ceph scales first and foremost by adding more storage nodes and OSDs.
>
> SSD-based cache tiers (quite a different beast to journals) can help,
> but that's highly dependent on your usage patterns as well as correct
> sizing and configuration of the cache pool.
>
> For example, one of your 4TB files above could potentially wreak havoc
> with a cache pool of similar size.
>
>> OSD nodes have between 12 and 48 8TB drives.
>
> Are we talking about existing HW or what you're planning?
> 12 OSDs per node are a good start and what I aim for usually; 24 are
> feasible if you have some idea what you're doing.
> More than 24 OSDs per node requires quite a bit of insight and
> significant investment in CPU and RAM. Tons of threads about this here.
>
> Read the current thread "Dramatic performance drop at certain number of
> objects in pool" for example.
>
> Also, avoid large variations in your storage nodes if at all possible,
> especially in your OSD sizes.
>
> Christian
>
>> If using CephFS, the hierarchy would include alphabet letters at the
>> root and then a user's directory in the appropriate subfolder.
>> With native calls, I'm not quite sure how to retrieve file A from
>> user A and not user B. Note that the software which processes user
>> data is written in Java and deployed on multiple client-facing
>> servers, so rados integration should be easy.
>>
>> Kind regards, Moïn Danai.
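On the naming question in the quoted mail: RADOS itself has no directories,
so one straightforward approach is to encode the user and the virtual path
into the object name and treat that purely as a prefix convention. A rough
sketch with the rados CLI (pool, user and file names are made up; the same
scheme applies to librados calls from the Java application):

  # store user A's file under an object name that encodes user + "path"
  rados -p userdata put "userA/documents/report.pdf" ./report.pdf

  # read it back; user B's data simply lives under a different prefix
  rados -p userdata get "userA/documents/report.pdf" /tmp/report.pdf

  # emulate a directory listing by filtering on the prefix (client-side)
  rados -p userdata ls | grep '^userA/documents/'

If per-user object counts grow large, that client-side listing becomes the
painful part, which is one more argument for the RGW/S3 route Christian
mentioned.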