Thanks Michael. A quick correction based on Michael's response: in question 4, I should not have made any reference to Ceph objects, since individual objects are not striped (per Michael's response). Instead of "Ceph objects" I should simply have said "Ceph VM image". A Ceph VM image is made up of thousands of objects, and those objects are spread across multiple OSDs on multiple servers. In that situation, what's the answer to #4?
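To make the corrected question concrete, here is a rough sketch (plain Python, with image sizes made up by me) of how an RBD image maps onto fixed-size objects given the default 4 MB object size Michael mentioned. Each of those objects is then placed independently by CRUSH, which is why a single VM image ends up scattered over many OSDs and servers.

# Rough illustration: how an RBD image of a given size breaks down into
# fixed-size RADOS objects (4 MB by default), each placed independently
# by CRUSH. The image sizes below are made-up examples.

OBJECT_SIZE = 4 * 1024 * 1024  # default RBD object size: 4 MB

def object_count(image_size_bytes, object_size=OBJECT_SIZE):
    """Number of objects a fully written image of this size would use."""
    return (image_size_bytes + object_size - 1) // object_size

def object_index(byte_offset, object_size=OBJECT_SIZE):
    """Which object (0-based) a given byte offset of the image falls into."""
    return byte_offset // object_size

print(object_count(1 * 1024**4))        # 1 TB image -> 262144 objects
print(object_index(10 * 1024 * 1024))   # offset 10 MB -> object 2
print(object_index(11 * 1024 * 1024))   # offset 11 MB -> still object 2
print(object_index(13 * 1024 * 1024))   # offset 13 MB -> object 3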
For question #2, a quick question to clarify Michael's response: if the underlying filesystem is xfs (and not btrfs), is it still more-or-less instantaneous because the snapshotting still uses some sort of copy-on-write technology? (I've pasted a few rough sketches, covering the clone behaviour, the journal arithmetic for #2, and the 4b capacity split, at the bottom of this mail below the quoted thread.)

On Tue, Nov 19, 2013 at 8:29 PM, Michael Lowe <j.michael.l...@gmail.com> wrote:

> 1a. I believe it's dependent on format 2 images, not btrfs.
> 1b. Snapshots work independently of the backing file system.
> 2. All data goes through the journals.
> 4a. RBD image objects are not striped; they come in 4 MB chunks by default, so consecutive sectors will come from the same object and OSD. I don't know what the result of the convolution of CRUSH, VM filesystem sector allocation, and Ethernet bonding would be.
>
> Sent from my iPad
>
> On Nov 19, 2013, at 8:12 PM, Gautam Saxena <gsax...@i-a-inc.com> wrote:
>
> 1a) The Ceph documentation on OpenStack integration makes a big (and valuable) point that cloning images should be instantaneous/quick due to the copy-on-write functionality. See "Boot from volume" at the bottom of http://ceph.com/docs/master/rbd/rbd-openstack/. Here's the excerpt:
>
> "When Glance and Cinder are both using Ceph block devices, the image is a copy-on-write clone, so volume creation is very fast."
>
> However, is this true *only* if we are using btrfs as the underlying file system for the OSDs? If so, then I don't think we can get this nice "quick" cloning, since the Ceph documentation states all over the place that btrfs is not yet production ready.
>
> 1b) Ceph also describes snapshotting/layering as being super quick due to "copy on write": http://ceph.com/docs/master/rbd/rbd-snapshot/
>
> Does this feature also depend on btrfs being used as the underlying filesystem for the OSDs?
>
> 2) If we have about 10 TB of data to transfer to Ceph (initial migration), would all 10 TB pass through the journals? If so, would it make sense to initially put the journals on a separate partition of each disk (instead of on an SSD), and then, once the 10 TB have been copied, change the Ceph configuration to use SSDs for journaling? That way we don't "kill" (or significantly reduce) the SSDs' life expectancy on day 1. (It's OK if the initial migration takes longer without SSDs -- I'm not sure it would take more than twice as long anyway.)
>
> 3) The Ceph documentation recommends multiple networks (front-side and back-side). I was wondering, though, which is "better": one large bonded interface of 6 x 1 Gb/s = 6 Gb/s, or two or three interfaces, each of which would only be 2 or 3 Gb/s (after bonding)? My initial instinct is to just go for the nice fat 6 Gb/s one, since I'm not worried about denial-of-service (DoS) attacks on my internal network, and I figure this way I'll get excellent performance *most* of the time, with some (minor?) risk that occasionally a client request may (or may not?) experience latency due to network traffic from back-end activities like replication. (My replication level will most likely be 2.)
>
> 4a) Regarding bonding: if I understood the Ceph architecture correctly, any client request will automatically be routed to the individual OSDs that contain a piece (a stripe) of the overall object that is being sought. So a single client request for an object could generate "n" requests to "n" OSDs.
> Since the OSDs (in a perfect world) will reside equally on all servers, the normal hashing algorithm that Linux + LACP switches use should balance these "n" requests across "m" physical ethernet ports. So if I have 6 ethernet ports per server and, say, 6 servers, then in a perfect world my "n" requests would use 6 ethernet ports. (In the real world, I imagine the hashing is not perfect, so maybe only 4 ethernet ports get used and the other two do nothing.) Is this understanding correct? If so, normal LACP hashing should suffice for my needs.
>
> 4b) A variation of the above question: if the 6 servers I have are NOT of equal size, such that the storage distributions are 24 TB, 16 TB, 12 TB, 6 TB, 4 TB and 4 TB (for a total of 66 TB of disk across all servers), would it be reasonable to assume that Ceph would balance any object data roughly proportionally to the size of each server? (You can assume that the CRUSH setup is just the default that comes with ceph-deploy, and that each server has 6 to 8 disks.) So a 1 TB VM, for example, would be split 24/66 on server 1, 16/66 on server 2, 12/66 on server 3, 6/66 on server 4, and 4/66 each on servers 5 and 6?
>
> --
> Gautam Saxena
> President & CEO
> Integrated Analysis Inc.
>
> Making Sense of Data.™
> Biomarker Discovery Software | Bioinformatics Services | Data Warehouse Consulting | Data Migration Consulting
> www.i-a-inc.com
> gsax...@i-a-inc.com
> (301) 760-3077 office
> (240) 479-4272 direct
> (301) 560-3463 fax
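As promised, a few rough sketches. First, for the copy-on-write cloning in 1a/1b (and my xfs question above): per Michael, layering depends on format 2 images rather than on btrfs. A minimal sketch of the snapshot -> protect -> clone sequence using the python-rbd/librados bindings would look roughly like this (it assumes ceph.conf is readable, a pool named "rbd" exists, and the image and snapshot names are made up):

import rados
import rbd

# Rough sketch only. Assumes python-rados/python-rbd are installed,
# /etc/ceph/ceph.conf is readable, and a pool named "rbd" exists.
# Image and snapshot names below are made up for illustration.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')
try:
    r = rbd.RBD()
    # The parent must be a format 2 image with layering enabled.
    r.create(ioctx, 'golden-image', 10 * 1024**3,
             old_format=False, features=rbd.RBD_FEATURE_LAYERING)

    parent = rbd.Image(ioctx, 'golden-image')
    parent.create_snap('base')      # snapshot creation is a metadata operation
    parent.protect_snap('base')     # clones require a protected snapshot
    parent.close()

    # The clone shares all unmodified objects with the parent snapshot,
    # so creating it is near-instant regardless of the OSDs' filesystem;
    # objects are only copied into the clone when they are first written.
    r.clone(ioctx, 'golden-image', 'base', ioctx, 'vm-disk-0001',
            features=rbd.RBD_FEATURE_LAYERING)
finally:
    ioctx.close()
    cluster.shutdown()

If that's right, the speed of cloning and snapshotting shouldn't depend on whether the OSDs sit on xfs or btrfs.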
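Second, on question 2: since all data goes through the journals, the SSD-wear concern is easy to put rough numbers on. A sketch, where every figure is an assumption to be replaced with real hardware numbers (the endurance rating in particular is hypothetical):

# Back-of-the-envelope estimate of journal SSD wear from the initial
# migration. Every figure here is an assumption, not a measurement.
TB = 1000**4

migration_data = 10 * TB      # the ~10 TB initial data set
replication = 2               # replication level 2

# Each replica is written through a journal, so cluster-wide the volume
# passing through journals during the migration is roughly:
journal_writes = migration_data * replication
print("through journals, cluster-wide: %d TB" % (journal_writes // TB))

# Spread over, say, one journal SSD per server (an assumption):
journal_ssds = 6
per_ssd = journal_writes / float(journal_ssds)
print("per journal SSD: %.1f TB" % (per_ssd / TB))

# Compared against a hypothetical endurance rating of 70 TB written:
endurance = 70 * TB
print("share of endurance consumed: %.1f%%" % (100.0 * per_ssd / endurance))

Whether that share is worth worrying about obviously depends on the actual drives.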
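Finally, a quick sketch of what 4b looks like if placement really is proportional to capacity. I believe the default ceph-deploy/ceph-disk setup weights each OSD by its size, so the expected split of a 1 TB image over the six servers (capacities as listed in 4b, which sum to 66 TB) would be roughly:

# Expected split of 1 TB of image data across servers if CRUSH weights
# simply track raw capacity (which I believe is the ceph-deploy default).
# Server names and capacities are the ones from question 4b.
capacities_tb = {
    "server1": 24, "server2": 16, "server3": 12,
    "server4": 6,  "server5": 4,  "server6": 4,
}

image_gb = 1000                          # a 1 TB VM image
total = sum(capacities_tb.values())      # 66 TB

for server in sorted(capacities_tb):
    cap = capacities_tb[server]
    share = cap / float(total)
    print("%s: %2d/%d of the data, roughly %.0f GB" % (server, cap, total, share * image_gb))

In practice CRUSH placement has some variance per placement group, and replication size 2 doubles the raw space consumed, but the proportions should still roughly track the weights.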