Thanks Michael. A quick correction based on Michael's response: in question 4, I should not have made any reference to Ceph objects, since individual objects are not striped (per Michael's response). Instead of "Ceph objects" I should simply have said "Ceph VM image". A Ceph VM image is made up of thousands of objects, and those objects are spread across multiple OSDs on multiple servers. In that situation, what's the answer to #4?
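To make the corrected question concrete, here is a rough sketch (plain Python, with image sizes made up by me) of how an RBD image maps onto fixed-size objects given the default 4 MB object size Michael mentioned. Each of those objects is then placed independently by CRUSH, which is why a single VM image ends up scattered over many OSDs and servers.

# Rough illustration: how an RBD image of a given size breaks down into
# fixed-size RADOS objects (4 MB by default), each placed independently
# by CRUSH. The image sizes below are made-up examples.

OBJECT_SIZE = 4 * 1024 * 1024  # default RBD object size: 4 MB

def object_count(image_size_bytes, object_size=OBJECT_SIZE):
    """Number of objects a fully written image of this size would use."""
    return (image_size_bytes + object_size - 1) // object_size

def object_index(byte_offset, object_size=OBJECT_SIZE):
    """Which object (0-based) a given byte offset of the image falls into."""
    return byte_offset // object_size

print(object_count(1 * 1024**4))        # 1 TB image -> 262144 objects
print(object_index(10 * 1024 * 1024))   # offset 10 MB -> object 2
print(object_index(11 * 1024 * 1024))   # offset 11 MB -> still object 2
print(object_index(13 * 1024 * 1024))   # offset 13 MB -> object 3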
For question #2, a quick question to clarify Michael's response: if the underlying filesystem is xfs (and not btrfs), is it still more-or-less instantaneous because the snapshotting still uses some sort of copy-on-write technology? (I've pasted a few rough sketches, covering the clone behaviour, the journal arithmetic for #2, and the 4b capacity split, at the bottom of this mail below the quoted thread.)

On Tue, Nov 19, 2013 at 8:29 PM, Michael Lowe <j.michael.l...@gmail.com> wrote:

> 1a. I believe it's dependent on format 2 images, not btrfs.
> 1b. Snapshots work independently of the backing file system.
> 2. All data goes through the journals.
> 4a. RBD image objects are not striped; they come in 4 MB chunks by default, so consecutive sectors will come from the same object and OSD. I don't know what the result of the convolution of CRUSH, VM filesystem sector allocation, and Ethernet bonding would be.
>
> Sent from my iPad
>
> On Nov 19, 2013, at 8:12 PM, Gautam Saxena <gsax...@i-a-inc.com> wrote:
>
> 1a) The Ceph documentation on OpenStack integration makes a big (and valuable) point that cloning images should be instantaneous/quick due to the copy-on-write functionality. See "Boot from volume" at the bottom of http://ceph.com/docs/master/rbd/rbd-openstack/. Here's the excerpt:
>
> "When Glance and Cinder are both using Ceph block devices, the image is a copy-on-write clone, so volume creation is very fast."
>
> However, is this true *only* if we are using btrfs as the underlying file system for the OSDs? If so, then I don't think we can get this nice "quick" cloning, since the Ceph documentation states all over the place that btrfs is not yet production ready.
>
> 1b) Ceph also describes snapshotting/layering as being super quick due to "copy on write": http://ceph.com/docs/master/rbd/rbd-snapshot/
>
> Does this feature also depend on btrfs being used as the underlying filesystem for the OSDs?
>
> 2) If we have about 10 TB of data to transfer to Ceph (initial migration), would all 10 TB pass through the journals? If so, would it make sense to initially put the journals on a separate partition of each disk (instead of on an SSD), and then, once the 10 TB have been copied, change the Ceph configuration to use SSDs for journaling? That way we don't "kill" (or significantly reduce) the SSDs' life expectancy on day 1. (It's OK if the initial migration takes longer without SSDs -- I'm not sure it would take more than twice as long anyway.)
>
> 3) The Ceph documentation recommends multiple networks (front-side and back-side). I was wondering, though, which is "better": one large bonded interface of 6 x 1 Gb/s = 6 Gb/s, or two or three interfaces, each of which would only be 2 or 3 Gb/s (after bonding)? My initial instinct is to just go for the nice fat 6 Gb/s one, since I'm not worried about denial-of-service (DoS) attacks on my internal network, and I figure this way I'll get excellent performance *most* of the time, with some (minor?) risk that occasionally a client request may (or may not?) experience latency due to network traffic from back-end activities like replication. (My replication level will most likely be 2.)
>
> 4a) Regarding bonding: if I understood the Ceph architecture correctly, any client request will automatically be routed to the individual OSDs that contain a piece (a stripe) of the overall object that is being sought. So a single client request for an object could generate "n" requests to "n" OSDs.
> Since the OSDs (in a perfect world) will reside equally on all servers, the normal hashing algorithm that Linux + LACP switches use should balance these "n" requests across "m" physical ethernet ports. So if I have 6 ethernet ports per server and, say, 6 servers, then in a perfect world my "n" requests would use 6 ethernet ports. (In the real world, I imagine the hashing is not perfect, so maybe only 4 ethernet ports get used and the other two do nothing.) Is this understanding correct? If so, normal LACP hashing should suffice for my needs.
>
> 4b) A variation of the above question: if the 6 servers I have are NOT of equal size, such that the storage distributions are 24 TB, 16 TB, 12 TB, 6 TB, 4 TB and 4 TB (for a total of 66 TB of disk across all servers), would it be reasonable to assume that Ceph would balance any object data roughly proportionally to the size of each server? (You can assume that the CRUSH setup is just the default that comes with ceph-deploy, and that each server has 6 to 8 disks.) So a 1 TB VM, for example, would be split 24/66 on server 1, 16/66 on server 2, 12/66 on server 3, 6/66 on server 4, and 4/66 each on servers 5 and 6?
>
> --
> Gautam Saxena
> President & CEO
> Integrated Analysis Inc.
>
> Making Sense of Data.™
> Biomarker Discovery Software | Bioinformatics Services | Data Warehouse Consulting | Data Migration Consulting
> www.i-a-inc.com
> gsax...@i-a-inc.com
> (301) 760-3077 office
> (240) 479-4272 direct
> (301) 560-3463 fax
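As promised, a few rough sketches. First, for the copy-on-write cloning in 1a/1b (and my xfs question above): per Michael, layering depends on format 2 images rather than on btrfs. A minimal sketch of the snapshot -> protect -> clone sequence using the python-rbd/librados bindings would look roughly like this (it assumes ceph.conf is readable, a pool named "rbd" exists, and the image and snapshot names are made up):

import rados
import rbd

# Rough sketch only. Assumes python-rados/python-rbd are installed,
# /etc/ceph/ceph.conf is readable, and a pool named "rbd" exists.
# Image and snapshot names below are made up for illustration.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')
try:
    r = rbd.RBD()
    # The parent must be a format 2 image with layering enabled.
    r.create(ioctx, 'golden-image', 10 * 1024**3,
             old_format=False, features=rbd.RBD_FEATURE_LAYERING)

    parent = rbd.Image(ioctx, 'golden-image')
    parent.create_snap('base')      # snapshot creation is a metadata operation
    parent.protect_snap('base')     # clones require a protected snapshot
    parent.close()

    # The clone shares all unmodified objects with the parent snapshot,
    # so creating it is near-instant regardless of the OSDs' filesystem;
    # objects are only copied into the clone when they are first written.
    r.clone(ioctx, 'golden-image', 'base', ioctx, 'vm-disk-0001',
            features=rbd.RBD_FEATURE_LAYERING)
finally:
    ioctx.close()
    cluster.shutdown()

If that's right, the speed of cloning and snapshotting shouldn't depend on whether the OSDs sit on xfs or btrfs.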
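Second, on question 2: since all data goes through the journals, the SSD-wear concern is easy to put rough numbers on. A sketch, where every figure is an assumption to be replaced with real hardware numbers (the endurance rating in particular is hypothetical):

# Back-of-the-envelope estimate of journal SSD wear from the initial
# migration. Every figure here is an assumption, not a measurement.
TB = 1000**4

migration_data = 10 * TB      # the ~10 TB initial data set
replication = 2               # replication level 2

# Each replica is written through a journal, so cluster-wide the volume
# passing through journals during the migration is roughly:
journal_writes = migration_data * replication
print("through journals, cluster-wide: %d TB" % (journal_writes // TB))

# Spread over, say, one journal SSD per server (an assumption):
journal_ssds = 6
per_ssd = journal_writes / float(journal_ssds)
print("per journal SSD: %.1f TB" % (per_ssd / TB))

# Compared against a hypothetical endurance rating of 70 TB written:
endurance = 70 * TB
print("share of endurance consumed: %.1f%%" % (100.0 * per_ssd / endurance))

Whether that share is worth worrying about obviously depends on the actual drives.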
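Finally, a quick sketch of what 4b looks like if placement really is proportional to capacity. I believe the default ceph-deploy/ceph-disk setup weights each OSD by its size, so the expected split of a 1 TB image over the six servers (capacities as listed in 4b, which sum to 66 TB) would be roughly:

# Expected split of 1 TB of image data across servers if CRUSH weights
# simply track raw capacity (which I believe is the ceph-deploy default).
# Server names and capacities are the ones from question 4b.
capacities_tb = {
    "server1": 24, "server2": 16, "server3": 12,
    "server4": 6,  "server5": 4,  "server6": 4,
}

image_gb = 1000                          # a 1 TB VM image
total = sum(capacities_tb.values())      # 66 TB

for server in sorted(capacities_tb):
    cap = capacities_tb[server]
    share = cap / float(total)
    print("%s: %2d/%d of the data, roughly %.0f GB" % (server, cap, total, share * image_gb))

In practice CRUSH placement has some variance per placement group, and replication size 2 doubles the raw space consumed, but the proportions should still roughly track the weights.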