On Fri, 12 Sep 2014, Somnath Roy wrote:
> Thanks Sage...
> Basically, we are doing similar chunking in our current implementation which 
> is derived from objectstore. 
> Moving to Key/value will save us from that :-)
> Also, I was thinking we may want to do compression (and maybe dedupe later?) 
> on that Key/value layer as well.
> 
> Yes, partial read/write is definitely a performance killer for object stores 
> and our objectstore is no exception. We need to see how we can counter that.
> 
> But I think these are reasons enough for me to move our implementation to 
> the key/value interfaces. 

Sounds good.

By the way, hopefully this is a pretty painless process of wrapping your 
kv library with the KeyValueDB interface.  If not, that will be good to 
know.  I'm hoping it will fit well with a broad range of backends, but so 
far we've only done leveldb/rocksdb (same interface) and kinetic.  I'd 
like to see us try LMDB in this context as well...
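
To make the "wrapping" idea concrete, here is a rough sketch of the 
general shape of such an adapter.  The names KVBackend, submit, get and 
scan are made up for illustration and are NOT the actual KeyValueDB 
interface in the Ceph tree; the dict-based backend just stands in for a 
real library.  It only assumes the two properties mentioned in this 
thread: atomic transactions and range (prefix) queries.

```python
from abc import ABC, abstractmethod


class KVBackend(ABC):
    """Hypothetical minimal interface; not Ceph's real KeyValueDB API."""

    @abstractmethod
    def submit(self, batch):
        """Apply a list of (op, key, value) tuples atomically."""

    @abstractmethod
    def get(self, key):
        """Return the value stored under key, or None if absent."""

    @abstractmethod
    def scan(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""


class DictBackend(KVBackend):
    """In-memory stand-in for a real key/value library."""

    def __init__(self):
        self._data = {}

    def submit(self, batch):
        # Stage changes on a copy so a bad op leaves the store untouched.
        staged = dict(self._data)
        for op, key, value in batch:
            if op == "put":
                staged[key] = value
            elif op == "rm":
                staged.pop(key, None)
            else:
                raise ValueError(f"unknown op: {op}")
        self._data = staged

    def get(self, key):
        return self._data.get(key)

    def scan(self, prefix):
        for key in sorted(self._data):
            if key.startswith(prefix):
                yield key, self._data[key]
```

A backend that cannot provide an ordered scan or an all-or-nothing 
submit would presumably be the kind of poor fit worth reporting.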

sage

> 
> Regards
> Somnath
> 
> 
> -----Original Message-----
> From: Sage Weil [mailto:sw...@redhat.com] 
> Sent: Thursday, September 11, 2014 6:55 PM
> To: Somnath Roy
> Cc: Haomai Wang (haomaiw...@gmail.com); ceph-users@lists.ceph.com; 
> ceph-de...@vger.kernel.org
> Subject: RE: Regarding key/value interface
> 
> On Fri, 12 Sep 2014, Somnath Roy wrote:
> > Makes perfect sense, Sage.
> > 
> > Regarding striping of file data, are you saying the KeyValue interface will 
> > do the following for me?
> > 
> > 1. Say, for an rbd image of order 4 MB, when a write request comes to the 
> > Key/Value interface, it will chunk the object (say the full 4MB) into smaller 
> > pieces (configurable?) and stripe them as multiple key/value pairs?
> > 
> > 2. Also, on read it will take care of accumulating the pieces and sending the 
> > result back.
> 
> Precisely.
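
The two steps in the question can be sketched roughly as follows.  This 
is only an illustration of the idea, not the KeyValueStore code: the 
store is a plain Python dict, and the key layout, function names and 
4 KB stripe size are all assumptions.  Object names are assumed not to 
contain "/".

```python
STRIPE_SIZE = 4096  # assumed, configurable stripe size in bytes


def stripe_key(obj, index):
    # Zero-padded index so keys sort in stripe order under a range query.
    return f"{obj}/{index:08d}"


def write_object(store, obj, data, stripe_size=STRIPE_SIZE):
    """Split data into stripe_size chunks and store one key per chunk."""
    prefix = obj + "/"
    # Drop stripes from any previous, possibly longer, version of the object.
    for key in [k for k in store if k.startswith(prefix)]:
        del store[key]
    count = 0
    for offset in range(0, len(data), stripe_size):
        store[stripe_key(obj, count)] = data[offset:offset + stripe_size]
        count += 1
    return count


def read_object(store, obj):
    """Collect every stripe of obj in key order and join them."""
    prefix = obj + "/"
    keys = sorted(k for k in store if k.startswith(prefix))
    return b"".join(store[k] for k in keys)
```

With these definitions, writing 10240 bytes with the default stripe 
size produces three keys (4096 + 4096 + 2048 bytes), and read_object 
returns the original bytes.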
> 
> A smarter thing we might want to make it do in the future would be to take a 
> 4 KB write, create a new key that logically overwrites part of the larger, 
> say, 1MB key, and apply it on read.  And maybe give up and rewrite the entire 
> 1MB stripe after too many small overwrites have accumulated.  
> Something along those lines to reduce the cost of small IOs to large objects.
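
A minimal sketch of that overlay idea, under stated assumptions: the 
class name, key layout and MAX_OVERLAYS threshold are invented for 
illustration, the store is a plain dict, and overlay values are kept as 
(offset, bytes) tuples rather than being serialized as a real backend 
would require.  Nothing like this exists in KeyValueStore as far as 
this thread says.

```python
MAX_OVERLAYS = 4  # assumed threshold before the whole stripe is rewritten


class OverlayStripe:
    """One stripe stored as a base value plus small overlay keys."""

    def __init__(self, store, name, base=b""):
        self.store = store
        self.name = name
        self.count = 0  # number of overlay keys currently stored
        store[name + "/base"] = bytes(base)

    def _overlay_key(self, index):
        return f"{self.name}/ov/{index:04d}"

    def write(self, offset, data):
        # Record the small write under its own key; the base is not rewritten.
        self.store[self._overlay_key(self.count)] = (offset, bytes(data))
        self.count += 1
        if self.count > MAX_OVERLAYS:
            self.compact()

    def read(self):
        # Apply overlays oldest first so that later writes win.
        buf = bytearray(self.store[self.name + "/base"])
        for i in range(self.count):
            offset, data = self.store[self._overlay_key(i)]
            end = offset + len(data)
            if end > len(buf):
                buf.extend(b"\0" * (end - len(buf)))
            buf[offset:end] = data
        return bytes(buf)

    def compact(self):
        # Give up on overlays: rewrite the full stripe and drop the overlay keys.
        merged = self.read()
        for i in range(self.count):
            del self.store[self._overlay_key(i)]
        self.store[self.name + "/base"] = merged
        self.count = 0
```

The trade-off is that each small write costs one small key instead of a 
full stripe rewrite, while reads get slower as overlays pile up, which 
is why the sketch falls back to a full rewrite past the threshold.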
> 
> sage
> 
> 
> 
> > 
> > Thanks & Regards
> > Somnath
> > 
> > 
> > -----Original Message-----
> > From: Sage Weil [mailto:sw...@redhat.com]
> > Sent: Thursday, September 11, 2014 6:31 PM
> > To: Somnath Roy
> > Cc: Haomai Wang (haomaiw...@gmail.com); ceph-users@lists.ceph.com; 
> > ceph-de...@vger.kernel.org
> > Subject: Re: Regarding key/value interface
> > 
> > Hi Somnath,
> > 
> > On Fri, 12 Sep 2014, Somnath Roy wrote:
> > >
> > > Hi Sage/Haomai,
> > >
> > > If I have a key/value backend that supports transactions and range 
> > > queries (and I don't need any explicit caching etc.) and I want to 
> > > replace filestore (and leveldb omap) with it, which interface do you 
> > > recommend I derive from: ObjectStore directly, or KeyValueDB?
> > >
> > > I have already integrated this backend by deriving from the ObjectStore 
> > > interfaces earlier (pre key/value interface days) but have not tested it 
> > > thoroughly enough to see what functionality is broken (basic 
> > > functionality of RGW/RBD is working fine).
> > >
> > > Basically, I want to know what the advantages (and disadvantages) are 
> > > of deriving it from the new key/value interfaces.
> > >
> > > Also, what state is it in? Is it feature complete and supporting 
> > > all the ObjectStore interfaces, like clone and so on?
> > 
> > Everything is supported, I think, except for perhaps some IO hints that 
> > don't make sense in a k/v context.  The big things that you get by using 
> > KeyValueStore and plugging into the lower-level interface are:
> > 
> >  - striping of file data across keys
> >  - efficient clone
> >  - a zillion smaller methods that aren't conceptually difficult to 
> > implement but are tedious to do.
> > 
> > The other nice thing about reusing this code is that you can use a leveldb 
> > or rocksdb backend as a reference for testing or performance or whatever.
> > 
> > The main thing that will be a challenge going forward, I predict, is making 
> > storage of the object byte payload in key/value pairs efficient.  I think 
> > KeyValueStore is doing some simple striping, but it will suffer for small 
> > overwrites (like 512-byte or 4k writes from an RBD).  There are probably 
> > some pretty simple heuristics and tricks that can be done to mitigate the 
> > most common patterns, but there is no simple solution since the backends 
> > generally don't support partial value updates (I assume yours doesn't 
> > either?).  But, any work done here will benefit the other backends too so 
> > that would be a win..
> > 
> > sage
> > 
> > 
> > 
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com