Re: Document storage

Ben McCann Thu, 29 Mar 2012 15:14:02 -0700

Cool.  How were you thinking we should store the data?  As a standardized
composite column (e.g. a list as ["fieldName", <TimeUUID>]: "fieldValue"
and a set as ["fieldName", "fieldValue"]: "")?  Or as a new column type?
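
To make the composite-column layout above concrete, here is a minimal
sketch in Python. It is not Cassandra client code: the dict `row` is a
hypothetical in-memory stand-in for a single row, and the helper names
(list_append, set_add, read_list, read_set) are made up for illustration.
It assumes a given field name is used as either a list or a set, never both,
and that successive uuid1() values carry increasing timestamps.

```python
import uuid

def list_append(row, field, value):
    """Store a list element as (field, <TimeUUID>) -> value."""
    row[(field, uuid.uuid1())] = value

def set_add(row, field, value):
    """Store a set member as (field, value) -> "" (the name carries the data)."""
    row[(field, value)] = ""

def read_list(row, field):
    """Return the list's values ordered by each TimeUUID's timestamp."""
    cols = [(name[1], val) for name, val in row.items() if name[0] == field]
    return [val for _, val in sorted(cols, key=lambda c: c[0].time)]

def read_set(row, field):
    """Return the set's members, taken from the second name component."""
    return {name[1] for name in row if name[0] == field}

if __name__ == "__main__":
    row = {}  # hypothetical stand-in for one row's columns
    list_append(row, "emails", "ben@example.com")
    list_append(row, "emails", "ben@example.com")  # lists keep duplicates
    set_add(row, "tags", "cassandra")
    set_add(row, "tags", "cassandra")              # sets collapse duplicates
    print(read_list(row, "emails"))
    print(read_set(row, "tags"))
```

The point of the two layouts: a list needs a unique, time-ordered name
component so that repeated values get separate columns, while a set puts the
value itself in the column name so that writing it twice overwrites the same
column.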



On Thu, Mar 29, 2012 at 12:35 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

> I kind of hijacked
> https://issues.apache.org/jira/browse/CASSANDRA-3647 ("Sylvain
> suggests we start with (non-nested) lists, maps, and sets. I agree
> that this is a great 80/20 approach to the problem") but we could
> split it out to another ticket.
>
> On Thu, Mar 29, 2012 at 2:24 PM, Ben McCann <b...@benmccann.com> wrote:
> > Thanks Jonathan.  The only reason I suggested JSON was because it already
> > has support for lists.  Native support for lists in Cassandra would more
> > than satisfy me.  Are there any existing proposals or a bug I can follow?
> > I'm not familiar with the Cassandra codebase, so I'm not entirely sure how
> > helpful I can be, but I'd certainly be interested in taking a look to see
> > what's required.
> >
> > -Ben
> >
> >
> > On Thu, Mar 29, 2012 at 12:19 PM, Brian O'Neill <b...@alumni.brown.edu> wrote:
> >
> >> Jonathan,
> >>
> >> I was actually going to take this up with Nate McCall a few weeks back.
> >> I think it might make sense to get the client development community
> >> together (Netflix w/ Astyanax, Hector, Pycassa, Virgil, etc.)
> >>
> >> I agree whole-heartedly that it shouldn't go into the database for all
> >> the reasons you point out.
> >>
> >> If we can all decide on some standards for data storage (e.g. composite
> >> types), indexing strategies, etc., we can provide higher-level functions
> >> through the client libraries and also provide interoperability between
> >> them (without bloating Cassandra).
> >>
> >> CCing Nate.  Nate, thoughts?
> >> I wouldn't mind coordinating/facilitating the conversation, if we know
> >> who should be involved.
> >>
> >> -brian
> >>
> >> ----
> >> Brian O'Neill
> >> Lead Architect, Software Development
> >> Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
> >> p: 215.588.6024
> >> blog: http://weblogs.java.net/blog/boneill42/
> >> blog: http://brianoneill.blogspot.com/
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 3/29/12 3:06 PM, "Ben McCann" <b...@benmccann.com> wrote:
> >>
> >> >Jonathan, I asked Brian about his REST API
> >> ><https://groups.google.com/forum/?fromgroups#!topic/virgil-users/oncBas9C8Us>
> >> >and he said he does not take the json objects and split them because the
> >> >client libraries do not agree on implementations.  This was exactly my
> >> >concern as well with this solution.  I would be perfectly happy to do it
> >> >this way instead of using JSON if it were standardized.  The reason I
> >> >suggested JSON is that it is standardized.  As far as I can tell,
> >> >Cassandra doesn't support maps and lists in a standardized way today,
> >> >which is the root of my problem.
> >> >
> >> >-Ben
> >> >
> >> >
> >> >On Thu, Mar 29, 2012 at 11:30 AM, Drew Kutcharian <d...@venarc.com> wrote:
> >> >
> >> >> Yes, I meant the "row header index". What I have done is that I'm
> >> >> storing an object (e.g. UserProfile) where you read or write it as a
> >> >> whole (a user updates their user details in a single page in the UI).
> >> >> So I serialize that object into binary JSON using the SMILE format. I
> >> >> then compress it using Snappy on the client side. So as far as
> >> >> Cassandra cares, it's storing a byte[].
> >> >>
> >> >> Now on the client side, I'm using cassandra-cli with a custom type that
> >> >> knows how to turn a byte[] into JSON text and back. The only issue was
> >> >> CASSANDRA-4081, where "assume" doesn't work with custom types. If
> >> >> CASSANDRA-4081 gets fixed, I'll get the best of both worlds.
> >> >>
> >> >> Also, the advantages of this vs. the Thrift-based Super Column families
> >> >> are:
> >> >>
> >> >> 1. Saving extra CPU usage on the Cassandra nodes, since
> >> >> serialize/deserialize and compression/decompression happen on the
> >> >> client nodes, where there is plenty of idle CPU time
> >> >>
> >> >> 2. Saving network bandwidth, since I'm sending over a compressed byte[]
> >> >>
> >> >>
> >> >> -- Drew
> >> >>
> >> >>
> >> >>
> >> >> On Mar 29, 2012, at 11:16 AM, Jonathan Ellis wrote:
> >> >>
> >> >> > On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian <d...@venarc.com> wrote:
> >> >> >>> I think this is a much better approach because that gives you the
> >> >> >>> ability to update or retrieve just parts of objects efficiently,
> >> >> >>> rather than making column values just blobs with a bunch of special
> >> >> >>> case logic to introspect them.  Which feels like a big step backwards
> >> >> >>> to me.
> >> >> >>
> >> >> >> Unless your access pattern involves reading/writing the whole
> >> >> >> document each time. In that case you're better off serializing the
> >> >> >> whole document and storing it in a column as a byte[] without
> >> >> >> incurring the overhead of column indexes. Right?
> >> >> >
> >> >> > Hmm, not sure what you're thinking of there.
> >> >> >
> >> >> > If you mean the "index" that's part of the row header for random
> >> >> > access within a row, then no, serializing to byte[] doesn't save you
> >> >> > anything.
> >> >> >
> >> >> > If you mean secondary indexes, don't declare any if you don't want
> >> >> > any. :)
> >> >> >
> >> >> > Just telling C* to store a byte[] *will* be slightly lighter-weight
> >> >> > than giving it named columns, but we're talking negligible compared to
> >> >> > the overhead of actually moving the data on or off disk in the first
> >> >> > place.  Not even close to being worth giving up being able to deal
> >> >> > with your data from standard tools like cqlsh, IMO.
> >> >> >
> >> >> > --
> >> >> > Jonathan Ellis
> >> >> > Project Chair, Apache Cassandra
> >> >> > co-founder of DataStax, the source for professional Cassandra support
> >> >> > http://www.datastax.com
> >> >>
> >> >>
> >>
> >>
> >>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
