Cool. How were you thinking we should store the data? As a standardized composite column (e.g. potentially a list as ["fieldName", <TimeUUID>]: "fieldValue" and a set as ["fieldName", "fieldValue"]: "")? Or as a new column type?
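
To make the two layouts concrete, here is a rough sketch in Python that models a row as a plain dict keyed by composite column name. The helper names (time_uuid, list_append, set_add, read_list, read_set) are made up for illustration and are not any client library's API. It assumes the list layout orders elements by the TimeUUID component and that the set layout relies on column-name uniqueness for de-duplication.

```python
import uuid


def time_uuid():
    # Version-1 UUID standing in for a Cassandra TimeUUID. The fixed node and
    # clock_seq are arbitrary illustration values; passing them keeps CPython
    # on its pure-Python generator, whose timestamps increase within a process.
    return uuid.uuid1(node=0x010203040506, clock_seq=0)


def list_append(row, field, value):
    """List element: column name (field, <TimeUUID>), column value = element."""
    row[(field, time_uuid())] = value


def set_add(row, field, value):
    """Set element: column name (field, element), column value left empty."""
    row[(field, value)] = ""


def read_list(row, field):
    """Return list elements ordered by the timestamp in the TimeUUID."""
    cols = [(name[1].time, value) for name, value in row.items() if name[0] == field]
    return [value for _, value in sorted(cols)]


def read_set(row, field):
    """Return set elements, which live in the column names themselves."""
    return {name[1] for name in row if name[0] == field}


row = {}
for email in ("a@example.com", "b@example.com", "a@example.com"):
    list_append(row, "emails", email)   # duplicates are kept, order preserved
for tag in ("x", "y", "x"):
    set_add(row, "tags", tag)           # the second "x" overwrites the first

print(read_list(row, "emails"))
print(read_set(row, "tags"))
```

In this model the list costs one column per append even for repeated values, while the set collapses duplicates for free because the value is part of the column name.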

On Thu, Mar 29, 2012 at 12:35 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
> I kind of hijacked
> https://issues.apache.org/jira/browse/CASSANDRA-3647 ("Sylvain
> suggests we start with (non-nested) lists, maps, and sets. I agree
> that this is a great 80/20 approach to the problem") but we could
> split it out to another ticket.
>
> On Thu, Mar 29, 2012 at 2:24 PM, Ben McCann <b...@benmccann.com> wrote:
> > Thanks Jonathan. The only reason I suggested JSON was because it already
> > has support for lists. Native support for lists in Cassandra would more
> > than satisfy me. Are there any existing proposals or a bug I can follow?
> > I'm not familiar with the Cassandra codebase, so I'm not entirely sure how
> > helpful I can be, but I'd certainly be interested in taking a look to see
> > what's required.
> >
> > -Ben
> >
> >
> > On Thu, Mar 29, 2012 at 12:19 PM, Brian O'Neill <b...@alumni.brown.edu> wrote:
> >
> >> Jonathan,
> >>
> >> I was actually going to take this up with Nate McCall a few weeks back. I
> >> think it might make sense to get the client development community together
> >> (Netflix w/ Astyanax, Hector, Pycassa, Virgil, etc.)
> >>
> >> I agree whole-heartedly that it shouldn't go into the database for all the
> >> reasons you point out.
> >>
> >> If we can all decide on some standards for data storage (e.g. composite
> >> types), indexing strategies, etc. We can provide higher-level functions
> >> through the client libraries and also provide interoperability between
> >> them. (without bloating Cassandra)
> >>
> >> CCing Nate. Nate, thoughts?
> >> I wouldn't mind coordinating/facilitating the conversation. If we know
> >> who should be involved.
> >>
> >> -brian
> >>
> >> ----
> >> Brian O'Neill
> >> Lead Architect, Software Development
> >> Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
> >> p: 215.588.6024
> >> blog: http://weblogs.java.net/blog/boneill42/
> >> blog: http://brianoneill.blogspot.com/
> >>
> >>
> >> On 3/29/12 3:06 PM, "Ben McCann" <b...@benmccann.com> wrote:
> >>
> >> >Jonathan, I asked Brian about his REST API
> >> ><https://groups.google.com/forum/?fromgroups#!topic/virgil-users/oncBas9C8Us>
> >> >and he said he does not take the json objects and split them because the
> >> >client libraries do not agree on implementations. This was exactly my
> >> >concern as well with this solution. I would be perfectly happy to do it
> >> >this way instead of using JSON if it were standardized. The reason I
> >> >suggested JSON is that it is standardized. As far as I can tell,
> >> >Cassandra doesn't support maps and lists in a standardized way today,
> >> >which is the root of my problem.
> >> >
> >> >-Ben
> >> >
> >> >
> >> >On Thu, Mar 29, 2012 at 11:30 AM, Drew Kutcharian <d...@venarc.com> wrote:
> >> >
> >> >> Yes, I meant the "row header index". What I have done is that I'm
> >> >> storing an object (i.e. UserProfile) where you read or write it as a
> >> >> whole (a user updates their user details in a single page in the UI).
> >> >> So I serialize that object into a binary JSON using SMILE format. I
> >> >> then compress it using Snappy on the client side. So as far as
> >> >> Cassandra cares it's storing a byte[].
> >> >>
> >> >> Now on the client side, I'm using cassandra-cli with a custom type that
> >> >> knows how to turn a byte[] into a JSON text and back. The only issue was
> >> >> CASSANDRA-4081 where "assume" doesn't work with custom types. If
> >> >> CASSANDRA-4081 gets fixed, I'll get the best of both worlds.
> >> >>
> >> >> Also advantages of this vs. the thrift based Super Column families are:
> >> >>
> >> >> 1. Saving extra CPU usage on the Cassandra nodes. Since
> >> >> serialize/deserialize and compression/decompression happens on the
> >> >> client nodes where there is plenty idle CPU time
> >> >>
> >> >> 2. Saving network bandwidth since I'm sending over a compressed byte[]
> >> >>
> >> >>
> >> >> -- Drew
> >> >>
> >> >>
> >> >> On Mar 29, 2012, at 11:16 AM, Jonathan Ellis wrote:
> >> >>
> >> >> > On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian <d...@venarc.com> wrote:
> >> >> >>> I think this is a much better approach because that gives you the
> >> >> >>> ability to update or retrieve just parts of objects efficiently,
> >> >> >>> rather than making column values just blobs with a bunch of special
> >> >> >>> case logic to introspect them. Which feels like a big step backwards
> >> >> >>> to me.
> >> >> >>
> >> >> >> Unless your access pattern involves reading/writing the whole
> >> >> >> document each time. In that case you're better off serializing the
> >> >> >> whole document and storing it in a column as a byte[] without
> >> >> >> incurring the overhead of column indexes. Right?
> >> >> >
> >> >> > Hmm, not sure what you're thinking of there.
> >> >> >
> >> >> > If you mean the "index" that's part of the row header for random
> >> >> > access within a row, then no, serializing to byte[] doesn't save you
> >> >> > anything.
> >> >> >
> >> >> > If you mean secondary indexes, don't declare any if you don't want
> >> >> > any. :)
> >> >> >
> >> >> > Just telling C* to store a byte[] *will* be slightly lighter-weight
> >> >> > than giving it named columns, but we're talking negligible compared to
> >> >> > the overhead of actually moving the data on or off disk in the first
> >> >> > place. Not even close to being worth giving up being able to deal
> >> >> > with your data from standard tools like cqlsh, IMO.
> >> >> >
> >> >> > --
> >> >> > Jonathan Ellis
> >> >> > Project Chair, Apache Cassandra
> >> >> > co-founder of DataStax, the source for professional Cassandra support
> >> >> > http://www.datastax.com
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com