Do we also need to consider the client API?
If we don't adjust Thrift, the client just gets bytes back, right?
The client is then on their own to marshal them back into a structure.
In that case, it seems like we would want to choose a standard that is
efficient and for which there are common libraries.  Protobuf seems to
fit the bill here.

Or do we pass back some other structure?  (Native lists/maps? JSON
strings?)
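
If protobuf were the choice, the client-side round trip might look like
the minimal sketch below. The Document message (and the document.proto
behind it) is hypothetical, not something that exists today; the only
Cassandra assumption is that the Thrift API hands column values around
as ByteBuffers.

    import java.nio.ByteBuffer;

    // Hypothetical message generated by protoc from a document.proto like:
    //   message Document { optional string id = 1; optional string body = 2; }
    // Assumes the generated Document class is on the classpath.
    public class DocumentCodec {
        // Serialize the whole document into a single column value.
        public static ByteBuffer toColumnValue(Document doc) {
            return ByteBuffer.wrap(doc.toByteArray());
        }

        // Re-marshal the raw bytes Cassandra hands back into a structure.
        public static Document fromColumnValue(ByteBuffer value) throws Exception {
            byte[] bytes = new byte[value.remaining()];
            value.duplicate().get(bytes);  // don't disturb the caller's position
            return Document.parseFrom(bytes);
        }
    }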

Do we ignore sorting/comparators?
(Similar to Solr, I'm not sure anyone has defined a good sort for
multi-valued items.)

-brian

---- 
Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



On 3/30/12 12:01 PM, "Daniel Doubleday" <daniel.double...@gmx.net> wrote:

>> Just telling C* to store a byte[] *will* be slightly lighter-weight
>> than giving it named columns, but we're talking negligible compared to
>> the overhead of actually moving the data on or off disk in the first
>> place. 
>Hm - but isn't this exactly the point? You don't want to move data off
>disk.
>But decomposing into columns will lead to more of that:
>
>- The total amount of serialized data is (in most cases a lot) larger
>than the protobuf-encoded / compressed version.
>- If you do selective updates, the document gets scattered over multiple
>SSTables, and if you do sliced reads you can't optimize them. The
>single-column version, by contrast, automatically supersedes older
>versions when updated, so most reads will hit only one SSTable.
>
>All these reads make up the hot dataset. If it fits in the page cache,
>you're fine. If it doesn't, you need to buy more iron.
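
A minimal sketch of the single-column pattern Daniel describes, reusing
the hypothetical Document/DocumentCodec from the earlier sketch and the
Thrift client API of that era (the "Docs" column family is made up):
because the whole document is one column value, each write supersedes
the previous version wholesale.

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.*;

    public class WholeDocumentUpdate {
        static final ByteBuffer DOC_COL = ByteBuffer.wrap("doc".getBytes());

        // Read-modify-write of the whole document in one column. The fresh
        // timestamp makes the new value supersede older versions, so most
        // subsequent reads hit a single SSTable.
        static void update(Cassandra.Client client, ByteBuffer rowKey)
                throws Exception {
            ColumnPath path = new ColumnPath("Docs");  // hypothetical CF
            path.setColumn(DOC_COL);
            ColumnOrSuperColumn cosc =
                    client.get(rowKey, path, ConsistencyLevel.QUORUM);
            Document doc =
                    DocumentCodec.fromColumnValue(cosc.column.bufferForValue());

            // Modify in memory, then write the whole blob back.
            Document updated = doc.toBuilder().setBody("new body").build();
            Column col = new Column(DOC_COL);
            col.setValue(DocumentCodec.toColumnValue(updated));
            col.setTimestamp(System.currentTimeMillis() * 1000);
            client.insert(rowKey, new ColumnParent("Docs"), col,
                          ConsistencyLevel.QUORUM);
        }
    }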
>
>I really could not resist replying, because your statement seems to be
>contrary to all our tests and learnings.
>
>Cheers,
>Daniel
>
>From dev list:
>
>Re: Document storage
>On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian <d...@venarc.com> wrote:
>>> I think this is a much better approach because that gives you the
>>> ability to update or retrieve just parts of objects efficiently,
>>> rather than making column values just blobs with a bunch of special
>>> case logic to introspect them.  Which feels like a big step backwards
>>> to me.
>>
>> Unless your access pattern involves reading/writing the whole document
>> each time. In that case you're better off serializing the whole document
>> and storing it in a column as a byte[] without incurring the overhead of
>> column indexes. Right?
>
>Hmm, not sure what you're thinking of there.
>
>If you mean the "index" that's part of the row header for random
>access within a row, then no, serializing to byte[] doesn't save you
>anything.
>
>If you mean secondary indexes, don't declare any if you don't want any. :)
>
>Just telling C* to store a byte[] *will* be slightly lighter-weight
>than giving it named columns, but we're talking negligible compared to
>the overhead of actually moving the data on or off disk in the first
>place.  Not even close to being worth giving up being able to deal
>with your data from standard tools like cqlsh, IMO.
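
To make the comparison concrete, here is a sketch of the two write paths
Jonathan contrasts, against the same Thrift API with a hypothetical
"Docs" column family; layout B is the one that standard tools like
cqlsh can actually introspect field by field.

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.*;

    public class LayoutSketch {
        // Layout A: the whole serialized document as one opaque blob column.
        static void writeBlob(Cassandra.Client client, ByteBuffer rowKey,
                              byte[] blob) throws Exception {
            Column col = new Column(ByteBuffer.wrap("doc".getBytes()));
            col.setValue(blob);
            col.setTimestamp(System.currentTimeMillis() * 1000);
            client.insert(rowKey, new ColumnParent("Docs"), col,
                          ConsistencyLevel.QUORUM);
        }

        // Layout B: one named column per field; marginally heavier on disk,
        // but each field is independently readable/updatable and visible to
        // standard tools.
        static void writeField(Cassandra.Client client, ByteBuffer rowKey,
                               String field, byte[] value) throws Exception {
            Column col = new Column(ByteBuffer.wrap(field.getBytes()));
            col.setValue(value);
            col.setTimestamp(System.currentTimeMillis() * 1000);
            client.insert(rowKey, new ColumnParent("Docs"), col,
                          ConsistencyLevel.QUORUM);
        }
    }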
>
>-- 
>Jonathan Ellis
>Project Chair, Apache Cassandra
>co-founder of DataStax, the source for professional Cassandra support
>http://www.datastax.com
>

