Chunk the blobs and store them in a separate table from the metadata. Here's an old attempt at a chunked object store, for reference: https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store
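Something like the following is roughly what I have in mind. It's just a sketch using the DataStax Python driver; the keyspace/table names, the columns, and the 256K chunk size are placeholders I made up, not anything from Astyanax or a standard layout:

    # Rough sketch of "metadata table + separate chunk table" in Cassandra.
    # Assumes a keyspace named web_store already exists; all names are illustrative.
    import uuid
    from cassandra.cluster import Cluster

    CHUNK_SIZE = 256 * 1024  # somewhere in the 8K-512K range, tune as needed

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("web_store")

    # One row per object version in the metadata table, one row per chunk in the
    # chunk table. Versions of the same URL ("time travel") are modeled with a
    # timeuuid clustering column, newest first.
    session.execute("""
        CREATE TABLE IF NOT EXISTS blob_metadata (
            url text,
            version timeuuid,
            content_type text,
            size bigint,
            chunk_count int,
            blob_id uuid,
            PRIMARY KEY (url, version)
        ) WITH CLUSTERING ORDER BY (version DESC)
    """)
    session.execute("""
        CREATE TABLE IF NOT EXISTS blob_chunks (
            blob_id uuid,
            chunk_index int,
            data blob,
            PRIMARY KEY (blob_id, chunk_index)
        )
    """)

    insert_chunk = session.prepare(
        "INSERT INTO blob_chunks (blob_id, chunk_index, data) VALUES (?, ?, ?)")
    insert_meta = session.prepare(
        "INSERT INTO blob_metadata (url, version, content_type, size, chunk_count, blob_id) "
        "VALUES (?, now(), ?, ?, ?, ?)")

    def put_blob(url, content_type, payload):
        """Split the payload into fixed-size chunks, write them, then write the metadata row."""
        blob_id = uuid.uuid4()
        chunks = [payload[i:i + CHUNK_SIZE] for i in range(0, len(payload), CHUNK_SIZE)]
        for index, chunk in enumerate(chunks):
            session.execute(insert_chunk, (blob_id, index, chunk))
        session.execute(insert_meta, (url, content_type, len(payload), len(chunks), blob_id))
        return blob_id

    def get_blob(url):
        """Fetch the newest version's metadata, then reassemble its chunks in order."""
        meta = session.execute(
            "SELECT blob_id, chunk_count FROM blob_metadata WHERE url = %s LIMIT 1",
            (url,)).one()
        if meta is None:
            return None
        rows = session.execute(
            "SELECT data FROM blob_chunks WHERE blob_id = %s", (meta.blob_id,))
        chunks = [row.data for row in rows]
        if len(chunks) != meta.chunk_count:
            raise IOError("incomplete blob: expected %d chunks, got %d"
                          % (meta.chunk_count, len(chunks)))
        return b"".join(chunks)

With smaller chunks you could also read a slice of chunk_index values instead of the whole partition, which is the option I mention below.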
Picking an appropriate chunk size may be key (or not). Somewhere between 8K and 512K, I would guess, but it probably doesn't matter a lot and could be tuned and even configured dynamically. With a smaller chunk size you have the option of reading lots of small chunks with separate requests or as a slice.

It seems like there has been a fair amount of negative sentiment about using Cassandra as a blob/object store, but I personally do think it is workable to at least some extent.

A lot of my background is with Solr for search, including a little with DSE Search.

-- Jack Krupansky

On Mon, Jan 18, 2016 at 9:52 PM, Kevin Burton <bur...@spinn3r.com> wrote:

> Internally we have the need for a blob store for web content. It's MOSTLY
> key/value based, but we'd like to have lookups by coarse-grained tags.
>
> This needs to store normal web content like HTML, CSS, JPEG, SVG, etc.
>
> Highly doubt that anything over 5MB would need to be stored.
>
> We also need the ability to store older versions of the same URL for
> features like "time travel" where we can see what the web looks like over
> time.
>
> I initially wrote this for Elasticsearch (and it works well for that) but
> it looks like binaries snuck into the set of requirements.
>
> I could Base64 encode/decode them in ES, I guess, but that seems ugly.
>
> I was thinking of porting this over to C*, but I'm not up to date on the
> current state of blobs in C*...
>
> Any advice?
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>