Chunk the blobs and store them in a separate table from the metadata.

Here's an old attempt at a chunked object store, for reference:
https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store

The chunk size may or may not turn out to matter much. I would guess that
somewhere between 8K and 512K is reasonable, and it could be tuned or even
made configurable at runtime.
With a smaller chunk size you have the option of reading many small chunks
either with separate requests or as a single slice.
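
To make the idea concrete, here is a rough sketch in Python of chunking with metadata kept apart from the chunk data. It is only an illustration under assumptions: two in-memory dicts stand in for the two tables, and the names (ChunkedStore, split_into_chunks, the 64K default, the sha256 field) are made up for the example. They do not come from Astyanax or from any Cassandra driver.

```python
import hashlib

# Assumed default, picked from inside the 8K-512K range guessed at above.
CHUNK_SIZE = 64 * 1024


def split_into_chunks(blob, chunk_size=CHUNK_SIZE):
    """Split a blob into fixed-size chunks; the last one may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]


class ChunkedStore:
    """In-memory stand-in for two tables: one for metadata, one for chunks."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.metadata = {}  # object_id -> {"size", "chunk_count", "sha256"}
        self.chunks = {}    # (object_id, chunk_index) -> bytes

    def put(self, object_id, blob):
        parts = split_into_chunks(blob, self.chunk_size)
        for index, part in enumerate(parts):
            self.chunks[(object_id, index)] = part
        # Metadata is written last, so a reader that finds the metadata
        # entry can expect all of the chunks to be present already.
        self.metadata[object_id] = {
            "size": len(blob),
            "chunk_count": len(parts),
            "sha256": hashlib.sha256(blob).hexdigest(),
        }

    def get(self, object_id):
        meta = self.metadata[object_id]
        # Chunks are fetched one key at a time here; with a real store this
        # could be separate requests or one slice over the chunk index.
        blob = b"".join(
            self.chunks[(object_id, i)] for i in range(meta["chunk_count"])
        )
        if hashlib.sha256(blob).hexdigest() != meta["sha256"]:
            raise IOError("checksum mismatch for " + repr(object_id))
        return blob
```

In a real table the object id would presumably be the partition key and the chunk index a clustering column, which is what would make the "as a slice" read possible, but that mapping is my assumption and not something the sketch demonstrates.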

It seems like there has been a fair amount of negative sentiment about
using Cassandra as a blob/object store, but I personally do think it is
workable to at least some extent.

A lot of my background is with Solr for search, including a little with DSE
Search.


-- Jack Krupansky

On Mon, Jan 18, 2016 at 9:52 PM, Kevin Burton <bur...@spinn3r.com> wrote:

> Internally we have the need for a blob store for web content.  It's MOSTLY
> key/value based but we'd like to have lookups by coarse-grained tags.
>
> This needs to store normal web content like HTML, CSS, JPEG, SVG, etc.
>
> Highly doubt that anything over 5MB would need to be stored.
>
> We also need the ability to store older versions of the same URL for
> features like "time travel" where we can see what the web looks like over
> time.
>
> I initially wrote this for Elasticsearch (and it works well for that) but
> it looks like binaries snuck into the set of requirements.
>
> I could Base64 encode/decode them in ES I guess but that seems ugly.
>
> I was thinking of porting this over to C* but I'm not up to date on the
> current state of blobs in C*...
>
> Any advice?
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> <https://plus.google.com/102718274791889610666/posts>
>
>
