We store objects that are a few tens of KB, sometimes 100 KB, and we
store quite a few of these per row, sometimes hundreds of thousands.

One problem we encountered early on was that these rows would become so
big that C* couldn't compact them in memory and had to fall back to slow
two-pass compactions, where it spills partially compacted rows to disk.
We addressed that in two ways. First we increased
in_memory_compaction_limit_in_mb from 64 to 128; although that helped a
little, we quickly realized it didn't have much effect, because most of
the time was spent on really huge rows many times larger than that
limit.
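
For reference, that first change is just a one-line edit in
cassandra.yaml (a rough sketch of the 1.x-era setting; surrounding
settings omitted):

    # Rows larger than this limit fall back to the slower two-pass,
    # on-disk compaction path. We raised it from 64 to 128.
    in_memory_compaction_limit_in_mb: 128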

We ended up implementing a simple sharding scheme where each logical row
is actually 36 physical rows, each holding 1/36 of the range: on writes
we take the first character of the column key and append it to the row
key, and on reads we read all 36 rows back and merge the results -- 36
because there are 36 letters and digits in ASCII (a-z plus 0-9), and our
column keys happen to distribute over those quite nicely.
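
To make that concrete, here is a minimal sketch of the key mangling in
Java, independent of any particular client library (the ":" separator
and the method names are my own illustration, not our actual code):

    import java.util.ArrayList;
    import java.util.List;

    public class RowSharding {

        // The 36 possible shard suffixes: the letters a-z plus the digits 0-9.
        private static final String SHARD_CHARS =
                "abcdefghijklmnopqrstuvwxyz0123456789";

        // Write path: the physical row key is the logical row key plus the
        // first character of the column key, so columns spread over 36 rows.
        public static String shardedRowKey(String rowKey, String columnKey) {
            char shard = Character.toLowerCase(columnKey.charAt(0));
            return rowKey + ":" + shard;
        }

        // Read path: enumerate all 36 physical row keys; the client fetches
        // each one and merges the column slices back into one logical row.
        public static List<String> allShardKeys(String rowKey) {
            List<String> keys = new ArrayList<>(SHARD_CHARS.length());
            for (char c : SHARD_CHARS.toCharArray()) {
                keys.add(rowKey + ":" + c);
            }
            return keys;
        }
    }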

Cassandra works well with semi-large objects, and it works well with
wide rows, but you have to be careful about the combination, where rows
get larger than 64 MB.

T#


On Mon, Jul 8, 2013 at 8:13 PM, S Ahmed <sahmed1...@gmail.com> wrote:

> Hi Peter,
>
> Can you describe your environment, # of documents and what kind of usage
> pattern you have?
>
>
>
>
> On Mon, Jul 8, 2013 at 2:06 PM, Peter Lin <wool...@gmail.com> wrote:
>
>> I regularly store word and pdf docs in cassandra without any issues.
>>
>>
>>
>>
>> On Mon, Jul 8, 2013 at 1:46 PM, S Ahmed <sahmed1...@gmail.com> wrote:
>>
>>> I'm guessing that most people use cassandra to store relatively smaller
>>> payloads like 1-5kb in size.
>>>
>>> Is there anyone using it to store say 100kb (1/10 of a megabyte) and if
>>> so, was there any tweaking or gotchas that you ran into?
>>>
>>
>>
>
