Some discussion of large data sets here:
http://wiki.apache.org/cassandra/LargeDataSetConsiderations

When creating large rows, you also need to be aware of
in_memory_compaction_limit_in_mb (see the yaml) and of the fact that all columns
for a row are stored on the same node. So if you store one file in one row you
may not get the best load distribution.

I've heard mention before that 10MB is a reasonable max for a row if you have 
no natural partitions. 
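
Just as an illustration (this isn't from the thread, and the row key format,
the "data" column and the client insert call are made up): a rough sketch of
splitting a file into ~10MB chunks, one row per chunk, so that the chunks hash
to different nodes instead of all landing on one row.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FileChunker {
    // Hypothetical chunk size: roughly 10MB per row.
    private static final int CHUNK_SIZE = 10 * 1024 * 1024;

    // Fill buf as far as possible; returns bytes read (0 at end of stream).
    private static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) break;
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String fileId = args[0];                  // e.g. a UUID for the file
        InputStream in = new FileInputStream(args[1]);
        try {
            byte[] buf = new byte[CHUNK_SIZE];
            int chunkIndex = 0;
            int read;
            while ((read = readFully(in, buf)) > 0) {
                // One row per chunk: the chunk index is part of the row key,
                // so chunks hash to different nodes under RandomPartitioner.
                String rowKey = fileId + ":" + chunkIndex;
                // client.insert(rowKey, "data", first `read` bytes of buf) would go here
                System.out.println(rowKey + " -> " + read + " bytes");
                chunkIndex++;
            }
        } finally {
            in.close();
        }
    }
}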

That said, CFS in Brisk put each block in a row and used columns for the sub
blocks. And the default settings for CFS are

<!-- 64 MB default -->
<property>
  <name>fs.local.block.size</name>
  <value>67108864</value>
</property>

<!-- 2 MB SubBlock Size -->
<property>
  <name>fs.local.subblock.size</name>
  <value>2097152</value>
</property>
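
Again, only a sketch (not Brisk's actual code, which I'm not quoting here):
using the 64MB / 2MB defaults above, this shows how a byte offset in a file
maps onto a block (one row) and a sub block (one column within that row).

public class BlockLayout {
    // Sizes taken from the CFS defaults quoted above.
    private static final long BLOCK_SIZE = 64L * 1024 * 1024;    // fs.local.block.size
    private static final long SUBBLOCK_SIZE = 2L * 1024 * 1024;  // fs.local.subblock.size

    public static void main(String[] args) {
        long fileSize = 3L * 1024 * 1024 * 1024;  // e.g. a 3GB file

        long blocks = (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
        long subblocksPerBlock = BLOCK_SIZE / SUBBLOCK_SIZE;      // 32 columns per row

        System.out.println(blocks + " block rows, "
                + subblocksPerBlock + " subblock columns each");

        // Locate an arbitrary byte offset: block = row, subblock = column in that row.
        long offset = 1234567890L;
        long block = offset / BLOCK_SIZE;
        long subblock = (offset % BLOCK_SIZE) / SUBBLOCK_SIZE;
        System.out.println("offset " + offset + " -> block " + block
                + ", subblock " + subblock);
    }
}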

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 24/09/2011, at 9:27 PM, Radim Kolar wrote:

> Dne 24.9.2011 0:05, Jonathan Ellis napsal(a):
>> Really large messages are not encouraged because they will fragment
>> your heap quickly.  Other than that, no.
> what is the recommended chunk size for storing multi-gigabyte files in cassandra?
> is 64MB okay, or is it too large?
