So I need to read what I write before hitting send. That should have been:
"If A works for YOUR use case" and "Wide rows DON'T spread across nodes
well".
On 09/29/2011 02:34 PM, Jeremiah Jordan wrote:
If A works for our use case, it is a much better option. A given row
has to be read in full to return data from it. There used to be a
limitation that a row had to fit in memory; there is now code to page
through the data, so while that is no longer a hard limit, rows that
don't fit in memory are still very slow to use. Also wide rows spread
across nodes.
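To make the difference concrete, here is a rough sketch of the two
layouts using a Thrift client such as pycassa (the keyspace, column
family, and key names below are invented for illustration only):

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])

    # Option A: one small row per entry, ~6 columns each.
    entries = pycassa.ColumnFamily(pool, 'Entries')
    entries.insert('entry:12345', {'title': 't', 'body': 'b', 'owner': 'o'})
    row = entries.get('entry:12345')   # whole ~2 KB row comes back in one read

    # Option B: one huge row with a column per entry (~1 million columns).
    wide = pycassa.ColumnFamily(pool, 'EntriesByBucket')
    wide.insert('bucket-1', {'entry:12345': 'payload'})
    # xget() pages through the columns instead of pulling the whole row into
    # memory at once, but as noted above a 100 GB row is still slow to read.
    for name, value in wide.xget('bucket-1', buffer_size=1024):
        pass   # handle each (column name, value) pair here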
You should also consider more nodes in your cluster. In our experience,
nodes perform better when they are only managing a few hundred GB each.
I'm pretty sure that 10TB+ of data (hundreds of rows * 100 GB) will not
perform very well on a 3-node cluster, especially if you plan to have
RF=3, making it 10TB+ per node.
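To put numbers on that, a quick back-of-envelope calculation using the
figures from this thread (the row count is an assumption; "hundreds" is
taken as 100 at the low end):

    # Rough sizing for option B with the numbers mentioned above.
    rows = 100            # "hundreds" of ~100 GB rows; 100 used as the low end
    row_size_gb = 100
    rf = 3                # planned replication factor
    nodes = 3

    raw_tb = rows * row_size_gb / 1024.0    # ~9.8 TB of raw data
    per_node_tb = raw_tb * rf / nodes       # with RF=3 on 3 nodes, every node
                                            # stores a full copy of the data
    print("raw: %.1f TB, per node: %.1f TB" % (raw_tb, per_node_tb))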
-Jeremiah
On 09/29/2011 12:20 PM, M Vieira wrote:
What would be the best approach?
A) millions of ~2 KB rows, where each row could have ~6 columns
B) hundreds of ~100 GB rows, where each row could have ~1 million columns
Considerations:
Most entries will be searched for (read+write) at least once a day
but no more than 3 times a day.
Cheap hardware across the cluster of 3 nodes, each with 16 GB of memory
(heap = 8 GB)
Any input would be appreciated.
M.