****** 3. In my test below, I see there is now 8 GB of data and 9,000,000 rows.
Does that sound right? Nearly 1 MB of space used per row, for a 50-column row,
seems like a huge amount of overhead (my values are long in every column, but
that is still not much). I was expecting maybe KB per row, not MB per row. My
column names are "col"+i as well, so they are very short too.

A common configuration is 1 TB drives per node, so I was wondering if anyone
has run any tests with map/reduce reading in all of those rows (not doing
anything with them, just reading them in).

****** 1. How long does it take to go through the 500 GB that would be on that
node?
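
Purely for scale, here is a sketch of what that scan would take at a few
assumed sustained read rates; the rates are illustrative guesses, not
measurements:

    # Time to read ~500 GB off one node at a few assumed sustained read
    # rates (the rates are hypothetical, chosen only to bracket the answer).
    data_mb = 500 * 1024                    # ~500 GB of usable data per node
    for rate_mb_s in (10, 50, 100):         # assumed MB/s, purely illustrative
        hours = data_mb / float(rate_mb_s) / 3600
        print("at %3d MB/s: %5.1f hours" % (rate_mb_s, hours))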

I ran some tests just writing a fake table 50 columns wide, and I am seeing
that it will take about 31 hours to write 500 GB of information (a node is
about full at 500 GB, since you need to reserve 30-50% of the space for
compaction and such). I.e., if I need to rerun any kind of indexing, it will
take 31 hours… does this sound about normal / in the ballpark? Obviously many
nodes will be below that, so this would be the worst case with 1 TB drives.
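
For comparison, the effective write rate implied by those figures (assuming
the full 500 GB in 31 hours):

    # Effective write rate implied by the test above: ~500 GB in ~31 hours.
    data_mb = 500 * 1024      # ~500 GB written
    hours = 31.0              # measured wall-clock time
    print("effective write rate: %.1f MB/s" % (data_mb / (hours * 3600)))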

****** 2. Anyone have any other data?

Thanks,
Dean
