3. In my test below, I see there is now 8 GB of data and 9,000,000 rows. Does that sound right? Nearly 1 MB of space used per row for a 50-column row? That sounds like a huge amount of overhead (my values are longs in every column, but that is still not much data). I was expecting maybe KB per row, not MB per row. My column names are "col"+i as well, so they are very short too.
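For reference, here is the rough raw-size arithmetic I have in mind when I say the values themselves are not much (the 8 bytes per long and ~5 bytes per column name are just my assumptions, and this ignores whatever per-column storage overhead the server adds):

    # back-of-the-envelope raw size of one 50-column row
    cols = 50
    value_bytes = 8                    # one long value
    name_bytes = 5                     # short names like "col42"
    raw_row_bytes = cols * (value_bytes + name_bytes)
    print(raw_row_bytes)               # 650 bytes, well under 1 KB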
A common configuration is 1 TB drives per node, so I was wondering if anyone has run any tests with map/reduce reading in all of those rows (not doing anything with the data, just reading it in).

1. How long does it take to go through the roughly 500 GB that would be on that node? I ran some tests just writing a fake table 50 columns wide, and it looks like it will take about 31 hours to write 500 GB of information (a node is about full at 500 GB, since you need to reserve 30-50% of the space for compaction and such). In other words, if I ever need to rerun any kind of indexing, it will take 31 hours. Does that sound normal, or at least in the ballpark? Obviously many nodes will hold less than that, so this would be the worst case with 1 TB drives.

2. Does anyone have any other data?

Thanks,
Dean
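P.S. For what it's worth, the 31-hour figure works out to roughly the following sustained write rate per node (assuming the full 500 GB and treating 1 GB as 1000 MB):

    # rough sustained write rate implied by ~31 hours for ~500 GB
    data_gb = 500
    hours = 31
    mb_per_sec = data_gb * 1000 / (hours * 3600)
    print(round(mb_per_sec, 1))        # about 4.5 MB/s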