Has anyone done performance tests on reading SSTables directly vs. M/R? I did a quick test reading all the SSTables of an LCS column family on 23 tables and took the average time sstable2json needed per table. Writing to /dev/null (to make it faster) took 7 seconds per table; writing to stdout took 16 seconds per table. This worked out to an estimate of 12.5 hours, or up to 27 hours going by the stdout figure. I suspect the map/reduce time may be much worse, since there are not as many repeated rows in LCS?
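For anyone who wants to redo the arithmetic with their own numbers, here is a rough sketch of the extrapolation. It assumes the tables are read one after another at the measured average. The total count below is a hypothetical placeholder, not the real number of SSTables in my cluster, and the function name is made up for illustration.

```python
def estimate_total_hours(seconds_per_sstable: float, sstable_count: int) -> float:
    """Extrapolate a sequential full scan from an average per-SSTable time."""
    return seconds_per_sstable * sstable_count / 3600.0


# Hypothetical placeholder: substitute the real number of SSTables to scan.
HYPOTHETICAL_SSTABLE_COUNT = 6000

# Measured averages from the quick test: 7 s to /dev/null, 16 s to stdout.
dev_null_hours = estimate_total_hours(7.0, HYPOTHETICAL_SSTABLE_COUNT)
stdout_hours = estimate_total_hours(16.0, HYPOTHETICAL_SSTABLE_COUNT)

print(f"/dev/null: {dev_null_hours:.1f} h, stdout: {stdout_hours:.1f} h")
```

This only scales the per-table average linearly, so it ignores any parallelism across nodes or disks.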
In other words, I am wondering whether I should just read from the SSTables directly instead of using map/reduce. I am about to dig around in the M/R and sstable2json code to see what each is doing specifically. Thanks, Dean