Jean-Daniel Cryans wrote:
You expect a MapReduce job to be faster than a Scan on small data; your expectation is wrong.
I never expected an MR job to be faster in every context.
There's a minimum cost to every MR job, which is a few seconds, and you can't get around it.
Sure, there is overhead for an MR job, and a few seconds are OK, but not a whole minute.
So what time can be expected for a full scan of, e.g., 1,000,000,000 rows in an HBase cluster with, e.g., 3 region servers?
I'm just wondering whether it's worth running the full scan only once a day and persisting the results. I had hoped to be able to process it on demand, but if it takes too much time, that's not acceptable.

andre
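
To make the question concrete, here is a rough back-of-envelope sketch of how such an estimate could be set up. It is not a benchmark: the function name, the per-server scan rate, and the job overhead are all hypothetical placeholders that would have to be replaced with numbers measured on the actual cluster. It also assumes regions are spread evenly and that all region servers scan in parallel.

```python
def estimate_full_scan_seconds(total_rows, region_servers,
                               rows_per_sec_per_server,
                               job_overhead_sec=0.0):
    """Rough wall-clock estimate for a full table scan.

    Assumes (hypothetically) an even region distribution and perfectly
    parallel scanning across region servers. Real clusters will deviate.
    """
    if region_servers <= 0 or rows_per_sec_per_server <= 0:
        raise ValueError("servers and scan rate must be positive")
    # Aggregate throughput is the per-server rate times the server count.
    scan_seconds = total_rows / (region_servers * rows_per_sec_per_server)
    # Fixed MR job startup cost is added on top of the scan itself.
    return scan_seconds + job_overhead_sec


if __name__ == "__main__":
    # Placeholder inputs only: 50,000 rows/s per server and 10 s of job
    # overhead are assumptions for illustration, not measured values.
    seconds = estimate_full_scan_seconds(
        total_rows=1_000_000_000,
        region_servers=3,
        rows_per_sec_per_server=50_000,
        job_overhead_sec=10.0,
    )
    print(f"estimated full scan: {seconds / 60:.1f} minutes")
```

Measuring the per-server rate on a small slice of the table and plugging it in would give a first answer to the once-a-day versus on-demand question.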
