Hi all,

We are currently using HBase to store user data and periodically run a
full scan to aggregate it. We use HBase because we need a single user's
data to be contiguous, so as user data comes in, we need to be able to
update a random-access store.

The performance of a full HBase scan with MapReduce is frustratingly slow,
despite implementing the recommended optimizations. I see that it is
possible to scan HBase with Spark, but I am not familiar with how Spark
interfaces with HBase. Would you expect a scan used as Spark input to
perform about the same as one used as MapReduce input?

Thanks,
Dave
