Hi all, We are currently using HBase to store user data, and we periodically run a full scan to aggregate it. We use HBase because we need each user's data to be stored contiguously, which means we need a random-access store that we can update as new user data comes in.
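To make the access pattern concrete, here is a minimal sketch in plain Python. It is a toy stand-in for a store sorted by row key, not HBase client code, and the `SortedStore` class and the `user_id#sequence` key format are hypothetical names chosen only for illustration. It shows writes arriving interleaved across users while each user's rows still end up adjacent in key order.

```python
import bisect


class SortedStore:
    """Toy stand-in for a row-key-sorted store (hypothetical, not the HBase API)."""

    def __init__(self):
        self._keys = []    # row keys kept in sorted order
        self._values = {}  # row key -> value

    def put(self, key, value):
        # Random-access write: new keys go into sorted position,
        # existing keys are overwritten in place.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix):
        # Rows sharing a prefix are adjacent, so one range scan returns them all.
        start = bisect.bisect_left(self._keys, prefix)
        rows = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            rows.append((key, self._values[key]))
        return rows

    def full_scan(self):
        # Full scan visits every row in key order, so users come out grouped.
        return [(key, self._values[key]) for key in self._keys]


store = SortedStore()
# Events arrive interleaved across users (assumed key format: user_id#sequence).
store.put("user2#0001", "login")
store.put("user1#0001", "login")
store.put("user2#0002", "click")
store.put("user1#0002", "purchase")

user1_rows = store.scan_prefix("user1#")
all_keys = [key for key, _ in store.full_scan()]
```

The periodic aggregation corresponds to `full_scan` here: because rows are ordered by key, each user's records are read together without a separate grouping step.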
A full HBase scan with MapReduce is frustratingly slow, even though we have implemented the recommended optimizations. I see that it is possible to scan HBase with Spark, but I am not familiar with how Spark interfaces with HBase. Would you expect a scan used as a Spark input to perform about the same as one used as a MapReduce input? Thanks, Dave