Re: hbase scan performance

Jerry Lam Wed, 09 Apr 2014 10:11:41 -0700

Hi Dave,

This is HBase's solution to the poor scan performance issue:
https://issues.apache.org/jira/browse/HBASE-8369

I encountered the same issue before.
To the best of my knowledge, this is not a MapReduce issue; it is an HBase
issue. If you are planning to swap out MapReduce and replace it with Spark,
I don't think you will gain much performance when scanning HBase, unless
you are talking about caching the results from HBase in Spark and reusing
them over and over.
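
To illustrate the caching point, here is a minimal sketch in plain Python.
It is only a model of the idea, not Spark or HBase code: FakeTable,
full_scan, and the sample rows are hypothetical stand-ins, and the scan
counter simply stands for the cost of one full pass over the table.

```python
# Plain-Python model of "scan once, cache, reuse".
# Nothing here is Spark or HBase API; all names are hypothetical.

class FakeTable:
    """Stand-in for an HBase table that counts how many full scans run."""

    def __init__(self, rows):
        self._rows = rows
        self.scan_count = 0

    def full_scan(self):
        # Each call models one expensive pass over every row.
        self.scan_count += 1
        return list(self._rows)


def aggregate_uncached(table):
    # Two aggregations, each paying for its own full scan.
    total = sum(value for _, value in table.full_scan())
    largest = max(value for _, value in table.full_scan())
    return total, largest


def aggregate_cached(table):
    # One full scan held in memory, then reused by both aggregations.
    cached = table.full_scan()
    total = sum(value for _, value in cached)
    largest = max(value for _, value in cached)
    return total, largest


# Made-up sample data: (row key, value) pairs.
ROWS = [("user1", 3), ("user2", 5), ("user3", 4)]
```

Both functions return the same aggregates; the cached version just pays
for the scan once, so the saving grows with the number of times the same
data is reused.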

HTH,

Jerry

On Wed, Apr 9, 2014 at 12:02 PM, David Quigley <[email protected]> wrote:

> Hi all,
>
> We are currently using hbase to store user data and periodically doing a
> full scan to aggregate data. The reason we use hbase is that we need a
> single user's data to be contiguous, so as user data comes in, we need the
> ability to update a random access store.
>
> The performance of a full hbase scan with MapReduce is frustratingly slow,
> despite implementing recommended optimizations. I see that it is possible
> to scan hbase with Spark, but am not familiar with how Spark interfaces
> with hbase. Would you expect the scan to perform similarly as a Spark
> input as it does as a MapReduce input?
>
> Thanks,
> Dave
>
