Thanks for the confirmation. Looks like the combination of bucket cache and cache on write has a bug.
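For the record, the two server configurations being compared are the ones Khaled quotes below; only the cache-on-write flags differ (a summary, assuming all three cache-on-write flags were flipped together, per Ted's suggestion further down):

    hbase-site (reproduces the IllegalArgumentException on reverse scan):
      hbase.bucketcache.ioengine=offheap
      hbase.bucketcache.size=4196
      hbase.rs.cacheblocksonwrite=true
      hfile.block.index.cacheonwrite=true
      hfile.block.bloom.cacheonwrite=true

    hbase-site (problem goes away -- bucket cache kept, cache-on-write off):
      hbase.bucketcache.ioengine=offheap
      hbase.bucketcache.size=4196
      hbase.rs.cacheblocksonwrite=false
      hfile.block.index.cacheonwrite=false
      hfile.block.bloom.cacheonwrite=false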
On Oct 12, 2014, at 8:58 PM, Khaled Elmeleegy <kd...@hotmail.com> wrote:

> It goes away, but I think that's because I am basically churning the cache far less. My workload is mostly writes.
>
>> Date: Sun, 12 Oct 2014 19:43:16 -0700
>> Subject: Re: HBase read performance
>> From: yuzhih...@gmail.com
>> To: user@hbase.apache.org
>>
>> Hi,
>> Can you turn off cacheonwrite and keep bucket cache to see if the problem goes away?
>>
>> Cheers
>>
>> On Fri, Oct 10, 2014 at 10:59 AM, Khaled Elmeleegy <kd...@hotmail.com> wrote:
>>
>>> Yes, I can reproduce it with some work.
>>> The workload is basically as follows: there are writers streaming writes to a table. Then, there is a reader (invoked via a web interface). The reader does 1000 parallel reverse scans, which in my case all end up hitting the same region. The scans are effectively "gets" as I just need to get one record off of each of them. I just need to do a "reverse" get, which is not supported (would be great to have :)), so I do it via reverse scan. After a few tries, the reader consistently hits this bug.
>>>
>>> This happens with these config changes:
>>> hbase-env: HBASE_REGIONSERVER_OPTS=-Xmx6G -XX:MaxDirectMemorySize=5G -XX:CMSInitiatingOccupancyFraction=88 -XX:+AggressiveOpts -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/hbase-regionserver-gc.log
>>> hbase-site:
>>> hbase.bucketcache.ioengine=offheap
>>> hbase.bucketcache.size=4196
>>> hbase.rs.cacheblocksonwrite=true
>>> hfile.block.index.cacheonwrite=true
>>> hfile.block.bloom.cacheonwrite=true
>>>
>>> Interestingly, without these config changes, I can't reproduce the problem.
>>> Khaled
>>>
>>>> Date: Fri, 10 Oct 2014 10:05:14 -0700
>>>> Subject: Re: HBase read performance
>>>> From: st...@duboce.net
>>>> To: user@hbase.apache.org
>>>>
>>>> It looks like we are messing up our positioning math. From java.nio.Buffer, lines 233-235 (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/nio/Buffer.java#233):
>>>>
>>>>     public final Buffer position(int newPosition) {
>>>>         if ((newPosition > limit) || (newPosition < 0))
>>>>             throw new IllegalArgumentException();
>>>>
>>>> Is it easy to reproduce Khaled? Always same region/store or spread across all reads?
>>>> St.Ack
>>>>
>>>> On Fri, Oct 10, 2014 at 8:31 AM, Khaled Elmeleegy <kd...@hotmail.com> wrote:
>>>>
>>>>> Andrew thanks. I think this indeed was the problem.
>>>>> To get over it, I increased the amount of memory given to the region server to avoid IO on reads. I used the configs below. In my experiments, I have writers streaming their output to HBase. The reader powers a web page and does this scatter/gather, where it reads the 1000 keys written last and passes them to the front end. With this workload, I get the exception below at the region server. Again, I am using HBase (0.98.6.1). Any help is appreciated.
>>>>> hbase-env: HBASE_REGIONSERVER_OPTS=-Xmx6G -XX:MaxDirectMemorySize=5G -XX:CMSInitiatingOccupancyFraction=88 -XX:+AggressiveOpts -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/hbase-regionserver-gc.log
>>>>> hbase-site:
>>>>> hbase.bucketcache.ioengine=offheap
>>>>> hfile.block.cache.size=0.4
>>>>> hbase.bucketcache.size=4196
>>>>> hbase.rs.cacheblocksonwrite=true
>>>>> hfile.block.index.cacheonwrite=true
>>>>> hfile.block.bloom.cacheonwrite=true
>>>>>
>>>>> 2014-10-10 15:06:44,173 ERROR [B.DefaultRpcServer.handler=62,queue=2,port=60020] ipc.RpcServer: Unexpected throwable object
>>>>> java.lang.IllegalArgumentException
>>>>>     at java.nio.Buffer.position(Buffer.java:236)
>>>>>     at org.apache.hadoop.hbase.util.ByteBufferUtils.skip(ByteBufferUtils.java:434)
>>>>>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:849)
>>>>>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:760)
>>>>>     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:248)
>>>>>     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:152)
>>>>>     at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:317)
>>>>>     at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:176)
>>>>>     at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1780)
>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3758)
>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1950)
>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1936)
>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1913)
>>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3157)
>>>>>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)
>>>>>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
>>>>>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
>>>>>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
>>>>>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
>>>>>     at java.lang.Thread.run(Thread.java:744)
>>>>>
>>>>>> From: apurt...@apache.org
>>>>>> Date: Tue, 7 Oct 2014 17:09:35 -0700
>>>>>> Subject: Re: HBase read performance
>>>>>> To: user@hbase.apache.org
>>>>>>
>>>>>>> The cluster has 2 m1.large nodes.
>>>>>>
>>>>>> That's the problem right there.
>>>>>> You need to look at c3.4xlarge or i2 instances as a minimum requirement. M1 and even M3 instance types have ridiculously poor IO.
>>>>>>
>>>>>> On Tue, Oct 7, 2014 at 3:01 PM, Khaled Elmeleegy <kd...@hotmail.com> wrote:
>>>>>>
>>>>>>> Thanks Nicolas, Qiang.
>>>>>>>
>>>>>>> I was able to write a simple program that reproduces the problem on a tiny HBase cluster on ec2. The cluster has 2 m1.large nodes. One node runs the master, name node and zookeeper. The other node runs a data node and a region server, with heap size configured to be 6GB. There, the 1000 parallel reverse gets (reverse scans) take 7-8 seconds. The data set is tiny (10M records, each having a small number of bytes). As I said before, all hardware resources are very idle there.
>>>>>>>
>>>>>>> Interestingly, running the same workload on my macbook, the 1000 parallel gets take ~200ms on a pseudo-distributed installation.
>>>>>>>
>>>>>>> Any help to resolve this mystery is highly appreciated.
>>>>>>>
>>>>>>> P.S. please find my test program attached.
>>>>>>>
>>>>>>> Best,
>>>>>>> Khaled
>>>>>>>
>>>>>>>> From: nkey...@gmail.com
>>>>>>>> Date: Mon, 6 Oct 2014 09:40:48 +0200
>>>>>>>> Subject: Re: HBase read performance
>>>>>>>> To: user@hbase.apache.org
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I haven't seen it mentioned, but if I understand correctly each scan returns a single row? If so you should use Scan#setSmall to save some rpc calls.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Nicolas
>>>>>>>>
>>>>>>>> On Sun, Oct 5, 2014 at 11:28 AM, Qiang Tian <tian...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> When using separate HConnection instances, both the RpcClient instance (which maintains connections to region servers) and the Registry instance (which maintains the connection to zookeeper) will be separate.
>>>>>>>>>
>>>>>>>>> See http://shammijayasinghe.blogspot.com/2012/02/zookeeper-increase-maximum-number-of.html
>>>>>>>>>
>>>>>>>>> On Sun, Oct 5, 2014 at 2:24 PM, Khaled Elmeleegy <kd...@hotmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I tried creating my own HConnections pool to use for my HBase calls, so that not all the (2K) threads share the same HConnection. However, I could only have 10 HConnections; beyond that I get ZK exceptions, please find them below. Also, with 10 HConnections, I don't see noticeable improvement in performance so far.
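For reference, a rough sketch of the connection pool Khaled describes, against the 0.98-era client API. The class shape, pool size handling, and round-robin pick are illustrative, not from the thread. Each HConnectionManager.createConnection() call builds its own RpcClient and its own ZooKeeper connection, per Qiang's note above, which is also why the ZooKeeper quorum starts refusing once too many connections are open, as the log below shows.

    import java.io.IOException;
    import java.util.concurrent.atomic.AtomicInteger;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.HConnection;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTableInterface;

    public class ReaderConnectionPool {
      private final HConnection[] pool;
      private final AtomicInteger next = new AtomicInteger();

      public ReaderConnectionPool(Configuration conf, int size) throws IOException {
        pool = new HConnection[size];
        for (int i = 0; i < size; i++) {
          // Each connection owns its own RpcClient and ZooKeeper session,
          // so the ZK quorum's connection limit caps the usable pool size.
          pool[i] = HConnectionManager.createConnection(conf);
        }
      }

      // Reader threads pick a connection round-robin instead of all
      // sharing the single process-wide HConnection.
      public HTableInterface getTable(String name) throws IOException {
        int i = (next.getAndIncrement() & Integer.MAX_VALUE) % pool.length;
        return pool[i].getTable(name);
      }
    }

    // e.g. new ReaderConnectionPool(HBaseConfiguration.create(), 10)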
>>>>>>>>>> 2014-10-05 06:11:26,490 WARN [main] >>>>> zookeeper.RecoverableZooKeeper >>>>>>>>>> (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly >>>>> transient >>>>>>>>>> ZooKeeper, quorum=54.68.206.252:2181, >>> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>>>>>>> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid2014-10-05 >>>>>>>>> 06:11:26,490 >>>>>>>>>> INFO [main] util.RetryCounter >>>>>>>>> (RetryCounter.java:sleepUntilNextRetry(155)) >>>>>>>>>> - Sleeping 1000ms before retry #0...2014-10-05 06:11:27,845 >>> WARN >>>>>>> [main] >>>>>>>>>> zookeeper.RecoverableZooKeeper >>>>>>>>>> (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly >>>>> transient >>>>>>>>>> ZooKeeper, quorum=54.68.206.252:2181, >>> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>>>>>>> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid2014-10-05 >>>>>>>>> 06:11:27,849 >>>>>>>>>> INFO [main] util.RetryCounter >>>>>>>>> (RetryCounter.java:sleepUntilNextRetry(155)) >>>>>>>>>> - Sleeping 2000ms before retry #1...2014-10-05 06:11:30,405 >>> WARN >>>>>>> [main] >>>>>>>>>> zookeeper.RecoverableZooKeeper >>>>>>>>>> (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly >>>>> transient >>>>>>>>>> ZooKeeper, quorum=54.68.206.252:2181, >>> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>>>>>>> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid2014-10-05 >>>>>>>>> 06:11:30,405 >>>>>>>>>> INFO [main] util.RetryCounter >>>>>>>>> (RetryCounter.java:sleepUntilNextRetry(155)) >>>>>>>>>> - Sleeping 4000ms before retry #2...2014-10-05 06:11:35,278 >>> WARN >>>>>>> [main] >>>>>>>>>> zookeeper.RecoverableZooKeeper >>>>>>>>>> (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly >>>>> transient >>>>>>>>>> ZooKeeper, quorum=54.68.206.252:2181, >>> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>>>>>>> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid2014-10-05 >>>>>>>>> 06:11:35,279 >>>>>>>>>> INFO [main] util.RetryCounter >>>>>>>>> (RetryCounter.java:sleepUntilNextRetry(155)) >>>>>>>>>> - Sleeping 8000ms before retry #3...2014-10-05 06:11:44,393 >>> WARN >>>>>>> [main] >>>>>>>>>> zookeeper.RecoverableZooKeeper >>>>>>>>>> (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly >>>>> transient >>>>>>>>>> ZooKeeper, quorum=54.68.206.252:2181, >>> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>>>>>>> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid2014-10-05 >>>>>>>>> 06:11:44,393 >>>>>>>>>> ERROR [main] zookeeper.RecoverableZooKeeper >>>>>>>>>> (RecoverableZooKeeper.java:retryOrThrow(255)) - ZooKeeper >>> exists >>>>>>> failed >>>>>>>>>> after 4 attempts2014-10-05 06:11:44,394 WARN [main] >>>>> zookeeper.ZKUtil >>>>>>>>>> (ZKUtil.java:checkExists(482)) - hconnection-0x4e174f3b, >>> quorum= >>>>>>>>>> 54.68.206.252:2181, baseZNode=/hbase Unable to set watcher >>> on >>>>> znode >>> (/hbase/hbaseid)org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>>>>>>> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid at >>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:99) >>>>>>> at >>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) >>>>>>> at >>>>>>>>>> org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) at >>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:199) >>>>>>>>>> at >>> org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:479) >>>>>>>>>> at >>> 
org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
>>>>>>>>>> at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:83)
>>>>>>>>>> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.retrieveClusterId(HConnectionManager.java:857)
>>>>>>>>>> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:662)
>>>>>>>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>>>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>>>>>>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>>>>>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>>>>>>>> at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:414)
>>>>>>>>>> at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:335)...
>>>>>>>>>>
>>>>>>>>>>> From: kd...@hotmail.com
>>>>>>>>>>> To: user@hbase.apache.org
>>>>>>>>>>> Subject: RE: HBase read performance
>>>>>>>>>>> Date: Fri, 3 Oct 2014 12:37:39 -0700
>>>>>>>>>>>
>>>>>>>>>>> Lars, Ted, and Qiang,
>>>>>>>>>>> Thanks for all the input.
>>>>>>>>>>> Qiang: yes, all the threads are in the same client process sharing the same connection. And since I don't see hardware contention, maybe there is contention over this code path. I'll try using many connections, see if it alleviates the problem, and report back.
>>>>>>>>>>> Thanks again, Khaled
>>>>>>>>>>>
>>>>>>>>>>>> Date: Fri, 3 Oct 2014 15:18:30 +0800
>>>>>>>>>>>> Subject: Re: HBase read performance
>>>>>>>>>>>> From: tian...@gmail.com
>>>>>>>>>>>> To: user@hbase.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>> Regarding profiling, Andrew introduced http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html months ago.
>>>>>>>>>>>>
>>>>>>>>>>>> processCallTime comes from RpcServer#call, so it looks good?
>>>>>>>>>>>>
>>>>>>>>>>>> I have a suspect: https://issues.apache.org/jira/browse/HBASE-11306
>>>>>>>>>>>>
>>>>>>>>>>>> How many processes do you have for your 2000 threads? If only 1 process, those threads will share just 1 connection to that regionserver, and there might be big contention on the RPC code path. For such a case, could you try using different connections? https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Oct 3, 2014 at 9:55 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Khaled:
>>>>>>>>>>>>> Do you have a profiler such as JProfiler? Profiling would give us more hints.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Otherwise, capturing stack traces during the period of reverse scan would help.
>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 2, 2014 at 4:52 PM, lars hofhansl <la...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> You might have the data in the OS buffer cache; without short circuit reading, the region server has to request the block from the data node process, which then reads it from the block cache. That is a few context switches per RPC that do not show up in CPU metrics. In that case you also would not see disk IO.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If - as you say - you see a lot of evicted blocks, the data *has* to come from the OS. If you do not see disk IO, it *has* to come from the OS cache. I.e. there's more RAM on your boxes, and you should increase the heap for the block cache.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You can measure the context switches with vmstat. Other than that I have no suggestion until I reproduce the problem. Also check the data locality index of the region server; it should be close to 100%.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- Lars
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ________________________________
>>>>>>>>>>>>>> From: Khaled Elmeleegy <kd...@hotmail.com>
>>>>>>>>>>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>>>>>>>>>>>> Sent: Thursday, October 2, 2014 3:24 PM
>>>>>>>>>>>>>> Subject: RE: HBase read performance
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Lars, thanks a lot for all the tips. I'll make sure I cover all of them and get back to you. I am not sure they are the bottleneck though, as they are all about optimizing physical resource usage. As I said, I don't see any contended physical resources now. I'll also try to reproduce this problem in a simpler environment and pass you the test program to play with.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A couple of high level points to make. You are right that my use case is kind of a worst case for HBase reads. But, if things go the way you described them, there should be tons of disk IO and that should clearly be the bottleneck. This is not the case though, for the simple reason that this is done in a test environment (I am still prototyping) and not a lot of data has been written to HBase yet. However, for the real use case, there will be writers constantly writing data to HBase and readers occasionally doing this scatter/gather. At steady state, things should only get worse and all the issues you mentioned should get far more pronounced.
>>>>>>>>>>>>>> At this point, one can try to mitigate it with more memory or the like. I am not there yet, as I think I am hitting some software bottleneck which I don't know how to work around.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Khaled
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ----------------------------------------
>>>>>>>>>>>>>>> Date: Thu, 2 Oct 2014 14:20:47 -0700
>>>>>>>>>>>>>>> From: la...@apache.org
>>>>>>>>>>>>>>> Subject: Re: HBase read performance
>>>>>>>>>>>>>>> To: user@hbase.apache.org
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> OK... We might need to investigate this. Any chance that you can provide a minimal test program and instructions on how to set it up? We can do some profiling then.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One thing to note is that with scanning HBase cannot use bloom filters to rule out HFiles ahead of time; it needs to look into all of them. So you kind of hit on the absolute worst case:
>>>>>>>>>>>>>>> - random reads that do not fit into the block cache
>>>>>>>>>>>>>>> - cannot use bloom filters
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A few more questions/comments:
>>>>>>>>>>>>>>> - Do you have short circuit reading enabled? If not, you should.
>>>>>>>>>>>>>>> - Is your table major compacted? That will reduce the number of files to look at.
>>>>>>>>>>>>>>> - Did you disable Nagle's everywhere (enabled tcpnodelay)? It is disabled by default in HBase, but not necessarily in your install of HDFS.
>>>>>>>>>>>>>>> - Which version of HDFS are you using as backing filesystem?
>>>>>>>>>>>>>>> - If your disk is idle, it means the data fits into the OS buffer cache. In turn that means you should increase the heap for the region servers. You can also use block encoding (FAST_DIFF) to try to make sure the entire working set fits into the cache.
>>>>>>>>>>>>>>> - Also try to reduce the block size - although if your overall working set does not fit in the heap it won't help much.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is a good section of the book to read through generally (even though you might know most of this already): http://hbase.apache.org/book.html#perf.configurations
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- Lars
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>>> From: Khaled Elmeleegy <kd...@hotmail.com>
>>>>>>>>>>>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>>>>>>>>>>>>> Cc:
>>>>>>>>>>>>>>> Sent: Thursday, October 2, 2014 11:27 AM
>>>>>>>>>>>>>>> Subject: RE: HBase read performance
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I do see a very brief spike in CPU (user/system), but it's nowhere near 0% idle. It goes from 99% idle down to something like 40% idle for a second or so. The thing to note: this is all on a test cluster, so no real load. Things are generally idle until I issue 2-3 of these multi-scan-requests to render a web page. Then you see the spike in the CPU and some activity in the network and disk, but nowhere near saturation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If there are specific tests you'd like me to do to debug this, I'd be more than happy to do it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Khaled
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ----------------------------------------
>>>>>>>>>>>>>>>> Date: Thu, 2 Oct 2014 11:15:59 -0700
>>>>>>>>>>>>>>>> From: la...@apache.org
>>>>>>>>>>>>>>>> Subject: Re: HBase read performance
>>>>>>>>>>>>>>>> To: user@hbase.apache.org
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I still think you're waiting on disk. No IOWAIT? So CPU is not waiting a lot for IO. No high User/System CPU either?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you see a lot of evicted blocks then each RPC has a high chance of needing to bring an entire 64k block in. You'll see bad performance with this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We might need to trace this specific scenario.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -- Lars
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ________________________________
>>>>>>>>>>>>>>>> From: Khaled Elmeleegy <kd...@hotmail.com>
>>>>>>>>>>>>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>>>>>>>>>>>>>> Sent: Thursday, October 2, 2014 10:46 AM
>>>>>>>>>>>>>>>> Subject: RE: HBase read performance
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've set the heap size to 6GB and I do have gc logging. No long pauses there -- occasional 0.1s or 0.2s.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Other than the discrepancy between what's reported on the client and what's reported at the RS, there is also the issue of not getting proper concurrency. So, even if a reverse get takes 100ms or so (this has to be mostly blocking on various things, as no physical resource is contended), the other gets/scans should be able to proceed in parallel, so a thousand concurrent gets/scans should finish in a few hundred ms, not many seconds. That's why I thought I'd increase the handler count to try to get more concurrency, but it didn't help. So, there must be something else.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Khaled
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ----------------------------------------
>>>>>>>>>>>>>>>>> From: ndimi...@gmail.com
>>>>>>>>>>>>>>>>> Date: Thu, 2 Oct 2014 10:36:39 -0700
>>>>>>>>>>>>>>>>> Subject: Re: HBase read performance
>>>>>>>>>>>>>>>>> To: user@hbase.apache.org
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do check again on the heap size of the region servers. The default unconfigured size is 1G; too small for much of anything.
>>>>>>>>>>>>>>>>> Check your RS logs -- look for lines produced by the JVMPauseMonitor thread. They usually correlate with long GC pauses or other process-freeze events.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Get is implemented as a Scan of a single row, so a reverse scan of a single row should be functionally equivalent.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In practice, I have seen discrepancy between the latencies reported by the RS and the latencies experienced by the client. I've not investigated this area thoroughly.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Oct 2, 2014 at 10:05 AM, Khaled Elmeleegy <kd...@hotmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks Lars for your quick reply.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, performance is similar with fewer handlers (I tried with 100 first).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The payload is not big, ~1KB or so. The working set doesn't seem to fit in memory, as there are many cache misses. However, disk is far from being a bottleneck. I checked using iostat. I also verified that neither the network nor the CPU of the region server or the client are a bottleneck. This leads me to believe that this is likely a software bottleneck, possibly due to a misconfiguration on my side. I just don't know how to debug it. A clear disconnect I see is the individual request latency as reported by metrics on the region server (IPC processCallTime vs scanNext) vs what's measured on the client. Does this sound right? Any ideas on how to better debug it?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> About this trick with the timestamps to be able to do a forward scan, thanks for pointing it out. Actually, I am aware of it. The problem I have is, sometimes I want to get the key after a particular timestamp and sometimes I want to get the key before, so just relying on the key order doesn't work. Ideally, I want a reverse get(). I thought reverse scan could do the trick though.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Khaled
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ----------------------------------------
>>>>>>>>>>>>>>>>>>> Date: Thu, 2 Oct 2014 09:40:37 -0700
>>>>>>>>>>>>>>>>>>> From: la...@apache.org
>>>>>>>>>>>>>>>>>>> Subject: Re: HBase read performance
>>>>>>>>>>>>>>>>>>> To: user@hbase.apache.org
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Khaled,
>>>>>>>>>>>>>>>>>>> is it the same with fewer threads? 1500 handler threads seems like a lot. Typically a good number of threads depends on the hardware (number of cores, number of spindles, etc.). I cannot think of any type of scenario where more than 100 would give any improvement.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How large is the payload per KV retrieved that way? If large (as in a few 100k) you definitely want to lower the number of handler threads.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How much heap do you give the region server? Does the working set fit into the cache? (I.e. in the metrics, do you see the eviction count going up? If so, it does not fit into the cache.)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If the working set does not fit into the cache (eviction count goes up) then HBase will need to bring a new block in from disk on each Get (assuming the Gets are more or less random as far as the server is concerned). In that case you'll benefit from reducing the HFile block size (from 64k to 8k or even 4k).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Lastly, I don't think we tested the performance of using reverse scan this way; there is probably room to optimize this.
>>>>>>>>>>>>>>>>>>> Can you restructure your keys to allow forwards scanning? For example you could store the time as MAX_LONG-time. Or you could invert all the bits of the time portion of the key, so that it sorts the other way. Then you could do a forward scan.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Let us know how it goes.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -- Lars
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>>>>>>> From: Khaled Elmeleegy <kd...@hotmail.com>
>>>>>>>>>>>>>>>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>>>>>>>>>>>>>>>>> Cc:
>>>>>>>>>>>>>>>>>>> Sent: Thursday, October 2, 2014 12:12 AM
>>>>>>>>>>>>>>>>>>> Subject: HBase read performance
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I am trying to do a scatter/gather on hbase (0.98.6.1), where I have a client reading ~1000 keys from an HBase table. These keys happen to fall on the same region server. For my reads I use reverse scan to read each key, as I want the key prior to a specific time stamp (time stamps are stored in reverse order). I don't believe gets can accomplish that, right? So I use scan, with caching set to 1.
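For concreteness, the "reverse get" described here would look roughly like the sketch below against the 0.98 client API. The helper name is illustrative, not from the thread; setSmall follows Nicolas's suggestion above, and may need to be dropped on versions where reversed small scans are not supported.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    // Returns the first row at or before startRow -- a "reverse get".
    Result reverseGet(HTableInterface table, byte[] startRow) throws IOException {
      Scan scan = new Scan(startRow);
      scan.setReversed(true);  // walk toward smaller keys, starting at startRow
      scan.setCaching(1);      // only one row is ever needed
      scan.setSmall(true);     // single RPC instead of open/next/close round trips
      ResultScanner scanner = table.getScanner(scan);
      try {
        return scanner.next(); // null if nothing sorts at or before startRow
      } finally {
        scanner.close();
      }
    }

Lars's alternative above, storing the time portion of the key as MAX_LONG-time so a plain forward scan works, avoids the reversed-scan code path entirely; as Khaled notes, it only helps when the lookup direction is always the same.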
>>>>>>>>>>>>>>>>>>> I use 2000 reader threads in the client and, on HBase, I've set hbase.regionserver.handler.count to 1500. With this setup, my scatter/gather is very slow and can take up to 10s in total. Timing an individual getScanner(..) call on the client side, it can easily take a few hundred ms. I also got the following metrics from the region server in question:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "queueCallTime_mean" : 2.190855525775637,
>>>>>>>>>>>>>>>>>>> "queueCallTime_median" : 0.0,
>>>>>>>>>>>>>>>>>>> "queueCallTime_75th_percentile" : 0.0,
>>>>>>>>>>>>>>>>>>> "queueCallTime_95th_percentile" : 1.0,
>>>>>>>>>>>>>>>>>>> "queueCallTime_99th_percentile" : 556.9799999999818,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "processCallTime_min" : 0,
>>>>>>>>>>>>>>>>>>> "processCallTime_max" : 12755,
>>>>>>>>>>>>>>>>>>> "processCallTime_mean" : 105.64873440912682,
>>>>>>>>>>>>>>>>>>> "processCallTime_median" : 0.0,
>>>>>>>>>>>>>>>>>>> "processCallTime_75th_percentile" : 2.0,
>>>>>>>>>>>>>>>>>>> "processCallTime_95th_percentile" : 7917.95,
>>>>>>>>>>>>>>>>>>> "processCallTime_99th_percentile" : 8876.89,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_min" : 89,
>>>>>>>>>>>>>>>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_max" : 11300,
>>>>>>>>>>>>>>>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_mean" : 654.4949739797315,
>>>>>>>>>>>>>>>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_median" : 101.0,
>>>>>>>>>>>>>>>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_75th_percentile" : 101.0,
>>>>>>>>>>>>>>>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_95th_percentile" : 101.0,
>>>>>>>>>>>>>>>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_99th_percentile" : 113.0,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> where "delta" is the name of the table I am querying.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In addition to all this, I monitored the hardware resources (CPU, disk, and network) of both the client and the region server, and nothing seems anywhere near saturation. So I am puzzled by what's going on and where this time is going.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> A few things to note based on the above measurements: both medians of IPC processCallTime and queueCallTime are basically zero (ms, I presume, right?). However, scanNext_median is 101 (ms too, right?). I am not sure how this adds up. Also, even though the 101 figure seems outrageously high and I don't know why, all these scans should still be happening in parallel, so the overall call should finish fast, given that no hardware resource is contended, right? But this is not what's happening, so I have to be missing something(s).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So, any help is appreciated there.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Khaled
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>>
>>>>>>    - Andy
>>>>>>
>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)