Yes, please file a JIRA and attach the test there.

-- Sean

On Oct 13, 2014 10:22 PM, "Khaled Elmeleegy" <kd...@hotmail.com> wrote:
Actually, it's very easy to reproduce. I have a simple program that reproduces it if run against a simple toy cluster with one RS. I've shared the program with Ted. I can file a jira ticket about it if you want.

Khaled

Date: Mon, 13 Oct 2014 19:12:13 -0700
From: la...@apache.org
Subject: Re: HBase read performance
To: user@hbase.apache.org

This looks like something to track.
If there's an easy way to reproduce this, could you file a jira with instructions?
(Although from the conversations here, it looks like it's not trivial to reproduce.)

-- Lars

From: Khaled Elmeleegy <kd...@hotmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Sent: Monday, October 13, 2014 12:13 PM
Subject: RE: HBase read performance

I went through a lengthy testing process with Ted. I tried disabling just cache-on-write, and disabling both COW and BC. Disabling them makes the problem go away. I think disabling COW in my workload effectively doesn't exercise the block cache, as my workload is mostly writes, so disabling COW doesn't cause the cached blocks to churn.

I tried both overcommitting memory and not. This is irrelevant though. The cache operates on the virtual memory offered to it and is agnostic to the physical memory size. Memory overcommitment may only hurt performance due to swapping by the OS, but this has nothing to do with the bug.

Khaled

Date: Sun, 12 Oct 2014 23:17:24 -0700
Subject: Re: HBase read performance
From: st...@duboce.net
To: user@hbase.apache.org

On Sun, Oct 12, 2014 at 8:58 PM, Khaled Elmeleegy <kd...@hotmail.com> wrote:

> It goes away, but I think that's because I am basically churning the cache far less. My workload is mostly writes.

What exactly did you change?

Did you disable both BC and cacheonwrite, or just one of these configs?

Disabling BC and/or cacheonwrite would churn your cache more?

What about the heap config? Did you overcommit memory? If so, did you try a direct memory sizing that was within the bounds of physical memory (well within)? Did you still see exceptions?

St.Ack

Date: Sun, 12 Oct 2014 19:43:16 -0700
Subject: Re: HBase read performance
From: yuzhih...@gmail.com
To: user@hbase.apache.org

Hi,
Can you turn off cacheonwrite and keep the bucket cache, to see if the problem goes away?

Cheers

On Fri, Oct 10, 2014 at 10:59 AM, Khaled Elmeleegy <kd...@hotmail.com> wrote:

Yes, I can reproduce it with some work.

The workload is basically as follows: there are writers streaming writes to a table. Then there is a reader (invoked via a web interface). The reader does 1000 parallel reverse scans, which all end up hitting the same region in my case. The scans are effectively "gets" as I just need to get one record off each of them. I really need a "reverse" get, which is not supported (would be great to have :)), so I do it via reverse scan. After a few tries, the reader consistently hits this bug.
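(For reference, a minimal sketch of the "reverse get via reverse scan" pattern described above, assuming the 0.98 client API. The actual test program is only attached to the original mail, so the table name and key handling here are illustrative, not Khaled's code.)

    // Minimal sketch of a "reverse get": a reversed Scan that starts at the probe
    // key and returns the first row at or before it.
    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class ReverseGetSketch {
      public static Result reverseGet(HTable table, byte[] probeKey) throws IOException {
        Scan scan = new Scan(probeKey);   // start row; the scan walks backwards from here
        scan.setReversed(true);           // reverse scan, available in 0.98
        scan.setCaching(1);               // only the first row is needed
        ResultScanner scanner = table.getScanner(scan);
        try {
          return scanner.next();          // first row at or before probeKey, or null
        } finally {
          scanner.close();
        }
      }

      public static void main(String[] args) throws IOException {
        HTable table = new HTable(HBaseConfiguration.create(), "delta");
        System.out.println(reverseGet(table, args[0].getBytes()));
        table.close();
      }
    }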
This happens with these config changes:

hbase-env:
HBASE_REGIONSERVER_OPTS=-Xmx6G -XX:MaxDirectMemorySize=5G -XX:CMSInitiatingOccupancyFraction=88 -XX:+AggressiveOpts -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/hbase-regionserver-gc.log

hbase-site:
hbase.bucketcache.ioengine=offheap
hbase.bucketcache.size=4196
hbase.rs.cacheblocksonwrite=true
hfile.block.index.cacheonwrite=true
hfile.block.bloom.cacheonwrite=true

Interestingly, without these config changes, I can't reproduce the problem.

Khaled

Date: Fri, 10 Oct 2014 10:05:14 -0700
Subject: Re: HBase read performance
From: st...@duboce.net
To: user@hbase.apache.org

It looks like we are messing up our positioning math:

    233  public final Buffer position(int newPosition) {
    234      if ((newPosition > limit) || (newPosition < 0))
    235          throw new IllegalArgumentException();

(http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/nio/Buffer.java#233)

Is it easy to reproduce, Khaled? Always the same region/store, or spread across all reads?

St.Ack

On Fri, Oct 10, 2014 at 8:31 AM, Khaled Elmeleegy <kd...@hotmail.com> wrote:

Andrew thanks. I think this indeed was the problem.
To get over it, I increased the amount of memory given to the region server to avoid IO on reads. I used the configs below.
In my experiments, I have writers streaming their output to HBase. The reader powers a web page and does this scatter/gather, where it reads the 1000 keys written last and passes them to the front end. With this workload, I get the exception below at the region server. Again, I am using HBase (0.98.6.1). Any help is appreciated.

hbase-env:
HBASE_REGIONSERVER_OPTS=-Xmx6G -XX:MaxDirectMemorySize=5G -XX:CMSInitiatingOccupancyFraction=88 -XX:+AggressiveOpts -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/hbase-regionserver-gc.log

hbase-site:
hbase.bucketcache.ioengine=offheap
hfile.block.cache.size=0.4
hbase.bucketcache.size=4196
hbase.rs.cacheblocksonwrite=true
hfile.block.index.cacheonwrite=true
hfile.block.bloom.cacheonwrite=true

2014-10-10 15:06:44,173 ERROR [B.DefaultRpcServer.handler=62,queue=2,port=60020] ipc.RpcServer: Unexpected throwable object
java.lang.IllegalArgumentException
        at java.nio.Buffer.position(Buffer.java:236)
        at org.apache.hadoop.hbase.util.ByteBufferUtils.skip(ByteBufferUtils.java:434)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:849)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:760)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:248)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:152)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:317)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:176)
        at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1780)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3758)
        at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1950)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1936)
        at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1913)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3157)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
        at java.lang.Thread.run(Thread.java:744)

From: apurt...@apache.org
Date: Tue, 7 Oct 2014 17:09:35 -0700
Subject: Re: HBase read performance
To: user@hbase.apache.org

> The cluster has 2 m1.large nodes.

That's the problem right there.

You need to look at c3.4xlarge or i2 instances as a minimum requirement. M1 and even M3 instance types have ridiculously poor IO.

On Tue, Oct 7, 2014 at 3:01 PM, Khaled Elmeleegy <kd...@hotmail.com> wrote:

Thanks Nicolas, Qiang.

I was able to write a simple program that reproduces the problem on a tiny HBase cluster on ec2. The cluster has 2 m1.large nodes. One node runs the master, name node, and zookeeper. The other node runs a data node and a region server, with the heap size configured to be 6GB. There, the 1000 parallel reverse gets (reverse scans) take 7-8 seconds. The data set is tiny (10M records, each having a small number of bytes). As I said before, all hardware resources are very idle there.

Interestingly, running the same workload on my macbook, the 1000 parallel gets take ~200ms on a pseudo-distributed installation.

Any help to resolve this mystery is highly appreciated.

P.S. please find my test program attached.

Best,
Khaled

From: nkey...@gmail.com
Date: Mon, 6 Oct 2014 09:40:48 +0200
Subject: Re: HBase read performance
To: user@hbase.apache.org

Hi,

I haven't seen it mentioned, but if I understand correctly each scan returns a single row? If so, you should use Scan#setSmall to save some rpc calls.

Cheers,

Nicolas
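(A sketch of what that suggestion might look like applied to the reverse lookup sketched earlier in the thread. Whether a small scan can be combined with a reversed scan in 0.98 is worth verifying before relying on it.)

    // Same single-row reverse lookup as the earlier sketch, marked "small" so the
    // open/next/close sequence can be collapsed into a single RPC round trip.
    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class SmallReverseGetSketch {
      public static Result smallReverseGet(HTable table, byte[] probeKey) throws IOException {
        Scan scan = new Scan(probeKey);
        scan.setReversed(true);
        scan.setCaching(1);
        scan.setSmall(true);   // single-RPC "small" scan
        ResultScanner scanner = table.getScanner(scan);
        try {
          return scanner.next();
        } finally {
          scanner.close();
        }
      }
    }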
On Sun, Oct 5, 2014 at 11:28 AM, Qiang Tian <tian...@gmail.com> wrote:

when using a separate HConnection instance, both its RpcClient instance (maintains connections to a regionserver) and Registry instance (maintains a connection to zookeeper) will be separate..

see
http://shammijayasinghe.blogspot.com/2012/02/zookeeper-increase-maximum-number-of.html

On Sun, Oct 5, 2014 at 2:24 PM, Khaled Elmeleegy <kd...@hotmail.com> wrote:

I tried creating my own HConnection pool to use for my HBase calls, so that not all the (2K) threads share the same HConnection. However, I could only have 10 HConnections. Beyond that I get ZK exceptions, please find them below. Also, with 10 HConnections, I don't see a noticeable improvement in performance so far.

2014-10-05 06:11:26,490 WARN [main] zookeeper.RecoverableZooKeeper (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly transient ZooKeeper, quorum=54.68.206.252:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-10-05 06:11:26,490 INFO [main] util.RetryCounter (RetryCounter.java:sleepUntilNextRetry(155)) - Sleeping 1000ms before retry #0...
2014-10-05 06:11:27,845 WARN [main] zookeeper.RecoverableZooKeeper (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly transient ZooKeeper, quorum=54.68.206.252:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-10-05 06:11:27,849 INFO [main] util.RetryCounter (RetryCounter.java:sleepUntilNextRetry(155)) - Sleeping 2000ms before retry #1...
2014-10-05 06:11:30,405 WARN [main] zookeeper.RecoverableZooKeeper (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly transient ZooKeeper, quorum=54.68.206.252:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-10-05 06:11:30,405 INFO [main] util.RetryCounter (RetryCounter.java:sleepUntilNextRetry(155)) - Sleeping 4000ms before retry #2...
2014-10-05 06:11:35,278 WARN [main] zookeeper.RecoverableZooKeeper (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly transient ZooKeeper, quorum=54.68.206.252:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-10-05 06:11:35,279 INFO [main] util.RetryCounter (RetryCounter.java:sleepUntilNextRetry(155)) - Sleeping 8000ms before retry #3...
2014-10-05 06:11:44,393 WARN [main] zookeeper.RecoverableZooKeeper (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly transient ZooKeeper, quorum=54.68.206.252:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-10-05 06:11:44,393 ERROR [main] zookeeper.RecoverableZooKeeper (RecoverableZooKeeper.java:retryOrThrow(255)) - ZooKeeper exists failed after 4 attempts
2014-10-05 06:11:44,394 WARN [main] zookeeper.ZKUtil (ZKUtil.java:checkExists(482)) - hconnection-0x4e174f3b, quorum=54.68.206.252:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:199)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:479)
        at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
        at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:83)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.retrieveClusterId(HConnectionManager.java:857)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:662)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:414)
        at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:335)...
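(For reference, a minimal sketch of the kind of HConnection pool described above; the class name and pool size are illustrative, not Khaled's code. Each HConnectionManager.createConnection() call builds an independent HConnection with its own RPC client and its own ZooKeeper session.)

    // Hypothetical pool of independent HConnections shared by the reader threads.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HConnection;
    import org.apache.hadoop.hbase.client.HConnectionManager;

    public class ConnectionPoolSketch {
      private final HConnection[] connections;

      public ConnectionPoolSketch(int size) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        connections = new HConnection[size];
        for (int i = 0; i < size; i++) {
          // Each call creates its own RPC client and its own ZooKeeper session.
          connections[i] = HConnectionManager.createConnection(conf);
        }
      }

      // Spread the client threads across the pool instead of sharing one connection.
      public HConnection forThread(long threadId) {
        return connections[(int) (threadId % connections.length)];
      }
    }

(Since every pooled connection opens its own ZooKeeper session, a hard stop at about 10 connections would be consistent with the ZooKeeper server's per-client connection limit, maxClientCnxns, which is what the blog post Qiang links above is about.)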
From: kd...@hotmail.com
To: user@hbase.apache.org
Subject: RE: HBase read performance
Date: Fri, 3 Oct 2014 12:37:39 -0700

Lars, Ted, and Qiang,

Thanks for all the input.

Qiang: yes, all the threads are in the same client process sharing the same connection. And since I don't see hardware contention, maybe there is contention over this code path. I'll try using many connections and see if it alleviates the problem, and I'll report back.

Thanks again,
Khaled

Date: Fri, 3 Oct 2014 15:18:30 +0800
Subject: Re: HBase read performance
From: tian...@gmail.com
To: user@hbase.apache.org

Regarding profiling, Andrew introduced http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html months ago.

processCallTime comes from RpcServer#call, so it looks good?

I have a suspect: https://issues.apache.org/jira/browse/HBASE-11306

How many processes do you have for your 2000 threads? If only 1 process, those threads will share just 1 connection to that regionserver, and there might be big contention on that RPC code path. For such a case, could you try using different connections?
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html

On Fri, Oct 3, 2014 at 9:55 AM, Ted Yu <yuzhih...@gmail.com> wrote:

Khaled:
Do you have a profiler such as jprofiler? Profiling would give us more hints.

Otherwise, capturing stack traces during the period of the reverse scans would help.

Cheers

On Thu, Oct 2, 2014 at 4:52 PM, lars hofhansl <la...@apache.org> wrote:

You might have the data in the OS buffer cache; without short circuit reading the region server has to request the block from the data node process, which then reads it from the block cache.
That is a few context switches per RPC that do not show up in CPU metrics. In that case you also would not see disk IO.

If - as you say - you see a lot of evicted blocks, the data *has* to come from the OS. If you do not see disk IO, it *has* to come from the OS cache. I.e. there's more RAM on your boxes, and you should increase the heap block cache.

You can measure the context switches with vmstat.
Other than that I have no suggestion until I reproduce the problem.
Also check the data locality index of the region server; it should be close to 100%.

-- Lars

From: Khaled Elmeleegy <kd...@hotmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Sent: Thursday, October 2, 2014 3:24 PM
Subject: RE: HBase read performance

Lars, thanks a lot for all the tips. I'll make sure I cover all of them and get back to you. I am not sure they are the bottleneck though, as they are all about optimizing physical resource usage. As I said, I don't see any contended physical resources now. I'll also try to reproduce this problem in a simpler environment and pass the test program to you to play with.

Couple of high level points to make. You are right that my use case is kind of a worst case for HBase reads. But if things go the way you described them, there should be tons of disk IO, and that should clearly be the bottleneck. This is not the case though. That's for the simple reason that this is done in a test environment (I am still prototyping), and not a lot of data has been written to HBase yet. However, for the real use case, there should be writers constantly writing data to HBase and readers occasionally doing this scatter/gather.
At steady state, things should only get worse and all the issues you mentioned should get far more pronounced. At this point, one can try to mitigate it using more memory or so. I am not there yet, as I think I am hitting some software bottleneck which I don't know how to work around.

Khaled

Date: Thu, 2 Oct 2014 14:20:47 -0700
From: la...@apache.org
Subject: Re: HBase read performance
To: user@hbase.apache.org

OK... We might need to investigate this.
Any chance that you can provide a minimal test program and instructions for how to set it up? We can do some profiling then.

One thing to note is that with scanning HBase cannot use bloom filters to rule out HFiles ahead of time; it needs to look into all of them. So you kind of hit on the absolute worst case:
- random reads that do not fit into the block cache
- cannot use bloom filters

Few more questions/comments:
- Do you have short circuit reading enabled? If not, you should.
- Is your table major compacted? That will reduce the number of files to look at.
- Did you disable Nagle's everywhere (enabled tcpnodelay)? It is disabled by default in HBase, but not necessarily in your install of HDFS.
- Which version of HDFS are you using as the backing filesystem?
- If your disk is idle, it means the data fits into the OS buffer cache.
In turn that means that you should increase the heap for the region servers. You can also use block encoding (FAST_DIFF) to try to make sure the entire working set fits into the cache.
- Also try to reduce the block size - although if your overall working set does not fit in the heap it won't help much.

This is a good section of the book to read through generally (even though you might know most of this already):
http://hbase.apache.org/book.html#perf.configurations

-- Lars
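(A sketch of those two knobs applied at table creation time. The table name, family name, and 8k block size are placeholders, not settings taken from this thread.)

    // Illustrative only: enable FAST_DIFF block encoding and a smaller block size
    // on a column family, per the suggestions above.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

    public class CreateTableSketch {
      public static void main(String[] args) throws Exception {
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("example_table"));
        HColumnDescriptor family = new HColumnDescriptor("d");
        family.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF); // encode blocks so more of the working set fits in cache
        family.setBlocksize(8 * 1024);                            // 8k blocks instead of the 64k default
        desc.addFamily(family);
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        admin.createTable(desc);
        admin.close();
      }
    }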
----- Original Message -----
From: Khaled Elmeleegy <kd...@hotmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Cc:
Sent: Thursday, October 2, 2014 11:27 AM
Subject: RE: HBase read performance

I do see a very brief spike in CPU (user/system), but it's nowhere near 0% idle. It goes from 99% idle down to something like 40% idle for a second or so. The thing to note is that this is all on a test cluster, so there is no real load. Things are generally idle until I issue 2-3 of these multi-scan-requests to render a web page. Then you see the spike in the cpu and some activity in the network and disk, but nowhere near saturation.

If there are specific tests you'd like me to do to debug this, I'd be more than happy to do them.

Khaled

Date: Thu, 2 Oct 2014 11:15:59 -0700
From: la...@apache.org
Subject: Re: HBase read performance
To: user@hbase.apache.org

I still think you're waiting on disk. No IOWAIT? So CPU is not waiting a lot for IO. No high user/system CPU either?

If you see a lot of evicted blocks, then each RPC has a high chance of needing to bring an entire 64k block in. You'll see bad performance with this.

We might need to trace this specific scenario.

-- Lars

From: Khaled Elmeleegy <kd...@hotmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Sent: Thursday, October 2, 2014 10:46 AM
Subject: RE: HBase read performance

I've set the heap size to 6GB and I do have gc logging. No long pauses there -- occasional 0.1s or 0.2s.

Other than the discrepancy between what's reported on the client and what's reported at the RS, there is also the issue of not getting proper concurrency.
So, even if a reverse get takes 100ms or so (this has to be mostly blocking on various things, as no physical resource is contended), the other gets/scans should be able to proceed in parallel, so a thousand concurrent gets/scans should finish in a few hundred ms, not many seconds. That's why I thought I'd increase the handler count to try to get more concurrency, but it didn't help. So there must be something else.

Khaled

From: ndimi...@gmail.com
Date: Thu, 2 Oct 2014 10:36:39 -0700
Subject: Re: HBase read performance
To: user@hbase.apache.org

Do check again on the heap size of the region servers. The default unconfigured size is 1G; too small for much of anything. Check your RS logs -- look for lines produced by the JVMPauseMonitor thread. They usually correlate with long GC pauses or other process-freeze events.

Get is implemented as a Scan of a single row, so a reverse scan of a single row should be functionally equivalent.

In practice, I have seen discrepancy between the latencies reported by the RS and the latencies experienced by the client. I've not investigated this area thoroughly.
On Thu, Oct 2, 2014 at 10:05 AM, Khaled Elmeleegy <kd...@hotmail.com> wrote:

Thanks Lars for your quick reply.

Yes, performance is similar with fewer handlers (I tried with 100 first).

The payload is not big, ~1KB or so. The working set doesn't seem to fit in memory, as there are many cache misses. However, disk is far from being a bottleneck. I checked using iostat. I also verified that neither the network nor the CPU of the region server or the client is a bottleneck. This leads me to believe that this is likely a software bottleneck, possibly due to a misconfiguration on my side. I just don't know how to debug it. A clear disconnect I see is the individual request latency as reported by metrics on the region server (IPC processCallTime vs scanNext) vs what's measured on the client. Does this sound right? Any ideas on how to better debug it?

About this trick with the timestamps to be able to do a forward scan, thanks for pointing it out. Actually, I am aware of it. The problem I have is that sometimes I want to get the key after a particular timestamp and sometimes I want to get the key before, so just relying on the key order doesn't work. Ideally, I want a reverse get().
I thought a reverse scan could do the trick though.

Khaled

Date: Thu, 2 Oct 2014 09:40:37 -0700
From: la...@apache.org
Subject: Re: HBase read performance
To: user@hbase.apache.org

Hi Khaled,
is it the same with fewer threads? 1500 handler threads seems to be a lot. Typically a good number of threads depends on the hardware (number of cores, number of spindles, etc). I cannot think of any type of scenario where more than 100 would give any improvement.

How large is the payload per KV retrieved that way? If large (as in a few 100k) you definitely want to lower the number of handler threads.

How much heap do you give the region server? Does the working set fit into the cache? (i.e. in the metrics, do you see the eviction count going up? If so, it does not fit into the cache.)

If the working set does not fit into the cache (eviction count goes up), then HBase will need to bring a new block in from disk on each Get (assuming the Gets are more or less random as far as the server is concerned). In that case you'll benefit from reducing the HFile block size (from 64k to 8k or even 4k).
Lastly, I don't think we tested the performance of using reverse scan this way; there is probably room to optimize this.
Can you restructure your keys to allow forward scanning? For example you could store the time as MAX_LONG-time. Or you could invert all the bits of the time portion of the key, so that it sorts the other way. Then you could do a forward scan.

Let us know how it goes.

-- Lars
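(A small sketch of the key layout Lars describes. The entity-id plus timestamp key structure is an assumption; the actual schema isn't shown in the thread. As Khaled notes above, this only helps when the lookup always goes in the same direction.)

    // Store Long.MAX_VALUE - timestamp in the row key so the newest entry sorts
    // first and a plain forward scan finds "the latest entry at or before T".
    import org.apache.hadoop.hbase.util.Bytes;

    public class InvertedTimeKey {
      static byte[] rowKey(byte[] entityId, long timestampMillis) {
        byte[] invertedTs = Bytes.toBytes(Long.MAX_VALUE - timestampMillis);
        return Bytes.add(entityId, invertedTs);   // entity prefix, then inverted time
      }
    }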
----- Original Message -----
From: Khaled Elmeleegy <kd...@hotmail.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Cc:
Sent: Thursday, October 2, 2014 12:12 AM
Subject: HBase read performance

Hi,

I am trying to do a scatter/gather on hbase (0.98.6.1), where I have a client reading ~1000 keys from an HBase table. These keys happen to fall on the same region server. For my reads I use reverse scan to read each key, as I want the key prior to a specific time stamp (time stamps are stored in reverse order). I don't believe gets can accomplish that, right? So I use scan, with caching set to 1.

I use 2000 reader threads in the client, and on HBase I've set hbase.regionserver.handler.count to 1500. With this setup, my scatter/gather is very slow and can take up to 10s in total. Timing an individual getScanner(..) call on the client side, it can easily take a few hundred ms. I also got the following metrics from the region server in question:

"queueCallTime_mean" : 2.190855525775637,
"queueCallTime_median" : 0.0,
"queueCallTime_75th_percentile" : 0.0,
"queueCallTime_95th_percentile" : 1.0,
"queueCallTime_99th_percentile" : 556.9799999999818,

"processCallTime_min" : 0,
"processCallTime_max" : 12755,
"processCallTime_mean" : 105.64873440912682,
"processCallTime_median" : 0.0,
"processCallTime_75th_percentile" : 2.0,
"processCallTime_95th_percentile" : 7917.95,
"processCallTime_99th_percentile" : 8876.89,

"namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_min" : 89,
"namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_max" : 11300,
"namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_mean" : 654.4949739797315,
"namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_median" : 101.0,
"namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_75th_percentile" : 101.0,
"namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_95th_percentile" : 101.0,
"namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_99th_percentile" : 113.0,

Where "delta" is the name of the table I am querying.

In addition to all this, I monitored the hardware resources (CPU, disk, and network) of both the client and the region server, and nothing seems anywhere near saturation. So I am puzzled by what's going on and where this time is going.

Few things to note based on the above measurements: both medians of IPC processCallTime and queueCallTime are basically zero (ms, I presume, right?). However, scanNext_median is 101 (ms too, right?). I am not sure how this adds up. Also, even though the 101 figure seems outrageously high and I don't know why, still all these scans should be happening in parallel, so the overall call should finish fast, given that no hardware resource is contended, right? But this is not what's happening, so I have to be missing something(s).
So, any help is appreciated there.

Thanks,
Khaled

--
Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)