Thanks for the confirmation. Looks like the combination of bucket cache and cache on write has a bug.
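For the record, the two server configurations being compared are the ones Khaled quotes below; only the cache-on-write flags differ (a summary, assuming all three cache-on-write flags were flipped together, per Ted's suggestion further down):

    hbase-site (reproduces the IllegalArgumentException on reverse scan):
      hbase.bucketcache.ioengine=offheap
      hbase.bucketcache.size=4196
      hbase.rs.cacheblocksonwrite=true
      hfile.block.index.cacheonwrite=true
      hfile.block.bloom.cacheonwrite=true

    hbase-site (problem goes away -- bucket cache kept, cache-on-write off):
      hbase.bucketcache.ioengine=offheap
      hbase.bucketcache.size=4196
      hbase.rs.cacheblocksonwrite=false
      hfile.block.index.cacheonwrite=false
      hfile.block.bloom.cacheonwrite=false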
On Oct 12, 2014, at 8:58 PM, Khaled Elmeleegy <kd...@hotmail.com> wrote:

> It goes away, but I think that's because I am basically churning the cache far less. My workload is mostly writes.
>
>> Date: Sun, 12 Oct 2014 19:43:16 -0700
>> Subject: Re: HBase read performance
>> From: yuzhih...@gmail.com
>> To: user@hbase.apache.org
>>
>> Hi,
>> Can you turn off cacheonwrite and keep bucket cache to see if the problem goes away?
>>
>> Cheers
>>
>> On Fri, Oct 10, 2014 at 10:59 AM, Khaled Elmeleegy <kd...@hotmail.com> wrote:
>>
>>> Yes, I can reproduce it with some work.
>>> The workload is basically as follows: there are writers streaming writes to a table. Then, there is a reader (invoked via a web interface). The reader does 1000 parallel reverse scans, which in my case all end up hitting the same region. The scans are effectively "gets" as I just need to get one record off of each of them. I just need to do a "reverse" get, which is not supported (would be great to have :)), so I do it via reverse scan. After a few tries, the reader consistently hits this bug.
>>>
>>> This happens with these config changes:
>>> hbase-env: HBASE_REGIONSERVER_OPTS=-Xmx6G -XX:MaxDirectMemorySize=5G -XX:CMSInitiatingOccupancyFraction=88 -XX:+AggressiveOpts -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/hbase-regionserver-gc.log
>>> hbase-site:
>>> hbase.bucketcache.ioengine=offheap
>>> hbase.bucketcache.size=4196
>>> hbase.rs.cacheblocksonwrite=true
>>> hfile.block.index.cacheonwrite=true
>>> hfile.block.bloom.cacheonwrite=true
>>>
>>> Interestingly, without these config changes, I can't reproduce the problem.
>>> Khaled
>>>
>>>> Date: Fri, 10 Oct 2014 10:05:14 -0700
>>>> Subject: Re: HBase read performance
>>>> From: st...@duboce.net
>>>> To: user@hbase.apache.org
>>>>
>>>> It looks like we are messing up our positioning math. From java.nio.Buffer, lines 233-235 (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/nio/Buffer.java#233):
>>>>
>>>>     public final Buffer position(int newPosition) {
>>>>         if ((newPosition > limit) || (newPosition < 0))
>>>>             throw new IllegalArgumentException();
>>>>
>>>> Is it easy to reproduce Khaled? Always same region/store or spread across all reads?
>>>> St.Ack
>>>>
>>>> On Fri, Oct 10, 2014 at 8:31 AM, Khaled Elmeleegy <kd...@hotmail.com> wrote:
>>>>
>>>>> Andrew thanks. I think this indeed was the problem.
>>>>> To get over it, I increased the amount of memory given to the region server to avoid IO on reads. I used the configs below. In my experiments, I have writers streaming their output to HBase. The reader powers a web page and does this scatter/gather, where it reads the 1000 keys written last and passes them to the front end. With this workload, I get the exception below at the region server. Again, I am using HBase (0.98.6.1). Any help is appreciated.
>>>>> hbase-env: HBASE_REGIONSERVER_OPTS=-Xmx6G -XX:MaxDirectMemorySize=5G -XX:CMSInitiatingOccupancyFraction=88 -XX:+AggressiveOpts -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/hbase-regionserver-gc.log
>>>>> hbase-site:
>>>>> hbase.bucketcache.ioengine=offheap
>>>>> hfile.block.cache.size=0.4
>>>>> hbase.bucketcache.size=4196
>>>>> hbase.rs.cacheblocksonwrite=true
>>>>> hfile.block.index.cacheonwrite=true
>>>>> hfile.block.bloom.cacheonwrite=true
>>>>>
>>>>> 2014-10-10 15:06:44,173 ERROR [B.DefaultRpcServer.handler=62,queue=2,port=60020] ipc.RpcServer: Unexpected throwable object
>>>>> java.lang.IllegalArgumentException
>>>>>     at java.nio.Buffer.position(Buffer.java:236)
>>>>>     at org.apache.hadoop.hbase.util.ByteBufferUtils.skip(ByteBufferUtils.java:434)
>>>>>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.readKeyValueLen(HFileReaderV2.java:849)
>>>>>     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:760)
>>>>>     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:248)
>>>>>     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:152)
>>>>>     at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:317)
>>>>>     at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:176)
>>>>>     at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1780)
>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3758)
>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1950)
>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1936)
>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1913)
>>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3157)
>>>>>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)
>>>>>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
>>>>>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
>>>>>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
>>>>>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
>>>>>     at java.lang.Thread.run(Thread.java:744)
>>>>>
>>>>>> From: apurt...@apache.org
>>>>>> Date: Tue, 7 Oct 2014 17:09:35 -0700
>>>>>> Subject: Re: HBase read performance
>>>>>> To: user@hbase.apache.org
>>>>>>
>>>>>>> The cluster has 2 m1.large nodes.
>>>>>>
>>>>>> That's the problem right there.
>>>>>> You need to look at c3.4xlarge or i2 instances as a minimum requirement. M1 and even M3 instance types have ridiculously poor IO.
>>>>>>
>>>>>> On Tue, Oct 7, 2014 at 3:01 PM, Khaled Elmeleegy <kd...@hotmail.com> wrote:
>>>>>>
>>>>>>> Thanks Nicolas, Qiang.
>>>>>>>
>>>>>>> I was able to write a simple program that reproduces the problem on a tiny HBase cluster on ec2. The cluster has 2 m1.large nodes. One node runs the master, name node and zookeeper. The other node runs a data node and a region server, with heap size configured to be 6GB. There, the 1000 parallel reverse gets (reverse scans) take 7-8 seconds. The data set is tiny (10M records, each having a small number of bytes). As I said before, all hardware resources are very idle there.
>>>>>>>
>>>>>>> Interestingly, running the same workload on my macbook, the 1000 parallel gets take ~200ms on a pseudo-distributed installation.
>>>>>>>
>>>>>>> Any help to resolve this mystery is highly appreciated.
>>>>>>>
>>>>>>> P.S. please find my test program attached.
>>>>>>>
>>>>>>> Best,
>>>>>>> Khaled
>>>>>>>
>>>>>>>> From: nkey...@gmail.com
>>>>>>>> Date: Mon, 6 Oct 2014 09:40:48 +0200
>>>>>>>> Subject: Re: HBase read performance
>>>>>>>> To: user@hbase.apache.org
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I haven't seen it mentioned, but if I understand correctly each scan returns a single row? If so you should use Scan#setSmall to save some rpc calls.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Nicolas
>>>>>>>>
>>>>>>>> On Sun, Oct 5, 2014 at 11:28 AM, Qiang Tian <tian...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> When using separate HConnection instances, both the RpcClient instance (which maintains connections to region servers) and the Registry instance (which maintains the connection to zookeeper) will be separate.
>>>>>>>>>
>>>>>>>>> See http://shammijayasinghe.blogspot.com/2012/02/zookeeper-increase-maximum-number-of.html
>>>>>>>>>
>>>>>>>>> On Sun, Oct 5, 2014 at 2:24 PM, Khaled Elmeleegy <kd...@hotmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I tried creating my own HConnections pool to use for my HBase calls, so that not all the (2K) threads share the same HConnection. However, I could only have 10 HConnections; beyond that I get ZK exceptions, please find them below. Also, with 10 HConnections, I don't see noticeable improvement in performance so far.
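For reference, a rough sketch of the connection pool Khaled describes, against the 0.98-era client API. The class shape, pool size handling, and round-robin pick are illustrative, not from the thread. Each HConnectionManager.createConnection() call builds its own RpcClient and its own ZooKeeper connection, per Qiang's note above, which is also why the ZooKeeper quorum starts refusing once too many connections are open, as the log below shows.

    import java.io.IOException;
    import java.util.concurrent.atomic.AtomicInteger;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.HConnection;
    import org.apache.hadoop.hbase.client.HConnectionManager;
    import org.apache.hadoop.hbase.client.HTableInterface;

    public class ReaderConnectionPool {
      private final HConnection[] pool;
      private final AtomicInteger next = new AtomicInteger();

      public ReaderConnectionPool(Configuration conf, int size) throws IOException {
        pool = new HConnection[size];
        for (int i = 0; i < size; i++) {
          // Each connection owns its own RpcClient and ZooKeeper session,
          // so the ZK quorum's connection limit caps the usable pool size.
          pool[i] = HConnectionManager.createConnection(conf);
        }
      }

      // Reader threads pick a connection round-robin instead of all
      // sharing the single process-wide HConnection.
      public HTableInterface getTable(String name) throws IOException {
        int i = (next.getAndIncrement() & Integer.MAX_VALUE) % pool.length;
        return pool[i].getTable(name);
      }
    }

    // e.g. new ReaderConnectionPool(HBaseConfiguration.create(), 10)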
>>>>>>>>>> 2014-10-05 06:11:26,490 WARN [main] >>>>> zookeeper.RecoverableZooKeeper >>>>>>>>>> (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly >>>>> transient >>>>>>>>>> ZooKeeper, quorum=54.68.206.252:2181, >>> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>>>>>>> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid2014-10-05 >>>>>>>>> 06:11:26,490 >>>>>>>>>> INFO [main] util.RetryCounter >>>>>>>>> (RetryCounter.java:sleepUntilNextRetry(155)) >>>>>>>>>> - Sleeping 1000ms before retry #0...2014-10-05 06:11:27,845 >>> WARN >>>>>>> [main] >>>>>>>>>> zookeeper.RecoverableZooKeeper >>>>>>>>>> (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly >>>>> transient >>>>>>>>>> ZooKeeper, quorum=54.68.206.252:2181, >>> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>>>>>>> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid2014-10-05 >>>>>>>>> 06:11:27,849 >>>>>>>>>> INFO [main] util.RetryCounter >>>>>>>>> (RetryCounter.java:sleepUntilNextRetry(155)) >>>>>>>>>> - Sleeping 2000ms before retry #1...2014-10-05 06:11:30,405 >>> WARN >>>>>>> [main] >>>>>>>>>> zookeeper.RecoverableZooKeeper >>>>>>>>>> (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly >>>>> transient >>>>>>>>>> ZooKeeper, quorum=54.68.206.252:2181, >>> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>>>>>>> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid2014-10-05 >>>>>>>>> 06:11:30,405 >>>>>>>>>> INFO [main] util.RetryCounter >>>>>>>>> (RetryCounter.java:sleepUntilNextRetry(155)) >>>>>>>>>> - Sleeping 4000ms before retry #2...2014-10-05 06:11:35,278 >>> WARN >>>>>>> [main] >>>>>>>>>> zookeeper.RecoverableZooKeeper >>>>>>>>>> (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly >>>>> transient >>>>>>>>>> ZooKeeper, quorum=54.68.206.252:2181, >>> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>>>>>>> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid2014-10-05 >>>>>>>>> 06:11:35,279 >>>>>>>>>> INFO [main] util.RetryCounter >>>>>>>>> (RetryCounter.java:sleepUntilNextRetry(155)) >>>>>>>>>> - Sleeping 8000ms before retry #3...2014-10-05 06:11:44,393 >>> WARN >>>>>>> [main] >>>>>>>>>> zookeeper.RecoverableZooKeeper >>>>>>>>>> (RecoverableZooKeeper.java:retryOrThrow(253)) - Possibly >>>>> transient >>>>>>>>>> ZooKeeper, quorum=54.68.206.252:2181, >>> exception=org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>>>>>>> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid2014-10-05 >>>>>>>>> 06:11:44,393 >>>>>>>>>> ERROR [main] zookeeper.RecoverableZooKeeper >>>>>>>>>> (RecoverableZooKeeper.java:retryOrThrow(255)) - ZooKeeper >>> exists >>>>>>> failed >>>>>>>>>> after 4 attempts2014-10-05 06:11:44,394 WARN [main] >>>>> zookeeper.ZKUtil >>>>>>>>>> (ZKUtil.java:checkExists(482)) - hconnection-0x4e174f3b, >>> quorum= >>>>>>>>>> 54.68.206.252:2181, baseZNode=/hbase Unable to set watcher >>> on >>>>> znode >>> (/hbase/hbaseid)org.apache.zookeeper.KeeperException$ConnectionLossException: >>>>>>>>>> KeeperErrorCode = ConnectionLoss for /hbase/hbaseid at >>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:99) >>>>>>> at >>>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) >>>>>>> at >>>>>>>>>> org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) at >>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:199) >>>>>>>>>> at >>> org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:479) >>>>>>>>>> at >>> 
org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
>>>>>>>>>> at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:83)
>>>>>>>>>> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.retrieveClusterId(HConnectionManager.java:857)
>>>>>>>>>> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:662)
>>>>>>>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>>>>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>>>>>>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>>>>>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>>>>>>>> at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:414)
>>>>>>>>>> at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:335)...
>>>>>>>>>>
>>>>>>>>>>> From: kd...@hotmail.com
>>>>>>>>>>> To: user@hbase.apache.org
>>>>>>>>>>> Subject: RE: HBase read performance
>>>>>>>>>>> Date: Fri, 3 Oct 2014 12:37:39 -0700
>>>>>>>>>>>
>>>>>>>>>>> Lars, Ted, and Qiang,
>>>>>>>>>>> Thanks for all the input.
>>>>>>>>>>> Qiang: yes, all the threads are in the same client process sharing the same connection. And since I don't see hardware contention, maybe there is contention over this code path. I'll try using many connections, see if it alleviates the problem, and report back.
>>>>>>>>>>> Thanks again, Khaled
>>>>>>>>>>>
>>>>>>>>>>>> Date: Fri, 3 Oct 2014 15:18:30 +0800
>>>>>>>>>>>> Subject: Re: HBase read performance
>>>>>>>>>>>> From: tian...@gmail.com
>>>>>>>>>>>> To: user@hbase.apache.org
>>>>>>>>>>>>
>>>>>>>>>>>> Regarding profiling, Andrew introduced http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html months ago.
>>>>>>>>>>>>
>>>>>>>>>>>> processCallTime comes from RpcServer#call, so it looks good?
>>>>>>>>>>>>
>>>>>>>>>>>> I have a suspect: https://issues.apache.org/jira/browse/HBASE-11306
>>>>>>>>>>>>
>>>>>>>>>>>> How many processes do you have for your 2000 threads? If only 1 process, those threads will share just 1 connection to that regionserver, and there might be big contention on the RPC code path. For such a case, could you try using different connections? https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Oct 3, 2014 at 9:55 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Khaled:
>>>>>>>>>>>>> Do you have a profiler such as JProfiler? Profiling would give us more hints.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Otherwise, capturing stack traces during the period of reverse scan would help.
>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 2, 2014 at 4:52 PM, lars hofhansl <la...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> You might have the data in the OS buffer cache; without short circuit reading, the region server has to request the block from the data node process, which then reads it from the block cache. That is a few context switches per RPC that do not show up in CPU metrics. In that case you also would not see disk IO.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If - as you say - you see a lot of evicted blocks, the data *has* to come from the OS. If you do not see disk IO, it *has* to come from the OS cache. I.e. there's more RAM on your boxes, and you should increase the heap for the block cache.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You can measure the context switches with vmstat. Other than that I have no suggestion until I reproduce the problem. Also check the data locality index of the region server; it should be close to 100%.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- Lars
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ________________________________
>>>>>>>>>>>>>> From: Khaled Elmeleegy <kd...@hotmail.com>
>>>>>>>>>>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>>>>>>>>>>>> Sent: Thursday, October 2, 2014 3:24 PM
>>>>>>>>>>>>>> Subject: RE: HBase read performance
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Lars, thanks a lot for all the tips. I'll make sure I cover all of them and get back to you. I am not sure they are the bottleneck though, as they are all about optimizing physical resource usage. As I said, I don't see any contended physical resources now. I'll also try to reproduce this problem in a simpler environment and pass you the test program to play with.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> A couple of high level points to make. You are right that my use case is kind of a worst case for HBase reads. But, if things go the way you described them, there should be tons of disk IO and that should clearly be the bottleneck. This is not the case though, for the simple reason that this is done in a test environment (I am still prototyping) and not a lot of data has been written to HBase yet. However, for the real use case, there will be writers constantly writing data to HBase and readers occasionally doing this scatter/gather. At steady state, things should only get worse and all the issues you mentioned should get far more pronounced.
>>>>>>>>>>>>>> At this point, one can try to mitigate it with more memory or the like. I am not there yet, as I think I am hitting some software bottleneck which I don't know how to work around.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Khaled
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ----------------------------------------
>>>>>>>>>>>>>>> Date: Thu, 2 Oct 2014 14:20:47 -0700
>>>>>>>>>>>>>>> From: la...@apache.org
>>>>>>>>>>>>>>> Subject: Re: HBase read performance
>>>>>>>>>>>>>>> To: user@hbase.apache.org
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> OK... We might need to investigate this. Any chance that you can provide a minimal test program and instructions on how to set it up? We can do some profiling then.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> One thing to note is that with scanning HBase cannot use bloom filters to rule out HFiles ahead of time; it needs to look into all of them. So you kind of hit on the absolute worst case:
>>>>>>>>>>>>>>> - random reads that do not fit into the block cache
>>>>>>>>>>>>>>> - cannot use bloom filters
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A few more questions/comments:
>>>>>>>>>>>>>>> - Do you have short circuit reading enabled? If not, you should.
>>>>>>>>>>>>>>> - Is your table major compacted? That will reduce the number of files to look at.
>>>>>>>>>>>>>>> - Did you disable Nagle's everywhere (enabled tcpnodelay)? It is disabled by default in HBase, but not necessarily in your install of HDFS.
>>>>>>>>>>>>>>> - Which version of HDFS are you using as backing filesystem?
>>>>>>>>>>>>>>> - If your disk is idle, it means the data fits into the OS buffer cache. In turn that means you should increase the heap for the region servers. You can also use block encoding (FAST_DIFF) to try to make sure the entire working set fits into the cache.
>>>>>>>>>>>>>>> - Also try to reduce the block size - although if your overall working set does not fit in the heap it won't help much.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is a good section of the book to read through generally (even though you might know most of this already): http://hbase.apache.org/book.html#perf.configurations
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- Lars
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>>> From: Khaled Elmeleegy <kd...@hotmail.com>
>>>>>>>>>>>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>>>>>>>>>>>>> Cc:
>>>>>>>>>>>>>>> Sent: Thursday, October 2, 2014 11:27 AM
>>>>>>>>>>>>>>> Subject: RE: HBase read performance
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I do see a very brief spike in CPU (user/system), but it's nowhere near 0% idle. It goes from 99% idle down to something like 40% idle for a second or so. The thing to note: this is all on a test cluster, so no real load. Things are generally idle until I issue 2-3 of these multi-scan-requests to render a web page. Then you see the spike in the CPU and some activity in the network and disk, but nowhere near saturation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If there are specific tests you'd like me to do to debug this, I'd be more than happy to do it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Khaled
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ----------------------------------------
>>>>>>>>>>>>>>>> Date: Thu, 2 Oct 2014 11:15:59 -0700
>>>>>>>>>>>>>>>> From: la...@apache.org
>>>>>>>>>>>>>>>> Subject: Re: HBase read performance
>>>>>>>>>>>>>>>> To: user@hbase.apache.org
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I still think you're waiting on disk. No IOWAIT? So CPU is not waiting a lot for IO. No high User/System CPU either?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you see a lot of evicted blocks then each RPC has a high chance of needing to bring an entire 64k block in. You'll see bad performance with this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We might need to trace this specific scenario.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -- Lars
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ________________________________
>>>>>>>>>>>>>>>> From: Khaled Elmeleegy <kd...@hotmail.com>
>>>>>>>>>>>>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>>>>>>>>>>>>>> Sent: Thursday, October 2, 2014 10:46 AM
>>>>>>>>>>>>>>>> Subject: RE: HBase read performance
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've set the heap size to 6GB and I do have gc logging. No long pauses there -- occasional 0.1s or 0.2s.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Other than the discrepancy between what's reported on the client and what's reported at the RS, there is also the issue of not getting proper concurrency. So, even if a reverse get takes 100ms or so (this has to be mostly blocking on various things, as no physical resource is contended), the other gets/scans should be able to proceed in parallel, so a thousand concurrent gets/scans should finish in a few hundred ms, not many seconds. That's why I thought I'd increase the handler count to try to get more concurrency, but it didn't help. So, there must be something else.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Khaled
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ----------------------------------------
>>>>>>>>>>>>>>>>> From: ndimi...@gmail.com
>>>>>>>>>>>>>>>>> Date: Thu, 2 Oct 2014 10:36:39 -0700
>>>>>>>>>>>>>>>>> Subject: Re: HBase read performance
>>>>>>>>>>>>>>>>> To: user@hbase.apache.org
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do check again on the heap size of the region servers. The default unconfigured size is 1G; too small for much of anything.
>>>>>>>>>>>>>>>>> Check your RS logs -- look for lines produced by the JVMPauseMonitor thread. They usually correlate with long GC pauses or other process-freeze events.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Get is implemented as a Scan of a single row, so a reverse scan of a single row should be functionally equivalent.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In practice, I have seen discrepancy between the latencies reported by the RS and the latencies experienced by the client. I've not investigated this area thoroughly.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Oct 2, 2014 at 10:05 AM, Khaled Elmeleegy <kd...@hotmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks Lars for your quick reply.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes, performance is similar with fewer handlers (I tried with 100 first).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The payload is not big, ~1KB or so. The working set doesn't seem to fit in memory, as there are many cache misses. However, disk is far from being a bottleneck. I checked using iostat. I also verified that neither the network nor the CPU of the region server or the client are a bottleneck. This leads me to believe that this is likely a software bottleneck, possibly due to a misconfiguration on my side. I just don't know how to debug it. A clear disconnect I see is the individual request latency as reported by metrics on the region server (IPC processCallTime vs scanNext) vs what's measured on the client. Does this sound right? Any ideas on how to better debug it?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> About this trick with the timestamps to be able to do a forward scan, thanks for pointing it out. Actually, I am aware of it. The problem I have is, sometimes I want to get the key after a particular timestamp and sometimes I want to get the key before, so just relying on the key order doesn't work. Ideally, I want a reverse get(). I thought reverse scan could do the trick though.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Khaled
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ----------------------------------------
>>>>>>>>>>>>>>>>>>> Date: Thu, 2 Oct 2014 09:40:37 -0700
>>>>>>>>>>>>>>>>>>> From: la...@apache.org
>>>>>>>>>>>>>>>>>>> Subject: Re: HBase read performance
>>>>>>>>>>>>>>>>>>> To: user@hbase.apache.org
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Khaled,
>>>>>>>>>>>>>>>>>>> is it the same with fewer threads? 1500 handler threads seems like a lot. Typically a good number of threads depends on the hardware (number of cores, number of spindles, etc.). I cannot think of any type of scenario where more than 100 would give any improvement.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How large is the payload per KV retrieved that way? If large (as in a few 100k) you definitely want to lower the number of handler threads.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> How much heap do you give the region server? Does the working set fit into the cache? (I.e. in the metrics, do you see the eviction count going up? If so, it does not fit into the cache.)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If the working set does not fit into the cache (eviction count goes up) then HBase will need to bring a new block in from disk on each Get (assuming the Gets are more or less random as far as the server is concerned). In that case you'll benefit from reducing the HFile block size (from 64k to 8k or even 4k).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Lastly, I don't think we tested the performance of using reverse scan this way; there is probably room to optimize this.
>>>>>>>>>>>>>>>>>>> Can you restructure your keys to allow forwards scanning? For example you could store the time as MAX_LONG-time. Or you could invert all the bits of the time portion of the key, so that it sorts the other way. Then you could do a forward scan.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Let us know how it goes.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -- Lars
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>>>>>>> From: Khaled Elmeleegy <kd...@hotmail.com>
>>>>>>>>>>>>>>>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>>>>>>>>>>>>>>>>> Cc:
>>>>>>>>>>>>>>>>>>> Sent: Thursday, October 2, 2014 12:12 AM
>>>>>>>>>>>>>>>>>>> Subject: HBase read performance
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I am trying to do a scatter/gather on hbase (0.98.6.1), where I have a client reading ~1000 keys from an HBase table. These keys happen to fall on the same region server. For my reads I use reverse scan to read each key, as I want the key prior to a specific time stamp (time stamps are stored in reverse order). I don't believe gets can accomplish that, right? So I use scan, with caching set to 1.
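For concreteness, the "reverse get" described here would look roughly like the sketch below against the 0.98 client API. The helper name is illustrative, not from the thread; setSmall follows Nicolas's suggestion above, and may need to be dropped on versions where reversed small scans are not supported.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    // Returns the first row at or before startRow -- a "reverse get".
    Result reverseGet(HTableInterface table, byte[] startRow) throws IOException {
      Scan scan = new Scan(startRow);
      scan.setReversed(true);  // walk toward smaller keys, starting at startRow
      scan.setCaching(1);      // only one row is ever needed
      scan.setSmall(true);     // single RPC instead of open/next/close round trips
      ResultScanner scanner = table.getScanner(scan);
      try {
        return scanner.next(); // null if nothing sorts at or before startRow
      } finally {
        scanner.close();
      }
    }

Lars's alternative above, storing the time portion of the key as MAX_LONG-time so a plain forward scan works, avoids the reversed-scan code path entirely; as Khaled notes, it only helps when the lookup direction is always the same.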
>>>>>>>>>>>>>>>>>>> I use 2000 reader threads in the client and, on HBase, I've set hbase.regionserver.handler.count to 1500. With this setup, my scatter/gather is very slow and can take up to 10s in total. Timing an individual getScanner(..) call on the client side, it can easily take a few hundred ms. I also got the following metrics from the region server in question:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "queueCallTime_mean" : 2.190855525775637,
>>>>>>>>>>>>>>>>>>> "queueCallTime_median" : 0.0,
>>>>>>>>>>>>>>>>>>> "queueCallTime_75th_percentile" : 0.0,
>>>>>>>>>>>>>>>>>>> "queueCallTime_95th_percentile" : 1.0,
>>>>>>>>>>>>>>>>>>> "queueCallTime_99th_percentile" : 556.9799999999818,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "processCallTime_min" : 0,
>>>>>>>>>>>>>>>>>>> "processCallTime_max" : 12755,
>>>>>>>>>>>>>>>>>>> "processCallTime_mean" : 105.64873440912682,
>>>>>>>>>>>>>>>>>>> "processCallTime_median" : 0.0,
>>>>>>>>>>>>>>>>>>> "processCallTime_75th_percentile" : 2.0,
>>>>>>>>>>>>>>>>>>> "processCallTime_95th_percentile" : 7917.95,
>>>>>>>>>>>>>>>>>>> "processCallTime_99th_percentile" : 8876.89,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_min" : 89,
>>>>>>>>>>>>>>>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_max" : 11300,
>>>>>>>>>>>>>>>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_mean" : 654.4949739797315,
>>>>>>>>>>>>>>>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_median" : 101.0,
>>>>>>>>>>>>>>>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_75th_percentile" : 101.0,
>>>>>>>>>>>>>>>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_95th_percentile" : 101.0,
>>>>>>>>>>>>>>>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_99th_percentile" : 113.0,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> where "delta" is the name of the table I am querying.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In addition to all this, I monitored the hardware resources (CPU, disk, and network) of both the client and the region server, and nothing seems anywhere near saturation. So I am puzzled by what's going on and where this time is going.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> A few things to note based on the above measurements: both medians of IPC processCallTime and queueCallTime are basically zero (ms, I presume, right?). However, scanNext_median is 101 (ms too, right?). I am not sure how this adds up. Also, even though the 101 figure seems outrageously high and I don't know why, all these scans should still be happening in parallel, so the overall call should finish fast, given that no hardware resource is contended, right? But this is not what's happening, so I have to be missing something(s).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So, any help is appreciated there.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Khaled
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>>
>>>>>>    - Andy
>>>>>>
>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)