No, I was using a reverse scan to mimic the behavior of a reverse get.
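
For reference, each per-key lookup is set up roughly like the sketch below
(a minimal sketch; the table handle, row-key construction, and variable names
are placeholders rather than my exact code):

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

// "table" is an open HTable; "startKey" encodes the row prefix plus the
// timestamp to start from -- both are placeholders here.
Scan scan = new Scan(startKey);
scan.setReversed(true);   // reverse scan (available in 0.98)
scan.setCaching(1);       // only one row is wanted per lookup
ResultScanner scanner = table.getScanner(scan);
try {
  Result r = scanner.next();  // first row at or before startKey in sort order
  // ... use r ...
} finally {
  scanner.close();
}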

----------------------------------------
> Date: Thu, 2 Oct 2014 10:50:15 -0700
> Subject: Re: HBase read performance
> From: yuzhih...@gmail.com
> To: user@hbase.apache.org
>
> Khaled:
> Were you using this method from HTable ?
>
> public Result[] get(List<Get> gets) throws IOException {
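>
> (For reference, minimal usage of that batch API would look roughly like the
> following; the row keys and table handle are placeholders, and java.util plus
> org.apache.hadoop.hbase.client imports are assumed:
>
> List<Get> gets = new ArrayList<Get>();
> for (byte[] row : rowKeys) {
>   gets.add(new Get(row));
> }
> Result[] results = table.get(gets);  // batched by the client, grouped per region server
> )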
>
> Cheers
>
> On Thu, Oct 2, 2014 at 10:46 AM, Khaled Elmeleegy <kd...@hotmail.com> wrote:
>
>> I've set the heap size to 6GB and I do have gc logging. No long pauses
>> there -- occasional 0.1s or 0.2s.
>>
>> Other than the discrepancy between what's reported on the client and
>> what's reported at the RS, there is also the issue of not getting proper
>> concurrency. So, even if a reverse get takes 100ms or so (this has to be
>> mostly blocking on various things, as no physical resource is contended),
>> the other gets/scans should be able to proceed in parallel, so a thousand
>> concurrent gets/scans should finish in a few hundred ms, not many seconds.
>> That's why I thought I'd increase the handler count to try to get more
>> concurrency, but it didn't help. So, there must be something else.
>>
>> Khaled
>>
>> ----------------------------------------
>>> From: ndimi...@gmail.com
>>> Date: Thu, 2 Oct 2014 10:36:39 -0700
>>> Subject: Re: HBase read performance
>>> To: user@hbase.apache.org
>>>
>>> Do check again on the heap size of the region servers. The default
>>> unconfigured size is 1G; too small for much of anything. Check your RS
>> logs
>>> -- look for lines produced by the JVMPauseMonitor thread. They usually
>>> correlate with long GC pauses or other process-freeze events.
>>>
>>> Get is implemented as a Scan of a single row, so a reverse scan of a
>> single
>>> row should be functionally equivalent.
>>>
>>> In practice, I have seen a discrepancy between the latencies reported by the
>>> RS and the latencies experienced by the client. I've not investigated this
>>> area thoroughly.
>>>
>>> On Thu, Oct 2, 2014 at 10:05 AM, Khaled Elmeleegy <kd...@hotmail.com>
>> wrote:
>>>
>>>> Thanks Lars for your quick reply.
>>>>
>>>> Yes, performance is similar with fewer handlers (I tried with 100 first).
>>>>
>>>> The payload is not big, ~1KB or so. The working set doesn't seem to fit in
>>>> memory, as there are many cache misses. However, disk is far from being a
>>>> bottleneck; I checked using iostat. I also verified that neither the network
>>>> nor the CPU of the region server or the client is a bottleneck. This leads me
>>>> to believe that this is likely a software bottleneck, possibly due to a
>>>> misconfiguration on my side. I just don't know how to debug it. A clear
>>>> disconnect I see is the individual request latency as reported by metrics on
>>>> the region server (IPC processCallTime vs scanNext) vs what's measured on the
>>>> client. Does this sound right? Any ideas on how to better debug it?
>>>>
>>>> About the trick with the timestamps to be able to do a forward scan, thanks
>>>> for pointing it out. Actually, I am aware of it. The problem I have is that
>>>> sometimes I want to get the key after a particular timestamp and sometimes I
>>>> want the key before it, so just relying on the key order doesn't work.
>>>> Ideally, I want a reverse get(). I thought a reverse scan could do the trick,
>>>> though.
>>>>
>>>> Khaled
>>>>
>>>> ----------------------------------------
>>>>> Date: Thu, 2 Oct 2014 09:40:37 -0700
>>>>> From: la...@apache.org
>>>>> Subject: Re: HBase read performance
>>>>> To: user@hbase.apache.org
>>>>>
>>>>> Hi Khaled,
>>>>> is it the same with fewer threads? 1500 handler threads seems to be a lot.
>>>>> Typically, a good number of threads depends on the hardware (number of
>>>>> cores, number of spindles, etc.). I cannot think of any type of scenario
>>>>> where more than 100 would give any improvement.
>>>>>
>>>>> How large is the payload per KV retrieved that way? If it's large (as in a
>>>>> few hundred KB), you definitely want to lower the number of handler threads.
>>>>> How much heap do you give the region server? Does the working set fit into
>>>>> the cache? (i.e., in the metrics, do you see the eviction count going up?
>>>>> If so, it does not fit into the cache.)
>>>>>
>>>>> If the working set does not fit into the cache (eviction count goes up)
>>>> then HBase will need to bring a new block in from disk on each Get
>>>> (assuming the Gets are more or less random as far as the server is
>>>> concerned).
>>>>> In that case you'll benefit from reducing the HFile block size (from 64k to
>>>>> 8k or even 4k).
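>>>>> (For example, something along these lines -- the column family name "d" is
>>>>> made up, "admin" is an HBaseAdmin, the org.apache.hadoop.hbase imports are
>>>>> assumed, and the change typically needs the table disabled to take effect:
>>>>>
>>>>> HTableDescriptor td = admin.getTableDescriptor(TableName.valueOf("delta"));
>>>>> HColumnDescriptor cf = td.getFamily(Bytes.toBytes("d"));
>>>>> cf.setBlocksize(8 * 1024);   // 8k HFile blocks instead of the 64k default
>>>>> admin.modifyColumn(TableName.valueOf("delta"), cf);
>>>>> )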
>>>>>
>>>>> Lastly, I don't think we've tested the performance of using reverse scan
>>>>> this way; there is probably room to optimize it.
>>>>> Can you restructure your keys to allow forward scanning? For example, you
>>>>> could store the time as MAX_LONG-time. Or you could invert all the bits of
>>>>> the time portion of the key, so that it sorts the other way. Then you could
>>>>> do a forward scan.
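>>>>> (Roughly like this, assuming a row key of the form <prefix><time>, with all
>>>>> names being placeholders:
>>>>>
>>>>> long ts = eventTimeMillis;  // the time stored in the key
>>>>> byte[] rowKey = Bytes.add(prefix, Bytes.toBytes(Long.MAX_VALUE - ts));
>>>>> // Larger timestamps now produce smaller suffixes, so the newest entries
>>>>> // sort first, and a forward scan starting at
>>>>> // Bytes.add(prefix, Bytes.toBytes(Long.MAX_VALUE - queryTs)) returns the
>>>>> // latest key at or before queryTs.
>>>>> )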
>>>>>
>>>>> Let us know how it goes.
>>>>>
>>>>> -- Lars
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>> From: Khaled Elmeleegy <kd...@hotmail.com>
>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>>> Cc:
>>>>> Sent: Thursday, October 2, 2014 12:12 AM
>>>>> Subject: HBase read performance
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to do a scatter/gather on HBase (0.98.6.1), where I have a
>>>>> client reading ~1000 keys from an HBase table. These keys happen to fall on
>>>>> the same region server. For my reads I use a reverse scan to read each key,
>>>>> as I want the key prior to a specific timestamp (timestamps are stored in
>>>>> reverse order). I don't believe gets can accomplish that, right? So I use a
>>>>> scan, with caching set to 1.
>>>>>
>>>>> I use 2000 reader threads in the client, and on HBase I've set
>>>>> hbase.regionserver.handler.count to 1500. With this setup, my scatter/gather
>>>>> is very slow and can take up to 10s in total. Timing an individual
>>>>> getScanner(..) call on the client side, it can easily take a few hundred ms.
>>>>> I also got the following metrics from the region server in question:
>>>>>
>>>>> "queueCallTime_mean" : 2.190855525775637,
>>>>> "queueCallTime_median" : 0.0,
>>>>> "queueCallTime_75th_percentile" : 0.0,
>>>>> "queueCallTime_95th_percentile" : 1.0,
>>>>> "queueCallTime_99th_percentile" : 556.9799999999818,
>>>>>
>>>>> "processCallTime_min" : 0,
>>>>> "processCallTime_max" : 12755,
>>>>> "processCallTime_mean" : 105.64873440912682,
>>>>> "processCallTime_median" : 0.0,
>>>>> "processCallTime_75th_percentile" : 2.0,
>>>>> "processCallTime_95th_percentile" : 7917.95,
>>>>> "processCallTime_99th_percentile" : 8876.89,
>>>>>
>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_min" : 89,
>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_max" : 11300,
>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_mean" : 654.4949739797315,
>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_median" : 101.0,
>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_75th_percentile" : 101.0,
>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_95th_percentile" : 101.0,
>>>>> "namespace_default_table_delta_region_87be70d7710f95c05cfcc90181d183b4_metric_scanNext_99th_percentile" : 113.0,
>>>>>
>>>>> Where "delta" is the name of the table I am querying.
>>>>>
>>>>> In addition to all this, I monitored the hardware resources (CPU, disk,
>>>>> and network) of both the client and the region server, and nothing seems
>>>>> anywhere near saturation. So I am puzzled by what's going on and where this
>>>>> time is going.
>>>>>
>>>>> A few things to note based on the above measurements: both medians of IPC
>>>>> processCallTime and queueCallTime are basically zero (ms, I presume, right?).
>>>>> However, scanNext_median is 101 (ms too, right?). I am not sure how this
>>>>> adds up. Also, even though the 101 figure seems outrageously high and I
>>>>> don't know why, all these scans should still be happening in parallel, so
>>>>> the overall call should finish fast, given that no hardware resource is
>>>>> contended, right? But this is not what's happening, so I have to be
>>>>> missing something(s).
>>>>>
>>>>> So, any help is appreciated there.
>>>>>
>>>>> Thanks,
>>>>> Khaled
>>>>
>>>>
>>
>>
                                          
