I'm in the exact same position Zack described. I'd appreciate your feedback.

So far we have tried tuning the call queue and the handlers, with no luck. Next
we plan to try an off-heap cache.

Kumar Palaniappan   

> On Dec 4, 2015, at 6:45 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote:
> 
> Thanks Satish,
>  
> To clarify: I’m not looking up single rows. I’m looking up the history of 
> each widget, which returns hundreds-to-thousands of results per widget (per 
> query).
>  
> Each query is a range scan; it’s just that I’m performing thousands of them.
>  
> From: Satish Iyengar [mailto:sat...@gmail.com] 
> Sent: Friday, December 04, 2015 9:43 AM
> To: user@phoenix.apache.org
> Subject: Re: Help tuning for bursts of high traffic?
>  
> Hi Zack,
>  
> Did you consider avoiding hitting HBase for every single row by doing that 
> step in an offline mode? I was thinking you could have some kind of daily 
> export of the HBase table and then use Pig to perform the join (co-group, 
> perhaps). Obviously this would only work if your HBase table is not 
> maintained by a stream-based system. HBase is really good at range scans but 
> may not be ideal for a large number of single-row lookups.
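> 
> A rough Java sketch of that offline-join idea, hand-rolled instead of the 
> Pig co-group just to show the shape of it (the export path and the 
> widgetId|samplePoint key format are assumptions, not anything from this 
> thread):
> 
> import java.io.BufferedReader;
> import java.nio.file.Files;
> import java.nio.file.Paths;
> import java.util.HashSet;
> import java.util.Set;
> 
> public class OfflineDedup {
>     public static void main(String[] args) throws Exception {
>         // Load the daily HBase export once: one "widgetId|samplePoint" key per line.
>         Set<String> existingKeys = new HashSet<>();
>         try (BufferedReader export =
>                 Files.newBufferedReader(Paths.get("daily_hbase_export.csv"))) {
>             String line;
>             while ((line = export.readLine()) != null) {
>                 existingKeys.add(line.trim());
>             }
>         }
>         // While reading the new input file, keep only rows whose key is absent,
>         // e.g. if (!existingKeys.contains(widgetId + "|" + samplePoint)) ingest(row);
>     }
> }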
>  
> Thanks,
> Satish
> 
> On Fri, Dec 4, 2015 at 9:09 AM, Riesland, Zack <zack.riesl...@sensus.com> 
> wrote:
> SHORT EXPLANATION: a much higher percentage of queries to Phoenix respond 
> exceptionally slowly after several minutes of very heavy querying.
>  
> LONGER EXPLANATION:
>  
> I’ve been using Phoenix for about a year as a data store for web-based 
> reporting tools, and it works well.
>  
> Now, I’m trying to use the data in a different (much more request-intensive) 
> way and encountering some issues.
>  
> The scenario is basically this:
>  
> Daily, ingest very large CSV files with data for widgets.
>  
> Each input file has hundreds of rows of data for each widget, and tens of 
> thousands of unique widgets.
>  
> As a first step, I want to de-duplicate this data against my Phoenix-based DB 
> (I can’t rely on just upserting the data for de-dup because it will go 
> through several ETL steps before being stored into Phoenix/HBase).
>  
> So, per widget, I perform a query against Phoenix (the table is keyed on the 
> unique widget ID + sample point). I get all the data for a given widget ID 
> within a certain period of time, and then I only ingest rows for that widget 
> that are new to me.
>  
> I’m doing this in Java in a single step: I loop through my input file and 
> perform one query per widget, using the same Connection object to Phoenix.
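> 
> A minimal sketch of that loop (the table and column names WIDGET_HISTORY, 
> WIDGET_ID and SAMPLE_POINT, the ZooKeeper host, and the date window are 
> stand-ins, not the real ones from my schema):
> 
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.PreparedStatement;
> import java.sql.ResultSet;
> import java.sql.Timestamp;
> import java.util.HashSet;
> import java.util.Set;
> 
> public class WidgetHistoryLookup {
>     public static void main(String[] args) throws Exception {
>         // One Phoenix connection, reused for every per-widget range scan.
>         try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
>              PreparedStatement ps = conn.prepareStatement(
>                      "SELECT SAMPLE_POINT FROM WIDGET_HISTORY "
>                      + "WHERE WIDGET_ID = ? AND SAMPLE_POINT >= ? AND SAMPLE_POINT < ?")) {
>             for (String widgetId : widgetIdsFromInputFile()) {
>                 ps.setString(1, widgetId);
>                 ps.setTimestamp(2, Timestamp.valueOf("2015-11-01 00:00:00"));
>                 ps.setTimestamp(3, Timestamp.valueOf("2015-12-01 00:00:00"));
>                 Set<Timestamp> alreadyStored = new HashSet<>();
>                 try (ResultSet rs = ps.executeQuery()) {
>                     while (rs.next()) {
>                         alreadyStored.add(rs.getTimestamp(1));
>                     }
>                 }
>                 // Only input rows for this widget whose sample point is not in
>                 // alreadyStored get passed on to the ETL/ingest step.
>             }
>         }
>     }
> 
>     // Placeholder for parsing the daily CSV into its distinct widget IDs.
>     static Set<String> widgetIdsFromInputFile() {
>         return new HashSet<>();
>     }
> }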
>  
> THE ISSUE:
>  
> What I’m finding is that for the first several thousand queries, I almost 
> always get a very fast (less than 10 ms) response (good).
>  
> But after 15-20 thousand queries, responses start to get MUCH slower. 
> Some queries respond as expected, but many take as long as 2-3 minutes, 
> pushing the total time to prime the data structure into the 12-15 hour range, 
> when it would only take 2-3 hours if all the queries were fast.
>  
> The exact same queries, when run manually and not as part of this bulk 
> process, return in the expected < 10 ms.
>  
> So it SEEMS like the burst of queries puts Phoenix into some sort of busy 
> state that causes it to respond far too slowly.
>  
> The connection properties I’m setting are:
>  
> phoenix.query.timeoutMs: 90000
> phoenix.query.keepAliveMs: 90000
> phoenix.query.threadPoolSize: 256
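> 
> If those are being set client-side via JDBC, a minimal sketch of how they 
> would be passed (the property names are the standard Phoenix client 
> settings; the URL is a stand-in):
> 
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.util.Properties;
> 
> public class PhoenixConnectionSettings {
>     static Connection open() throws Exception {
>         Properties props = new Properties();
>         props.setProperty("phoenix.query.timeoutMs", "90000");
>         props.setProperty("phoenix.query.keepAliveMs", "90000");
>         props.setProperty("phoenix.query.threadPoolSize", "256");
>         // Client-side settings apply to connections opened with these props.
>         return DriverManager.getConnection("jdbc:phoenix:zk-host", props);
>     }
> }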
>  
> Our cluster has 9 (beefy) region servers, and the table I’m referencing has 
> 511 regions. We went through a lot of pain to get the data split extremely 
> well, and I don’t think schema design is the issue here.
>  
> Can anyone help me understand how to make this better? Is there a better 
> approach I could take? A better set of configuration parameters? Is our 
> cluster just too small for this?
> 
> Thanks!
> 
> --
> Satish Iyengar
> 
> "Anyone who has never made a mistake has never tried anything new."
> Albert Einstein
