I'm in exactly the same position as Zack described. I appreciate your feedback.
So far we have tried tuning the call queue and the handlers, with no luck. We plan to try an off-heap cache next.

Kumar Palaniappan

> On Dec 4, 2015, at 6:45 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote:
>
> Thanks Satish,
>
> To clarify: I'm not looking up single rows. I'm looking up the history of
> each widget, which returns hundreds-to-thousands of results per widget (per
> query).
>
> Each query is a range scan, it's just that I'm performing thousands of them.
>
> From: Satish Iyengar [mailto:sat...@gmail.com]
> Sent: Friday, December 04, 2015 9:43 AM
> To: user@phoenix.apache.org
> Subject: Re: Help tuning for bursts of high traffic?
>
> Hi Zack,
>
> Did you consider avoiding hitting HBase for every single row by doing that
> step in offline mode? I was thinking that if you could have some kind of
> daily export of the HBase table, you could then use Pig to perform the join
> (a co-group, perhaps) to do the same thing. Obviously this would work only
> when your HBase table is not maintained by a stream-based system. HBase is
> really good at range scans and may not be ideal for a large number of
> single-row lookups.
>
> Thanks,
> Satish
>
> On Fri, Dec 4, 2015 at 9:09 AM, Riesland, Zack <zack.riesl...@sensus.com> wrote:
>
> SHORT EXPLANATION: a much higher percentage of queries to Phoenix return
> exceptionally slowly after querying very heavily for several minutes.
>
> LONGER EXPLANATION:
>
> I've been using Phoenix for about a year as a data store for web-based
> reporting tools, and it works well.
>
> Now, I'm trying to use the data in a different (much more request-intensive)
> way and encountering some issues.
>
> The scenario is basically this:
>
> Daily, ingest very large CSV files with data for widgets.
>
> Each input file has hundreds of rows of data for each widget, and tens of
> thousands of unique widgets.
>
> As a first step, I want to de-duplicate this data against my Phoenix-based DB
> (I can't rely on just upserting the data for de-dup because it will go
> through several ETL steps before being stored into Phoenix/HBase).
>
> So, per widget, I perform a query against Phoenix (the table is keyed on the
> unique widget ID + sample point). I get all the data for a given widget ID
> within a certain period of time, and then I only ingest rows for that widget
> that are new to me.
>
> I'm doing this in Java in a single step: I loop through my input file and
> perform one query per widget, using the same Connection object to Phoenix.
>
> THE ISSUE:
>
> What I'm finding is that for the first several thousand queries, I almost
> always get a very fast (less than 10 ms) response (good).
>
> But after 15-20 thousand queries, the responses start to get MUCH slower.
> Some queries respond as expected, but many take as long as 2-3 minutes,
> pushing the total time to prime the data structure into the 12-15 hour
> range, when it would only take 2-3 hours if all the queries were fast.
>
> The exact same queries, when run manually and not as part of this bulk
> process, return in the expected < 10 ms.
>
> So it SEEMS like the burst of queries puts Phoenix into some sort of busy
> state that causes it to respond far too slowly.
>
> The connection properties I'm setting are:
>
> phoenix.query.timeoutMs: 90000
> phoenix.query.keepAliveMs: 90000
> phoenix.query.threadPoolSize: 256
>
> Our cluster is 9 (beefy) region servers, and the table I'm referencing is
> 511 regions. We went through a lot of pain to get the data split extremely
> well, and I don't think schema design is the issue here.
> Can anyone help me understand how to make this better? Is there a better
> approach I could take? A better set of configuration parameters? Is our
> cluster just too small for this?
>
> Thanks!
>
> --
> Satish Iyengar
>
> "Anyone who has never made a mistake has never tried anything new."
> Albert Einstein
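
For concreteness, the per-widget de-dup lookup described in the quoted thread boils down to something like the Java sketch below. This is only an illustration of the pattern being discussed, not anyone's actual code: the table and column names (WIDGET_HISTORY, WIDGET_ID, SAMPLE_TIME, READING), the ZooKeeper quorum in the JDBC URL, the date window, and the helper stubs are all assumptions made for the example, while the phoenix.query.* overrides are the values Zack listed.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

public class WidgetDedupSketch {

    public static void main(String[] args) throws Exception {
        // Connection-level overrides, using the values listed in the thread.
        Properties props = new Properties();
        props.setProperty("phoenix.query.timeoutMs", "90000");
        props.setProperty("phoenix.query.keepAliveMs", "90000");
        props.setProperty("phoenix.query.threadPoolSize", "256");

        // Hypothetical ZooKeeper quorum; replace with the real cluster's quorum.
        String url = "jdbc:phoenix:zk1,zk2,zk3:2181";

        // The table is keyed on (WIDGET_ID, SAMPLE_TIME), so each query below is
        // a range scan over one widget's history rather than a point lookup.
        String sql = "SELECT SAMPLE_TIME, READING FROM WIDGET_HISTORY "
                   + "WHERE WIDGET_ID = ? AND SAMPLE_TIME >= ? AND SAMPLE_TIME < ?";

        try (Connection conn = DriverManager.getConnection(url, props);
             PreparedStatement ps = conn.prepareStatement(sql)) {

            // One query per widget in the input file, reusing the same
            // Connection and PreparedStatement for the whole run.
            for (String widgetId : readWidgetIdsFromCsv(args[0])) {
                ps.setString(1, widgetId);
                // Arbitrary example window; the real job would derive it from
                // the input file.
                ps.setTimestamp(2, Timestamp.valueOf("2015-11-01 00:00:00"));
                ps.setTimestamp(3, Timestamp.valueOf("2015-12-01 00:00:00"));

                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // Remember the existing (widgetId, sampleTime) keys so
                        // new CSV rows can be filtered out before the ETL steps.
                        rememberExistingRow(widgetId, rs.getTimestamp(1));
                    }
                }
            }
        }
    }

    // Stub: the real job parses the daily CSV and returns the unique widget IDs.
    private static List<String> readWidgetIdsFromCsv(String path) {
        return Collections.emptyList();
    }

    // Stub: the real job adds the key to whatever de-dup structure it maintains.
    private static void rememberExistingRow(String widgetId, Timestamp sampleTime) {
        // no-op in this sketch
    }
}

Satish's offline alternative (a daily export of the HBase table, co-grouped with the incoming CSV in Pig) avoids this query burst entirely, at the cost of de-duplicating against a snapshot rather than live data.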