Thanks James! I've made a JIRA ticket here: https://issues.apache.org/jira/projects/PHOENIX/issues/PHOENIX-4666

This is a priority for us at 23andMe, since it substantially affects some of our queries, so we'd be happy to provide a patch if the Phoenix maintainers can offer some guidance on the design. I've also put a question in the JIRA ticket regarding the approach to take.
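To make the design discussion concrete, here is a very rough sketch of the kind of RHS-query-to-cache-identifier map James suggests below. The class and method names are made up for illustration; this is not actual Phoenix code:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch only -- not real Phoenix classes or APIs.
// The idea: key the server-side hash join cache by a canonical form of the
// RHS query, so a later join that builds the same RHS can reuse the
// already-populated cache instead of re-running the subquery.
public class RhsCacheRegistry {

    // Canonical RHS query text (or a plan digest) -> identifier of the
    // server cache that already holds its results.
    private final ConcurrentMap<String, byte[]> cacheIdByRhsQuery =
            new ConcurrentHashMap<>();

    /** Returns the existing cache id for this RHS query, or null if none. */
    public byte[] lookup(String canonicalRhsQuery) {
        return cacheIdByRhsQuery.get(canonicalRhsQuery);
    }

    /** Records the cache id once the RHS has been executed and shipped. */
    public void register(String canonicalRhsQuery, byte[] cacheId) {
        cacheIdByRhsQuery.putIfAbsent(canonicalRhsQuery, cacheId);
    }

    /** Should be called when the server-side cache is expired or evicted. */
    public void invalidate(String canonicalRhsQuery) {
        cacheIdByRhsQuery.remove(canonicalRhsQuery);
    }
}

The parts I'm least sure about are where such a map should live (client vs. region server), how entries get invalidated when the underlying data changes, and how long the server-side cache should be retained, so any pointers there would be helpful.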
On Thu, Mar 15, 2018 at 10:44 AM, James Taylor <jamestay...@apache.org> wrote:

Hi Marcell,

Yes, that's correct - the cache we build for the RHS is only kept around while the join query is being executed. It'd be interesting to explore keeping the cache around longer for cases like yours (and probably not too difficult). We'd need to keep a map from the RHS query to its hash join cache identifier and, if an entry is found, skip re-running that query. Would you mind filing a JIRA so we can explore further?

Thanks,
James

On Wed, Mar 14, 2018 at 3:40 PM, Marcell Ortutay <mortu...@23andme.com> wrote:

A quick update--I did some inspection of the Phoenix codebase, and it looks like my understanding of the coprocessor cache was incorrect. I thought it was meant to be used across queries, e.g. that the RHS of the join would be saved for subsequent queries. In fact this is not the case: the coprocessor cache lives only for the duration of the query. This explains the performance difference--Phoenix is re-running a long subquery for each join, whereas my direct-to-HBase script saves those results across queries.

On Tue, Mar 13, 2018 at 4:56 PM, Marcell Ortutay <mortu...@23andme.com> wrote:

Hi James,

Thanks for the tips. Our row keys are (I think) reasonably optimized. I've made a gist with an anonymized version of the query, which indicates which conditions are / are not part of the PK. It is here: https://gist.github.com/ortutay23andme/12f03767db13343ee797c328a4d78c9c

I don't (yet) have an anonymized version of the raw HBase Go script available, but after comparing the performance of the two, I've figured out the root cause. The query runs a subquery to produce the LHS of one of the hash joins (see L5 of the gist above). This subquery is quite slow to execute (~1sec) and scans a few million rows. It is shared across all queries, so in the raw HBase script I cached and re-used its results for all queries. This has a very large performance benefit, in particular under high load.

My understanding of Phoenix is that it is supposed to do the same thing. The hash join code appears to have a mechanism for caching the data used in hash joins via the HBase coprocessor system. I would expect this cache to kick in and execute the large subquery only once. Looking at the performance of the query (30sec timeouts after ~2qps), this doesn't seem to be happening.

I'm wondering if my understanding of the Phoenix join cache is right. Is it correct to expect that it would cache the results of a subquery used in a join? If so, what are possible reasons why it would *not* do so? Any guidance on metrics / optimizations to look at would be appreciated.

Thanks,
Marcell

On Thu, Mar 8, 2018 at 2:59 PM, James Taylor <jamestay...@apache.org> wrote:

Hi Marcell,

It'd be helpful to see the table DDL and the query, along with an idea of how many regions might be involved in the query. If a query is commonly run, you'll usually design the row key around optimizing it.
If you have other, simpler queries that have determined your row key, then another alternative is to add one or more secondary indexes. Another common technique is to denormalize your data in ways that precompute the join, to avoid having to do it at run time.

With joins, make sure to order your tables from post-filtered largest (on the LHS) to smallest (on the RHS). Also, if you're joining on the PK of both tables, you should use the USE_SORT_MERGE_JOIN hint. Another common tuning exercise is around determining the best parallelization to use (i.e. guidepost width), or even disabling parallelization for more than an entire region's worth of data.

It'd also be interesting to see the raw HBase code for a query of this complexity.

Thanks,
James

On Thu, Mar 8, 2018 at 1:03 PM, Marcell Ortutay <mortu...@23andme.com> wrote:

Hi,

I am using Phoenix at my company for a large query that is meant to run in real time as part of our application. The query involves several aggregations, anti-joins, and an inner query. Here is the (anonymized) query plan: https://gist.github.com/ortutay23andme/1da620472cc469ed2d8a6fdd0cc7eb01

The query performance on this is not great: it takes about 5sec to execute, and moreover it performs badly under load. If we run ~4qps of this query, Phoenix starts to time out and slow down a lot (queries take >30sec).

For comparison, I wrote a simple Go script that runs a similar query talking directly to HBase. Its performance is substantially better: it executes in ~1.5sec and can handle loads of ~50-100qps on the same cluster.

I'm wondering if anyone has ideas on what might be causing this difference in performance. Are there configs / optimizations we can do in Phoenix to bring the performance closer to direct HBase queries?

I can provide context on the table sizes etc. if needed.

Thanks,
Marcell
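A minimal sketch of the join-ordering and USE_SORT_MERGE_JOIN advice from James's reply above, as it might look from a JDBC client. The connection URL and the table and column names are invented for illustration; only the hint name and the largest-LHS / smallest-RHS ordering come from the thread:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hedged illustration only. EVENTS, DIM, TABLE_A, and TABLE_B are invented
// table names, and the ZooKeeper quorum in the URL is a placeholder.
public class JoinHintExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:zookeeper-host:2181");
             Statement stmt = conn.createStatement()) {

            // Default hash join: order tables so the small side ends up on
            // the RHS, which is the side Phoenix builds into the server-side
            // hash cache. (Shown for the ordering only; not executed here.)
            String hashJoin =
                "SELECT e.user_id, COUNT(*) "
              + "FROM events e "                      // large, post-filtered LHS
              + "JOIN dim d ON e.dim_id = d.dim_id "  // small RHS
              + "GROUP BY e.user_id";

            // When both sides join on their primary keys, hint a sort-merge
            // join instead of the default hash join:
            String sortMergeJoin =
                "SELECT /*+ USE_SORT_MERGE_JOIN */ a.pk, b.val "
              + "FROM table_a a JOIN table_b b ON a.pk = b.pk";

            try (ResultSet rs = stmt.executeQuery(sortMergeJoin)) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " " + rs.getString(2));
                }
            }
        }
    }
}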