-u...@hbase.apache.org,+user@phoenix.apache.org Are you trying to use the secondary index directly in the join query (VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX) or is it just another table? as we don't support direct reference to the secondary index, so either you need to use HINTs or refactor your query/schema to use indices automatically.
Regards, Ankit Singhal On Mon, Oct 11, 2021 at 2:08 PM Josh Elser <els...@apache.org> wrote: > No worries. Thanks for confirming! > > On 10/10/21 1:43 PM, Simon Mottram wrote: > > Hi > > > > Thanks for the reply, I posted here by mistake and wasn't sure how to > delete. It's indeed a problem with phoenix > > > > Sorry to waste your time > > > > Cheers > > > > S > > > > > > > > > > ________________________________ > > From: Josh Elser <els...@apache.org> > > Sent: Saturday, 9 October 2021 3:25 am > > To: u...@hbase.apache.org > > Subject: Re: Major problem for us with Phoenix joins with certain > aggregations > > > > That error sounds like a bug in Phoenix. > > > > Maybe you could try with a newer version of Phoenix? Asking over on > > user@phoenix might net a better result. > > > > On 9/27/21 11:47 PM, Simon Mottram wrote: > >> Forgot to mention this is only an issue for LAST_VALUE (so far!) > >> > >> This works fine > >> > >> SELECT > >> "BIOMATERIAL_NAME", > >> AVG("PLANT_FRUIT_COUNT") > >> FROM > >> VARIABLE_VALUES_QA.OBSERVATION > >> WHERE > >> EXISTS ( > >> SELECT > >> DOCID > >> FROM > >> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX > >> WHERE > >> DOCID = OBSERVATION_VALUE_ID > >> AND TAGNAME = 'TRIAL_ID' > >> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe') > >> GROUP BY > >> "BIOMATERIAL_NAME" > >> LIMIT 10 > >> OFFSET 0; > >> ________________________________ > >> From: Simon Mottram <simon.mott...@cucumber.co.nz> > >> Sent: 28 September 2021 4:34 PM > >> To: u...@hbase.apache.org <u...@hbase.apache.org> > >> Subject: Major problem for us with Phoenix joins with certain > aggregations > >> > >> Hi > >> > >> Got my fingers crossed that there's a work around for this as this > really is a big problem for us > >> > >> We are using: > >> > >> Amazon EMR > >> > >> Release label:emr-6.1.0 > >> Hadoop distribution:Amazon > >> Applications:Hbase 2.2.5, Hive 3.1.2, Phoenix 5.0.0, Pig 0.17.0 > >> > >> Thin Client version: > >> phoenix-5.0.0-HBase-2.0-thin-client.jar > >> > >> We get the following error when doing an aggregation where > >> > >> 1. A JOIN is empty > >> 2. The column is INTEGER or DATETIME > >> > >> Remote driver error: IllegalArgumentException: offset (25) + length (4) > exceed the capacity of the array: 25 > >> > >> The query that breaks is: > >> > >> SELECT > >> "BIOMATERIAL_NAME", > >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP ( > >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count" > >> FROM > >> VARIABLE_VALUES_QA.OBSERVATION > >> JOIN VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX > >> ON DOCID = OBSERVATION_VALUE_ID > >> AND TAGNAME = 'TRIAL_ID' > >> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe' > >> GROUP BY > >> "BIOMATERIAL_NAME" > >> LIMIT 10 > >> OFFSET 0; > >> > >> I can refactor this using EXIST but get same error, presumably the > driver knows to treat them the same: > >> > >> SELECT > >> "BIOMATERIAL_NAME", > >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP ( > >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count" > >> FROM > >> VARIABLE_VALUES_QA.OBSERVATION > >> WHERE > >> EXISTS ( > >> SELECT > >> DOCID > >> FROM > >> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX > >> WHERE > >> DOCID = OBSERVATION_VALUE_ID > >> AND TAGNAME = 'TRIAL_ID' > >> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe') > >> GROUP BY > >> "BIOMATERIAL_NAME" > >> LIMIT 10 > >> OFFSET 0; > >> > >> If we remove the external reference we get no error, regardless of > whether there are any hits or not > >> > >> -- these all work > >> There are no hits for this query > >> > >> SELECT > >> "BIOMATERIAL_NAME", > >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP ( > >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count" > >> FROM > >> VARIABLE_VALUES_QA.OBSERVATION > >> WHERE > >> BIOMATERIAL_TYPE = 'aardvark' > >> GROUP BY > >> "BIOMATERIAL_NAME" > >> LIMIT 10 > >> OFFSET 0; > >> > >> Lots of hits for this query: > >> > >> SELECT > >> "BIOMATERIAL_NAME", > >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP ( > >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count" > >> FROM > >> VARIABLE_VALUES_QA.OBSERVATION > >> GROUP BY > >> "BIOMATERIAL_NAME" > >> LIMIT 10 > >> OFFSET 0; > >> > >> > >> > >> > >> > > >