Hiya, no it's not a phoenix index. It's my own table. As it turns out, the bug turns up when there is any filtering, either via join or via a where clause.
Check my last example which has a simple Where TRUE ________________________________ From: Ankit Singhal <ankitsingha...@gmail.com> Sent: Tuesday, 12 October 2021 12:30 pm To: user Subject: Re: Major problem for us with Phoenix joins with certain aggregations -u...@hbase.apache.org<mailto:u...@hbase.apache.org>,+user@phoenix.apache.org<mailto:user@phoenix.apache.org> Are you trying to use the secondary index directly in the join query (VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX) or is it just another table? as we don't support direct reference to the secondary index, so either you need to use HINTs or refactor your query/schema to use indices automatically. Regards, Ankit Singhal On Mon, Oct 11, 2021 at 2:08 PM Josh Elser <els...@apache.org<mailto:els...@apache.org>> wrote: No worries. Thanks for confirming! On 10/10/21 1:43 PM, Simon Mottram wrote: > Hi > > Thanks for the reply, I posted here by mistake and wasn't sure how to delete. > It's indeed a problem with phoenix > > Sorry to waste your time > > Cheers > > S > > > > > ________________________________ > From: Josh Elser <els...@apache.org<mailto:els...@apache.org>> > Sent: Saturday, 9 October 2021 3:25 am > To: u...@hbase.apache.org<mailto:u...@hbase.apache.org> > Subject: Re: Major problem for us with Phoenix joins with certain aggregations > > That error sounds like a bug in Phoenix. > > Maybe you could try with a newer version of Phoenix? Asking over on > user@phoenix might net a better result. > > On 9/27/21 11:47 PM, Simon Mottram wrote: >> Forgot to mention this is only an issue for LAST_VALUE (so far!) >> >> This works fine >> >> SELECT >> "BIOMATERIAL_NAME", >> AVG("PLANT_FRUIT_COUNT") >> FROM >> VARIABLE_VALUES_QA.OBSERVATION >> WHERE >> EXISTS ( >> SELECT >> DOCID >> FROM >> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX >> WHERE >> DOCID = OBSERVATION_VALUE_ID >> AND TAGNAME = 'TRIAL_ID' >> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe') >> GROUP BY >> "BIOMATERIAL_NAME" >> LIMIT 10 >> OFFSET 0; >> ________________________________ >> From: Simon Mottram >> <simon.mott...@cucumber.co.nz<mailto:simon.mott...@cucumber.co.nz>> >> Sent: 28 September 2021 4:34 PM >> To: u...@hbase.apache.org<mailto:u...@hbase.apache.org> >> <u...@hbase.apache.org<mailto:u...@hbase.apache.org>> >> Subject: Major problem for us with Phoenix joins with certain aggregations >> >> Hi >> >> Got my fingers crossed that there's a work around for this as this really is >> a big problem for us >> >> We are using: >> >> Amazon EMR >> >> Release label:emr-6.1.0 >> Hadoop distribution:Amazon >> Applications:Hbase 2.2.5, Hive 3.1.2, Phoenix 5.0.0, Pig 0.17.0 >> >> Thin Client version: >> phoenix-5.0.0-HBase-2.0-thin-client.jar >> >> We get the following error when doing an aggregation where >> >> 1. A JOIN is empty >> 2. The column is INTEGER or DATETIME >> >> Remote driver error: IllegalArgumentException: offset (25) + length (4) >> exceed the capacity of the array: 25 >> >> The query that breaks is: >> >> SELECT >> "BIOMATERIAL_NAME", >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP ( >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count" >> FROM >> VARIABLE_VALUES_QA.OBSERVATION >> JOIN VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX >> ON DOCID = OBSERVATION_VALUE_ID >> AND TAGNAME = 'TRIAL_ID' >> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe' >> GROUP BY >> "BIOMATERIAL_NAME" >> LIMIT 10 >> OFFSET 0; >> >> I can refactor this using EXIST but get same error, presumably the driver >> knows to treat them the same: >> >> SELECT >> "BIOMATERIAL_NAME", >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP ( >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count" >> FROM >> VARIABLE_VALUES_QA.OBSERVATION >> WHERE >> EXISTS ( >> SELECT >> DOCID >> FROM >> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX >> WHERE >> DOCID = OBSERVATION_VALUE_ID >> AND TAGNAME = 'TRIAL_ID' >> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe') >> GROUP BY >> "BIOMATERIAL_NAME" >> LIMIT 10 >> OFFSET 0; >> >> If we remove the external reference we get no error, regardless of whether >> there are any hits or not >> >> -- these all work >> There are no hits for this query >> >> SELECT >> "BIOMATERIAL_NAME", >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP ( >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count" >> FROM >> VARIABLE_VALUES_QA.OBSERVATION >> WHERE >> BIOMATERIAL_TYPE = 'aardvark' >> GROUP BY >> "BIOMATERIAL_NAME" >> LIMIT 10 >> OFFSET 0; >> >> Lots of hits for this query: >> >> SELECT >> "BIOMATERIAL_NAME", >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP ( >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count" >> FROM >> VARIABLE_VALUES_QA.OBSERVATION >> GROUP BY >> "BIOMATERIAL_NAME" >> LIMIT 10 >> OFFSET 0; >> >> >> >> >> >