Hiya, no it's not a phoenix index. It's my own table.

As it turns out, the bug turns up when there is any filtering, either via join 
or via a where clause.

Check my last example which has a simple Where TRUE


________________________________
From: Ankit Singhal <ankitsingha...@gmail.com>
Sent: Tuesday, 12 October 2021 12:30 pm
To: user
Subject: Re: Major problem for us with Phoenix joins with certain aggregations

-u...@hbase.apache.org<mailto:u...@hbase.apache.org>,+user@phoenix.apache.org<mailto:user@phoenix.apache.org>

Are you trying to use the secondary index directly in the join query 
(VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX) or is it just another table? as we 
don't support direct reference to the secondary index, so either you need to 
use HINTs or refactor your query/schema to use indices automatically.

Regards,
Ankit Singhal

On Mon, Oct 11, 2021 at 2:08 PM Josh Elser 
<els...@apache.org<mailto:els...@apache.org>> wrote:
No worries. Thanks for confirming!

On 10/10/21 1:43 PM, Simon Mottram wrote:
> Hi
>
> Thanks for the reply, I posted here by mistake and wasn't sure how to delete. 
>  It's indeed a problem with phoenix
>
> Sorry to waste your time
>
> Cheers
>
> S
>
>
>
>
> ________________________________
> From: Josh Elser <els...@apache.org<mailto:els...@apache.org>>
> Sent: Saturday, 9 October 2021 3:25 am
> To: u...@hbase.apache.org<mailto:u...@hbase.apache.org>
> Subject: Re: Major problem for us with Phoenix joins with certain aggregations
>
> That error sounds like a bug in Phoenix.
>
> Maybe you could try with a newer version of Phoenix? Asking over on
> user@phoenix might net a better result.
>
> On 9/27/21 11:47 PM, Simon Mottram wrote:
>> Forgot to mention this is only an issue for LAST_VALUE (so far!)
>>
>> This works fine
>>
>>    SELECT
>> "BIOMATERIAL_NAME",
>> AVG("PLANT_FRUIT_COUNT")
>> FROM
>> VARIABLE_VALUES_QA.OBSERVATION
>> WHERE
>> EXISTS (
>> SELECT
>> DOCID
>> FROM
>> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
>> WHERE
>> DOCID = OBSERVATION_VALUE_ID
>> AND TAGNAME = 'TRIAL_ID'
>> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
>> GROUP BY
>> "BIOMATERIAL_NAME"
>> LIMIT 10
>>    OFFSET 0;
>> ________________________________
>> From: Simon Mottram 
>> <simon.mott...@cucumber.co.nz<mailto:simon.mott...@cucumber.co.nz>>
>> Sent: 28 September 2021 4:34 PM
>> To: u...@hbase.apache.org<mailto:u...@hbase.apache.org> 
>> <u...@hbase.apache.org<mailto:u...@hbase.apache.org>>
>> Subject: Major problem for us with Phoenix joins with certain aggregations
>>
>> Hi
>>
>> Got my fingers crossed that there's a work around for this as this really is 
>> a big problem for us
>>
>> We are using:
>>
>> Amazon EMR
>>
>> Release label:emr-6.1.0
>> Hadoop distribution:Amazon
>> Applications:Hbase 2.2.5, Hive 3.1.2, Phoenix 5.0.0, Pig 0.17.0
>>
>> Thin Client version:
>> phoenix-5.0.0-HBase-2.0-thin-client.jar
>>
>> We get the following error when doing an aggregation where
>>
>>     1.  A JOIN is empty
>>     2.  The column is INTEGER or DATETIME
>>
>> Remote driver error: IllegalArgumentException: offset (25) + length (4) 
>> exceed the capacity of the array: 25
>>
>> The query that breaks is:
>>
>> SELECT
>> "BIOMATERIAL_NAME",
>> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
>> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
>> FROM
>> VARIABLE_VALUES_QA.OBSERVATION
>> JOIN VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
>> ON DOCID = OBSERVATION_VALUE_ID
>>     AND TAGNAME = 'TRIAL_ID'
>> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe'
>> GROUP BY
>> "BIOMATERIAL_NAME"
>> LIMIT 10
>>    OFFSET 0;
>>
>> I can refactor this using EXIST but get same error, presumably the driver 
>> knows to treat them the same:
>>
>> SELECT
>> "BIOMATERIAL_NAME",
>> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
>> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
>> FROM
>> VARIABLE_VALUES_QA.OBSERVATION
>> WHERE
>> EXISTS (
>> SELECT
>> DOCID
>> FROM
>> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
>> WHERE
>> DOCID = OBSERVATION_VALUE_ID
>> AND TAGNAME = 'TRIAL_ID'
>> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
>> GROUP BY
>> "BIOMATERIAL_NAME"
>> LIMIT 10
>>    OFFSET 0;
>>
>> If we remove the external reference we get no error, regardless of whether 
>> there are any hits or not
>>
>> -- these all work
>> There are no hits for this query
>>
>> SELECT
>> "BIOMATERIAL_NAME",
>> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
>> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
>> FROM
>> VARIABLE_VALUES_QA.OBSERVATION
>> WHERE
>> BIOMATERIAL_TYPE = 'aardvark'
>> GROUP BY
>> "BIOMATERIAL_NAME"
>> LIMIT 10
>>    OFFSET 0;
>>
>> Lots of hits for this query:
>>
>> SELECT
>> "BIOMATERIAL_NAME",
>> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
>> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
>> FROM
>> VARIABLE_VALUES_QA.OBSERVATION
>> GROUP BY
>> "BIOMATERIAL_NAME"
>> LIMIT 10
>>    OFFSET 0;
>>
>>
>>
>>
>>
>

Reply via email to