-u...@hbase.apache.org,+user@phoenix.apache.org

Are you trying to use the secondary index directly in the join query
(VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX) or is it just another table? as
we don't support direct reference to the secondary index, so either you
need to use HINTs or refactor your query/schema to use indices
automatically.

Regards,
Ankit Singhal

On Mon, Oct 11, 2021 at 2:08 PM Josh Elser <els...@apache.org> wrote:

> No worries. Thanks for confirming!
>
> On 10/10/21 1:43 PM, Simon Mottram wrote:
> > Hi
> >
> > Thanks for the reply, I posted here by mistake and wasn't sure how to
> delete.  It's indeed a problem with phoenix
> >
> > Sorry to waste your time
> >
> > Cheers
> >
> > S
> >
> >
> >
> >
> > ________________________________
> > From: Josh Elser <els...@apache.org>
> > Sent: Saturday, 9 October 2021 3:25 am
> > To: u...@hbase.apache.org
> > Subject: Re: Major problem for us with Phoenix joins with certain
> aggregations
> >
> > That error sounds like a bug in Phoenix.
> >
> > Maybe you could try with a newer version of Phoenix? Asking over on
> > user@phoenix might net a better result.
> >
> > On 9/27/21 11:47 PM, Simon Mottram wrote:
> >> Forgot to mention this is only an issue for LAST_VALUE (so far!)
> >>
> >> This works fine
> >>
> >>    SELECT
> >> "BIOMATERIAL_NAME",
> >> AVG("PLANT_FRUIT_COUNT")
> >> FROM
> >> VARIABLE_VALUES_QA.OBSERVATION
> >> WHERE
> >> EXISTS (
> >> SELECT
> >> DOCID
> >> FROM
> >> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
> >> WHERE
> >> DOCID = OBSERVATION_VALUE_ID
> >> AND TAGNAME = 'TRIAL_ID'
> >> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
> >> GROUP BY
> >> "BIOMATERIAL_NAME"
> >> LIMIT 10
> >>    OFFSET 0;
> >> ________________________________
> >> From: Simon Mottram <simon.mott...@cucumber.co.nz>
> >> Sent: 28 September 2021 4:34 PM
> >> To: u...@hbase.apache.org <u...@hbase.apache.org>
> >> Subject: Major problem for us with Phoenix joins with certain
> aggregations
> >>
> >> Hi
> >>
> >> Got my fingers crossed that there's a work around for this as this
> really is a big problem for us
> >>
> >> We are using:
> >>
> >> Amazon EMR
> >>
> >> Release label:emr-6.1.0
> >> Hadoop distribution:Amazon
> >> Applications:Hbase 2.2.5, Hive 3.1.2, Phoenix 5.0.0, Pig 0.17.0
> >>
> >> Thin Client version:
> >> phoenix-5.0.0-HBase-2.0-thin-client.jar
> >>
> >> We get the following error when doing an aggregation where
> >>
> >>     1.  A JOIN is empty
> >>     2.  The column is INTEGER or DATETIME
> >>
> >> Remote driver error: IllegalArgumentException: offset (25) + length (4)
> exceed the capacity of the array: 25
> >>
> >> The query that breaks is:
> >>
> >> SELECT
> >> "BIOMATERIAL_NAME",
> >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> >> FROM
> >> VARIABLE_VALUES_QA.OBSERVATION
> >> JOIN VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
> >> ON DOCID = OBSERVATION_VALUE_ID
> >>     AND TAGNAME = 'TRIAL_ID'
> >> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe'
> >> GROUP BY
> >> "BIOMATERIAL_NAME"
> >> LIMIT 10
> >>    OFFSET 0;
> >>
> >> I can refactor this using EXIST but get same error, presumably the
> driver knows to treat them the same:
> >>
> >> SELECT
> >> "BIOMATERIAL_NAME",
> >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> >> FROM
> >> VARIABLE_VALUES_QA.OBSERVATION
> >> WHERE
> >> EXISTS (
> >> SELECT
> >> DOCID
> >> FROM
> >> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
> >> WHERE
> >> DOCID = OBSERVATION_VALUE_ID
> >> AND TAGNAME = 'TRIAL_ID'
> >> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
> >> GROUP BY
> >> "BIOMATERIAL_NAME"
> >> LIMIT 10
> >>    OFFSET 0;
> >>
> >> If we remove the external reference we get no error, regardless of
> whether there are any hits or not
> >>
> >> -- these all work
> >> There are no hits for this query
> >>
> >> SELECT
> >> "BIOMATERIAL_NAME",
> >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> >> FROM
> >> VARIABLE_VALUES_QA.OBSERVATION
> >> WHERE
> >> BIOMATERIAL_TYPE = 'aardvark'
> >> GROUP BY
> >> "BIOMATERIAL_NAME"
> >> LIMIT 10
> >>    OFFSET 0;
> >>
> >> Lots of hits for this query:
> >>
> >> SELECT
> >> "BIOMATERIAL_NAME",
> >> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> >> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> >> FROM
> >> VARIABLE_VALUES_QA.OBSERVATION
> >> GROUP BY
> >> "BIOMATERIAL_NAME"
> >> LIMIT 10
> >>    OFFSET 0;
> >>
> >>
> >>
> >>
> >>
> >
>

Reply via email to