[h2] Re: USE INDEX confusion with joins, v1.4.195

hendrik Sat, 23 Dec 2017 04:25:42 -0800

BTW, when appending NULLS LAST or NULLS FIRST to any of these queries, no 
index is ever used for sorting. And even worse, H2 does not even do the 
efficient join anymore, but uses a tablescan.


explain select A.b_id from A inner join B use index (ab) on A.b_id = B.id 
where A.id = 1 order by B.a nulls first, B.b nulls first;

SELECT
    A.B_ID
FROM PUBLIC.B USE INDEX (AB)
    /* PUBLIC.B.tableScan */
INNER JOIN PUBLIC.A
    /* PUBLIC.PRIMARY_KEY_41_0: ID = 1 */
    ON 1=1
WHERE (A.ID = 1)
    AND (A.B_ID = B.ID)
ORDER BY =B.A NULLS FIRST, =B.B NULLS FIRST

(NULLS LAST is explained the same way)

It seems to me that H2 should use the index with NULLS FIRST, because 
that's the natural order anyway (at  least for ASC). Looking into the code, 
I'm under the impression that a small change in 
org.h2.command.dml.Select#getSortIndex() 
could fix this. It says there:

if (idxCol.sortType != sortTypes[j]) {
    // NULL FIRST for ascending and NULLS LAST
    // for descending would actually match the default
    ok = false;
    break;
}

Apparently the logic suggested in the comment is simply not implemented yet. 
Perhaps this would do it:


if ((idxCol.sortType & SortOrder.DESCENDING) == SortOrder.DESCENDING && 
(idxCol.sortType & SortOrder.NULLS_LAST) == SortOrder.NULLS_LAST
        || (idxCol.sortType & SortOrder.DESCENDING) == SortOrder.ASCENDING && 
(idxCol.sortType & SortOrder.NULLS_FIRST) == SortOrder.NULLS_FIRST) {
    // this is fine
} else if (idxCol.sortType != sortTypes[j]) {
    ok = false;
    break;
}


Also, if H2 realizes that it has to do a tablescan, because of NULLS LAST, it 
should revert to using the efficient join (the standard query plan). I 
understand that the latter is counter to the very idea to specifying an index 
hint and therefore not feasible.


Cheers,

-hendrik


On Thursday, December 21, 2017 at 6:59:08 PM UTC+1, hendrik wrote:
>
> Hey there,
>
> I've been playing with H2 1.4.195, trying to get an ordered resultsset 
> from a query that contains a join.
>
> In the plain vanilla case, H2 tends to create a resultset containing 
> everything from the join plus the sort columns and then sort the set in 
> memory. Here's an example for what I mean:
>
> drop table A if exists;
> drop table B if exists;
>
> create table if not exists B (id bigint, a varchar, b varchar, c varchar, 
> primary key (id));
> create table if not exists A (id bigint, b_id bigint, primary key (id), 
> foreign key (b_id) references B(id));
>
> create index ab on B (a,b);
> create index c on B (c);
>
> So we have two tables A and B. A references B. Primary keys on the id 
> fields. A foreign key on the... well, foreign key.
> Here's my typical query:
>
> select A.b_id from A inner join B on A.b_id = B.id where A.id = 1 order by 
> B.a, B.b;
>
> As mentioned above, I'm using columns B.a and B.b for sorting, but they 
> are not selected. When run with EXPLAIN, this is the result:
>
> SELECT
>     A.B_ID
> FROM PUBLIC.A
>     /* PUBLIC.PRIMARY_KEY_41_0: ID = 1 */
>     /* WHERE A.ID = 1
>     */
> INNER JOIN PUBLIC.B
>     /* PUBLIC.PRIMARY_KEY_4: ID = A.B_ID
>         AND ID = A.B_ID
>      */
>     ON 1=1
> WHERE (A.ID = 1)
>     AND (A.B_ID = B.ID)
> ORDER BY =B.A, =B.B
>
> The join is as efficient as possible, but the sort happens in memory, 
> since the "/* index sorted */" is missing and EXPLAIN does not mention 
> the index "ab", which could have been used.
> So far so good. For some of my data the efficiency of the join is not 
> important (almost all rows from B are returned anyway). What's important to 
> me though is that an index is used for sorting.
> So I've tried to use USE INDEX.
>
> explain select A.b_id from A inner join B use index (ab) on A.b_id = B.id 
> where A.id = 1 order by B.a, B.b;
>
> results in:
>
> SELECT
>     A.B_ID
> FROM PUBLIC.B USE INDEX (AB)
>     /* PUBLIC.AB */
> INNER JOIN PUBLIC.A
>     /* PUBLIC.PRIMARY_KEY_41_0: ID = 1 */
>     ON 1=1
> WHERE (A.ID = 1)
>     AND (A.B_ID = B.ID)
> ORDER BY =B.A, =B.B
> /* index sorted */
>
> Awesome! That's what I wanted.
> But now comes the part that's confusing to me. When I indicate that I want 
> to use index "c"—defined on the column B.c that has nothing to do with my 
> query whatsoever, H2 still uses the correct index (ab):
>
> explain select A.b_id from A inner join B use index (c) on A.b_id = B.id  
> where A.id = 1 order by B.a, B.b;
>
> SELECT
>     A.B_ID
> FROM PUBLIC.B USE INDEX (C)
>     /* PUBLIC.AB */
> INNER JOIN PUBLIC.A
>     /* PUBLIC.PRIMARY_KEY_41_0: ID = 1 */
>     ON 1=1
> WHERE (A.ID = 1)
>     AND (A.B_ID = B.ID)
> ORDER BY =B.A, =B.B
> /* index sorted */
>
> It seems to me, that the USE INDEX hint is disregarded a little, but it's 
> still not the same as without the hint, because magically the best index 
> (ab) is used. So that's kind of odd.
>
> Is this by design? Is this a feature (because the best index is chosen)? 
> Or is this a bug?
>
> Cheers,
>
> -hendrik
>

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.

[h2] Re: USE INDEX confusion with joins, v1.4.195

Reply via email to