Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

Илья Жарков Sat, 07 Dec 2024 15:04:31 -0800

I've just realized that I replied not to the whole mailing list. This is a
duplicate of mail sent to Andres.


Thank you for the quick response.

Some background using code and concepts of particular languages and
frameworks:

Exactly such queries are automatically built by Hibernate if using
polymorphic queries with fetching associations of sub-classes using
function treat.
https://docs.jboss.org/hibernate/orm/current/userguide/html_single/Hibernate_User_Guide.html#hql-function-treat

Reproducer:

@Entity
@Inheritance
@DiscriminatorValue("A")
class Parent {
    @Id
    Integer id;
}

@Entity
@DiscriminatorValue("B")
class ParentB extends Parent {
    @OneToOne(mappedBy = "parent")
    Child child;
}

@Entity
class Child {
    @Id
    Integer id;

    @OneToOne
    @MapsId
    @JoinColumn(name = "id")
    private ParentB parent;
}

List<Parent> resultList = session.createQuery(
        """
        from Parent p
        left join fetch treat(p as ParentB).child
        where p.id = :id
        """, Parent.class)
        .setParameter("id", 1)
        .getResultList();

вс, 8 дек. 2024 г. в 01:21, Andres Freund <and...@anarazel.de>:

> On 2024-12-07 16:37:32 -0500, Andres Freund wrote:
> > On 2024-12-07 21:30:46 +0300, Илья Жарков wrote:
> > > Note that the only record in *parent *table has dtype == 'A', but the
> join
> > > condition has p.dtype = 'B'.
> > > The query plan still shows Index Only Scan on *child *table with
> loops=1.
> >
> > The relevant difference between the inner and left join is that for the
> inner
> > join we push down the p.dtype = 'B'::text condition down. However, we do
> *not*
> > do so for outer joins.
> >
> > Outer:
> > ┌───────────────────────────────────────────────────┐
> > │                    QUERY PLAN                     │
> > ├───────────────────────────────────────────────────┤
> > │ Nested Loop Left Join                             │
> > │   Join Filter: (p.dtype = 'B'::text)              │
> > │   ->  Index Scan using parent_pkey on parent p    │
> > │         Index Cond: (id = 1)                      │
> > │   ->  Index Only Scan using child_pkey on child c │
> > │         Index Cond: (id = 1)                      │
> > └───────────────────────────────────────────────────┘
> >
> > Inner:
> > ┌───────────────────────────────────────────────────┐
> > │                    QUERY PLAN                     │
> > ├───────────────────────────────────────────────────┤
> > │ Nested Loop                                       │
> > │   ->  Index Scan using parent_pkey on parent p    │
> > │         Index Cond: (id = 1)                      │
> > │         Filter: (dtype = 'B'::text)               │
> > │   ->  Index Only Scan using child_pkey on child c │
> > │         Index Cond: (id = 1)                      │
> > └───────────────────────────────────────────────────┘
> >
> >
> > We *do* have code that recognizes the case where a clause in a join's ON
> only
> > references the nullable side. We however don't have code that recognizes
> the
> > same if it's the non-nullable side.
> >
> > That's somewhat surprising, but it does kinda make sense: A after all,
> in a
> > query like yours, you could just have had the p.dtype = 'B' in the WHERE
> list,
> > rather than inside the join's ON. The same isn't true for the nullable
> side of
> > the join, as a condition for te nullable side in the WHERE clause breaks
> the
> > join's "outerness".
> >
> > I.e. you can write your query as
> >   SELECT * FROM parent p LEFT JOIN child c ON p.id = c.id WHERE p.id =
> 1 AND p.dtype = 'B';
> > in which case you get the expected query plan:
> >
> ┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> > │                                                         QUERY PLAN
>                                                       │
> >
> ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
> > │ Nested Loop Left Join  (cost=0.31..16.36 rows=1 width=40) (actual
> time=0.035..0.036 rows=0 loops=1)                         │
> > │   ->  Index Scan using parent_pkey on parent p  (cost=0.15..8.17
> rows=1 width=36) (actual time=0.034..0.034 rows=0 loops=1) │
> > │         Index Cond: (id = 1)
>                                                       │
> > │         Filter: (dtype = 'B'::text)
>                                                      │
> > │         Rows Removed by Filter: 1
>                                                      │
> > │   ->  Index Only Scan using child_pkey on child c  (cost=0.15..8.17
> rows=1 width=4) (never executed)                        │
> > │         Index Cond: (id = 1)
>                                                       │
> > │         Heap Fetches: 0
>                                                      │
> > │ Planning Time: 29.912 ms
>                                                       │
> > │ Execution Time: 0.095 ms
>                                                       │
> >
> └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
>
> Ah, wait - that's not actually equivalent. When p.dtype = 'B' inthe ON
> clause,
> we still produce an output row, but not if in the WHERE clause.
>
> So:
>
> > ISTM that it shouldn't be expensive to recognize this type of join
> clause and
> > pushes them down. While it could be done by the query's author, it seems
> worth
> > handling this on our side.  But maybe I'm missing something here?
>
> yes, I was missing something, it would not be a valid transformation.
>
>
> I guess it's still somewhat silly that we do an index scan for the inner
> side,
> even though we could know that we'll always fail to the joinqual.  We could
> evaluate the qual after fetching the outer row, before fetching the
> matching
> inner row.
>
> It's might not be worth adding code to handle such cases to the the nested
> loop code, this is probably not that common a query pattern. If we don't
> want
> to have explict execution time code paths, we could emit a "constant
> qualification" Result node above the inner side of a parametrized nested
> loop?
>
> Greetings,
>
> Andres Freund
>

Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table

Reply via email to