Re: Consider parallel for lateral subqueries with limit

Brian Davis Sun, 06 Dec 2020 16:34:19 -0800

> Note that near the end of grouping planner we have a similar check:
>
> if (final_rel->consider_parallel && root->query_level > 1 &&
>         !limit_needed(parse))
> 
> guarding copying the partial paths from the current rel to the final
> rel. I haven't managed to come up with a test case that exposes that


Played around with this a bit, here's a non-correlated subquery that gets us to 
that if statement

DROP TABLE IF EXISTS foo;
CREATE TABLE foo (bar int);

INSERT INTO foo (bar)
SELECT
  g
FROM
  generate_series(1, 10000) AS g;


SELECT
  (
    SELECT
      bar
    FROM
      foo
    LIMIT 1
  ) AS y
FROM
  foo;


I also was thinking about the LATERAL part.

I couldn't think of any reason why the uncorrelated subquery's results would 
need to be shared and therefore the same, when we'll be "looping" over each row 
of the source table, running the subquery anew for each, conceptually.

But then I tried this...

test=# CREATE TABLE foo (bar int);
CREATE TABLE
test=#
test=# INSERT INTO foo (bar)
test-# SELECT
test-#   g
test-# FROM
test-#   generate_series(1, 10) AS g;
INSERT 0 10
test=#
test=#
test=# SELECT
test-#   foo.bar,
test-#   lat.bar
test-# FROM
test-#   foo JOIN LATERAL (
test(#     SELECT
test(#       bar
test(#     FROM
test(#       foo AS foo2
test(#     ORDER BY
test(#       random()
test(#     LIMIT 1
test(#   ) AS lat ON true;
 bar | bar
-----+-----
   1 |   7
   2 |   7
   3 |   7
   4 |   7
   5 |   7
   6 |   7
   7 |   7
   8 |   7
   9 |   7
  10 |   7
(10 rows)


As you can see, random() is only called once.  If postgres were supposed to be 
running the subquery for each source row, conceptually, it would be a mistake 
to cache the results of a volatile function like random().

The docs say: "When a FROM item contains LATERAL cross-references, evaluation 
proceeds as follows: for each row of the FROM item providing the 
cross-referenced column(s), or set of rows of multiple FROM items providing the 
columns, the LATERAL item is evaluated using that row or row set's values of 
the columns. The resulting row(s) are joined as usual with the rows they were 
computed from. This is repeated for each row or set of rows from the column 
source table(s)."

They don't say what happens with LATERAL when there aren't cross-references 
though.  As we expect, adding one does show random() being called once for each 
source row.

test=# SELECT
test-#   foo.bar,
test-#   lat.bar
test-# FROM
test-#   foo JOIN LATERAL (
test(#     SELECT
test(#       bar
test(#     FROM
test(#       foo AS foo2
test(#     WHERE
test(#       foo2.bar < foo.bar + 100000
test(#     ORDER BY
test(#       random()
test(#     LIMIT 1
test(#   ) AS lat ON true;
 bar | bar
-----+-----
   1 |   5
   2 |   8
   3 |   3
   4 |   4
   5 |   5
   6 |   5
   7 |   1
   8 |   3
   9 |   7
  10 |   3
(10 rows)

It seems like to keep the same behavior that exists today, results of LATERAL 
subqueries would need to be the same if they aren't correlated, and so you 
couldn't run them in parallel with a limit if the order wasn't guaranteed.  But 
I'll be the first to admit that it's easy enough for me to miss a key piece of 
logic on something like this, so I could be way off base too.

Re: Consider parallel for lateral subqueries with limit

Reply via email to