Re: [PATCH] Allow multiple recursive self-references

Denis Hirn Fri, 26 Mar 2021 02:22:36 -0700

Thanks for the feedback, Tom.

> Tom Lane <t...@sss.pgh.pa.us> writes:
> [...]
> As far as I can see, the spec flat-out forbids this.  In SQL:2021,
> it's discussed in 7.17 <query expression> syntax rule 3) j) ix), which
> defines [linear recursion]


(Aside: We don't have a copy of the SQL:2021 specification here (all
we've got here is the SQL:2016 variant).  We've browsed the ISO site
and didn't find find SQL:2021 there.  Is that a generally available
draft document?)

> So the problem with extending the spec here is that someday they might
> extend it with some other semantics, and then we're between a rock and
> a hard place.

There are two issues, say [LIN] and [NON-LIN], here.  Re [LIN]:

[LIN] PostgreSQL does not allow multiple references to the recursive
   common table, even if the recursion is LINEAR.  Plenty of examples
   of such queries are found in the documentation of established RDBMSs,
   among these IBM Db2 and MS SQL Server:

   - 
https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_73/sqlp/rbafyrecursivequeries.htm#d60691e2455
     (example "Two tables used for recursion using recursive common
     table expressions")

   - 
https://docs.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-ver15#h-using-multiple-anchor-and-recursive-members
     (example "Using multiple anchor and recursive members")

Db2 and SQL Server process such linear queries with multiple recursive
CTE references just fine.  As does SQLite3.  With the proposed patch,
PostgreSQL would be able to process these, too (and deliver the same
expected result).

> If there were really compelling reasons why (a) we have to have this
> feature and (b) there's only one reasonable way for it to act, hence
> no likelihood that the SQL committee will choose something else, then
> I'd be willing to think about getting out front of the committee.
> But you haven't made either case.
> [...]
> Do they all act the same?  Has anybody that sits on the SQL committee
> done it?  (If you could point to DB2, in particular, I'd have some
> confidence that the committee might standardize on their approach.)

We'd classify the ability of established RDBMSs to cope with the just
mentioned class of queries (and PostgreSQL's inability to do the same)
as one compelling reason.  Also, the existing functionality in Db2 and
SQL Server would be a yardstick for the expected behavior.

> A totally independent question is whether you've even defined a
> well-defined behavior.  There's an interesting comment in the spec:
>
>
>   NOTE 310 — The restrictions insure that each WLEi, viewed as a
>   transformation of the query names of the stratum, is monotonically
>   increasing. According to Tarski’s fixed point theorem, this insures that
>   there is a fixed point. The General Rules use Kleene’s fixed point
>   theorem to define a sequence that converges to the minimal fixed
>   point.
>
>
> I don't know any of the underlying math here, but if you've got a
> join of multiple recursion references then the number of output
> rows could certainly be nonlinear, which very possibly breaks the
> argument that there's a well-defined interpretation.

This brings us to [NON-LIN]:

[NON-LIN] NOTE 310 refers to Tarski's fixed point theorem and the
  prerequisite that the recursive CTE defines a MONOTONIC query (this
  guarantees the existence of a least fixed point which defines
  the meaning of a recursive CTE in the first place).  MONOTONICITY
  and NON-LINEAR recursion, however, are two separate/orthogonal issues.

A linear recursive query can be monotonic or non-monotonic (and PostgreSQL
currently has a system of safeguards that aim to identify the latter
kind of problematic queries, ruling out the use of EXCEPT, aggregation,
and so forth), just like a non-linear query can.  A mononotic non-linear
recursive query approaches a least fixed point which makes the
behavior of a non-linear recursive CTE as well-defined as a linear
CTE.

In fact, the document that led to the inclusion of WITH RECURSIVE into
the SQL standard (see reference [1] below) mentions that "Linear
recursion is a special case of non-linear recursion" (Section 3.9) in
the fixpoint-based semantics.  (It appears that the authors aimed to
introduce non-linear WITH RECURSIVE from the start but the existing
RECURSIVE UNION proposal at the time was restricted to linear recursion;
so they paddled back and suggested to include non-linear recursion in a
future SQL standard update, coined SQL4 back then).

We do agree, though, that the absence of non-linear recursion in the SQL
standard has openened the field for varied interpretations of the
semantics.  MariaDB, for example, admits non-linear recursion once you
set the configuration option standard_compliant_cte to 0.  However, a
recursive table reference then returns *all* rows collected so far (not
just those rows produced by the last iteration). This implicit change of
behavior makes sense for use cases of non-linear recursion, yet may come
as a surprise for query authors.  Since this implicit change could
alternatively be made explicit (and thus, arguably, clearer) in terms of
a UNION with the recursive table reference, we'd argue to keep the
semantics of recursive table references as is.  But before you know, you
end up with diverging semantics for a single SQL construct, just as you said
above.

Given this, we would propose the patch as a means to allow multiple
recursive references for linear queries (see [LIN] above).  This allows
PostgreSQL to catch up with Db2, SQL Server, or SQLite3.  The semantics
in the [LIN] case are clear.

Best wishes,
   --Denis and Torsten

[1] S.J. Finkelstein, N. Mattos, I.S. Mumick, H. Pirahesh,
    "Expressing Recursive Queries in SQL,"" ANSI Document X3H2-96-075r1, 1996,
    https://www.kirusa.com/mumick/pspapers/ansiRevisedRecProp96-075r1.ps.Z

Re: [PATCH] Allow multiple recursive self-references

Reply via email to