> On 18 Apr 2022, at 11:56, Pól Ua Laoínecháin <lineh...@tcd.ie> wrote:

(…)

> All  of the code below is available on the fiddle here:
> 
> https://dbfiddle.uk/?rdbms=postgres_13&fiddle=0cc20c9081867131260e6e3550bd08ab

(…)

> OK, grand, now I wish to perform a RECURSIVE CTE on it. So, I start by
> trying something (I thought was) very simple. Obviously, I plan to do
> more, but I wanted to get the "mechanics" correct to start with. So,
> my query is:
> 
> WITH RECURSIVE cte1 (n, ln) AS
> (
>  SELECT 1 AS n, string
>  FROM line

Here is your first problem, this will yield a result for each row in your line 
table, numbering it ‘1’. You seem to have expected just a single result here, 
but that is something that you need to take care of in your query.
This part is called the base case, base step or initial step.

>  UNION ALL
>  SELECT n + 1, ln
>  FROM cte1
>  WHERE n < (SELECT COUNT(*) FROM line)

And then for each of those rows, it will add all those rows (from the same 
CTE!) again.
This part is called the recursive step.

You did add a termination condition here, which indeed manages to terminate, 
but it does so too late.

It seems that you do understand some of the concepts of recursive CTE’s, but 
you appear to be missing some crucial knowledge.

For example, it is actually possible to query multiple trees with a single 
recursive CTE. It is not limited to a single tree. How many trees the CTE will 
navigate depends on how you selected the rows in the base case.

> )
> SELECT * FROM cte1;
> 
> i.e. have a counter variable and a string from the line table

My first question is why you’re using a recursive CTE here? This doesn’t appear 
to be hierarchical data (such as a tree), unless perhaps you intended to 
actually traverse the HTML document hierarchy?

> 
> But, then to my horror, the result of this query is
> 
> 1with t(x) as (values( XMLPARSE(DOCUMENT
> ('<root><NotificationServiceDetails NotificationNo="0"
> AlarmCode="mail" AlarmStartTime="10:00:00" AlarmTime="0" Id ="2"
>> <NotificationServiceDetail
> Id="2"><Title><![CDATA[aaaaaaaaaaaaa]]></Title><ContentJson><![CDATA[
> 1 <html lang="en">
> 1 <head>
> 1 <meta charset="utf-8"/>
> 1 more stuff
> 1 more stuff
> 1 </table>
> 1 </body>
> 1 </html>
> ...
> ... snipped for brevity
> ...
> 
> 256 rows!        <<=== note 256!
> 
> 
> So, my simple recursive CTE is
> 
> a) not incrementing n and

It should be, but only after the first 16 rows from your base case.

> b) obviously doing some sort of CROSS JOIN (16^2 = 256).

Not quite, it’s selecting the same 16 rows 16 times, incrementing the numbering 
each time. Effectively it is indeed a Cartesian product. More on that below.

> I would be grateful if somebody could explain what's going on here. As
> shown in the fiddle, I can do the basic 1 - 10 RCTE and I've done way
> more complex ones before, so I'm a bit baffled as to what's going on
> here.

For a recursive CTE to work, you need two things, and that seems to be where 
you have misunderstood some things:

1. You need a base case that selects the _initial_ rows to start the recursion 
from.
2. You need a recursive step, where you usually join the results of your CTE to 
your table to figure out what the next result should be.

(It is very similar to how this works in procedural languages, where you first 
call a function that initiates the recursion and then that calls a second 
function (with more parameters usually) that calls itself again and again, with 
updated input parameters, until some termination condition is reached and the 
function returns a value.)

For the base case, you will want to make sure that you select the correct 
row(s) from your line table for the initial set. My guess is that it should be 
the row containing the string '<root>'.

For the recursive step, you will want to select the next child row from your 
line table that is (directly) related to the one(s) you selected in the base 
case, or is (directly) related to the previous iteration of the recursive case.
It is unclear to me what that should be in your case, I don’t understand the 
purpose of your CTE.

Your termination condition, WHERE n < (SELECT COUNT(*) FROM line), isn’t quite 
doing what you expect either I think.
Let’s say that c = (SELECT COUNT(*) FROM line), so c = 16 in your example.
- The first iteration of the CTE, the base case, will select 16 lines with n=1.
- The second iteration is the first time in the recursive step, and will select 
all lines from the base case where n < c, or 1 < 16, which is all of them.
- The next iteration will select all rows where 2 < 16, then 3 < 16, … until 
finally in the 16th iteration it stops with the test 16 < 16.

So, you managed to make the recursion terminate at least, but probably not how 
you intended it to work.

Something along the lines of:
WITH RECURSIVE cte (n, ln) AS (
        SELECT 1, string
        FROM line
        WHERE string LIKE '%<root>%'
        UNION ALL
        SELECT n+1, string
        FROM cte
        JOIN line ON iHaveNoClueHere
)
SELECT * FROM cte;

> Any explanations, references to URLs or other advice gratefully received.

There are plenty of explanations about recursive CTE’s in SQL on the Internet. 
I would probably have a good look at how it’s described in the PG 
documentation. There are a couple of good articles on the topic by some PG 
bloggers too.

You’ll find plenty of articles related to how to do this in MSSQL or later 
Oracle versions. Those are totally fine for understanding the principles, and 
even the syntax is almost entirely the same.

For a PG specific explanation, this one looks pretty thorough: 
https://towardsdatascience.com/recursive-sql-queries-with-postgresql-87e2a453f1b

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Reply via email to