kosiew commented on issue #1131: URL: https://github.com/apache/datafusion-python/issues/1131#issuecomment-2903100506
hi @l1t1 , duckdb too goes into an infinite loop: ``` ❯ duckdb v1.1.3 19864453f7 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. D WITH RECURSIVE t(i, lv) AS ( SELECT generate_series AS i, 1 AS lv FROM generate_series(1, 6, 1) UNION ALL SELECT MAX(i), MAX(lv) + 1 FROM t WHERE lv < 2 ) SELECT * FROM t WHERE lv = 2; ``` I think this isn’t a bug in DataFusion-Python so much as a quirk of SQL’s recursive-CTE semantics when you use UNION ALL with an aggregate that never changes its output. Any engine that follows the SQL standard will do the same. DuckDB likewise spins forever because the recursive member keeps re-emitting the same row, and UNION ALL does not remove duplicates, so the CTE never reaches a fixpoint. Amending the query to UNION does complete in duckdb: ``` ❯ duckdb v1.1.3 19864453f7 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. D WITH RECURSIVE t(i, lv) AS ( SELECT generate_series AS i, 1 AS lv FROM generate_series(1, 6, 1) UNION SELECT MAX(i), MAX(lv) + 1 FROM t WHERE lv < 2 ) SELECT * FROM t WHERE lv = 2; ┌───────┬───────┐ │ i │ lv │ │ int64 │ int32 │ ├───────┼───────┤ │ 6 │ 2 │ └───────┴───────┘ ``` But the same query is not yet supported in Datafusion: ``` >>> sql = """ ... with recursive t as(select value i,1 lv from generate_series(1,6,1) ... union ... select max(i),max(lv)+1 from t where lv<2) ... select * from t where lv=2; ... """ >>> df = ctx.sql(sql) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Users/kosiew/GitHub/datafusion-python/python/datafusion/context.py", line 589, in sql return DataFrame(self.ctx.sql(query)) ^^^^^^^^^^^^^^^^^^^ Exception: DataFusion error: NotImplemented("Recursive queries with a distinct 'UNION' (in which the previous iteration's results will be de-duplicated) is not supported") ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org