Lordworms commented on PR #11897: URL: https://github.com/apache/datafusion/pull/11897#issuecomment-2282792372
> Thank you for your work on this PR @Lordworms but I am now pretty confused about these changes 🤔 > > In my understanding, the bug involves he `HAVING` clause -- the `HAVING` clause is like the `WHERE` clause except that it happens _after_ grouping where the `WHERE` clause happens before grouping. > > This [page](https://www.shiksha.com/online-courses/articles/order-of-execution-in-sql/#:~:text=Since%20a%20lot%20of%20people,and%20finally%2C%20LIMIT%2FOFFSET.) has a nice table > > <img alt="Screenshot 2024-08-11 at 7 17 33 AM" width="895" src="https://private-user-images.githubusercontent.com/490673/356869670-2a9cff40-cc70-44b3-854d-973bdfff865c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjMzODg5ODksIm5iZiI6MTcyMzM4ODY4OSwicGF0aCI6Ii80OTA2NzMvMzU2ODY5NjcwLTJhOWNmZjQwLWNjNzAtNDRiMy04NTRkLTk3M2JkZmZmODY1Yy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwODExJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDgxMVQxNTA0NDlaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT01NWE5OTliMzRhYzFkZDViMzZkM2Q4NTZjOGVlMjI1YjkxM2E3NTgyM2UzZjU5MjRmZDhmMGU2MTljZjQ2MWJmJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.2kJp1YaA_i6xt10Cq7YZkwQCxUOoSnaJpLuRcSEWCg0"> > So for a query like > > ```sql > SELECT AVG(v1) FROM t1 GROUP BY false having false; > ``` > > I would expect a query plan that looks something like > > ``` > Filter(expr=false) <-- this is the HAVING(false) > GroupBy(agg=AVG(v1), gby=false) <-- this is the GROUP BY expr > TableScan(t1) > ``` > Yes, I understand, but the gby expr is optimized into an empty set in Optimizer, in order to pass the gby information to ExecutionPlan, I think I can only add an extra boolean member? > When I explained the query we can see that in fact this filter is added > > ``` > > create table t1(v1 int) as values (1), (2); > 0 row(s) fetched. > Elapsed 0.016 seconds. > > > explain verbose SELECT AVG(v1) FROM t1 GROUP BY false having false; > ... > | initial_logical_plan | Projection: avg(t1.v1) | > | | Filter: Boolean(false) | > | | Aggregate: groupBy=[[Boolean(false)]], aggr=[[avg(t1.v1)]] | > | | TableScan: t1 > ``` > > However, then it looks like the filter is pushed down below the group by > > ``` > | logical_plan after push_down_filter | Projection: avg(t1.v1) | > | | Aggregate: groupBy=[[Boolean(false)]], aggr=[[avg(CAST(t1.v1 AS Float64))]] | > | | Filter: Boolean(false) | > | | TableScan: t1 > ``` > > Which finally results in > > ``` > | logical_plan after eliminate_filter | Aggregate: groupBy=[[]], aggr=[[avg(CAST(t1.v1 AS Float64))]] | > | | EmptyRelation > ``` > > So it seems a solution might be to refine the conditions under which filters can be pushed below grouping (perhaps we shouldn't push filters below grouping when there are no column references in the filter 🤔 ) > dismissed their [stale review](#pullrequestreview-2230373615) Also in duckDB, those query do returns 0 rows.  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
