pepijnve commented on issue #17801:
URL: https://github.com/apache/datafusion/issues/17801#issuecomment-3342774125
There seem to be two things going on here.
First, the expression simplification introduced in the commit we're looking
at changes nullability in the schema. For `coalesce` we can easily derive
that the result is not nullable if at least one of the expressions is not
nullable. For the rewritten `case` expression this is feasible in theory, but the
necessary analysis to conclude that `CASE WHEN x IS NOT NULL THEN x ELSE y END`
is not nullable if `y` is not nullable does not seem to be implemented yet.
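To make the difference concrete, here is a minimal sketch of the two derivations on a toy expression tree. The `Expr` enum, the `is_nullable` function and the `guard_aware` flag are all hypothetical names invented for this illustration; none of this is DataFusion's actual `Expr` type or nullability code. With `guard_aware = false` the `CASE` rule is the conservative one (nullable if any branch is nullable); with `guard_aware = true` it additionally recognises the `WHEN x IS NOT NULL THEN x` pattern.

```rust
// Toy expression tree for illustration only; NOT DataFusion's `Expr`.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column { name: String, nullable: bool },
    Literal(Option<i64>),
    Coalesce(Vec<Expr>),
    IsNotNull(Box<Expr>),
    // CASE WHEN <when> THEN <then> ELSE <otherwise> END
    Case { when: Box<Expr>, then: Box<Expr>, otherwise: Box<Expr> },
}

// Hypothetical nullability derivation. `guard_aware` switches on the
// extra analysis for `CASE WHEN x IS NOT NULL THEN x ELSE y END`.
fn is_nullable(e: &Expr, guard_aware: bool) -> bool {
    match e {
        Expr::Column { nullable, .. } => *nullable,
        Expr::Literal(v) => v.is_none(),
        // COALESCE is NULL only if every argument can be NULL.
        Expr::Coalesce(args) => args.iter().all(|a| is_nullable(a, guard_aware)),
        // IS NOT NULL always yields true or false, never NULL.
        Expr::IsNotNull(_) => false,
        Expr::Case { when, then, otherwise } => {
            // The THEN branch is only reached when `x IS NOT NULL` holds,
            // so if THEN is that same `x` it cannot produce NULL.
            let guarded = guard_aware
                && matches!(&**when, Expr::IsNotNull(inner) if **inner == **then);
            let then_nullable = !guarded && is_nullable(then, guard_aware);
            then_nullable || is_nullable(otherwise, guard_aware)
        }
    }
}
```

For a nullable `x` and the literal `0`, `coalesce(x, 0)` comes out non-nullable either way, while the equivalent `CASE` form is reported as nullable unless `guard_aware` is set. That would match the nullability change described above, assuming the real derivation is conservative in the same way.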
Second, something seems to be going wrong with the schema of the logical
plan. After parsing the SQL, the relevant portion of the logical plan is
```
Union [..., sales_cnt:Int64, ...]
  Projection: ..., CAST(catalog_sales.cs_quantity AS Int64) - CASE WHEN __common_expr_7 IS NOT NULL THEN __common_expr_7 ELSE Int64(0) END AS sales_cnt, ... [..., sales_cnt:Int64;N, ...]
```
Note that `sales_cnt` is marked nullable in the projection, but not nullable
in the union. My suspicion is that after the expression simplification the
schema is not being updated correctly.
After a second optimisation pass we get
```
Union [..., sales_cnt:Int64;N, ...]
  Projection: ..., CAST(catalog_sales.cs_quantity AS Int64) - CASE WHEN __common_expr_7 IS NOT NULL THEN __common_expr_7 ELSE Int64(0) END AS sales_cnt, ... [..., sales_cnt:Int64;N, ...]
```
In other words, the schema error seems to be getting resolved as a side
effect of doing another rewrite pass over the tree.
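The suspected mechanism can be sketched with a toy plan in which each node caches its own schema. Everything here (`Field`, `Plan`, `derive_union_schema`, `recompute_schema`) is a hypothetical model for illustration, not DataFusion's `LogicalPlan` API, and the rule that a union column is nullable if any input column is nullable is an assumption of the sketch.

```rust
// Toy plan with per-node cached schemas; NOT DataFusion's `LogicalPlan`.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

#[derive(Debug)]
enum Plan {
    Projection { schema: Vec<Field> },
    Union { inputs: Vec<Plan>, schema: Vec<Field> },
}

impl Plan {
    // Returns the schema cached on this node, without recomputing it.
    fn schema(&self) -> &[Field] {
        match self {
            Plan::Projection { schema } | Plan::Union { schema, .. } => schema,
        }
    }
}

// Assumed rule: names come from the first input, and a column is
// nullable if it is nullable in any input. Panics on an empty union.
fn derive_union_schema(inputs: &[Plan]) -> Vec<Field> {
    inputs[0]
        .schema()
        .iter()
        .enumerate()
        .map(|(i, f)| Field {
            name: f.name.clone(),
            nullable: inputs.iter().any(|p| p.schema()[i].nullable),
        })
        .collect()
}

// Bottom-up pass that refreshes cached union schemas from their inputs.
fn recompute_schema(plan: &mut Plan) {
    if let Plan::Union { inputs, schema } = plan {
        for input in inputs.iter_mut() {
            recompute_schema(input);
        }
        *schema = derive_union_schema(inputs.as_slice());
    }
}
```

If a rewrite flips `sales_cnt` to nullable in the projection's schema without touching the parent, the union keeps reporting the old, non-nullable field until something like `recompute_schema` runs over the tree. In this model that is exactly the mismatch in the first plan above, and the fix-up corresponds to the second pass.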