eejbyfeldt opened a new issue, #13060:
URL: https://github.com/apache/datafusion/issues/13060
### Describe the bug
Currently, SimplifyExpressions does not take the volatility of expressions
into account. As a result, it can apply rewrites that change query results
and lead to unexpected behavior.
### To Reproduce
Consider the following query:
```
> explain select * from VALUES (1), (2) where random() = 0 OR (column1 = 2 AND random() = 0);
+---------------+---------------------------------------------+
| plan_type     | plan                                        |
+---------------+---------------------------------------------+
| logical_plan  | Filter: random() = Float64(0)               |
|               |   Values: (Int64(1)), (Int64(2))            |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192 |
|               |   FilterExec: random() = 0                  |
|               |     ValuesExec                              |
|               |                                             |
+---------------+---------------------------------------------+
2 row(s) fetched.
Elapsed 0.013 seconds.
```
The predicate gets simplified to `random() = 0`, which merges the two separate `random()` calls into one.
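To make the difference concrete, here is a minimal sketch that compares the two predicates by exact enumeration. It assumes each `random() = 0` evaluation is an independent event that is true with probability `p`; the value `p = 1/2` and the helper names (`original`, `simplified`, `pass_probability`) are hypothetical and chosen only for illustration, since the real `random() = 0` is almost never true.

```python
from fractions import Fraction
from itertools import product


def original(column1, r1, r2):
    # r1 and r2 stand for the two separate evaluations of `random() = 0`.
    return r1 or (column1 == 2 and r2)


def simplified(column1, r1):
    # After the rewrite only a single evaluation of `random() = 0` remains.
    return r1


def pass_probability(pred, column1, p, draws):
    """Exact probability that a row passes, summing over every combination of draws."""
    total = Fraction(0)
    for outcome in product([True, False], repeat=draws):
        weight = Fraction(1)
        for hit in outcome:
            weight *= p if hit else 1 - p
        if pred(column1, *outcome):
            total += weight
    return total


p = Fraction(1, 2)  # hypothetical stand-in probability for `random() = 0`
for column1 in (1, 2):
    before = pass_probability(original, column1, p, draws=2)
    after = pass_probability(simplified, column1, p, draws=1)
    print(column1, before, after)
# prints:
# 1 1/2 1/2
# 2 3/4 1/2
```

Under these assumptions the row with `column1 = 2` passes the original filter with probability 3/4 but passes the simplified filter with probability 1/2, so the rewrite is observable in the result.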
### Expected behavior
The predicate should not be simplified in a way that deduplicates the
volatile expressions.
```
> explain select * from VALUES (1), (2) where random() = 0 OR (column1 = 2 AND random() = 0);
+---------------+-------------------------------------------------------------------------------+
| plan_type     | plan                                                                          |
+---------------+-------------------------------------------------------------------------------+
| logical_plan  | Filter: random() = Float64(0) OR column1 = Int64(2) AND random() = Float64(0) |
|               |   Values: (Int64(1)), (Int64(2))                                              |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192                                   |
|               |   FilterExec: random() = 0                                                    |
|               |     ValuesExec                                                                |
|               |                                                                               |
+---------------+-------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.013 seconds.
random() = CAST(Int64(0) AS Float64) OR column1 = Int64(2) AND random() = CAST(Int64(0) AS Float64)
```
### Additional context
We cannot exclude volatile expressions from simplification outright, as we
would still like to simplify, for example, the following predicate:
```
> explain select * from VALUES (1), (2) where column1 = 2 OR (column1 = 2 AND random() = 0);
+---------------+---------------------------------------------+
| plan_type     | plan                                        |
+---------------+---------------------------------------------+
| logical_plan  | Filter: column1 = Int64(2)                  |
|               |   Values: (Int64(1)), (Int64(2))            |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192 |
|               |   FilterExec: column1@0 = 2                 |
|               |     ValuesExec                              |
|               |                                             |
+---------------+---------------------------------------------+
2 row(s) fetched.
Elapsed 0.015 seconds.
```
This simplification is fine, as it does not change the result.
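One way to express that distinction is to guard the rewrite `A OR (... AND A) -> A` with a volatility check on `A` only. The sketch below uses a toy expression tree; every class and function name in it (`Col`, `Lit`, `Call`, `BinOp`, `is_volatile`, `conjuncts`, `simplify_or`) is hypothetical and is not DataFusion's API. It also handles only the case where `A` is the left operand of the `OR`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Call:
    func: str
    volatile: bool = False


@dataclass(frozen=True)
class BinOp:
    op: str  # "=", "AND" or "OR"
    left: object
    right: object


def is_volatile(expr):
    """An expression is volatile if any call inside it is volatile."""
    if isinstance(expr, Call):
        return expr.volatile
    if isinstance(expr, BinOp):
        return is_volatile(expr.left) or is_volatile(expr.right)
    return False


def conjuncts(expr):
    """Flatten nested ANDs into a list of their operands."""
    if isinstance(expr, BinOp) and expr.op == "AND":
        return conjuncts(expr.left) + conjuncts(expr.right)
    return [expr]


def simplify_or(expr):
    """Rewrite `A OR (... AND A ...)` to `A`, but only when A is not volatile."""
    if not (isinstance(expr, BinOp) and expr.op == "OR"):
        return expr
    a, b = expr.left, expr.right
    # Two textually equal volatile expressions are distinct evaluations,
    # so they must not be treated as the same operand.
    if not is_volatile(a) and a in conjuncts(b):
        return a
    return expr


rand_eq_0 = BinOp("=", Call("random", volatile=True), Lit(0))
col_eq_2 = BinOp("=", Col("column1"), Lit(2))

# random() = 0 OR (column1 = 2 AND random() = 0): left untouched.
kept = simplify_or(BinOp("OR", rand_eq_0, BinOp("AND", col_eq_2, rand_eq_0)))

# column1 = 2 OR (column1 = 2 AND random() = 0): reduced to column1 = 2.
reduced = simplify_or(BinOp("OR", col_eq_2, BinOp("AND", col_eq_2, rand_eq_0)))
```

In this sketch the volatile `random() = 0` may still be dropped when it is the other operand of the `AND`, because `A OR (A AND B)` equals `A` whatever `B` evaluates to. Only matching two volatile expressions against each other is blocked.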