bhanreddy1973 opened a new pull request, #19574:
URL: https://github.com/apache/datafusion/pull/19574
## Rationale for this change
Spark's `explode_outer` function treats empty arrays `[]` the same as NULL
arrays: both produce a single output row containing NULL. Currently,
DataFusion's `unnest` with `preserve_nulls=true` emits that NULL row only for
NULL arrays; empty arrays produce no output rows.
This makes it hard to reproduce Spark's `explode_outer` results with `unnest`
when migrating Spark workloads to DataFusion.
## What changes are included in this PR?
Added a new `preserve_empty_as_null` flag to `UnnestOptions`:
- When `false` (default): an empty array produces no output rows (existing
  behavior, backward compatible)
- When `true`: an empty array produces one output row with a NULL value
  (matching Spark's `explode_outer`)
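
To make the per-row rule concrete, here is a minimal Rust sketch. It models a list value as `Option<&[i32]>` (`None` for a NULL list) and uses a hypothetical helper, `output_row_count`, that is not part of DataFusion's API and ignores Arrow entirely.

```rust
/// Hypothetical helper: how many output rows one input list produces
/// under the two unnest flags described in this PR.
/// `None` models a NULL list; `Some(&[])` models an empty list.
pub fn output_row_count(
    list: Option<&[i32]>,
    preserve_nulls: bool,
    preserve_empty_as_null: bool,
) -> usize {
    match list {
        // NULL list: one NULL row only if preserve_nulls is set.
        None => usize::from(preserve_nulls),
        // Empty list: one NULL row only if the new flag is set.
        Some([]) => usize::from(preserve_empty_as_null),
        // Non-empty list: one row per element, regardless of either flag.
        Some(items) => items.len(),
    }
}
```

Note that the two flags touch disjoint cases (NULL vs. empty), so enabling one never changes the result for the other.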

Example with `preserve_nulls=true` and `preserve_empty_as_null=true`:

**Input:**

| Column 1 | Column 2 |
|----------|----------|
| [1, 2]   | A        |
| null     | B        |
| []       | C        |
| [3]      | D        |

**Output:**

| Column 1 | Column 2 |
|----------|----------|
| 1        | A        |
| 2        | A        |
| null     | B        |
| null     | C        |
| 3        | D        |

The `null` row for C comes from the empty array `[]`; without the new flag, that row is dropped.
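
The example can be reproduced with a small row-at-a-time sketch. `unnest_rows` is a hypothetical function that assumes lists are plain `Option<Vec<i32>>` values paired with a tag column; DataFusion's actual unnest operates on Arrow record batches, so this only illustrates the semantics.

```rust
/// Hypothetical row-wise unnest of a (list, tag) table, used only to
/// illustrate the semantics of the two flags.
pub fn unnest_rows<'a>(
    rows: &[(Option<Vec<i32>>, &'a str)],
    preserve_nulls: bool,
    preserve_empty_as_null: bool,
) -> Vec<(Option<i32>, &'a str)> {
    let mut out = Vec::new();
    for (list, tag) in rows {
        match list {
            // NULL list: keep the row with a NULL value only if preserve_nulls is set.
            None => {
                if preserve_nulls {
                    out.push((None, *tag));
                }
            }
            // Empty list: keep the row with a NULL value only if the new flag is set.
            Some(items) if items.is_empty() => {
                if preserve_empty_as_null {
                    out.push((None, *tag));
                }
            }
            // Non-empty list: one output row per element, repeating the tag.
            Some(items) => {
                for &v in items {
                    out.push((Some(v), *tag));
                }
            }
        }
    }
    out
}
```

Calling it on the same input with `preserve_empty_as_null = false` drops row C, which is the existing behavior.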
## Files changed
- `datafusion/common/src/unnest.rs` - added the new option and builder method
- `datafusion/physical-plan/src/unnest.rs` - updated `find_longest_length()`
  so that empty arrays count as length 1 when the flag is enabled
- `datafusion/proto/*` - updated the proto definitions so the new option is
  serialized
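
As a rough model of what `find_longest_length()` computes, each input row is repeated as many times as its longest list across the unnested columns, with NULL and empty lists counted as 0 or 1 depending on the two flags. The sketch below assumes that model and uses plain `Option<Vec<i32>>` columns; `longest_lengths` is a hypothetical name, not the function changed in this PR, which works on Arrow arrays.

```rust
/// Simplified, hypothetical model of the per-row length computation used
/// when several list columns are unnested together.
/// `columns[c][r]` is the list in column `c`, row `r`; all columns are
/// assumed to have the same number of rows.
pub fn longest_lengths(
    columns: &[Vec<Option<Vec<i32>>>],
    preserve_nulls: bool,
    preserve_empty_as_null: bool,
) -> Vec<usize> {
    let num_rows = columns.first().map_or(0, |c| c.len());
    (0..num_rows)
        .map(|row| {
            columns
                .iter()
                .map(|col| match &col[row] {
                    // NULL list counts as 1 only when preserve_nulls is set.
                    None => usize::from(preserve_nulls),
                    // Empty list counts as 1 only when the new flag is set.
                    Some(items) if items.is_empty() => usize::from(preserve_empty_as_null),
                    // Otherwise the list's own length.
                    Some(items) => items.len(),
                })
                // The row's output count is the longest list across columns.
                .max()
                .unwrap_or(0)
        })
        .collect()
}
```

Because NULL and empty lists are handled by separate match arms, each flag affects only its own case, which is the independence the new unit test is meant to verify.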
## How are these changes tested?
Added a new unit test, `test_longest_list_length_preserve_empty_as_null`,
which verifies:
- Empty arrays get length 1 when the flag is enabled
- NULL arrays still follow the `preserve_nulls` setting
- The two flags work independently
## Are these changes safe?
Yes. `preserve_empty_as_null` defaults to `false`, so existing behavior is
unchanged; users must explicitly opt in to the new behavior.