sergiimk opened a new issue, #22935: URL: https://github.com/apache/datafusion/issues/22935
### Describe the bug

Encountered a regression between `53.1.0` and `54.0.0` where [this line](https://github.com/apache/datafusion/blob/45d943dfb8699dc9cb9ef2320e955b73e3e6c03b/datafusion/physical-expr/src/projection.rs#L664) started to panic with an `index out of bounds` error.

### To Reproduce

It panics in a test where we run a complex change-data-capture query. I'll try to find a minimal repro case, but here's the full optimized plan for now:

```
CopyTo: format=parquet output_url=/tmp/.tmpaPY3Jq/data.parquet options: (single_file_output true)
  Sort: offset ASC NULLS FIRST
    Projection: CAST(CAST(row_number() PARTITION BY [Int32(1)] ORDER BY [city ASC NULLS FIRST, op ASC NULLS FIRST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS Decimal128(20, 0)) + Decimal128(Some(2),20,0) AS Int64) AS offset, op, system_time, event_time, city, population, census_url
      WindowAggr: windowExpr=[[row_number() PARTITION BY [Int32(1)] ORDER BY [city ASC NULLS FIRST, op ASC NULLS FIRST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]]
        Projection: op, assert_not_null(city) AS city, assert_not_null(population) AS population, census_url, CASE WHEN event_time IS NOT NULL THEN event_time ELSE TimestampMillisecond(946728000000, Some("UTC")) END AS event_time, TimestampMillisecond(1262347200000, Some("UTC")) AS system_time
          Union
            Projection: CASE WHEN old.city IS NULL THEN Int32(0) WHEN __common_expr_1 THEN Int32(1) ELSE Int32(3) END AS op, CASE WHEN __common_expr_1 THEN old.city ELSE new.city END AS city, CASE WHEN __common_expr_1 THEN old.population ELSE new.population END AS population, CASE WHEN __common_expr_1 THEN old.census_url ELSE new.census_url END AS census_url, CASE WHEN __common_expr_1 THEN old.event_time ELSE new.event_time END AS event_time
              Projection: new.city IS NULL AS __common_expr_1, old.event_time, old.city, old.population, old.census_url, new.city, new.population, new.census_url, new.event_time
                Filter: old.population IS DISTINCT FROM new.population OR old.census_url IS DISTINCT FROM new.census_url OR new.event_time IS NOT NULL AND old.event_time IS DISTINCT FROM new.event_time
                  Full Join: old.city = new.city
                    SubqueryAlias: old
                      Projection: ?table?.event_time, ?table?.city, ?table?.population, ?table?.census_url
                        Filter: __rank = UInt64(1) AND ?table?.op != Int32(1)
                          Projection: ?table?.op, ?table?.event_time, ?table?.city, ?table?.population, ?table?.census_url, __rank
                            WindowAggr: windowExpr=[[row_number() PARTITION BY [?table?.city] ORDER BY [?table?.offset DESC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS __rank]]
                              TableScan: ?table? projection=[offset, op, event_time, city, population, census_url]
                    SubqueryAlias: new
                      Projection: ?table?.city, ?table?.population, ?table?.census_url, TimestampMillisecond(NULL, Some("UTC")) AS event_time
                        TableScan: ?table? projection=[city, population, census_url]
            Projection: Int32(2) AS op, old.city AS city, old.population AS population, old.census_url AS census_url, old.event_time AS event_time
              Inner Join: old.city = new.city Filter: old.population IS DISTINCT FROM new.population OR old.census_url IS DISTINCT FROM new.census_url OR new.event_time IS NOT NULL AND old.event_time IS DISTINCT FROM new.event_time
                SubqueryAlias: old
                  Projection: ?table?.event_time, ?table?.city, ?table?.population, ?table?.census_url
                    Filter: __rank = UInt64(1) AND ?table?.op != Int32(1)
                      Projection: ?table?.op, ?table?.event_time, ?table?.city, ?table?.population, ?table?.census_url, __rank
                        WindowAggr: windowExpr=[[row_number() PARTITION BY [?table?.city] ORDER BY [?table?.offset DESC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS __rank]]
                          TableScan: ?table? projection=[offset, op, event_time, city, population, census_url], partial_filters=[Boolean(true), Boolean(true)]
                SubqueryAlias: new
                  Projection: ?table?.city, ?table?.population, ?table?.census_url, TimestampMillisecond(NULL, Some("UTC")) AS event_time
                    Filter: Boolean(true)
                      TableScan: ?table? projection=[city, population, census_url], partial_filters=[Boolean(true)]
```

### Expected behavior

No panics

### Additional context

_No response_

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
