piyushgarg5021 commented on issue #15136:
URL: https://github.com/apache/datafusion/issues/15136#issuecomment-2712326834
Title:
🚨 Schema Mismatch in DataFusion JOIN Optimization
Description:
I'm trying to perform a simple JOIN in DataFusion but hit a schema mismatch
error during physical optimization.
Steps to Reproduce:
1. Create two RecordBatch objects with the following schemas:
     sources table: id: Utf8, created: Utf8, title: Utf8, uri: Utf8
     media table: id: Int64, created: Utf8, title: Utf8, source_id: Utf8, ...
2. Register the tables (Rust):
     ctx.register_batch("sources", sources_batch)?;
     ctx.register_batch("media", media_batch)?;
3. Run the JOIN query (SQL):
     SELECT sources.id, media.title
     FROM sources
     JOIN media ON sources.id = media.source_id;
Error:
DataFusion error: Internal error: PhysicalOptimizer rule 'join_selection'
failed. Schema mismatch.
Expected Behavior:
DataFusion should plan and run the JOIN without error, since both join
keys, sources.id and media.source_id, are Utf8.
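The type reasoning above can be sketched as a small check on the two schemas as listed in this report. This is a std-only model, not DataFusion code: ColType, TableSchema, join_key_types and conflicting_columns are hypothetical names introduced for illustration, and the media schema contains only the columns named in the report (the rest are elided there).

```rust
use std::collections::HashMap;

/// Column types used in this sketch; a stand-in for Arrow's DataType.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColType {
    Utf8,
    Int64,
}

/// A table schema modelled as column name -> column type.
type TableSchema = HashMap<&'static str, ColType>;

/// Schema of `sources` as listed in the report.
fn sources_schema() -> TableSchema {
    HashMap::from([
        ("id", ColType::Utf8),
        ("created", ColType::Utf8),
        ("title", ColType::Utf8),
        ("uri", ColType::Utf8),
    ])
}

/// Schema of `media`; only the columns the report names are included.
fn media_schema() -> TableSchema {
    HashMap::from([
        ("id", ColType::Int64),
        ("created", ColType::Utf8),
        ("title", ColType::Utf8),
        ("source_id", ColType::Utf8),
    ])
}

/// Returns the types of the two join keys, or None if either column is missing.
fn join_key_types(
    left: &TableSchema,
    left_key: &str,
    right: &TableSchema,
    right_key: &str,
) -> Option<(ColType, ColType)> {
    Some((*left.get(left_key)?, *right.get(right_key)?))
}

/// Columns that exist in both tables under the same name but with different types.
fn conflicting_columns(left: &TableSchema, right: &TableSchema) -> Vec<&'static str> {
    let mut names: Vec<&'static str> = left
        .iter()
        .filter(|(name, ty)| right.get(*name).map_or(false, |other| other != *ty))
        .map(|(name, _)| *name)
        .collect();
    names.sort();
    names
}

fn main() {
    let sources = sources_schema();
    let media = media_schema();

    // The join condition is sources.id = media.source_id.
    let keys = join_key_types(&sources, "id", &media, "source_id");
    println!("join key types: {:?}", keys);

    // Same-named columns whose types differ between the two tables.
    println!(
        "same-name columns with different types: {:?}",
        conflicting_columns(&sources, &media)
    );
}
```

With the schemas above, the join keys come out as Utf8 on both sides, and the only same-named column whose type differs is id (Utf8 in sources, Int64 in media). Whether that difference is related to the failure is not established here; it is only something visible in the schemas as reported.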
Potential Fixes & Workarounds:
1. Explicitly cast sources.id in the join condition (SQL):
     SELECT sources.id, media.title
     FROM sources
     JOIN media ON CAST(sources.id AS TEXT) = media.source_id;
2. Check the schemas that were actually registered (Rust):
     println!("{:?}", ctx.table("sources").await?.schema());
     println!("{:?}", ctx.table("media").await?.schema());
3. Disable the join_selection optimization rule (if possible).
Environment Details:
DataFusion Version: 32.0.0
Rust Version: 1.76.0
OS: Ubuntu 22.04 LTS