sergiimk commented on issue #13323: URL: https://github.com/apache/datafusion/issues/13323#issuecomment-2484564677
Did some digging and found the old PR #9041, which seems to have removed the `single_file_output` flag from `FileSinkConfig`. It's worth looking into to understand the reasoning, not to undo the changes.

Looking at the v42 code, it does indeed seem that `DataFrameWriteOptions::single_file_output` is not read anywhere, and the logic relies on just this condition in demux:

```rust
let single_file_output = !base_output_path.is_collection();
```

which in v43 became:

```rust
let single_file_output = !base_output_path.is_collection() && base_output_path.file_extension().is_some();
```

`DataFrameWriteOptions::new().with_single_file_output(true)` is used in a bunch of tests though, so it's just a lucky coincidence that all the tests give the file a proper `test.parquet` name and not just `test`.

Personally, I think extension-based heuristics of any kind don't belong in low-level code like `start_demuxer_task` and are perhaps better left at the `DataFrame` level. I don't really mind which heuristic version (pre-v36, pre-v43, or post-v43) is the right one, but I think there should be a way to skip it and specify the behavior explicitly.
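To make the difference concrete, here is a rough standalone sketch of the two conditions quoted above, plus one possible shape for an explicit override. It does not use DataFusion at all: `is_collection` and `has_extension` are simplified stand-ins (assuming "collection" means the path ends with `/`), and `OutputMode` and `resolve_single_file` are hypothetical names invented for illustration, not an existing or proposed API.

```rust
use std::path::Path;

/// Hypothetical explicit override; NOT a DataFusion type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OutputMode {
    /// No explicit choice: fall back to the path-based heuristic.
    Auto,
    /// Always write exactly one file at the given path.
    SingleFile,
    /// Always treat the path as a directory of output files.
    Directory,
}

/// Stand-in for `base_output_path.is_collection()`.
/// Assumption: a "collection" is a path that ends with '/'.
fn is_collection(path: &str) -> bool {
    path.ends_with('/')
}

/// Stand-in for `base_output_path.file_extension().is_some()`.
fn has_extension(path: &str) -> bool {
    Path::new(path).extension().is_some()
}

/// The v42 condition quoted above.
fn single_file_v42(path: &str) -> bool {
    !is_collection(path)
}

/// The v43 condition quoted above.
fn single_file_v43(path: &str) -> bool {
    !is_collection(path) && has_extension(path)
}

/// Sketch of "a way to skip the heuristic": an explicit mode wins,
/// and the heuristic is consulted only for `Auto`.
fn resolve_single_file(mode: OutputMode, path: &str) -> bool {
    match mode {
        OutputMode::SingleFile => true,
        OutputMode::Directory => false,
        OutputMode::Auto => single_file_v43(path),
    }
}

fn main() {
    for path in ["out/test.parquet", "out/test", "out/dir/"] {
        println!(
            "{path}: v42={} v43={} forced_single={}",
            single_file_v42(path),
            single_file_v43(path),
            resolve_single_file(OutputMode::SingleFile, path),
        );
    }
}
```

The `out/test` case is the one that matters here: the v42 condition treats it as a single file, the v43 condition does not, and only an explicit setting gives the caller a stable answer regardless of how the file is named.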
