sergiimk commented on issue #13323:
URL: https://github.com/apache/datafusion/issues/13323#issuecomment-2484564677

   Did some digging and found an old PR, #9041, that seems to have removed the 
`single_file_output` flag from `FileSinkConfig`. It's worth looking into to 
understand the reasoning, not to undo the changes.
   
   Looking at the v42 code, it does indeed seem that 
`DataFrameWriteOptions::single_file_output` is not read anywhere and the logic 
relies solely on this condition in demux:
   ```rust
   let single_file_output = !base_output_path.is_collection();
   ```
   which in v43 became:
   ```rust
   let single_file_output = !base_output_path.is_collection()
       && base_output_path.file_extension().is_some();
   ```
   
   `DataFrameWriteOptions::new().with_single_file_output(true)` is used in a 
bunch of tests though, so it's just a lucky coincidence that all of those tests 
give the file a proper `test.parquet` name and not just `test`.
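
To make the difference concrete, here is a minimal standalone sketch of the two conditions. It does not use DataFusion types: paths are plain strings, `is_collection` is assumed to mean "ends with `/`", and `std::path::Path::extension` stands in for `file_extension()`. The function names are made up for illustration.

```rust
use std::path::Path;

/// Assumption: a path is a "collection" (directory-like) only if it ends with '/'.
fn is_collection(path: &str) -> bool {
    path.ends_with('/')
}

/// Models the v42-style condition: anything that is not a collection is a single file.
fn single_file_v42_style(path: &str) -> bool {
    !is_collection(path)
}

/// Models the v43-style condition: additionally requires a file extension.
fn single_file_v43_style(path: &str) -> bool {
    !is_collection(path) && Path::new(path).extension().is_some()
}
```

Under this model, `out/test.parquet` is treated as a single file by both conditions, while `out/test` is a single file only under the v42-style condition, which is why tests that always use an extension would not notice the change.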
   
   Personally, I think that extension-based heuristics of any kind don't belong 
in such low-level code as `start_demuxer_task` and are perhaps better left at 
the `DataFrame` level.
   
   I don't really mind which version of the heuristic (pre-v36, pre-v43, or 
post-v43) is the right one, but I think there should be a way to skip it and 
specify the behavior explicitly.
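
One possible shape for such an explicit override, as a rough sketch only: `OutputMode` and `resolve_single_file` are hypothetical names, not existing DataFusion API, and the fallback simply mirrors the v43-style condition on a plain string path.

```rust
use std::path::Path;

/// Hypothetical setting: the caller either states the intent or asks for inference.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputMode {
    /// Fall back to the path-based heuristic.
    Infer,
    /// Always write exactly one file at the given path.
    SingleFile,
    /// Always treat the path as a directory of files.
    Directory,
}

/// Resolves the final decision; the heuristic runs only when nothing explicit was given.
fn resolve_single_file(mode: OutputMode, path: &str) -> bool {
    match mode {
        OutputMode::SingleFile => true,
        OutputMode::Directory => false,
        // Assumption: mirrors the v43-style check on a plain string path.
        OutputMode::Infer => !path.ends_with('/') && Path::new(path).extension().is_some(),
    }
}
```

With something like this, the heuristic could live wherever it is chosen to live, and low-level code would only ever see the resolved boolean.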


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

