AdamGS opened a new issue, #14607: URL: https://github.com/apache/datafusion/issues/14607
### Is your feature request related to a problem or challenge?

We’re implementing a file format, [Vortex](https://github.com/spiraldb/vortex), which has no “row groups” or similar concept. This means a byte range might fall entirely within one column, and aligning columns is a nontrivial task. I would like to be able to express repartitioning logic that only splits files logically (by rows, not by bytes).

The existing repartitioning logic in DataFusion (specifically `FileGroupPartitioner` and `FileScanConfig::repartitioned`) assumes that files can be split logically by byte ranges (`FileRange`), and even the rustdoc on it seems very Parquet-specific (even though other formats do support it). This assumes some mapping/alignment between the physical layout and the logical one.

### Describe the solution you'd like

It seems like the best way would be to configure `FileGroupPartitioner` through `FileSource`. The other option would be to make `FileRange` an enum, but that would still mean we (and any other format with a similar structure) would have to maintain our own repartitioning logic.

### Describe alternatives you've considered

We can keep the current state, which is maintaining our own repartitioning logic and eventually just reusing `FileRange` to describe row splits.

### Additional context

_No response_
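
To make the idea of splitting "by rows and not by bytes" concrete, here is a minimal, self-contained sketch of row-based repartitioning. It deliberately uses no DataFusion API: `RowRange` and `split_by_rows` are hypothetical names introduced only for illustration, and the sketch assumes each file's row count is known before planning.

```rust
/// Hypothetical logical range: rows `start..end` of a single file.
/// (Not a DataFusion type; shown only to illustrate row-based splits.)
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RowRange {
    pub file: String,
    pub start: u64, // inclusive row index
    pub end: u64,   // exclusive row index
}

/// Split `files` (name, row count) into at most `target_partitions` groups
/// holding roughly the same number of rows each.
pub fn split_by_rows(
    files: &[(String, u64)],
    target_partitions: usize,
) -> Vec<Vec<RowRange>> {
    let total: u64 = files.iter().map(|(_, rows)| *rows).sum();
    if total == 0 || target_partitions == 0 {
        return Vec::new();
    }

    // Ceiling division, so all rows fit into `target_partitions` groups.
    let n = target_partitions as u64;
    let per_partition = (total + n - 1) / n;

    let mut groups: Vec<Vec<RowRange>> = Vec::new();
    let mut current: Vec<RowRange> = Vec::new();
    let mut room = per_partition; // rows still free in the current group

    for (file, rows) in files {
        let mut start = 0u64;
        while start < *rows {
            // Take as many rows as fit in the current group.
            let take = room.min(*rows - start);
            current.push(RowRange {
                file: file.clone(),
                start,
                end: start + take,
            });
            start += take;
            room -= take;
            if room == 0 {
                // Current group is full; start a new one.
                groups.push(std::mem::take(&mut current));
                room = per_partition;
            }
        }
    }
    if !current.is_empty() {
        groups.push(current);
    }
    groups
}
```

For example, splitting a 10-row file `a` and a 5-row file `b` into 3 partitions yields three groups of 5 rows each: `a[0..5]`, `a[5..10]`, and `b[0..5]`. A format-specific implementation could additionally snap the boundaries to whatever logical chunking the format exposes.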