hailin0 opened a new issue, #8837: URL: https://github.com/apache/seatunnel/issues/8837
### Search before asking

- [x] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.

### Description

The file connector currently scans the files in a directory and treats each file as one split. For very large files this makes reading slow, so such files need further sharding. We can consider splitting a single file into multiple splits.

Example:

```
file_1.csv 100gb
file_2.csv 100mb
file_3.csv 100kb
```

Splits result:

```
split_1<file_1.csv, startPos=0, endPos=104857600>
split_2<file_1.csv, startPos=104857600, endPos=209715200>
....
split_x<file_2.csv, startPos=0, endPos=104857600>
split_y<file_3.csv, startPos=0, endPos=102400>
```

Each split must read only complete data rows, so no row may be cut in two at a split boundary. The above is only for reference and does not have to be followed completely.

Connectors list: https://github.com/apache/seatunnel/tree/dev/seatunnel-connectors-v2/connector-file

Updates:

- update file connectors
- update docs
- add testcase

### Usage Scenario

_No response_

### Related issues

_No response_

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
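For illustration only, here is a minimal sketch of the byte-range splitting outlined in the Description, together with one common way to keep rows complete at split boundaries. It is not SeaTunnel code: `FileSplit`, `plan_splits`, `read_split_lines`, and the 100 MiB `SPLIT_SIZE` are hypothetical names and values chosen to match the example numbers. It assumes uncompressed, newline-delimited rows and does not handle quoted CSV fields that contain line breaks.

```python
from dataclasses import dataclass
from typing import BinaryIO, Iterator, List

# Assumed split size: 100 MiB = 104857600 bytes, as in the issue's example.
SPLIT_SIZE = 100 * 1024 * 1024


@dataclass(frozen=True)
class FileSplit:
    """Hypothetical split descriptor: byte range [start, end) of one file."""
    path: str
    start: int
    end: int


def plan_splits(path: str, file_size: int, split_size: int = SPLIT_SIZE) -> List[FileSplit]:
    """Cut a file of file_size bytes into consecutive byte ranges."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [
        FileSplit(path, start, min(start + split_size, file_size))
        for start in range(0, file_size, split_size)
    ]


def read_split_lines(f: BinaryIO, split: FileSplit) -> Iterator[bytes]:
    """Yield every line whose first byte lies inside the split.

    A split that does not start at offset 0 seeks to start - 1 and discards
    up to the next newline, so it skips a row that began in the previous
    split. The last row of a split is read in full even when it runs past
    split.end, so each row is emitted by exactly one split.
    """
    if split.start > 0:
        f.seek(split.start - 1)
        f.readline()  # discard the remainder of the previous split's row
    else:
        f.seek(0)
    while f.tell() < split.end:
        line = f.readline()
        if not line:  # end of file
            break
        yield line
```

With these assumptions, `plan_splits("file_3.csv", 102400)` produces the single range `startPos=0, endPos=102400` from the example, and reading all splits of a file in order yields each row exactly once. Formats that cannot be read from an arbitrary offset (for example compressed files) would need a different strategy.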