AHeise commented on pull request #13885: URL: https://github.com/apache/flink/pull/13885#issuecomment-736035288
> Based on the above design, there are some questions that I hope to be answered:
>
> * `buffer size` is passed to HadoopFileSystem through Constructor. It means that HadoopFileSystem needs to add a new Constructor: `HadoopFileSystem(FileSystem, bufferSize)`. HadoopFsFactory will call the new constructor.
> * The old constructor `HadoopFileSystem(FileSystem)` is still called by other FsFactory. So other FsFactory will not be able to improve performance. For example: OSSFileSystemFactory.
>
> Question: Do other FsFactory related to HadoopFileSystem need to improve performance? If needed, are there other better designs. (Can there be a way not to modify many FsFactory?)
>
> Thanks.

Similar to your initial design, I would make `bufferSize = 0` => no buffering. Then all other filesystem factories could use the changed constructor with `HadoopFileSystem(FileSystem, NO_BUFFER)` (`NO_BUFFER = 0`).

Since this PR is very specific to Hadoop, I'd probably set the buffer only through `HadoopFsFactory`. Your finding would not necessarily also transfer to other implementations of the `hadoop.FileSystem` (i.e., they don't have statistics). We could evaluate that in future PRs.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
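
The constructor arrangement suggested in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: `RawFileSystem`, `WrappedFileSystem`, `createForHadoop`, and `createForOther` are hypothetical stand-ins for `hadoop.FileSystem`, `HadoopFileSystem`, and the factories, and applying the buffer by wrapping the input stream is also an assumption about where the buffering would happen.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-ins only; these are NOT the real Flink or Hadoop classes.
class HadoopFsBufferSketch {

    /** Stand-in for hadoop.FileSystem, reduced to opening a stream. */
    interface RawFileSystem {
        InputStream open(String path) throws IOException;
    }

    /** Stand-in for the HadoopFileSystem wrapper with the changed constructor. */
    static final class WrappedFileSystem {
        /** A buffer size of 0 means "no buffering". */
        static final int NO_BUFFER = 0;

        private final RawFileSystem fs;
        private final int bufferSize;

        WrappedFileSystem(RawFileSystem fs, int bufferSize) {
            if (bufferSize < 0) {
                throw new IllegalArgumentException("bufferSize must be >= 0");
            }
            this.fs = fs;
            this.bufferSize = bufferSize;
        }

        /** Assumption: the buffer is applied by wrapping the raw input stream. */
        InputStream open(String path) throws IOException {
            InputStream in = fs.open(path);
            return bufferSize == NO_BUFFER ? in : new BufferedInputStream(in, bufferSize);
        }
    }

    /** What a Hadoop-specific factory might do: pass the configured size. */
    static WrappedFileSystem createForHadoop(RawFileSystem fs, int configuredBufferSize) {
        return new WrappedFileSystem(fs, configuredBufferSize);
    }

    /** What every other factory might do: keep the old, unbuffered behavior. */
    static WrappedFileSystem createForOther(RawFileSystem fs) {
        return new WrappedFileSystem(fs, WrappedFileSystem.NO_BUFFER);
    }

    public static void main(String[] args) throws IOException {
        RawFileSystem raw = path -> new ByteArrayInputStream(new byte[] {1, 2, 3});

        InputStream plain = createForOther(raw).open("some/path");
        InputStream buffered = createForHadoop(raw, 4096).open("some/path");

        System.out.println("other factory buffered:  " + (plain instanceof BufferedInputStream));
        System.out.println("hadoop factory buffered: " + (buffered instanceof BufferedInputStream));
    }
}
```

With a single changed constructor, the other factories only need a one-argument change to `NO_BUFFER`, and their behavior stays exactly as before until buffering has been evaluated for them.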