AHeise commented on pull request #13885:
URL: https://github.com/apache/flink/pull/13885#issuecomment-736035288


   
   > Based on the above design, I have some questions that I hope can be 
answered:
   > 
   >     * The buffer size is passed to HadoopFileSystem through its constructor. 
This means that HadoopFileSystem needs a new constructor, 
`HadoopFileSystem(FileSystem, bufferSize)`, which HadoopFsFactory will call.
   > 
   >     * The old constructor `HadoopFileSystem(FileSystem)` is still called 
by the other FsFactory implementations, for example OSSFileSystemFactory, so 
they will not get the performance improvement.
   > 
   > 
   > Question: Do the other FsFactory implementations that use HadoopFileSystem 
also need the performance improvement? If so, is there a better design? (Is 
there a way to avoid modifying many FsFactory classes?)
   > 
   > Thanks.
   
   Similar to your initial design, I would let `bufferSize = 0` mean no 
buffering. All other filesystem factories could then call the changed 
constructor as `HadoopFileSystem(FileSystem, NO_BUFFER)`, with `NO_BUFFER = 0`.
   
   Since this PR is very specific to Hadoop, I'd probably set the buffer only 
through `HadoopFsFactory`. Your finding would not necessarily transfer to 
other implementations of `hadoop.FileSystem` (i.e., they don't have 
statistics). We could evaluate that in future PRs.
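
   To make the `bufferSize = 0` convention concrete, here is a minimal sketch. 
It is not Flink's actual `HadoopFileSystem`: the class `BufferedFsSketch` and 
its `wrap` method are hypothetical stand-ins, and only the `NO_BUFFER = 0` 
constant comes from the suggestion above. It assumes buffering would be done 
by wrapping the raw stream in a `java.io.BufferedOutputStream`.

```java
import java.io.BufferedOutputStream;
import java.io.OutputStream;

// Hypothetical stand-in for the wrapper discussed above; NOT Flink's HadoopFileSystem.
final class BufferedFsSketch {

    /** Sentinel buffer size meaning "hand out the raw stream, no buffering". */
    static final int NO_BUFFER = 0;

    private final int bufferSize;

    /** Single constructor; callers that do not want buffering pass NO_BUFFER. */
    BufferedFsSketch(int bufferSize) {
        if (bufferSize < 0) {
            throw new IllegalArgumentException("bufferSize must be >= 0");
        }
        this.bufferSize = bufferSize;
    }

    /** Wraps the raw stream in a buffer only when a positive size was configured. */
    OutputStream wrap(OutputStream raw) {
        if (bufferSize == NO_BUFFER) {
            return raw; // unchanged behavior for factories that opt out
        }
        return new BufferedOutputStream(raw, bufferSize);
    }
}
```

   With this shape, a Hadoop-specific factory would pass a positive size, while 
every other factory would pass `NO_BUFFER` and keep its current behavior.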


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

