Hey guys, I am interested in increasing the throughput of an HDFS read when transferring data between datacenters that are geographically far apart and hence have a network latency of around 60ms. I see in the HDFS code that the DFSClient and the DataNode seem to hardcode their socket buffer sizes to 128KB (in DFSClient.createBlockOutputStream and DataNode.startDataNode). Is there a reason for this?
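
To show why this matters for us, here is the rough bandwidth-delay-product arithmetic I'm working from (the 60ms is our measured latency; the ~100Mbps/~13MB/s target is just an example to illustrate where the ~800KB figure below comes from):

    public class BufferSizing {
        public static void main(String[] args) {
            double rttSeconds = 0.060;            // our ~60ms inter-datacenter latency
            long currentBufferBytes = 128 * 1024; // the hardcoded socket buffer

            // A single TCP stream can keep at most one buffer's worth of data
            // in flight per round trip, so throughput tops out near buffer / RTT.
            double maxBytesPerSec = currentBufferBytes / rttSeconds;
            System.out.printf("128KB buffer caps one stream at ~%.1f MB/s%n",
                    maxBytesPerSec / (1024 * 1024));

            // To fill roughly a 100Mbps (~13 MB/s) path, the buffer has to cover
            // bandwidth * RTT, which is where the ~800KB figure comes from.
            double targetBytesPerSec = 13.0 * 1024 * 1024;
            long neededBufferBytes = (long) Math.ceil(targetBytesPerSec * rttSeconds);
            System.out.printf("buffer needed at that bandwidth: ~%d KB%n",
                    neededBufferBytes / 1024);
        }
    }

In other words, 128KB over a 60ms round trip caps a single stream at roughly 2MB/s, well below what the link can carry.
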
I want to expose this value as a configurable property so that when I read over the high-latency link I can set the ideal buffer size for this particular application (around 800KB for our desired bandwidth). Is there a reason this is not done currently? Would you take a patch that added such a property? Or am I looking at totally the wrong code? -Jay
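
P.S. To be concrete, the kind of change I'm imagining is something like the sketch below: read the buffer size from the Configuration and fall back to the current 128KB behavior. The property name "dfs.socket.buffer.size" is just a placeholder I made up, not an existing key, and this isn't meant as the actual patch, just the shape of it.

    import java.net.Socket;
    import java.net.SocketException;
    import org.apache.hadoop.conf.Configuration;

    // Sketch only: pull the socket buffer size from the Configuration instead
    // of the hardcoded constant. "dfs.socket.buffer.size" is a made-up name.
    public class ConfigurableSocketBuffer {
        private static final int DEFAULT_BUFFER_SIZE = 128 * 1024; // current behavior

        static void configure(Socket sock, Configuration conf) throws SocketException {
            int bufferSize = conf.getInt("dfs.socket.buffer.size", DEFAULT_BUFFER_SIZE);
            // A larger buffer lets TCP keep more data in flight over the 60ms link.
            sock.setSendBufferSize(bufferSize);
            sock.setReceiveBufferSize(bufferSize);
        }
    }
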