On Mon, Oct 28, 2013 at 4:24 PM, Kyle Sletmoe wrote:
I have written a WebHDFSClient and I do not believe that reusing
connections is enough to noticeably speed up transfers in my case. I did
some tests and on average it took roughly 14 minutes to transfer a 3.6 GB
file to an HDFS cluster on my local network (I tried the same operation using cURL,
with simila
I believe that the WebHDFS API is your best bet for now. The current
implementation of WebHDFSClient does not reuse the HTTP connections, which
leads to a large part of the performance penalty.
You might want to implement your own version that reuses HTTP connections to
see whether it meets your pe
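
For anyone who wants to try the connection-reuse idea, here is a rough sketch in Python rather than a tested client. It assumes the usual two-step WebHDFS CREATE (a PUT to the namenode that answers with a 307 redirect, then a PUT of the data to the datanode named in the Location header). The names ConnectionPool, create_url and upload are made up for illustration and are not part of any Hadoop library, and the host names and ports in the usage note are placeholders.

```python
import http.client
import os
from urllib.parse import quote, urlencode, urlsplit


class ConnectionPool:
    """Hypothetical helper: one persistent HTTP connection per (host, port)."""

    def __init__(self, timeout=60):
        self._timeout = timeout
        self._conns = {}

    def get(self, host, port):
        key = (host, port)
        if key not in self._conns:
            # HTTPConnection opens its socket lazily, on the first request.
            self._conns[key] = http.client.HTTPConnection(
                host, port, timeout=self._timeout
            )
        return self._conns[key]

    def close(self):
        for conn in self._conns.values():
            conn.close()
        self._conns.clear()


def create_url(hdfs_path, user, overwrite=False):
    """Build the request target for a WebHDFS CREATE call."""
    query = urlencode(
        {"op": "CREATE", "user.name": user, "overwrite": str(overwrite).lower()}
    )
    return "/webhdfs/v1" + quote(hdfs_path) + "?" + query


def upload(pool, nn_host, nn_port, hdfs_path, local_path, user):
    """Upload one file, taking both connections from the pool."""
    # Step 1: ask the namenode where to write; no file data is sent here.
    nn = pool.get(nn_host, nn_port)
    nn.request("PUT", create_url(hdfs_path, user))
    resp = nn.getresponse()
    resp.read()  # drain the body so the connection can be reused
    if resp.status != 307:
        raise RuntimeError("expected 307 from namenode, got %d" % resp.status)
    target = urlsplit(resp.getheader("Location"))

    # Step 2: stream the file to the datanode given in the redirect.
    dn = pool.get(target.hostname, target.port)
    with open(local_path, "rb") as f:
        headers = {"Content-Length": str(os.path.getsize(local_path))}
        dn.request("PUT", target.path + "?" + target.query, body=f, headers=headers)
    resp = dn.getresponse()
    resp.read()
    if resp.status != 201:
        raise RuntimeError("expected 201 from datanode, got %d" % resp.status)
```

Usage would look something like upload(pool, "namenode.example", 50070, "/tmp/big.bin", "big.bin", "kyle") called repeatedly with the same pool, so that later transfers to the same namenode or datanode skip the TCP setup. Whether that makes a measurable difference for a single multi-gigabyte file is exactly the open question in this thread; the saving is per request, so it should matter most when many small files are transferred.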