On Mon, Oct 28, 2013 at 4:24 PM, Kyle Sletmoe wrote:
I have written a WebHDFSClient and I do not believe that reusing
connections is enough to noticeably speed up transfers in my case. I did
some tests and on average it took roughly 14 minutes to transfer a 3.6 GB
file to an HDFS on my local network (I tried the same operation using cURL,
with simila
I believe that the WebHDFS API is your best bet for now. The current
implementation of WebHDFSClient does not reuse HTTP connections, which
accounts for a large part of the performance penalty.
You might want to implement your own version that reuses HTTP connections to
see whether it meets your pe
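
For illustration only, here is a rough sketch of what "reusing HTTP connections" could look like for the two-step WebHDFS CREATE call (the NameNode answers the first PUT with a 307 redirect, and the data goes to the DataNode named in the Location header). It uses Python's standard http.client and is not the WebHDFSClient discussed above. The names ConnectionCache, webhdfs_path and create_file are made up for this example, the overwrite=true parameter is just one possible choice, and whether reuse helps at all depends on the server keeping connections alive.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


class ConnectionCache:
    """Hypothetical helper: keeps one HTTPConnection per (host, port)."""

    def __init__(self, timeout=30):
        self._conns = {}
        self._timeout = timeout

    def get(self, host, port):
        key = (host, port)
        conn = self._conns.get(key)
        if conn is None:
            # The socket is opened lazily on the first request and
            # reopened automatically if the server has closed it.
            conn = http.client.HTTPConnection(host, port, timeout=self._timeout)
            self._conns[key] = conn
        return conn

    def close(self):
        for conn in self._conns.values():
            conn.close()
        self._conns.clear()


def webhdfs_path(hdfs_path, op, **params):
    """Build the request path for a WebHDFS operation on hdfs_path."""
    query = urlencode({"op": op, **params})
    return "/webhdfs/v1" + quote(hdfs_path) + "?" + query


def create_file(cache, namenode_host, namenode_port, hdfs_path, data):
    """Write data (bytes) to hdfs_path; returns the final HTTP status."""
    # Step 1: ask the NameNode where to write. No file data is sent here.
    nn = cache.get(namenode_host, namenode_port)
    nn.request("PUT", webhdfs_path(hdfs_path, "CREATE", overwrite="true"))
    resp = nn.getresponse()
    resp.read()  # drain the body so the connection can be reused
    if resp.status != 307:
        raise RuntimeError("expected 307 redirect, got %d" % resp.status)

    # Step 2: send the data to the DataNode named in the redirect.
    location = urlsplit(resp.getheader("Location"))
    dn = cache.get(location.hostname, location.port)
    dn.request("PUT", location.path + "?" + location.query, body=data)
    resp = dn.getresponse()
    resp.read()
    return resp.status  # 201 Created is expected on success
```

A caller would create one ConnectionCache, pass it to every create_file call, and close it at the end, so that repeated calls to the same NameNode or DataNode can go over an already open socket. This only removes per-request connection setup; it does not change how fast the data itself moves, so for a single multi-gigabyte file the gain may well be small.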
Now that Hadoop 2.2.0 is Windows compatible, is there going to be work on
creating a portable version of libhdfs for C/C++ interaction with HDFS? I
know I can use the WebHDFS REST API, but the data transfer rates are
abysmally slow compared to the direct interaction via libhdfs.
Regards,
--
Kyle S