Re: libhdfs portability

2013-10-28 Thread Colin McCabe
On Mon, Oct 28, 2013 at 4:24 PM, Kyle Sletmoe wrote:
> I have written a WebHDFSClient and I do not believe that reusing connections is enough to noticeably speed up transfers in my case. I did some tests and on average it took roughly 14 minutes to transfer a 3.6 GB file to an HDFS on my loc

Re: libhdfs portability

2013-10-28 Thread Kyle Sletmoe
I have written a WebHDFSClient and I do not believe that reusing connections is enough to noticeably speed up transfers in my case. I did some tests and on average it took roughly 14 minutes to transfer a 3.6 GB file to an HDFS on my local network (I tried the same operation using cURL, with simila
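
For scale, the figures quoted above (3.6 GB in roughly 14 minutes) can be turned into an average rate. This is only a back-of-the-envelope sketch: it assumes decimal gigabytes (1 GB = 1000 MB), and the helper name is made up for illustration.

```python
def throughput_mb_per_s(size_gb, minutes):
    """Average transfer rate in MB/s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / (minutes * 60)


rate = throughput_mb_per_s(3.6, 14)   # figures reported in the message
print(f"{rate:.1f} MB/s")             # about 4.3 MB/s
print(f"{rate * 8:.0f} Mbit/s")       # about 34 Mbit/s
```

That works out to about 4.3 MB/s, or about 34 Mbit/s, on average.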

Re: libhdfs portability

2013-10-28 Thread Haohui Mai
I believe that the WebHDFS API is your best bet for now. The current implementation of WebHDFSClient does not reuse HTTP connections, which accounts for a large part of the performance penalty. You might want to implement your own version that reuses HTTP connections to see whether it meets your pe
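
A minimal sketch of what connection reuse could look like, using only Python's standard http.client. The /webhdfs/v1/<path>?op=... URL layout and the GETFILESTATUS operation come from the WebHDFS REST API; the helper names are hypothetical, and this is not the WebHDFSClient discussed in the thread. It covers only metadata calls against the NameNode. Data reads and writes are redirected to DataNodes, so how much reuse helps bulk transfers is not something this sketch settles.

```python
import http.client
import json
from urllib.parse import quote, urlencode


def webhdfs_url_path(hdfs_path, op, **params):
    """Build the request path for a WebHDFS call: /webhdfs/v1/<path>?op=..."""
    query = urlencode({"op": op, **params})
    return "/webhdfs/v1" + quote(hdfs_path) + "?" + query


def file_statuses(conn, hdfs_paths):
    """Issue one GETFILESTATUS per path over a single, already-open connection."""
    results = []
    for path in hdfs_paths:
        conn.request("GET", webhdfs_url_path(path, "GETFILESTATUS"))
        resp = conn.getresponse()
        body = resp.read()  # drain the body so the same socket can be reused
        results.append((resp.status, json.loads(body) if body else None))
    return results


# Usage (hypothetical host; 50070 was the default NameNode HTTP port in Hadoop 2.x):
#   conn = http.client.HTTPConnection("namenode.example.com", 50070, timeout=30)
#   try:
#       print(file_statuses(conn, ["/tmp/a", "/tmp/b"]))
#   finally:
#       conn.close()
```

Passing the connection in, rather than opening one inside the helper, is what lets several requests share one TCP connection when the server keeps it alive.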

libhdfs portability

2013-10-28 Thread Kyle Sletmoe
Now that Hadoop 2.2.0 is Windows compatible, is there going to be work on creating a portable version of libhdfs for C/C++ interaction with HDFS? I know I can use the WebHDFS REST API, but the data transfer rates are abysmally slow compared to the direct interaction via libhdfs. Regards, -- Kyle S