Hey all,

I have a problem I'd like advice on. The Caltech site I work with currently has a few GridFTP servers that run on the same physical machines as the Hadoop datanodes, and a few that don't. The GridFTP server has a libhdfs backend that writes incoming network data into HDFS.
They've found that the GridFTP servers co-located with an HDFS datanode perform poorly because data arrives at a much faster rate than the local HDD can handle. The standalone GridFTP servers, however, push data out to multiple nodes at once and can handle the incoming data just fine (>200MB/s).
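To picture the effect, here is a toy Python model of it. This is not HDFS code: it only assumes that the first replica of each block goes to the writer's own node when the writer is a datanode, and to a randomly chosen datanode otherwise. It ignores rack awareness, and the hostnames, replication factor, and block count are made up for illustration.

```python
import random
from collections import Counter


def choose_targets(client, datanodes, replication, rng):
    """Toy placement: first replica on the writer's node if it is a datanode,
    otherwise on a random datanode; remaining replicas on other random nodes."""
    first = client if client in datanodes else rng.choice(datanodes)
    others = [d for d in datanodes if d != first]
    return [first] + rng.sample(others, replication - 1)


def first_replica_load(client, datanodes, blocks, replication=2, seed=0):
    """Count how many blocks have their first replica on each datanode."""
    rng = random.Random(seed)
    return Counter(
        choose_targets(client, datanodes, replication, rng)[0]
        for _ in range(blocks)
    )


# Hypothetical cluster of eight datanodes.
datanodes = [f"dn{i}" for i in range(1, 9)]

# Writer running on a datanode versus a writer on a standalone host.
colocated = first_replica_load("dn1", datanodes, blocks=1000)
standalone = first_replica_load("gridftp-standalone", datanodes, blocks=1000)

print("co-located first replicas: ", dict(colocated))
print("standalone first replicas:", dict(standalone))
```

In this model the co-located writer sends the first copy of every block to its own disk, while the standalone writer spreads first copies across the cluster.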
Is there any way to turn off the preference for the local node? Can anyone think of a good workaround to trick HDFS into thinking the client isn't on the same node?
Brian
