On Wed, Jun 10, 2009 at 4:55 AM, Sugandha Naolekar <[email protected]> wrote:
> If I want to make the data transfer fast, then what am I supposed to do?
> I want to place the data in HDFS and replicate it in a fraction of a
> second.

I want to go to France, but it takes 10+ hours to get there from
California on the fastest plane. How can I get there faster?

> Can that be possible, and how? Placing a 5GB file will take at least
> half an hour or so... but if it's a large cluster, let's say of 7 nodes,
> then placing it in HDFS would take around 2-3 hours. So, how can that
> time delay be avoided?

HDFS will only replicate as many times as you want it to. The write is
also pipelined. This means that writing a 5G file that is replicated to 3
nodes is only marginally faster than the same file on 10 nodes, if for
some reason you wanted to set your replication count to 10 (unnecessary
for 99.99999% of use cases).

> Also, my simple aim is to transfer the data, i.e., dumping the data into
> HDFS and getting it back whenever needed. So, for this transfer, how much
> speed can be achieved?

HDFS isn't magic. You can only write as fast as your disk and network
can. If your disk has 50MB/sec of throughput, you'll probably be limited
at 50MB/sec. Expecting much more than this in real-life scenarios is
unrealistic.

-Todd
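[Editor's note: a back-of-the-envelope sketch, not part of the original
thread. It illustrates Todd's point that a pipelined HDFS write is bounded
by the slowest of disk and network throughput, not multiplied by the
replication factor. The 50 MB/sec disk figure comes from Todd's example;
the gigabit-network figure of ~119 MB/sec is an assumed, typical value.]

```python
# Rough lower-bound estimate of HDFS write time. Because replication is
# pipelined, the client streams the file once and each datanode forwards
# blocks downstream, so the write is limited by the slowest link rather
# than by (replication factor x file size).

def write_time_seconds(file_size_mb: float, disk_mb_s: float, net_mb_s: float) -> float:
    """Estimate write time as file size divided by the bottleneck throughput."""
    bottleneck = min(disk_mb_s, net_mb_s)
    return file_size_mb / bottleneck

# 5 GB file, 50 MB/sec disk, ~119 MB/sec gigabit network:
# about 102 seconds -- minutes, not the "2-3 hours" feared above.
print(round(write_time_seconds(5 * 1024, 50, 119)))  # → 102
```

For reference, the per-file replication count itself is set with the
standard `hadoop fs -setrep <numReplicas> <path>` command, or cluster-wide
via `dfs.replication`.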
