On 11/10/11 04:49, gschen wrote:
In HDFS the only thing we can do is set the replication factor; we cannot change where a block is stored or what type of storage holds the data. Consider this case: to improve download speed, I could choose to place my block replicas near my location, or near someone else's. In other words, users should have more options for deciding their block replication strategy.
1. In "Apache Hadoop Goes Realtime at Facebook", Dhruba and others discuss their use of alternate block placement policies.
2. Russ Perry did some work on rasterization of PDF files in Hadoop, where the final stage (collecting the output and streaming it to the printer) was done on a machine next to the printer. He modified DFSClient to provide the location data for all blocks, and had his app pick blocks off different machines to keep the network busy, avoid overloading any single machine with disk IO requests, and ensure peak bandwidth into the final destination machine:
http://www.hpl.hp.com/techreports/2009/HPL-2009-345.pdf
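
To make the "pick blocks off different machines" idea concrete, here is a minimal sketch. It is not Russ's code and not a Hadoop API: ReplicaPicker and pick() are hypothetical names, and the least-loaded-host rule is only one assumed way of spreading reads. The input is a list of replica hosts per block, which in an unmodified client could be built from FileSystem.getFileBlockLocations() and BlockLocation.getHosts().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: spreads block reads across replicas.
class ReplicaPicker {
    /**
     * For each block, pick the replica host that has been assigned the fewest
     * reads so far; ties go to the host listed first for that block.
     *
     * @param blockHosts blockHosts.get(i) lists the hosts holding block i
     * @return the chosen host for each block, in block order
     */
    static List<String> pick(List<List<String>> blockHosts) {
        Map<String, Integer> assigned = new HashMap<>();
        List<String> plan = new ArrayList<>();
        for (List<String> hosts : blockHosts) {
            if (hosts.isEmpty()) {
                throw new IllegalArgumentException("block has no replicas");
            }
            String best = hosts.get(0);
            for (String host : hosts) {
                // Strictly fewer assigned reads wins, so earlier hosts keep ties.
                if (assigned.getOrDefault(host, 0) < assigned.getOrDefault(best, 0)) {
                    best = host;
                }
            }
            assigned.merge(best, 1, Integer::sum);
            plan.add(best);
        }
        return plan;
    }

    public static void main(String[] args) {
        // Made-up datanode names, three replicas per block.
        List<List<String>> blocks = Arrays.asList(
            Arrays.asList("dn1", "dn2", "dn3"),
            Arrays.asList("dn1", "dn2", "dn4"),
            Arrays.asList("dn1", "dn3", "dn4"),
            Arrays.asList("dn2", "dn3", "dn4"));
        System.out.println(pick(blocks));
    }
}
```

A real client would probably also weigh network distance to the destination machine and current datanode load, which this sketch ignores.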