On 11/10/11 04:49, gschen wrote:

In HDFS, the only thing we can do is set the replication factor to
change the replication strategy; we cannot change where a block is
stored or what type of storage holds the data. Consider this case: to
improve download speed, I could place my block replicas near my
location or near someone else's. I mean that users could have more
options for deciding their block replication strategy.

1. In "Apache Hadoop Goes Realtime at Facebook", Dhruba and others discuss their use of alternate block placement policies.

2. Russ Perry did some work on rasterization of PDF files in Hadoop, where the final stage (collecting the output and streaming it to the printer) was done on a machine next to the printer. He modified DFSClient to provide the location data for all blocks, and had his app pick blocks off different machines to keep the network busy, to avoid overloading any specific machine with disk I/O requests, and to ensure peak bandwidth between the final destination machine

http://www.hpl.hp.com/techreports/2009/HPL-2009-345.pdf
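
To make the "pick blocks off different machines" idea concrete, here is a minimal sketch of that kind of client-side replica selection. It is not Russ's code, and the class and method names (ReplicaPicker, pickHosts) are made up for illustration. It assumes you already have, for each block, the list of hosts holding a replica; in stock Hadoop that information is available through FileSystem.getFileBlockLocations() and BlockLocation.getHosts(). The policy shown (read each block from the replica host with the fewest blocks assigned so far) is only one simple way to spread the load.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: chooses one replica host per block so that reads
// are spread across datanodes instead of hammering a single machine.
class ReplicaPicker {

    // blockReplicas.get(i) is the list of hosts holding a replica of block i.
    // Returns the host chosen for each block, in block order.
    static List<String> pickHosts(List<List<String>> blockReplicas) {
        Map<String, Integer> assigned = new HashMap<>(); // blocks assigned per host
        List<String> chosen = new ArrayList<>();
        for (List<String> replicas : blockReplicas) {
            if (replicas.isEmpty()) {
                throw new IllegalArgumentException("block has no replicas");
            }
            String best = null;
            int bestLoad = Integer.MAX_VALUE;
            // Pick the least-loaded host; ties go to the earliest host in the list.
            for (String host : replicas) {
                int load = assigned.getOrDefault(host, 0);
                if (load < bestLoad) {
                    best = host;
                    bestLoad = load;
                }
            }
            assigned.merge(best, 1, Integer::sum);
            chosen.add(best);
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Made-up host names; a real client would fill this from block locations.
        List<List<String>> blocks = Arrays.asList(
            Arrays.asList("node1", "node2", "node3"),
            Arrays.asList("node1", "node2", "node4"),
            Arrays.asList("node1", "node2", "node3"));
        System.out.println(pickHosts(blocks));
    }
}
```

A real client would probably also weigh network distance to the destination machine and current disk load, which this sketch ignores.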
