Hello all,
Right now HDFS still uses simple replication to increase data
reliability. Even though it works, it wastes disk space as well as
network and disk bandwidth. For data-intensive applications (ones that
need to write large results to HDFS), this limits the throughput of
MapReduce. Also [...]
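(To put rough, purely illustrative numbers on the overhead argument: with
3-way replication every byte is stored and shipped three times, whereas a
(10, 4) erasure-coded layout, to take one example parameter choice, stores
14 blocks for every 10 data blocks, i.e. about 1.4x.)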
> [...] multiple source blocks to be ready, so the writer will need to
> buffer the original data, either in memory or on disk. If it is saved on
> disk because of memory pressure, will this be similar to writing the file
> with replication 2?
>
> Ram
>
>
> On Thu, Oct 13, 2011 at 1:16 AM, [...]
> [...] processing.
>
> --Bobby Evans
>
> On 10/31/11 2:50 PM, "Zheng Da" wrote:
>
> Hello Ram,
>
> Sorry, I didn't notice your reply.
>
> I don't really have a complete design in my mind. I wonder if the
> community is interested in using an alternative [...]
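For what it's worth, here is a minimal sketch (plain Java, not HDFS code) of
the buffering issue Ram raises above: with a (k, m) code, parity can only be
computed once k source blocks are complete, so the writer has to hold the
whole stripe in memory or spill it to local disk. A single XOR parity block
stands in for a real code such as Reed-Solomon, and all names here
(StripeEncoderSketch, addBlock, encodeParity) are hypothetical.

import java.util.Arrays;

/**
 * Illustration only (not HDFS code) of why an erasure-coded writer must
 * buffer a full stripe of source blocks before it can emit parity.
 */
public class StripeEncoderSketch {

    private final int k;            // number of data blocks per stripe
    private final int blockSize;    // bytes per block
    private final byte[][] buffered; // the writer has to hold k blocks
    private int filled = 0;

    public StripeEncoderSketch(int k, int blockSize) {
        this.k = k;
        this.blockSize = blockSize;
        this.buffered = new byte[k][];
    }

    /** Accepts one source block; returns true once a full stripe is buffered. */
    public boolean addBlock(byte[] block) {
        if (block.length != blockSize) {
            throw new IllegalArgumentException("unexpected block size");
        }
        buffered[filled++] = Arrays.copyOf(block, block.length);
        return filled == k;
    }

    /**
     * Parity can only be computed after all k blocks are present. Until then
     * the original data lives in memory or is spilled to local disk, which is
     * what makes the write path resemble replication 2 under memory pressure.
     */
    public byte[] encodeParity() {
        if (filled < k) {
            throw new IllegalStateException("stripe not full: " + filled + "/" + k);
        }
        byte[] parity = new byte[blockSize];
        for (byte[] block : buffered) {
            for (int i = 0; i < blockSize; i++) {
                parity[i] ^= block[i];
            }
        }
        return parity;
    }

    public static void main(String[] args) {
        int k = 3, blockSize = 8;
        StripeEncoderSketch enc = new StripeEncoderSketch(k, blockSize);
        java.util.Random rnd = new java.util.Random(42);
        for (int b = 0; b < k; b++) {
            byte[] block = new byte[blockSize];
            rnd.nextBytes(block);
            System.out.println("buffered block " + b + ", stripe full: " + enc.addBlock(block));
        }
        System.out.println("parity: " + Arrays.toString(enc.encodeParity()));
    }
}

Encoding over smaller cells rather than whole blocks would shrink that buffer,
though at the cost of block locality for readers; that trade-off seems to be
the crux of the question above.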