ively. However, what accounts for the extra 17GB and 25-30GB in the
2- and 3-replica configurations? And what accounts for the minimal network
usage in the 1-replica configuration?
Note that the data is generated with TeraGen using the same replication
factor with which it is later sorted.
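A rough model of HDFS write-pipeline traffic may help when reasoning about these numbers. It is a minimal sketch that assumes every writing task runs on a datanode, so the first replica is stored locally and each additional replica crosses the network once. It ignores shuffle traffic and non-local map reads. The function name and the output size are hypothetical, not figures from this thread, and the sketch is not offered as an explanation of the measured 17GB and 25-30GB.

```python
GB = 1024 ** 3


def replication_network_bytes(output_bytes: int, replication: int) -> int:
    """Rough bytes sent by the HDFS write pipeline for one job's output.

    Assumes every writer runs on a datanode, so the first replica is stored
    locally and each additional replica is forwarded over the network once.
    Shuffle traffic and non-local map reads are deliberately not counted.
    """
    if replication < 1:
        raise ValueError("replication factor must be >= 1")
    return (replication - 1) * output_bytes


if __name__ == "__main__":
    output = 10 * GB  # hypothetical output size, not a figure from this thread
    for r in (1, 2, 3):
        extra = replication_network_bytes(output, r)
        print(f"replication={r}: ~{extra / GB:.0f} GB of pipeline traffic")
```

Under these assumptions, replication factor 1 adds no pipeline traffic at all, and each extra replica adds one more copy of the output to the network total.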
Thank you,
Daemeon - Indeed, I neglected to mention that I clear the caches on every
node in my cluster before running the read benchmark. Ideally, I expected the
results to be proportional to disk I/O, given that a replicated write
(replication factor 2) performs twice the disk I/O of a read. I've verified the
I am benchmarking my cluster of 16 nodes (all in one rack) with TestDFSIO on
Hadoop 1.0.4. For simplicity, I turned off speculative task execution and set
the maximum number of map and reduce tasks to 1.
With a replication factor of 2, writing one 5GB file takes twice as long as
reading one file. This result se
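The expectation described in these messages, that runtime should track disk I/O, can be written down as a small model. This is a sketch under stated assumptions: a write stores `replication` full copies on disk across the cluster, a read (with caches cleared) pulls exactly one copy, and runtime is proportional to total disk bytes. The function names are hypothetical. The model counts aggregate bytes only; it does not account for replicas being written to different disks concurrently through the pipeline, so treat it as the proportionality assumption itself rather than a prediction.

```python
MB = 1024 ** 2


def aggregate_disk_bytes(file_bytes: int, replication: int, op: str) -> int:
    """Total bytes touched on disk across the cluster for one file.

    Assumes a write lands `replication` full copies on disk and a read
    (with OS caches cleared beforehand) pulls exactly one copy from disk.
    """
    if replication < 1:
        raise ValueError("replication factor must be >= 1")
    if op == "write":
        return replication * file_bytes
    if op == "read":
        return file_bytes
    raise ValueError(f"unknown op: {op!r}")


def expected_write_to_read_ratio(file_bytes: int, replication: int) -> float:
    """Write/read time ratio if runtime were proportional to disk bytes."""
    write = aggregate_disk_bytes(file_bytes, replication, "write")
    read = aggregate_disk_bytes(file_bytes, replication, "read")
    return write / read


if __name__ == "__main__":
    five_gb = 5 * 1024 * MB
    for r in (1, 2, 3):
        print(f"replication={r}: write/read ~ {expected_write_to_read_ratio(five_gb, r)}")
```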
Hi Azurry, I'd also like to be able to move blocks manually.
One piece missing from your current approach is updating the
block mappings that the cluster relies on.
The namenode has a mapping of blocks to datanodes, and each datanode
has, as the comments say, a "block -> stream of bytes" mapp
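To make the two mappings concrete, here is a toy model in Python. `NameNode`, `DataNode`, and `move_block` are hypothetical names, not Hadoop 1.x classes or APIs; the sketch only shows that moving a replica has to update both the datanode-side block map and the namenode-side block-to-datanodes map. It does not model how a real namenode learns about replica locations from datanodes.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy stand-in for a datanode's local "block -> bytes" storage."""
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)


@dataclass
class NameNode:
    """Toy stand-in for the namenode's "block -> datanodes" map."""
    block_locations: dict[int, set[str]] = field(
        default_factory=lambda: defaultdict(set))


def move_block(block_id: int, src: DataNode, dst: DataNode, nn: NameNode) -> None:
    """Move one replica and keep both mappings consistent (hypothetical helper)."""
    if block_id not in src.blocks:
        raise KeyError(f"block {block_id} is not on {src.name}")
    if block_id in dst.blocks:
        raise ValueError(f"block {block_id} already on {dst.name}")
    # 1. Copy the bytes into the destination datanode's block map.
    dst.blocks[block_id] = src.blocks[block_id]
    # 2. Record the new location on the namenode before dropping the old one.
    nn.block_locations[block_id].add(dst.name)
    # 3. Remove the source replica and its namenode entry.
    del src.blocks[block_id]
    nn.block_locations[block_id].discard(src.name)


if __name__ == "__main__":
    nn = NameNode()
    a, b = DataNode("dn-a"), DataNode("dn-b")
    a.blocks[1] = b"payload"
    nn.block_locations[1].add("dn-a")
    move_block(1, a, b, nn)
    print(nn.block_locations[1], b.blocks)
```

The helper records the new location before removing the old one, so the namenode's map never goes through a state with no location for the block.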
auses the
Replication Monitor to suddenly run and delete blocks at the end of the
benchmark). I am using Hadoop 1.0.4.
Thank you,
Eitan Rosenfeld
itions?
2. My understanding is that blocks are first written to tmp/ or
blocksBeingWritten/, and moved to current/ only after they have been
completely written. Can someone point me to the class and method responsible
for moving the files? I wasn't able to find it.
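The general pattern described in item 2 (stage a block in a temporary directory, then move it into current/ once it is complete) can be sketched generically. This does not identify the Hadoop class or method being asked about; `write_block` and the block name are hypothetical, and only the directory names mirror the question.

```python
import os
import tempfile
from pathlib import Path


def write_block(storage_dir: Path, block_name: str, data: bytes) -> Path:
    """Write a block under tmp/, then move it into current/ once complete.

    Generic write-then-rename sketch, not Hadoop's implementation.
    """
    tmp_dir = storage_dir / "tmp"
    current_dir = storage_dir / "current"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    current_dir.mkdir(parents=True, exist_ok=True)

    tmp_path = tmp_dir / block_name
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach disk before publishing

    final_path = current_dir / block_name
    # os.replace is an atomic rename on POSIX when both paths are on the same
    # filesystem, so readers never see a half-written file in current/.
    os.replace(tmp_path, final_path)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = write_block(Path(d), "blk_1", b"hello")
        print(p.relative_to(d), p.read_bytes())
```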
Thank you in advance!
Eitan Rosenfeld