Hi,
   Most of our map jobs are IO bound. However, on any given node, the IO 
throughput during the map phase is only 20% of that node's sequential IO 
capability (we measured the sequential IO throughput with iozone).
   I think the reason is that, although each map issues sequential IO requests, 
many maps run concurrently on the same node, so the disk keeps switching 
between their streams, and these switches (presumably seeks) are quite expensive.
   Prefetching may be a good solution here, especially since a map is supposed 
to scan through exactly one entire block, no more and no less. Any idea how to 
enable it?
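
   To make the reasoning concrete, here is a rough back-of-envelope model in 
Java. It is only a sketch: the class name SeekModel, the 10 ms switch cost and 
the 100 MB/s disk bandwidth are assumptions for illustration, not measurements 
from our cluster. It assumes the disk pays one switch before every request and 
then transfers the requested amount at full sequential speed.

```java
// Back-of-envelope model, not a measurement: all numbers are assumed.
final class SeekModel {
    /**
     * Fraction of sequential bandwidth achieved when every request
     * pays one seek before transferring requestKB of data.
     */
    static double effectiveFraction(double seekMs, double bandwidthMBps, double requestKB) {
        // Time spent actually moving data for one request, in milliseconds.
        double transferMs = requestKB / 1024.0 / bandwidthMBps * 1000.0;
        return transferMs / (seekMs + transferMs);
    }

    public static void main(String[] args) {
        double seekMs = 10.0;          // assumed cost of switching streams
        double bandwidthMBps = 100.0;  // assumed sequential disk bandwidth
        for (double requestKB : new double[] {64, 256, 1024, 4096, 65536}) {
            System.out.printf("%8.0f KB per switch -> %.0f%% of sequential%n",
                    requestKB, 100 * effectiveFraction(seekMs, bandwidthMBps, requestKB));
        }
    }
}
```

   Under these assumed numbers, reading 256 KB per switch gives 20% of the 
sequential rate, while reading 4 MB per switch gives 80%. That is why I expect 
prefetching a larger part of the block per request to help.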

Thanks,
-Songting
