Hi guys and gals,
I originally posted a version of this question on the user list a few
days ago but got no response, so I thought perhaps it delved a bit too
far into the nitty-gritty to warrant one. My apologies for cross-posting.
Can someone please briefly summarize the difference between the
fs.local.block.size and file.blocksize parameters? I do not see
deprecation warnings for fs.local.block.size
when I run with it set. Furthermore, and I'm unsure if this is related,
I see two copies of what is effectively RawLocalFileSystem.java (the
other is local/RawLocalFs.java). It appears that the one in local/ is
for the old abstract FileSystem class, whereas RawLocalFileSystem.java
uses the new abstract class. Perhaps this is the root cause of the two
parameters? Or does file.blocksize simply control the abstract class or
some such thing?
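
To make it concrete which of the two properties a given config actually
sets, here is a minimal sketch that scans a core-site.xml-style document
for both names. The two property names come from the question above; the
sample XML and its values are hypothetical, and the sketch assumes the
values are plain byte counts. It only reports what the file contains and
says nothing about which key Hadoop actually honors.

```python
import xml.etree.ElementTree as ET

# Property names taken from the question above.
BLOCK_SIZE_KEYS = ("fs.local.block.size", "file.blocksize")


def read_block_size_properties(xml_text):
    """Return {name: bytes} for whichever block-size keys are present."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name in BLOCK_SIZE_KEYS and value:
            # Assumption: the value is a plain integer number of bytes.
            found[name] = int(value)
    return found


# Hypothetical config fragment, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>fs.local.block.size</name>
    <value>33554432</value>
  </property>
  <property>
    <name>file.blocksize</name>
    <value>67108864</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for key, size in read_block_size_properties(SAMPLE).items():
        print(f"{key} = {size} bytes ({size // (1024 * 1024)} MB)")
```

If only one of the two names shows up, that at least narrows down which
parameter a run was really configured with.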
The practical answers I really need to get a handle on are the following:
1. Is the default for the file:// filesystem boosted to a 64MB blocksize
in Hadoop 2.0? It was only 32MB in Hadoop 1.0, but it's not 100% clear
to me that it is now a full 64MB. The core-site.xml docs online suggest
it's been boosted.
2. If I alter the blocksize of file://, is it correct to presume that
this will also affect the shuffle block size, since that data goes to
local disk?
Thanks!
ellis