> On 27 Apr 2016, at 04:59, Takeshi Yamamuro <linguin....@gmail.com> wrote:
>
> Hi, all
>
> See SPARK-1529 for related discussion.
>
> // maropu

I'd not seen that discussion. I'm actually curious about why there's a 15% difference in performance between the Java NIO and Hadoop FS APIs, and, if that is the case (Hadoop still uses the pre-NIO libraries), *has anyone thought of just fixing the Hadoop local FS codepath?*

It's not as if nobody has filed JIRAs on that ... it's just that nothing has ever got to a state where it was considered ready to adopt, where "ready" means: passes all unit and load tests against Linux, Unix and Windows filesystems. There have been some attempts, but they never got much engagement or support, especially as NIO wasn't there properly until Java 7, and Hadoop was stuck on Java 6 support until 2015. That's no longer a constraint: someone could do the work, using the existing JIRAs as starting points.

If someone did do this in RawLocalFS, it'd be nice if the patch also allowed you to turn off CRC creation and checking. That's not only part of the overhead, it also means that flush() doesn't actually flush, not until you reach the end of a CRC32 block ... so breaking what few durability guarantees POSIX offers.
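
For anyone wondering what the NIO side of that looks like, here is a minimal sketch using only the JDK (Java 7+). It is not Hadoop code and not a proposed patch: the class name `NioDurableWrite` and the helper `writeAndSync` are made up for illustration. It writes through a `FileChannel` with no checksum layer in between, then calls `force(true)` so the data is pushed to the storage device before the method returns.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; nothing here comes from Hadoop.
class NioDurableWrite {

    /** Writes data to path via NIO and syncs it to storage before returning. */
    static void writeAndSync(Path path, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            // write() may consume only part of the buffer, so loop until drained.
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // force(true) asks the OS to flush file content and metadata,
            // independent of any checksum block boundary.
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("nio-sketch", ".bin");
        try {
            byte[] payload = "hello, local fs".getBytes(StandardCharsets.UTF_8);
            writeAndSync(tmp, payload);
            byte[] readBack = Files.readAllBytes(tmp);
            System.out.println("bytes read back: " + readBack.length);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The point of the sketch is only that the sync happens when the caller asks for it; how that would map onto Hadoop's stream and checksum classes is exactly the work the existing JIRAs would need to cover.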