>> 1. The OS buffer cache, which will keep recently read disk blocks in
>> memory.
>> 2. The Java just-in-time compiler (JIT), which will use runtime profiling
>> to significantly speed up execution.
>>
>> These can make a huge difference if you are running the same job
>>
We have similar needs, but IIRC I came to the conclusion that this would
only work on ordered RDDs, and even then you would still have to figure out
which partition is the first one. I ended up deciding it would be best to
just drop the header lines from a Scala iterator before creating an RDD
based on