Re: performance improvement on second operation...without caching?

2014-05-05 Thread Ethan Jewett
>> 1. The OS buffer cache. Which will keep recently read disk blocks in
>> memory.
>> 2. The Java just-in-time compiler (JIT) which will use runtime profiling
>> to significantly speed up execution speed.
>>
>> These can make a huge difference if you are running the same job

Re: RDD.tail()

2014-04-14 Thread Ethan Jewett
We have similar needs, but IIRC I came to the conclusion that this would only work on ordered RDDs, and even then you would still have to figure out which partition is the first one. I ended up deciding it would be best to just drop the header lines from a Scala iterator before creating an RDD based on
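
To illustrate the approach described in this snippet, here is a minimal sketch of dropping header lines from a plain Scala iterator before any RDD exists. The helper name `dropHeader`, the `headerLines` parameter, and the sample rows are hypothetical and not taken from the original message. The Spark step is only mentioned in a comment so the sketch runs without a cluster.

```scala
// Hypothetical helper (not from the original post): skip the first
// `headerLines` elements of a line iterator using the standard
// Iterator.drop method.
def dropHeader(lines: Iterator[String], headerLines: Int = 1): Iterator[String] =
  lines.drop(headerLines)

// Made-up sample input: one header line followed by two records.
val raw = Iterator("id,name", "1,alice", "2,bob")

// Materialize the remaining records once the header is gone.
val records = dropHeader(raw).toList

// Assumption: with a SparkContext named `sc` in scope, the RDD would then
// be created from the already-cleaned data, e.g. sc.parallelize(records).
```

Doing the cleanup on the iterator sidesteps the question of which RDD partition holds the header, at the cost of reading the data on the driver first.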