I don't know of a way to do this out of the box, short of digging into custom InputFormats. The RDD returned by textFile has no defined ordering. Of course, I can't imagine a world in which the lines within a partition weren't iterated in line order, but there's no real guarantee about the ordering among partitions.
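One way around the missing cross-partition guarantee is to carry each line's position inside the record itself. As far as I know, textFile is built on Hadoop's TextInputFormat, which keys every line by the byte offset where it starts in the file. Sorting on that key restores file order no matter which order the partitions are visited in. Below is a rough plain-Python simulation of that idea, not Spark code: the helper names are hypothetical, and the "partitions" are just contiguous chunks standing in for input splits.

```python
import random
from typing import List, Tuple

Record = Tuple[int, str]  # (byte offset of line start, line text)


def offset_records(text: str) -> List[Record]:
    """Pair each line with the UTF-8 byte offset where it starts (like TextInputFormat keys)."""
    records = []
    offset = 0
    for line in text.splitlines(keepends=True):
        records.append((offset, line.rstrip("\r\n")))
        offset += len(line.encode("utf-8"))
    return records


def split_into_partitions(records: List[Record], num_partitions: int) -> List[List[Record]]:
    """Hypothetical stand-in for input splits: contiguous chunks of records."""
    size = max(1, -(-len(records) // num_partitions))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def number_lines(partitions: List[List[Record]]) -> List[Tuple[int, str]]:
    """Restore file order by sorting on the offset key, then assign 0-based line numbers."""
    flat = [rec for part in partitions for rec in part]
    flat.sort(key=lambda rec: rec[0])
    return [(i, line) for i, (_, line) in enumerate(flat)]


text = "c\na\nb\na\nd\n"
parts = split_into_partitions(offset_records(text), 3)
random.seed(0)
random.shuffle(parts)  # simulate partitions being visited in arbitrary order
print(number_lines(parts))
# → [(0, 'c'), (1, 'a'), (2, 'b'), (3, 'a'), (4, 'd')]
```

The shuffle doesn't change the result, because the order comes from the offsets, not from partition iteration. In Spark itself this would mean reading the file with hadoopFile and TextInputFormat to keep the offset keys instead of discarding them, which I haven't tested here.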
On Tue, Sep 22, 2015 at 3:50 PM, Philip Weaver <[email protected]> wrote:
> I overcomplicated the question by asking about removing duplicates.
> Fundamentally I think my question is, how does one sort lines in a file by
> line number.
