I don't know of a way to do this out of the box, short of maybe
digging into custom InputFormats. The RDD from textFile doesn't have
an ordering. I can't imagine a world in which the lines within a
partition weren't iterated in file order, of course, but there's also
no real guarantee about ordering among partitions.
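To illustrate why a per-line key can sidestep the partition-ordering problem, here is a plain-Python sketch (no Spark involved). It mimics a file split into contiguous partitions, where each line is tagged with its byte offset in the file. This is similar in spirit to the byte-offset keys that Hadoop's TextInputFormat produces and that textFile discards. The split logic and both function names are hypothetical. The sketch assumes that lines within a partition keep their file order and that the offsets are available. Under those assumptions, sorting on the offset recovers the original line order even when the partitions come back shuffled.

```python
import random


def split_into_partitions(text, num_partitions):
    """Hypothetical stand-in for input splits: cut the text into contiguous
    chunks of whole lines, tagging each line with its starting byte offset."""
    lines = text.splitlines(keepends=True)
    tagged = []
    pos = 0
    for line in lines:
        # The key is the byte offset where this line starts in the file.
        tagged.append((pos, line.rstrip("\r\n")))
        pos += len(line.encode("utf-8"))
    size = max(1, -(-len(tagged) // num_partitions))  # ceiling division
    return [tagged[i:i + size] for i in range(0, len(tagged), size)]


def lines_in_file_order(partitions):
    """Recover file order by sorting every (offset, line) pair on its offset,
    regardless of the order in which the partitions are visited."""
    all_pairs = [pair for part in partitions for pair in part]
    return [line for _, line in sorted(all_pairs)]


text = "alpha\nbravo\ncharlie\ndelta\necho\n"
parts = split_into_partitions(text, 3)
random.shuffle(parts)  # no guarantee about ordering among partitions
print(lines_in_file_order(parts))  # ['alpha', 'bravo', 'charlie', 'delta', 'echo']
```

Offsets are unique per line, so the sort never needs to compare the line text itself. Turning offsets into consecutive line numbers would take one more pass, for example counting lines per partition and prefix-summing those counts.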

On Tue, Sep 22, 2015 at 3:50 PM, Philip Weaver <[email protected]> wrote:
> I overcomplicated the question by asking about removing duplicates.
> Fundamentally I think my question is, how does one sort lines in a file by
> line number.
