I was wondering if anyone could provide an explanation for the behavior I'm seeing.
I have an RDD, call it foo. It is not too complex (a DAG maybe 8 levels deep, with 2 shuffles), not empty, and not even terribly big; it is small enough that some partitions could be empty.

When I run foo.first, I get workers disconnecting, and applications die.

When I run foo.mapPartitions.saveAsHadoopDataset, it works fine.

Anyone got an explanation for why that might be?

-Thanks,
Nathan

--
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone: +1-416-203-3003 x 238
Email: nkronenf...@oculusinfo.com