I was wondering if anyone could provide an explanation for the behavior I'm
seeing.

I have an RDD, call it foo, not too complex, with a maybe 8 level deep DAG
with 2 shuffles, not empty, not even terribly big - small enough that some
partitions could be empty.

When I run foo.first, I get workers disconnecting, and applications die
When I run foo.mapPartitions.saveAsHadoopDataset, it works fine.

Anyone got an explanation for why that might be?

                    -Thanks, Nathan


-- 
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  nkronenf...@oculusinfo.com

Reply via email to