I'm hitting the exact same error. I'm running a PySpark job in yarn-client mode; it works fine in standalone mode, but I need it to run in yarn-client mode.
Other people have reported the same problem when bundling jars and extra dependencies. I'm pointing PySpark at a specific Python executable bundled with the external dependencies. Since the job runs fine in standalone mode, I see no reason why it should fail with this error while saving to S3 in yarn-client mode. Thanks, any help or direction would be appreciated.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/PairRDD-serialization-exception-tp21999p22019.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
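For reference, the setup described above is roughly the following. This is only a sketch of what was described, not the poster's actual command: the interpreter path and job script name are placeholders.

```shell
# Sketch of the described setup (interpreter path and script name are hypothetical).
# Point PySpark at a bundled Python executable that carries the extra
# dependencies, then submit the job in yarn-client mode.
export PYSPARK_PYTHON=/opt/bundled-env/bin/python   # bundled interpreter with deps

spark-submit \
  --master yarn \
  --deploy-mode client \
  my_job.py
```

In standalone mode the same `PYSPARK_PYTHON` setting applies, which is why a serialization error that appears only under YARN suggests the executors are resolving a different Python environment than the driver.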