Hi,

Can you check from the logs when the job actually finished? The logs provided are too short to reveal anything meaningful.

On Mon, Jul 11, 2016 at 9:50 AM, velvetbaldmime <keyn...@gmail.com> wrote:

> Spark 2.0.0-preview
>
> We've got an app that uses a fairly big broadcast variable. We run this on a
> big EC2 instance, so deployment is in client mode. The broadcast variable is
> a massive Map[String, Array[String]].
>
> At the end of saveAsTextFile, the output in the folder seems to be complete
> and correct (apart from .crc files still being there), BUT the spark-submit
> process is stuck on, seemingly, removing the broadcast variable. The stuck
> logs look like this: http://pastebin.com/wpTqvArY
>
> My last run lasted for 12 hours after doing saveAsTextFile - just sitting
> there. I did a jstack on the driver process; most threads are parked:
> http://pastebin.com/E29JKVT7
>
> Full story: We used this code with Spark 1.5.0 and it worked, but then the
> data changed and something stopped fitting into Kryo's serialisation buffer.
> Increasing it didn't help, so I had to disable the KryoSerialiser. Tested it
> again - it hung. Switched to 2.0.0-preview - seems like the same issue.
>
> I'm not quite sure what's even going on, given that there's almost no CPU
> activity and no output in the logs, yet the output is not finalised like it
> used to be.
>
> Would appreciate any help, thanks
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-hangs-at-Removed-broadcast-tp27320.html

--
-Dhruve Ashar