Hi. Here are the last few lines before it starts removing broadcasts:

16/07/11 14:02:11 INFO FileOutputCommitter: Saved output of task 'attempt_201607111123_0009_m_003209_20886' to file:/mnt/rendang/cache-main/RunWikistatsSFCounts727fc9d635f25d0922984e59a0d18fdd/stats/sf_counts/_temporary/0/task_201607111123_0009_m_003209
16/07/11 14:02:11 INFO SparkHadoopMapRedUtil: attempt_201607111123_0009_m_003209_20886: Committed
16/07/11 14:02:11 INFO TaskSetManager: Finished task 3211.0 in stage 9.0 (TID 20888) in 95 ms on localhost (3209/3214)
16/07/11 14:02:11 INFO Executor: Finished task 3209.0 in stage 9.0 (TID 20886). 1721 bytes result sent to driver
16/07/11 14:02:11 INFO TaskSetManager: Finished task 3209.0 in stage 9.0 (TID 20886) in 103 ms on localhost (3210/3214)
16/07/11 14:02:11 INFO FileOutputCommitter: Saved output of task 'attempt_201607111123_0009_m_003208_20885' to file:/mnt/rendang/cache-main/RunWikistatsSFCounts727fc9d635f25d0922984e59a0d18fdd/stats/sf_counts/_temporary/0/task_201607111123_0009_m_003208
16/07/11 14:02:11 INFO SparkHadoopMapRedUtil: attempt_201607111123_0009_m_003208_20885: Committed
16/07/11 14:02:11 INFO Executor: Finished task 3208.0 in stage 9.0 (TID 20885). 1721 bytes result sent to driver
16/07/11 14:02:11 INFO TaskSetManager: Finished task 3208.0 in stage 9.0 (TID 20885) in 109 ms on localhost (3211/3214)
16/07/11 14:02:11 INFO FileOutputCommitter: Saved output of task 'attempt_201607111123_0009_m_003212_20889' to file:/mnt/rendang/cache-main/RunWikistatsSFCounts727fc9d635f25d0922984e59a0d18fdd/stats/sf_counts/_temporary/0/task_201607111123_0009_m_003212
16/07/11 14:02:11 INFO SparkHadoopMapRedUtil: attempt_201607111123_0009_m_003212_20889: Committed
16/07/11 14:02:11 INFO Executor: Finished task 3212.0 in stage 9.0 (TID 20889). 1721 bytes result sent to driver
16/07/11 14:02:11 INFO TaskSetManager: Finished task 3212.0 in stage 9.0 (TID 20889) in 84 ms on localhost (3212/3214)
16/07/11 14:02:11 INFO FileOutputCommitter: Saved output of task 'attempt_201607111123_0009_m_003210_20887' to file:/mnt/rendang/cache-main/RunWikistatsSFCounts727fc9d635f25d0922984e59a0d18fdd/stats/sf_counts/_temporary/0/task_201607111123_0009_m_003210
16/07/11 14:02:11 INFO SparkHadoopMapRedUtil: attempt_201607111123_0009_m_003210_20887: Committed
16/07/11 14:02:11 INFO Executor: Finished task 3210.0 in stage 9.0 (TID 20887). 1721 bytes result sent to driver
16/07/11 14:02:11 INFO TaskSetManager: Finished task 3210.0 in stage 9.0 (TID 20887) in 100 ms on localhost (3213/3214)
16/07/11 14:02:11 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
16/07/11 14:02:11 INFO FileOutputCommitter: Saved output of task 'attempt_201607111123_0009_m_003213_20890' to file:/mnt/rendang/cache-main/RunWikistatsSFCounts727fc9d635f25d0922984e59a0d18fdd/stats/sf_counts/_temporary/0/task_201607111123_0009_m_003213
16/07/11 14:02:11 INFO SparkHadoopMapRedUtil: attempt_201607111123_0009_m_003213_20890: Committed
16/07/11 14:02:11 INFO Executor: Finished task 3213.0 in stage 9.0 (TID 20890). 1721 bytes result sent to driver
16/07/11 14:02:11 INFO TaskSetManager: Finished task 3213.0 in stage 9.0 (TID 20890) in 82 ms on localhost (3214/3214)
16/07/11 14:02:11 INFO TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool
*16/07/11 14:02:11 INFO DAGScheduler: ResultStage 9 (saveAsTextFile at SfCountsDumper.scala:13) finished in 42.294 s*
*16/07/11 14:02:11 INFO DAGScheduler: Job 1 finished: saveAsTextFile at SfCountsDumper.scala:13, took 9517.124624 s*
16/07/11 14:28:46 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 10.101.230.154:35192 in memory (size: 15.8 KB, free: 37.1 GB)
16/07/11 14:28:46 INFO ContextCleaner: Cleaned shuffle 7
16/07/11 14:28:46 INFO ContextCleaner: Cleaned shuffle 6
16/07/11 14:28:46 INFO ContextCleaner: Cleaned shuffle 5
16/07/11 14:28:46 INFO ContextCleaner: Cleaned shuffle 4
16/07/11 14:28:46 INFO ContextCleaner: Cleaned shuffle 3
16/07/11 14:28:46 INFO ContextCleaner: Cleaned shuffle 2
16/07/11 14:28:46 INFO ContextCleaner: Cleaned shuffle 1
16/07/11 14:28:46 INFO BlockManager: Removing RDD 14
16/07/11 14:28:46 INFO ContextCleaner: Cleaned RDD 14
16/07/11 14:28:46 INFO BlockManagerInfo: Removed broadcast_11_piece0 on 10.101.230.154:35192 in memory (size: 25.5 KB, free: 37.1 GB)
...

In fact, the job is still running: Spark's UI shows an uptime of 20.6 hours, with the last job finishing at least 18 hours ago.

On Mon, 11 Jul 2016 at 23:23 dhruve ashar <dhruveas...@gmail.com> wrote:

> Hi,
>
> Can you check the time when the job actually finished from the logs? The
> logs provided are too short and do not reveal meaningful information.
>
> On Mon, Jul 11, 2016 at 9:50 AM, velvetbaldmime <keyn...@gmail.com> wrote:
>
>> Spark 2.0.0-preview
>>
>> We've got an app that uses a fairly big broadcast variable. We run this on a
>> big EC2 instance, so deployment is in client-mode. Broadcasted variable is a
>> massive Map[String, Array[String]].
>>
>> At the end of saveAsTextFile, the output in the folder seems to be complete
>> and correct (apart from .crc files still being there) BUT the spark-submit
>> process is stuck on, seemingly, removing the broadcast variable. The stuck
>> logs look like this: http://pastebin.com/wpTqvArY
>>
>> My last run lasted for 12 hours after doing saveAsTextFile - just
>> sitting there. I did a jstack on the driver process, most threads are parked:
>> http://pastebin.com/E29JKVT7
>>
>> Full story: We used this code with Spark 1.5.0 and it worked, but then the
>> data changed and something stopped fitting into Kryo's serialisation buffer.
>> Increasing it didn't help, so I had to disable the KryoSerialiser. Tested it
>> again - it hung. Switched to 2.0.0-preview - seems like the same issue.
>>
>> I'm not quite sure what's even going on given that there's almost no CPU
>> activity and no output in the logs, yet the output is not finalised like it
>> used to be.
>>
>> Would appreciate any help, thanks
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-hangs-at-Removed-broadcast-tp27320.html
>
> --
> -Dhruve Ashar
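
P.S. On the question of when the job actually finished: the silent gap can be read straight off the timestamps in the log excerpt at the top of this message. Below is a minimal Python sketch for measuring it, assuming the "yy/MM/dd HH:mm:ss" prefix seen in those lines. The helper names parse_log_time and gap_seconds are made up for illustration and are not part of Spark.

```python
from datetime import datetime

# Timestamp layout assumed from the log prefix, e.g. "16/07/11 14:02:11".
LOG_TIME_FORMAT = "%y/%m/%d %H:%M:%S"
PREFIX_LEN = len("16/07/11 14:02:11")


def parse_log_time(line: str) -> datetime:
    """Parse the leading timestamp of a log line, skipping any '*' emphasis marker."""
    cleaned = line.strip().lstrip("*")
    return datetime.strptime(cleaned[:PREFIX_LEN], LOG_TIME_FORMAT)


def gap_seconds(earlier_line: str, later_line: str) -> float:
    """Seconds elapsed between the timestamps of two log lines."""
    delta = parse_log_time(later_line) - parse_log_time(earlier_line)
    return delta.total_seconds()


# The two lines of interest, copied from the log excerpt.
job_finished = (
    "*16/07/11 14:02:11 INFO DAGScheduler: Job 1 finished: "
    "saveAsTextFile at SfCountsDumper.scala:13, took 9517.124624 s*"
)
first_cleanup = (
    "16/07/11 14:28:46 INFO BlockManagerInfo: Removed broadcast_0_piece0 "
    "on 10.101.230.154:35192 in memory (size: 15.8 KB, free: 37.1 GB)"
)

gap = gap_seconds(job_finished, first_cleanup)
print(f"{gap:.0f} s ({gap / 60:.1f} min) between job finish and first cleanup")
# prints "1595 s (26.6 min) between job finish and first cleanup"
```

So the job itself finished at 14:02:11, and the first cleanup message appeared about 26.6 minutes later, with nothing logged in between.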