Hi Jochen, did you set up the EMR cluster with custom security groups? Can you confirm that the relevant EC2 instances can connect to each other on the required ports?
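One quick way to check that from a node inside the cluster is a plain TCP probe; a minimal sketch, assuming coreutils `timeout` and bash's `/dev/tcp` are available — the hostname and the two ResourceManager ports below are placeholders for your actual cluster:

```shell
#!/usr/bin/env bash
# Probe whether a TCP port on a remote host is reachable within 3 seconds.
check_port() {
  # bash's /dev/tcp/<host>/<port> opens a TCP connection on redirection
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 OPEN"
  else
    echo "$1:$2 CLOSED"
  fi
}

# Hypothetical private DNS name of the EMR master node; adjust to your cluster.
check_port ip-10-0-0-1.ec2.internal 8030   # YARN ResourceManager scheduler
check_port ip-10-0-0-1.ec2.internal 8032   # YARN ResourceManager client port
```

If a port the driver or executors need reports CLOSED, the security group rules (or NACLs) are the first place to look.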
Best regards

Jochen Hebbrecht <jochenhebbre...@gmail.com> wrote on Fri, Oct 4, 2019 at 17:09:

> Hi Jeff,
>
> Thanks! Just tried that, but the same timeout occurs :-( ...
>
> Jochen
>
> On Fri, Oct 4, 2019 at 16:37, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> You can try to increase the property spark.yarn.am.waitTime (by default
>> it is 100s). Maybe you are doing some very time-consuming operation when
>> initializing the SparkContext, which causes the timeout.
>>
>> See this property here:
>> http://spark.apache.org/docs/latest/running-on-yarn.html
>>
>> Jochen Hebbrecht <jochenhebbre...@gmail.com> wrote on Fri, Oct 4, 2019 at 22:08:
>>
>>> Hi,
>>>
>>> I'm using Spark 2.4.2 on AWS EMR 5.24.0. I'm trying to send a Spark job
>>> to the cluster. The job gets accepted, but the YARN application fails with:
>>>
>>> {code}
>>> 19/09/27 14:33:35 ERROR ApplicationMaster: Uncaught exception:
>>> java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
>>>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>>>   at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>>>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>   at javax.security.auth.Subject.doAs(Subject.java:422)
>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>>> 19/09/27 14:33:35 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception:
>>> java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
>>>   at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
>>>   at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
>>>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:468)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:305)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>   at javax.security.auth.Subject.doAs(Subject.java:422)
>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
>>>   at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>>> {code}
>>>
>>> It actually goes wrong at this line:
>>> https://github.com/apache/spark/blob/v2.4.2/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L468
>>>
>>> Now, I'm 100% sure Spark is OK and there's no bug, but there must be
>>> something wrong with my setup. I don't understand the code of the
>>> ApplicationMaster, so could somebody explain to me what it is trying to
>>> reach? Where exactly does the connection time out? Then at least I can
>>> debug it further, because I don't have a clue what it is doing :-)
>>>
>>> Thanks for any help!
>>> Jochen
>>
>> --
>> Best Regards
>>
>> Jeff Zhang

--
Roland Johann
Software Developer/Data Engineer

phenetic GmbH
Lütticher Straße 10, 50674 Köln, Germany

Mobile: +49 172 365 26 46
Mail: roland.joh...@phenetic.io
Web: phenetic.io

Commercial register: Amtsgericht Köln (HRB 92595)
Managing directors: Roland Johann, Uwe Reimann
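For reference, Jeff's suggestion of increasing spark.yarn.am.waitTime can be applied at submit time. This is only a sketch: the 300s value, main class, and JAR name are placeholders, and this property is only honored in YARN cluster deploy mode:

```shell
# Raise the time the YARN ApplicationMaster waits for the driver's
# SparkContext to initialize, from the 100s default to 300s.
# --class and the application JAR below are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.am.waitTime=300s \
  --class com.example.MyApp \
  my-app.jar
```

Note that if the SparkContext never comes up at all (e.g. because of blocked ports or a hang in user code), raising the wait time only delays the same TimeoutException rather than fixing it.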