Hi Riccardo,

Thanks for your suggestions. The thing is that it's the Spark UI that crashes, not the app: the app actually ends up completing successfully, which is why I'm a bit confused by this issue. I'll still try out some of your suggestions. (I've also added, below the quoted thread, a sketch of the event-log settings I plan to enable and a toy example of the kind of custom Transformer I mentioned.)

Thanks and Regards,
Saatvik Shah
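P.S. The first thing I plan to try, following the SPARK-18838 discussion linked further down, is giving the listener bus more headroom, since the driver log complains there is "no remaining room in event queue". This is only a minimal sketch, assuming a Spark 2.x build where the property is spark.scheduler.listenerbus.eventqueue.size (later releases call it spark.scheduler.listenerbus.eventqueue.capacity); the queue size of 100000 and the app name are placeholder values of mine, not settings from the actual job:

    # Sketch: enlarge the listener bus event queue so slow SparkListeners
    # (for example the web UI listeners) are less likely to drop events.
    # The property name is an assumption for Spark 2.x; newer releases use
    # spark.scheduler.listenerbus.eventqueue.capacity instead.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("large-workload-app")  # placeholder name
        .config("spark.scheduler.listenerbus.eventqueue.size", "100000")
        .getOrCreate()
    )

A bigger queue only buys headroom, of course: if a listener is fundamentally too slow for the task rate it will eventually fall behind again, so I'm treating this as a mitigation rather than a fix.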
On Tue, Jul 18, 2017 at 9:59 AM, Riccardo Ferrari <ferra...@gmail.com> wrote:

> The reason you get connection refused when connecting to the application
> UI (port 4040) is that your app gets stopped, so the application UI stops
> as well. To inspect your executors' logs after the fact you might find the
> Spark History Server useful
> <https://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact>
> (for standalone mode).
>
> Personally, I collect the logs from my worker nodes. They generally sit
> under $SPARK_HOME/work/<app-id>/<executor-number> (for standalone). There
> you can find exceptions and messages from the executors assigned to your
> app.
>
> Now, about your app crashing: it might be useful to check whether it is
> sized correctly. The issue you linked sounds relevant, however I would give
> some sanity checks a try first. I have solved many issues just by sizing an
> app properly, so I would first check memory size, CPU allocations and so on.
>
> Best,
>
> On Tue, Jul 18, 2017 at 3:30 PM, Saatvik Shah <saatvikshah1...@gmail.com> wrote:
>
>> Hi Riccardo,
>>
>> Yes, thanks for suggesting I do that.
>>
>> [Stage 1:==========================================> (12750 + 40) / 15000]
>> 17/07/18 13:22:28 ERROR org.apache.spark.scheduler.LiveListenerBus: Dropping SparkListenerEvent because no remaining room in event queue. This likely means one of the SparkListeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.
>> 17/07/18 13:22:28 WARN org.apache.spark.scheduler.LiveListenerBus: Dropped 1 SparkListenerEvents since Thu Jan 01 00:00:00 UTC 1970
>> [Stage 1:============================================> (13320 + 41) / 15000]
>> 17/07/18 13:23:28 WARN org.apache.spark.scheduler.LiveListenerBus: Dropped 26782 SparkListenerEvents since Tue Jul 18 13:22:28 UTC 2017
>> [Stage 1:==============================================> (13867 + 40) / 15000]
>> 17/07/18 13:24:28 WARN org.apache.spark.scheduler.LiveListenerBus: Dropped 58751 SparkListenerEvents since Tue Jul 18 13:23:28 UTC 2017
>> [Stage 1:===============================================> (14277 + 40) / 15000]
>> 17/07/18 13:25:10 INFO org.spark_project.jetty.server.ServerConnector: Stopped ServerConnector@3b7284c4{HTTP/1.1}{0.0.0.0:4040}
>> 17/07/18 13:25:10 ERROR org.apache.spark.scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(4,WrappedArray())
>>
>> Similar WARN/INFO messages continue to appear after this.
>>
>> When I try to access the UI, I get:
>>
>> Problem accessing /proxy/application_1500380353993_0001/.
>> Reason:
>>
>>     Connection to http://10.142.0.17:4040 refused
>>
>> Caused by:
>>
>> org.apache.http.conn.HttpHostConnectException: Connection to http://10.142.0.17:4040 refused
>>     at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
>>     at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
>>     at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
>>     at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
>>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
>>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
>>     at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
>>     at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.proxyLink(WebAppProxyServlet.java:200)
>>     at org.apache.hadoop.yarn.server.webproxy.WebAppProxyServlet.doGet(WebAppProxyServlet.java:387)
>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>
>> I noticed this issue talks about something similar, and I guess it is related:
>> https://issues.apache.org/jira/browse/SPARK-18838
>>
>> On Tue, Jul 18, 2017 at 2:49 AM, Riccardo Ferrari <ferra...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Can you share more details? Do you have any exceptions from the driver
>>> or executors?
>>>
>>> Best,
>>>
>>> On Jul 18, 2017 02:49, "saatvikshah1994" <saatvikshah1...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a PySpark app which, when provided a huge amount of data as input,
>>>> sometimes throws the error explained here:
>>>> https://stackoverflow.com/questions/32340639/unable-to-understand-error-sparklistenerbus-has-already-stopped-dropping-event
>>>> All my code is running inside the main function, and the only slightly
>>>> peculiar thing I am doing in this app is using a custom PySpark ML
>>>> Transformer (modified from
>>>> https://stackoverflow.com/questions/32331848/create-a-custom-transformer-in-pyspark-ml).
>>>> Could this be the issue? How can I debug why this is happening?
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-crashes-on-Large-Workloads-tp28873.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>> --
>> *Saatvik Shah,*
>> *Masters in the School of Computer Science,*
>> *Carnegie Mellon University,*
>> *LinkedIn <https://www.linkedin.com/in/saatvikshah/>, Website
>> <https://saatvikshah1994.github.io/>*

--
*Saatvik Shah,*
*Masters in the School of Computer Science,*
*Carnegie Mellon University,*
*LinkedIn <https://www.linkedin.com/in/saatvikshah/>, Website
<https://saatvikshah1994.github.io/>*
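For anyone who lands on this thread later: Riccardo's History Server suggestion requires event logging to be switched on before the application runs. Here is a minimal sketch of what I intend to enable; the log directory and app name below are placeholders of mine, not paths taken from the actual cluster:

    # Sketch: write Spark event logs so the History Server can rebuild the
    # application UI after the live UI on port 4040 has gone away.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("large-workload-app")                          # placeholder name
        .config("spark.eventLog.enabled", "true")
        .config("spark.eventLog.dir", "hdfs:///spark-events")   # placeholder dir
        .getOrCreate()
    )

With that in place, a history server started via $SPARK_HOME/sbin/start-history-server.sh (with spark.history.fs.logDirectory pointing at the same directory) can serve the UI after the fact. And since the proxy error above comes from the YARN web proxy, "yarn logs -applicationId <app-id>" is another way to pull the executor logs once the application has finished.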
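Also, since my original question asked whether the custom PySpark ML Transformer could be involved: below is roughly the shape such a Transformer takes, as a toy example in the spirit of the Stack Overflow answer linked above. The class name and column logic are invented for illustration and are not the code from my actual app; a plain _transform like this only adds a column expression, so by itself it should not be what floods the listener bus (the event volume is driven by how many tasks and stages the job runs).

    # Toy custom Transformer sketch (illustrative only): copies inputCol to
    # outputCol with the strings upper-cased. Follows the common Spark 2.x
    # pattern of mixing Transformer with the HasInputCol/HasOutputCol params.
    from pyspark.ml import Transformer
    from pyspark.ml.param.shared import HasInputCol, HasOutputCol
    from pyspark.sql import functions as F

    class UpperCaseTransformer(Transformer, HasInputCol, HasOutputCol):
        def __init__(self, inputCol="text", outputCol="text_upper"):
            super(UpperCaseTransformer, self).__init__()
            self._set(inputCol=inputCol, outputCol=outputCol)

        def _transform(self, df):
            return df.withColumn(self.getOutputCol(),
                                 F.upper(F.col(self.getInputCol())))

    # Example usage on a DataFrame `df` that has a string column "text":
    # transformed = UpperCaseTransformer(inputCol="text", outputCol="text_upper").transform(df)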