Jacek,

Turns out this was the RPC connection from the driver to the master (port 7077) being closed. It was Istio closing it out: the mesh had a silly idle-timeout setting that kicked in after one hour.

I was able to re-create this by running lsof on the driver for port 7077 and then killing that process; after that, I would see the application marked as "finished". The fix was to exclude port 7077 on the Istio sidecar... it only took me over 6 months to figure this out, so I wanted to share. :)
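In case it helps anyone reproduce it, this is roughly what I did (a sketch; the exact lsof invocation and the placeholder PID are mine, not from any Spark tooling):

    # On the driver, list the process holding the RPC connection to the
    # standalone master on port 7077:
    lsof -i :7077

    # Kill it to simulate the sidecar silently dropping the idle connection;
    # the master then logs "got disassociated" and marks the app as finished.
    kill <PID>    # <PID> = the process id reported by lsof above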
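And the fix, sketched as a pod annotation (the traffic.sidecar.istio.io annotation key is standard Istio; the pod name and image below are placeholders, your own driver spec will differ):

    apiVersion: v1
    kind: Pod
    metadata:
      name: spark-driver                        # placeholder
      annotations:
        # Bypass the Envoy sidecar for outbound traffic on port 7077 so the
        # mesh idle timeout can no longer sever the driver->master RPC link.
        traffic.sidecar.istio.io/excludeOutboundPorts: "7077"
    spec:
      containers:
        - name: driver
          image: my-spark-image:latest          # placeholder

Depending on where the sidecar sits, the inbound counterpart on the master pod is traffic.sidecar.istio.io/excludeInboundPorts: "7077".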
On Thu, Jan 21, 2021 at 5:39 AM Jacek Laskowski <ja...@japila.pl> wrote:

> Hi Brett,
>
> No idea why it happens, but got curious about this "Cores" column being 0.
> Is this always the case?
>
> Regards,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
> On Tue, Jan 19, 2021 at 11:27 PM Brett Spark <blarsonsp...@gmail.com>
> wrote:
>
>> Hello!
>> When using Spark Standalone & Spark 2.4.4 / 3.0.0, we are seeing our
>> standalone Spark "applications" time out and show as "Finished" after
>> around an hour.
>>
>> Here is a screenshot from the Spark master before the application is
>> marked as finished.
>> [image: image.png]
>> Here is a screenshot from the Spark master after it is marked as
>> finished (after over an hour of idle time).
>> [image: image.png]
>> Here are the logs from the Spark master / worker:
>>
>> spark-master-2d733568b2a7e82de7b2b09b6daa17e9-7cd4cfcddb-f84q7 master
>> 2021-01-19 21:55:47,282 INFO master.Master: 172.32.3.66:34570 got
>> disassociated, removing it.
>> spark-master-2d733568b2a7e82de7b2b09b6daa17e9-7cd4cfcddb-f84q7 master
>> 2021-01-19 21:55:52,095 INFO master.Master: 172.32.115.115:36556 got
>> disassociated, removing it.
>> spark-master-2d733568b2a7e82de7b2b09b6daa17e9-7cd4cfcddb-f84q7 master
>> 2021-01-19 21:55:52,095 INFO master.Master: 172.32.115.115:37305 got
>> disassociated, removing it.
>> spark-master-2d733568b2a7e82de7b2b09b6daa17e9-7cd4cfcddb-f84q7 master
>> 2021-01-19 21:55:52,096 INFO master.Master: Removing app
>> app-20210119204911-0000
>> spark-worker-2d733568b2a7e82de7b2b09b6daa17e9-7bbb75f9b6-8mv2b worker
>> 2021-01-19 21:55:52,112 INFO shuffle.ExternalShuffleBlockResolver:
>> Application app-20210119204911-0000 removed, cleanupLocalDirs = true
>>
>> Is there a setting that causes an application to time out after an hour
>> of the Spark application or Spark worker being idle?
>>
>> I would like to keep our Spark applications alive as long as possible.
>>
>> I haven't been able to find a setting in the Spark configuration
>> documentation that corresponds to this, so I'm wondering if it's
>> something that's hard-coded.
>>
>> Please let me know.
>> Thank you!