Jacek,
It turns out this was the driver's RPC connection to the master (port 7077)
being closed. Istio was closing it because of an idle timeout setting on the
sidecar that kicked in after one hour.

I was able to re-create this by running lsof on the driver to find the
process holding the port 7077 connection and then killing it. After that, the
application would show as "finished".

The fix was to exclude port 7077 from the Istio sidecar... it took me over
six months to figure this out, so I wanted to share. :)
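
In case it helps anyone else, the exclusion is just an annotation on the
driver pod spec, something along these lines (a sketch only; double-check the
annotation name against the Istio docs for your version):

    metadata:
      annotations:
        traffic.sidecar.istio.io/excludeOutboundPorts: "7077"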

On Thu, Jan 21, 2021 at 5:39 AM Jacek Laskowski <ja...@japila.pl> wrote:

> Hi Brett,
>
> No idea why it happens, but I got curious about the "Cores" column being 0.
> Is this always the case?
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
> On Tue, Jan 19, 2021 at 11:27 PM Brett Spark <blarsonsp...@gmail.com>
> wrote:
>
>> Hello!
>> When using Spark Standalone with Spark 2.4.4 / 3.0.0, we are seeing our
>> standalone Spark "applications" time out and show as "Finished" after
>> around an hour.
>>
>> Here is a screenshot from the Spark master before it's marked as finished.
>> [image: image.png]
>> Here is a screenshot from the Spark master after it's marked as finished.
>> (After over an hour of idle time).
>> [image: image.png]
>> Here are the logs from the Spark Master / Worker:
>>
>> spark-master-2d733568b2a7e82de7b2b09b6daa17e9-7cd4cfcddb-f84q7 master
>> 2021-01-19 21:55:47,282 INFO master.Master: 172.32.3.66:34570 got
>> disassociated, removing it.
>> spark-master-2d733568b2a7e82de7b2b09b6daa17e9-7cd4cfcddb-f84q7 master
>> 2021-01-19 21:55:52,095 INFO master.Master: 172.32.115.115:36556 got
>> disassociated, removing it.
>> spark-master-2d733568b2a7e82de7b2b09b6daa17e9-7cd4cfcddb-f84q7 master
>> 2021-01-19 21:55:52,095 INFO master.Master: 172.32.115.115:37305 got
>> disassociated, removing it.
>> spark-master-2d733568b2a7e82de7b2b09b6daa17e9-7cd4cfcddb-f84q7 master
>> 2021-01-19 21:55:52,096 INFO master.Master: Removing app
>> app-20210119204911-0000
>> spark-worker-2d733568b2a7e82de7b2b09b6daa17e9-7bbb75f9b6-8mv2b worker
>> 2021-01-19 21:55:52,112 INFO shuffle.ExternalShuffleBlockResolver:
>> Application app-20210119204911-0000 removed, cleanupLocalDirs = true
>>
>> Is there a setting that causes an application to time out after an hour of
>> the Spark application or Spark worker being idle?
>>
>> I would like to keep our Spark applications alive as long as possible.
>>
>> I haven't been able to find a setting in the Spark configuration
>> documentation that corresponds to this, so I'm wondering if it's something
>> that's hard-coded.
>>
>> Please let me know,
>> Thank you!
>>
>
