Re: [EXTERNAL] Re: Stage level scheduling - lower the number of executors when using GPUs

2022-11-02 Thread Shay Elbaz
Thanks Artemis. We are not using Rapids, but rather using GPUs through the Stage Level Scheduling feature with ResourceProfile. On Kubernetes you have to turn on shuffle tracking for dynamic allocation anyway. The question is how we can limit the number of executors when building a new Resource…

Re: Stage level scheduling - lower the number of executors when using GPUs

2022-11-02 Thread Artemis User
Are you using Rapids for GPU support in Spark? A couple of options you may want to try: 1. In addition to turning on dynamic allocation, you may also need to turn on the external shuffle service. 2. It sounds like you are using Kubernetes; in that case, you may also need to turn on shuffle track…
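The two options above correspond to standard Spark configuration properties; a minimal fragment (values illustrative) might look like:

```properties
# Dynamic allocation with shuffle tracking -- the usual combination on
# Kubernetes, where no external shuffle service is available:
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=true

# Alternative on YARN/standalone: rely on the external shuffle service
spark.shuffle.service.enabled=true
```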

Stage level scheduling - lower the number of executors when using GPUs

2022-11-02 Thread Shay Elbaz
Hi, Our typical applications need fewer executors for a GPU stage than for a CPU stage. We are using dynamic allocation with stage level scheduling, and Spark tries to maximize the number of executors during the GPU stage as well, causing a bit of resource chaos in the cluster. This forces us to u…

[*IMPORTANT*] update Streaming Query Statistics URL

2022-11-02 Thread Priyanshi Shahu
Hello Team, I am writing to inform you that I am using the Spark 3.3.0 release. When I access the Spark monitoring UI and open the Streaming Query Statistics tab (it contains all the running IDs), and then click one of the running IDs, the page is redirected to…

should one ever make a spark streaming job in pyspark

2022-11-02 Thread Joris Billen
Dear community, I had a general question about the use of Scala vs. PySpark for Spark streaming. I believe Spark streaming will run most efficiently when written in Scala, but that the same things can be implemented in PySpark. My question: 1) is it completely dumb to make a streaming job in…