Re: Spark 3.1.2 full thread dumps

2022-03-01 Thread Lalwani, Jayesh
This (https://www.elastic.co/blog/benchmarking-and-sizing-your-elasticsearch-cluster-for-logs-and-metrics) has the math for sizing the cluster. There is a similar document (https://docs.aws.amazon.com/opensearch-service/latest/developerguide/sizing-domains.html) on sizing your cluster on AWS.

Re: Spark 3.1.2 full thread dumps

2022-02-11 Thread Maksim Grinman
Thanks for these suggestions. Regarding hot nodes, are you referring to the same thing as in this article? https://www.elastic.co/blog/hot-warm-architecture-in-elasticsearch-5-x. I am also curious where the 10MB heuristic came from, though I have heard a similar heuristic with respect to the size of a pa

Re: Spark 3.1.2 full thread dumps

2022-02-11 Thread Lalwani, Jayesh
You can probably tune writing to Elasticsearch by 1. Increasing the number of partitions so you are writing smaller batches of rows to Elasticsearch 2. Using Elasticsearch's bulk API 3. Scaling up the number of hot nodes on the Elasticsearch cluster to support writing in parallel. You want
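
A minimal sketch of what points 1 and 2 might look like with the elasticsearch-hadoop (elasticsearch-spark) connector; the node address, index name, partition count, and batch sizes are placeholders, not values from the thread:

    import org.apache.spark.sql.{DataFrame, SaveMode}

    def writeToEs(df: DataFrame): Unit = {
      df.repartition(200)                              // more, smaller write batches per task
        .write
        .format("org.elasticsearch.spark.sql")         // the connector issues bulk requests under the hood
        .option("es.nodes", "es-cluster.internal")     // placeholder host
        .option("es.port", "9200")
        .option("es.batch.size.entries", "1000")       // documents per bulk request
        .option("es.batch.size.bytes", "5mb")          // bytes per bulk request
        .mode(SaveMode.Append)
        .save("my-index")                              // placeholder index name
    }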

Re: Spark 3.1.2 full thread dumps

2022-02-07 Thread Lalwani, Jayesh
Probably not the answer you are looking for, but the best thing to do is to avoid making Spark code sleep. Is there a way you can predict how big your autoscaling group needs to be without looking at all the data? Are you using a fixed number of Spark executors, or do you have some way of scaling
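
One way to keep the sleep out of Spark code is to do the waiting on the driver, before any action is triggered, so no executor task ever blocks. A rough sketch, with a hypothetical health-check URL:

    import java.net.{HttpURLConnection, URL}

    def serviceIsUp(endpoint: String): Boolean = {
      try {
        val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
        conn.setConnectTimeout(2000)
        conn.setReadTimeout(2000)
        conn.setRequestMethod("GET")
        val ok = conn.getResponseCode == 200
        conn.disconnect()
        ok
      } catch {
        case _: Exception => false
      }
    }

    // Poll on the driver only; executors stay idle instead of sleeping inside a task.
    val healthUrl = "http://internal-service/health"   // placeholder endpoint
    var attempts = 0
    while (!serviceIsUp(healthUrl) && attempts < 60) {
      Thread.sleep(10000L)                             // 10s between polls
      attempts += 1
    }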

Re: Spark 3.1.2 full thread dumps

2022-02-04 Thread Mich Talebzadeh
Indeed. Apologies for going on a tangent.

Re: Spark 3.1.2 full thread dumps

2022-02-04 Thread Maksim Grinman
Not that this discussion is not interesting (it is), but this has strayed pretty far from my original question, which was: How do I prevent Spark from dumping huge Java full thread dumps when an executor appears to not be doing anything (in my case, there's a loop where it sleeps waiting for a serv

Re: Spark 3.1.2 full thread dumps

2022-02-04 Thread Mich Talebzadeh
OK, basically: do we have a scenario where Spark, or for that matter any cluster manager, can deploy a new node (after the loss of an existing node) with a view to running the failed tasks on the new executor(s) deployed on that newly spun-up node?

Re: Spark 3.1.2 full thread dumps

2022-02-04 Thread Holden Karau
We don’t block scaling up after node failure in classic Spark if that’s the question.

Re: Spark 3.1.2 full thread dumps

2022-02-04 Thread Mich Talebzadeh
From what I can see in the auto scaling setup, you will always need a minimum of two worker nodes as primary. It also states, and I quote, "Scaling primary workers is not recommended due to HDFS limitations which result in instability while scaling. These limitations do not exist for secondary workers". So

Re: Spark 3.1.2 full thread dumps

2022-02-04 Thread Sean Owen
I have not seen stack traces under autoscaling, so not even sure what the error in question is. There is always delay in acquiring a whole new executor in the cloud as it usually means a new VM is provisioned. Spark treats the new executor like any other, available for executing tasks.

Re: Spark 3.1.2 full thread dumps

2022-02-04 Thread Mich Talebzadeh
Thanks for the info. My concern has always been how Spark handles autoscaling (adding new executors) when the load pattern changes. I have tried to test this by setting the following parameters (Spark 3.1.2 on GCP): spark-submit --verbose \ ... --conf spark.dynami
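
The command is cut off, but a sketch of the kind of dynamic-allocation settings being referred to (values are placeholders, and Dataproc images may set some of these already); the same keys can also be passed as --conf flags to spark-submit:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("autoscaling-test")
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.dynamicAllocation.minExecutors", "2")
      .config("spark.dynamicAllocation.maxExecutors", "10")
      .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
      // Spark 3.x: allows dynamic allocation without an external shuffle service
      .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
      .getOrCreate()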

Re: Spark 3.1.2 full thread dumps

2022-02-03 Thread Maksim Grinman
It's actually on AWS EMR. The job bootstraps and runs fine -- the autoscaling group is there to bring up a service that Spark will be calling. Some code waits for the autoscaling group to come up before continuing processing in Spark, since the Spark cluster will need to make requests to the service in t

Re: Spark 3.1.2 full thread dumps

2022-02-03 Thread Mich Talebzadeh
Sounds like you are running this on a Google Dataproc cluster (Spark 3.1.2) with an auto scaling policy? Can you describe whether this happens before Spark starts a new job on the cluster or somehow halfway through processing an existing job? Also, does the job involve Spark Structured Streaming? HT