Re: Bechmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Mich Talebzadeh
It is true that the original idea of YARN on HDFS came from data affinity. However, nowadays the separation of storage from the compute layer is very common. They do not allude to data affinity (say, using Hadoop clusters). They refer to storage in the cloud and to the use of SSDs etc. I know cl

Re: Bechmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Christian Pfarr
Does anyone know where the data for this benchmark was stored? Spark on YARN gets its performance from data locality, via co-location of the YARN NodeManager and the HDFS DataNode, not from the job scheduler, right? Regards, z0ltrix

Re: Bechmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Mich Talebzadeh
Thanks Aditya for the link. I will have a look. Cheers

Re: Bechmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Madaditya .Maddy
I came across an article by Data Mechanics that benchmarked Spark on K8s vs YARN. Link: https://www.datamechanics.co/blog-post/apache-spark-performance-benchmarks-show-kubernetes-has-caught-up-with-yarn -Regards, Aditya

Re: Bechmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Mich Talebzadeh
Thanks Yuri. Those are very valid points. Let me clarify my point. Let us assume that we will be using YARN versus K8s to do the same job. Spark-submit will use YARN in the first instance and will then switch to using K8s for the same task. 1. Have there been such benchmarks? 2. When should I
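For illustration only, here is a minimal Python sketch of how the same job could be submitted first to YARN and then to K8s, with only the master URL and the K8s-specific settings changing. The application file name, the API server address, the container image and the executor sizing are all hypothetical placeholders, not values from this thread. The script only builds and prints the two `spark-submit` command lines; it does not run them.

```python
import shlex


def build_submit_command(master, app, extra_conf=None):
    """Return a spark-submit argument list for the given master URL."""
    cmd = ["spark-submit", "--master", master, "--deploy-mode", "cluster"]
    # Sort the settings so the generated command line is deterministic.
    for key, value in sorted((extra_conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


# Hypothetical application and sizing, kept identical for both runs.
# Note: in cluster mode the file must be reachable by the driver
# (for K8s, typically a path inside the image or a remote URI).
APP = "benchmark_job.py"
common = {"spark.executor.instances": "4", "spark.executor.memory": "4g"}

yarn_cmd = build_submit_command("yarn", APP, common)

# Hypothetical API server address and container image.
k8s_cmd = build_submit_command(
    "k8s://https://k8s-apiserver.example.com:6443",
    APP,
    {**common, "spark.kubernetes.container.image": "example/spark-py:latest"},
)

for cmd in (yarn_cmd, k8s_cmd):
    print(shlex.join(cmd))
```

Keeping the executor count and memory identical across both commands is the point of the sketch: any timing difference between the two runs is then easier to attribute to the resource manager and its environment rather than to the job configuration.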

Re: Bechmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Yuri Oleynikov (‫יורי אולייניקוב‬‎)
I'm not a big expert on Spark, but I don't really understand what you are going to compare, and how. Reading and writing to and from HDFS? How is that related to YARN and K8s? These are resource managers (YARN: Yet Another Resource Negotiator): they decide what and how much to allocate, and when (CPU, RAM). Local Disk

Bechmarks on Spark running on Yarn versus Spark on K8s

2021-07-05 Thread Mich Talebzadeh
I was curious to know if there are any benchmarks comparing Spark on YARN with Spark on Kubernetes. This question arose because traditionally in Google Cloud we have been using Spark on Dataproc clusters. Dataproc provides Spark, Hadoop plus other

Re: Spark AQE Post-Shuffle partitions coalesce don't work as expected, and even make data skew in some partitions. Need help to debug issue.

2021-07-05 Thread Nick Grigoriev
Hi Mich, Thanks for the quick response. 1. No, I use a batch query with fixed start and end offsets. 2. Yes, my messages in Kafka (JSON format) can differ greatly in size, from 1 KB to 9 KB. And even when I transform the JSON into a flat Spark SQL row, the rows can still differ in size. 3. I have two sta