Mapping stages in DAG to line of code in pyspark

2021-04-18 Thread Dhruv Kumar
Hi, I am using PySpark to write Spark queries. My research project requires me to accurately measure the latency of every operator/stage in the query. I can make some guesses, but I am unable to map the stages (shown in the DAG on the Spark UI) to the exact lines in my PySpark code. Can s

4th generation Data Warehousing and Spark

2021-04-18 Thread Mich Talebzadeh
You must forgive me for this seemingly pseudo-technical question. Last week I came across a client manager who mentioned developing 4th generation data warehousing with Spark. I was wondering whether the individual was pointedly referring to the new data lakehouse concept and how it was dif

Re: [Spark Core][Advanced]: Wrong memory allocation on standalone mode cluster

2021-04-18 Thread Sean Owen
Are you sure about the worker memory configuration? What are you setting --memory to, and what does the worker UI think its memory allocation is? On Sun, Apr 18, 2021 at 4:08 AM Mohamadreza Rostami <mohamadrezarosta...@gmail.com> wrote: > I see a bug in executor memory allocation in the standalone

[Spark Core][Advanced]: Wrong memory allocation on standalone mode cluster

2021-04-18 Thread Mohamadreza Rostami
I see a bug in executor memory allocation in the standalone cluster, but I can't find which part of the Spark code causes this problem. That's why I decided to raise this issue here. Assume you have 3 workers with 10 CPU cores and 10 GB of memory. Assume also that you have 2 Spark jobs that run