Hi
I am using PySpark to write Spark queries. My research project requires me
to accurately measure the latency of each operator/stage in a query. I can
make some guesses, but I am unable to exactly map the stages (shown in the
DAG on the Spark UI) to the exact lines in my PySpark code.
Can someone help me with this?
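Right now the closest I can get is tagging each action with setJobDescription and timing it around the call, roughly like the sketch below (the query and labels are just placeholders). But that only gives me whole-job wall-clock time, not per-stage latency, which I still have to read off the UI or the /api/v1 REST endpoints.

    import time
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("latency-probe").getOrCreate()
    sc = spark.sparkContext

    df = spark.range(10_000_000)  # placeholder for the real query

    # Label the jobs this action triggers; the label shows up in the
    # Spark UI, so its stages can be traced back to this line of code.
    sc.setJobDescription("step-1: aggregate")
    start = time.monotonic()
    df.groupBy((df.id % 10).alias("bucket")).count().collect()
    print(f"step-1 took {time.monotonic() - start:.3f} s")
    sc.setJobDescription(None)  # clear the label again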
You must forgive me for this seemingly pseudo-technical question. Last
week I came across a client manager who mentioned developing 4th-generation
data warehousing with Spark, and I was wondering whether the individual was
pointedly referring to the new data lakehouse concept and how it differs
from a traditional data warehouse.
Are you sure about the worker memory configuration? What are you setting
--memory to, and what does the worker UI think its memory allocation is?
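For what it's worth, the requested values can also be cross-checked from inside the application itself; a small sketch (the keys below are standard Spark conf properties, and "1g" is only shown because it is Spark's documented default for spark.executor.memory):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    conf = spark.sparkContext.getConf()

    # What this application asked the cluster manager for; compare these
    # against what the worker UI reports as actually allocated.
    print(conf.get("spark.executor.memory", "1g (default)"))
    print(conf.get("spark.executor.cores", "unset"))
    print(conf.get("spark.cores.max", "unset"))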
On Sun, Apr 18, 2021 at 4:08 AM Mohamadreza Rostami <
mohamadrezarosta...@gmail.com> wrote:
I see a bug in executor memory allocation in the standalone cluster, but I
can't find which part of the Spark code causes this problem. That's why I
decided to raise the issue here.
Assume you have 3 workers, each with 10 CPU cores and 10 GB of memory. Assume
also that you have 2 Spark jobs running on this cluster.
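For illustration, each of the two applications would declare its resources against such a cluster along these lines; the master URL, app name, and the half-a-worker split per executor are assumptions, not the actual configuration:

    from pyspark.sql import SparkSession

    # One of the two applications sharing the standalone cluster of
    # 3 workers with 10 cores / 10 GB each (all values are assumptions).
    spark = (
        SparkSession.builder
        .master("spark://master-host:7077")   # assumed master URL
        .appName("job-a")
        .config("spark.executor.cores", "5")  # half a worker per executor
        .config("spark.executor.memory", "5g")
        .config("spark.cores.max", "15")      # leave room for the second job
        .getOrCreate()
    )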