The issue is explained in depth here:
https://medium.com/@manuzhang/the-hidden-cost-of-spark-withcolumn-8ffea517c015
On 19.12.19 at 23:33, Chris Teoh wrote:
As far as I'm aware it isn't any better. The logic all gets processed
by the same engine, so to confirm, compare the DAGs generated from both
approaches and see if they're identical.
Hi All
Sorry, earlier, I forgot to set the subject line correctly
Hello Experts
I am trying to maximise resource utilisation on my 3-node Spark cluster
(2 data nodes and 1 driver) so that the job finishes quickest. I am trying
to create a benchmark so I can recommend an optimal POD for the job
128GB x 16 cores
I have standalone Spark 2.4.0 running
HTOP shows
Hi all,
I want to ask how to estimate the size of an RDD (in bytes) when it is not
saved to disk, because the job takes a long time if the output is very
large and the number of output partitions is small.
The following steps are what I have so far for this problem:
1. sample 0.01 's or
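The sampling idea in step 1 can be sketched in plain Python: measure the serialized size of a small sample and extrapolate. The helper names, the 1% sample fraction, the 128 MB target partition size, and the record sizes below are illustrative assumptions, not something stated in the mail.

```python
# Hedged sketch: estimate an RDD's total size in bytes from a small
# sample, then pick a partition count so partitions stay a sane size.

def estimate_total_bytes(sampled_records, sample_fraction):
    """Extrapolate the total size from the byte size of a sample."""
    sampled_bytes = sum(len(r) for r in sampled_records)  # each r: bytes
    return int(sampled_bytes / sample_fraction)

def partitions_for(total_bytes, target_partition_bytes=128 * 1024 * 1024):
    """Partition count aiming at ~128 MB per partition (ceil division)."""
    return max(1, -(-total_bytes // target_partition_bytes))

# Fabricated example: 100 sampled records of 1 KB each, taken as a 1%
# sample => ~10 MB estimated total => 1 partition at the 128 MB target.
sample = [b"x" * 1024] * 100
total = estimate_total_bytes(sample, 0.01)
print(total, partitions_for(total))  # → 10240000 1
```

In Spark itself the sampled sizes could come from something like `rdd.sample(False, 0.01)` mapped through a serializer before summing, but that call pattern is an assumption about the poster's pipeline.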
As far as I'm aware it isn't any better. The logic all gets processed by
the same engine so to confirm, compare the DAGs generated from both
approaches and see if they're identical.
On Fri, 20 Dec 2019, 8:56 am ayan guha wrote:
> Quick question: Why is it better to use one SQL vs multiple withColumn?
Quick question: why is it better to use one SQL statement vs multiple
withColumn calls? Isn't everything eventually rewritten by Catalyst?
On Wed, 18 Dec 2019 at 9:14 pm, Enrico Minack wrote:
> How many withColumn statements do you have? Note that it is better to use
> a single select, rather than lots of withColumn calls
If you're inferring the schema, that also incurs an overhead whilst the
data is being read into the DataFrame.
Are you observing data skew? Perhaps some nodes are busier than others.
Look at the average task time compared to the lowest and highest times.
At 20 cores, 2 cores per executor, that gives 10 executors
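The sizing arithmetic behind figures like these can be sketched in plain Python. The per-node reservation for the OS and other daemons below is a common rule of thumb, not something stated in the thread, and the function name is made up.

```python
# Hedged sketch of executor-sizing arithmetic for a cluster like the
# one in the thread: 2 data nodes, 16 cores / 128 GB each, 2 cores per
# executor. The 1-core / 8 GB per-node reservation is an assumption.

def executor_layout(nodes, cores_per_node, mem_gb_per_node,
                    cores_per_executor, reserved_cores=1, reserved_mem_gb=8):
    """Return (total executors, memory per executor in GB)."""
    usable_cores = cores_per_node - reserved_cores
    executors_per_node = usable_cores // cores_per_executor
    mem_per_executor = (mem_gb_per_node - reserved_mem_gb) // executors_per_node
    return executors_per_node * nodes, mem_per_executor

# 2 data nodes of 16 cores / 128 GB, 2 cores per executor:
execs, mem = executor_layout(2, 16, 128, 2)
print(execs, mem)  # → 14 17
```

Under these assumptions the cluster supports 14 executors of about 17 GB each; changing `cores_per_executor` trades per-task parallelism against the number of concurrent executors.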