11, 2019 8:23 AM
To: yeikel valdes
Cc: jasonnerot...@gmail.com; arthur...@flipp.com; user@spark.apache.org
Subject: Re: Question about relationship between number of files and initial tasks (partitions)
Extending Arthur's question,
I am facing the same problem (the number of partitions is huge - cores: 960,
partitions: 16000). I tried to decrease the number of partitions with
coalesce, but the problem is unbalanced data. After using coalesce, it
gives me a Java out of heap space error. There was no out of heap space
error before using coalesce.
If you need to reduce the number of partitions, you could also try df.coalesce.
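In case it's useful, a rough sketch of the difference (Scala; the input path is hypothetical, the target of 200 partitions is arbitrary, and format("avro") assumes the spark-avro package is on the classpath): coalesce merges existing partitions without a shuffle and so inherits any skew, while repartition does a full shuffle and spreads rows roughly evenly, which is usually what avoids the heap space error.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("coalesce-vs-repartition").getOrCreate()

// Hypothetical input; format("avro") needs the spark-avro package.
val df = spark.read.format("avro").load("/path/to/avro")

// coalesce(n) merges existing partitions without a shuffle: cheap, but the
// merged partitions keep whatever skew the input had, which can OOM a task.
val merged = df.coalesce(200)

// repartition(n) performs a full shuffle and distributes rows roughly evenly
// across n partitions, trading shuffle cost for balance.
val balanced = df.repartition(200)

println(s"coalesce:    ${merged.rdd.getNumPartitions} partitions")
println(s"repartition: ${balanced.rdd.getNumPartitions} partitions")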
On Thu, 04 Apr 2019 06:52:26 -0700 jasonnerot...@gmail.com wrote:
Have you tried something like this?
spark.conf.set("spark.sql.shuffle.partitions", "5" )
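One caveat, as far as I understand it: spark.sql.shuffle.partitions only controls the number of partitions produced by shuffles (joins, aggregations), not the partitioning of the initial file scan, so it would not change the first-stage task count. A tiny illustration with made-up toy data (the names are hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("shuffle-partitions-sketch").getOrCreate()
import spark.implicits._

// Only shuffle stages are affected by this setting; the initial scan is not.
spark.conf.set("spark.sql.shuffle.partitions", "5")

// Toy data just to demonstrate the effect.
val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")

val counts = df.groupBy("key").count()
println(counts.rdd.getNumPartitions) // 5 with default (non-adaptive) execution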
On Wed, Apr 3, 2019 at 8:37 PM Arthur Li wrote:
> Hi Sparkers,
>
> I noticed that in my spark application, the number of tasks in the first
> stage is equal to the number of files read by the application (at least for
> Avro) if the number of cpu cores is less than the number of files. Though
> if the cpu cores are more than the number of files, it's usually equal to
> the number of cpu cores.
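For anyone trying to reproduce this, a rough sketch of how to observe the scan partitioning (the path is hypothetical, format("avro") assumes the spark-avro package, and the config values are arbitrary examples). The first-stage task count is just the number of partitions of the scan, and spark.sql.files.maxPartitionBytes / spark.sql.files.openCostInBytes are the settings Spark uses when packing input files into partitions:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("scan-partitions").getOrCreate()

// Raising these lets Spark pack more small files into a single partition;
// the values below are arbitrary examples (256 MB and 8 MB).
spark.conf.set("spark.sql.files.maxPartitionBytes", "268435456")
spark.conf.set("spark.sql.files.openCostInBytes", "8388608")

// Hypothetical path; requires the spark-avro package for format("avro").
val df = spark.read.format("avro").load("/data/events/*.avro")

// Number of tasks in the first stage == number of partitions of the scan.
println(df.rdd.getNumPartitions)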