11, 2019 8:23 AM
To: yeikel valdes
Cc: jasonnerot...@gmail.com; arthur...@flipp.com; user@spark.apache.org
Subject: Re: Question about relationship between number of files and initial tasks (partitions)
Extending Arthur's question,
I am facing the same problem (the number of partitions is huge - cores: 960,
partitions: 16000). I tried to decrease the number of partitions with
coalesce, but the problem is unbalanced data. After using coalesce, it
gives me a Java out of heap space error. There was no out of heap space
error before using coalesce.
If you need to reduce the number of partitions, you could also try df.coalesce.
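In case it's useful, a rough sketch of the difference (Scala; the input path is hypothetical, the target of 200 partitions is arbitrary, and format("avro") assumes the spark-avro package is on the classpath): coalesce merges existing partitions without a shuffle and so inherits any skew, while repartition does a full shuffle and spreads rows roughly evenly, which is usually what avoids the heap space error.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("coalesce-vs-repartition").getOrCreate()

// Hypothetical input; format("avro") needs the spark-avro package.
val df = spark.read.format("avro").load("/path/to/avro")

// coalesce(n) merges existing partitions without a shuffle: cheap, but the
// merged partitions keep whatever skew the input had, which can OOM a task.
val merged = df.coalesce(200)

// repartition(n) performs a full shuffle and distributes rows roughly evenly
// across n partitions, trading shuffle cost for balance.
val balanced = df.repartition(200)

println(s"coalesce:    ${merged.rdd.getNumPartitions} partitions")
println(s"repartition: ${balanced.rdd.getNumPartitions} partitions")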
On Thu, 04 Apr 2019 06:52:26 -0700 jasonnerot...@gmail.com wrote:
Have you tried something like this?
spark.conf.set("spark.sql.shuffle.partitions", "5" )
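One caveat, as far as I understand it: spark.sql.shuffle.partitions only controls the number of partitions produced by shuffles (joins, aggregations), not the partitioning of the initial file scan, so it would not change the first-stage task count. A tiny illustration with made-up toy data (the names are hypothetical):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("shuffle-partitions-sketch").getOrCreate()
import spark.implicits._

// Only shuffle stages are affected by this setting; the initial scan is not.
spark.conf.set("spark.sql.shuffle.partitions", "5")

// Toy data just to demonstrate the effect.
val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")

val counts = df.groupBy("key").count()
println(counts.rdd.getNumPartitions) // 5 with default (non-adaptive) execution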
On Wed, Apr 3, 2019 at 8:37 PM Arthur Li wrote:
> Hi Sparkers,
>
> I noticed that in my spark application, the number of tasks in the first
> stage is equal to the number of files read by the application (at least for
> Avro) if the number of cpu cores is less than the number of files. Though
> if the cpu cores are more than the number of files, it's usually equal to
> the number of cpu cores.
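For anyone trying to reproduce this, a rough sketch of how to observe the scan partitioning (the path is hypothetical, format("avro") assumes the spark-avro package, and the config values are arbitrary examples). The first-stage task count is just the number of partitions of the scan, and spark.sql.files.maxPartitionBytes / spark.sql.files.openCostInBytes are the settings Spark uses when packing input files into partitions:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("scan-partitions").getOrCreate()

// Raising these lets Spark pack more small files into a single partition;
// the values below are arbitrary examples (256 MB and 8 MB).
spark.conf.set("spark.sql.files.maxPartitionBytes", "268435456")
spark.conf.set("spark.sql.files.openCostInBytes", "8388608")

// Hypothetical path; requires the spark-avro package for format("avro").
val df = spark.read.format("avro").load("/data/events/*.avro")

// Number of tasks in the first stage == number of partitions of the scan.
println(df.rdd.getNumPartitions)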