Re: mapreduce.input.fileinputformat.split.maxsize not working for spark 2.4.0

2019-03-07 Thread Akshay Mendole
ich is 128MB by default. Thanks, Manu Zhang. On Feb 25, 2019, 2:58 AM +0800, Akshay Mendole wrote: Hi, We have dfs.blocksize configured to be 512MB and we have some large files in HDFS that we want to process with a Spark application. We want to s

mapreduce.input.fileinputformat.split.maxsize not working for spark 2.4.0

2019-02-24 Thread Akshay Mendole
Hi, We have dfs.blocksize configured to be 512MB and we have some large files in HDFS that we want to process with a Spark application. We want to split the files to get more splits and optimise for memory, but the above-mentioned parameters are not working. The max and min size params as below are con
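
For context on the question above, here is a minimal sketch of how Hadoop's new-API FileInputFormat sizes splits: split size = max(minSize, min(maxSize, blockSize)). Only the 512MB block size comes from the post; the 128MB max size and the 2GB file are hypothetical values chosen for illustration, and the function names are made up for this sketch. It models the Hadoop input-format path only. Spark SQL file sources size their partitions through a separate setting, spark.sql.files.maxPartitionBytes (128MB by default), which may be what the reply in this thread refers to; that link is an assumption, since the reply is truncated here.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # Hadoop lets the final split run up to 10% over the split size


def compute_split_size(block_size, min_size, max_size):
    """Mirror of the FileInputFormat rule: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_length, split_size):
    """Count splits for one splittable file, following the same slop rule."""
    splits = 0
    remaining = file_length
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # the tail (possibly slightly oversized) becomes the last split
    return splits


block_size = 512 * MB        # dfs.blocksize from the post
file_length = 2048 * MB      # hypothetical 2GB input file

# Without a max size, the block size wins.
default_size = compute_split_size(block_size, 1, 2**63 - 1)
# With a hypothetical split.maxsize of 128MB, the max size wins.
capped_size = compute_split_size(block_size, 1, 128 * MB)

print(count_splits(file_length, default_size))
print(count_splits(file_length, capped_size))
```

If a read path does not go through this input format, changing split.maxsize would have no effect on the partition count, which is one possible explanation for the behaviour described.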

Re: Tuning G1GC params for aggressive garbage collection?

2018-12-25 Thread Akshay Mendole
, 11:28 pm Ramandeep Singh Nanda Hi, Did you try increasing concurrentgcthreads for the marking? System.gc is not a good way to handle this, as it is not guaranteed and is a high-pause full GC. Regards, Ramandeep Singh. On Tue, Dec 25, 2018, 07:0

Re: spark application takes significant some time to succeed even after all jobs are completed

2018-12-25 Thread Akshay Mendole
: Do you have a lot of small files? Do you use S3 or similar? It could be that Spark does some IO-related tasks. On 25.12.2018 at 12:51, Akshay Mendole wrote: Hi, As you can see in the picture below, the application's last job finished

spark application takes significant some time to succeed even after all jobs are completed

2018-12-25 Thread Akshay Mendole
Hi, As you can see in the picture below, the application's last job finished at around 13:45 and I could see the output directory updated with the results. Yet, the application took a total of 20 min more to change its status. What could be the reason for this? Is this a known fact? The applica