Re: JavaSparkContext.wholeTextFiles read directory

2016-04-26 Thread Mail.com
wholeTextFiles() works; it just does not provide the parallelism. This is on Spark 1.4 and HDP 2.3.2, running batch jobs.

Thanks

> On Apr 26, 2016, at 9:16 PM, Harjit Singh wrote:
>
> You will have to write your own custom receiver to do that. I don’t think
> wholeTextFiles is designed for that.

Re: JavaSparkContext.wholeTextFiles read directory

2016-04-26 Thread Harjit Singh
You will have to write your own custom receiver to do that. I don’t think wholeTextFiles is designed for that.

- Harjit

> On Apr 26, 2016, at 7:19 PM, Mail.com wrote:
>
> Hi All,
> I am reading an entire directory of gz XML files with wholeTextFiles.
>
> I understand as it is gz and with wholeTextFi

JavaSparkContext.wholeTextFiles read directory

2016-04-26 Thread Mail.com
Hi All,

I am reading an entire directory of gz XML files with wholeTextFiles. I understand that, because the files are gz and read with wholeTextFiles, the individual files are not splittable. But why is the entire directory read by one executor in a single task? I have provided the number of executors as the number of files in that
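The point that a gz file is not splittable can be illustrated outside Spark. Below is a minimal Java sketch using only java.util.zip. It is not Spark's or Hadoop's input-format code. The class GzipSplitDemo and its helpers (sampleXml, compress, decompressFrom, canDecompressFrom) are hypothetical names chosen for illustration. The sketch compresses a generated XML string. Decompressing from byte 0 recovers the text. Starting at the midpoint is expected to fail, because a gzip stream has a single header at the front and its DEFLATE data cannot, in general, be decoded from an arbitrary offset. That is why each .gz file has to be read start to finish by a single task. The sketch says nothing about how Spark groups several files into tasks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical, illustration-only class: plain JDK, no Spark or Hadoop involved.
public class GzipSplitDemo {

    // Builds a hypothetical XML document so there is something to compress.
    static String sampleXml(int records) {
        StringBuilder sb = new StringBuilder("<records>");
        for (int i = 0; i < records; i++) {
            sb.append("<record id=\"").append(i).append("\">value-")
              .append(i).append("</record>");
        }
        return sb.append("</records>").toString();
    }

    // Gzip-compresses the text, standing in for one .gz file in the directory.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Decompresses from `offset` to the end, like a reader handed only part of the file.
    static String decompressFrom(byte[] data, int offset) throws IOException {
        byte[] slice = Arrays.copyOfRange(data, offset, data.length);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(slice))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // True if decompression starting at `offset` completes without an I/O error.
    static boolean canDecompressFrom(byte[] data, int offset) {
        try {
            decompressFrom(data, offset);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String xml = sampleXml(500);
        byte[] gz = compress(xml);
        int mid = gz.length / 2;
        System.out.println("from offset 0, text recovered: " + xml.equals(decompressFrom(gz, 0)));
        System.out.println("from offset " + mid + ", readable: " + canDecompressFrom(gz, mid));
    }
}
```

A reader that starts mid-file gets no valid gzip header. GZIPInputStream then rejects the input with an IOException (typically a ZipException), which canDecompressFrom reports as false.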