wholeTextFiles() works; it just does not give me any parallelism.
This is on Spark 1.4 (HDP 2.3.2), running batch jobs.
Thanks
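One thing that may be worth trying for the parallelism problem described above, sketched here in PySpark: wholeTextFiles() accepts a minPartitions argument, and the resulting RDD can be repartitioned before any heavy processing. This is only a sketch under assumptions. The function name read_gz_xml_dir and its directory and num_files parameters are made up for illustration, sc is assumed to be an existing SparkContext, and whether the minPartitions hint is honored depends on the Spark version and the input format.

```python
def read_gz_xml_dir(sc, directory, num_files):
    """Return an RDD of (file_path, file_content) pairs for `directory`.

    `sc` is an existing SparkContext. `directory` and `num_files` are
    hypothetical inputs supplied by the caller, not values from the thread.
    """
    # Each .gz file is still read whole by a single task (gzip is not
    # splittable), but minPartitions suggests grouping the files into
    # roughly this many input partitions instead of one.
    files = sc.wholeTextFiles(directory, minPartitions=num_files)

    # The hint is not a guarantee, so shuffle explicitly. This does not
    # parallelize the read itself, only the work that follows it (for
    # example XML parsing), and it moves whole file contents across the
    # cluster, which can be expensive for large files.
    return files.repartition(num_files)
```

Calling getNumPartitions() on the returned RDD shows how many partitions were actually produced, which is a quick way to check whether the hint had any effect on a given cluster.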
> On Apr 26, 2016, at 9:16 PM, Harjit Singh wrote:
You will have to write your own custom receiver to do that. I don’t think
wholeTextFiles is designed for that.
- Harjit
> On Apr 26, 2016, at 7:19 PM, Mail.com wrote:
Hi All,
I am reading an entire directory of gz XML files with wholeTextFiles.
I understand that, since the files are gz and are read with wholeTextFiles, the
individual files are not splittable, but why is the entire directory read by one
executor in a single task? I have set the number of executors to the number of
files in that