wholeTextFiles() does work. The problem is that it gives me no parallelism: the whole directory is read by a single task.
This is on Spark 1.4, HDP 2.3.2, running batch jobs.

Thanks

> On Apr 26, 2016, at 9:16 PM, Harjit Singh <harjit.si...@deciphernow.com> wrote:
>
> You will have to write your own custom Receiver to do that. I don’t think
> wholeTextFiles is designed for that.
>
> - Harjit
>
>> On Apr 26, 2016, at 7:19 PM, Mail.com <pradeep.mi...@mail.com> wrote:
>>
>> Hi All,
>>
>> I am reading an entire directory of gz XML files with wholeTextFiles.
>>
>> I understand that, because the files are gzipped and read with
>> wholeTextFiles, the individual files are not splittable. But why is the
>> entire directory read by one executor, in a single task? I have set the
>> number of executors to the number of files in that directory.
>>
>> Is the only option here to repartition after the XMLs are read and parsed
>> with JAXB?
>>
>> Regards,
>> Pradeep
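For reference, here is a minimal sketch of the repartition option discussed in this thread. It is written in PySpark only for brevity and has not been tested on Spark 1.4 / HDP 2.3.2. The names `root_tag` and `read_gz_xml` are hypothetical, and `xml.etree.ElementTree` is only a stand-in for the JAXB parsing step. The sketch also passes `minPartitions` to `wholeTextFiles`. That value is a hint to the input format, so whether it spreads the read itself over more tasks is an assumption to verify on the cluster. The `repartition` call at least moves the parsing onto separate tasks.

```python
import xml.etree.ElementTree as ET


def root_tag(xml_text):
    """Stand-in for the JAXB step: parse one document and return its root tag."""
    return ET.fromstring(xml_text).tag


def read_gz_xml(sc, path, num_files):
    """Return an RDD of (file path, root tag) pairs.

    sc        -- an existing SparkContext (passed in, so nothing is imported here)
    path      -- directory containing the .gz XML files
    num_files -- number of files in that directory
    """
    # minPartitions is only a hint, so the read may still end up in fewer
    # partitions than requested.
    files = sc.wholeTextFiles(path, minPartitions=num_files)
    # Shuffle the (path, content) pairs so that the parsing step runs in
    # num_files tasks even if the read itself did not.
    return files.repartition(num_files).mapValues(root_tag)
```

It would be called as `read_gz_xml(sc, some_directory, number_of_files)`. The shuffle moves the full file contents between executors, so it is worth comparing against parsing without the repartition when the files are large.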