Re: JavaSparkContext.wholeTextFiles read directory

Mail.com Tue, 26 Apr 2016 19:31:49 -0700

wholeTextFiles() works; it just does not give me any parallelism.


This is on Spark 1.4 with HDP 2.3.2, running batch jobs.
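
A possible explanation, offered as a rough sketch rather than a confirmed diagnosis: this assumes wholeTextFiles is backed by a combine-style input format that packs whole files into splits, with the maximum split size derived from the minPartitions hint as roughly ceil(totalBytes / minPartitions). Under that assumption, a small hint lets many files land in one split, and therefore one task, no matter how many executors are requested. The plain-Java model below only illustrates the arithmetic. The class and method names are hypothetical, it ignores node and rack locality, and it does not call Spark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of split packing; not Spark or Hadoop code.
class SplitPackingSketch {

    // Assumed rule: maxSplitSize = ceil(totalLen / minPartitions), with 0 treated as 1.
    public static long maxSplitSize(long[] fileLengths, int minPartitions) {
        long total = 0;
        for (long len : fileLengths) {
            total += len;
        }
        int parts = (minPartitions == 0) ? 1 : minPartitions;
        return (long) Math.ceil(total * 1.0 / parts);
    }

    // Greedy packing: keep adding whole (unsplittable) files to the current
    // split and close it once its size reaches maxSize.
    // Returns the file indices grouped by split.
    public static List<List<Integer>> pack(long[] fileLengths, long maxSize) {
        List<List<Integer>> splits = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentSize = 0;
        for (int i = 0; i < fileLengths.length; i++) {
            current.add(i);
            currentSize += fileLengths[i];
            if (maxSize != 0 && currentSize >= maxSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four gzipped files of 100 bytes each (made-up sizes).
        long[] lengths = {100, 100, 100, 100};
        for (int minPartitions : new int[] {1, 2, 4}) {
            long max = maxSplitSize(lengths, minPartitions);
            int count = pack(lengths, max).size();
            // Split counts for hints 1, 2, 4 are 1, 2, 4 respectively.
            System.out.println("minPartitions=" + minPartitions + " -> " + count + " split(s)");
        }
    }
}
```

If that model holds, passing a larger hint through the two-argument overload, wholeTextFiles(path, minPartitions), would be worth trying before falling back to repartition() after parsing.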

Thanks

> On Apr 26, 2016, at 9:16 PM, Harjit Singh <harjit.si...@deciphernow.com> 
> wrote:
> 
> You will have to write a custom receiver to do that. I don’t think 
> wholeTextFiles is designed for that.
> 
> - Harjit
>> On Apr 26, 2016, at 7:19 PM, Mail.com <pradeep.mi...@mail.com> wrote:
>> 
>> 
>> Hi All,
>> I am reading an entire directory of gzipped XML files with wholeTextFiles. 
>> 
>> I understand that, because the files are gzipped and read with 
>> wholeTextFiles, the individual files are not splittable. But why is the 
>> entire directory read by one executor in a single task? I have set the 
>> number of executors to the number of files in that directory.
>> 
>> Is the only option here to repartition after the XML files are read and 
>> parsed with JAXB?
>> 
>> Regards,
>> Pradeep
> 
> v/r,
> Harjit Singh
> Decipher Technology Studios
