> Regards,
> Saatvik Shah
>
> On Fri, Jun 30, 2017 at 12:50 AM, Mahesh Sawaiker <
> mahesh_sawai...@persistent.com> wrote:
>
>> Wouldn’t this work if you load the files in hdfs and let the partitions
>> be equal to the amount of parallelism you want?
>>
>> From: Saatvik Shah [mailto:saatvikshah1...@gmail.com]
>> Sent: Friday, June 30, 2017 8:55 AM
>> To: ayan guha
>> Cc: user
>> Subject: Re: PySpark working with Generators
>>
>> Hey Ayan,
>>
>> This isn't a typical text file - it's a proprietary data format for which a
>> native Spark reader is not available.
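A minimal sketch of Mahesh's suggestion above, assuming the raw files have
already been copied into HDFS; the path and the partition count are
illustrative, not taken from the thread:

# Read every file under the directory as (filename, contents) pairs.
raw = sc.wholeTextFiles("hdfs:///data/proprietary/")
# Repartition to the degree of parallelism you want; 64 is an example
# value, typically chosen near the total number of executor cores.
raw = raw.repartition(64)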
Hi
I understand that now. However, your function foo() should take a string
and parse it, rather than trying to read from the file itself. This way, you
can separate the file-reading step from the processing step:
r = sc.wholeTextFiles(path)
parsed = r.map(lambda x: (x[0], foo(x[1])))
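For instance, if foo() returns a generator rather than a list, flatMap will
flatten it; the parser below is a hypothetical stand-in, since the real
logic depends on the proprietary format:

# Stand-in parser: takes the file's full contents as one string and
# yields records lazily.
def foo(raw_text):
    for line in raw_text.splitlines():
        yield tuple(line.split(","))

r = sc.wholeTextFiles(path)  # RDD of (filename, contents) pairs
# Each parsed record becomes its own RDD element.
records = r.flatMap(lambda x: foo(x[1]))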
On Fri, Jun 30, 2017 at 1:25 PM, Saatvik Shah <saatvikshah1...@gmail.com> wrote:
Hey Ayan,
This isn't a typical text file - it's a proprietary data format for which a
native Spark reader is not available.
Thanks and Regards,
Saatvik Shah
On Thu, Jun 29, 2017 at 6:48 PM, ayan guha wrote:
> If your files are in the same location you can use sc.wholeTextFiles. If
> not, sc.textFile accepts a comma-separated list of filepaths.
If your files are in the same location you can use sc.wholeTextFiles. If not,
sc.textFile accepts a comma-separated list of filepaths.
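Roughly, the two options look like this; the paths are illustrative:

# One (filename, contents) pair per file under a directory.
files_rdd = sc.wholeTextFiles("hdfs:///data/input/")

# Line-oriented reading; multiple paths go in one comma-separated string.
lines_rdd = sc.textFile("hdfs:///data/a.txt,hdfs:///data/b.txt")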
On Fri, 30 Jun 2017 at 5:59 am, saatvikshah1994 wrote:
> Hi,
>
> I have this file reading function called /foo/ which reads contents into
> a list of lists or into a generator of
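For the generator case the thread is asking about, one common pattern is to
distribute the file paths and let each task consume foo's generator lazily.
This is a sketch under two assumptions not stated in the thread: foo(path)
opens one file and yields records, and every path is readable from every
executor (for example, on a shared filesystem):

paths = ["/mnt/shared/part1.dat", "/mnt/shared/part2.dat"]  # illustrative
records = sc.parallelize(paths, numSlices=len(paths)).flatMap(foo)
# flatMap consumes each generator lazily inside its task, so a file's
# records never need to be materialized as one big list on the driver.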