Re: PySpark working with Generators

2017-07-05 Thread Saatvik Shah
Regards,
Saatvik Shah

On Fri, Jun 30, 2017 at 12:50 AM, Mahesh Sawaiker <mahesh_sawai...@persistent.com> wrote:
> Wouldn’t this work if you load the files in hdfs and let the partitions be equal to the amount of parallelism you want?

Re: PySpark working with Generators

2017-06-30 Thread Jörn Franke
> Wouldn’t this work if you load the files in hdfs and let the partitions be equal to the amount of parallelism you want?
>
> From: Saatvik Shah [mailto:saatvikshah1...@gmail.com]
> Sent: Friday, June 30, 2017 8:55 AM
> To: ayan guha
> Cc: user
> Subject: Re: PySpark working with Generators

Re: PySpark working with Generators

2017-06-30 Thread Saatvik Shah
> From: Saatvik Shah [mailto:saatvikshah1...@gmail.com]
> Sent: Friday, June 30, 2017 8:55 AM
> To: ayan guha
> Cc: user
> Subject: Re: PySpark working with Generators

Hey Ayan,

This isn’t a typical text file - it’s a proprietary data format for which a native Spark reader is not available.

RE: PySpark working with Generators

2017-06-29 Thread Mahesh Sawaiker
Wouldn’t this work if you load the files in hdfs and let the partitions be equal to the amount of parallelism you want?

From: Saatvik Shah [mailto:saatvikshah1...@gmail.com]
Sent: Friday, June 30, 2017 8:55 AM
To: ayan guha
Cc: user
Subject: Re: PySpark working with Generators

Hey Ayan,

This isn’t a typical text file - it’s a proprietary data format for which a native Spark reader is not available.
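[Editor's sketch of the suggestion above, assuming a live SparkContext `sc`: sc.textFile takes a minPartitions hint, so the partition count can be matched to the desired parallelism. The helper below and the 2-4-tasks-per-core rule of thumb are illustrative assumptions, not from the thread.]

```python
# Hedged sketch of "partitions equal to the amount of parallelism":
# the HDFS path, executor counts, and SparkContext `sc` are hypothetical.
#
#   rdd = sc.textFile("hdfs:///data/input",
#                     minPartitions=desired_partitions(4, 2))

def desired_partitions(num_executors, cores_per_executor, factor=2):
    """Common rule of thumb: aim for roughly 2-4 tasks per available core."""
    return num_executors * cores_per_executor * factor
```

[With 4 executors of 2 cores each, this would hint at 16 partitions.]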

Re: PySpark working with Generators

2017-06-29 Thread ayan guha
Hi,

I understand that now. However, your function foo() should take a string and parse it, rather than trying to read from the file. This way, you can separate the file-read part and the process part:

r = sc.wholeTextFiles(path)
parsed = r.map(lambda x: (x[0], foo(x[1])))

On Fri, Jun 30, 2017 at 1:25 PM,
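[Editor's sketch of the read/parse separation suggested above. `foo` here is a hypothetical comma-separated parser standing in for Saatvik's real one, and the actual PySpark call appears only in comments, assuming a live SparkContext `sc`.]

```python
# sc.wholeTextFiles yields (path, content) pairs; foo takes the *content*
# string, so Spark owns the reading and foo owns the parsing.
#
#   rdd = sc.wholeTextFiles("hdfs:///data/mydir")
#   parsed = rdd.map(lambda kv: (kv[0], foo(kv[1])))

def foo(content):
    """Hypothetical parser: split file content into a list of token lists."""
    return [line.split(",") for line in content.splitlines()]

# Local stand-in for wholeTextFiles records: (path, content) pairs.
records = [("hdfs:///data/f1", "a,b\nc,d")]
parsed = [(path, foo(text)) for path, text in records]
```

[Because foo never touches the filesystem, the same function works in a local test and inside the map.]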

Re: PySpark working with Generators

2017-06-29 Thread Saatvik Shah
Hey Ayan,

This isn’t a typical text file - it’s a proprietary data format for which a native Spark reader is not available.

Thanks and Regards,
Saatvik Shah

On Thu, Jun 29, 2017 at 6:48 PM, ayan guha wrote:
> If your files are in same location you can use sc.wholeTextFiles. If not, sc.textFile
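[Editor's note: for a format with no native Spark reader, one common approach - an assumption here, not something the thread confirms Saatvik used - is sc.binaryFiles plus a plain-Python parser. The packed-little-endian-integer layout below is purely hypothetical.]

```python
# Hedged sketch: sc.binaryFiles yields (path, bytes) pairs, and parsing
# happens in an ordinary Python function. `parse_records` and the 4-byte
# integer layout are stand-ins for the proprietary format.
#
#   rdd = sc.binaryFiles("hdfs:///data/propdir")
#   parsed = rdd.flatMap(lambda kv: parse_records(kv[1]))

import struct

def parse_records(raw):
    """Hypothetical parser: treat the payload as packed 4-byte ints."""
    count = len(raw) // 4
    return list(struct.unpack("<%di" % count, raw[: count * 4]))

sample = struct.pack("<3i", 1, 2, 3)  # a fake 12-byte "proprietary" file
```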

Re: PySpark working with Generators

2017-06-29 Thread ayan guha
If your files are in the same location you can use sc.wholeTextFiles. If not, sc.textFile accepts a comma-separated list of filepaths.

On Fri, 30 Jun 2017 at 5:59 am, saatvikshah1994 wrote:
> Hi,
>
> I have this file-reading function called /foo/ which reads contents into a list of lists or into a generator of
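[Editor's sketch of the two APIs mentioned above: sc.wholeTextFiles takes a directory and yields (path, content) pairs, while sc.textFile takes a single path string that may be comma-separated. The paths and the SparkContext `sc` below are hypothetical.]

```python
# Scattered files can be handed to sc.textFile as one comma-separated
# string rather than a Python list.
#
#   pairs = sc.wholeTextFiles("hdfs:///data/mydir")  # (path, content) pairs
#   lines = sc.textFile(joined)                      # one record per line

paths = ["hdfs:///a/part-0001", "hdfs:///b/part-0002"]
joined = ",".join(paths)  # the single-string form sc.textFile expects
```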