Re: Running ALS on comparatively large RDD

2016-03-11 Thread Deepak Gopalakrishnan
; from? How much driver and executor memory have you provided to Spark? > > > > On Fri, 11 Mar 2016 at 09:21 Deepak Gopalakrishnan > wrote: > >> 1. I'm using about 1 million users against few thousand products. I >> basically have around a million ratings >> 2

Re: Running ALS on comparatively large RDD

2016-03-10 Thread Deepak Gopalakrishnan
oducts) > 2. Spark cluster set up and version > > Thanks > > On Fri, 11 Mar 2016 at 05:53 Deepak Gopalakrishnan > wrote: > >> Hello All, >> >> I've been running Spark's ALS on a dataset of users and rated items. I >> first encode my users to intege

Re: Mapper side join with DataFrames API

2016-03-05 Thread Deepak Gopalakrishnan
Hello Guys, No help yet. Can someone help me with a reply to the above question on SO? Thanks Deepak On Fri, Mar 4, 2016 at 5:32 PM, Deepak Gopalakrishnan wrote: > Have added this to SO, can you guys share any thoughts ? > > > http://stackoverflow.com/questions/35795518/spark-1

Re: Mapper side join with DataFrames API

2016-03-04 Thread Deepak Gopalakrishnan
ugh-memory&sa=D&sntz=1&usg=AFQjCNEzDJqylz5aF0998u08RGlf5YF1-g> On Thu, Mar 3, 2016 at 7:06 AM, Deepak Gopalakrishnan wrote: > Hello, > > I'm using 1.6.0 on EMR > > On Thu, Mar 3, 2016 at 12:34 AM, Yong Zhang wrote: > >> What version of Spark you are usi

Re: Mapper side join with DataFrames API

2016-02-29 Thread Deepak Gopalakrishnan
ays spilling sort data. I'm a little surprised that this happens even when I have enough free memory. Any inputs will be greatly appreciated! Thanks On Mon, Feb 29, 2016 at 9:15 PM, Deepak Gopalakrishnan wrote: > Hello, > > I'm trying to join 2 dataframes A and B with a >

Mapper side join with DataFrames API

2016-02-29 Thread Deepak Gopalakrishnan
? DataFrame B = sparkContext.broadcast(B); B.registerTempTable("B"); -- Regards, *Deepak Gopalakrishnan* *Mobile*:+918891509774 *Skype* : deepakgk87 http://myexps.blogspot.com

Call wholeTextFiles to read gzip files

2016-02-16 Thread Deepak Gopalakrishnan
Hello, I'm reading S3 files using wholeTextFiles(). My files are in gzip format, but their names do not end with ".gz", and I cannot rename them to add that extension. Is there a way to specify the InputFormat as Gzip when using wholeTextFiles(
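
One possible workaround, sketched here as an assumption and not as an answer taken from the thread: read each file as raw bytes (for example with JavaSparkContext.binaryFiles, whose values are PortableDataStream objects exposing toArray()) and decompress the bytes yourself, so the file extension never matters. The class and method names below (GzipText, gunzipToString, gzip) are hypothetical, and the sketch uses only the JDK so it runs without a cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipText {
    // Decompress gzip bytes into a UTF-8 string; the file name is never consulted.
    public static String gunzipToString(byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Demo helper only: gzip a string in memory to stand in for a file's bytes.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("line one\nline two\n");
        System.out.print(gunzipToString(compressed));
    }
}
```

In a Spark job, the same gunzipToString helper could be applied to the byte array of each file inside a mapValues call. Note that this loads every file fully into memory, which is also how wholeTextFiles() behaves, so it suits many small files better than a few very large ones.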

Extracting RDD of values per key from PairRDD

2015-11-03 Thread Deepak Gopalakrishnan
Hello, I have a use case where I need to get *an RDD of values per key* from a PairRDD. Below is my PairRDD. JavaPairRDD> classifiedSampleRdd = sampleRDD.groupByKey(); I want a separate RDD for the vectors per double entry in the key. *I would now want an RDD of values for each key.* Which will b

Re: Spark timeout issue

2015-04-26 Thread Deepak Gopalakrishnan
Hello All, I'm trying to process a 3.5GB file in standalone mode using Spark. I could run my Spark job successfully on a 100MB file and it works as expected. But when I try to run it on the 3.5GB file, I run into the below error: 15/04/26 12:45:50 INFO BlockManagerMaster: Updated info of block