Re: Mapper side join with DataFrames API

2016-03-02 Thread Deepak Gopalakrishnan
spill (as in screenshots). Any idea why? > > Thanks > Deepak > > On Wed, Mar 2, 2016 at 5:14 AM, Michael Armbrust > wrote: > > It's helpful to always include the output of df.explain(true) when you are > asking about performance. > > On Mon, Feb 29, 2016 at 6:14 PM, Dee

Re: Mapper side join with DataFrames API

2016-03-04 Thread Deepak Gopalakrishnan
On Thu, Mar 3, 2016 at 7:06 AM, Deepak Gopalakrishnan wrote: > Hello, > > I'm using 1.6.0 on EMR > > On Thu, Mar 3, 2016 at 12:34 AM, Yong Zhang wrote: > >> What version of Spark you are usi

Re: Mapper side join with DataFrames API

2016-03-05 Thread Deepak Gopalakrishnan
Hello Guys, No help yet. Can someone help me with a reply to the above question on SO? Thanks Deepak On Fri, Mar 4, 2016 at 5:32 PM, Deepak Gopalakrishnan wrote: > Have added this to SO, can you guys share any thoughts? > > > http://stackoverflow.com/questions/35795518/spark-1

Re: Running ALS on comparitively large RDD

2016-03-10 Thread Deepak Gopalakrishnan
oducts) > 2. Spark cluster set up and version > > Thanks > > On Fri, 11 Mar 2016 at 05:53 Deepak Gopalakrishnan > wrote: > >> Hello All, >> >> I've been running Spark's ALS on a dataset of users and rated items. I >> first encode my users to intege

Re: Running ALS on comparitively large RDD

2016-03-11 Thread Deepak Gopalakrishnan
> from? How much driver and executor memory have you provided to Spark? > > > > On Fri, 11 Mar 2016 at 09:21 Deepak Gopalakrishnan > wrote: > >> 1. I'm using about 1 million users against a few thousand products. I >> basically have around a million ratings >> 2

Fwd: Mapper side join with DataFrames API

2016-02-29 Thread Deepak Gopalakrishnan
says spilling sort data. I'm a little surprised that this happens even when I have enough free memory. Any inputs will be greatly appreciated! Thanks -- Regards, *Deepak Gopalakrishnan* *Mobile*:+918891509774 *Skype* : deepakgk87 http://myexps.blogspot.com

Timeout Error

2015-04-26 Thread Deepak Gopalakrishnan
connection issue. I have an r3.xlarge and two m3.large instances. Can anyone suggest a way to fix this?

Re: Spark timeout issue

2015-04-26 Thread Deepak Gopalakrishnan
2015 at 12:42 PM, Deepak Gopalakrishnan > wrote: > > Hello All, > > > > I'm trying to process a 3.5GB file in standalone mode using Spark. I > could > > run my Spark job successfully on a 100MB file and it works as expected. > But, > > when

Re: Timeout Error

2015-04-26 Thread Deepak Gopalakrishnan
rote: > I'm not sure what the expected performance should be for this amount of > data, but you could try to increase the timeout with the property > "spark.akka.timeout" to see if that helps. > > Bryan > > On Sun, Apr 26, 2015 at 6:57 AM, Deepak Gopalakrishnan

Re: Timeout Error

2015-04-27 Thread Deepak Gopalakrishnan
ong Zhu wrote: > The configuration key should be "spark.akka.askTimeout" for this timeout. > The time unit is seconds. > > Best Regards, > Shixiong(Ryan) Zhu > > 2015-04-26 15:15 GMT-07:00 Deepak Gopalakrishnan : > > Hello, >> >> >> Just to a