; from? How much driver and executor memory have you provided to Spark?
>
>
>
> On Fri, 11 Mar 2016 at 09:21 Deepak Gopalakrishnan
> wrote:
>
>> 1. I'm using about 1 million users against a few thousand products. I
>> basically have around a million ratings.
>> 2.
>
> 1. Number of users and items (products)
> 2. Spark cluster set up and version
>
> Thanks
>
> On Fri, 11 Mar 2016 at 05:53 Deepak Gopalakrishnan
> wrote:
>
>> Hello All,
>>
>> I've been running Spark's ALS on a dataset of users and rated items. I
>> first encode my users to integers
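As a point of reference, here is a minimal sketch of the kind of ALS job described above, assuming Spark 1.6's MLlib ALS API with integer-encoded user and product IDs; the input path and hyper-parameter values are placeholders, not the poster's actual settings:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.recommendation.ALS;
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
import org.apache.spark.mllib.recommendation.Rating;

// sc is assumed to be an existing JavaSparkContext; the input path is hypothetical.
JavaRDD<Rating> ratings = sc.textFile("s3n://my-bucket/ratings.csv")
        .map(line -> {
            String[] parts = line.split(",");
            return new Rating(Integer.parseInt(parts[0]),    // encoded user id
                              Integer.parseInt(parts[1]),    // encoded product id
                              Double.parseDouble(parts[2])); // rating value
        });

// Placeholder hyper-parameters; driver/executor memory is configured separately
// via spark-submit (--driver-memory / --executor-memory).
int rank = 10;
int iterations = 10;
double lambda = 0.01;
MatrixFactorizationModel model = ALS.train(ratings.rdd(), rank, iterations, lambda);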
Hello Guys,
No help yet. Can someone help me with a reply to the above question on SO?
Thanks
Deepak
On Fri, Mar 4, 2016 at 5:32 PM, Deepak Gopalakrishnan
wrote:
> Have added this to SO, can you guys share any thoughts?
>
>
> http://stackoverflow.com/questions/35795518/spark-1
On Thu, Mar 3, 2016 at 7:06 AM, Deepak Gopalakrishnan
wrote:
> Hello,
>
> I'm using 1.6.0 on EMR
>
> On Thu, Mar 3, 2016 at 12:34 AM, Yong Zhang wrote:
>
>> What version of Spark are you using?
My job is always
spilling sort data. I'm a little surprised that this happens even when I
have enough memory free.
Any inputs will be greatly appreciated!
Thanks
On Mon, Feb 29, 2016 at 9:15 PM, Deepak Gopalakrishnan
wrote:
> Hello,
>
> I'm trying to join 2 dataframes A and B with a broadcast
>
DataFrame B = sparkContext.broadcast(B);
B.registerTempTable("B");
--
Regards,
*Deepak Gopalakrishnan*
*Mobile*:+918891509774
*Skype* : deepakgk87
http://myexps.blogspot.com
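On the broadcast-join question above, a minimal sketch using the Spark 1.6 DataFrame API follows; the join column "id" is a placeholder. Note that sparkContext.broadcast() returns a Broadcast variable rather than a DataFrame, so the snippet in the question would not compile as written; the functions.broadcast() hint is the usual way to request a broadcast join:

import static org.apache.spark.sql.functions.broadcast;

import org.apache.spark.sql.DataFrame;

// A and B are assumed to be existing DataFrames, with B the smaller side.
// broadcast() marks B for a broadcast (map-side) join instead of a shuffle join.
DataFrame joined = A.join(broadcast(B), A.col("id").equalTo(B.col("id")));

// Alternatively, register the plain DataFrame and let Spark SQL choose the join
// strategy based on spark.sql.autoBroadcastJoinThreshold.
B.registerTempTable("B");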
Hello,
I'm reading S3 files using wholeTextFiles(). My files are in gzip format, but
the names of the files do not end with ".gz". I cannot force the names
of these files to end with ".gz". Is there a way to specify the
InputFormat as gzip when using wholeTextFiles()?
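One possible workaround, sketched below: as far as I know, wholeTextFiles() detects compression from the file extension, but binaryFiles() hands back the raw bytes, which can be run through a GZIPInputStream regardless of the file name. This assumes each file is small enough to decompress in memory; the S3 path is a placeholder:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.input.PortableDataStream;

// sc is assumed to be an existing JavaSparkContext; the path is hypothetical.
JavaPairRDD<String, PortableDataStream> raw = sc.binaryFiles("s3n://my-bucket/gzipped-files/");

// Decompress each file explicitly, ignoring the missing ".gz" extension.
JavaPairRDD<String, String> texts = raw.mapValues(stream -> {
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
            new GZIPInputStream(stream.open()), StandardCharsets.UTF_8))) {
        return reader.lines().collect(Collectors.joining("\n"));
    }
});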
Hello,
I have a use case where I need to get *an RDD of values per key* from
a PairRDD. Below is my PairRDD.
JavaPairRDD<Double, Iterable<Vector>> classifiedSampleRdd =
sampleRDD.groupByKey();
I want a separate RDD for the vectors per double entry in the key. *I
would now want an RDD of values for each key*, which will be a JavaRDD<Vector> per key.
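One way to get this, sketched under the assumption that sampleRDD is a JavaPairRDD<Double, Vector> (MLlib Vector values) before the groupByKey(): collect the distinct keys on the driver and build one filtered RDD per key. This is only reasonable when the number of distinct keys is small, since each filter pass scans the full RDD:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Vector;

import scala.Tuple2;

// sampleRDD is the JavaPairRDD<Double, Vector> from the question above.
List<Double> keys = sampleRDD.keys().distinct().collect();

// Build one JavaRDD<Vector> per key.
Map<Double, JavaRDD<Vector>> vectorsPerKey = new HashMap<>();
for (Double key : keys) {
    JavaRDD<Vector> vectors = sampleRDD
            .filter(pair -> pair._1().equals(key))
            .map(Tuple2::_2);
    vectorsPerKey.put(key, vectors);
}

If there are many keys, it is usually better to keep everything in a single pair RDD and aggregate per key (reduceByKey/aggregateByKey) rather than materialise a separate RDD per key.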
Hello All,
I'm trying to process a 3.5GB file in standalone mode using Spark. I could
run my Spark job successfully on a 100MB file and it works as expected. But
when I try to run it on the 3.5GB file, I run into the below error:
15/04/26 12:45:50 INFO BlockManagerMaster: Updated info of block