python : Out of memory: Kill process

2015-03-25 Thread Eduardo Cusa
Hi guys, I'm running the following function with spark-submit and the OS is killing my process:

    def getRdd(self, date, provider):
        path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
        log2 = self.sqlContext.jsonFile(path)
        log2.registerTempTable('log_test')
        log2.cache()
        out = self.sqlConte…
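The quoted function is cut off by the archive, so whatever follows the cache() call is unknown. A minimal sketch of the shape such a function typically takes, purely for orientation; the SQL query, the use of the provider argument, and the return value are all assumptions, not the original code:

    # Hypothetical reconstruction for illustration only: the message is
    # truncated right after "out = self.sqlConte".
    def getRdd(self, date, provider):
        path = 's3n://' + AWS_BUCKET + '/' + date + '/*.log.gz'
        log2 = self.sqlContext.jsonFile(path)  # infer a schema from the gzipped JSON logs
        log2.registerTempTable('log_test')
        log2.cache()                            # keep the parsed table in memory
        out = self.sqlContext.sql(
            "SELECT * FROM log_test WHERE provider = '%s'" % provider)  # assumed query
        return out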

Re: python : Out of memory: Kill process

2015-03-25 Thread Eduardo Cusa
…Davies Liu wrote:
> What's the version of Spark you are running?
>
> There is a bug in the SQL Python API [1]; it's fixed in 1.2.1 and 1.3.
>
> [1] https://issues.apache.org/jira/browse/SPARK-6055
>
> On Wed, Mar 25, 2015 at 10:33 AM, Eduardo Cusa wrote:
> > H…

Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
…te for the new DataFrame API.
>
> On Wed, Mar 25, 2015 at 11:49 AM, Eduardo Cusa wrote:
> > Hi Davies, I'm running 1.1.0.
> >
> > Now I'm following this thread, which recommends using the batchsize parameter = 1:
> >
> > http://apache-spark-user-lis…
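For context, the batchsize suggestion refers to how many records PySpark pickles together when shipping data between the JVM and the Python workers; serializing one record at a time lowers peak memory at the cost of throughput. A minimal sketch of how that would be set in PySpark 1.x; the master URL and app name here are placeholders:

    # Sketch only: batchSize=1 disables batching in the Python<->JVM serializer,
    # trading speed for a smaller memory footprint.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext("spark://master:7077", "oom-debug", batchSize=1)
    sqlContext = SQLContext(sc)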

Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
I'm running on EC2:

1 Master: 4 CPU, 15 GB RAM (2 GB swap)
2 Slaves: 4 CPU, 15 GB RAM

The uncompressed dataset size is 15 GB.

On Thu, Mar 26, 2015 at 10:41 AM, Eduardo Cusa <eduardo.c...@usmediaconsulting.com> wrote:
> Hi Davies, I upgraded to 1.3.0 and am still getting Out of Memo…

Re: python : Out of memory: Kill process

2015-03-26 Thread Eduardo Cusa
On Thu, Mar 26, 2015 at 2:29 PM, Davies Liu wrote:
> Could you try to remove the line `log2.cache()`?
>
> On Thu, Mar 26, 2015 at 10:02 AM, Eduardo Cusa wrote:
> > I'm running on EC2:
> >
> > 1 Master: 4 CPU, 15 GB RAM (2 GB swap)
> >
> > 2 Slaves: 4 CPU…

Re: python : Out of memory: Kill process

2015-03-30 Thread Eduardo Cusa
…like:
>
> log2 = self.sqlContext.jsonFile(path)
> log2.count()
> ...
> out.count()
> ...
>
> On Thu, Mar 26, 2015 at 10:34 AM, Eduardo Cusa wrote:
> > The last try was without log2.cache() and I'm still getting out of memory.
> >
> > I'm using the fo…
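Taken together, the advice in this thread is to drop the cache() call and force each stage to materialize on its own, so the out-of-memory kill can be pinned to a specific step. A sketch of that debugging pattern, reusing the names from the first message; the query itself is an assumption:

    # Debugging sketch: no cache(), and a count() after each step so a failure
    # points at one stage (parsing vs. querying) rather than the whole job.
    # Assumes sqlContext and path are defined as in the original function.
    log2 = sqlContext.jsonFile(path)
    print(log2.count())        # does reading and parsing the logs alone succeed?

    log2.registerTempTable('log_test')
    out = sqlContext.sql("SELECT * FROM log_test")  # hypothetical query
    print(out.count())         # does the query stage succeed on its own?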

Java client connection

2014-11-12 Thread Eduardo Cusa
Hi guys, I'm starting to work with Spark from Java, and when I run the following code:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf conf = new SparkConf()
        .setMaster("spark://10.0.2.20:7077")
        .setAppName("SparkTest");
    JavaSparkContext sc = new JavaSparkContext(conf);

I received the following error and the Java process exits: …

EC2 VPC script

2014-12-18 Thread Eduardo Cusa
Hi guys. I ran the following command to launch a new cluster:

    ./spark-ec2 -k test -i test.pem -s 1 --vpc-id vpc-X --subnet-id subnet-X launch vpc_spark

The instances started OK but the command never ends, with the following output:

    Setting up security groups...
    Searching for existing cl…

Re: EC2 VPC script

2014-12-29 Thread Eduardo Cusa
> …'t come up in a reasonable amount of time and
> you have to kill and restart the process.
>
> Does this always happen, or was it just once?
>
> Nick
>
> On Thu, Dec 18, 2014 at 9:42 AM, Eduardo Cusa <eduardo.c...@usmediaconsulting.com> wrote:
>> Hi guys. …
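For what it's worth, the spark-ec2 script of that era could pick up a partially completed launch instead of starting over. A sketch of that restart, reusing the flags from the original message; whether it applies here depends on whether the instances are actually up:

    # Sketch: --resume tells spark-ec2 to continue setting up an existing,
    # partially launched cluster rather than provisioning new instances.
    ./spark-ec2 -k test -i test.pem -s 1 --vpc-id vpc-X --subnet-id subnet-X \
        launch --resume vpc_spark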

Play Scala Spark Example

2015-01-09 Thread Eduardo Cusa
Hi guys, I'm running the following example: https://github.com/knoldus/Play-Spark-Scala on the same machine as the Spark master, and the Spark cluster was launched with the EC2 script. I'm stuck on these errors; any idea how to fix them? Regards, Eduardo. Calling the Play app prints the following exceptio…

Re: Play Scala Spark Example

2015-01-12 Thread Eduardo Cusa
…" %% "spark-mllib" % "1.1.0" )

On Sun, Jan 11, 2015 at 3:01 AM, Akhil Das wrote:
> What is your Spark version running on the EC2 cluster? From the build
> file <https://github.com/knoldus/Play-Spark-Scala/blob/master/build.sbt>
> of…
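The question being raised is whether the Play project's build file pins a Spark version that matches the cluster. A sketch of the kind of build.sbt alignment under discussion, assuming the EC2 cluster runs 1.1.0 as stated earlier in the thread; the exact artifact list is an assumption:

    // Sketch: pin every Spark artifact to the version deployed on the cluster,
    // since a client/cluster version mismatch commonly breaks remote contexts.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"  % "1.1.0",
      "org.apache.spark" %% "spark-sql"   % "1.1.0",
      "org.apache.spark" %% "spark-mllib" % "1.1.0"
    )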