Re: LBGFS optimizer performace

2015-03-06 Thread Gustavo Enrique Salazar Torres
> rithms with ability to caching more data.
>
> Sincerely,
>
> DB Tsai
> ---
> Blog: https://www.dbtsai.com
>
>
> On Tue, Mar 3, 2015 at 2:27 PM, Gustavo Enrique Salazar Torres
> wrote:
> > Yeah, I can call count be

Re: LBGFS optimizer performace

2015-03-04 Thread Gustavo Enrique Salazar Torres
> occurring within LBFGS. With the
> given stack trace, I'm not sure what part of LBFGS it's happening in.
>
> On Tue, Mar 3, 2015 at 2:27 PM, Gustavo Enrique Salazar Torres <
> gsala...@ime.usp.br> wrote:
>
>> Yeah, I can call count before that and it works.

Re: LBGFS optimizer performace

2015-03-03 Thread Gustavo Enrique Salazar Torres
> looks like it might be
> happening before the data even gets to LBFGS. (Perhaps the outer join
> you're trying to do is making the dataset size explode a bit.) Are you
> able to call count() (or any RDD action) on the data before you pass it to
> LBFGS?
>
> On Tue, Mar 3, 201

Re: LBGFS optimizer performace

2015-03-03 Thread Gustavo Enrique Salazar Torres
ry. I will let you know. Thanks

On Tue, Mar 3, 2015 at 3:25 AM, Akhil Das wrote:
> Can you try increasing your driver memory, reducing the executors and
> increasing the executor memory?
>
> Thanks
> Best Regards
>
> On Tue, Mar 3, 2015 at 10:09 AM, Gustavo Enrique Salazar T

Re: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: pyspark on yarn

2015-03-03 Thread Gustavo Enrique Salazar Torres
Hi Sam: Shouldn't you define the table schema? I had the same problem in Scala and solved it by defining the schema. I did this: sqlContext.applySchema(dataRDD, tableSchema).registerTempTable(tableName) Hope it helps. On Mon, Jan 5, 2015 at 7:01 PM, Sam Flint wrote: > Below is the code th

LBGFS optimizer performace

2015-03-02 Thread Gustavo Enrique Salazar Torres
Hi there: I'm using the LBFGS optimizer to train a logistic regression model. The code I implemented follows the pattern shown in https://spark.apache.org/docs/1.2.0/mllib-linear-methods.html but the training data is obtained from a Spark SQL RDD. The problem I'm having is that LBFGS tries to count the e

Problem when sorting big file

2014-05-16 Thread Gustavo Enrique Salazar Torres
Hi there: I have a dataset (about 12G) which I need to sort by key. I used the sortByKey method, but when I try to save the file to disk (HDFS in this case) it seems that some tasks time out because they have too much data to save and it can't fit in memory. I say this because before the