RE: ideal number of executors per machine

2015-12-16 Thread Bui, Tri
The article below gives a good idea: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/ Play around with two configurations (a large number of executors with few cores each, and a small number of executors with many cores each). Calculated values have to be conservative or it will make the spa
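As a rough, hypothetical illustration of the arithmetic involved, the sketch below sizes executors for an example cluster of 6 nodes with 16 cores and 64 GB each. Every input is an assumption to replace with your own numbers: one core and 1 GB per node reserved for the OS and Hadoop daemons, 5 cores per executor, one executor slot left for the YARN ApplicationMaster, and a 7% off-heap memory overhead. ExecutorSizing and plan are illustrative names, not Spark APIs.

```scala
// Hypothetical executor-sizing arithmetic for Spark on YARN; all inputs are assumptions.
object ExecutorSizing {
  final case class Plan(executorsPerNode: Int, totalExecutors: Int,
                        coresPerExecutor: Int, heapGbPerExecutor: Int)

  def plan(nodes: Int, coresPerNode: Int, memGbPerNode: Int,
           coresPerExecutor: Int = 5, overheadFraction: Double = 0.07): Plan = {
    // Reserve one core and 1 GB per node for the OS and Hadoop daemons (assumption).
    val usableCores = coresPerNode - 1
    val usableMemGb = memGbPerNode - 1
    val executorsPerNode = usableCores / coresPerExecutor
    require(executorsPerNode > 0, "coresPerExecutor is larger than the usable cores per node")
    // Leave one executor slot for the YARN ApplicationMaster (assumption).
    val totalExecutors = nodes * executorsPerNode - 1
    // Split node memory evenly across executors, then subtract the off-heap overhead share.
    val memPerExecutorGb = usableMemGb.toDouble / executorsPerNode
    val heapGb = math.floor(memPerExecutorGb * (1 - overheadFraction)).toInt
    Plan(executorsPerNode, totalExecutors, coresPerExecutor, heapGb)
  }

  def main(args: Array[String]): Unit =
    println(plan(nodes = 6, coresPerNode = 16, memGbPerNode = 64))
}
```

For the example cluster this works out to 3 executors per node, 17 executors in total, and 5 cores and 19 GB of heap each, which would correspond to --num-executors 17 --executor-cores 5 --executor-memory 19G on spark-submit. Trying the other configuration (more, smaller executors) is then just a matter of lowering coresPerExecutor.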

RE: Do I need to applied feature scaling via StandardScaler for LBFGS for Linear Regression?

2014-12-18 Thread Bui, Tri
DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Fri, Dec 12, 2014 at 12:23 PM, Bui, Tri wrote: > Thanks for the info. > > How do I use StandardScaler() to scale example data (10246.0,[14111.0,1.0]) ? > > Thx > tr

RE: Do I need to applied feature scaling via StandardScaler for LBFGS for Linear Regression?

2014-12-12 Thread Bui, Tri
Thanks for the info. How do I use StandardScaler() to scale example data such as (10246.0,[14111.0,1.0])? Thx tri -Original Message- From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com] Sent: Friday, December 12, 2014 1:26 PM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Do I need to
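A minimal sketch of scaling points like the one above with MLlib's StandardScaler (package org.apache.spark.mllib.feature; Spark 1.1 or later assumed). The scaler learns each feature's mean and standard deviation from the whole dataset, so it has to be fit on many rows rather than applied to a single example. The sketch also assumes the trailing 1.0 in the example is a hand-added bias column: a constant column has zero variance and will not survive standardization, so the input is expected without it and the bias is appended after scaling with MLUtils.appendBias. The input format, the file path argument, and the ScaleFeatures/scaleWithBias names are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

object ScaleFeatures {
  /** Standardize the features, then append a constant 1.0 bias term after scaling. */
  def scaleWithBias(data: RDD[LabeledPoint]): RDD[LabeledPoint] = {
    // Fit per-feature mean and standard deviation; withMean = true needs dense vectors.
    val scaler = new StandardScaler(withMean = true, withStd = true).fit(data.map(_.features))
    data.map(lp => LabeledPoint(lp.label, MLUtils.appendBias(scaler.transform(lp.features))))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ScaleFeatures"))
    // Hypothetical input: one "(label,[x1,...])" record per line WITHOUT the bias column,
    // e.g. "(10246.0,[14111.0])".
    val data = sc.textFile(args(0)).map(LabeledPoint.parse).cache()
    scaleWithBias(data).take(5).foreach(println)
    sc.stop()
  }
}
```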

RE: Do I need to applied feature scaling via StandardScaler for LBFGS for Linear Regression?

2014-12-12 Thread Bui, Tri
linalg.Vectors Thanks Tri -Original Message- From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com] Sent: Friday, December 12, 2014 12:16 PM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Do I need to applied feature scaling via StandardScaler for LBFGS for Linear Regression? You need to do the

Do I need to applied feature scaling via StandardScaler for LBFGS for Linear Regression?

2014-12-12 Thread Bui, Tri
Hi, I am trying to use LBFGS as the optimizer. Do I need to implement feature scaling via StandardScaler, or does LBFGS do it by default? The following code generated the error "Failure again! Giving up and returning, Maybe the objective is just poorly behaved ?". val data = sc.textFile("file:///data/Tra
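The code in the post is cut off, so the sketch below does not reproduce it. Assuming Spark 1.x MLlib, it shows one way to drive the low-level LBFGS.runLBFGS optimizer for least squares with LeastSquaresGradient and SimpleUpdater (no regularization). The optimizer works on whatever feature vectors it is given, so with raw values as different as 14111.0 and 1.0 the features are standardized first and a bias term is appended afterwards. The correction count, tolerance and iteration limit are illustrative, and LbfgsLinearRegression/fit are hypothetical names.

```scala
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.optimization.{LBFGS, LeastSquaresGradient, SimpleUpdater}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

object LbfgsLinearRegression {
  /** Returns (weights with the intercept as the last element, loss history). */
  def fit(data: RDD[LabeledPoint]): (Vector, Array[Double]) = {
    // Standardize features so LBFGS sees comparably scaled dimensions.
    val scaler = new StandardScaler(withMean = true, withStd = true).fit(data.map(_.features))
    // runLBFGS expects (label, features) pairs; append 1.0 so the last weight is the intercept.
    val training = data.map(lp => (lp.label, MLUtils.appendBias(scaler.transform(lp.features)))).cache()
    val numFeatures = training.first()._2.size
    LBFGS.runLBFGS(
      training,
      new LeastSquaresGradient(),
      new SimpleUpdater(),      // plain gradient step, no regularization
      10,                       // numCorrections (illustrative)
      1e-6,                     // convergenceTol (illustrative)
      100,                      // maxNumIterations (illustrative)
      0.0,                      // regParam
      Vectors.zeros(numFeatures))
  }
}
```

The returned weights apply to the standardized features, so new points have to go through the same scaler and appendBias step before the weights are applied to them.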

RE: Learning rate or stepsize automation

2014-12-09 Thread Bui, Tri
Thanks! Will try it out. From: Debasish Das [mailto:debasish.da...@gmail.com] Sent: Monday, December 08, 2014 5:13 PM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Learning rate or stepsize automation Hi Bui, Please use BFGS based solvers...For BFGS you don't have to specify step

Learning rate or stepsize automation

2014-12-08 Thread Bui, Tri
Hi, Is there any way to automatically calculate the optimal learning rate or step size via MLlib for SGD? Thx tri
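MLlib's SGD takes stepSize as an input rather than tuning it (the reply above points to the LBFGS-based solvers, which do not need one). If you stay with SGD, a plain grid search is one common workaround; it is not an MLlib feature. The sketch trains LinearRegressionWithSGD with each candidate step size and keeps the one with the lowest mean squared error on a held-out split. The 80/20 split, the seed, the iteration count and the StepSizeSearch names are assumptions.

```scala
import org.apache.spark.SparkContext._ // RDD[Double].mean() implicits on Spark < 1.3; harmless later
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionModel, LinearRegressionWithSGD}
import org.apache.spark.rdd.RDD

object StepSizeSearch {
  /** Mean squared error of a model on a labeled dataset. */
  def mse(model: LinearRegressionModel, data: RDD[LabeledPoint]): Double =
    data.map { lp =>
      val err = model.predict(lp.features) - lp.label
      err * err
    }.mean()

  /** Plain grid search over stepSize; returns (best step size, its validation MSE). */
  def bestStepSize(data: RDD[LabeledPoint], candidates: Seq[Double],
                   numIterations: Int = 100): (Double, Double) = {
    // Hold out 20% of the data for validation (ratio and seed are arbitrary choices).
    val Array(train, valid) = data.randomSplit(Array(0.8, 0.2), seed = 42L)
    train.cache()
    val scored = candidates.map { step =>
      val model = LinearRegressionWithSGD.train(train, numIterations, step)
      step -> mse(model, valid)
    }
    // A diverging step size can yield NaN; drop those before picking the minimum.
    scored.filterNot(_._2.isNaN).minBy(_._2)
  }
}
```

For example, StepSizeSearch.bestStepSize(data, Seq(0.0001, 0.001, 0.01, 0.1, 1.0)). Powers of ten are a typical first grid; a finer grid around the winner can follow.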

Cannot PredictOnValues or PredictOn base on the model build with StreamingLinearRegressionWithSGD

2014-12-05 Thread Bui, Tri
Hi, The following example code builds the correct model.weights, but its predicted values are zero. Am I calling predictOnValues incorrectly? I also coded a batch version based on LinearRegressionWithSGD() with the same train and test data, iteration count and step size, and it was
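One thing worth checking, offered only as a guess because the rest of the post is cut off: until trainOn has processed at least one training batch, the model still holds its initial weights, so with zero initial weights any test records scored before training data arrives come back as 0.0. The sketch below is a variant of the usual streaming setup that also prints latestModel().weights after every batch; it is registered after trainOn, so within a batch it should report the updated weights. The argument layout, batch interval, iteration count and step size are assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingPredictCheck {
  def main(args: Array[String]): Unit = {
    // Hypothetical arguments: trainingDir testDir numFeatures
    val ssc = new StreamingContext(new SparkConf().setAppName("StreamingPredictCheck"), Seconds(10))
    val trainingData = ssc.textFileStream(args(0)).map(LabeledPoint.parse)
    val testData = ssc.textFileStream(args(1)).map(LabeledPoint.parse)

    val model = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.zeros(args(2).toInt))
      .setNumIterations(50)
      .setStepSize(0.0001)

    model.trainOn(trainingData)
    // Print the current weights once per batch; all zeros means predictions will be zero too.
    trainingData.foreachRDD { (_, time) =>
      println(s"$time weights: ${model.latestModel().weights}")
    }
    model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```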

RE: hdfs streaming context

2014-12-01 Thread Bui, Tri
Yep. No localhost. Usually, I use hdfs:///user/data to indicate HDFS or file:///user/data to indicate a local file directory. -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Monday, December 01, 2014 5:06 PM To: Bui, Tri Cc: Benjamin Cuthbert; user
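The URI forms in this thread differ in where the namenode address lives. When an hdfs: URI has no authority (hdfs:///user/data), Hadoop's FileSystem falls back to the namenode configured in fs.defaultFS (fs.default.name on older Hadoop); hdfs://host:port/path names it explicitly; and with three slashes followed by localhost:8020, the host and port are parsed as the first path segment rather than as the authority. The small sketch below only shows how java.net.URI splits each form; HdfsUriForms is a hypothetical name.

```scala
import java.net.URI

object HdfsUriForms {
  /** (authority, path) as java.net.URI parses them; an empty authority comes back as None. */
  def parts(uri: String): (Option[String], String) = {
    val u = new URI(uri)
    (Option(u.getAuthority), u.getPath)
  }

  def main(args: Array[String]): Unit =
    Seq(
      "hdfs:///user/data",                // no authority: Hadoop uses the default filesystem
      "hdfs://localhost:8020/user/data",  // explicit namenode host and port
      "hdfs:///localhost:8020/user/data", // host:port ends up inside the path
      "file:///user/data"                 // local filesystem
    ).foreach(s => println(s"$s -> ${parts(s)}"))
}
```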

RE: hdfs streaming context

2014-12-01 Thread Bui, Tri
For the streaming example I am working on, it accepted ("hdfs:///user/data") without the localhost info. Let me dig through my hdfs config. -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Monday, December 01, 2014 4:50 PM To: Benjamin Cuthbert Cc: user@spark.

RE: hdfs streaming context

2014-12-01 Thread Bui, Tri
Try ("hdfs:///localhost:8020/user/data/*") With 3 "/". Thx tri -Original Message- From: Benjamin Cuthbert [mailto:cuthbert@gmail.com] Sent: Monday, December 01, 2014 4:41 PM To: user@spark.apache.org Subject: hdfs streaming context All, Is it possible to stream on HDFS director

RE: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

2014-12-01 Thread Bui, Tri
ted values, which is the lp.features. Thanks Tri From: Yanbo Liang [mailto:yanboha...@gmail.com] Sent: Thursday, November 27, 2014 12:22 AM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD Hi Tri, Maybe my latest responds

RE: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

2014-11-26 Thread Bui, Tri
nValues(testData.map(lp => (lp.label, lp.features))).print() [error] ^ [error] two errors found [error] (compile:compile) Compilation failed Thanks Tri From: Yanbo Liang [mailto:yanboha...@gmail.com] Sent: Tuesday, November 25, 2014 8:57 PM To: Bui, Tri Cc: user@spark.apache.org Subject:

RE: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

2014-11-26 Thread Bui, Tri
.zeros(args(3).toInt)).setNumIterations(args(4).toInt).setStepSize(.0001) model.trainOn(trainingData) model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print() ssc.start() ssc.awaitTermination() Thanks Tri From: Bui, Tri [mailto:tri@verizonwireless.com.INVALID] Sen

RE: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

2014-11-25 Thread Bui, Tri
().setInitialWeights(Vectors.zeros(args(3).toInt)) .setIntercept(true) But still get compilation error. Thanks Tri From: Yanbo Liang [mailto:yanboha...@gmail.com] Sent: Tuesday, November 25, 2014 4:08 AM To: Bui, Tri Cc: user@spark.apache.org Subject: Re: Inaccurate Estimate of weights model from
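If the StreamingLinearRegressionWithSGD in your Spark version has no intercept setter, that would explain the compilation error. A workaround that does not depend on the version is to add a constant 1.0 feature to every vector, so the extra weight plays the role of the intercept. The sketch below does that with MLUtils.appendBias and sizes the initial weights to match; StreamingWithBias, its methods, and the iteration and step-size values are hypothetical.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.streaming.dstream.DStream

object StreamingWithBias {
  /** Append a constant 1.0 so the last weight acts as the intercept. */
  def withBias(points: DStream[LabeledPoint]): DStream[LabeledPoint] =
    points.map(lp => LabeledPoint(lp.label, MLUtils.appendBias(lp.features)))

  /** numFeatures is the feature count before the bias column is added. */
  def newModel(numFeatures: Int): StreamingLinearRegressionWithSGD =
    new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.zeros(numFeatures + 1)) // one extra weight for the bias
      .setNumIterations(50)
      .setStepSize(0.0001)
}
```

Following the code in the thread, usage would be val model = StreamingWithBias.newModel(args(3).toInt), then model.trainOn(StreamingWithBias.withBias(trainingData)) and model.predictOnValues(StreamingWithBias.withBias(testData).map(lp => (lp.label, lp.features))).print(). Test data needs the same bias column as the training data.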

Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

2014-11-24 Thread Bui, Tri
Hi, I am getting incorrect weights from StreamingLinearRegressionWithSGD. The one-feature input data is: (1,[1]) (2,[2]) ... (20,[20]) The result from the current model: weights is [-4.432], which is not correct. Also, how do I turn on the intercept value for the StreamingLinearRegressi
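A plausible explanation, offered as a hedge: with raw feature values up to 20 and no scaling, a fixed step size that is too large makes gradient descent overshoot and diverge, while a small enough one converges to the true weight of 1.0. The sketch below is plain full-batch gradient descent on the 20 points from the post, using the squared-error gradient (prediction minus label, times the feature). It is a simplification: MLlib's SGD also samples mini-batches and shrinks the step by 1/sqrt(iteration). For this data the mean of x² is 2870 / 20 = 143.5, so each update multiplies the error (w - 1) by (1 - 143.5 · eta), which only shrinks when eta is below 2 / 143.5 ≈ 0.0139. StepSizeDemo and fit are hypothetical names.

```scala
object StepSizeDemo {
  // Data from the post: label = feature for feature = 1..20, no intercept.
  val xs: Seq[Double] = (1 to 20).map(_.toDouble)
  val ys: Seq[Double] = xs

  /** Full-batch gradient descent on w for mean((w*x - y)^2) / 2, starting at w = 0. */
  def fit(stepSize: Double, iterations: Int): Double = {
    var w = 0.0
    for (_ <- 1 to iterations) {
      // Average gradient of (w*x - y)^2 / 2 with respect to w.
      val grad = xs.zip(ys).map { case (x, y) => (w * x - y) * x }.sum / xs.size
      w -= stepSize * grad
    }
    w
  }

  def main(args: Array[String]): Unit = {
    println(s"eta = 0.001 -> w = ${fit(0.001, 200)}") // converges toward 1.0
    println(s"eta = 0.02  -> w = ${fit(0.02, 200)}")  // error grows by about 1.87x per step
  }
}
```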

RE: Mulitple Spark Context

2014-11-14 Thread Bui, Tri
Does this also apply to StreamingContext? What issues would I have if I had thousands of StreamingContexts? Thanks Tri From: Daniil Osipov [mailto:daniil.osi...@shazam.com] Sent: Friday, November 14, 2014 3:47 PM To: Charles Cc: u...@spark.incubator.apache.org Subject: Re: Mulitple Spark Context It
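As far as I understand, Spark supports only one active SparkContext per JVM and each StreamingContext wraps one, so thousands of StreamingContexts in one application are not a realistic option. The usual pattern is a single StreamingContext with many input DStreams. A sketch, assuming hypothetical directories passed as arguments, that tags each line with its source and merges the streams with StreamingContext.union; ManyStreamsOneContext and mergeSources are hypothetical names.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object ManyStreamsOneContext {
  /** Tag each line with its source directory and merge all sources into one DStream. */
  def mergeSources(ssc: StreamingContext, dirs: Seq[String]): DStream[(String, String)] =
    ssc.union(dirs.map(dir => ssc.textFileStream(dir).map(line => (dir, line))))

  def main(args: Array[String]): Unit = {
    // One context and one batch schedule shared by every source.
    val ssc = new StreamingContext(new SparkConf().setAppName("ManyStreamsOneContext"), Seconds(10))
    // Hypothetical: each command-line argument is a directory to monitor.
    mergeSources(ssc, args.toSeq).count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```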

RE: Read a HDFS file from Spark using HDFS API

2014-11-14 Thread Bui, Tri
It should be val file = sc.textFile("hdfs:///localhost:9000/sigmoid/input.txt") 3 “///” Thanks Tri From: rapelly kartheek [mailto:kartheek.m...@gmail.com] Sent: Friday, November 14, 2014 9:42 AM To: Akhil Das; user@spark.apache.org Subject: Re: Read a HDFS file from Spark using HDFS API No. I
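For reading through the HDFS API itself rather than sc.textFile, a minimal sketch with Hadoop's FileSystem class. It accepts any filesystem URI Hadoop understands, for example hdfs://localhost:9000/sigmoid/input.txt with the host and port written after two slashes (assuming a namenode at that address), or a file:/// path. Inside a Spark job, sc.hadoopConfiguration can be passed instead of a fresh Configuration. The whole file is read into driver memory, so this suits small files only; ReadViaHdfsApi and readLines are hypothetical names.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

object ReadViaHdfsApi {
  /** Read all lines of a file through Hadoop's FileSystem API. */
  def readLines(uri: String, conf: Configuration = new Configuration()): List[String] = {
    // Picks the FileSystem implementation from the URI scheme (hdfs, file, ...).
    val fs = FileSystem.get(new URI(uri), conf)
    val in = fs.open(new Path(uri))
    try Source.fromInputStream(in, "UTF-8").getLines().toList
    finally in.close()
  }

  def main(args: Array[String]): Unit =
    readLines(args(0)).take(10).foreach(println)
}
```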

streaming linear regression is not building the model

2014-11-10 Thread Bui, Tri
Hi, The model weights are not updating for streaming linear regression. The code and data below are what I am running. import org.apache.spark.mllib.linalg.Vectors import org.apache.spark.mllib.regression.LabeledPoint import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD import
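One frequent reason a textFileStream-driven model never updates is that the stream only picks up files that appear in the monitored directory after the context has started, and the Spark Streaming guide asks for them to be created by atomically moving or renaming them into that directory. Assuming a local or POSIX-style filesystem (on HDFS the equivalent would be writing elsewhere and then moving the file with hdfs dfs -mv), a sketch with java.nio.file; FeedStreamDir, publish and the directory arguments are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths, StandardCopyOption}

object FeedStreamDir {
  /** Write lines to a staging file, then atomically move it into the watched directory. */
  def publish(lines: Seq[String], stagingDir: Path, watchedDir: Path, name: String): Path = {
    val staged = stagingDir.resolve(name)
    Files.write(staged, lines.mkString("", "\n", "\n").getBytes(StandardCharsets.UTF_8))
    // ATOMIC_MOVE requires both directories to be on the same filesystem.
    Files.move(staged, watchedDir.resolve(name), StandardCopyOption.ATOMIC_MOVE)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical: args(0) = staging directory, args(1) = directory given to textFileStream.
    val training = (1 to 20).map(i => s"($i.0,[$i.0])") // LabeledPoint.parse-compatible lines
    publish(training, Paths.get(args(0)), Paths.get(args(1)), s"train-${System.currentTimeMillis}.txt")
  }
}
```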