The article below gives a good idea.
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
Play around with two configurations (a large number of executors with a small
number of cores each, and a small number of executors with many cores each).
The calculated values have to be conservative or it will make the spa
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
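
A minimal sketch of the two configurations being compared, assuming YARN-style settings (all values are hypothetical starting points, not recommendations):

import org.apache.spark.SparkConf

// Configuration A: many executors, few cores each (hypothetical values)
val manySmallExecutors = new SparkConf()
  .set("spark.executor.instances", "50")
  .set("spark.executor.cores", "2")
  .set("spark.executor.memory", "4g")

// Configuration B: few executors, many cores each (hypothetical values)
val fewLargeExecutors = new SparkConf()
  .set("spark.executor.instances", "10")
  .set("spark.executor.cores", "10")
  .set("spark.executor.memory", "20g")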
Thanks for the info.
How do I use StandardScaler() to scale the example data (10246.0,[14111.0,1.0])?
Thx
tri
-Original Message-
From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com]
Sent: Friday, December 12, 2014 1:26 PM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Do I need to
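
A hedged sketch of one way to scale LabeledPoint data like the example above with StandardScaler (Spark 1.x MLlib; assumes data: RDD[LabeledPoint] has already been parsed):

import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.LabeledPoint

// Fit column means/variances on the feature vectors only.
// Note: withMean = true requires dense feature vectors.
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(data.map(_.features))

// Scale the features; labels are left untouched.
val scaledData = data.map(lp => LabeledPoint(lp.label, scaler.transform(lp.features)))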
linalg.Vectors
Thanks
Tri
-Original Message-
From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com]
Sent: Friday, December 12, 2014 12:16 PM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Do I need to applied feature scaling via StandardScaler for LBFGS
for Linear Regression?
You need to do the
Hi,
Trying to use LBFGS as the optimizer, do I need to implement feature scaling
via StandardScaler or does LBFGS do it by default?
The following code generated the error "Failure again! Giving up and returning.
Maybe the objective is just poorly behaved?".
val data = sc.textFile("file:///data/Tra
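
The code above is cut off. Not the list's reply, but a sketch of driving LBFGS directly for least squares in Spark 1.x MLlib, with features standardized first (all names and values are illustrative):

import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LeastSquaresGradient, SimpleUpdater}
import org.apache.spark.mllib.regression.LabeledPoint

// Assume training: RDD[LabeledPoint] has already been parsed.
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(training.map(_.features))

// LBFGS.runLBFGS expects (label, features) pairs.
val scaled = training.map(lp => (lp.label, scaler.transform(lp.features))).cache()

val numFeatures = scaled.first()._2.size
val (weights, lossHistory) = LBFGS.runLBFGS(
  scaled,
  new LeastSquaresGradient(),  // squared-error loss, i.e. linear regression
  new SimpleUpdater(),         // no regularization
  10,                          // numCorrections
  1e-4,                        // convergenceTol
  100,                         // maxNumIterations
  0.0,                         // regParam
  Vectors.zeros(numFeatures))

Note there is no step size to tune here, which matches the advice further down about preferring BFGS-based solvers.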
Thanks! Will try it out.
From: Debasish Das [mailto:debasish.da...@gmail.com]
Sent: Monday, December 08, 2014 5:13 PM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Learning rate or stepsize automation
Hi Bui,
Please use BFGS-based solvers... For BFGS you don't have to specify step size.
Hi,
Is there any way to automatically calculate the optimal learning rate or step
size via MLlib for SGD?
Thx
tri
Hi,
The following example code builds the correct model.weights, but its
prediction values are zero. Am I calling predictOnValues incorrectly? I
also coded a batch version based on LinearRegressionWithSGD() with the same
train and test data, iteration, and step size settings, and it was
Yep. No localhost.
Usually, I use hdfs:///user/data to indicate I want HDFS, or file:///user/data
to indicate a local file directory.
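
For illustration (paths hypothetical):

val fromHdfs  = sc.textFile("hdfs:///user/data")   // default HDFS filesystem
val fromLocal = sc.textFile("file:///user/data")   // local filesystem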
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Monday, December 01, 2014 5:06 PM
To: Bui, Tri
Cc: Benjamin Cuthbert; user
For the streaming example I am working on, it accepted ("hdfs:///user/data")
without the localhost info.
Let me dig through my HDFS config.
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Monday, December 01, 2014 4:50 PM
To: Benjamin Cuthbert
Cc: user@spark.
Try
("hdfs:///localhost:8020/user/data/*")
With 3 "/".
Thx
tri
-Original Message-
From: Benjamin Cuthbert [mailto:cuthbert@gmail.com]
Sent: Monday, December 01, 2014 4:41 PM
To: user@spark.apache.org
Subject: hdfs streaming context
All,
Is it possible to stream on an HDFS directory?
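
A minimal sketch of streaming from an HDFS directory, assuming Spark Streaming 1.x and a hypothetical path:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("HdfsStream")
val ssc = new StreamingContext(conf, Seconds(10))

// textFileStream picks up files newly created in the directory
val lines = ssc.textFileStream("hdfs:///user/data")
lines.print()

ssc.start()
ssc.awaitTermination()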
ted values, which is the lp.features.
Thanks
Tri
From: Yanbo Liang [mailto:yanboha...@gmail.com]
Sent: Thursday, November 27, 2014 12:22 AM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Inaccurate Estimate of weights model from
StreamingLinearRegressionWithSGD
Hi Tri,
Maybe my latest response
nValues(testData.map(lp => (lp.label,
lp.features))).print()
[error] ^
[error] two errors found
[error] (compile:compile) Compilation failed
Thanks
Tri
From: Yanbo Liang [mailto:yanboha...@gmail.com]
Sent: Tuesday, November 25, 2014 8:57 PM
To: Bui, Tri
Cc: user@spark.apache.org
Subject:
.zeros(args(3).toInt)).setNumIterations(args(4).toInt).setStepSize(.0001)
model.trainOn(trainingData)
model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()
ssc.start()
ssc.awaitTermination()
Thanks
Tri
From: Bui, Tri [mailto:tri@verizonwireless.com.INVALID]
Sen
().setInitialWeights(Vectors.zeros(args(3).toInt))
.setIntercept(true)
But I still get a compilation error.
Thanks
Tri
From: Yanbo Liang [mailto:yanboha...@gmail.com]
Sent: Tuesday, November 25, 2014 4:08 AM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Inaccurate Estimate of weights model from
Hi,
I am getting an incorrect weights model from StreamingLinearRegressionWithSGD.
The one-feature input data is:
(1,[1])
(2,[2])
...
(20,[20])
The result from the current model: weights is [-4.432], which is not correct.
Also, how do I turn on the intercept value for the StreamingLinearRegressi
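
One common cause of diverging weights like [-4.432] on data of this scale is an SGD step size that is too large; a hedged sketch of dialing it down (values illustrative):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD

val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(1)) // one feature, as in the sample data
  .setStepSize(0.001)                  // well below the 0.1 default
  .setNumIterations(50)

Scaling the features (see the StandardScaler discussion above) also helps keep SGD stable. As the compilation errors above suggest, setIntercept does not appear to be exposed on the streaming wrapper in this Spark version; appending a constant 1.0 feature is one workaround.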
Does this also apply to StreamingContext?
What issues would I have if I have 1000s of StreamingContexts?
Thanks
Tri
From: Daniil Osipov [mailto:daniil.osi...@shazam.com]
Sent: Friday, November 14, 2014 3:47 PM
To: Charles
Cc: u...@spark.incubator.apache.org
Subject: Re: Multiple Spark Context
It
It should be
val file = sc.textFile("hdfs:///localhost:9000/sigmoid/input.txt")
with 3 "/".
Thanks
Tri
From: rapelly kartheek [mailto:kartheek.m...@gmail.com]
Sent: Friday, November 14, 2014 9:42 AM
To: Akhil Das; user@spark.apache.org
Subject: Re: Read a HDFS file from Spark using HDFS API
No. I
Hi,
The model weights are not updating for streaming linear regression. The code
and data below are what I am running.
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
import
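
The import list above is cut off. For reference, a self-contained sketch of the kind of job described in these threads (paths, batch interval, and dimensions are hypothetical):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, StreamingLinearRegressionWithSGD}

object StreamingLR {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingLR")
    val ssc = new StreamingContext(conf, Seconds(10))

    // LabeledPoint.parse understands the "(label,[f1,f2,...])" text format
    val trainingData = ssc.textFileStream("hdfs:///user/train")
      .map(LabeledPoint.parse).cache()
    val testData = ssc.textFileStream("hdfs:///user/test")
      .map(LabeledPoint.parse)

    val model = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.zeros(1)) // one feature, as in the sample data
      .setStepSize(0.001)
      .setNumIterations(50)

    model.trainOn(trainingData)
    model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

    ssc.start()
    ssc.awaitTermination()
  }
}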