Re: K-Means And Class Tags

2015-01-08 Thread devl.development
Thanks for the suggestion. Can anyone offer any advice on the
ClassCastException when going from Java to Scala? Why does calling
JavaRDD.rdd() and then collect() result in this exception?
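
The following is a minimal sketch, not taken from the thread, of one mechanism that could produce this kind of exception. It assumes the RDD behind the JavaRDD carries a placeholder ClassTag for Object rather than the real element type, so a collect()-style method builds an Object[] that cannot be cast to a more specific array type on the Scala side. The names FakeClassTagSketch and collectLike are hypothetical stand-ins and are not Spark APIs.

```scala
import scala.reflect.ClassTag
import scala.util.Try

object FakeClassTagSketch {
  // Hypothetical stand-in for a collect()-like method: the runtime type of
  // the returned array is decided by the ClassTag, not by the static type T.
  def collectLike[T](items: Seq[T])(implicit ct: ClassTag[T]): Array[T] = {
    val out = ct.newArray(items.length)
    items.zipWithIndex.foreach { case (item, i) => out(i) = item }
    out
  }

  def main(args: Array[String]): Unit = {
    // With the real ClassTag[String] the result is a String[] at runtime.
    val ok: Array[String] = collectLike(Seq("a", "b"))
    println("real tag, length = " + ok.length)

    // Assumption: a placeholder tag for AnyRef, forced to the type ClassTag[String].
    val fake = ClassTag.AnyRef.asInstanceOf[ClassTag[String]]

    // The array is an Object[] at runtime, so treating it as Array[String] fails.
    val attempt = Try {
      val bad: Array[String] = collectLike(Seq("a", "b"))(fake)
      bad.length
    }
    println("fake tag, result = " + attempt)
  }
}
```

If this is indeed the cause, supplying the real ClassTag on the Scala side, or keeping the collected result typed as an array of Object, would avoid the cast.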



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-tp10038p10047.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



LinearRegressionWithSGD accuracy

2015-01-15 Thread devl.development
From what I gather, you use LinearRegressionWithSGD to predict y or the
response variable given a feature vector x.

In a simple example I used a perfectly linear dataset such that x = y:
y,x
1,1
2,2
...

1,1

Using the out-of-box example from the website (with and without scaling):

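import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

val data = sc.textFile(file)

val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble)) // y and x
}

// Standardize the features to zero mean and unit variance
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(parsedData.map(x => x.features))
val scaledData = parsedData
  .map(x =>
    LabeledPoint(x.label,
      scaler.transform(Vectors.dense(x.features.toArray))))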

// Building the model
val numIterations = 100
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

// Evaluate model on training examples and compute training error
// (tried using both scaledData and parsedData)
val valuesAndPreds = scaledData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
println("training Mean Squared Error = " + MSE)

Both scaled and unscaled attempts give:

training Mean Squared Error = NaN

I've even tried pairs of x and y + noise (sampled from a normal distribution
with mean 0 and stddev 1), and it still comes up with the same thing.

Is this not supposed to work for a single feature x and a response y, i.e.
two-dimensional data? Is there something I'm missing or wrong in the code
above? Or is there a limitation in the method?

Thanks for any advice.
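
One possible explanation, offered as an assumption and not confirmed in this thread: gradient descent diverges when the step size is too large for the scale of the features, and the weight overflows to NaN. The sketch below is plain Scala with no Spark dependency. It uses a constant step and full-batch updates, which is a simplification of what MLlib does, and the names StepSizeSketch, fit and mse are hypothetical. It fits y = w * x on a perfectly linear dataset with x running from 1 to 100.

```scala
object StepSizeSketch {
  // Full-batch gradient descent for the one-parameter model y = w * x,
  // minimizing (1 / 2n) * sum((w * x - y)^2).
  def fit(xs: Array[Double], ys: Array[Double],
          stepSize: Double, numIterations: Int): Double = {
    var w = 0.0
    for (_ <- 1 to numIterations) {
      // Gradient of the loss with respect to w
      val grad = xs.zip(ys).map { case (x, y) => (w * x - y) * x }.sum / xs.length
      w -= stepSize * grad
    }
    w
  }

  // Mean squared error of the fitted model on the given data
  def mse(xs: Array[Double], ys: Array[Double], w: Double): Double =
    xs.zip(ys).map { case (x, y) => math.pow(y - w * x, 2) }.sum / xs.length

  def main(args: Array[String]): Unit = {
    val xs = (1 to 100).map(_.toDouble).toArray
    val ys = xs // perfectly linear data, y = x

    // A step of 1.0 is far too large here because mean(x^2) is in the thousands.
    val wLarge = fit(xs, ys, stepSize = 1.0, numIterations = 100)
    // A much smaller step keeps every update a contraction.
    val wSmall = fit(xs, ys, stepSize = 1e-4, numIterations = 100)

    println("step 1.0:  w = " + wLarge + ", MSE = " + mse(xs, ys, wLarge))
    println("step 1e-4: w = " + wSmall + ", MSE = " + mse(xs, ys, wSmall))
  }
}
```

With these inputs the large step overflows to NaN while the small step lands very close to w = 1. If the same thing is happening in the MLlib run, passing a smaller step size through the train(input, numIterations, stepSize) overload, or training on scaledData rather than parsedData, would be worth trying.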



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html



Introduction to Spark Blog

2014-10-09 Thread devl.development
Hi Spark community

Having spent some time getting up to speed with the various Spark components
in the core package, I've written a blog post to help other newcomers and
contributors.

By no means am I a Spark expert, so I would be grateful for any advice,
comments or edit suggestions.

Thanks very much. Here's the post:

http://batchinsights.wordpress.com/2014/10/09/a-short-dive-into-apache-spark/

Dev





--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Introduction-to-Spark-Blog-tp8718.html