Re: implementing the VectorAccumulatorParam

2014-06-09 Thread dataginjaninja
New error :-(

scala> object VectorAccumulatorParam extends AccumulatorParam[Vector] {
     |   def zero(initialValue: Vector): Vector = {
     |     Vector.zeros(initialValue.size)
     |   }
     |   def addInPlace(v1: Vector, v2: Vector): Vector = {
     |     v1 += v2
     |   }
     | }
:14: e

Re: implementing the VectorAccumulatorParam

2014-06-09 Thread dataginjaninja
New error :-(

scala> object VectorAccumulatorParam extends AccumulatorParam[Vector] {
     |   def zero(initialValue: Vector): Vector = {
     |     Vector.zeros(initialValue.size)
     |   }
     |   def addInPlace(v1: Vector, v2: Vector): Vector = {
     |     v1 += v2
     |   }
     | }
:12: e

Re: implementing the VectorAccumulatorParam

2014-06-09 Thread dataginjaninja
You are right. I was using the wrong vector class. Thanks. - Cheers, Stephanie -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/implementing-the-VectorAccumulatorParam-tp6973p6975.html Sent from the Apache Spark Developers List mailing list archive
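
[Editor's note: for readers hitting the same "wrong vector class" issue — the REPL errors above are what you get when `Vector` resolves to `scala.collection.immutable.Vector`, which has neither a `zeros` factory nor an in-place `+=`. One class that fits the shape of the guide's example is Spark's own `org.apache.spark.util.Vector` (since deprecated). A minimal sketch, assuming that class; the usage line and accumulator size are illustrative, not from the thread:]

```scala
import org.apache.spark.AccumulatorParam
import org.apache.spark.util.Vector  // NOT scala.collection.immutable.Vector

object VectorAccumulatorParam extends AccumulatorParam[Vector] {
  // util.Vector's companion provides zeros(n); length is the element count
  def zero(initialValue: Vector): Vector = Vector.zeros(initialValue.length)
  // util.Vector's += mutates v1 in place and returns it
  def addInPlace(v1: Vector, v2: Vector): Vector = v1 += v2
}

// Hypothetical usage on a SparkContext sc:
// val vecAccum = sc.accumulator(Vector.zeros(10))(VectorAccumulatorParam)
```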

implementing the VectorAccumulatorParam

2014-06-09 Thread dataginjaninja
The programming-guide has the following:

object VectorAccumulatorParam extends AccumulatorParam[Vector] {
  def zero(initialValue: Vector): Vector = {
    Vector.zeros(initialValue.size)
  }
  def addInPlace(v1

implementing the VectorAccumulatorParam

2014-06-09 Thread dataginjaninja
The programming-guide has the following: However, when I try to use this I get an error: Last thing, am I posting on the wrong list? - Cheers, Stephanie

Re: Timestamp support in v1.0

2014-06-05 Thread dataginjaninja
I can confirm that the patch fixed my issue. :-) - Cheers, Stephanie

Re: Timestamp support in v1.0

2014-05-29 Thread dataginjaninja
Darn, I was hoping just to sneak it into that file. I am not the only person working on the cluster; if I rebuild it, that means I have to redeploy everything to all the nodes as well. So I cannot do that ... today. If someone else doesn't beat me to it, I can rebuild at another time. - Cheer

Re: Timestamp support in v1.0

2014-05-29 Thread dataginjaninja
Michael, Will I have to rebuild after adding the change? Thanks

Re: Timestamp support in v1.0

2014-05-29 Thread dataginjaninja
Yes, I get the same error:

scala> val hc = new org.apache.spark.sql.hive.HiveContext(sc)
14/05/29 16:53:40 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/05/29 16:53:40 INFO deprecation: mapred.max.split.size is depre
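
[Editor's note: a hedged sketch of the kind of query this thread is about — reading a Hive TIMESTAMP column through HiveContext (Spark 1.0's `hql` method; the table and column names are hypothetical):]

```scala
val hc = new org.apache.spark.sql.hive.HiveContext(sc)

// Assumes a pre-existing Hive table "events" with a TIMESTAMP column
// "event_time"; this is the shape of query that failed before the patch.
// Rows of a TIMESTAMP column come back as java.sql.Timestamp values.
val rows = hc.hql("SELECT event_time FROM events LIMIT 5").collect()
```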

Timestamp support in v1.0

2014-05-29 Thread dataginjaninja
Can anyone verify which rc includes [SPARK-1360] Add Timestamp Support for SQL (#275)? I am running rc3, but receiving errors with TIMESTAMP as a datatype in my Hive tables when trying to use them in pyspark. *The error I get: * 14/05/29 15:44:47 I

Re: Standard preprocessing/scaling

2014-05-29 Thread dataginjaninja
I do see the issue for centering sparse data. Actually, the centering is less important than the scaling by the standard deviation. Not having unit variance causes the convergence issues and long runtimes. Will RowMatrix compute the variance of a column?
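
[Editor's note: to the question above — in 1.x MLlib, `RowMatrix.computeColumnSummaryStatistics()` returns per-column statistics, including variance. A minimal sketch (the sample data is illustrative):]

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Three rows, two columns of toy data
val rows = sc.parallelize(Seq(
  Vectors.dense(1.0, 10.0),
  Vectors.dense(2.0, 20.0),
  Vectors.dense(3.0, 30.0)))

val mat = new RowMatrix(rows)
val summary = mat.computeColumnSummaryStatistics()
summary.mean      // per-column means
summary.variance  // per-column variances
```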

Standard preprocessing/scaling

2014-05-28 Thread dataginjaninja
I searched on this but didn't find anything general, so I apologize if this has been addressed. Many algorithms (SGD, SVM, ...) will either not converge or run forever if the data is not scaled. Scikit-learn has preprocessing
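
[Editor's note: at the time of this thread MLlib had no built-in scaler (later versions added `mllib.feature.StandardScaler`). A hedged sketch of hand-rolled z-score scaling over an `RDD[Vector]`, using column statistics from `RowMatrix`; the helper name and zero-variance handling are the editor's choices, not from the thread:]

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD

// Scale each column to zero mean and unit variance (z-score).
// Columns with zero variance are mapped to 0.0 to avoid division by zero.
def standardize(data: RDD[Vector]): RDD[Vector] = {
  val stats  = new RowMatrix(data).computeColumnSummaryStatistics()
  val means  = stats.mean.toArray
  val stdevs = stats.variance.toArray.map(math.sqrt)
  data.map { v =>
    val scaled = v.toArray.zipWithIndex.map { case (x, i) =>
      if (stdevs(i) == 0.0) 0.0 else (x - means(i)) / stdevs(i)
    }
    Vectors.dense(scaled)
  }
}
```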