MatrixFactorizationModel serialization

2014-11-07 Thread Dariusz Kobylarz
I am trying to persist a MatrixFactorizationModel (from the Collaborative Filtering example) and use it in another script to evaluate and apply it. This is the exception I get when I try to use a deserialized model instance: Exception in thread "main" java.lang.NullPointerException at org.apache.spark.rdd

MLlib - Naive Bayes Java example bug

2014-11-03 Thread Dariusz Kobylarz
Hi, I noticed a bug in the sample Java code on the MLlib Naive Bayes docs page: http://spark.apache.org/docs/1.1.0/mllib-naive-bayes.html In the filter: |double accuracy = 1.0 * predictionAndLabel.filter(new Function<Tuple2<Double, Double>, Boolean>() { @Override public Boolean call(Tuple2<Double, Double> pl) { r
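
The preview above is cut off before the body of the filter, so the exact bug being reported is not visible here. One pitfall that filters of this shape can run into, assuming the tuple holds boxed `Double` values, is comparing them with `==`, which compares object references rather than numeric values. The following is a minimal plain-Java sketch of a value-based comparison (no Spark involved; the class name, the helper names, and the use of two-element arrays in place of `Tuple2` are all hypothetical stand-ins for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class BoxedLabelComparison {
    // Hypothetical helper: true when a boxed prediction and label hold the same value.
    // equals() compares by value; == on boxed Doubles would compare references.
    static boolean matches(Double prediction, Double label) {
        return prediction.equals(label);
    }

    // Fraction of (prediction, label) pairs that match.
    // Each pair is a two-element array standing in for a Tuple2.
    static double accuracy(List<Double[]> predictionAndLabel) {
        long correct = predictionAndLabel.stream()
                .filter(pl -> matches(pl[0], pl[1]))
                .count();
        return 1.0 * correct / predictionAndLabel.size();
    }

    public static void main(String[] args) {
        List<Double[]> pairs = Arrays.asList(
                new Double[] {1.0, 1.0},
                new Double[] {0.0, 1.0},
                new Double[] {2.0, 2.0},
                new Double[] {0.0, 0.0});
        // Three of the four pairs match.
        System.out.println(accuracy(pairs));
    }
}
```

In a Spark filter the same idea would apply to the two tuple fields, but whether that is the fix the thread proposes cannot be confirmed from the truncated preview.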

saveAsHadoopFile into avro format

2014-09-08 Thread Dariusz Kobylarz
What is the right way to save a PairRDD in Avro output format? GraphArray extends SpecificRecord etc. I have the following Java RDD: JavaPairRDD pairRDD = ... and want to save it in Avro format: org.apache.hadoop.mapred.JobConf jc = new org.apache.hadoop.mapred.JobConf(); org.apache.avro.m

Spark-sql with Tachyon cache

2014-08-01 Thread Dariusz Kobylarz
Hi, I would like to ask whether Spark SQL tables cached in Tachyon are a feature that will be migrated from Shark. I imagine that from the user's perspective it would look like this: |CREATE TABLE data TBLPROPERTIES("sparksql.cache" = "tachyon") AS SELECT a, b, c from data_on_disk WHERE month="May";|