Re:Re: Serialization issue when using HBase with Spark

2014-12-23 Thread yangliuyu
…n Sean Owen's post: http://blog.cloudera.com/blog/2014/09/how-to-translate-from-mapreduce-to-apache-spark/

Best Regards,
Shixiong Zhu

2014-12-14 16:35 GMT+08:00 Yanbo: In #1, class HTable is not serializable. You also need to check your self-defined function getUserActions an…
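A minimal plain-Scala sketch of why #1 fails, assuming Spark's default Java serialization of task closures. `FakeHTable` is a hypothetical stand-in for HTable (it does not implement `java.io.Serializable`), and `ClosureCheck.isSerializable` only approximates the check Spark performs before shipping a task. A closure that captures a driver-side table cannot be serialized. A closure that captures only the table name and builds the table inside the function can.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for HTable: it does NOT extend java.io.Serializable.
class FakeHTable(val name: String) {
  def get(rowKey: String): String = s"$name:$rowKey"
}

object ClosureCheck {
  // Approximates Spark's closure check: Java-serialize the function object.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(obj) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  // Anti-pattern: the table is created on the driver and captured by the closure,
  // so serializing the closure also tries to serialize the table.
  def capturingClosure(table: FakeHTable): String => String =
    (key: String) => table.get(key)

  // Fix: the closure captures only a String; the table is built inside the
  // function, i.e. on the executor, once per partition.
  def perPartitionClosure(tableName: String): Iterator[String] => Iterator[String] =
    (keys: Iterator[String]) => {
      val table = new FakeHTable(tableName)
      keys.map(table.get)
    }
}
```

In Spark, the second closure shape is the one you would pass to `mapPartitions`, so each executor opens its own table instead of receiving one from the driver.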

Serialization issue when using HBase with Spark

2014-12-12 Thread yangliuyu
The scenario is using an HTable instance to scan multiple rowkey ranges in Spark tasks, which looks like below:
Option 1:
val users = input
  .map { case (deviceId, uid) => uid }.distinct().sortBy(x => x).mapPartitions(iterator => {
    val conf = HBaseConfiguration.create()
    val table = new HTable(conf…
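Opening the table once per partition, as Option 1 does, is the usual pattern. A related pitfall worth checking is when the table gets closed. The sketch below is plain Scala under stated assumptions: `ClosableTable` is a hypothetical stand-in for HTable, rowkey ranges are modeled as `(Int, Int)` pairs, and `scanPartition` plays the role of the function passed to `mapPartitions`. Because `Iterator.flatMap` is lazy, closing the table in a `finally` block before the caller consumes the iterator makes every scan run against a closed table.

```scala
// Hypothetical stand-in for HTable: scanning after close() fails.
final class ClosableTable(name: String) {
  private var closed = false
  def scan(startRow: Int, stopRow: Int): Seq[String] = {
    require(!closed, s"$name is already closed")
    (startRow until stopRow).map(r => s"$name:row$r")
  }
  def close(): Unit = closed = true
}

object PartitionScan {
  // One table per partition; results are materialized (toList) BEFORE close().
  def scanPartition(ranges: Iterator[(Int, Int)]): Iterator[String] = {
    val table = new ClosableTable("users")
    try ranges.flatMap { case (start, stop) => table.scan(start, stop) }.toList.iterator
    finally table.close()
  }

  // Buggy variant: flatMap is lazy, so scan() only runs when the caller consumes
  // the iterator, after the finally block has already closed the table.
  def scanPartitionLazy(ranges: Iterator[(Int, Int)]): Iterator[String] = {
    val table = new ClosableTable("users")
    try ranges.flatMap { case (start, stop) => table.scan(start, stop) }
    finally table.close()
  }
}
```

Materializing with `toList` trades memory for simplicity; for large partitions, closing the table only once the returned iterator is exhausted avoids holding every row at once.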

Re: Native library can not be loaded when using Mllib PCA

2014-06-12 Thread yangliuyu
Finally, we solved this problem by building our own netlib-java native .so files on CentOS. It works without any warning, but the performance is far behind running on a MacBook Pro. The matrix size is 6778 rows by 2487 columns. The MBP took 10 seconds to get the PCA result, but CentOS took 110 s, event…

Re: Native library can not be loaded when using Mllib PCA

2014-06-06 Thread yangliuyu
Thanks Xiangrui, I switched to an Ubuntu 14.04 server and it works after installing liblapack3gf and libopenblas-base. So it is an environment problem which is not related to MLlib.

Native library can not be loaded when using Mllib PCA

2014-06-05 Thread yangliuyu
Hi, We're using MLlib (1.0.0 release version) on a k-means clustering problem. We want to reduce the matrix column size before sending the points to the k-means solver. It works on my Mac in local mode: spark-test-run-assembly-1.0.jar contains my application code, com.github.fommil, netlib code an…
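The column reduction described here is what PCA does: center the data, find the directions of largest variance, and project each point onto the top `k` of them. In MLlib this corresponds to `RowMatrix.computePrincipalComponents(k)` followed by `RowMatrix.multiply`. The plain-Scala sketch below avoids Spark and netlib entirely so it runs anywhere. It is a hypothetical illustration (`TinyPca` is not an MLlib name) that keeps only the single top component and finds it with power iteration, a swapped-in technique rather than the LAPACK-backed decomposition MLlib relies on.

```scala
// Plain-Scala sketch of PCA-based column reduction (top principal component only).
object TinyPca {
  type Vec = Array[Double]

  // Subtract the per-column mean so the covariance is computed on centered data.
  def center(rows: Seq[Vec]): Seq[Vec] = {
    val d = rows.head.length
    val mean = Array.tabulate(d)(j => rows.map(_(j)).sum / rows.size)
    rows.map(r => Array.tabulate(d)(j => r(j) - mean(j)))
  }

  // Sample covariance matrix (d x d) of already-centered rows.
  def covariance(centered: Seq[Vec]): Array[Vec] = {
    val d = centered.head.length
    val n = centered.size
    Array.tabulate(d, d)((i, j) => centered.map(r => r(i) * r(j)).sum / (n - 1))
  }

  // Power iteration: converges to the dominant eigenvector of a symmetric PSD matrix,
  // assuming the start vector is not orthogonal to it and the matrix is non-zero.
  def topComponent(cov: Array[Vec], iterations: Int = 200): Vec = {
    val d = cov.length
    var v: Vec = Array.fill(d)(1.0 / math.sqrt(d.toDouble))
    for (_ <- 1 to iterations) {
      val w = Array.tabulate(d)(i => (0 until d).map(j => cov(i)(j) * v(j)).sum)
      val norm = math.sqrt(w.map(x => x * x).sum)
      v = w.map(_ / norm)
    }
    v
  }

  // Project each centered row onto the component: d columns become 1.
  def project(centered: Seq[Vec], component: Vec): Seq[Double] =
    centered.map(r => r.indices.map(j => r(j) * component(j)).sum)
}
```

Keeping `k > 1` components would need deflation or a proper eigensolver; the projected rows are what would then be handed to the k-means solver.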