Re: Run ScalaTest inside Intellij IDEA

2014-06-11 Thread Yijie Shen
I got a clean version of the master branch and did the following steps: export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" mvn -U -Dhadoop.version=2.2.0 -DskipTests clean package After these steps, I opened the project in IDEA through pom.xml in the root folder, but wh

Re: Compression with DISK_ONLY persistence

2014-06-11 Thread Matei Zaharia
Yes, actually even if you don’t set it to true, on-disk data is compressed. (This setting only affects serialized data in memory). Matei On Jun 11, 2014, at 2:56 PM, Surendranauth Hiraman wrote: > Hi, > > Will spark.rdd.compress=true enable compression when using DISK_ONLY > persistence? >

Re: Constraint Solver for Spark

2014-06-11 Thread Xiangrui Meng
Your idea is close to what implicit feedback does. You can check the paper, which is short and concise. In the ALS setting, all subproblems are independent in each iteration. This is part of the reason why ALS is scalable. If you have some global constraints that make the subproblems no longer decou

Compression with DISK_ONLY persistence

2014-06-11 Thread Surendranauth Hiraman
Hi, Will spark.rdd.compress=true enable compression when using DISK_ONLY persistence? SUREN HIRAMAN, VP TECHNOLOGY Velos Accelerating Machine Learning 440 NINTH AVENUE, 11TH FLOOR NEW YORK, NY 10001 O: (917) 525-2466 ext. 105 F: 646.349.4063 E: suren.hiraman@velos.io W: www.velos.io

Re: Error During ReceivingConnection

2014-06-11 Thread Surendranauth Hiraman
It looks like this was due to another executor on a different node closing the connection on its side. I found the entries below in the remote side's logs. Can anyone comment on why one ConnectionManager would close its connection to another node and what could be tuned to avoid this? It did not h

MLLib : Decision Tree not getting built for 5 or more levels(maxDepth=5) and the one built for 3 levels is performing poorly

2014-06-11 Thread SURAJ SHETH
Hi, I have been trying to build a Decision Tree using a dataset that I have. Dataset Description: Train data size = 689,763 Test data size = 8,387,813 Each row in the dataset has 321 numerical features, of which the 139th value is the ground truth. The number of positives in the dataset is low.

Re: Run ScalaTest inside Intellij IDEA

2014-06-11 Thread Qiuzhuang Lian
I ran into this issue too today via the 'mvn install -DskipTests' command; then I issued a mvn clean and rebuilt, and it worked. Thanks, Qiuzhuang On Wed, Jun 11, 2014 at 9:51 PM, Yijie Shen wrote: > Thx Qiuzhuang, the problems disappeared after I add assembly jar at the > head of list dependen

Re: Run ScalaTest inside Intellij IDEA

2014-06-11 Thread Yijie Shen
Thx Qiuzhuang, the problems disappeared after I added the assembly jar at the head of the dependency list in *.iml, but while running tests in Spark SQL (SQLQuerySuite in sql-core), two other errors occur: Error 1: Error:scalac: while compiling: /Users/yijie/code/apache.spark.master/sql/core/src

Re: Constraint Solver for Spark

2014-06-11 Thread Debasish Das
I got it... the ALS formulation is solving the matrix completion problem. To convert the problem to matrix factorization or take user feedback (missing entries mean the user hates the site?), we should put 0 in the missing entries (or maybe -1)... in that case we have to use computeYtY and accumula

Re: Constraint Solver for Spark

2014-06-11 Thread Xiangrui Meng
For explicit feedback, ALS uses only observed ratings for computation. So XtXs are not the same. -Xiangrui On Tue, Jun 10, 2014 at 8:58 PM, Debasish Das wrote: > Sorry last one went out by mistake: > > Is not for users (0 to numUsers), fullXtX is same ? In the ALS formulation > this is W^TW or H^
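
Editor's illustration of the point above: a minimal pure-Python sketch, not MLlib's implementation. The factor matrix `X`, the `observed` map, and the helper `gram` are hypothetical names and toy values chosen only for this example, and regularization is left out. It shows that when only observed ratings enter the computation, each user's XtX is built from a different subset of factor rows, so the per-user matrices generally differ from each other and from the full XtX.

```python
# Hypothetical toy data: rank-2 factor vectors for 4 items (rows of X).
X = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, -1.0],
]

# Hypothetical observed ratings: user -> indices of the items that user rated.
observed = {"u1": [0, 1], "u2": [2, 3]}


def gram(rows):
    """Return the sum of outer products x x^T over the given factor rows."""
    k = len(rows[0])
    g = [[0.0] * k for _ in range(k)]
    for x in rows:
        for i in range(k):
            for j in range(k):
                g[i][j] += x[i] * x[j]
    return g


# Full XtX over every item: a single matrix that does not depend on the user.
full_xtx = gram(X)

# Explicit feedback: each user's XtX uses only the rows for items they rated.
per_user_xtx = {u: gram([X[i] for i in items]) for u, items in observed.items()}

for user, g in per_user_xtx.items():
    print(user, g)
print("full", full_xtx)
```

With these toy values the two users rate disjoint item sets, so their matrices differ; in a real explicit-feedback run each subproblem would also add its own regularization term before solving.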