Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-37734634
It is hard to say what threshold to use. I couldn't think of a use case
that requires a large window size, but I cannot say there is none.
Another pos
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-37732906
@pwendell, the limit case is not a practical example. In that case, we
need to re-partition for most operations to be efficient. Also, this is really for
small window sizes
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/136#discussion_r10635646
--- Diff: core/src/main/scala/org/apache/spark/rdd/SlidedRDD.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/136#discussion_r10635644
--- Diff: core/src/main/scala/org/apache/spark/rdd/SlidedRDD.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/136#discussion_r10635557
--- Diff: core/src/main/scala/org/apache/spark/rdd/SlidedRDD.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/135#issuecomment-37587059
LGTM. Waiting for Jenkins.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/135#discussion_r10583451
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -457,6 +457,10 @@ class RDDSuite extends FunSuite with SharedSparkContext
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/135#discussion_r10580942
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -457,6 +457,10 @@ class RDDSuite extends FunSuite with SharedSparkContext
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/135#discussion_r10580867
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -310,6 +310,8 @@ abstract class RDD[T: ClassTag](
* Return a sampled subset of this
GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/136
[SPARK-1241] Add sliding to RDD
Sliding is useful for operations like creating n-grams, calculating total
variation, numerical integration, etc. This is similar to
https://github.com/apache
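To make the use cases named in the PR description concrete, here is a minimal sketch using the `sliding` method of plain Scala collections. It is not the RDD method this PR adds, whose signature is not shown in this snippet; the object and function names (`SlidingSketch`, `ngrams`, `totalVariation`, `trapezoid`) are hypothetical and exist only for illustration.

```scala
object SlidingSketch {
  // n-grams: every window of exactly n consecutive tokens.
  // The filter drops the single short window that sliding yields
  // when the input has fewer than n elements.
  def ngrams(tokens: Seq[String], n: Int): Seq[Seq[String]] =
    tokens.sliding(n).filter(_.size == n).toSeq

  // Total variation: sum of absolute differences between neighbours.
  def totalVariation(xs: Seq[Double]): Double =
    xs.sliding(2).collect { case Seq(a, b) => math.abs(b - a) }.sum

  // Numerical integration with the trapezoidal rule over (x, y) samples
  // that are assumed to be sorted by x.
  def trapezoid(points: Seq[(Double, Double)]): Double =
    points.sliding(2).collect {
      case Seq((x0, y0), (x1, y1)) => (x1 - x0) * (y0 + y1) / 2.0
    }.sum
}
```

On a distributed collection the same windows would also have to span partition boundaries, which is presumably where the window-size discussion in the comments above comes from.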
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/135#discussion_r10580151
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -310,6 +310,9 @@ abstract class RDD[T: ClassTag](
* Return a sampled subset of this
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/131#issuecomment-37500737
@srowen, level 3 BLAS would certainly help improve the performance. DSYRK
is for computing C <- A^T A + C, but I don't know whether we have it in jblas.
Howe
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/131#issuecomment-37490541
@srowen, this continues the work from
https://github.com/apache/incubator-spark/pull/629 . Would you please help
review the changes? Thanks!
GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/131
[SPARK-1237, 1238] Improve the computation of YtY for implicit ALS
Computing YtY can be implemented using BLAS's DSPR operations instead of
generating y_i y_i^T and then combining them. The l
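As an illustration of the idea, the sketch below accumulates YtY with a hand-written loop that mirrors what BLAS DSPR computes: a symmetric rank-1 update A <- alpha * x * x^T + A on packed upper-triangular storage. It is not the PR's code and does not call BLAS; the names `YtYSketch`, `spr`, and `ytyPacked` are hypothetical, and column-major packing (entry (i, j) with i <= j stored at index i + j(j+1)/2) is an assumption of this sketch.

```scala
object YtYSketch {
  // Rank-1 update on packed upper-triangular storage: ap += alpha * x * x^T.
  // Only the k(k+1)/2 entries of the upper triangle are touched.
  def spr(alpha: Double, x: Array[Double], ap: Array[Double]): Unit = {
    val k = x.length
    require(ap.length == k * (k + 1) / 2, "packed array has the wrong size")
    var idx = 0
    var j = 0
    while (j < k) {
      val s = alpha * x(j)
      var i = 0
      while (i <= j) {
        ap(idx) += s * x(i)
        idx += 1
        i += 1
      }
      j += 1
    }
  }

  // Accumulate sum_i y_i y_i^T in place, without materialising each
  // k-by-k outer product before combining them.
  def ytyPacked(rows: Seq[Array[Double]], k: Int): Array[Double] = {
    val ap = new Array[Double](k * (k + 1) / 2)
    rows.foreach(y => spr(1.0, y, ap))
    ap
  }
}
```

Updating one packed buffer in place avoids allocating a full matrix per factor vector and stores roughly half as many entries as a dense k-by-k matrix.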
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/117#issuecomment-37372615
I added fast distance computation and updated the implementation of KMeans.
Squared norms of the points are pre-computed and cached in order to compute
distance faster for
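The usual way to exploit cached squared norms is the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, which reduces each distance to a single dot product. The sketch below shows that identity on dense arrays; it is an assumption about the general technique, not the PR's implementation, and `FastDistanceSketch`, `PointWithNorm`, `withNorm`, and `fastSquaredDistance` are hypothetical names.

```scala
object FastDistanceSketch {
  // A point together with its squared norm, computed once and reused.
  final case class PointWithNorm(values: Array[Double], sqNorm: Double)

  def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      sum += a(i) * b(i)
      i += 1
    }
    sum
  }

  def withNorm(values: Array[Double]): PointWithNorm =
    PointWithNorm(values, dot(values, values))

  // ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
  // Clamped at zero because rounding can make the result slightly negative
  // when the two points are very close.
  def fastSquaredDistance(a: PointWithNorm, b: PointWithNorm): Double =
    math.max(a.sqNorm + b.sqNorm - 2.0 * dot(a.values, b.values), 0.0)
}
```

The identity can lose precision through cancellation when the points nearly coincide, so a careful implementation may want to fall back to the direct difference-based formula in that case.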
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/117#issuecomment-37372450
@fommil We will mention native libraries in the documentation once this PR
gets merged.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/117#issuecomment-37320613
@fommil I didn't realize the bottom of
https://github.com/fommil/netlib-java/blob/master/LICENSE.txt is 3-clause BSD.
It is Apache authorized, so I don't need
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/117#issuecomment-37252898
Okay, sbt was able to fetch breeze_2.10-0.7-SNAPSHOT from Sonatype, so
tests passed.
GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/117
[MLLIB-18] [WIP] Adding sparse data support and update KMeans
Continue our discussions from
https://github.com/apache/incubator-spark/pull/575
This PR is WIP because it depends on a SNAPSHOT
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/88#discussion_r10450573
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/SVD.scala ---
@@ -142,17 +155,138 @@ object SVD {
val vsirdd = sc.makeRDD(Array.tabulate
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/79#issuecomment-37224328
@manishamde Thanks for updating the code style and adding more docs! I made
a first pass over the code.
For the code style, we do not have a good style checker for
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10444589
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10444261
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10444025
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10443976
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10443898
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10443791
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10443435
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10443452
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10443049
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10443413
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10443354
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10443072
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10443033
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10442815
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10442463
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/model/DecisionTreeModel.scala
---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10442556
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10442083
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10442003
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10441994
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10441801
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10441324
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10441013
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10440961
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10440443
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10440340
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10440273
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10440077
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10440050
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10440022
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10439997
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10439850
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10439706
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10439393
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10439348
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,1055 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/18#issuecomment-37208033
MLI is not part of the Spark distribution. @pwendell Is it okay to use
MLI's jira? All changes look good to me.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/18#issuecomment-36944266
LGTM, except the extra empty line. Do you mind creating a Spark JIRA for
this PR?
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/18#discussion_r10363910
--- Diff:
core/src/test/scala/org/apache/spark/util/random/RandomSamplerSuite.scala ---
@@ -48,6 +48,20 @@ class RandomSamplerSuite extends FunSuite with
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10360640
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Impurity.scala ---
@@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10360574
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTreeRunner.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10360539
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,915 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10360528
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,915 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10360427
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,915 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10360465
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,915 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10360441
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,915 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10360401
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,915 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10360367
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,915 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/79#discussion_r10360358
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala ---
@@ -0,0 +1,915 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/79#issuecomment-36936178
@manishamde Do you mind updating the code style first to make it easy for
people who want to review the code? I will mark a few examples. We also need a
Spark JIRA ticket
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/18#discussion_r10243724
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/18#discussion_r10243710
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -17,12 +17,17 @@
package org.apache.spark.mllib.util
+import
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/40#discussion_r10178907
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/optimization/GradientDescentSuite.scala
---
@@ -104,4 +104,45 @@ class GradientDescentSuite extends
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/40#discussion_r10178847
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala
---
@@ -149,7 +149,13 @@ object GradientDescent extends Logging
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/18#discussion_r10174437
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -62,6 +67,20 @@ object MLUtils {
}
/**
+ * Return a k
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/18#discussion_r10174439
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -62,6 +67,20 @@ object MLUtils {
}
/**
+ * Return a k