Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/144#issuecomment-37718637
@mateiz OK, should be good to go now.
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/144#issuecomment-37714253
Yeah, sorry, I didn't mean to leave out max and min from StatCounter; I just
meant that the RDD.max() and RDD.min() methods should directly call reduce. If
you're calling those
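A rough PySpark sketch of the reduce-based approach being suggested here (illustrative only; the sample data and app name are made up, and the final method names in rdd.py may differ):

    from pyspark import SparkContext

    sc = SparkContext("local", "min-max-sketch")
    rdd = sc.parallelize([5, 3, 9, 1, 7])

    # What a reduce-based RDD.max()/RDD.min() boils down to: a single pass
    # that only does comparisons, with no extra statistics bookkeeping.
    largest = rdd.reduce(lambda a, b: a if a > b else b)   # 9
    smallest = rdd.reduce(lambda a, b: a if a < b else b)  # 1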
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/144#discussion_r10633009
--- Diff: python/pyspark/rdd.py ---
@@ -534,7 +534,26 @@ def func(iterator):
return reduce(op, vals, zeroValue)
# TODO: aggregate
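The hunk above appears to sit in PySpark's fold implementation; a small usage illustration, assuming fold's standard semantics (data and app name are made up):

    from operator import add
    from pyspark import SparkContext

    sc = SparkContext("local", "fold-sketch")
    nums = sc.parallelize([1, 2, 3, 4], 2)

    # fold applies op within each partition starting from zeroValue,
    # then folds the per-partition results together.
    total = nums.fold(0, add)  # 10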
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/144#discussion_r10633006
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -958,6 +958,10 @@ abstract class RDD[T: ClassTag](
*/
def takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T]
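For reference, takeOrdered and its descending counterpart top in PySpark terms (a usage illustration under the current PySpark API, not part of this diff):

    from pyspark import SparkContext

    sc = SparkContext("local", "take-ordered-sketch")
    rdd = sc.parallelize([10, 4, 2, 12, 3])

    smallest_three = rdd.takeOrdered(3)  # [2, 3, 4]
    largest_three = rdd.top(3)           # [12, 10, 4]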
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/144#discussion_r10633001
--- Diff: core/src/test/scala/org/apache/spark/PartitioningSuite.scala ---
@@ -171,6 +171,8 @@ class PartitioningSuite extends FunSuite with SharedSparkContext
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/144#discussion_r10633002
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
@@ -477,6 +477,16 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/144#issuecomment-37712645
Ahh, I understand the downside now; that would be just for numbers then. Makes
sense. Maybe we can have both?
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/144#issuecomment-37712447
Hey Matei,
For a large dataset someone might want to do it all in one pass; with StatCounter
all of the numbers are calculated in one go.
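The one-pass behaviour referred to here, in PySpark terms (rdd.stats() builds a StatCounter in a single pass; the min()/max() accessors are what this PR would add, so treat that part as assumed):

    from pyspark import SparkContext

    sc = SparkContext("local", "stats-sketch")
    rdd = sc.parallelize([1.0, 2.0, 3.0, 4.0])

    st = rdd.stats()  # one pass over the data
    print(st.count(), st.mean(), st.stdev())
    # With this PR, st.min() and st.max() would come from that same pass.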
Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/144#issuecomment-37694293
Matei,
I updated the branch to do just that. Thanks for the review!
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/144#issuecomment-37681660
Can one of the admins verify this patch?
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/144#issuecomment-37679298
It might be better to implement `RDD.min` and `RDD.max` with `reduce`
directly instead of building a whole StatCounter for them. Also, can you add
these to the Java/Scala S
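A small sketch of the trade-off being raised, in PySpark terms (illustrative only; the stats()-based call assumes the min/max accessors this PR proposes):

    from pyspark import SparkContext

    sc = SparkContext("local", "max-tradeoff-sketch")
    rdd = sc.parallelize([3.0, 8.0, 1.0])

    # Via a StatCounter: one pass, but it also tracks count, mean and variance.
    stat_max = rdd.stats().max()

    # Via reduce: one pass that does nothing but the comparison.
    reduce_max = rdd.reduce(lambda a, b: a if a > b else b)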
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/144#discussion_r10620079
--- Diff: python/pyspark/rdd.py ---
@@ -24,6 +24,7 @@
import sys
import shlex
import traceback
+from bisect import bisect_right
--- End diff --
GitHub user dwmclary opened a pull request:
https://github.com/apache/spark/pull/144
Spark 1246 add min max to stat counter
Here's the addition of min and max to statscounter.py and min and max
methods to rdd.py.
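A rough, self-contained sketch of how a StatCounter-style class can track min and max both incrementally and across partition merges (illustrative only; the class and method names below are assumptions, not the actual statscounter.py code):

    class MiniStatCounter(object):
        """Toy running-stats tracker showing where min/max updates slot in."""

        def __init__(self, values=()):
            self.n = 0
            self.minValue = float("inf")
            self.maxValue = float("-inf")
            for v in values:
                self.merge(v)

        def merge(self, value):
            # Update the running extremes alongside the element count.
            self.n += 1
            if value < self.minValue:
                self.minValue = value
            if value > self.maxValue:
                self.maxValue = value
            return self

        def mergeStats(self, other):
            # Combine two counters, e.g. one built per RDD partition.
            self.n += other.n
            self.minValue = min(self.minValue, other.minValue)
            self.maxValue = max(self.maxValue, other.maxValue)
            return self

        def min(self):
            return self.minValue

        def max(self):
            return self.maxValue

    # Example: per-partition counters merged into a global one.
    left = MiniStatCounter([5, 3, 9])
    right = MiniStatCounter([1, 7])
    combined = left.mergeStats(right)
    print(combined.min(), combined.max())  # 1 9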