[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread dwmclary
Github user dwmclary commented on the pull request: https://github.com/apache/spark/pull/144#issuecomment-37718637 @mateiz OK, should be good to go now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/144#issuecomment-37714253 Yeah, sorry, I didn't mean to leave max and min out of StatCounter; I just meant that the RDD.max() and RDD.min() methods should call reduce directly. If you're calling those

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/144#discussion_r10633009 --- Diff: python/pyspark/rdd.py --- @@ -534,7 +534,26 @@ def func(iterator): return reduce(op, vals, zeroValue) # TODO: aggregate

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/144#discussion_r10633006 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -958,6 +958,10 @@ abstract class RDD[T: ClassTag]( */ def takeOrdered(num: Int

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/144#discussion_r10633001 --- Diff: core/src/test/scala/org/apache/spark/PartitioningSuite.scala --- @@ -171,6 +171,8 @@ class PartitioningSuite extends FunSuite with SharedSparkContext

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/144#discussion_r10633002 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala --- @@ -477,6 +477,16 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends S

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/144#issuecomment-37712645 Ahh, I understand the downside: that would be just for numbers then. Makes sense. Maybe we can have both?

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/144#issuecomment-37712447 Hey Matei, for a large dataset someone might want to do it only once; with StatCounter, all of the numbers are calculated in one go.
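The "one go" idea in this comment can be illustrated with a minimal single-pass counter. This is only a sketch in plain Python, not the StatCounter code from the pull request: the class name `MiniStatCounter` and its attribute names are hypothetical, and the running mean and variance use Welford's update as an assumed technique.

```python
import math


class MiniStatCounter:
    """Hypothetical single-pass counter: count, mean, variance, min and max."""

    def __init__(self):
        self.n = 0                    # number of values seen so far
        self.mu = 0.0                 # running mean
        self.m2 = 0.0                 # running sum of squared deviations
        self.min_value = math.inf     # smallest value seen so far
        self.max_value = -math.inf    # largest value seen so far

    def merge(self, value):
        """Fold one value into every statistic at once."""
        self.n += 1
        delta = value - self.mu
        self.mu += delta / self.n
        self.m2 += delta * (value - self.mu)
        self.min_value = min(self.min_value, value)
        self.max_value = max(self.max_value, value)
        return self

    def variance(self):
        """Population variance of the values merged so far."""
        return self.m2 / self.n if self.n else float("nan")


# One pass over the data yields all of the statistics together.
counter = MiniStatCounter()
for x in [2.0, 4.0, 4.0, 6.0]:
    counter.merge(x)
```

After the loop, `counter.min_value`, `counter.max_value`, `counter.mu` and `counter.variance()` are all available without reading the data again, which is the trade-off being argued for here.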

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread dwmclary
Github user dwmclary commented on the pull request: https://github.com/apache/spark/pull/144#issuecomment-37694293 Matei, I updated the branch to do just that. Thanks for the review!

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/144#issuecomment-37681660 Can one of the admins verify this patch?

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/144#issuecomment-37679298 It might be better to implement `RDD.min` and `RDD.max` with `reduce` directly instead of building a whole StatCounter for them. Also, can you add these to the Java/Scala S
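The suggestion to build min and max on `reduce` can be sketched without Spark. The example below is an illustration under stated assumptions, not the code from the pull request: the helper names `pick_min`, `pick_max` and `reduce_partitions` are hypothetical, and nested Python lists stand in for RDD partitions.

```python
from functools import reduce


def pick_min(a, b):
    """Binary operator that keeps the smaller of two comparable values."""
    return a if a <= b else b


def pick_max(a, b):
    """Binary operator that keeps the larger of two comparable values."""
    return a if a >= b else b


def reduce_partitions(op, partitions):
    """Reduce each non-empty partition locally, then combine the partial results."""
    partials = [reduce(op, part) for part in partitions if part]
    return reduce(op, partials)


# Lists stand in for partitions; an empty partition is skipped.
partitions = [[4, 9, 1], [], [7, 3], [12]]
smallest = reduce_partitions(pick_min, partitions)
largest = reduce_partitions(pick_max, partitions)

# The same operators work for any values that can be compared, not only numbers.
first_word = reduce_partitions(pick_min, [["pear", "fig"], ["apple"]])
```

Because the operators only compare values, this approach carries no numeric state, whereas a statistics counter that also tracks a mean and variance is limited to numbers.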

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/144#discussion_r10620079 --- Diff: python/pyspark/rdd.py --- @@ -24,6 +24,7 @@ import sys import shlex import traceback +from bisect import bisect_right --- End di

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread dwmclary
GitHub user dwmclary opened a pull request: https://github.com/apache/spark/pull/144 Spark 1246 add min max to stat counter Here's the addition of min and max to statscounter.py and min and max methods to rdd.py.