[GitHub] spark pull request: [java8API] SPARK-964 Investigate the potential...
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17#discussion_r10227199

    --- Diff: extras/java8-tests/README.md ---
    @@ -0,0 +1,15 @@
    +# Java 8 test suites.
    +
    +These tests are bundled with spark and run if you have java 8 installed as system default or your `JAVA_HOME` points to a java 8(or higher) installation. `JAVA_HOME` is preferred to system default jdk installation. Since these tests require jdk 8 or higher, they defined to be optional to run in the build system.
    --- End diff --

    they defined -> they are defined
[GitHub] spark pull request: [java8API] SPARK-964 Investigate the potential...
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/17#issuecomment-36594377

    One thing to note: `-java-home` currently carries a caveat; we can actually fix that by moving the java-home check to after the arguments are processed.
[GitHub] spark pull request: [WIP] SPARK-964 Fix for -java-home note.
GitHub user ScrapCodes opened a pull request:

    https://github.com/apache/spark/pull/71

    [WIP] SPARK-964 Fix for -java-home note.

    I just did manual testing of this: with -java-home "jdk", with just JAVA_HOME set, and with both. Hopefully that covers all the cases.

    This is a work in progress and not yet ready to merge; once #17 is merged, this can be rebased.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark-1 java8-lambdas5

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/71.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #71

commit c33dc2c1b2d2fc06a69ecfd136576af85bb56226
Author: Prashant Sharma
Date: 2014-02-24T10:20:26Z

    SPARK-964, Java 8 API Support.

    This patch adds a few methods to the Java API so that it is possible to pass lambdas instead of anonymous classes, while Java 6/7 API users can keep using the same APIs by passing anonymous classes. To achieve this, a few older API methods are removed and replaced with their ToPair/ToDouble versions.

    1) All anonymous classes extending scala Function are replaced by interfaces.
    2) Adds an optional way to run the Java 8 tests.

    Please refer to the PR comments for more details.

commit 4ab87d3551f0b74e4fb6da611a5baea7aba93c6c
Author: Prashant Sharma
Date: 2014-02-25T05:32:15Z

    Review feedback on the PR.

commit 31d4cd63c8f2965a4f864459e5dcf3ab029ec2eb
Author: Prashant Sharma
Date: 2014-02-25T11:01:53Z

    Maven build to support the -Pjava8-tests flag.

commit 35d8d79e4f1ccb6491b81fd670043e2b6c60a815
Author: Prashant Sharma
Date: 2014-02-26T10:04:01Z

    Specified Java 8 building in the docs.

commit 26eb3f60ae421c07522952c1334ad9a16e3bd822
Author: Prashant Sharma
Date: 2014-03-03T08:24:24Z

    Patrick's comments on the PR:
    - Added an "Upgrading from pre-1.0 versions of Spark" section to the Java programming guide.
    - Added a brief README file in the java8-tests directory that explains what it is.
    - Fixed "When running the tests in Maven, all of the output is sent to the console, and not the test summaries as they were running."
    - Fixed "hard to get SBT to use the correct Java version without setting Java 8 as my system default."
    - Added a warning to the dev/run-tests script if the Java version is less than 1.8.
    - Moved the java8-tests folder into a new folder called /extras.

commit 80a13e8b9a2d49a1de5dee263102ac180a9b7077
Author: Prashant Sharma
Date: 2014-03-03T09:45:45Z

    Used fake class tag syntax.

commit 673f7ac9e8855e3be16e2e955d0c01d1b187073a
Author: Prashant Sharma
Date: 2014-03-03T10:24:21Z

    Added support for -java-home as well.

commit 85a954eefbb310dfa6566e64e1b1162e1aa6dea6
Author: Prashant Sharma
Date: 2014-03-03T10:37:00Z

    Nit: import orderings.

commit 95850e6e58b83b59e1f679c7b1cd8aaa7df854dc
Author: Patrick Wendell
Date: 2014-03-03T22:46:14Z

    Some doc improvements and build changes to the Java 8 patch.

commit 48fbcb7757bb1830d0e25b4125d314e9e2d5338b
Author: Prashant Sharma
Date: 2014-03-04T06:05:28Z

    Moved the java-home check to after the arguments are processed.
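For context, the precedence being tested above is: an explicit `-java-home` flag wins over `JAVA_HOME`, which wins over the system default JDK. The actual change lives in the bash launcher scripts; this is only a minimal Python sketch of that resolution order, with illustrative function and variable names:

```python
import os
import shutil

def resolve_java(java_home_flag=None):
    """Pick the java binary: -java-home flag > JAVA_HOME > system default."""
    # 1. An explicit -java-home argument wins; this is why the check
    #    has to run after the arguments are processed.
    if java_home_flag:
        return os.path.join(java_home_flag, "bin", "java")
    # 2. Fall back to the JAVA_HOME environment variable.
    if os.environ.get("JAVA_HOME"):
        return os.path.join(os.environ["JAVA_HOME"], "bin", "java")
    # 3. Finally, use whatever `java` is on the PATH.
    return shutil.which("java") or "java"
```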
[GitHub] spark pull request: [WIP] SPARK-964 Fix for -java-home note.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/71#issuecomment-36597306

    @pwendell Hey Patrick, it might be good to have Jenkins skip testing PRs whose titles start with [WIP] or WIP, or something along those lines?
[GitHub] spark pull request: [WIP] SPARK-964 Fix for -java-home note.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/71#issuecomment-36598821

    It does not cover the case where JAVA_HOME points to an invalid directory; it will simply take the alternate path instead of failing nicely.
[GitHub] spark pull request: SPARK-1164 Deprecated reduceByKeyToDriver as i...
GitHub user ScrapCodes opened a pull request:

    https://github.com/apache/spark/pull/72

    SPARK-1164 Deprecated reduceByKeyToDriver as it is an alias for reduceByKeyLocally

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1164/deprecate-reducebykeytodriver

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/72.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #72

commit ee521cd1809d36216e4392880163e75e5aed5150
Author: Prashant Sharma
Date: 2014-03-04T12:48:13Z

    SPARK-1164 Deprecated reduceByKeyToDriver as it is an alias for reduceByKeyLocally
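The change itself is presumably a Scala `@deprecated` annotation on the alias; purely for illustration, the same alias-deprecation pattern in Python looks like the sketch below (the class is a stand-in, not the real pyspark RDD):

```python
import warnings

class RDD(object):
    """Stand-in for the real RDD class; only the alias is sketched."""
    def reduceByKeyLocally(self, func):
        ...  # the real implementation lives here

    def reduceByKeyToDriver(self, func):
        """Deprecated alias for reduceByKeyLocally, kept for compatibility."""
        warnings.warn(
            "reduceByKeyToDriver is deprecated; use reduceByKeyLocally",
            DeprecationWarning, stacklevel=2)
        return self.reduceByKeyLocally(func)
```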
[GitHub] spark pull request: SPARK-1109 wrong API docs for pyspark map func...
GitHub user ScrapCodes opened a pull request:

    https://github.com/apache/spark/pull/73

    SPARK-1109 wrong API docs for pyspark map function

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1109/wrong-API-docs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/73.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #73

commit 1a55b5816505dea85d320e3e182b82ad83869ecd
Author: Prashant Sharma
Date: 2014-03-04T13:02:16Z

    SPARK-1109 wrong API docs for pyspark map function
[GitHub] spark pull request: SPARK-964 Fix for -java-home note.
Github user ScrapCodes closed the pull request at:

    https://github.com/apache/spark/pull/71
[GitHub] spark pull request: Spark 1165 rdd.intersection in python and java
GitHub user ScrapCodes opened a pull request:

    https://github.com/apache/spark/pull/80

    Spark 1165 rdd.intersection in python and java

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1165/RDD.intersection

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/80.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #80

commit d6effee4ee967f15210d0d57526beab4e3f9c8e2
Author: Prashant Sharma
Date: 2014-03-05T08:00:27Z

    SPARK-1165 Implemented RDD.intersection in python.

commit d0c71f3a24ea1cec336c9bb4820a6f3fb317953a
Author: Prashant Sharma
Date: 2014-03-05T08:40:01Z

    SPARK-1165 RDD.intersection in java
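Intersection is typically built from primitives the RDD API already has: tag each side, group by key, and keep keys seen on both sides, with the result automatically distinct. A minimal local sketch of that idea, assuming plain lists stand in for RDDs (the helper name is illustrative, not the PR's code):

```python
def intersection(left, right):
    """Keys present in both inputs, deduplicated (local sketch)."""
    seen = {}  # element -> [seen_in_left, seen_in_right]
    for x in left:
        seen.setdefault(x, [False, False])[0] = True
    for x in right:
        seen.setdefault(x, [False, False])[1] = True
    # Dict keys are already distinct, mirroring the .distinct() step.
    return [k for k, (l, r) in seen.items() if l and r]

assert sorted(intersection([1, 2, 2, 3], [2, 3, 3, 4])) == [2, 3]
```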
[GitHub] spark pull request: Spark 1165 rdd.intersection in python and java
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/80#issuecomment-36729592

    Jenkins, test this please.
[GitHub] spark pull request: SPARK-1162 Added top in python.
GitHub user ScrapCodes opened a pull request:

    https://github.com/apache/spark/pull/93

    SPARK-1162 Added top in python.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1162/pyspark-top-takeOrdered

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/93.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #93

commit 4603399c4e7a8c6ed19d916d3a55225b4bb31af8
Author: Prashant Sharma
Date: 2014-03-06T12:12:16Z

    Added top in python.
[GitHub] spark pull request: SPARK-1162 Added top in python.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/93#issuecomment-36887864

    @mateiz I am learning python while doing this, so I am not sure if it is going to make sense. Also, I have not figured out how to implement takeOrdered. Would it be fine if I write our own max-heap implementation, or is there a better way that I am not aware of?
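For what it's worth, a hand-rolled max heap is not strictly required here: `heapq.nsmallest` already keeps a bounded number of elements per partition, and merging the per-partition results is just one more pass. A sketch under that assumption, with `partitions` standing in for `mapPartitions(...).collect()`:

```python
import heapq
from itertools import chain

def take_ordered(partitions, num):
    """N smallest overall: N smallest per partition, then merge."""
    per_part = [heapq.nsmallest(num, part) for part in partitions]
    return heapq.nsmallest(num, chain.from_iterable(per_part))

assert take_ordered([[5, 1, 9], [3, 7]], 3) == [1, 3, 5]
```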
[GitHub] spark pull request: SPARK-1162 Added top in python.
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/93#discussion_r10370555

    --- Diff: python/pyspark/rdd.py ---
    @@ -628,6 +669,26 @@ def mergeMaps(m1, m2):
                     m1[k] += v
                 return m1
             return self.mapPartitions(countPartition).reduce(mergeMaps)
    +
    +    def top(self, num):
    +        """
    +        Get the top N elements from a RDD.
    +
    +        Note: It returns the list sorted in ascending order.
    +        """
    +        def f(iterator):
    +            q = BoundedPriorityQueue(num)
    +            for k in iterator:
    +                q.put(k)
    +            return q
    +
    +        def f2(a, b):
    +            a.put(b)
    +            return a
    +        q = BoundedPriorityQueue(num)
    +        # I can not come up with a way to avoid this step.
    +        t = self.mapPartitions(f).collect()
    +        return [k for k in iter(reduce(f2, t, q))]
    --- End diff --

    Thanks, that is definitely nicer.
[GitHub] spark pull request: SPARK-1162 Added top in python.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/93#issuecomment-36971911

    Jenkins, test this please.
[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.
GitHub user ScrapCodes opened a pull request:

    https://github.com/apache/spark/pull/97

    Spark 1162 Implemented takeOrdered in pyspark.

    Since python does not have a library for a max heap, and the usual tricks like inverting values do not work for all cases, the best thing I could think of is to modify heapq itself.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1162/pyspark-top-takeOrdered2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/97.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #97

commit 3e7a57506ce139af804f89f16a3404624d784f7e
Author: Prashant Sharma
Date: 2014-03-06T12:12:16Z

    Added top in python.

commit 3bedad7dfe3b18ee9f64cc376627d3d7489a0e9f
Author: Prashant Sharma
Date: 2014-03-07T10:35:31Z

    Added takeOrdered
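One alternative to modifying heapq wholesale is to invert the comparison rather than the value; that works for any comparable type, whereas negating values fails for non-numeric elements (presumably the "does not work for all cases" above). A hypothetical sketch, not the PR's implementation:

```python
import heapq

class ReverseOrder(object):
    """Wrapper that inverts comparisons, so heapq's min-heap acts as a max heap."""
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        return other.value < self.value  # deliberately reversed

def n_smallest(iterator, num):
    """Keep the num smallest items by evicting the current maximum."""
    heap = []
    for x in iterator:
        if len(heap) < num:
            heapq.heappush(heap, ReverseOrder(x))
        elif x < heap[0].value:  # heap[0] holds the largest item kept so far
            heapq.heappushpop(heap, ReverseOrder(x))
    return sorted(w.value for w in heap)

assert n_smallest(iter([4, 9, 1, 7, 3]), 2) == [1, 3]
```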
[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/97#discussion_r10407050

    --- Diff: python/pyspark/maxheapq.py ---
    @@ -0,0 +1,115 @@
    +# -*- coding: latin-1 -*-
    +
    +"""Heap queue algorithm (a.k.a. priority queue).
    +
    +# Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger
    --- End diff --

    Hmm, I have not gone through the license; it is copied from the Python 2.7.6 source. [PSF License](http://docs.python.org/2/license.html)
[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/97#issuecomment-37086086

    Hey Matei, the PSF License is included now. I was not sure whether the entire license history should be included.
[GitHub] spark pull request: Update junitxml plugin to the latest version t...
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/104#issuecomment-37096375

    Very cool, finally we have this!
[GitHub] spark pull request: SPARK-1168, Added foldByKey to pyspark.
GitHub user ScrapCodes opened a pull request:

    https://github.com/apache/spark/pull/115

    SPARK-1168, Added foldByKey to pyspark.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1168/pyspark-foldByKey

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/115.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #115

commit e0dce4bed79f6ba26c25f313110ddb504b367b97
Author: Prashant Sharma
Date: 2014-03-10T07:24:21Z

    SPARK-1168, Added foldByKey to pyspark.
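For reference, foldByKey is reduceByKey with a caller-supplied zero value. A local sketch of the semantics, assuming plain pairs stand in for an RDD (in a real RDD the zero is applied once per partition, a nuance this flat version glosses over):

```python
from operator import add

def fold_by_key(pairs, zero, func):
    """Fold each key's values, starting from `zero` (local sketch)."""
    out = {}
    for k, v in pairs:
        out[k] = func(out.get(k, zero), v)
    return sorted(out.items())

# rdd.foldByKey(0, add) over these pairs would give the same totals:
assert fold_by_key([("a", 1), ("b", 2), ("a", 3)], 0, add) == [("a", 4), ("b", 2)]
```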
[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/97#issuecomment-37161692

    Jenkins, test this please.
[GitHub] spark pull request: SPARK-1170 Added histogram(buckets) to pyspark...
GitHub user ScrapCodes opened a pull request:

    https://github.com/apache/spark/pull/121

    SPARK-1170 Added histogram(buckets) to pyspark

    This adds histogram(buckets) to pyspark, but not histogram(noOfBuckets); that can be part 2 of this PR. If we can have min and max functions on an RDD of doubles, that would be good.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1170/pyspark-histogram

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/121.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #121

commit 6db3a5b63d78550da3c41af7aafe6fa7dd90540c
Author: Prashant Sharma
Date: 2014-03-11T07:51:22Z

    SPARK-1170 Added histogram(buckets) to pyspark and not histogram(noOfBuckets).
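A sketch of the bucket-assignment logic behind histogram(buckets): B+1 sorted boundaries define B buckets, each half-open except the last, which is closed on the right. Plain lists stand in for the RDD and the helper name is illustrative:

```python
import bisect

def histogram(values, buckets):
    """Count values per bucket; `buckets` are sorted boundaries."""
    counts = [0] * (len(buckets) - 1)
    for v in values:
        if v == buckets[-1]:            # top boundary belongs to the last bucket
            counts[-1] += 1
            continue
        i = bisect.bisect_right(buckets, v) - 1
        if 0 <= i < len(counts):        # silently skip out-of-range values
            counts[i] += 1
    return counts

# Two buckets, [0, 5) and [5, 10]:
assert histogram([1, 2, 3, 10], [0, 5, 10]) == [3, 1]
```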
[GitHub] spark pull request: SPARK-1162 Added top in python.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/93#issuecomment-37272574

    Hey Matei, thanks!
[GitHub] spark pull request: SPARK-1096, a space after comment style checke...
GitHub user ScrapCodes opened a pull request:

    https://github.com/apache/spark/pull/124

    SPARK-1096, a space after comment style checker.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1096/scalastyle-comment-check

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/124.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #124

commit e16693cdf05076a8cea66f73cb1f2b4daaec50fa
Author: Prashant Sharma
Date: 2014-03-11T11:34:30Z

    SPARK-1096, a space after comment style checker.
[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...
GitHub user ScrapCodes opened a pull request:

    https://github.com/apache/spark/pull/125

    SPARK-1144 Added license and RAT to check licenses.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark-1 rat-integration

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/125.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #125

commit 15ab1158456992da119254eed12d8d1d18da9e2d
Author: Prashant Sharma
Date: 2014-03-04T05:48:48Z

    SPARK-1144 Added license and RAT to check licenses.
[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/97#issuecomment-37379097

    Hi Matei, does this mean that when key is None it would do the same thing as top? If not, we would need a max heap, since a min heap will only keep the N largest entries, not the N smallest.
[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/125#issuecomment-37379720

    We did not want to have this in our builds (Maven or SBT), and running it is so trivial that it might not even need that. I am not sure about the dynamics of a release, but hopefully this can be a release-only step; if we agree on that, we can put it in the release script. There is no need to have the jar in the source (sorry about that).
[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/125#issuecomment-37379933

    @pwendell thoughts?
[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/126#discussion_r10514209

    --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala ---
    @@ -49,9 +49,28 @@ class ShuffleDependency[K, V](
         @transient rdd: RDD[_ <: Product2[K, V]],
         val partitioner: Partitioner,
         val serializerClass: String = null)
    -  extends Dependency(rdd.asInstanceOf[RDD[Product2[K, V]]]) {
    +  extends Dependency(rdd.asInstanceOf[RDD[Product2[K, V]]]) with Logging {

       val shuffleId: Int = rdd.context.newShuffleId()
    +
    +  override def finalize() {
    +    try {
    +      if (rdd != null) {
    +        rdd.sparkContext.cleaner.cleanShuffle(shuffleId)
    +      }
    +    } catch {
    +      case t: Throwable =>
    +        // Paranoia - If logError throws error as well, report to stderr.
    +        try {
    +          logError("Error in finalize", t)
    --- End diff --

    @tdas Hey TD, a try/catch around logging?
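The pattern under review, in miniature: a finalizer must never throw, so the cleanup is wrapped in try/catch, and even the error logging gets its own try/catch with stderr as the last resort. An illustrative Python analogue; the class and method names are made up for the sketch:

```python
import logging
import sys

log = logging.getLogger("cleaner")

class ShuffleHandle(object):
    def __init__(self, cleaner, shuffle_id):
        self.cleaner = cleaner
        self.shuffle_id = shuffle_id

    def __del__(self):
        # Finalizers must not raise: exceptions escaping here are
        # swallowed or printed by the runtime, so handle everything locally.
        try:
            self.cleaner.clean_shuffle(self.shuffle_id)
        except Exception as e:
            try:
                log.error("Error in finalizer: %s", e)
            except Exception:
                # Paranoia: if even logging fails, fall back to stderr.
                sys.stderr.write("Error in finalizer\n")
```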
[GitHub] spark pull request: SPARK-1096, a space after comment start style ...
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/124#discussion_r10552608

    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala ---
    @@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages {
       case class RemoveRdd(rddId: Int) extends ToBlockManagerSlave

    -  //
    +  // 
       // Messages from slaves to the master.
    -  //
    +  // 
       sealed trait ToBlockManagerMaster
    --- End diff --

    It does; the space is important. The other option is to disable the check here by wrapping the block in scalastyle:off and scalastyle:on.
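Roughly what the checker enforces, as a standalone sketch: a `//` must be followed by a space, so a bare `//` divider line fails unless it carries a trailing space, which is exactly what the diff above adds. The regex is illustrative, not the actual scalastyle rule:

```python
import re

# Flag `//` comments not followed by a space, including a bare `//`
# at end of line. Illustrative only; the real rule lives in scalastyle.
COMMENT_NO_SPACE = re.compile(r"//(?![ /])\S|//$")

def check(lines):
    """Return 1-based line numbers that violate the comment-space rule."""
    return [n for n, line in enumerate(lines, 1)
            if COMMENT_NO_SPACE.search(line)]

assert check(["// ok", "//bad", "//", "x = 1  // fine"]) == [2, 3]
```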
[GitHub] spark pull request: SPARK-1096, a space after comment start style ...
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/124#discussion_r10552833

    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala ---
    @@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages {
       case class RemoveRdd(rddId: Int) extends ToBlockManagerSlave

    -  //
    +  // 
       // Messages from slaves to the master.
    -  //
    +  // 
       sealed trait ToBlockManagerMaster
    --- End diff --

    Well, even if you use something else, the space has to be there. May I humbly suggest that we live with it like this?
[GitHub] spark pull request: SPARK-1096, a space after comment start style ...
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/124#discussion_r10552982

    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala ---
    @@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages {
       case class RemoveRdd(rddId: Int) extends ToBlockManagerSlave

    -  //
    +  // 
       // Messages from slaves to the master.
    -  //
    +  // 
       sealed trait ToBlockManagerMaster
    --- End diff --

    Modifying the rule will have a turnaround time of at least a few days (send them a PR, then they publish a snapshot, and so on). I will do that. In the meantime, what do you suggest?
[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/97#issuecomment-37505304

    PriorityQueue is, in a way, just a wrapper over heapq that allows blocking put and get (AFAIU). We would need a max-heap variant of heapq to retain the N smallest elements. One other thing we could do, instead of copying heapq, is write one of our own in a nicely extensible way that allows plugging in a comparator.
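A sketch of that extensible shape: a bounded queue parameterized by a key function, so the same class can back both top() and takeOrdered(); combined with a comparison-inverting wrapper (as sketched earlier) it covers non-numeric keys too. Class and method names are hypothetical:

```python
import heapq

class BoundedHeap(object):
    """Keeps the `bound` items with the largest keys seen so far."""
    def __init__(self, bound, key=lambda x: x):
        self.bound = bound
        self.key = key
        self._heap = []  # min-heap of (key, item); root is the weakest kept

    def put(self, item):
        entry = (self.key(item), item)
        if len(self._heap) < self.bound:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:   # beats the weakest kept item
            heapq.heappushpop(self._heap, entry)

    def items(self):
        return [item for _, item in sorted(self._heap)]

h = BoundedHeap(3)
for x in [5, 1, 9, 3, 7]:
    h.put(x)
assert h.items() == [5, 7, 9]  # the three largest
```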
[GitHub] spark pull request: SPARK-1096, a space after comment start style ...
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/124#discussion_r10555984

    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala ---
    @@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages {
       case class RemoveRdd(rddId: Int) extends ToBlockManagerSlave

    -  //
    +  // 
       // Messages from slaves to the master.
    -  //
    +  // 
       sealed trait ToBlockManagerMaster
    --- End diff --

    So I have sent the scalastyle folks a PR with a fix.
[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/125#discussion_r10562618

    --- Diff: project/plugins.sbt ---
    @@ -10,6 +10,8 @@ addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.2.0")

     addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.5.1")

    +libraryDependencies += "org.apache.rat" % "apache-rat" % "0.10"
    --- End diff --

    Accidental commit.
[GitHub] spark pull request: Prevent ContextClassLoader of Actor from becom...
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/15#issuecomment-37529958

    Thanks for the fix. Just for the record, this happens only when MASTER="local" or "local[2]". Looks good. It might be good to add the above test case to ReplSuite, though.
[GitHub] spark pull request: Prevent ContextClassLoader of Actor from becom...
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/15#issuecomment-37530227

    Mind changing the PR title to add the JIRA ID?
[GitHub] spark pull request: Spark 615 map partitions with index callable f...
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/16#issuecomment-37531010

    It might be good to add this test to the Java 8 API suite? Not sure if it is 100% necessary, but one exists for all the other APIs (I hope!). Thoughts?
[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/97#issuecomment-37618967

    Hey Matei, I got rid of copying `heapq.py` and all the license stuff, though I resorted to using heapq's internal API. It should be simpler. I just checked: heapq hasn't changed much from python 2.7 to python 3.4 (the current dev version). There is a pending patch in python, for 3.4 or maybe 3.5, which will give us a nice Heap class.
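The internal helpers in question are presumably along the lines of `heapq._heapify_max` and `heapq._heappushpop_max` (per the follow-up below, available from Python 2.7.4 onwards; they are underscore-private, which is the trade-off being accepted). A sketch of keeping the N smallest elements with them:

```python
import heapq

def n_smallest_via_max_heap(iterator, num):
    """Keep the num smallest items using heapq's private max-heap helpers."""
    heap = []
    for x in iterator:
        if len(heap) < num:
            heap.append(x)
            if len(heap) == num:
                heapq._heapify_max(heap)  # private API, see caveat above
        elif x < heap[0]:                 # heap[0] is the largest item kept
            heapq._heappushpop_max(heap, x)
    return sorted(heap)

assert n_smallest_via_max_heap(iter([4, 9, 1, 7, 3]), 2) == [1, 3]
```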
[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/97#issuecomment-37625562

    They were only added from 2.7.4 onwards, though.
[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...
GitHub user ScrapCodes opened a pull request:

    https://github.com/apache/spark/pull/140

    SPARK-1246, added min max API to Double RDDs in java and scala APIs.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark-1 SPARK-1246/min-max

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/140.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #140

commit 0b20bc758a41bc483be4e258f7031cc02969c206
Author: Prashant Sharma
Date: 2014-03-14T12:24:18Z

    SPARK-1246, added min max API to Double RDDs in java and scala APIs.
[GitHub] spark pull request: SPARK-1170-pyspark-histogram: added histogram ...
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/122#issuecomment-37642205

    Hi Daniel, thanks for the patch. It would be good to separate the min/max implementation out into a different PR, provide RDD.min and RDD.max functions too, and assign it JIRA SPARK-1246. I thought of asking you since you already have it in this PR. Part of it is done in #140 for java and scala.

    Prashant.
[GitHub] spark pull request: Spark 1246 add min max to stat counter
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/144#issuecomment-37712447

    Hey Matei, for a large dataset someone might want to do this in one go; with a stat counter, all of the numbers are calculated in a single pass.
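The one-go argument, sketched: a StatCounter-style accumulator merges element-wise within a partition and accumulator-wise across partitions, so count, sum, min, and max come out of a single pass. Field and method names are illustrative, not the real StatCounter API:

```python
from functools import reduce

class Stats(object):
    """Single-pass accumulator for count/sum/min/max (illustrative)."""
    def __init__(self):
        self.count, self.total = 0, 0.0
        self.min, self.max = float("inf"), float("-inf")

    def merge_value(self, x):       # fold one element in
        self.count += 1
        self.total += x
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        return self

    def merge_stats(self, other):   # combine two partitions' accumulators
        self.count += other.count
        self.total += other.total
        self.min = min(self.min, other.min)
        self.max = max(self.max, other.max)
        return self

partitions = [[1.0, 5.0], [2.0, 8.0]]
per_part = [reduce(lambda s, x: s.merge_value(x), p, Stats()) for p in partitions]
s = reduce(lambda a, b: a.merge_stats(b), per_part)
assert (s.count, s.total, s.min, s.max) == (4, 16.0, 1.0, 8.0)
```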
[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/140#discussion_r10632860

    --- Diff: project/build.properties ---
    @@ -14,4 +14,4 @@
     # See the License for the specific language governing permissions and
     # limitations under the License.
     #
    -sbt.version=0.13.1
    +sbt.version=0.13.2-M1
    --- End diff --

    It was accidental (sorry about that). I use this version of sbt locally since it is really fast with incremental builds.
[GitHub] spark pull request: Spark 1246 add min max to stat counter
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/144#issuecomment-37712645

    Ah, now I understand the downside: that would work just for numbers. Makes sense. Maybe we can have both?
[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/140#discussion_r10632880

    --- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala ---
    @@ -86,14 +92,9 @@ class DoubleRDDFunctions(self: RDD[Double]) extends Logging with Serializable {
        * If the elements in RDD do not vary (max == min) always returns a single bucket.
        */
       def histogram(bucketCount: Int): Pair[Array[Double], Array[Long]] = {
    -    // Compute the minimum and the maxium
    -    val (max: Double, min: Double) = self.mapPartitions { items =>
    -      Iterator(items.foldRight(Double.NegativeInfinity,
    -        Double.PositiveInfinity)((e: Double, x: Pair[Double, Double]) =>
    -        (x._1.max(e), x._2.min(e))))
    -    }.reduce { (maxmin1, maxmin2) =>
    -      (maxmin1._1.max(maxmin2._1), maxmin1._2.min(maxmin2._2))
    -    }
    +    // Compute the minimum and the maximum from stats once
    +    val _stats = stats()
    +    val (max: Double, min: Double) = (_stats.max, _stats.min)
    --- End diff --

    Okay, will change that.
[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...
Github user ScrapCodes closed the pull request at:

    https://github.com/apache/spark/pull/140
[GitHub] spark pull request: SPARK-1121 Only add avro if the build is for H...
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/6#issuecomment-36217654

    Rebased!
[GitHub] spark pull request: [HOTFIX] Patching maven build after #6 (SPARK-...
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/37#issuecomment-36335799

    Hey Patrick, forgive me for this; it is the second time I have messed up the Maven build.