[GitHub] spark pull request: [java8API] SPARK-964 Investigate the potential...

2014-03-03 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/17#discussion_r10227199
  
--- Diff: extras/java8-tests/README.md ---
@@ -0,0 +1,15 @@
+# Java 8 test suites.
+
+These tests are bundled with spark and run if you have java 8 installed as
+system default or your `JAVA_HOME` points to a java 8(or higher) installation.
+`JAVA_HOME` is preferred to system default jdk installation. Since these tests
+require jdk 8 or higher, they defined to be optional to run in the build system.
--- End diff --

they defined -> they are defined


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [java8API] SPARK-964 Investigate the potential...

2014-03-03 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/17#issuecomment-36594377
  
One thing to note: `-java-home` currently carries a caveat, and we can
actually fix that by moving the Java home check to after the arguments are
processed.
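
For illustration, a minimal Python sketch of the intended ordering, with the
preference `-java-home` > `JAVA_HOME` > system default (the actual change
lives in the launcher shell script; the fallback path below is a hypothetical
placeholder):

```python
import os

def resolve_java_home(args, system_default="/usr/lib/jvm/default"):
    # Parse the arguments first, so an explicit -java-home always wins;
    # only then fall back to JAVA_HOME, and finally to the system default.
    explicit = None
    it = iter(args)
    for arg in it:
        if arg == "-java-home":
            explicit = next(it, None)
    return explicit or os.environ.get("JAVA_HOME") or system_default
```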




[GitHub] spark pull request: [WIP] SPARK-964 Fix for -java-home note.

2014-03-03 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/71

[WIP] SPARK-964 Fix for -java-home note.

I just did some manual testing of this: with `-java-home "jdk"`, with just
`JAVA_HOME` set, and with both. I hope that covers all the cases.

It is a work in progress since it is not yet ready to merge; once #17 is
merged, this can be rebased.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 java8-lambdas5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/71.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #71


commit c33dc2c1b2d2fc06a69ecfd136576af85bb56226
Author: Prashant Sharma 
Date:   2014-02-24T10:20:26Z

SPARK-964, Java 8 API Support.

This patch adds a few methods to the Java API so that it is possible to pass
lambdas instead of anonymous classes, while Java 6/7 API users can still use
the same APIs by passing anonymous classes. To achieve this, a few older API
methods are removed and replaced with their ToPair/ToDouble versions.

1) All anonymous classes extending Scala's Function traits are replaced by interfaces.

2) Adds an optional profile to run the Java 8 tests.

Please refer to the PR comments for more details.

commit 4ab87d3551f0b74e4fb6da611a5baea7aba93c6c
Author: Prashant Sharma 
Date:   2014-02-25T05:32:15Z

Review feedback on the pr

commit 31d4cd63c8f2965a4f864459e5dcf3ab029ec2eb
Author: Prashant Sharma 
Date:   2014-02-25T11:01:53Z

Maven build to support -Pjava8-tests flag.

commit 35d8d79e4f1ccb6491b81fd670043e2b6c60a815
Author: Prashant Sharma 
Date:   2014-02-26T10:04:01Z

Documented building for Java 8 in the docs

commit 26eb3f60ae421c07522952c1334ad9a16e3bd822
Author: Prashant Sharma 
Date:   2014-03-03T08:24:24Z

Patrick's comments on the PR:

Added an "Upgrading from pre-1.0 versions of Spark" section in the Java
programming guide.

Added a brief README file in the java8-tests directory that explains what it
is.

Fixed "When running the tests in Maven, all of the output is sent to the
console, and not the test summaries as they were running."

Fixed "hard to get SBT to use the correct Java version without setting
Java 8 as my system default."

Added a warning to the dev/run-tests script if the Java version is less
than 1.8.

Moved the java8-tests folder into a new folder called /extras.

commit 80a13e8b9a2d49a1de5dee263102ac180a9b7077
Author: Prashant Sharma 
Date:   2014-03-03T09:45:45Z

Used fake class tag syntax

commit 673f7ac9e8855e3be16e2e955d0c01d1b187073a
Author: Prashant Sharma 
Date:   2014-03-03T10:24:21Z

Added support for -java-home as well

commit 85a954eefbb310dfa6566e64e1b1162e1aa6dea6
Author: Prashant Sharma 
Date:   2014-03-03T10:37:00Z

Nit: import ordering.

commit 95850e6e58b83b59e1f679c7b1cd8aaa7df854dc
Author: Patrick Wendell 
Date:   2014-03-03T22:46:14Z

Some doc improvements and build changes to the Java 8 patch.

commit 48fbcb7757bb1830d0e25b4125d314e9e2d5338b
Author: Prashant Sharma 
Date:   2014-03-04T06:05:28Z

Move java home check after process args.






[GitHub] spark pull request: [WIP] SPARK-964 Fix for -java-home note.

2014-03-03 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/71#issuecomment-36597306
  
@pwendell Hey Patrick, it might be good to have Jenkins not test PRs whose
titles start with [WIP] or WIP, or something along those lines?




[GitHub] spark pull request: [WIP] SPARK-964 Fix for -java-home note.

2014-03-03 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/71#issuecomment-36598821
  
It does not cover the case where JAVA_HOME points to an invalid directory; it
will simply take the alternate path instead of failing nicely.
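
For example, "failing nicely" could look something like this sketch
(illustrative Python, not the launcher script itself):

```python
import os
import sys

def check_java_home():
    # If JAVA_HOME is set but does not point at a real directory, report
    # the problem instead of silently taking the alternate path.
    java_home = os.environ.get("JAVA_HOME")
    if java_home and not os.path.isdir(java_home):
        sys.exit("JAVA_HOME is set to '%s', which is not a directory." % java_home)
    return java_home
```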




[GitHub] spark pull request: SPARK-1164 Deprecated reduceByKeyToDriver as i...

2014-03-04 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/72

SPARK-1164 Deprecated reduceByKeyToDriver as it is an alias for 
reduceByKeyLocally
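
For context, a quick usage sketch of the surviving method through the Python
API, assuming a live SparkContext `sc`: it merges the values for each key and
returns the result to the driver as a dict.

```python
from operator import add

pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])
# Unlike reduceByKey, this returns a plain dict on the driver, not an RDD.
print(pairs.reduceByKeyLocally(add))  # {'a': 4, 'b': 2}
```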



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 SPARK-1164/deprecate-reducebykeytodriver

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/72.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #72


commit ee521cd1809d36216e4392880163e75e5aed5150
Author: Prashant Sharma 
Date:   2014-03-04T12:48:13Z

SPARK-1164 Deprecated reduceByKeyToDriver as it is an alias for 
reduceByKeyLocally






[GitHub] spark pull request: SPARK-1109 wrong API docs for pyspark map func...

2014-03-04 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/73

SPARK-1109 wrong API docs for pyspark map function



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 SPARK-1109/wrong-API-docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/73.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #73


commit 1a55b5816505dea85d320e3e182b82ad83869ecd
Author: Prashant Sharma 
Date:   2014-03-04T13:02:16Z

SPARK-1109 wrong API docs for pyspark map function






[GitHub] spark pull request: SPARK-964 Fix for -java-home note.

2014-03-04 Thread ScrapCodes
Github user ScrapCodes closed the pull request at:

https://github.com/apache/spark/pull/71




[GitHub] spark pull request: Spark 1165 rdd.intersection in python and java

2014-03-05 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/80

Spark 1165 rdd.intersection in python and java
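
The general pattern is to build intersection on top of cogroup: tag each
element with a dummy value, cogroup the two RDDs, and keep only the keys that
appear on both sides. A rough sketch of that shape (the merged code may
differ in details such as duplicate handling):

```python
def intersection(rdd, other):
    # Tag every element, cogroup, and keep keys present in both RDDs.
    return (rdd.map(lambda v: (v, None))
               .cogroup(other.map(lambda v: (v, None)))
               .filter(lambda kv: len(list(kv[1][0])) > 0 and len(list(kv[1][1])) > 0)
               .keys())
```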



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 SPARK-1165/RDD.intersection

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/80.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #80


commit d6effee4ee967f15210d0d57526beab4e3f9c8e2
Author: Prashant Sharma 
Date:   2014-03-05T08:00:27Z

SPARK-1165 Implemented RDD.intersection in python.

commit d0c71f3a24ea1cec336c9bb4820a6f3fb317953a
Author: Prashant Sharma 
Date:   2014-03-05T08:40:01Z

SPARK-1165 RDD.intersection in java






[GitHub] spark pull request: Spark 1165 rdd.intersection in python and java

2014-03-05 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/80#issuecomment-36729592
  
Jenkins, test this please.




[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/93

SPARK-1162 Added top in python.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 SPARK-1162/pyspark-top-takeOrdered

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/93.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #93


commit 4603399c4e7a8c6ed19d916d3a55225b4bb31af8
Author: Prashant Sharma 
Date:   2014-03-06T12:12:16Z

Added top in python.






[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/93#issuecomment-36887864
  
@mateiz I am learning Python while doing this, so I am not sure whether it is
going to make sense.

Also, I have not figured out how to implement takeOrdered. Would it be fine
if I wrote our own max-heap implementation, or is there a better way that I
am not aware of?
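
For reference, the standard heapq module already provides bounded selection
via nlargest, which fits the mapPartitions-then-merge shape; a minimal
plain-Python sketch of that pattern, not the PR's code:

```python
import heapq

def top(partitions, num):
    # Take the num largest per partition, then merge the per-partition
    # winners on the driver side.
    per_partition = (heapq.nlargest(num, part) for part in partitions)
    return heapq.nlargest(num, (x for part in per_partition for x in part))

print(top([[1, 5, 3], [9, 2], [7, 8]], 2))  # [9, 8]
```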




[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/93#discussion_r10370555
  
--- Diff: python/pyspark/rdd.py ---
@@ -628,6 +669,26 @@ def mergeMaps(m1, m2):
 m1[k] += v
 return m1
 return self.mapPartitions(countPartition).reduce(mergeMaps)
+
+def top(self, num):
+"""
+Get the top N elements from a RDD.
+
+Note: It returns the list sorted in ascending order.
+"""
+def f(iterator):
+q = BoundedPriorityQueue(num)
+for k in iterator:
+q.put(k)
+return q
+
+def f2(a, b):
+a.put(b)
+return a
+q = BoundedPriorityQueue(num)
+# I can not come up with a way to avoid this step. 
+t = self.mapPartitions(f).collect()
+return [k for k in iter(reduce(f2, t, q))]
--- End diff --

Thanks, that is definitely nicer.




[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-06 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/93#issuecomment-36971911
  
Jenkins, test this please.




[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-07 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/97

Spark 1162 Implemented takeOrdered in pyspark.

Since Python does not have a library for a max heap, and the usual tricks
like inverting values do not work for all cases, the best thing I could think
of was to modify heapq itself.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 SPARK-1162/pyspark-top-takeOrdered2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/97.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #97


commit 3e7a57506ce139af804f89f16a3404624d784f7e
Author: Prashant Sharma 
Date:   2014-03-06T12:12:16Z

Added top in python.

commit 3bedad7dfe3b18ee9f64cc376627d3d7489a0e9f
Author: Prashant Sharma 
Date:   2014-03-07T10:35:31Z

Added takeOrdered






[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-07 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/97#discussion_r10407050
  
--- Diff: python/pyspark/maxheapq.py ---
@@ -0,0 +1,115 @@
+# -*- coding: latin-1 -*-
+
+"""Heap queue algorithm (a.k.a. priority queue).
+
+# Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger
--- End diff --

Hmm, I have not gone through the license; it is copied from the Python 2.7.6
source. [PSF License](http://docs.python.org/2/license.html)




[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-07 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/97#issuecomment-37086086
  
Hey Matei, 

The PSF License is included now; I was not sure whether the entire license
history should be included.




[GitHub] spark pull request: Update junitxml plugin to the latest version t...

2014-03-08 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/104#issuecomment-37096375
  
Very cool, finally we have this!




[GitHub] spark pull request: SPARK-1168, Added foldByKey to pyspark.

2014-03-10 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/115

SPARK-1168, Added foldByKey to pyspark.
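
A quick usage sketch, assuming a live SparkContext `sc`; foldByKey behaves
like reduceByKey but seeds each key with a zero value:

```python
from operator import add

rdd = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
# Each key starts from the zero value 0 and its values are folded with add.
rdd.foldByKey(0, add).collect()  # [('a', 2), ('b', 1)] (order may vary)
```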



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 SPARK-1168/pyspark-foldByKey

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/115.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #115


commit e0dce4bed79f6ba26c25f313110ddb504b367b97
Author: Prashant Sharma 
Date:   2014-03-10T07:24:21Z

SPARK-1168, Added foldByKey to pyspark.






[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-10 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/97#issuecomment-37161692
  
Jenkins, test this please.




[GitHub] spark pull request: SPARK-1170 Added histogram(buckets) to pyspark...

2014-03-11 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/121

SPARK-1170 Added histogram(buckets) to pyspark and not 
histogram(noOfBuckets).

That can be part 2 of this PR. If we can have min and max functions on an
RDD of doubles, that would be good.
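
For intuition, a plain-Python sketch of the bucketing rule with explicit
boundaries, assuming the usual convention that buckets are half-open except
the last one, which is closed (illustrative only, not the PR's code):

```python
import bisect

def bucket_counts(values, buckets):
    # buckets is a sorted list of boundaries [b0, b1, ..., bn]; a value v
    # falls into bucket i when b_i <= v < b_{i+1}, and the last bucket
    # also includes its upper boundary bn.
    counts = [0] * (len(buckets) - 1)
    for v in values:
        if v == buckets[-1]:
            counts[-1] += 1
        elif buckets[0] <= v < buckets[-1]:
            counts[bisect.bisect_right(buckets, v) - 1] += 1
    return counts

print(bucket_counts([1, 2, 12, 20], [0, 10, 20]))  # [2, 2]
```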

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 SPARK-1170/pyspark-histogram

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/121.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #121


commit 6db3a5b63d78550da3c41af7aafe6fa7dd90540c
Author: Prashant Sharma 
Date:   2014-03-11T07:51:22Z

SPARK-1170 Added histogram(buckets) to pyspark and not 
histogram(noOfBuckets).






[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-11 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/93#issuecomment-37272574
  
Hey Matei, thanks!





[GitHub] spark pull request: SPARK-1096, a space after comment style checke...

2014-03-11 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/124

SPARK-1096, a space after comment style checker.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 SPARK-1096/scalastyle-comment-check

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/124.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #124


commit e16693cdf05076a8cea66f73cb1f2b4daaec50fa
Author: Prashant Sharma 
Date:   2014-03-11T11:34:30Z

SPARK-1096, a space after comment style checker.






[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-11 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/125

SPARK-1144 Added license and RAT to check licenses.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 rat-integration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/125.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #125


commit 15ab1158456992da119254eed12d8d1d18da9e2d
Author: Prashant Sharma 
Date:   2014-03-04T05:48:48Z

SPARK-1144 Added license and RAT to check licenses.






[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-11 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/97#issuecomment-37379097
  
Hi Matei,

Does this mean that when the key is None, it would do the same thing as top?
If not, then we would need a max heap, since a min heap will only keep the N
largest entries and not the N smallest.
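
To make the asymmetry concrete: a size-N min heap naturally keeps the N
largest elements, because its root is the smallest survivor and can be
evicted cheaply. A sketch with plain heapq; keeping the N smallest instead
requires flipping the comparison, hence the max-heap question:

```python
import heapq

def n_largest(iterator, n):
    # The heap root is the smallest of the n largest seen so far;
    # heappushpop evicts it whenever a bigger element arrives.
    heap = []
    for x in iterator:
        if len(heap) < n:
            heapq.heappush(heap, x)
        else:
            heapq.heappushpop(heap, x)
    return sorted(heap, reverse=True)

print(n_largest(iter([4, 1, 7, 3, 9]), 3))  # [9, 7, 4]
```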




[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-11 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/125#issuecomment-37379720
  
We did not want to have this in our builds (Maven or SBT), and running it is
so trivial that it might not even need that. I am not sure about the dynamics
of a release, but hopefully this can be a release-only step; if we agree on
that, we can put it in the release script. There is no need to have the jar
in the source (sorry about that).




[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-11 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/125#issuecomment-37379933
  
@pwendell thoughts ?




[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/126#discussion_r10514209
  
--- Diff: core/src/main/scala/org/apache/spark/Dependency.scala ---
@@ -49,9 +49,28 @@ class ShuffleDependency[K, V](
 @transient rdd: RDD[_ <: Product2[K, V]],
 val partitioner: Partitioner,
 val serializerClass: String = null)
-  extends Dependency(rdd.asInstanceOf[RDD[Product2[K, V]]]) {
+  extends Dependency(rdd.asInstanceOf[RDD[Product2[K, V]]]) with Logging {
 
   val shuffleId: Int = rdd.context.newShuffleId()
+
+  override def finalize() {
+try {
+  if (rdd != null) {
+rdd.sparkContext.cleaner.cleanShuffle(shuffleId)
+  }
+} catch {
+  case t: Throwable =>
+// Paranoia - If logError throws error as well, report to stderr.
+try {
+  logError("Error in finalize", t)
--- End diff --

@tdas Hey TD, a try/catch around logging?




[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/124#discussion_r10552608
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala ---
@@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages {
   case class RemoveRdd(rddId: Int) extends ToBlockManagerSlave
 
 
-  //////////////////////////////////////////////////////////////////////////////
+  // ////////////////////////////////////////////////////////////////////////////
   // Messages from slaves to the master.
-  //////////////////////////////////////////////////////////////////////////////
+  // ////////////////////////////////////////////////////////////////////////////

   sealed trait ToBlockManagerMaster
--- End diff --

It does; the space is important. The other option is to disable the check
here by wrapping the block in scalastyle:off and scalastyle:on.




[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/124#discussion_r10552833
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala ---
@@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages {
   case class RemoveRdd(rddId: Int) extends ToBlockManagerSlave
 
 
-  //////////////////////////////////////////////////////////////////////////////
+  // ////////////////////////////////////////////////////////////////////////////
   // Messages from slaves to the master.
-  //////////////////////////////////////////////////////////////////////////////
+  // ////////////////////////////////////////////////////////////////////////////

   sealed trait ToBlockManagerMaster
--- End diff --

Well, even if you use something else, the space has to be there. May I humbly
suggest that we live with it like this?




[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-12 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/124#discussion_r10552982
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala ---
@@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages {
   case class RemoveRdd(rddId: Int) extends ToBlockManagerSlave
 
 
-  //////////////////////////////////////////////////////////////////////////////
+  // ////////////////////////////////////////////////////////////////////////////
   // Messages from slaves to the master.
-  //////////////////////////////////////////////////////////////////////////////
+  // ////////////////////////////////////////////////////////////////////////////

   sealed trait ToBlockManagerMaster
--- End diff --

Modifying the rule will have a turnaround time of at least a few days (send
them a PR, then they publish a snapshot, etc.). I will do that. In the
meantime, what do you suggest?




[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/97#issuecomment-37505304
  
PriorityQueue is, in a way, just a wrapper over heapq that adds blocking put
and get (AFAIU). We would need a max-heap variant to retain the N smallest
elements. One other thing we could do, instead of copying heapq, is to write
a heap of our own in a nicely extensible way that allows one to plug in a
comparator.
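
A sketch of that pluggable-comparator idea: a bounded heap that keeps the N
smallest items according to a caller-supplied key, using a
comparison-reversing wrapper so heapq's min-heap machinery acts as a max heap
(an illustration of the proposal, not the code that was merged):

```python
import heapq

class _ReverseKey(object):
    """Orders items by key descending, so the min-heap root is the largest."""
    def __init__(self, item, key):
        self.item = item
        self.rank = key(item)
    def __lt__(self, other):
        return other.rank < self.rank

class BoundedHeap(object):
    """Keeps the n smallest items according to key(item)."""
    def __init__(self, n, key=lambda x: x):
        self.n, self.key, self._heap = n, key, []

    def put(self, item):
        wrapped = _ReverseKey(item, self.key)
        if len(self._heap) < self.n:
            heapq.heappush(self._heap, wrapped)
        else:
            # The root holds the current largest key; evict it if beaten.
            heapq.heappushpop(self._heap, wrapped)

    def items(self):
        return sorted((w.item for w in self._heap), key=self.key)

h = BoundedHeap(2, key=abs)
for x in [-9, 1, 4, -2]:
    h.put(x)
print(h.items())  # [1, -2]
```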




[GitHub] spark pull request: SPARK-1096, a space after comment start style ...

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/124#discussion_r10555984
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala ---
@@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages {
   case class RemoveRdd(rddId: Int) extends ToBlockManagerSlave
 
 
-  //////////////////////////////////////////////////////////////////////////////
+  // ////////////////////////////////////////////////////////////////////////////
   // Messages from slaves to the master.
-  //////////////////////////////////////////////////////////////////////////////
+  // ////////////////////////////////////////////////////////////////////////////

   sealed trait ToBlockManagerMaster
--- End diff --

So I have sent the scalastyle people a PR with a fix.




[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/125#discussion_r10562618
  
--- Diff: project/plugins.sbt ---
@@ -10,6 +10,8 @@ addSbtPlugin("com.typesafe.sbteclipse" % 
"sbteclipse-plugin" % "2.2.0")
 
 addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.5.1")
 
+libraryDependencies += "org.apache.rat" % "apache-rat" % "0.10"
--- End diff --

accidental commit




[GitHub] spark pull request: Prevent ContextClassLoader of Actor from becom...

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/15#issuecomment-37529958
  
Thanks for the fix. Just for the record, this happens only when
MASTER="local" or local[2].

Looks good. It might be good to add the above test case to ReplSuite, though.




[GitHub] spark pull request: Prevent ContextClassLoader of Actor from becom...

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/15#issuecomment-37530227
  
Mind changing the PR title to add the JIRA ID?




[GitHub] spark pull request: Spark 615 map partitions with index callable f...

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/16#issuecomment-37531010
  
It might be good to add this test to the java8 API suite? I am not sure it is
100% necessary, but one exists for all the other APIs (I hope!). Thoughts?




[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-13 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/97#issuecomment-37618967
  

Hey Matei,

Got rid of copying `heapq.py` and all the license stuff, but resorted to
using heapq's internal API. It should be simpler.

I just checked: heapq hasn't changed much from Python 2.7 to Python 3.4
(the current dev version). There is a pending patch for Python 3.4, or maybe
3.5, which will give us a nice Heap class.





[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/97#issuecomment-37625562
  
They were only added from 2.7.4 onwards, though.




[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...

2014-03-14 Thread ScrapCodes
GitHub user ScrapCodes opened a pull request:

https://github.com/apache/spark/pull/140

SPARK-1246, added min max API to Double RDDs in java and scala APIs.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ScrapCodes/spark-1 SPARK-1246/min-max

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/140.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #140


commit 0b20bc758a41bc483be4e258f7031cc02969c206
Author: Prashant Sharma 
Date:   2014-03-14T12:24:18Z

SPARK-1246, added min max API to Double RDDs in java and scala APIs.






[GitHub] spark pull request: SPARK-1170-pyspark-histogram: added histogram ...

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/122#issuecomment-37642205
  
Hi Daniel,

Thanks for the patch.

It would be good to separate the min/max implementation out into a different
PR, provide RDD.min and RDD.max functions too, and assign it the JIRA
SPARK-1246. I thought of asking you since you already have it in this PR.
Part of it is done in #140 for Java and Scala.

Prashant.




[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/144#issuecomment-37712447
  
Hey Matei,

For a large dataset someone might want to do it in a single pass; with a
stat counter, all of the numbers are calculated in one go.
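
To illustrate the one-go point, a minimal one-pass sketch in the spirit of a
stat counter (not Spark's StatCounter itself); count, mean, min and max all
come out of a single traversal:

```python
def one_pass_stats(values):
    # One traversal of the data computes every summary at once.
    count, total = 0, 0.0
    lo, hi = float("inf"), float("-inf")
    for v in values:
        count += 1
        total += v
        lo = min(lo, v)
        hi = max(hi, v)
    mean = total / count if count else float("nan")
    return {"count": count, "mean": mean, "min": lo, "max": hi}

print(one_pass_stats([3.0, 1.0, 4.0]))  # count 3, mean ~2.67, min 1.0, max 4.0
```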




[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/140#discussion_r10632860
  
--- Diff: project/build.properties ---
@@ -14,4 +14,4 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-sbt.version=0.13.1
+sbt.version=0.13.2-M1
--- End diff --

It was accidental (sorry about that). I use this version of sbt locally since
it is really fast with incremental builds.




[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/144#issuecomment-37712645
  
Ahh, I see the downside: that would work just for numbers then. Makes sense.
Maybe we can have both?




[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...

2014-03-14 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/140#discussion_r10632880
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
---
@@ -86,14 +92,9 @@ class DoubleRDDFunctions(self: RDD[Double]) extends 
Logging with Serializable {
* If the elements in RDD do not vary (max == min) always returns a 
single bucket.
*/
   def histogram(bucketCount: Int): Pair[Array[Double], Array[Long]] = {
-// Compute the minimum and the maxium
-val (max: Double, min: Double) = self.mapPartitions { items =>
-  Iterator(items.foldRight(Double.NegativeInfinity,
-Double.PositiveInfinity)((e: Double, x: Pair[Double, Double]) =>
-(x._1.max(e), x._2.min(e))))
-}.reduce { (maxmin1, maxmin2) =>
-  (maxmin1._1.max(maxmin2._1), maxmin1._2.min(maxmin2._2))
-}
+// Compute the minimum and the maximum from stats once
+val _stats = stats()
+val (max: Double, min: Double) = (_stats.max, _stats.min)
--- End diff --

Okay, will change that.




[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...

2014-03-15 Thread ScrapCodes
Github user ScrapCodes closed the pull request at:

https://github.com/apache/spark/pull/140




[GitHub] spark pull request: SPARK-1121 Only add avro if the build is for H...

2014-02-26 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/6#issuecomment-36217654
  
Rebased!




[GitHub] spark pull request: [HOTFIX] Patching maven build after #6 (SPARK-...

2014-02-28 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/37#issuecomment-36335799
  
Hey Patrick,

Forgive me for this; this is the second time I have messed up the Maven build.

