Github user ScrapCodes commented on a diff in the pull request:
https://github.com/apache/spark/pull/17#discussion_r10227199
--- Diff: extras/java8-tests/README.md ---
@@ -0,0 +1,15 @@
+# Java 8 test suites.
+
+These tests are bundled with spark and run if you have java
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/17#issuecomment-36594377
There is one thing to note: `-java-home` currently has a note; we can actually fix that by moving the check after the args are processed.
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
GitHub user ScrapCodes opened a pull request:
https://github.com/apache/spark/pull/71
[WIP] SPARK-964 Fix for -java-home note.
I just did manual testing of this with -java-home "jdk", setting just JAVA_HOME, and both. Hope it covers all
cases.
It
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/71#issuecomment-36597306
@pwendell Hey Patrick, it might be good to have Jenkins not test the PRs
which start with [WIP] or WIP, or something like that?
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/71#issuecomment-36598821
It does not cover the case where JAVA_HOME points to an invalid directory; it
will simply take the alternate path instead of failing nicely.
GitHub user ScrapCodes opened a pull request:
https://github.com/apache/spark/pull/72
SPARK-1164 Deprecated reduceByKeyToDriver as it is an alias for
reduceByKeyLocally
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ScrapCodes
GitHub user ScrapCodes opened a pull request:
https://github.com/apache/spark/pull/73
SPARK-1109 wrong API docs for pyspark map function
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ScrapCodes/spark-1 SPARK-1109/wrong-API
Github user ScrapCodes closed the pull request at:
https://github.com/apache/spark/pull/71
GitHub user ScrapCodes opened a pull request:
https://github.com/apache/spark/pull/80
Spark 1165 rdd.intersection in python and java
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ScrapCodes/spark-1 SPARK-1165/RDD.intersection
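The pair-up-and-join approach behind rdd.intersection can be sketched in plain Python — a local model of the semantics only, with illustrative names, not the PR's code:

```python
# Minimal sketch of RDD.intersection's semantics: map each element to a
# (element, None) pair, keep keys that appear on both sides (the join/cogroup
# step), and emit each surviving key once (the distinct step).

def intersection(left, right):
    """Return the distinct elements present in both iterables, sorted."""
    left_keys = {x: None for x in left}     # stand-in for left.map(lambda x: (x, None))
    right_keys = {x: None for x in right}   # stand-in for right.map(lambda x: (x, None))
    return sorted(k for k in left_keys if k in right_keys)
```

In Spark the same shape distributes: both RDDs are keyed, joined on the key, and de-duplicated; the dict lookups above play the role of the shuffle.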
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/80#issuecomment-36729592
Jenkins, test this please.
GitHub user ScrapCodes opened a pull request:
https://github.com/apache/spark/pull/93
SPARK-1162 Added top in python.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ScrapCodes/spark-1
SPARK-1162/pyspark-top-takeOrdered
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/93#issuecomment-36887864
@mateiz I am learning python while doing this, so not sure if it is going
to make sense.
Plus, I have not figured out how to implement takeOrdered. Will it be fine if
Github user ScrapCodes commented on a diff in the pull request:
https://github.com/apache/spark/pull/93#discussion_r10370555
--- Diff: python/pyspark/rdd.py ---
@@ -628,6 +669,26 @@ def mergeMaps(m1, m2):
m1[k] += v
return m1
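The mergeMaps pattern visible in the diff fragment above can be sketched as a standalone helper. This is a hedged reconstruction: the diff only shows the `m1[k] += v` merge loop, and the generalized combiner function here is an assumption.

```python
def merge_maps(m1, m2, func):
    """Fold the per-partition dict m2 into m1, combining values for
    duplicate keys with func (the diff's version hardcodes addition)."""
    for k, v in m2.items():
        m1[k] = func(m1[k], v) if k in m1 else v
    return m1
```

Merging partition-level dicts this way is how operations like reduceByKeyLocally collapse results onto the driver.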
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/93#issuecomment-36971911
Jenkins, test this please.
GitHub user ScrapCodes opened a pull request:
https://github.com/apache/spark/pull/97
Spark 1162 Implemented takeOrdered in pyspark.
Since python does not have a library for a max heap, and usual tricks like
inverting values etc. do not work for all cases, the best thing I could
Github user ScrapCodes commented on a diff in the pull request:
https://github.com/apache/spark/pull/97#discussion_r10407050
--- Diff: python/pyspark/maxheapq.py ---
@@ -0,0 +1,115 @@
+# -*- coding: latin-1 -*-
+
+"""Heap queue algorithm (a.k.a.
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/97#issuecomment-37086086
Hey Matei,
PSF License is included now; I was not sure if the entire license history
should be included.
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/104#issuecomment-37096375
Very cool, finally we have this!
GitHub user ScrapCodes opened a pull request:
https://github.com/apache/spark/pull/115
SPARK-1168, Added foldByKey to pyspark.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ScrapCodes/spark-1
SPARK-1168/pyspark-foldByKey
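foldByKey's behavior can be modeled locally as a dict fold — a sketch of the semantics only, not the pyspark implementation, which distributes the fold across partitions:

```python
def fold_by_key(pairs, zero_value, op):
    """For each key, fold its values with op, starting from zero_value."""
    out = {}
    for k, v in pairs:
        out[k] = op(out.get(k, zero_value), v)
    return out
```

As with Spark's foldByKey, zero_value must be an identity for op (e.g. 0 for addition), since in the distributed version it is applied once per partition.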
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/97#issuecomment-37161692
Jenkins, test this please.
GitHub user ScrapCodes opened a pull request:
https://github.com/apache/spark/pull/121
SPARK-1170 Added histogram(buckets) to pyspark and not
histogram(noOfBuckets).
That can be part 2 of this PR. If we can have min and max functions on an
RDD of doubles, that would be good.
You
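The histogram(buckets) behavior described here — user-supplied boundaries rather than a bucket count — can be sketched with bisect. This is a local model; the bucket conventions (left-closed, right-open, with a closed last bucket) are an assumption based on the usual scheme:

```python
import bisect

def histogram(values, buckets):
    """Count values per bucket, where `buckets` is a sorted list of
    boundaries: [b0, b1), [b1, b2), ..., with the last bucket right-closed."""
    counts = [0] * (len(buckets) - 1)
    for v in values:
        if v == buckets[-1]:                  # right-closed last bucket
            counts[-1] += 1
        elif buckets[0] <= v < buckets[-1]:   # out-of-range values are dropped
            counts[bisect.bisect_right(buckets, v) - 1] += 1
    return counts
```

With sorted boundaries each lookup is a binary search, so a partition can be histogrammed in O(n log b) and the per-partition count lists summed elementwise.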
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/93#issuecomment-37272574
Hey Matei, Thanks !
GitHub user ScrapCodes opened a pull request:
https://github.com/apache/spark/pull/124
SPARK-1096, a space after comment style checker.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ScrapCodes/spark-1
SPARK-1096/scalastyle
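The space-after-comment rule can be illustrated with a small regex check. This is a hypothetical illustration of the rule only — not scalastyle's implementation, which works on lexer tokens and so avoids flagging `//` inside string literals or URLs, which a raw regex like this would:

```python
import re

# Flags `//` immediately followed by a non-space, non-slash character,
# i.e. comments like `//no space`. `///` (doc-style) and `// text` pass.
BAD_COMMENT = re.compile(r'//(?![/\s])')

def comment_style_ok(line):
    return BAD_COMMENT.search(line) is None
```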
GitHub user ScrapCodes opened a pull request:
https://github.com/apache/spark/pull/125
SPARK-1144 Added license and RAT to check licenses.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ScrapCodes/spark-1 rat-integration
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/97#issuecomment-37379097
Hi Matei,
Does this mean that when key is None, it would do the same thing as
top? If not, then we would need a max heap, since a min heap will only keep
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/125#issuecomment-37379720
We did not want to have this in our builds (maven or SBT), and running this is
so trivial that it might not even need that. I am not sure about the dynamics
of a release
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/125#issuecomment-37379933
@pwendell thoughts ?
Github user ScrapCodes commented on a diff in the pull request:
https://github.com/apache/spark/pull/126#discussion_r10514209
--- Diff: core/src/main/scala/org/apache/spark/Dependency.scala ---
@@ -49,9 +49,28 @@ class ShuffleDependency[K, V](
@transient rdd: RDD
Github user ScrapCodes commented on a diff in the pull request:
https://github.com/apache/spark/pull/124#discussion_r10552608
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala ---
@@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages
Github user ScrapCodes commented on a diff in the pull request:
https://github.com/apache/spark/pull/124#discussion_r10552833
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala ---
@@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages
Github user ScrapCodes commented on a diff in the pull request:
https://github.com/apache/spark/pull/124#discussion_r10552982
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala ---
@@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/97#issuecomment-37505304
PriorityQueue in a way is just a wrapper over heapq and allows blocking
on put and get (AFAIU). We would need a maxheapq to retain the top N smallest
elements. One
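The max-heap requirement discussed here — retaining the N smallest elements means evicting the current largest — can be sketched for numeric data by negating values pushed into heapq, with the caveat the thread itself raises: negation does not generalize to arbitrary comparable keys. Names are illustrative, not the PR's code:

```python
import heapq

def take_ordered(iterable, num):
    """Keep the num smallest elements using a bounded max-heap,
    simulated by storing negated values (numeric data only)."""
    heap = []  # holds negated values, so -heap[0] is the largest element kept
    for v in iterable:
        if len(heap) < num:
            heapq.heappush(heap, -v)
        elif -heap[0] > v:                 # v beats the current largest kept
            heapq.heapreplace(heap, -v)
    return sorted(-x for x in heap)
```

For the general case the standard library's `heapq.nsmallest(num, iterable)` gives the same result without the negation trick.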
Github user ScrapCodes commented on a diff in the pull request:
https://github.com/apache/spark/pull/124#discussion_r10555984
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala ---
@@ -35,9 +35,9 @@ private[storage] object BlockManagerMessages
Github user ScrapCodes commented on a diff in the pull request:
https://github.com/apache/spark/pull/125#discussion_r10562618
--- Diff: project/plugins.sbt ---
@@ -10,6 +10,8 @@ addSbtPlugin("com.typesafe.sbteclipse" %
"sbteclipse-plugin" % "2.2.0&
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/15#issuecomment-37529958
Thanks for the fix. Just for the record, this happens only when
MASTER="local" or local[2].
Looks good. It might be good to add the above test case in
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/15#issuecomment-37530227
Mind changing the PR title to add Jira ID?
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/16#issuecomment-37531010
It might be good to add this test to the java8 API suite? Not sure if it's 100%
necessary, but there exists one for all other APIs (I hope!). Thoughts?
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/97#issuecomment-37618967
Hey Matei,
Got rid of copying `heapq.py` and all the license stuff, but resorted to
using the internal API of heapq. It should be simpler.
I
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/97#issuecomment-37625562
They were added in 2.7.4 onwards though.
GitHub user ScrapCodes opened a pull request:
https://github.com/apache/spark/pull/140
SPARK-1246, added min max API to Double RDDs in java and scala APIs.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ScrapCodes/spark-1 SPARK
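min and max over an RDD of doubles reduce to a pairwise comparison, which is what makes them a natural fit for the reduce operation — a plain-Python sketch of the idea, not the PR's Scala code:

```python
from functools import reduce

# min/max as a reduce over the data: the pairwise comparison is associative
# and commutative, so it can run per-partition and then across partitions.
def rdd_min(values):
    return reduce(lambda a, b: a if a < b else b, values)

def rdd_max(values):
    return reduce(lambda a, b: a if a > b else b, values)
```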
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/122#issuecomment-37642205
Hi Daniel,
Thanks for the patch.
It would be good to separate out the implementation of min and max into a
different PR and provide RDD.min and RDD.max
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/144#issuecomment-37712447
Hey Matei,
For a large dataset someone might want to do it once; like with stat counter,
all of the numbers are calculated in one go.
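The one-go computation mentioned here — StatCounter-style single-pass stats — can be sketched as a running accumulator. Field names and the method shape are assumptions, not Spark's StatCounter API:

```python
# Single-pass statistics in the spirit of Spark's StatCounter:
# count, mean, min, and max maintained in one traversal of the data.
class StatCounter:
    def __init__(self):
        self.n = 0
        self.mu = 0.0
        self.lo = float('inf')
        self.hi = float('-inf')

    def merge(self, x):
        self.n += 1
        self.mu += (x - self.mu) / self.n   # incremental running mean
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)
        return self
```

The downside noted above follows directly: this only works for numeric values, whereas a bare RDD.min/RDD.max can compare any ordered type.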
Github user ScrapCodes commented on a diff in the pull request:
https://github.com/apache/spark/pull/140#discussion_r10632860
--- Diff: project/build.properties ---
@@ -14,4 +14,4 @@
# See the License for the specific language governing permissions and
# limitations under
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/144#issuecomment-37712645
Ahh, I understand the downside; that would be just for numbers then. Makes
sense. Maybe we can have both?
Github user ScrapCodes commented on a diff in the pull request:
https://github.com/apache/spark/pull/140#discussion_r10632880
--- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala
---
@@ -86,14 +92,9 @@ class DoubleRDDFunctions(self: RDD[Double]) extends
Github user ScrapCodes closed the pull request at:
https://github.com/apache/spark/pull/140
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/6#issuecomment-36217654
Rebased !!
Github user ScrapCodes commented on the pull request:
https://github.com/apache/spark/pull/37#issuecomment-36335799
Hey Patrick,
Forgive me for this; this is the second time I have messed up the maven build.