[GitHub] spark pull request: SPARK-1251 Support for optimizing and executin...

2014-03-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/146#issuecomment-37751624 Hey Michael, I really like the docs and API for this! I tried this out in spark-shell though and saw a few errors: * The built-in SQL seems to be case-sensitive

[GitHub] spark pull request: [SPARK-1241] Add sliding to RDD

2014-03-15 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/136#issuecomment-37749785 For things like n-grams, isn't it okay to do them just per-partition and not worry about doing stuff across partitions? I agree that both this approach and the o

[GitHub] spark pull request: Fix serialization of MutablePair. Also provide...

2014-03-14 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/141#issuecomment-37716246 @rxin did you actually merge this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/144#issuecomment-37714253 Yeah sorry, I didn't mean leave out max and min from StatCounter, I just meant that the RDD.max() and RDD.min() methods should directly call reduce. If you'

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/144#discussion_r10633009 --- Diff: python/pyspark/rdd.py --- @@ -534,7 +534,26 @@ def func(iterator): return reduce(op, vals, zeroValue) # TODO: aggregate

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/144#discussion_r10633006 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -958,6 +958,10 @@ abstract class RDD[T: ClassTag]( */ def takeOrdered(num

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/144#discussion_r10633001 --- Diff: core/src/test/scala/org/apache/spark/PartitioningSuite.scala --- @@ -171,6 +171,8 @@ class PartitioningSuite extends FunSuite with SharedSparkContext

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/144#discussion_r10633002 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala --- @@ -477,6 +477,16 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] exte

[GitHub] spark pull request: SPARK-1240: handle the case of empty RDD when ...

2014-03-14 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/135#issuecomment-37679595 Can you check whether this is broken in Python too, and fix it there as well?

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/144#issuecomment-37679298 It might be better to implement `RDD.min` and `RDD.max` with `reduce` directly instead of building a whole StatCounter for them. Also, can you add these to the Java/Scala
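The suggestion above — computing min/max with a single `reduce` pass instead of building a whole StatCounter — can be sketched in plain Python (function names are illustrative stand-ins, not PySpark's actual API):

```python
from functools import reduce

def rdd_max(values):
    # One reduce pass, as suggested, rather than computing a full
    # StatCounter (count/mean/variance/...) just to read its max.
    return reduce(lambda a, b: a if a >= b else b, values)

def rdd_min(values):
    return reduce(lambda a, b: a if a <= b else b, values)
```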

[GitHub] spark pull request: Spark 1246 add min max to stat counter

2014-03-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/144#discussion_r10620079 --- Diff: python/pyspark/rdd.py --- @@ -24,6 +24,7 @@ import sys import shlex import traceback +from bisect import bisect_right --- End

[GitHub] spark pull request: Fix serialization of MutablePair. Also provide...

2014-03-14 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/141#discussion_r10619938 --- Diff: core/src/main/scala/org/apache/spark/util/MutablePair.scala --- @@ -25,10 +25,20 @@ package org.apache.spark.util * @param _2 Element 2 of

[GitHub] spark pull request: Don't swallow all kryo errors, only those that...

2014-03-14 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/142#issuecomment-37678749 Or perhaps there's a way to check on the Input object itself whether we're done.

[GitHub] spark pull request: Don't swallow all kryo errors, only those that...

2014-03-14 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/142#issuecomment-37678686 Looks good but maybe make the test `e.getMessage.toLowerCase.contains("buffer underflow")`, in case they change the wording.
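The robustness idea in the comment above — match the exception message case-insensitively so a wording or casing change upstream doesn't break the check — can be sketched in Python (the helper name is hypothetical; the Spark code in question is Scala):

```python
def is_buffer_underflow(exc):
    # Lowercase the message so the test survives casing changes,
    # mirroring e.getMessage.toLowerCase.contains("buffer underflow").
    return "buffer underflow" in str(exc).lower()
```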

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-13 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/97#issuecomment-37619558 Got it, I think this is okay for now then, but please add some comments in the code to explain that this is an internal API and didn't seem to change across Python ver

[GitHub] spark pull request: [Spark-1234] clean up text in running-on-yarn....

2014-03-13 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/130#issuecomment-37505269 Jenkins, add to whitelist and test this please

[GitHub] spark pull request: [SPARK-1198] Allow pipes tasks to run in diffe...

2014-03-13 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/128#discussion_r10554415 --- Diff: core/pom.xml --- @@ -184,13 +184,12 @@ metrics-graphite - org.apache.derby - derby - test

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-12 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/97#issuecomment-37505082 BTW as mentioned above please use PriorityQueue here instead of copying their heap. It's just a lot of work to copy the heap... we can take the performance hit in

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-12 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/93#issuecomment-37479585 Looks good, thanks! I'll merge this.

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-12 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/97#issuecomment-37479507 takeOrdered should always return the smallest elements according to the ordering, so it's not the same as top. For example takeOrdered(2) on [1,2,3,4] should return
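The distinction drawn above — `takeOrdered` always returns the smallest elements under the ordering, while `top` returns the largest — can be illustrated with the standard-library `heapq` (a standalone sketch over plain lists, not PySpark itself):

```python
import heapq

def take_ordered(data, num):
    # Smallest `num` elements, in ascending order.
    return heapq.nsmallest(num, data)

def top(data, num):
    # Largest `num` elements, in descending order.
    return heapq.nlargest(num, data)
```

For example, `take_ordered([1, 2, 3, 4], 2)` yields `[1, 2]`, whereas `top([1, 2, 3, 4], 2)` yields `[4, 3]`.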

[GitHub] spark pull request: SPARK-1230: [WIP] Enable SparkContext.addJars(...

2014-03-12 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/119#issuecomment-37450691 About turning this on by default, I'm afraid it will mess up uses of Spark inside a servlet container or similar. Maybe we can keep it off at first.

[GitHub] spark pull request: SPARK-1230: [WIP] Enable SparkContext.addJars(...

2014-03-12 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/119#discussion_r10532081 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -130,6 +130,18 @@ class SparkContext( val isLocal = (master == "

[GitHub] spark pull request: SPARK-1230: [WIP] Enable SparkContext.addJars(...

2014-03-12 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/119#discussion_r10532017 --- Diff: docs/configuration.md --- @@ -393,6 +393,16 @@ Apart from these, the following properties are also available, and may be useful

[GitHub] spark pull request: SPARK-1230: [WIP] Enable SparkContext.addJars(...

2014-03-12 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/119#discussion_r10531950 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -767,6 +781,20 @@ class SparkContext( case _ => p

[GitHub] spark pull request: SPARK-1230: [WIP] Enable SparkContext.addJars(...

2014-03-12 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/119#discussion_r10531841 --- Diff: core/src/test/scala/org/apache/spark/TestUtils.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark pull request: SPARK-1230: [WIP] Enable SparkContext.addJars(...

2014-03-12 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/119#discussion_r10531878 --- Diff: core/src/test/scala/org/apache/spark/TestUtils.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark pull request: SPARK-1230: [WIP] Enable SparkContext.addJars(...

2014-03-12 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/119#discussion_r10531729 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -767,6 +781,20 @@ class SparkContext( case _ => p

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-11 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/97#issuecomment-37329386 Hi Prashant, For this feature I think it would be better to use a "key" function instead of a boolean flag for the order. So make the API like this: `
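A key function, as proposed above, subsumes a boolean order flag: reversing the order is just a negating key. A minimal sketch of that API shape (the signature is illustrative, not necessarily PySpark's final one):

```python
import heapq

def take_ordered(data, num, key=None):
    # `key` controls the ordering; e.g. key=lambda x: -x gives the
    # same result a "descending" boolean flag would, without a flag.
    return heapq.nsmallest(num, data, key=key)
```

So `take_ordered([1, 3, 2, 4], 2)` yields `[1, 2]`, and the same call with `key=lambda x: -x` yields `[4, 3]`.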

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-11 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/93#discussion_r10486433 --- Diff: python/pyspark/rdd.py --- @@ -628,6 +656,31 @@ def mergeMaps(m1, m2): m1[k] += v return m1 return

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-11 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/93#discussion_r10486216 --- Diff: python/pyspark/rdd.py --- @@ -628,6 +656,31 @@ def mergeMaps(m1, m2): m1[k] += v return m1 return

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-10 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/93#issuecomment-37232223 In particular note that you can use `heapq.heappushpop` to add each item and remove the smallest one when the heap reaches the required size. Before that, just use
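The pattern described above — push each item with `heapq.heappushpop` once the heap reaches the required size — keeps memory bounded at that size while tracking the largest elements seen. A sketch (function name is illustrative):

```python
import heapq

def top_n(iterable, n):
    heap = []  # min-heap holding the n largest elements seen so far
    for x in iterable:
        if len(heap) < n:
            heapq.heappush(heap, x)
        else:
            # Pushes x and pops the smallest in one call, so the heap
            # never grows past n elements.
            heapq.heappushpop(heap, x)
    return sorted(heap, reverse=True)
```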

[GitHub] spark pull request: SPARK-1162 Added top in python.

2014-03-10 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/93#issuecomment-37231853 Hey Prashant, I looked at this but it seems that the Queue module in Python is used for thread-safe queues, meaning it will have a lot of unnecessary overhead for what we

[GitHub] spark pull request: SPARK-1168, Added foldByKey to pyspark.

2014-03-10 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/115#issuecomment-37231009 Looks good, thanks.

[GitHub] spark pull request: [SPARK-972] Added detailed callsite info for V...

2014-03-10 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/34#issuecomment-37230775 Sorry for the late reply; I've now merged this. Thanks!

[GitHub] spark pull request: SPARK-977 Added Python RDD.zip function

2014-03-10 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/76#issuecomment-37229918 Thanks Prabin; I've merged this.

[GitHub] spark pull request: SPARK-1102: Create a saveAsNewAPIHadoopDataset...

2014-03-10 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/12#issuecomment-37229092 Sorry, haven't had time to look at this lately, but will do soon.

[GitHub] spark pull request: Spark-1163, Added missing Python RDD functions

2014-03-10 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/92#discussion_r10444769 --- Diff: python/pyspark/rdd.py --- @@ -1057,6 +1058,64 @@ def coalesce(self, numPartitions, shuffle=False): jrdd = self._jrdd.coalesce

[GitHub] spark pull request: SPARK-1205: Clean up callSite/origin/generator...

2014-03-10 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/106#issuecomment-37220219 Looks good, though I guess it has a compile error

[GitHub] spark pull request: Add role and checkpoint support for Mesos back...

2014-03-09 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/60#issuecomment-37156364 Not sure if you saw this, @iven, but Ben put some good comments above -- would be good to fix those.

[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...

2014-03-09 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/108#issuecomment-37156275 Cool, looks good then.

[GitHub] spark pull request: SPARK-1205: Clean up callSite/origin/generator...

2014-03-09 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/106#issuecomment-37156227 Actually I do have one other comment, maybe we should call this getCreationSite / creationSite so as not to confuse it with the call site of a job. This is really the

[GitHub] spark pull request: SPARK-1205: Clean up callSite/origin/generator...

2014-03-09 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/106#discussion_r10420059 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -830,13 +830,10 @@ class SparkContext( setLocalProperty("externalCal

[GitHub] spark pull request: Ability to initialize Spark-Shell with command...

2014-03-09 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/109#issuecomment-37156133 Yeah let's add those flags to help. I didn't realize that they weren't shown.

[GitHub] spark pull request: SPARK-1019: pyspark RDD take() throws an NPE

2014-03-09 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/112#issuecomment-37156103 Interesting, good catch. I might've written some of the code to get PySpark to stop tasks early a while back.

[GitHub] spark pull request: SPARK-1205: Clean up callSite/origin/generator...

2014-03-09 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/106#discussion_r10417318 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1031,8 +1026,10 @@ abstract class RDD[T: ClassTag]( private var storageLevel

[GitHub] spark pull request: SPARK-1205: Clean up callSite/origin/generator...

2014-03-09 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/106#issuecomment-37147113 I'd say just remove them -- this is a pretty small feature and easy to work around if anyone used it.

[GitHub] spark pull request: Ability to initialize Spark-Shell with command...

2014-03-09 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/109#issuecomment-37147073 You can actually already use the `-i` argument on `spark-shell` (similar to on `scala`) to load a script at the beginning. Is that enough? We don't want to diverg

[GitHub] spark pull request: SPARK-1167: Remove metrics-ganglia from defaul...

2014-03-09 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/108#discussion_r10417262 --- Diff: docs/monitoring.md --- @@ -48,11 +48,22 @@ Each instance can report to zero or more _sinks_. Sinks are contained in the * `ConsoleSink

[GitHub] spark pull request: Spark 1162 Implemented takeOrdered in pyspark.

2014-03-07 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/97#discussion_r10406957 --- Diff: python/pyspark/maxheapq.py --- @@ -0,0 +1,115 @@ +# -*- coding: latin-1 -*- + +"""Heap queue algorithm (a.k.a.

[GitHub] spark pull request: Spark-1163, Added missing Python RDD functions

2014-03-07 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/92#issuecomment-37084601 Jenkins, test this please

[GitHub] spark pull request: SPARK-782 Clean up for ASM dependency.

2014-03-07 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/100#issuecomment-37084570 Will this also work on Java 8?

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/79#issuecomment-36946645 That is Jenkins complaining about the style BTW, hopefully should be easy to fix. You can run sbt/sbt scalastyle to run the same tests locally.

[GitHub] spark pull request: Spark 1165 rdd.intersection in python and java

2014-03-06 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/80#discussion_r10364898 --- Diff: python/pyspark/rdd.py --- @@ -319,6 +319,22 @@ def union(self, other): return RDD(self_copy._jrdd.union(other_copy._jrdd), self.ctx

[GitHub] spark pull request: MLI-1 Decision Trees

2014-03-06 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/79#issuecomment-36946054 Jenkins, add to whitelist and test this please

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-06 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36945861 I see, regarding the memory part, it sounds like we could do it in bash, but it might be kind of painful. We could do the following: - Look for just the driver memory

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36828535 Also, not sure what people think about calling this "spark-submit" instead of "spark-app". For the in-cluster use case it's really just for submit

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/86#issuecomment-36828505 Hey Sandy, the overall approach looks good, though I made some comments throughout. It would be really nice to avoid launching a second JVM if possible. It seems that the

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10332780 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkAppArguments.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10332726 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10332714 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10332654 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10332644 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10332582 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala --- @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: SPARK-1156: allow user to login into a cluster...

2014-03-05 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/58#issuecomment-36827118 Nope, I think this is good to go. Going to merge it.

[GitHub] spark pull request: SPARK-1187, Added missing Python APIs

2014-03-04 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/75#issuecomment-36715060 Jenkins, this is ok to test

[GitHub] spark pull request: SPARK-977 Added Python RDD.zip function

2014-03-04 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/76#discussion_r10289240 --- Diff: python/pyspark/rdd.py --- @@ -1057,6 +1057,24 @@ def coalesce(self, numPartitions, shuffle=False): jrdd = self._jrdd.coalesce

[GitHub] spark pull request: SPARK-977 Added Python RDD.zip function

2014-03-04 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/76#issuecomment-36714999 Jenkins, this is ok to test

[GitHub] spark pull request: SPARK-1109 wrong API docs for pyspark map func...

2014-03-04 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/73#issuecomment-36692956 Thanks, merged in 0.9 and master

[GitHub] spark pull request: [java8API] SPARK-964 Investigate the potential...

2014-03-03 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/17#issuecomment-36594079 Looks good to me too. I read through the updated docs and build instructions.

[GitHub] spark pull request: Spark 615 map partitions with index callable f...

2014-03-03 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/16#issuecomment-36593941 Hey Holden, wait on this a bit until https://github.com/apache/spark/pull/17 is merged. Then we'll also want to make sure it works with Java 8 (you'll need t

[GitHub] spark pull request: SPARK-1145: Memory mapping with many small blo...

2014-03-03 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/43#discussion_r10243564 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala --- @@ -146,6 +146,12 @@ object BlockFetcherIterator

[GitHub] spark pull request: SPARK-1156: allow user to login into a cluster...

2014-03-03 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/58#discussion_r10243513 --- Diff: ec2/spark_ec2.py --- @@ -680,6 +678,9 @@ def real_main(): opts.zone = random.choice(conn.get_all_zones()).name if action

[GitHub] spark pull request: Add role and checkpoint support for Mesos back...

2014-03-03 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/60#discussion_r10243489 --- Diff: docs/running-on-mesos.md --- @@ -15,13 +15,15 @@ Spark can run on clusters managed by [Apache Mesos](http://mesos.apache.org/). F * `export

[GitHub] spark pull request: Add role and checkpoint support for Mesos back...

2014-03-03 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/60#discussion_r10243444 --- Diff: docs/configuration.md --- @@ -134,6 +134,22 @@ Apart from these, the following properties are also available, and may be useful

[GitHub] spark pull request: Fixed API docs link in Python programming guid...

2014-03-03 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/67#issuecomment-36593463 Maybe there was temporarily a wrong version of the docs published -- we republished them recently. But the links seem to work now, so I think it's fine to close t

[GitHub] spark pull request: Remove broken/unused Connection.getChunkFIFO m...

2014-03-03 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/69#issuecomment-36593399 I've merged this, thanks.

[GitHub] spark pull request: SPARK-1158: Fix flaky RateLimitedOutputStreamS...

2014-03-03 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/55#issuecomment-36593259 Looks good to me. I'm going to merge it.

[GitHub] spark pull request: Patch for SPARK-942

2014-03-03 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10243269 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -71,10 +71,21 @@ private[spark] class CacheManager(blockManager: BlockManager) extends

[GitHub] spark pull request: Add role and checkpoint support for Mesos back...

2014-03-03 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/60#issuecomment-36583145 Ah, got it. In that case please expand the doc for the spark.mesos.checkpoint entry to explain that, and maybe link to the corresponding Mesos docs. Otherwise the current

[GitHub] spark pull request: [SPARK-972] Added detailed callsite info for V...

2014-03-03 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/34#issuecomment-36580331 Sure, named tuple sounds good.

[GitHub] spark pull request: Patch for SPARK-942

2014-03-03 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/50#issuecomment-36580259 Alright, sounds good. Looking forward to it.

[GitHub] spark pull request: Remove broken/unused Connection.getChunkFIFO m...

2014-03-03 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/69#issuecomment-36580142 @tdas should take a look at this actually, I think it was his code. But yes there's no reason to keep FIFO.

[GitHub] spark pull request: Added a unit test for PairRDDFunctions.lookup

2014-03-03 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/36#issuecomment-36579672 I've merged this, thanks!

[GitHub] spark pull request: Updated the formatting of code blocks using Gi...

2014-03-03 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/68#discussion_r10235312 --- Diff: docs/python-programming-guide.md --- @@ -6,7 +6,7 @@ title: Python Programming Guide The Spark Python API (PySpark) exposes the Spark

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201307 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -59,24 +59,45 @@ private class MemoryStore(blockManager: BlockManager, maxMemory

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201291 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -52,11 +52,21 @@ private class DiskStore(blockManager: BlockManager, diskManager

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201264 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -549,34 +555,43 @@ private[spark] class BlockManager( var

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201268 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -549,34 +555,43 @@ private[spark] class BlockManager( var

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/50#issuecomment-36483263 Hey Kyle, thanks for bringing this to the new repo. I looked through it and made a few comments. Another concern though is that it would be good to make this work for
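SPARK-942 concerns persisting a partition whose iterator is too large to materialize in memory: with disk-only storage, elements can be streamed to disk in small batches instead of being buffered as one big array. A minimal sketch of that idea in plain Python (the file path, batch size, and pickle-based format are illustrative assumptions, not taken from the patch):

```python
import os
import pickle
import tempfile

def spill_iterator_to_disk(it, path, batch_size=1000):
    """Write elements in batches so the whole iterator is never held in memory."""
    with open(path, "wb") as f:
        batch = []
        for elem in it:
            batch.append(elem)
            if len(batch) >= batch_size:
                pickle.dump(batch, f)
                batch = []
        if batch:  # flush the final partial batch
            pickle.dump(batch, f)

def read_spilled(path):
    """Stream the batches back as one flat iterator."""
    with open(path, "rb") as f:
        while True:
            try:
                batch = pickle.load(f)
            except EOFError:
                return
            yield from batch

path = os.path.join(tempfile.mkdtemp(), "block")
spill_iterator_to_disk(iter(range(2500)), path, batch_size=1000)
print(sum(read_spilled(path)))  # 3123750
```

At no point is more than one batch resident in memory, which is the property the patch is after for large flatMap outputs.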

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201195 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -71,10 +71,21 @@ private[spark] class CacheManager(blockManager: BlockManager) extends

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201182 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -534,8 +539,9 @@ private[spark] class BlockManager( // If we'

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201181 --- Diff: core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala --- @@ -23,9 +23,27 @@ import java.nio.ByteBuffer import

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201177 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -71,10 +71,21 @@ private[spark] class CacheManager(blockManager: BlockManager) extends

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201097 --- Diff: core/src/test/scala/org/apache/spark/storage/FlatmapIteratorSuite.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201080 --- Diff: core/src/test/scala/org/apache/spark/storage/FlatmapIteratorSuite.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: Patch for SPARK-942

2014-03-02 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/50#discussion_r10201076 --- Diff: core/src/test/scala/org/apache/spark/storage/FlatmapIteratorSuite.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-1156: allow user to login into a cluster...

2014-03-02 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/58#issuecomment-36481941 Anyway maybe let's do it like this: if you test it with this change and see that all the commands (stop, resume, etc) still work, then we can keep it. But we should

[GitHub] spark pull request: SPARK-1156: allow user to login into a cluster...

2014-03-02 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/58#issuecomment-36481811 In that case you can still log into the master with ssh. I guess we could make the "login" command work then; I'm just wondering whether removing this check

[GitHub] spark pull request: Add role and checkpoint support for Mesos back...

2014-03-02 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/60#issuecomment-36481768 Jenkins, test this please

[GitHub] spark pull request: Add role and checkpoint support for Mesos back...

2014-03-02 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/60#issuecomment-36481666 CC @benh
