[GitHub] spark pull request: SPARK-1244: Throw exception if map output stat...

2014-03-15 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/152#discussion_r10637484 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -35,13 +35,21 @@ private[spark] case class GetMapOutputStatuses(shuffleId

[GitHub] spark pull request: SPARK-1246, added min max API to Double RDDs i...

2014-03-14 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/140#discussion_r10618038 --- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala --- @@ -86,14 +92,9 @@ class DoubleRDDFunctions(self: RDD[Double]) extends

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-13 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10590763 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala --- @@ -21,20 +21,16 @@ import java.io._ import java.util.zip

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-13 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10590604 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -123,17 +123,17 @@ class DAGScheduler( private val

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-13 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10590414 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1025,6 +1025,14 @@ abstract class RDD[T: ClassTag]( checkpointData.flatMap

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-13 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10589028 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -50,23 +50,26 @@ private[spark] class MapOutputTrackerMasterActor(tracker

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-13 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10588418 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala --- @@ -17,28 +17,24 @@ package org.apache.spark.scheduler

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-13 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10586047 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala --- @@ -17,28 +17,24 @@ package org.apache.spark.scheduler

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-13 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10579603 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-13 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10578363 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: SPARK-1144 Added license and RAT to check lice...

2014-03-11 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/125#issuecomment-37325883 @srowen Do you mean something other than ``` <parent> <groupId>org.apache</groupId> <artifactId>apache</artifactId> <version>13</version> </parent> ``` ...which is part of the maven build and already

[GitHub] spark pull request: SPARK-1181. 'mvn test' fails out of the box si...

2014-03-04 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/77#issuecomment-36659947 The standard maven build procedure should be to run `mvn -DskipTests package` first (which builds the assembly) and then `mvn test`. The "Building Spark with

[GitHub] spark pull request: [Proposal] SPARK-1171: simplify the implementa...

2014-03-04 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/63#discussion_r10262224 --- Diff: core/src/main/scala/org/apache/spark/scheduler/WorkerOffer.scala --- @@ -21,4 +21,4 @@ package org.apache.spark.scheduler * Represents free

[GitHub] spark pull request: [Proposal] SPARK-1171: simplify the implementa...

2014-03-03 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/63#discussion_r1026 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -125,14 +126,17 @@ class

[GitHub] spark pull request: [Proposal] SPARK-1171: simplify the implementa...

2014-03-03 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/63#discussion_r10232174 --- Diff: core/src/main/scala/org/apache/spark/scheduler/WorkerOffer.scala --- @@ -21,4 +21,6 @@ package org.apache.spark.scheduler * Represents free

[GitHub] spark pull request: Remove remaining references to incubation

2014-03-01 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/51#issuecomment-36443418 Ah, I see. create-release.sh was handled in another PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request: Remove remaining references to incubation

2014-03-01 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/51#issuecomment-36442877 Looks good. The only remaining incubat* I find are in dev/create-release/create-release.sh, but I'm not sure how you use that script.

[GitHub] spark pull request: [SPARK-979] Randomize order of offers.

2014-02-27 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/27#issuecomment-36277558 I see two issues: 1) The deterministic nature of the current scheduler places tasks on the same small set of machines while leaving others largely unused; 2) There is

[GitHub] spark pull request: SPARK-1121 Only add avro if the build is for H...

2014-02-27 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/6#issuecomment-36267797 This broke the maven build. Also, both SBT and Maven are still building artifacts with "incubating". [ERROR] The project org.apache.spark:spark

[GitHub] spark pull request: [SPARK-1146] Vagrant support for Spark

2014-02-27 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/26#issuecomment-36226183 Yes, they definitely have value, but putting them directly into Spark also has costs and imposes responsibilities on the maintainers. The question is how to get the

[GitHub] spark pull request: [SPARK-1146] Vagrant support for Spark

2014-02-27 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/26#issuecomment-36224757 I'm bothered by the idea of vagrant, docker, ec2, and potentially other virtualization and cloud environments (EMR, etc.) all becoming supported and maintained