[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10553393 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/126#issuecomment-37501973 How do immutable HashMaps help to store metadata? For example, how would you store block ID --> block info in the BlockManager using immutable HashMaps?

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10553293 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/126#issuecomment-37501724 @rxin It overrides stuff to make sure that things like traversing the entire HashMap do not happen. They are meant to be drop-in replacements for Scala HashMaps when
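
The thread does not show how the maps under discussion are implemented. As a rough illustration only, a map that holds its values through weak references can be sketched in Scala as below. The class name WeakValueMap and its methods are hypothetical and are not the code from this PR; the sketch assumes nothing beyond java.lang.ref.WeakReference and scala.collection.mutable.HashMap.

```scala
import java.lang.ref.WeakReference
import scala.collection.mutable

// Hypothetical sketch of a weak-value map; NOT the implementation from the PR.
class WeakValueMap[K, V <: AnyRef] {
  // Values are wrapped in WeakReference so the map alone does not keep them alive.
  private val underlying = mutable.HashMap.empty[K, WeakReference[V]]

  def put(key: K, value: V): Unit =
    underlying.update(key, new WeakReference[V](value))

  // Returns the value only if the key exists and the value has not been collected.
  def get(key: K): Option[V] =
    underlying.get(key).flatMap(ref => Option(ref.get))

  // Removes entries whose values have been collected; returns how many were removed.
  def clearNullValues(): Int = {
    val dead = underlying.collect { case (k, ref) if ref.get == null => k }.toList
    dead.foreach(k => underlying.remove(k))
    dead.size
  }

  def size: Int = underlying.size
}
```

In this sketch an entry disappears from lookups once nothing else holds a strong reference to its value and the garbage collector has run, which is why any cleanup built on it can only be best-effort.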

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10553150 --- Diff: core/src/main/scala/org/apache/spark/util/BoundedHashMap.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10553128 --- Diff: core/src/main/scala/org/apache/spark/util/TimeStampedWeakValueHashMap.scala --- @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552915 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552886 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -20,15 +20,15 @@ package org.apache.spark import java.io._ import

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552879 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552817 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552793 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -181,15 +178,50 @@ private[spark] class MapOutputTracker(conf: SparkConf) extends

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552777 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala --- @@ -17,28 +17,24 @@ package org.apache.spark.scheduler

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552692 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1025,6 +1025,14 @@ abstract class RDD[T: ClassTag]( checkpointData.flatMap

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552625 --- Diff: core/src/main/scala/org/apache/spark/util/TimeStampedWeakValueHashMap.scala --- @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552496 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -181,15 +178,50 @@ private[spark] class MapOutputTracker(conf: SparkConf) extends

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552205 --- Diff: core/src/main/scala/org/apache/spark/util/TimeStampedWeakValueHashMap.scala --- @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10552156 --- Diff: core/src/main/scala/org/apache/spark/util/TimeStampedWeakValueHashMap.scala --- @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/126#issuecomment-37498313 @yaoshengzhe This is only a safe, best-effort attempt to clean metadata, so no guarantee is being provided here. All we are trying to do for long-running Spark

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10550676 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -50,23 +54,26 @@ private[spark] class MapOutputTrackerMasterActor(tracker

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10550660 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1025,6 +1025,14 @@ abstract class RDD[T: ClassTag]( checkpointData.flatMap

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10549577 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10549507 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10549460 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10548672 --- Diff: core/src/main/scala/org/apache/spark/ContextCleaner.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/126#issuecomment-37485786 @yaoshengzhe I agree that using a finalizer is not the most ideal thing in the world. However, the problem that we are dealing with here is that there is no clean and safe way to

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-12 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/126#discussion_r10533790 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -49,9 +49,28 @@ class ShuffleDependency[K, V]( @transient rdd: RDD[_ <: Product

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-11 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/126#issuecomment-37358288 HAHA! I was already working on adding that try-catch. Realized that a bit late after the PR. And yes, super.finalize() is a good call.
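
The two points mentioned in this comment, a try-catch inside the finalizer and a call to super.finalize(), can be illustrated with a minimal Scala sketch. The class TrackedResource, the helper safeCleanup, and the onCleanup callback are hypothetical names that do not come from the PR; the sketch only shows the general pattern of keeping exceptions from escaping a finalizer while always delegating to the superclass.

```scala
import scala.util.control.NonFatal

// Hypothetical resource holder; names are illustrative and not taken from the PR.
class TrackedResource(val id: Int, onCleanup: Int => Unit) {

  // Runs the cleanup callback and reports success, so that a failing
  // callback cannot throw out of the finalizer.
  def safeCleanup(): Boolean =
    try {
      onCleanup(id)
      true
    } catch {
      case NonFatal(e) =>
        System.err.println(s"Cleanup of resource $id failed: ${e.getMessage}")
        false
    }

  // Note: Object.finalize is deprecated on recent JDKs; shown here only
  // because the thread discusses finalizers.
  override protected def finalize(): Unit = {
    try {
      safeCleanup()
      ()
    } finally {
      super.finalize() // always give the superclass its chance to finalize
    }
  }
}
```

Because the JVM gives no guarantee about when, or whether, a finalizer runs, the cleanup logic lives in a separate method that can also be called explicitly.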

[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

2014-03-11 Thread tdas
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/126 [SPARK-1103] [WIP] Automatic garbage collection of RDD, shuffle and broadcast data This PR allows Spark to automatically clean up metadata and data related to persisted RDDs, shuffles and broadcast