[GitHub] spark pull request: [SPARK-1103] [WIP] Automatic garbage collectio...

andrewor14 Wed, 12 Mar 2014 14:57:12 -0700

Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/126#discussion_r10543349
  
    --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
    @@ -181,15 +186,49 @@ private[spark] class MapOutputTracker(conf: SparkConf) extends Logging {
       }
     }
     
    +/**
    + * MapOutputTracker for the workers. This uses BoundedHashMap to keep track of
    + * a limited number of most recently used map output information.
    + */
    +private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends MapOutputTracker(conf) {
    +
    +  /**
    +   * Bounded HashMap for storing serialized statuses in the worker. This allows
    +   * the HashMap stay bounded in memory-usage. Things dropped from this HashMap will be
    +   * automatically repopulated by fetching them again from the driver.
    +   */
    +  protected val MAX_MAP_STATUSES = 100
    --- End diff --
    
    Is this value arbitrary? Similar hard-coded bounds appear in other places as well. Maybe we should make them configurable.
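    For illustration, here is a minimal sketch of what a configurable bound might look like, assuming an LRU eviction policy ("most recently used" in the doc comment above). It does not reproduce the PR's `BoundedHashMap`, whose API is not shown here; it uses `java.util.LinkedHashMap` in access order with `removeEldestEntry`. The class name `BoundedLruMap` and the key `spark.mapOutputTracker.maxStatuses` are hypothetical. A plain `Map[String, String]` stands in for `SparkConf` so the sketch compiles without Spark. Inside the tracker, the limit could instead come from `conf.getInt(key, default)` on the `SparkConf` it already receives.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical sketch: an LRU-bounded map whose capacity is supplied by configuration
// instead of a hard-coded constant such as MAX_MAP_STATUSES = 100.
class BoundedLruMap[K, V](maxEntries: Int)
    extends JLinkedHashMap[K, V](16, 0.75f, true) { // accessOrder = true: iteration order is least to most recently used
  require(maxEntries > 0, "maxEntries must be positive")

  // LinkedHashMap calls this after each insertion; returning true evicts the
  // least recently used entry, so the map never holds more than maxEntries.
  override protected def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
    size() > maxEntries
}

object BoundedLruMap {
  // Hypothetical configuration key; not an existing Spark setting.
  val MaxStatusesKey = "spark.mapOutputTracker.maxStatuses"
  // Falls back to the value currently hard-coded in the diff.
  val DefaultMaxStatuses = 100

  // Stand-in for SparkConf: reads the bound from a plain string map.
  def fromConf[K, V](conf: Map[String, String]): BoundedLruMap[K, V] = {
    val limit = conf.get(MaxStatusesKey).map(_.toInt).getOrElse(DefaultMaxStatuses)
    new BoundedLruMap[K, V](limit)
  }
}
```

    An entry evicted this way would simply be fetched from the driver again on the next lookup, as the doc comment describes. Because the key has a default, existing deployments would keep today's behavior, and users with many shuffles could raise the limit.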


