[ https://issues.apache.org/jira/browse/HDFS-355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved HDFS-355.
-----------------------------------
    Resolution: Fixed

Federation sort of fixes this. Closing.

> Ability to throttle DFS/MR so as not to overwhelm colo to colo switches
> -----------------------------------------------------------------------
>
>                 Key: HDFS-355
>                 URL: https://issues.apache.org/jira/browse/HDFS-355
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Pete Wyckoff
>
> Motivation:
> This would allow people to put data that is not used as often in a
> non-co-located HDFS instance and pull it from the other cluster when needed.
> This is useful in the context of Hive, where a Metastore tells the runtime
> system where the data is located (the full URI), or with symbolic links.
> The problem:
> This will not work right now because it may overwhelm the switches between
> the two instances.
> Workaround:
> Make the files unsplittable, or choose a block size such that you only get
> 2-3 mappers.
> Possible solution:
> Throttle parallelism in the scheduler by specifying that only X mappers run
> for a job, no matter how many slots are free (this makes some assumptions
> about the reliability of the JobTracker's failure detector).

--
This message was sent by Atlassian JIRA
(v6.2#6252)