Hey Lohit,

This is an interesting topic, and something I actually worked on in grad school before coming to Cloudera. It'd help if you could outline some of your use cases and how per-FileSystem throttling would help.

For what I was doing, it made more sense to throttle on the DN side, since you have a better view of all the I/O happening on the system, and you have knowledge of the different volumes so you can set limits per disk. Even that isn't 100% reliable, though, since a portion of each disk is normally used for MR scratch space, which the DN doesn't have control over. I tried playing with thread I/O priorities here, but didn't see much improvement. Maybe the newer cgroups stuff can help out.
I'm sure per-FileSystem throttling will have some benefits (and is probably easier than a DN-side implementation), but again, it'd help to better understand the problem you are trying to solve.

Best,
Andrew

On Mon, Nov 11, 2013 at 6:16 PM, Haosong Huang <haosd...@gmail.com> wrote:

> Hi, lohit. There is a class named ThrottledInputStream
> <http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java>
> in hadoop-distcp; you could check it out and find more details.
>
> In addition to this, I am working on resource control (including CPU,
> network, and disk I/O) in the JVM. My implementation depends on cgroups,
> though, so it can only run on Linux. I will push my library (java-cgroup)
> to GitHub in the next several months. If you are interested in it, please
> give me your advice and help me improve it. :-)
>
> On Tue, Nov 12, 2013 at 3:47 AM, lohit <lohit.vijayar...@gmail.com> wrote:
>
> > Hi Adam,
> >
> > Thanks for the reply. The changes I was referring to are in the
> > FileSystem.java layer, which should not affect HDFS replication or
> > NameNode operations. To give a better idea, this would affect clients
> > something like this:
> >
> > Configuration conf = new Configuration();
> > conf.setInt("read.bandwidth.mbpersec", 20); // 20 MB/s
> > FileSystem fs = FileSystem.get(conf);
> >
> > FSDataInputStream fis = fs.open("/path/to/file.txt");
> > fis.read(); // <-- This would be capped at 20 MB/s
> >
> > 2013/11/11 Adam Muise <amu...@hortonworks.com>
> >
> > > See https://issues.apache.org/jira/browse/HDFS-3475
> > >
> > > Please note that this has met with many unexpected impacts on
> > > workload. Be careful and be mindful of your Datanode memory and
> > > network capacity.
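[Editor's note: the ThrottledInputStream approach mentioned above boils down to counting bytes and sleeping whenever the observed average rate exceeds a cap. Below is a minimal, self-contained sketch of that idea; the class name is hypothetical and the real hadoop-distcp implementation differs in detail.]

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Simplified sketch of a bandwidth-capped InputStream: it tracks bytes read
 * since the stream was opened and sleeps whenever the average rate would
 * exceed the configured limit. Illustration only, not the hadoop-distcp class.
 */
public class RateLimitedInputStream extends FilterInputStream {
    private final long maxBytesPerSec;
    private final long startTime = System.currentTimeMillis();
    private long bytesRead = 0;

    public RateLimitedInputStream(InputStream in, long maxBytesPerSec) {
        super(in);
        this.maxBytesPerSec = maxBytesPerSec;
    }

    /** Block until the average rate since open falls below the cap. */
    private void throttle() throws IOException {
        long elapsedMs = Math.max(1, System.currentTimeMillis() - startTime);
        while ((bytesRead * 1000) / elapsedMs > maxBytesPerSec) {
            try {
                Thread.sleep(10); // back off, then re-check the average
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while throttling", e);
            }
            elapsedMs = Math.max(1, System.currentTimeMillis() - startTime);
        }
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        throttle();
        int n = super.read(b, off, len);
        if (n > 0) bytesRead += n;
        return n;
    }

    @Override
    public int read() throws IOException {
        throttle();
        int c = super.read();
        if (c >= 0) bytesRead++;
        return c;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64 * 1024];                  // 64 KiB source
        InputStream in = new RateLimitedInputStream(
                new ByteArrayInputStream(data), 32 * 1024); // cap: 32 KiB/s
        byte[] buf = new byte[8192];
        long total = 0, t0 = System.currentTimeMillis();
        for (int n; (n = in.read(buf)) != -1; ) total += n;
        System.out.println(total + " bytes in ~"
                + (System.currentTimeMillis() - t0) + " ms"); // roughly 2 s
    }
}
```

Wrapping the stream returned by FileSystem.open() this way throttles one stream at a time; it does not coordinate across streams in the JVM, which is the gap lohit's proposal addresses.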
> > > On Mon, Nov 11, 2013 at 1:59 PM, lohit <lohit.vijayar...@gmail.com> wrote:
> > >
> > > > Hello Devs,
> > > >
> > > > Wanted to reach out and see if anyone has thought about the ability
> > > > to throttle data transfer within HDFS. One option we have been
> > > > thinking about is to throttle on a per-FileSystem basis, similar to
> > > > Statistics in FileSystem. This would mean anyone with a handle to
> > > > HDFS/Hftp would be throttled globally within the JVM. The right
> > > > value for this would depend on the type of hardware we use and how
> > > > many tasks/clients we allow.
> > > >
> > > > On the other hand, doing something like this at the FileSystem
> > > > layer would mean many other tasks, such as job jar copies,
> > > > DistributedCache copies, and any hidden data movement, would also
> > > > be throttled. We wanted to know if anyone has had such a
> > > > requirement on their clusters in the past and what the thinking
> > > > around it was. Appreciate your inputs/comments.
> > > >
> > > > --
> > > > Have a Nice Day!
> > > > Lohit
> > >
> > > --
> > > Adam Muise
> > > Solutions Engineer, Hortonworks
> > > Phone: 416-417-4037
> > > Email: amu...@hortonworks.com
> > > Website: http://www.hortonworks.com/
> >
> > --
> > Have a Nice Day!
> > Lohit
>
> --
> Best Regards,
> Haosdent Huang
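[Editor's note: lohit's per-FileSystem proposal amounts to a single JVM-wide bandwidth budget shared by every open stream, in the spirit of FileSystem.Statistics. A minimal sketch of such a shared limiter follows; the class name and config-key idea are hypothetical, not a Hadoop API.]

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of a JVM-global read-bandwidth budget shared by all streams.
 * Every reader calls acquire(n) before consuming n bytes, so all readers
 * in the JVM collectively average at most maxBytesPerSec.
 * Hypothetical illustration only.
 */
public final class GlobalReadThrottle {
    private static volatile long maxBytesPerSec = Long.MAX_VALUE; // unthrottled by default
    private static final AtomicLong bytesRead = new AtomicLong();
    private static final long startTime = System.currentTimeMillis();

    private GlobalReadThrottle() {}

    public static void setLimit(long bytesPerSec) { maxBytesPerSec = bytesPerSec; }

    /** Blocks until reading n more bytes keeps the global average under the cap. */
    public static void acquire(long n) throws InterruptedException {
        long total = bytesRead.addAndGet(n);
        while (true) {
            long elapsedMs = Math.max(1, System.currentTimeMillis() - startTime);
            if ((total * 1000) / elapsedMs <= maxBytesPerSec) return;
            Thread.sleep(10); // all streams in the JVM back off together
        }
    }

    public static void main(String[] args) throws InterruptedException {
        setLimit(10 * 1024); // 10 KiB/s for the whole JVM
        long t0 = System.currentTimeMillis();
        for (int i = 0; i < 3; i++) acquire(5 * 1024); // three "streams" reading 5 KiB each
        System.out.println("15 KiB acquired in ~"
                + (System.currentTimeMillis() - t0) + " ms");
    }
}
```

Because the budget is process-wide, every read in the JVM draws from it, including job jar and DistributedCache copies, which is precisely the side effect lohit's second paragraph raises.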