Hey Lohit,

This is an interesting topic, and something I actually worked on in grad school before coming to Cloudera. It'd help if you could outline some of your use cases and how per-FileSystem throttling would help.

For what I was doing, it made more sense to throttle on the DN side, since you have a better view of all the I/O happening on the system, and you have knowledge of the different volumes so you can set limits per disk. Even that isn't 100% reliable, though, since a portion of each disk is normally used for MR scratch space, which the DN doesn't have control over. I tried playing with thread I/O priorities here, but didn't see much improvement. Maybe the newer cgroups stuff can help out.
I'm sure per-FileSystem throttling will have some benefits (and is probably easier than a DN-side implementation), but again, it'd help to better understand the problem you are trying to solve.

Best,
Andrew

On Mon, Nov 11, 2013 at 6:16 PM, Haosong Huang <haosd...@gmail.com> wrote:

> Hi, lohit. There is a class named ThrottledInputStream
> <http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java>
> in hadoop-distcp; you could check it out and find more details.
>
> In addition to this, I am working on resource control (including CPU,
> network, and disk I/O) in the JVM. My implementation depends on cgroups,
> though, so it can only run on Linux. I will push my library (java-cgroup)
> to GitHub in the next several months. If you are interested in it, please
> give me your advice and help me improve it. :-)
>
> On Tue, Nov 12, 2013 at 3:47 AM, lohit <lohit.vijayar...@gmail.com> wrote:
>
> > Hi Adam,
> >
> > Thanks for the reply. The changes I was referring to are in the
> > FileSystem.java layer, which should not affect HDFS replication or
> > NameNode operations. To give a better idea, this would affect clients
> > something like this:
> >
> > Configuration conf = new Configuration();
> > conf.setInt("read.bandwidth.mbpersec", 20); // 20 MB/s
> > FileSystem fs = FileSystem.get(conf);
> >
> > FSDataInputStream fis = fs.open("/path/to/file.txt");
> > fis.read(); // <-- This would be capped at 20 MB/s
> >
> > 2013/11/11 Adam Muise <amu...@hortonworks.com>
> >
> > > See https://issues.apache.org/jira/browse/HDFS-3475
> > >
> > > Please note that this has met with many unexpected impacts on
> > > workload. Be careful and be mindful of your Datanode memory and
> > > network capacity.
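[Editor's note: the ThrottledInputStream approach mentioned above boils down to counting bytes and sleeping whenever the observed average rate exceeds a cap. Below is a minimal, self-contained sketch of that idea; the class name is hypothetical and the real hadoop-distcp implementation differs in detail.]

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Simplified sketch of a bandwidth-capped InputStream: it tracks bytes read
 * since the stream was opened and sleeps whenever the average rate would
 * exceed the configured limit. Illustration only, not the hadoop-distcp class.
 */
public class RateLimitedInputStream extends FilterInputStream {
    private final long maxBytesPerSec;
    private final long startTime = System.currentTimeMillis();
    private long bytesRead = 0;

    public RateLimitedInputStream(InputStream in, long maxBytesPerSec) {
        super(in);
        this.maxBytesPerSec = maxBytesPerSec;
    }

    /** Block until the average rate since open falls below the cap. */
    private void throttle() throws IOException {
        long elapsedMs = Math.max(1, System.currentTimeMillis() - startTime);
        while ((bytesRead * 1000) / elapsedMs > maxBytesPerSec) {
            try {
                Thread.sleep(10); // back off, then re-check the average
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while throttling", e);
            }
            elapsedMs = Math.max(1, System.currentTimeMillis() - startTime);
        }
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        throttle();
        int n = super.read(b, off, len);
        if (n > 0) bytesRead += n;
        return n;
    }

    @Override
    public int read() throws IOException {
        throttle();
        int c = super.read();
        if (c >= 0) bytesRead++;
        return c;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64 * 1024];                  // 64 KiB source
        InputStream in = new RateLimitedInputStream(
                new ByteArrayInputStream(data), 32 * 1024); // cap: 32 KiB/s
        byte[] buf = new byte[8192];
        long total = 0, t0 = System.currentTimeMillis();
        for (int n; (n = in.read(buf)) != -1; ) total += n;
        System.out.println(total + " bytes in ~"
                + (System.currentTimeMillis() - t0) + " ms"); // roughly 2 s
    }
}
```

Wrapping the stream returned by FileSystem.open() this way throttles one stream at a time; it does not coordinate across streams in the JVM, which is the gap lohit's proposal addresses.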
> > > On Mon, Nov 11, 2013 at 1:59 PM, lohit <lohit.vijayar...@gmail.com> wrote:
> > >
> > > > Hello Devs,
> > > >
> > > > Wanted to reach out and see if anyone has thought about the ability
> > > > to throttle data transfer within HDFS. One option we have been
> > > > thinking about is to throttle on a per-FileSystem basis, similar to
> > > > Statistics in FileSystem. This would mean anyone with a handle to
> > > > HDFS/Hftp would be throttled globally within the JVM. The right
> > > > value for this would depend on the type of hardware we use and how
> > > > many tasks/clients we allow.
> > > >
> > > > On the other hand, doing something like this at the FileSystem
> > > > layer would mean many other tasks, such as job jar copies,
> > > > DistributedCache copies, and any hidden data movement, would also
> > > > be throttled. We wanted to know if anyone has had such a
> > > > requirement on their clusters in the past and what the thinking
> > > > around it was. Appreciate your inputs/comments.
> > > >
> > > > --
> > > > Have a Nice Day!
> > > > Lohit
> > >
> > > --
> > > Adam Muise
> > > Solutions Engineer, Hortonworks
> > > Phone: 416-417-4037
> > > Email: amu...@hortonworks.com
> > > Website: http://www.hortonworks.com/
> >
> > --
> > Have a Nice Day!
> > Lohit
>
> --
> Best Regards,
> Haosdent Huang
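[Editor's note: lohit's per-FileSystem proposal amounts to a single JVM-wide bandwidth budget shared by every open stream, in the spirit of FileSystem.Statistics. A minimal sketch of such a shared limiter follows; the class name and config-key idea are hypothetical, not a Hadoop API.]

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of a JVM-global read-bandwidth budget shared by all streams.
 * Every reader calls acquire(n) before consuming n bytes, so all readers
 * in the JVM collectively average at most maxBytesPerSec.
 * Hypothetical illustration only.
 */
public final class GlobalReadThrottle {
    private static volatile long maxBytesPerSec = Long.MAX_VALUE; // unthrottled by default
    private static final AtomicLong bytesRead = new AtomicLong();
    private static final long startTime = System.currentTimeMillis();

    private GlobalReadThrottle() {}

    public static void setLimit(long bytesPerSec) { maxBytesPerSec = bytesPerSec; }

    /** Blocks until reading n more bytes keeps the global average under the cap. */
    public static void acquire(long n) throws InterruptedException {
        long total = bytesRead.addAndGet(n);
        while (true) {
            long elapsedMs = Math.max(1, System.currentTimeMillis() - startTime);
            if ((total * 1000) / elapsedMs <= maxBytesPerSec) return;
            Thread.sleep(10); // all streams in the JVM back off together
        }
    }

    public static void main(String[] args) throws InterruptedException {
        setLimit(10 * 1024); // 10 KiB/s for the whole JVM
        long t0 = System.currentTimeMillis();
        for (int i = 0; i < 3; i++) acquire(5 * 1024); // three "streams" reading 5 KiB each
        System.out.println("15 KiB acquired in ~"
                + (System.currentTimeMillis() - t0) + " ms");
    }
}
```

Because the budget is process-wide, every read in the JVM draws from it, including job jar and DistributedCache copies, which is precisely the side effect lohit's second paragraph raises.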