>On Fri, Jan 11, 2013 at 03:26:37PM +0100, Jens Axboe wrote:
>> On 2013-01-11 10:11, majianpeng wrote:
>> > In func tg_may_dispatch,
>> >> if (throtl_slice_used(td, tg, rw))
>> >>           throtl_start_new_slice(td, tg, rw);
>> > ...
>> >> if (tg_with_in_bps_limit(td, tg, bio, &bps_wait)
>> >>       && tg_with_in_iops_limit(td, tg, bio, &iops_wait)) {
>> > 
>> > In funcs tg_with_in_(bps/iops)_limit, it used the slice_start to count.
>> > So if privious slice expired, it start a new slice.This can cause hung
>> > task.
>> > 
>> > The next steps can repeat this bug.
>> > 1:echo "8:48 10240" > blkio.throttle.write_bps_devic
>> > 2:dd if=/dev/zero of=/dev/sdd bs=1M count=1 oflag=direct
>> > 
>> > Using the blktrace, the messages about throttle:
>> > root@kernel:/mnt/programs# blktrace -d /dev/sdd -a notify -o -|blkparse -i 
>> > -
>> >   8,48   1        0     0.000000000     0  m   N throtl / [W] new slice 
>> > start=4294854679 end=4294854779 jiffies=4294854679
>> >   8,48   1        0     0.000000966     0  m   N throtl / [W] extend slice 
>> > start=4294854679 end=4294905900 jiffies=4294854679
>> >   8,48   1        0     0.000002553     0  m   N throtl / [W] bio. bdisp=0 
>> > sz=524288 bps=10240 iodisp=0 iops=4294967295 queued=0/0
>> >   8,48   1        0     0.000004788     0  m   N throtl schedule work. 
>> > delay=51200 jiffies=4294854679
>> >   8,48   1        0    51.304698681     0  m   N throtl dispatch 
>> > nr_queued=1 read=0 write=1
>> >   8,48   1        0    51.304701979     0  m   N throtl / [W] new slice 
>> > start=4294905984 end=4294906084 jiffies=4294905984
>> >   8,48   1        0    51.304703329     0  m   N throtl / [W] extend slice 
>> > start=4294905984 end=4294957200 jiffies=4294905984
>> >   8,48   1        0    51.304705783     0  m   N throtl schedule work. 
>> > delay=51200 jiffies=4294905984
>> >   8,48   1        0   102.632697082     0  m   N throtl dispatch 
>> > nr_queued=1 read=0 write=1
>> >   8,48   1        0   102.632700544     0  m   N throtl / [W] new slice 
>> > start=4294957312 end=4294957412 jiffies=4294957312
>> >   8,48   1        0   102.632701922     0  m   N throtl / [W] extend slice 
>> > start=4294957312 end=4295008600 jiffies=4294957312
>> >   8,48   1        0   102.632704016     0  m   N throtl schedule work. 
>> > delay=51200 jiffies=4294957312
>> >   8,48   1        0   153.960696503     0  m   N throtl dispatch 
>> > nr_queued=1 read=0 write=1
>> >   8,48   1        0   153.960699797     0  m   N throtl / [W] new slice 
>> > start=4295008640 end=4295008740 jiffies=4295008640
>> >   8,48   1        0   153.960701153     0  m   N throtl / [W] extend slice 
>> > start=4295008640 end=4295059900 jiffies=4295008640
>> >   8,48   1        0   153.960703218     0  m   N throtl schedule work. 
>> > delay=51200 jiffies=4295008640
>> >   8,48   1        0   205.288697067     0  m   N throtl dispatch 
>> > nr_queued=1 read=0 write=1
>> >   8,48   1        0   205.288700268     0  m   N throtl / [W] new slice 
>> > start=4295059968 end=4295060068 jiffies=4295059968
>> >   8,48   1        0   205.288701630     0  m   N throtl / [W] extend slice 
>> > start=4295059968 end=4295111200 jiffies=4295059968
>> >   8,48   1        0   205.288703784     0  m   N throtl schedule work. 
>> > delay=51200 jiffies=4295059968
>> >   8,48   1        0   256.616696184     0  m   N throtl dispatch 
>> > nr_queued=1 read=0 write=1
>> >   8,48   1        0   256.616699266     0  m   N throtl / [W] new slice 
>> > start=4295111296 end=4295111396 jiffies=4295111296
>> >   8,48   1        0   256.616700574     0  m   N throtl / [W] extend slice 
>> > start=4295111296 end=4295162500 jiffies=4295111296
>> >   8,48   1        0   256.616702701     0  m   N throtl schedule work. 
>> > delay=51200 jiffies=4295111296
>> > 
>> > Signed-off-by: Jianpeng Ma <majianp...@gmail.com>
>> > ---
>> >  block/blk-throttle.c |   15 ++++++++++-----
>> >  1 file changed, 10 insertions(+), 5 deletions(-)
>> > 
>> > diff --git a/block/blk-throttle.c b/block/blk-throttle.c
>> > index 3114622..9258789 100644
>> > --- a/block/blk-throttle.c
>> > +++ b/block/blk-throttle.c
>> > @@ -645,6 +645,15 @@ static bool tg_may_dispatch(struct throtl_data *td, 
>> > struct throtl_grp *tg,
>> >    }
>> >  
>> >    /*
>> > +   * If privious slice expired,then start new slice.
>> > +   * But counting bps and iops limit need privious slice info
>> > +   * which ->slice_start.
>> > +   */
>> > +  if (tg_with_in_bps_limit(td, tg, bio, &bps_wait)
>> > +      && tg_with_in_iops_limit(td, tg, bio, &iops_wait))
>> > +          if (wait)
>> > +                  *wait = 0;
>> > +  /*
>> >     * If previous slice expired, start a new one otherwise renew/extend
>> >     * existing slice to make sure it is at least throtl_slice interval
>> >     * long since now.
>> > @@ -656,12 +665,8 @@ static bool tg_may_dispatch(struct throtl_data *td, 
>> > struct throtl_grp *tg,
>> >                    throtl_extend_slice(td, tg, rw, jiffies + throtl_slice);
>> >    }
>> >  
>> > -  if (tg_with_in_bps_limit(td, tg, bio, &bps_wait)
>> > -      && tg_with_in_iops_limit(td, tg, bio, &iops_wait)) {
>> > -          if (wait)
>> > -                  *wait = 0;
>> > +  if (!(bps_wait || iops_wait))
>> >            return 1;
>> > -  }
>> >  
>> >    max_wait = max(bps_wait, iops_wait);
>> 
>> Looks pretty sane to me. Vivek?
>
>Hi Jens,
>
>This fix will introduce other side affects. And that is when an group
>has been idle for few seconds and a new IO gets queued in, we will
>not start a new slice and allow dispatch equivalent of those idle
>seconds before we throttle the group. So keeping group idle will
>become an incentive.
>
>I have attached a patch which might work better. majianpeng, can
>you please give it a try.
>
>Having said that it is strange that workqueue thread is triggering
>so late. Looking at timestamp of traces attached.
>
>We scheduled a delayed work. Following is trace.
>
>8,48   1        0    51.304705783     0  m   N throtl schedule work. 
>delay=51200 jiffies=4294905984
>
>Now a worker should execute after 51.2 seconds (delay=51200).
>
>8,48   1        0   102.632697082     0  m   N throtl dispatch nr_queued=1 
>read=0 write=1
>
>But worker executed at delay of 51.328 seconds (102.632 - 51.304). That
>is a delay of around 128 jiffies (51.328 - 51.2). That kind of seems odd.
>AFAIK, workers are fairly quick to execute after expiry. And it is 
>because of this excessive delay that we think slice has expired. 
>
>May be it is something new. CCing Tejun, if he has seen anything like this
>and may be it is a known issue.
>
>Even if it is a worker thread issue, I think applying below patch makes
>sense.
>
>Thanks
>Vivek
>
>
>blk-throttle: Do not start a new slice if IO is pending
>
>It might happen that we queue an IO (because it overshot the rate)
>and then wait for certain jiffies to pass. We will set slice_end
>accordingly and wait for that time to elapse and worker thread will
>kick in, again evaluate whether group can dispatch or not. Now it
>might happen that current time is after slice_end and
>tg_may_dispatch() will start a new slice.
>
>This will result in IO not being dispatched and be put back on
>wait again. And this will repeat, resulting in hang.
>
>Do not start a new slice if an IO is pending in that group in
>the direction being queired. Instead extend the slice. New slice
>is supposed to start when a group has not been doing IO for some
>time and a new IO shows up. In that case we want do discard
>history and start a new slice.
>
>Signed-off-by: Vivek Goyal <vgo...@redhat.com>
>---
> block/blk-throttle.c |    8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
>Index: linux-2.6/block/blk-throttle.c
>===================================================================
>--- linux-2.6.orig/block/blk-throttle.c        2012-10-18 01:52:28.000000000 
>-0400
>+++ linux-2.6/block/blk-throttle.c     2013-01-14 03:40:41.355731375 -0500
>@@ -648,8 +648,14 @@ static bool tg_may_dispatch(struct throt
>        * If previous slice expired, start a new one otherwise renew/extend
>        * existing slice to make sure it is at least throtl_slice interval
>        * long since now.
>+       *
>+       * Start a new slice only if there is no bio queued in that direction.
>+       * That bio is waiting to be dispatched and slice needs to be
>+       * extended. It might happen that bio waited to be dispatched but
>+       * workqueue execution got little late it might restart a new slice
>+       * instead of taking all the waited time into account.
>        */
>-      if (throtl_slice_used(td, tg, rw))
>+      if (throtl_slice_used(td, tg, rw) && !tg->nr_queued[rw])
>               throtl_start_new_slice(td, tg, rw);
>       else {
>               if (time_before(tg->slice_end[rw], jiffies + throtl_slice))
Hi vivek,
        Your patch is ok.But i had a question:
What's condition tg->nr_queued[rw] = 0, but bio is not null?

Thanks!
JianpengN�Р骒r��y����b�X�肚�v�^�)藓{.n�+�伐�{��赙zXФ�≤�}��财�z�&j:+v�����赙zZ+��+zf"�h���~����i���z��wア�?�ㄨ��&�)撷f��^j谦y�m��@A�a囤�
0鹅h���i

Reply via email to