Dave Taht wrote:
> I applied this patch series, and either I goofed (possible), or subsequent
> updates to the various trees between the time it came out and the time
> I started trying it broke it again.
>
> It fails on Linux 3.0.9, 3.1.3 and 3.1.4, with errors applying the
> mtd-related patches for various boards. A typical error (3.0.9):
>
> Applying patch platform/416-mtd_api_tl_mr3x20.patch
> patching file arch/mips/ar71xx/mach-tl-mr3x20.c
> Hunk #1 FAILED at 34.
> Hunk #2 FAILED at 61.
> 2 out of 2 hunks FAILED -- rejects in file arch/mips/ar71xx/mach-tl-mr3x20.c
> Patch platform/416-mtd_api_tl_mr3x20.patch does not apply (enforce with -f)
> make[4]: *** 
> [/home/cero1/src/cerowrt/build_dir/linux-ar71xx_generic/linux-3.0.9/.quilt_checked]
> Error 1
> make[4]: Leaving directory `/home/cero1/src/cerowrt/target/linux/ar71xx'
> make[3]: *** [compile] Error 2
> make[3]: Leaving directory `/home/cero1/src/cerowrt/target/linux'
> make[2]: *** [target/linux/compile] Error 2
> make[2]: Leaving directory `/home/cero1/src/cerowrt'
> make[1]: *** 
> [/home/cero1/src/cerowrt/staging_dir/target-mips_r2_uClibc-0.9.32/stamp/.target_compile]
> Error 2
> make[1]: Leaving directory `/home/cero1/src/cerowrt'
> make: *** [world] Error 2
juhosg made some changes to the mtd code in ar71xx during the last week. I 
guess this is the reason for the above problems, but I still need to check on 
this as soon as I find some time.
>
>
>
> On Sun, Nov 27, 2011 at 7:36 PM, Dave Taht <dave.t...@gmail.com> wrote:
>> On Sun, Nov 27, 2011 at 6:17 PM, Outback Dingo <outbackdi...@gmail.com> 
>> wrote:
>>> On Sun, Nov 27, 2011 at 11:52 AM, Otto Solares Cabrera <so...@guug.org> 
>>> wrote:
>>>> On Sat, Nov 26, 2011 at 10:37:33PM -0500, Outback Dingo wrote:
>>>>> On Sat, Nov 26, 2011 at 10:13 PM, Hartmut Knaack <knaac...@gmx.de> wrote:
>>>>>> This patch brings support for kernel version 3.1 to the ar71xx platform. 
>>>>>> It is based on Otto Estuardo Solares Cabrera's linux-3.0 patches, with 
>>>>>> some changes to keep up with recent filename changes in the kernel. 
>>>>>> The minimum kernel version seems to be 3.1.1; otherwise one of the generic 
>>>>>> patches will fail. Successfully tested with kernel 3.1.2 on a WR1043ND. 
>>>>>> The kernel version in the Makefile still needs to be adjusted manually.
>>>>> I'll get onto testing these also.
>>>> It works for me on the wrt160nl with Linux-3.1.3. Thx Hartmut!
>>> Also working on WNDR3700v2 and a variety of Ubiquiti gear.... nice....
>>> Thanks both of you.
>> My thanks as well, although I haven't had time to do a build yet. If
>> anyone is interested in byte queue limits, the patches I was attempting
>> to backport to 3.1 before taking off for the holiday, including a
>> modified ag71xx driver, are at:
>>
>> http://huchra.bufferbloat.net/~cero1/bql/
>>
>> Regrettably they didn't quite compile before I left for the holiday, and
>> I'm going to have to rebase cerowrt and rebuild (I'm still grateful!).
>> I figure (hope!) one of you folks will beat me to getting BQL working
>> before I get back to the office on Tuesday.
>>
>> A plug:
>>
>> Byte queue limits hold great promise for beating bufferbloat, and getting
>> tc's shapers and schedulers to work properly again, at least
>> on ethernet.
>>
>> By holding down the amount of outstanding data sitting in the device
>> driver, byte queue limits give all the QoS and shaping tools that we know
>> and love a chance to work again. You can keep large hw tx rings - so, as
>> an example, you could have a 6 KB byte queue limit and either 4 large
>> packets or 93 ACK packets in the buffer - and this lets you manage the
>> bandwidth with tools higher in the stack, as either mix takes about the
>> same amount of time to transmit, without compromising line-rate
>> performance...
>>
>> The current situation is: we often have hw tx rings of 64 descriptors or
>> more, which translates to roughly 96k in flight, meaning that (as already
>> demonstrated) with this patch working, you can improve network
>> responsiveness by a factor of at least ten, perhaps as much as 100.
>> (TCP's response to buffering is quadratic, not linear, but there are
>> other variables, so... a factor of 10 sounds good, doesn't it?)
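
To put rough numbers on the two examples above (my own arithmetic, assuming
~1500-byte full-size frames, 64-byte ACKs and a 100 Mbit/s link):

  6000 B limit / 1500 B  =   4 full-size packets
  6000 B limit /   64 B  ~= 93 ACK packets
  64 descriptors * ~1500 B ~= 96 KB potentially in flight
  96 KB * 8 / 100 Mbit/s ~= 7.7 ms queued in the driver
   6 KB * 8 / 100 Mbit/s ~= 0.5 ms with the byte limit

which is roughly the "factor of at least ten" mentioned above, before any
of the TCP effects come into play.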
>>
>> From Tom Herbert's announcement (there was much feedback on netdev, so I
>> would expect another revision to come along):
>>
>>
>> Changes from last version:
>>  - Rebase to 3.2
>>  - Added CONFIG_BQL and CONFIG_DQL
>>  - Added some cache alignment in struct dql, to split read-only from
>>   writeable elements, and to split those elements written on transmit
>>   from those written at transmit completion (suggested by Eric).
>>  - Split out adding xps_queue_release as its own patch.
>>  - Some minor performance changes, use likely and unlikely for some
>>   conditionals.
>>  - Cleaned up some "show" functions for bql (pointed out by Ben).
>>  - Change netdev_tx_completed_queue to check xoff, check
>>   availability, and then check xoff again.  This is to prevent potential
>>   race conditions with netdev_sent_queue (as Ben pointed out).
>>  - Did some more testing trying to evaluate overhead of BQL in the
>>   transmit path.  I see about 1-3% degradation in CPU utilization
>>   and maximum pps when BQL is enabled.  Any ideas to beat this
>>   down as much as possible would be appreciated!
>>  - Added high versus low priority traffic test to results below.
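
Regarding the netdev_tx_completed_queue item above, here is a rough sketch
of that check / availability / re-check ordering as I understand it. The
helper names follow the announcement, but the body is only illustrative,
not the code from the series, and it omits the memory barriers the real
implementation would need:

#include <linux/netdevice.h>
#include <linux/dynamic_queue_limits.h>

static inline void completed_queue_sketch(struct netdev_queue *q,
                                          unsigned int bytes)
{
        dql_completed(&q->dql, bytes);     /* recompute the dynamic limit */

        /* first xoff check: if the stack never stopped this queue,
         * there is nothing to wake up */
        if (!test_bit(__QUEUE_STATE_STACK_XOFF, &q->state))
                return;

        /* availability check: still over the (freshly updated) limit? */
        if (dql_avail(&q->dql) < 0)
                return;

        /* second xoff check, done atomically with the clear, so that a
         * racing netdev_sent_queue() which just stopped the queue can
         * neither be woken prematurely nor left stopped forever */
        if (test_and_clear_bit(__QUEUE_STATE_STACK_XOFF, &q->state))
                netif_schedule_queue(q);
}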
>>
>> ----
>>
>> This patch series implements byte queue limits (bql) for NIC TX queues.
>>
>> Byte queue limits are a mechanism to limit the size of the transmit
>> hardware queue on a NIC by number of bytes. The goal of these byte
>> limits is to reduce latency (HOL blocking) caused by excessive queuing
>> in hardware (aka buffer bloat) without sacrificing throughput.
>>
>> Hardware queuing limits are typically specified in terms of a number of
>> hardware descriptors, each of which has a variable size. The size of
>> individual queued items can vary over a very wide range. For instance,
>> with the e1000 NIC the size could range from 64 bytes to 4K (with TSO
>> enabled). This variability makes it next to impossible to choose a
>> single queue limit that prevents starvation and provides the lowest
>> possible latency.
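
As a back-of-the-envelope illustration of that point (my numbers, assuming
a 256-descriptor ring and a 1 Gbit/s link):

  256 *   64 B =  16 KB  ->  ~0.13 ms of queued data at 1 Gbit/s
  256 * 4096 B =   1 MB  ->  ~8.4  ms of queued data at 1 Gbit/s

so the same descriptor-count limit can mean a 64x spread in drain time,
which is exactly why a single static limit cannot give both low latency
and freedom from starvation.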
>>
>> The objective of byte queue limits is to set the limit to be the
>> minimum needed to prevent starvation between successive transmissions to
>> the hardware. The latency between two transmissions can be variable in a
>> system. It is dependent on interrupt frequency, NAPI polling latencies,
>> scheduling of the queuing discipline, lock contention, etc. Therefore we
>> propose that byte queue limits should be dynamic and change in accordance
>> with the networking stack latencies a system encounters.  BQL should not
>> need to take the underlying link speed as input; it should automatically
>> adjust to whatever the speed is (even if that is itself dynamic).
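
Purely to illustrate the grow-on-starvation / shrink-on-slack idea in that
paragraph (this is not the dql code from the series; every name below is
invented for the example):

/* toy version: imagine this running once per completion interval */
static void adjust_limit_sketch(unsigned int *limit,
                                unsigned int completed_bytes,
                                bool hw_starved,
                                unsigned int observed_slack)
{
        if (hw_starved)
                /* the device ran dry while the stack still had data:
                 * the limit was too small, so grow it quickly */
                *limit += completed_bytes;
        else if (observed_slack < *limit)
                /* bytes were consistently left over at completion time:
                 * the limit is larger than needed, decay it slowly */
                *limit -= observed_slack / 2;
        /* the real dql additionally clamps against minimum/maximum
         * limits and applies the new value on the next interval */
}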
>>
>> Patches to implement this:
>> - Dynamic queue limits (dql) library.  This provides the general
>> queuing algorithm.
>> - netdev changes that use dql to support byte queue limits.
>> - Support in drivers for byte queue limits.
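
For anyone wanting to wire this into a driver such as ag71xx, a minimal
sketch of where the driver-side hooks would go, assuming the
netdev_sent_queue()/netdev_completed_queue()/netdev_reset_queue() helpers
this series adds; the surrounding function names and structure are made up
for illustration:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

static netdev_tx_t example_start_xmit(struct sk_buff *skb,
                                      struct net_device *dev)
{
        /* ... place the skb on the hardware TX ring as usual ... */

        /* account the bytes handed to the hardware; BQL stops the
         * queue itself once the dynamic byte limit is exceeded */
        netdev_sent_queue(dev, skb->len);
        return NETDEV_TX_OK;
}

static void example_tx_complete(struct net_device *dev)
{
        unsigned int pkts = 0, bytes = 0;

        /* ... walk the completed TX descriptors, counting pkts/bytes ... */

        /* report completions; this recomputes the limit and may
         * re-wake the queue */
        netdev_completed_queue(dev, pkts, bytes);
}

/* and whenever the TX ring is flushed (down/reset paths):
 *        netdev_reset_queue(dev);
 */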
>>
>> The effects of BQL are demonstrated in the benchmark results below.
>>
>> --- High priority versus low priority traffic:
>>
>> In this test 100 netperf TCP_STREAMs were started to saturate the link.
>> A single instance of a netperf TCP_RR was run with high priority set.
>> Queuing discipline is pfifo_fast, NIC is e1000 with TX ring size set to
>> 1024.  Transactions per second (tps) for the high priority RR are listed.
>>
>> No BQL, tso on: 3000-3200K bytes in queue, 36 tps
>> BQL, tso on: 156-194K bytes in queue, 535 tps
>> No BQL, tso off: 453-454K bytes in queue, 234 tps
>> BQL, tso off: 66K bytes in queue, 914 tps
>>
>> ---  Various RR sizes
>>
>> These tests were done running 200 streams of netperf RR tests.  The
>> results demonstrate the reduction in queuing and also illustrate
>> the overhead due to BQL (at small RR sizes).
>>
>> 140000 rr size
>> BQL: 80-215K bytes in queue, 856 tps, 3.26% cpu
>> No BQL: 2700-2930K bytes in queue, 854 tps, 3.71% cpu
>>
>> 14000 rr size
>> BQL: 25-55K bytes in queue, 8500 tps
>> No BQL: 1500-1622K bytes in queue,  8523 tps, 4.53% cpu
>>
>> 1400 rr size
>> BQL: 20-38K bytes in queue, 86582 tps, 7.38% cpu
>> No BQL: 29-117K bytes in queue, 85738 tps, 7.67% cpu
>>
>> 140 rr size
>> BQL: 1-10K bytes in queue, 320540 tps, 34.6% cpu
>> No BQL: 1-13K bytes in queue, 323158 tps, 37.16% cpu
>>
>> 1 rr size
>> BQL: 0-3K bytes in queue, 338811 tps, 41.41% cpu
>> No BQL: 0-3K bytes in queue, 339947 tps, 42.36% cpu
>>
>> So the amount of queuing in the NIC can be reduced by 90% or more.
>> Accordingly, the latency for high priority packets in the presence
>> of low priority bulk throughput traffic can be reduced by 90% or more.
>>
>> Since BQL accounting is in the transmit path for every packet, and the
>> function to recompute the byte limit is run once per transmit
>> completion, there will be some overhead in using BQL.  So far, I've seen
>> the overhead to be in the range of 1-3% for CPU utilization and maximum
>> pps.
>>
>>
>>
>>
>> --
>> Dave Täht
>> SKYPE: davetaht
>> US Tel: 1-239-829-5608
>> FR Tel: 0638645374
>> http://www.bufferbloat.net
>
>

_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel
