I applied this patch series, and either I goofed (possible), or subsequent
updates to the various trees between the time it came out and the time
I started trying it broke it again.

It fails on Linux 3.0.9, 3.1.3, and 3.1.4 with errors applying the mtd patches
for various boards. A typical error (3.0.9):

Applying patch platform/416-mtd_api_tl_mr3x20.patch
patching file arch/mips/ar71xx/mach-tl-mr3x20.c
Hunk #1 FAILED at 34.
Hunk #2 FAILED at 61.
2 out of 2 hunks FAILED -- rejects in file arch/mips/ar71xx/mach-tl-mr3x20.c
Patch platform/416-mtd_api_tl_mr3x20.patch does not apply (enforce with -f)
make[4]: *** 
[/home/cero1/src/cerowrt/build_dir/linux-ar71xx_generic/linux-3.0.9/.quilt_checked]
Error 1
make[4]: Leaving directory `/home/cero1/src/cerowrt/target/linux/ar71xx'
make[3]: *** [compile] Error 2
make[3]: Leaving directory `/home/cero1/src/cerowrt/target/linux'
make[2]: *** [target/linux/compile] Error 2
make[2]: Leaving directory `/home/cero1/src/cerowrt'
make[1]: *** 
[/home/cero1/src/cerowrt/staging_dir/target-mips_r2_uClibc-0.9.32/stamp/.target_compile]
Error 2
make[1]: Leaving directory `/home/cero1/src/cerowrt'
make: *** [world] Error 2




On Sun, Nov 27, 2011 at 7:36 PM, Dave Taht <dave.t...@gmail.com> wrote:
> On Sun, Nov 27, 2011 at 6:17 PM, Outback Dingo <outbackdi...@gmail.com> wrote:
>> On Sun, Nov 27, 2011 at 11:52 AM, Otto Solares Cabrera <so...@guug.org> 
>> wrote:
>>> On Sat, Nov 26, 2011 at 10:37:33PM -0500, Outback Dingo wrote:
>>>> On Sat, Nov 26, 2011 at 10:13 PM, Hartmut Knaack <knaac...@gmx.de> wrote:
>>>> > This patch brings support for kernel version 3.1 to the ar71xx platform. 
>>>> > It is based on Otto Estuardo Solares Cabrera's linux-3.0 patches, with 
>>>> > some changes to keep up with recent filename changes in the kernel. 
>>>> > Minimum kernel version seems to be 3.1.1, otherwise one of the generic 
>>>> > patches will fail. Successfully tested with kernel 3.1.2 on a WR1043ND. 
>>>> > Kernel version in the Makefile still needs to be adjusted manually.
>>>>
>>>> I'll get onto testing these also
>>>
>>> It works for me on the wrt160nl with Linux-3.1.3. Thx Hartmut!
>>
>> Also working on WNDR3700v2 and a variety of Ubiquiti gear.... nice....
>> Thanks both of you.
>
> My thanks as well, although I haven't had time to do a build yet. If anyone
> is interested in byte queue limits, the patches I was attempting to backport
> to 3.1 before taking off for the holiday, including a modified ag71xx
> driver, are at:
>
> http://huchra.bufferbloat.net/~cero1/bql/
>
> Regrettably they didn't quite compile before I left for the holiday, and I'm
> going to have to rebase cerowrt and rebuild (I'm still grateful!), and I
> figure (hope!) one of you folks will beat me to getting BQL working before I
> get back to the office Tuesday.
>
> A plug:
>
> Byte queue limits hold great promise for beating bufferbloat and getting
> tc's shapers and schedulers to work properly again, at least on Ethernet.
>
> By holding down the amount of outstanding data the device driver has
> queued, byte queue limits give all the QoS and shaping tools that we know
> and love a chance to work again. You can retain large hw tx rings - so, as
> an example, you could have a 6k byte queue limit and either 4 large packets
> or 93 ack packets in the buffer - and this lets you manage the bandwidth
> via tools higher in the stack, as either fill takes about the same amount
> of time to transmit, without compromising line-rate performance...
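>
> (Rough arithmetic behind that example, assuming ~1500-byte full frames and
> ~64-byte acks: 4 x 1500 = 6000 bytes and 93 x 64 = 5952 bytes, so either
> fill is roughly the same 6k budget and takes about the same time to drain
> at line rate.)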
>
> The current situation is that we often have hw tx rings of 64 descriptors
> or higher, which translates to roughly 96k in flight, meaning that (as
> already demonstrated) with this patch working you can improve network
> responsiveness by a factor of at least ten, perhaps as much as 100. (TCP's
> response to buffering is quadratic, not linear, but there are other
> variables, so... a factor of 10 sounds good, doesn't it?)
>
> From Tom Herbert's announcement (there was much feedback on netdev; I would
> expect another revision to come):
>
>
> Changes from last version:
>  - Rebase to 3.2
>  - Added CONFIG_BQL and CONFIG_DQL
>  - Added some cache alignment in struct dql, to separate read-only and
>   writeable elements, and to split elements written on transmit from
>   those written at transmit completion (suggested by Eric).
>  - Split out adding xps_queue_release as its own patch.
>  - Some minor performance changes, use likely and unlikely for some
>   conditionals.
>  - Cleaned up some "show" functions for bql (pointed out by Ben).
>  - Change netdev_tx_completed_queue to check xoff, check availability,
>   and then check xoff again.  This is to prevent potential race
>   conditions with netdev_sent_queue (as Ben pointed out).
>  - Did some more testing trying to evaluate overhead of BQL in the
>   transmit path.  I see about 1-3% degradation in CPU utilization
>   and maximum pps when BQL is enabled.  Any ideas to beat this
>   down as much as possible would be appreciated!
>  - Added high versus low priority traffic test to results below.
>
> ----
>
> This patch series implements byte queue limits (bql) for NIC TX queues.
>
> Byte queue limits are a mechanism to limit the size of the transmit
> hardware queue on a NIC by number of bytes. The goal of these byte
> limits is to reduce latency (HOL blocking) caused by excessive queuing
> in hardware (aka buffer bloat) without sacrificing throughput.
>
> Hardware queuing limits are typically specified in terms of a number of
> hardware descriptors, each of which has a variable size. The size of
> individual queued items can therefore vary over a very wide range. For
> instance with the e1000 NIC the size could range from 64 bytes to 4K
> (with TSO enabled). This variability makes it next to impossible to
> choose a single queue limit that prevents starvation and provides lowest
> possible latency.
>
> The objective of byte queue limits is to set the limit to be the
> minimum needed to prevent starvation between successive transmissions to
> the hardware. The latency between two transmissions can be variable in a
> system. It is dependent on interrupt frequency, NAPI polling latencies,
> scheduling of the queuing discipline, lock contention, etc. Therefore we
> propose that byte queue limits should be dynamic and change in
> accordance with the networking stack latencies a system encounters.  BQL
> should not need to take the underlying link speed as input, it should
> automatically adjust to whatever the speed is (even if that in itself is
> dynamic).
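>
> A rough sketch of the accounting this implies, using the dql calls named
> in the changelog above (illustrative only, not the patch code - the real
> patches hook this into the netdev layer rather than calling
> netif_stop_queue()/netif_wake_queue() directly):
>
>   #include <linux/netdevice.h>
>   #include <linux/dynamic_queue_limits.h>
>
>   static struct dql dq;                    /* dql_init(&dq, HZ) at setup */
>
>   /* enqueue side: account each packet handed to the hardware */
>   static void tx_queue_bytes(struct net_device *dev, unsigned int bytes)
>   {
>           dql_queued(&dq, bytes);
>           if (dql_avail(&dq) < 0)
>                   netif_stop_queue(dev);   /* dynamic byte limit reached */
>   }
>
>   /* completion side: account reclaimed bytes; the limit is recomputed */
>   static void tx_complete_bytes(struct net_device *dev, unsigned int bytes)
>   {
>           dql_completed(&dq, bytes);
>           if (dql_avail(&dq) >= 0)
>                   netif_wake_queue(dev);   /* room again below the limit */
>   }
>
> Roughly speaking, the limit grows when completions find the hardware queue
> starved and shrinks when there is consistent slack, so it tracks stack
> latency rather than link speed.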
>
> Patches to implement this:
> - Dynamic queue limits (dql) library.  This provides the general
> queuing algorithm.
> - netdev changes that use dql to support byte queue limits.
> - Support in drivers for byte queue limits (a rough sketch follows).
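>
> A minimal sketch of where those driver hooks go (hypothetical single-queue
> driver "foo"; only netdev_sent_queue() and netdev_completed_queue() come
> from these patches, the foo_* helpers are made up for illustration):
>
>   #include <linux/netdevice.h>
>
>   /* ndo_start_xmit: after the frame has been posted to the hw tx ring */
>   static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
>   {
>           unsigned int len = skb->len;
>
>           foo_post_to_ring(dev, skb);          /* hand the frame to the NIC */
>           netdev_sent_queue(dev, len);         /* BQL: bytes now in flight */
>           return NETDEV_TX_OK;
>   }
>
>   /* tx completion (NAPI poll): after reclaiming finished descriptors */
>   static void foo_tx_clean(struct net_device *dev)
>   {
>           unsigned int pkts = 0, bytes = 0;
>
>           foo_reclaim(dev, &pkts, &bytes);     /* count what completed */
>           netdev_completed_queue(dev, pkts, bytes);
>   }
>
> netdev_sent_queue() marks the queue stopped once the dynamic byte limit is
> reached, and netdev_completed_queue() recomputes the limit and lets the
> queue restart when enough bytes have completed, which is what keeps just
> enough data in the ring to avoid starvation.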
>
> The effects of BQL are demonstrated in the benchmark results below.
>
> --- High priority versus low priority traffic:
>
> In this test 100 netperf TCP_STREAMs were started to saturate the link.
> A single instance of a netperf TCP_RR was run with high priority set.
> Queuing discipline is pfifo_fast, NIC is e1000 with TX ring size set to
> 1024.  tps for the high priority RR is listed.
>
> No BQL, tso on: 3000-3200K bytes in queue, 36 tps
> BQL, tso on: 156-194K bytes in queue, 535 tps
> No BQL, tso off: 453-454K bytes in queue, 234 tps
> BQL, tso off: 66K bytes in queue, 914 tps
>
> ---  Various RR sizes
>
> These tests were done running 200 streams of netperf RR tests.  The
> results demonstrate the reduction in queuing and also illustrate
> the overhead due to BQL (at small RR sizes).
>
> 140000 rr size
> BQL: 80-215K bytes in queue, 856 tps, 3.26% cpu
> No BQL: 2700-2930K bytes in queue, 854 tps, 3.71% cpu
>
> 14000 rr size
> BQL: 25-55K bytes in queue, 8500 tps
> No BQL: 1500-1622K bytes in queue,  8523 tps, 4.53% cpu
>
> 1400 rr size
> BQL: 20-38K bytes in queue, 86582 tps, 7.38% cpu
> No BQL: 29-117K bytes in queue, 85738 tps, 7.67% cpu
>
> 140 rr size
> BQL: 1-10K bytes in queue, 320540 tps, 34.6% cpu
> No BQL: 1-13K bytes in queue, 323158 tps, 37.16% cpu
>
> 1 rr size
> BQL: 0-3K bytes in queue, 338811 tps, 41.41% cpu
> No BQL: 0-3K bytes in queue, 339947 tps, 42.36% cpu
>
> So the amount of queuing in the NIC can be reduced by up to 90% or more.
> Accordingly, the latency for high priority packets in the presence
> of low priority bulk throughput traffic can be reduced by 90% or more.
>
> Since BQL accounting is in the transmit path for every packet, and the
> function to recompute the byte limit is run once per transmit
> completion, there will be some overhead in using BQL.  So far, I've seen
> the overhead to be in the range of 1-3% for CPU utilization and maximum
> pps.
>
>
>
>
> --
> Dave Täht
> SKYPE: davetaht
> US Tel: 1-239-829-5608
> FR Tel: 0638645374
> http://www.bufferbloat.net



-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net
