On Wed, Mar 9, 2016 at 11:07 AM, Brian E Carpenter
<[email protected]> wrote:
> On 10/03/2016 06:12, Elwyn Davies wrote:
>> I am the assigned Gen-ART reviewer for this draft.
>
> And I am another Gen-ART reviewer who saw this go by:
>
> ...
>> Minor issues: Treatment of packets that don't fit into the hashing 
>> classification scheme:  The default FQ-CoDel hashing
>> mechanism uses the protocol/addresses/ports 5-tuple, but there will be 
>> packets that don't fit this scheme (especially ICMP).
>> There is no mention of what the classification would do with these packets.  
>> I guess that one extra queue would probably
>> suffice to hold any such outliers, but it would be wise to say something 
>> about how the packets from this/these queue(s) would
>> be treated by the scheduler.  It might also be useful to say something about 
>> treatment of outliers in other classification
>> schemes, if only to say that the scheme used needs to think about any such 
>> outliers.
>
> For IPv6 there is another issue here: the well-known difficulty in finding
> the protocol and port numbers at line speed when extension headers are 
> present.
> Which is of course why IPv6 senders are supposed to set the flow label, which
> should in turn provide a handy 3-tuple (addresses/flow label) that would be
> ideal for CoDel. The operational facts today are that most hosts don't set
> the flow label and many paths through the Internet drop packets with extension
> headers, but the fact that the 5-tuple is problematic in this way might be
> worth mentioning.

I tend to think operational issues with the flow label are going to
continue to prevent it from being used widely, leaving those 20 bits
essentially unused for eternity. (I would have liked a convenient,
standardized place to stick a VPN CTR seqno, actually... /me ducks)

Two more common scenarios are dealing sanely with VLANs (and their
802.1Q priorities), and with MPLS. We don't talk about these because
support has only recently landed in the new hashing API in Linux...

As for IPv6's potential set of headers to decode, the Linux
implementation does try very hard to reach the last header.
(The first BSD implementation did not, as best I recall.) But: in
part the 5-tuple requirement is driven by IPv4 NAT, where the src IP
is static on the gateway and only the ports change. In the IPv6 case a
source box often has several IPv6 addresses to choose from (and uses them),
making even a 3-tuple on IPv6 more effective, no matter the protocol.
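To make the distinction concrete, here is a minimal sketch of the two flow keys. The bucket count matches fq_codel's default, but the function names and the use of sha256 are illustrative assumptions, not anyone's actual implementation:

```python
# Illustrative sketch of 5-tuple vs. 3-tuple flow bucketing.
# sha256 is just a convenient stdlib hash here; real implementations
# use a faster keyed hash (e.g. Jenkins or SipHash).
import hashlib

NUM_QUEUES = 1024  # fq_codel's default number of flow queues

def _bucket(key: bytes) -> int:
    # Map an arbitrary flow key to one of NUM_QUEUES queues.
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % NUM_QUEUES

def bucket_5tuple(src, dst, proto, sport, dport):
    """IPv4-era key: NAT keeps src static, so ports are needed to split flows."""
    return _bucket(f"{src}|{dst}|{proto}|{sport}|{dport}".encode())

def bucket_3tuple(src, dst, flow_label):
    """IPv6 key: addresses + flow label, no walk through extension headers."""
    return _bucket(f"{src}|{dst}|{flow_label}".encode())
```

With multiple IPv6 source addresses in play per host, even the 3-tuple separates flows well; the full 5-tuple matters mainly behind IPv4 NAT.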

I tried for much of the past two years to get public, in-FPGA hardware
development of fq_codel-derived algorithms going, even helping to fund
https://www.kickstarter.com/projects/onetswitch/onetswitch-open-source-hardware-for-networking
(site: http://www.meshsr.com/). It's a really nice board. I have
given away a bunch of them...

only to realize that, with the present state of the art in FPGA design
tools, the existing public IP, and very scarce hardware engineers, it
would be more difficult than I'd imagined, even though DRR and AQM
implementations have been produced as part of http://netfpga.org/ 's
work. I retain hope that, perhaps driven by the publication of this
draft, a public hardware implementation will appear from some
energized students. Timestamping in hardware: easy. Hashing: also
easy, though it delays cut-through switching a bit. The
inherent parallelism of having these fq_codel queues might make
next-hop caches "hotter" and searchable in parallel... and while I
know 10Gbit is achievable (achieved), at 100Gbit I'll be damned if I
know how we're going to do things at .67ns each.
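For reference, the per-packet budget at a given line rate is back-of-envelope arithmetic, taking the worst case of minimum-size Ethernet frames (64 bytes, plus the 8-byte preamble and 12-byte inter-frame gap on the wire):

```python
# Back-of-envelope per-packet time budget at a given line rate,
# assuming minimum-size Ethernet frames (the worst case for
# per-packet work like hashing and timestamping).
MIN_FRAME_ON_WIRE = 64 + 8 + 12  # frame + preamble + inter-frame gap, bytes

def per_packet_budget_ns(line_rate_bps: float) -> float:
    """Nanoseconds available per minimum-size packet."""
    return MIN_FRAME_ON_WIRE * 8 / line_rate_bps * 1e9
```

At 10Gbit this gives roughly 67 ns per minimum-size packet, and an order of magnitude less at 100Gbit.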

Given that we can run things with a shaper, with cake, in software
today at a gbit, it has generally been my hope that hardware
implementations would appear attempting to make things better at
sub-1gbit speeds. This is mitigated by the fact that the slower you
go, the more problems you solve with fq_codel, and the lower the CPU
overhead needed to shape the traffic, to the point where even the
cheapest hardware you can buy handles up to 60mbit handily.
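For the software case, this is the kind of one-liner involved (sch_cake's tc syntax; you need a tc built with cake support, and the device name and rate here are just examples):

```shell
# Shape egress to 60mbit with cake's built-in software shaper;
# eth0 is an example interface.
tc qdisc replace dev eth0 root cake bandwidth 60mbit
```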

The biggest overhead we have is not any of this stuff, but software
rate shaping. I'd love to see hardware that can be programmed to
transmit outbound at a non-line rate. Certain 40Gbit cards (Mellanox)
already have this ability...

In terms of overheads on software implementations, fq_codel and cake
have been pushed to 40Gbit on big packets on high-end Intel boxes.
There are a lot of problems on the rx side of the Linux path,
however... http://www.netdevconf.org/1.1/proceedings/slides/ has tons
of talks about it - specifically:
http://www.netdevconf.org/1.1/proceedings/slides/dangaard-network-performance.pdf
which was also filmed somewhere.

Anyway, going back to revising the draft, if something like my third
paragraph above needs to be incorporated, ok. We could mention the
flow label as a potential thing to hash on, if you want.


>
>     Brian

_______________________________________________
Gen-art mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/gen-art
