Re: projects/routing announcement/status

2016-08-27 Thread Hooman Fazaeli

On 2016-01-22 03:11, Alexander V. Chernikov wrote:

I would like to introduce the routing rework that started as the 
projects/routing SVN branch.
It has been around for quite a long time and some of the code has made its way 
to HEAD, but there haven't been any public announcements.

So, what is projects/routing about?

First, it is about bringing more scalability by solving the most annoying 
problems on the packet output path.
To be more specific, it eliminates 2 out of 4 locks, converts the other 2 to 
rmlock(9) and adds infrastructure to reduce locking to a single rmlock for 
certain traffic types.
With these changes, the OS is able to forward 12 Mpps on a 16-core box for both 
IPv4 and IPv6, which is 6-10 times better than stock HEAD.

Second, it eases hacking by avoiding direct access to route/lltable internals 
and providing a higher-level API instead.

Third, it is about bringing advanced features like route multipath, and even 
more speed by adding a modular lookup API that permits using different route 
lookup algorithms based on server role.

Description with graphs and links is available at: 
http://wiki.freebsd.org/ProjectsRoutingProposal
The API used is described at http://wiki.freebsd.org/ProjectsRoutingProposal/API
Current status is available at 
http://wiki.freebsd.org/ProjectsRoutingProposal/ConversionStatus

It is probably much more convenient to read project details on wiki, however 
I’ll try to summarise the most important things here (wiki readers can skip 
till the end).

A typical packet processing path (forwarding for a router, or output for a web 
server) consists of:

1. doing routing lookup (radix read rwlock + routing entry (rte) mutex lock)
2. (optionally) interface address (ifa) atomic refcount acquire/release
3. doing link level entry (lle, llentry) lookup (afdata read rwlock + llentry 
   read (or write) lock)


The most annoying one is the rtentry mutex. The only goal of this mutex is to 
provide rtentry refcounting so consumer code can use the rtentry without the 
risk of it being deleted.
We solve this by saving all needed data into an optimised on-stack structure 
instead of refcounting.
Additionally, we are trying to pre-calculate the data we need to pass by using 
special next-hop structures instead of route entries.
Several different (in terms of returned info and relative overhead) functions 
for retrieving routing data are provided.
Most of the consumers have already been switched to the new KPI. The actual 
output/forward paths are not converted yet.

It should be noted that since individual rtentries are not returned, it is not 
possible to do per-ifa output packet accounting (can be observed in netstat -s).

The route table lock is switched to an ipfw-like dual-locking mode (read rmlock 
for the data path, rwlock for config changes, route export, etc.).
The reasons for having the rwlock are to 1) provide serialization for things in 
the control plane not directly used by the data path and 2) avoid acquiring 
contested/sleeping locks for rmlock. See projects/routing r287078 for an 
example.

Lltable entry locks were eliminated in r291853, r292155.

The lltable lock is also planned to be converted to the dual-locking model, 
with similar reasoning.
However, instead of (ab)using the AFDATA lock, it needs to be converted to a 
per-lltable set of locks.


Open problems:
SCTP and flowtable reference rtentries directly. It is not possible to convert 
the ip[6]_output() path without dealing with that.

Brief merge plan:
Discuss/merge new routing KPI for data path
Discuss/merge lltable dual-lock (WIP)
Discuss/merge explicit nexthop changes
Discuss/merge IPv4/IPv6 output path (along with converted sctp/flowtable)
Discuss/merge route table dual-lock

Current outstanding reviews (I encourage you to take a look at these)

D5009 (IPv4 fast forwarding conversion)
D5010 (IPv6 forwarding conversion)
D4794 (Deal with per-ifa output counters)
D4962 (new LLE lookup functions, no sockaddrs in lltable data path)
D4751 (move all lltable code to separate files)

___
freebsd-a...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arch
To unsubscribe, send any mail to "freebsd-arch-unsubscr...@freebsd.org"


First, thanks for the effort. I personally very much appreciate
any improvements made to the network-related code.

Second, have you considered replacing the existing radix tree with
a faster data structure, especially Luigi's DXR
tables? (http://info.iet.unipi.it/~luigi/papers/20120601-dxr.pdf)
I apologize if the question is not very relevant to your work.



--
Best regards
Hooman Fazaeli

___
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Re: projects/routing announcement/status

2016-08-27 Thread Jim Thompson

> On Aug 27, 2016, at 11:50 AM, Hooman Fazaeli  wrote:
> 
> Second have you considered replacing the existing radix tree with a faster 
> data structure, specially the Luigi DXR
> tables?

DXR only supports IPv4. FYI. 


Re: projects/routing announcement/status

2016-08-27 Thread Alexander V . Chernikov
27.08.2016, 20:58, "Jim Thompson" :
>>  On Aug 27, 2016, at 11:50 AM, Hooman Fazaeli  
>> wrote:
>>
>>  Second have you considered replacing the existing radix tree with a faster 
>> data structure, specially the Luigi DXR
>>  tables?
(Sorry for re-posting #2, I accidentally sent an html-only reply).

One of the goals was to make it easy to switch between different structures 
for different purposes on the fly.
I did consider using DXR, and there was even some glue code to make it a 
modular lookup algo in ipfw tables:
https://svnweb.freebsd.org/base?view=revision&revision=271932

DXR is very fast when handling a full view but, as Jim already mentioned, it is 
IPv4-specific.
Also, it might be overkill when there is a small number of routes (e.g. a 
typical non-routing host).
>
> DXR only supports IPv4. FYI.
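
To make the trade-off discussed in this reply concrete, here is a small sketch 
of how a modular lookup layer might choose an algorithm per table. Everything 
in it is an assumption made for illustration: struct lookup_algo, pick_algo() 
and especially the 100000-route threshold are invented and do not come from 
projects/routing or the ipfw glue code. It only encodes the two constraints 
mentioned in the thread: DXR is IPv4-only, and it is likely overkill for small 
tables.

```c
#include <assert.h>
#include <stddef.h>

enum addr_family { FAM_IPV4, FAM_IPV6 };

/* Hypothetical descriptor for a pluggable lookup algorithm. */
struct lookup_algo {
	const char *name;
	int handles_ipv6;	/* non-zero if IPv6 tables are supported */
	size_t min_routes;	/* below this size, not worth using */
};

/*
 * Ordered from most general to most specialised. The threshold for
 * "dxr" is an arbitrary illustrative value, not a measured one.
 */
static const struct lookup_algo algos[] = {
	{ "radix", 1, 0 },
	{ "dxr",   0, 100000 },
};

/* Return the most specialised algorithm that fits the table. */
const struct lookup_algo *
pick_algo(enum addr_family fam, size_t nroutes)
{
	const struct lookup_algo *best = NULL;

	for (size_t i = 0; i < sizeof(algos) / sizeof(algos[0]); i++) {
		const struct lookup_algo *a = &algos[i];

		if (fam == FAM_IPV6 && !a->handles_ipv6)
			continue;
		if (nroutes < a->min_routes)
			continue;
		best = a;	/* later entries are more specialised */
	}
	assert(best != NULL);	/* "radix" always qualifies */
	return (best);
}
```

With these assumed values, a full-view IPv4 table would select "dxr", while a 
host with a handful of routes or any IPv6 table would stay on "radix".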