I've been pondering what the interfaces to the layer 2 and layer 3 code should look like. I'm thinking primarily of ipv4 and ipv6 over ethernet, but I hope it won't be too difficult to generalize to other media.
Summary
~~~~~~~

I propose splitting the code into the following parts: layer 2 (device driverish), layer 3 part 1 (media dependent ip functionality), layer 3 part 2 (media independent ip stuff, and interface management), and layer 4 (implements tcp, udp, and icmp, and replaces the current pfinet, in one way or another).

Layer 2 should get its own translator in the filesystem; the main reason is to make it possible to run several pfinets in parallel. The rest of the code should, at least for a start, be a single process. Code for accessing layer 2 and layer 3 should preferably be put into a library libif, analogous to libstore.

Layer 2 (Ethernet)
~~~~~~~~~~~~~~~~~~

Basically, this piece of code represents a real physical ethernet card. Each interface should have a rendezvous point somewhere in the filesystem, e.g. /device/eth0. It could be implemented as a kernel device, as a userspace translator, or with the work divided between kernel and userspace. Access would usually be restricted, but not necessarily to root only; for instance, one could make the filesystem node owned by a group "network".

The supported operations:

open()
  Gives you a port to the device.

close()
  Stop using it.

write(frame)
  Accepts a raw ethernet frame as argument, and puts it onto the wire.

listen(code, dst)
  Tells the device what traffic you want to see. Code is the ethernet type code, and dst is the destination address on frames. The first argument distinguishes e.g. between ipv4 and ipv6, and the second is needed for multicast. You can listen on several (code, dst) descriptions at once, and if several processes have the device open at the same time, they can listen on the same or different codes/addresses. This call works as a filter, and it also lets the device configure the card to do filtering in hardware, as well as putting the card in promiscuous mode as necessary.

ignore(code, dst)
  The opposite of listen.

read(buffer)
  Reads a raw frame into a buffer.

Furthermore, the device should implement the usual calls needed by select(), and there should be some calls to ask the device about its type, maximum mtu, hardware mac-address and other properties (I'd *still* like to see general property lists on inodes, MUAHAHAHA).

Layer 3 (IPv4 and IPv6), part 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It would be cute to have the interfaces described in this section available in the filesystem, but I don't think that's terribly important.

The "layer 3, part 1" interface is similar to raw sockets. The code that implements the interface (which could be a library libif, or some separate process) talks to a single layer 2 device, and provides the following interface (via rpc, or ordinary function calls) to its users:

open(layer-2-device)
  Initialization.

close()
  Shutdown.

write(packet)
  Sends a raw ip-packet on the interface. Source and destination addresses, all headers, checksums, etc, are filled in by the caller.

listen(ip-address)
  Tells the interface that we're interested in packets with the given ip-address. Can be a unicast or multicast address. It's not clear whether or not the same layer 3 component should handle both ipv4 and ipv6; one way might be to use ipv6 exclusively, and represent ipv4 addresses as ipv6-mapped addresses. The wildcard address is valid, which is useful for a packet forwarding process.

ignore(ip-address)
  The inverse of the above.

read(buffer)
  Reads a raw ip-packet into the buffer.

This is pretty similar to the layer 2 interface.
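To make the shape of these two interfaces a bit more concrete, here is a rough sketch of what the layer 2 operations might look like as plain C prototypes. All names and types below are made up for illustration (this is not an existing Hurd, libif or pfinet API); in practice the calls would probably be MIG-generated rpc stubs on the port returned by open():

    /* Illustrative sketch only; every identifier here is hypothetical.  */
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/types.h>

    /* Opaque handle for an open layer 2 device such as /device/eth0,
       e.g. wrapping the port obtained from the translator.  */
    typedef struct l2_device l2_device_t;

    int l2_open (const char *node, l2_device_t **dev);
    int l2_close (l2_device_t *dev);

    /* Put one complete raw ethernet frame onto the wire.  */
    int l2_write (l2_device_t *dev, const void *frame, size_t len);

    /* Ask for traffic matching an ethernet type code and destination
       address; several (code, dst) filters can be active at once.  */
    int l2_listen (l2_device_t *dev, uint16_t code, const uint8_t dst[6]);
    int l2_ignore (l2_device_t *dev, uint16_t code, const uint8_t dst[6]);

    /* Read the next matching raw frame into the caller's buffer.  */
    ssize_t l2_read (l2_device_t *dev, void *buffer, size_t len);

The "layer 3, part 1" calls would have the same shape, with raw ip-packets instead of frames, and ip-addresses instead of (type code, destination address) pairs as the filter.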
As with the layer 2 interface, we also need calls to get the interface's mtu, ip netmask, and perhaps other properties, such as any hardware-based link-local address. If rpc:s are used, we should use the same rpc:s as for layer 2.

Layer 3 (IPv4 and IPv6), part 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This component talks to one or more "layer 3, part 1" interfaces. It is responsible for routing decisions and the like. The point is that it is independent of the underlying media; everything media specific is done by layer 3, part 1, while the rest is done here.

Operations:

open()
  Initialization.

close()
  Shutdown.

list_interfaces()
delete_interface(interface-index)
add_interface(interface)
  Manage interfaces.

add_address(interface-index, ip-address)
delete_address(ip-address)
list_addresses(interface-index)
  Manage ip-address assignment.

write(packet)
  Writes a raw ip-packet. Source and destination addresses are provided by the caller. Automatically chooses an appropriate interface.

listen(ip-address)
  Says what addresses the caller is interested in. The typical cases are (i) get packets with a specific ip-address as destination, (ii) get packets with a destination address assigned to any of the interfaces, and (iii) get all packets, no matter what the destination address is. The latter is for packet forwarding.

read(buffer)
  Reads an ip-packet into the given buffer.

select_address(src-set, dst-set)
  Given a set of possible source addresses, defaulting to all addresses on any of the interfaces, and a set of possible destination addresses, choose the best source and destination address (according to appropriate address selection rules). Actually, this could be a plain library function, but if there's any local configuration of the rules, that configuration must be stored somewhere, and this seems like as good a place as any.

Layer 4 interface
~~~~~~~~~~~~~~~~~

This is the interface that is closest to what socket-using applications will use. It implements tcp, icmp and udp (and perhaps other protocols as well).

I'm not really sure what this would look like. It could be the current pfinet interface (where is that defined? I looked in hurd/hurd, but the only rpc defined by hurd/hurd/pfinet.defs is pfinet_siocgifconf, which isn't terribly interesting for now), or something plan-9-ish with nodes /.../<ip>/tcp/<port>.

For now, I think the nicest way is to have a directory tree with ip-addresses, port numbers etc, including wildcard addresses and ports, and then link socket applications with an -lsocket library that knows how to deal with that tree. As far as possible, one should use ordinary file rpc:s. Perhaps e.g. SO_REUSEADDR could be mapped to O_EXCL in some way?

It's also conceivable to do something more low-level; for instance, handling of wildcard addresses and ports could be delegated to -lsocket. Or one could go one step further, with the layer 4 server only handling management of ip addresses and port numbers and not much more. Clients would get handles that are associated with socket quadruples <src-ip, src-port, dst-ip, dst-port>, on which they can send and receive packets, with the details of tcp, udp and icmp handled by -lsocket.

Regards,
/Niels