Hello all,
Here is a summary of our initial design. Not all areas are covered, so we would
be glad to discuss anything listed here and any other code/features we could
leverage.
Thanks!
Eitan


A. Objectives:
[1] Create a NetLink (NL) driver interface for Windows which interoperates with
    the OVS NL user mode code.
[2] User mode code should be mostly cross platform, with minimal changes to
    support specific Windows OS calls.
[3] The driver should not have to maintain state or resources for transactions
    or dumps.
[4] Reduce the number of system calls: user mode NL code should use a device
    IOCTL system call to send an NL command and to receive the associated NL
    reply in the same system call, whenever possible (*).
[5] An event may be associated with an NL socket I/O request to signal the
    completion of an outstanding receive operation on the socket.
    (For simplicity, a single outstanding I/O request could be associated with
    a socket for signaling purposes.)
   
(*) We assume multiple NL transactions for the same socket can never be
    interleaved.
        
B. Netlink operation types:
There are four types of interactions carried out by processes through the NL
layer:

[1] Transaction based DPIF primitives: these DPIF commands are mapped to the
    nl_sock_transact() NL interface, and from there to
    nl_sock_transact_multiple(). A transaction based command creates an ad hoc
    socket and submits a synchronous device I/O to the driver. The driver
    constructs the NL reply and copies it to the output buffer of the IRP
    representing the I/O transaction.
    (Provisioning of transaction based commands can be brought up and exercised
    through the ovs-dpctl command in parallel to the existing DPIF device.)
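To make the single-call model concrete, here is a minimal portable sketch of
one transaction. It is only an illustration: fake_driver_transact() is a
hypothetical stand-in for the driver's IOCTL handler (on Windows the user mode
side would presumably go through DeviceIoControl()), and struct nl_hdr, the
message types and nl_transact_once() are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as the Linux struct nlmsghdr. */
struct nl_hdr {
    uint32_t len;     /* Total message length, header included. */
    uint16_t type;
    uint16_t flags;
    uint32_t seq;     /* Echoed in the reply so the caller can match it. */
    uint32_t pid;     /* PID of the sending socket. */
};

enum { SKETCH_REQUEST = 16, SKETCH_REPLY = 17 };   /* Hypothetical types. */

/* Hypothetical stand-in for the driver's IOCTL handler.  It reads the request
 * from 'in' and builds the reply directly in 'out', which plays the role of
 * the IRP output buffer.  Returns the reply size, or 0 on a malformed call. */
size_t
fake_driver_transact(const void *in, size_t in_len, void *out, size_t out_size)
{
    struct nl_hdr req, reply;

    if (in_len < sizeof req || out_size < sizeof reply) {
        return 0;
    }
    memcpy(&req, in, sizeof req);
    if (req.len != in_len) {
        return 0;
    }
    reply = req;                  /* Keeps seq and pid of the request. */
    reply.len = sizeof reply;
    reply.type = SKETCH_REPLY;
    memcpy(out, &reply, sizeof reply);
    return sizeof reply;
}

/* User mode side: a single call carries the request in and the reply out. */
int
nl_transact_once(uint32_t seq, uint32_t pid, struct nl_hdr *reply)
{
    struct nl_hdr req = { sizeof req, SKETCH_REQUEST, 0, seq, pid };
    size_t n = fake_driver_transact(&req, sizeof req, reply, sizeof *reply);

    return (n == sizeof *reply && reply->seq == seq) ? 0 : -1;
}
```

Because the reply comes back in the same call, the driver has nothing to
remember about the transaction once the call returns.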
        

[2] State aware DPIF dump commands: port and flow dumps call the following NL
    interfaces:
    a) nl_dump_start()
    b) nl_dump_next()
    c) nl_dump_done()

    With the exception of nl_dump_start(), these NL primitives are based on a
    synchronous IOCTL system call rather than Write/Read. Thus, the driver
    does not have to keep an outstanding request for a dump transaction, nor
    does it need to allocate any resources for it.
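One way a dump could stay stateless on the driver side is sketched below. This
is an assumption on our part, not a settled design: the caller owns a cursor
and hands it to every call, so the driver keeps nothing between calls. The
port table, fake_driver_dump_next() and dump_all_ports() are hypothetical
names used only for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_PORTS 5

/* Hypothetical kernel-side port table. */
static const uint32_t port_table[N_PORTS] = { 1, 2, 3, 7, 9 };

/* Hypothetical stand-in for the dump IOCTL handler.  'cursor' belongs to the
 * caller; the driver only reads and advances it.  Copies at most 'max' port
 * numbers to 'out' and returns how many were copied (0 means end of dump). */
size_t
fake_driver_dump_next(uint32_t *cursor, uint32_t *out, size_t max)
{
    size_t n = 0;

    while (*cursor < N_PORTS && n < max) {
        out[n++] = port_table[(*cursor)++];
    }
    return n;
}

/* User mode loop in the shape of nl_dump_start/next/done. */
size_t
dump_all_ports(uint32_t *ports, size_t max_ports)
{
    uint32_t cursor = 0;      /* "start": all dump state lives in user mode. */
    uint32_t batch[2];
    size_t total = 0, n, i;

    /* "next": one synchronous call per batch. */
    while ((n = fake_driver_dump_next(&cursor, batch, 2)) > 0) {
        for (i = 0; i < n && total < max_ports; i++) {
            ports[total++] = batch[i];
        }
    }
    return total;             /* "done": nothing to release in the driver. */
}
```

A real table can change between calls, so an index cursor like this one could
skip or repeat entries; the sketch ignores that.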

[3] UpCall Port/PID/Unicast socket:
    The driver maintains a per socket queue for all packets which have no
    matching flow in the flow table. The socket has a single overlapped (event)
    structure which will be signalled through the completion of a pending I/O
    request sent by user mode on subscription (similar to the current
    implementation). When dpif_recv_wait() is called, the event associated with
    the pending I/O request is passed to poll_fd_wait_event() in order to wake
    the thread which polls the port queue.

    dpif_recv() calls nl_sock_recv(), which in turn drains the queue maintained
    by the kernel in a synchronous fashion (through an ioctl system call). The
    overlapped structure is rearmed when the recv_set DPIF callback function is
    called.
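The interplay between the queue, the pending I/O request and the event can be
modelled in a few lines. The sketch below is single-threaded and uses plain
flags in place of a real IRP and event object; struct upcall_sock and all
function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAP 8

struct upcall_sock {
    uint32_t pkts[QUEUE_CAP];   /* Packet ids stand in for queued packets. */
    size_t head, count;
    bool io_pending;            /* Outstanding I/O request from user mode. */
    bool event_signaled;        /* Stand-in for the overlapped event. */
};

/* User mode: arm (or rearm) the pending I/O request, as on subscription or
 * when the recv_set callback is called. */
void
sock_arm(struct upcall_sock *s)
{
    s->io_pending = true;
    s->event_signaled = false;
}

/* Driver: queue a packet that missed the flow table.  If an I/O request is
 * pending, complete it, which signals the event and wakes the poller. */
bool
driver_queue_miss(struct upcall_sock *s, uint32_t pkt)
{
    if (s->count == QUEUE_CAP) {
        return false;           /* Queue full: the packet is dropped. */
    }
    s->pkts[(s->head + s->count) % QUEUE_CAP] = pkt;
    s->count++;
    if (s->io_pending) {
        s->io_pending = false;
        s->event_signaled = true;
    }
    return true;
}

/* User mode: synchronous receive.  Returns false once the queue is empty. */
bool
sock_recv(struct upcall_sock *s, uint32_t *pkt)
{
    if (s->count == 0) {
        return false;
    }
    *pkt = s->pkts[s->head];
    s->head = (s->head + 1) % QUEUE_CAP;
    s->count--;
    return true;
}
```

Note that only the first queued packet completes the pending request; later
packets just accumulate until user mode drains the queue and rearms.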

[4] Event notification / NL multicast subscription:
    Events (such as port addition/deletion, link up/down) are propagated from
    the kernel to user mode through the subscription of a socket to a multicast
    group (nl_sock_join_mcgroup()) and a synchronous receive (nl_sock_recv())
    for retrieving the events. The driver maintains a single event queue for
    all events. Similar to the UpCall mechanism, a user mode process keeps an
    outstanding I/O request in the driver which is triggered whenever a new
    event is generated. The event associated with the overlapped structure of
    the socket is passed to poll_fd_wait_event() whenever the
    dpif_port_poll_wait() callback function is called. dpif_poll() will drain
    the event queue by calling nl_sock_recv().

C. Implementation work flow:
The driver creates a device object which provides a NetLink interface for user
mode processes. During the development phase this device is created in addition
to the existing DPIF device. (This means that the bring-up of the NL based user
mode can be done on a live kernel with resident DPs, ports and flows.)
All transaction and dump based DPIF functions could be developed and brought up
while the NL device is a secondary device (ovs-dpctl show and dump XXX should
work). After the initial phase is completed (i.e. all transaction and dump based
DPIF primitives are implemented), the original device interface will be removed
and the packet and event propagation path will be brought up (driven by
vswitch.exe).

[1] Socket creation
    Since a PID should be allocated on a system wide basis and be unique across
    all processes, the kernel assigns the PID for a newly created socket. A new
    IOCTL command, OVS_GET_PID, returns the PID to a user mode client to be
    associated with the socket.
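As a rough illustration of kernel-assigned PIDs, the sketch below hands out
values from a single driver-wide counter. The allocation policy is our
assumption; fake_driver_get_pid() is a hypothetical stand-in for the
OVS_GET_PID handler, and struct nl_sock_sketch is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical driver-wide counter.  A real driver would update it atomically
 * (for example with an interlocked increment), which this sketch omits. */
static uint32_t next_pid = 1;

/* Hypothetical stand-in for the handler of the OVS_GET_PID IOCTL. */
uint32_t
fake_driver_get_pid(void)
{
    uint32_t pid = next_pid++;

    if (next_pid == 0) {
        next_pid = 1;     /* Skip 0, which Netlink uses for the kernel. */
    }
    return pid;
}

/* User mode view of a socket: the PID comes from the driver, not the OS
 * process id, so two sockets in one process still get distinct PIDs. */
struct nl_sock_sketch {
    uint32_t pid;
};

void
nl_sock_create_sketch(struct nl_sock_sketch *sock)
{
    sock->pid = fake_driver_get_pid();
}
```

After a wraparound the counter could hand out a PID that is still in use; a
real allocator would have to check against live sockets.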
        
[2] Detailed description
    nl_sock_transact_multiple() calls into a series of nl_sock_send__() and
    nl_sock_recv__() calls. These can be implemented using ReadFile() and
    WriteFile(), or an ioctl modeled on a transaction which does both read and
    write. One thing to note, though, is that nl_sock_transact_multiple() might
    have to be modified to issue the nl_sock_send__() and nl_sock_recv__()
    calls as alternating pairs, rather than doing a bunch of sends first and
    then doing the recvs. This is because Windows may not preserve message
    boundaries when we do the recv.
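The boundary concern can be shown with a small model. The channel below is a
hypothetical byte stream in which a receive returns everything buffered so
far, which is our assumption about the worst case; struct fake_chan and the
function names are invented for the sketch. With alternating send/recv pairs,
each receive sees exactly one reply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as the Linux struct nlmsghdr. */
struct nl_hdr {
    uint32_t len;
    uint16_t type;
    uint16_t flags;
    uint32_t seq;
    uint32_t pid;
};

/* Hypothetical channel that does not preserve message boundaries. */
struct fake_chan {
    uint8_t buf[256];
    size_t len;
};

/* Send: the fake driver immediately appends a reply with the same seq. */
int
chan_send(struct fake_chan *c, const struct nl_hdr *req)
{
    struct nl_hdr reply = *req;

    if (c->len + sizeof reply > sizeof c->buf) {
        return -1;
    }
    reply.len = sizeof reply;
    memcpy(c->buf + c->len, &reply, sizeof reply);
    c->len += sizeof reply;
    return 0;
}

/* Recv: returns every buffered byte that fits, ignoring message boundaries. */
size_t
chan_recv(struct fake_chan *c, void *out, size_t size)
{
    size_t n = c->len < size ? c->len : size;

    memcpy(out, c->buf, n);
    memmove(c->buf, c->buf + n, c->len - n);
    c->len -= n;
    return n;
}

/* Alternating pairs: each recv follows its own send, so it can only ever
 * return the single reply that belongs to that request. */
int
transact_paired(struct fake_chan *c, const uint32_t *seqs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        struct nl_hdr req = { (uint32_t) sizeof req, 0, 0, seqs[i], 0 };
        struct nl_hdr reply;
        uint8_t rx[64];

        if (chan_send(c, &req)) {
            return -1;
        }
        if (chan_recv(c, rx, sizeof rx) != sizeof reply) {
            return -1;    /* More or less than one message arrived. */
        }
        memcpy(&reply, rx, sizeof reply);
        if (reply.seq != seqs[i]) {
            return -1;
        }
    }
    return 0;
}
```

If two sends are issued before the first recv, that recv returns both replies
in one buffer and the caller has to split them by the length field itself.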


_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
