Hi Sam,
Here are some clarifications:

>o) "Transaction based DPIF primitive": does this mean that we do Reads and 
>Writes here?
Transaction-based DPIF primitives are mapped to synchronous device I/O control
(IOCTL) system calls.
The NL reply would be returned in the output buffer of the IOCTL call.
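To make that concrete, here is a rough user-mode sketch; OVS_IOCTL_TRANSACT and
the helper name are placeholders, not final names:

#include <windows.h>
#include <winioctl.h>

/* Placeholder IOCTL code, for illustration only. */
#define OVS_IOCTL_TRANSACT \
    CTL_CODE(FILE_DEVICE_NETWORK, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)

/* Send one NL request and receive the NL reply in the same synchronous
 * IOCTL: the request goes in the input buffer, the reply comes back in
 * the output buffer of the call. */
static int
OvsTransact(HANDLE dev, void *request, DWORD requestLen,
            void *reply, DWORD replyLen, DWORD *bytesReturned)
{
    if (!DeviceIoControl(dev, OVS_IOCTL_TRANSACT,
                         request, requestLen,
                         reply, replyLen,
                         bytesReturned, NULL)) {
        return GetLastError();
    }
    return 0;
}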

>You mean, whenever, say, a Flow dump request is issued, in one reply to give 
>back all flows?
Not necessarily. I meant that the driver does not have to maintain the state of 
the dump command.
Each dump command sent down to the driver would be self-contained. 
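As an illustration only (the field names are hypothetical, not part of any
agreed format), a dump request could carry its own continuation marker so the
driver keeps no per-dump state between calls:

#include <windows.h>   /* UINT32 */

/* Hypothetical layout: user mode passes back whatever continuation marker
 * the previous reply returned, so each dump IOCTL is self-contained from
 * the driver's point of view. */
typedef struct _OVS_DUMP_INPUT {
    UINT32 dpNo;          /* Which datapath to dump. */
    UINT32 objectType;    /* Flows, ports, etc. */
    UINT32 dumpIndex[2];  /* Where the previous reply stopped; zero to start. */
} OVS_DUMP_INPUT;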

>o) "Event notification / NL multicast subscription"
>1. I understand you do not speak of events here as API waitable / notification 
>events, right?
Yes, these are OVS events that are placed in a custom queue.
There is a single operating system event associated with the global socket
that collects all OVS events.
It will be signaled by the completion of a pending I/O request in the
driver.
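Roughly, the user-mode side would arm a pending (overlapped) request that the
driver completes whenever it queues an OVS event. A sketch, where
OVS_IOCTL_EVENT_WAIT is a placeholder code and the device handle is assumed to
be opened with FILE_FLAG_OVERLAPPED:

#include <windows.h>

/* Arm a pending request on the global event socket.  The driver pends the
 * IRP and completes it when it queues an OVS event; the completion signals
 * the event handle stored in the OVERLAPPED structure (ovl->hEvent must
 * already hold a valid event). */
static BOOL
OvsArmEventWait(HANDLE dev, OVERLAPPED *ovl, void *buf, DWORD bufLen)
{
    if (!DeviceIoControl(dev, OVS_IOCTL_EVENT_WAIT, NULL, 0,
                         buf, bufLen, NULL, ovl)) {
        /* ERROR_IO_PENDING means the request was queued successfully. */
        return GetLastError() == ERROR_IO_PENDING;
    }
    return TRUE;   /* Completed immediately: an event was already queued. */
}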

> what is the format of the structs that would be read from nl_sock_recv()?
The socket structure would contain a system Overlapped structure (along with an 
event).
The Overlapped structure would be used only for unicast and multicast 
subscription.
Transaction- and dump-based sockets will never be waitable.
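Something along these lines is what I have in mind for the per-socket
bookkeeping (field names are illustrative, not final):

#include <windows.h>

/* Illustrative sketch only. */
struct nl_sock {
    HANDLE handle;          /* Handle to the OVS device object. */
    OVERLAPPED overlapped;  /* Contains the event; used only by unicast and
                             * multicast subscription sockets. */
    /* Transaction- and dump-based sockets never touch 'overlapped', so they
     * are never waitable: their IOCTLs complete synchronously. */
};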



>2. What would the relationship between hyper-v ports and hyper-v nics and dp 
>ports would be?
>I mean, in the sense that the dp port additions and deletions would be 
>requests coming from the userspace to the kernel (so no notification needed), 
>while we get OIDs when nics connect & disconnect. In this sense, I see the 
>hyper-v nic connection and disconnection as something that could be 
>implemented as API notification events.
I assume that the above question is not related to the Netlink interface, but I
think your description is correct in general:
Hyper-V ports (unlike tunnel ports) are created by Hyper-V. The driver gets
notified on every port creation or deletion (or attribute change). In turn, the
driver queues an OVS event to a global queue (which was initially created when
a multicast subscription IOCTL was sent to the driver). Then, the driver will
complete the pending IRP associated with the event queue. The user-mode thread
waiting on the event (associated with the Overlapped structure for this socket)
will wake up, and subsequently a DP port operation will be executed.
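In user-mode pseudocode the flow would look roughly like this;
handle_port_event() is a hypothetical helper, OvsArmEventWait() and struct
nl_sock are the sketches from above, and the loop assumes a pending request
was armed once before entering it:

/* Wait on the event tied to the subscription socket's OVERLAPPED, consume
 * the queued OVS port event, then re-arm a pending request so the driver
 * has an IRP to complete for the next event. */
static void
OvsEventLoop(struct nl_sock *sock, void *eventBuf, DWORD eventBufLen)
{
    for (;;) {
        DWORD bytes;

        WaitForSingleObject(sock->overlapped.hEvent, INFINITE);
        if (GetOverlappedResult(sock->handle, &sock->overlapped,
                                &bytes, FALSE)) {
            handle_port_event(eventBuf, bytes);  /* e.g. add/delete the DP port. */
        }
        /* Re-arm so the driver holds a pending IRP for the next event. */
        OvsArmEventWait(sock->handle, &sock->overlapped, eventBuf, eventBufLen);
    }
}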

>o) "C. Implementation work flow"
>So our incremental development here would be:
>1. Add a new device (alongside the existing one)
>2. Implement a netlink protocol (for basic parsing attributes, etc.) for the new device
>3. Implement netlink datapath operations for this device (get and dump only)
Yes

>4. further & more advanced things are to be dealt with later.
Event notification (multicast) and the missed-packet path (unicast) will be
developed as a second phase. In that phase the FPID device object will be
removed and the "new" vswitchd process will control the driver over the Netlink
device interface.

>If I understand what you mean, I think this is an implementation detail.
>Basically, for our driver, for unicast messages I know that we can do 
>sequential reads. We hold an 'offset' in the buffer where the next read must 
>begin from. However, as I remember, the implementation for "write" simply 
>overwrites the previous buffer (of the corresponding socket). I believe it is 
>good to keep one-write then one-receive instead of doing all writes, then all
>receives.
>However, I think we need to take into account the situation where the 
>userspace might be providing a smaller buffer than it is the total to read. 
>Also, I think the "dump" mechanism requires it.
I want to assume that each transaction is self-contained, which means that the
driver should not maintain any state for the transaction. Since we will be
using an IOCTL for the transaction, the user-mode buffer length will be
specified in the command itself.
All write/read dump pairs are replaced with a single IOCTL call. As I
understand it, transactions and dumps (as used for DPIF) are not really socket
operations per se.
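For example (OVS_IOCTL_DUMP is a placeholder code), a single call carries both
the NL dump request and the user-mode buffer length:

/* One synchronous IOCTL replaces the write/read pair for a dump chunk.
 * The output buffer length travels with the command, so the driver knows
 * exactly how much it may return and keeps no state between calls. */
static int
OvsDumpChunk(HANDLE dev, void *dumpReq, DWORD dumpReqLen,
             void *replyBuf, DWORD replyBufLen, DWORD *replyLen)
{
    if (!DeviceIoControl(dev, OVS_IOCTL_DUMP,
                         dumpReq, dumpReqLen,
                         replyBuf, replyBufLen,
                         replyLen, NULL)) {
        return GetLastError();   /* E.g. a buffer too small for this chunk. */
    }
    return 0;
}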


>My suggestions & opinions:
>o) I think we must do dumping via writes and reads. The main reason is the 
>fact that we don't know the total size to read when we request, say, a flow 
>dump.
        
/* Receive a reply. */
error = nl_sock_recv__(sock, buf_txn->reply, false);
I am not familiar with the ofpbuf structure. I noticed that you guys used
MAX_STACK_LENGTH for specifying the buffer length. I need to get back to you on
this one.


>o) I believe we shouldn't use the netlink overhead (nlmsghdr, genlmsghdr, 
>attributes) when not needed (say, when registering a KEVENT notification) , 
>and, if we choose not to use netlink protocol always, we may need a way to
>differentiate between netlink and non-netlink requests.
Possibly, as a later optimization phase.


Thank you, Sam, for reviewing these notes. Please feel free to ask questions or
raise any comments.
Eitan

