Re: [ovs-dev] [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload
On Sep 4, 2014, at 9:08 PM, Simon Horman wrote: > On Thu, Sep 04, 2014 at 09:30:45AM -0700, Scott Feldman wrote: >> >> On Sep 4, 2014, at 2:04 AM, Simon Horman wrote: >> >>> >>> >>> [snip] >>> >>> In relation to ports and datapaths it seems to me that the API that >>> has been developed accommodates a model where a port may belong to a >>> switch device; that this topology is fixed before any API calls are made >>> and that all all ports belonging to the same switch belong to the same >>> datapath. >>> >>> This makes sense in the case of hardware that looks a lot like a switch. >>> But I think that other scenarios are possible. For example hardware that >>> is able to handle the same abstractions handled by the datapath: datapaths >>> may be created or destroyed; vports may be added and removed from datapaths. >>> >>> So one might have a piece of hardware that is configured with more than one >>> datapath configured and its different ports belong to it might be >>> associated with different data paths. >> >> I’ve tested multiple datapaths on one switch hardware with the current patch >> set and it works fine, without the need to push down any datapath id in the >> API. It works because a switch port can’t belong to more than one datapath. >> Datapaths can be created/destroyed and ports added/removed from datapaths >> dynamically and the right sw_flows are added/removed to program HW. > > And the flows added to a switch always match the in port? Thus > so a given flow is only ever for one in-port and thus one datapath? Correct, for the particular switch implementation we’re working with. If another implementation can’t match on in_port then it seems datapath_id may be needed to partition flows. >>> Or we might have hardware that is able to offload a tunnel vport. > > I think tunnel vports is still an unsolved part of the larger puzzle. Agreed, TBD work to offload tunnel vports. Current implementation only looking at VLAN vports so far. > >>> In short I am thinking in terms of API callbacks to manipulate datapaths >>> and vports. Although I have not thought about it in detail I believe >>> that the current model you have implemented using such a scheme because >>> the scheme I am suggesting maps to that of the datapath and you have >>> implemented your model there. -scott ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload
On 09/05/14 03:02, Scott Feldman wrote: On Thu, Sep 04, 2014 at 09:30:45AM -0700, Scott Feldman wrote: Correct, for the particular switch implementation we’re working with. Do you have L2/3 working with this interface on said switch? I am interested. cheers, jamal ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v1] Windows NetLink Socket - Support for asynchronous event notification
Hi Alin, thanks a lot for the review. On the device name for the create file system call we need to use the following definition from the interface header file: #define OVS_DEVICE_NAME_USER TEXT(".\\OpenvSwitchDevice") This is its purpose. There is no GetLastError() for Linux. Good catch! I'll fix the typo and will send a new patch soon. Eitan -Original Message- From: Alin Serdean [mailto:aserd...@cloudbasesolutions.com] Sent: Thursday, September 04, 2014 11:43 PM To: Eitan Eliahu; dev@openvswitch.org Subject: RE: [ovs-dev] [PATCH v1] Windows NetLink Socket - Support for asynchronous event notification Hi Eitan, #else sock->fd = socket(AF_NETLINK, SOCK_RAW, protocol); if (sock->fd < 0) { +int last_error = GetLastError(); The above breaks linux build. You could leave last_error outside and use it wherever appropriate. #define OVS_DEVICE_NAME_NT L"\\Device\\OpenvSwitchDevice" in OvsDpInterfaceExt.h is different from sock->handle = CreateFile(TEXT(".\\OpenVSwitchDevice"). We should also include the file instead of hardcode it in the future. pend_io_request can be included in nl_sock_wait but that is just a preference. Beside spacing issues near createfile and some spelling "structure event associated" I thing this patch is good to go. Thanks, Alin. -Mesaj original- De la: dev [mailto:dev-boun...@openvswitch.org] În numele Eitan Eliahu Trimis: Thursday, September 4, 2014 1:24 PM Către: dev@openvswitch.org Subiect: [ovs-dev] [PATCH v1] Windows NetLink Socket - Support for asynchronous event notification We keep an outstanding, out of band, I/O request in the driver at all time. Once an event generated the driver queues the event message, completes the pending I/O and unblocks the calling thread through setting the event in the overlapped structure n the NL socket. The thread will read all all event messages synchronous through the call of nl_sock_recv() Signed-off-by: Eitan Eliahu elia...@vmware.com --- datapath-windows/include/OvsDpInterfaceExt.h | 1 + lib/netlink-socket.c | 82 ++-- 2 files changed, 79 insertions(+), 4 deletions(-) diff --git a/datapath-windows/include/OvsDpInterfaceExt.h b/datapath-windows/include/OvsDpInterfaceExt.h index 73dfcbe..ab2088a 100644 --- a/datapath-windows/include/OvsDpInterfaceExt.h +++ b/datapath-windows/include/OvsDpInterfaceExt.h @@ -70,6 +70,7 @@ /* Commands available under the OVS_WIN_CONTROL_FAMILY. */ enum ovs_win_control_cmd { OVS_CTRL_CMD_WIN_GET_PID, +OVS_CTRL_CMD_WIN_PEND_REQ }; #endif /* __OVS_DP_INTERFACE_EXT_H_ */ diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index a6be186..4b535f0 100644 --- a/lib/netlink-socket.c +++ b/lib/netlink-socket.c @@ -80,6 +80,7 @@ static int get_sock_pid_from_kernel(struct nl_sock *sock); struct nl_sock { #ifdef _WIN32 HANDLE handle; +OVERLAPPED overlapped; #else int fd; #endif @@ -139,21 +140,30 @@ nl_sock_create(int protocol, struct nl_sock **sockp) sock = xmalloc(sizeof *sock); #ifdef _WIN32 -sock->handle = CreateFileA(".\\OpenVSwitchDevice", +sock->handle = CreateFile(TEXT(".\\OpenVSwitchDevice"), GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, - FILE_ATTRIBUTE_NORMAL, NULL); - -int last_error = GetLastError(); + FILE_FLAG_OVERLAPPED, NULL); if (sock->handle == INVALID_HANDLE_VALUE) { +int last_error = GetLastError(); +VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); +goto error; +} + +memset(&sock->overlapped, 0, sizeof sock->overlapped); +sock->overlapped.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL); +if (sock->overlapped.hEvent == NULL) { +int last_error = GetLastError(); VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); goto error; } + #else sock->fd = socket(AF_NETLINK, SOCK_RAW, protocol); if (sock->fd < 0) { +int last_error = GetLastError(); VLOG_ERR("fcntl: %s", ovs_strerror(errno)); goto error; } @@ -221,6 +231,9 @@ error: } } #ifdef _WIN32 +if (sock->overlapped.hEvent) { +CloseHandle(sock->overlapped.hEvent); +} if (sock->handle != INVALID_HANDLE_VALUE) { CloseHandle(sock->handle); } @@ -248,6 +261,9 @@ nl_sock_destroy(struct nl_sock *sock) { if (sock) { #ifdef _WIN32 +if (sock->overlapped.hEvent) { +CloseHandle(sock->overlapped.hEvent); +} CloseHandle(sock->handle); #else close(sock->fd); @@ -1040,12 +1056,70 @@ nl_dump_done(struct nl_dump *dump) return status == EOF ? 0 : status; } +#ifdef _WIN32 +/* Pend an I/O request in the driver. The driver completes the I/O +whenever +* an event or a packet i
[ovs-dev] [PATCH v2] Windows NetLink Socket - Support for asynchronous event notification
We keep an outstanding, out of band, I/O request in the driver at all time. Once an event generated the driver queues the event message, completes the pending I/O and unblocks the calling thread through setting the event in the overlapped structure n the NL socket. The thread will read all all event messages synchronous through the call of nl_sock_recv() Addressing review comments in v2. Signed-off-by: Eitan Eliahu elia...@vmware.com --- datapath-windows/include/OvsDpInterfaceExt.h | 1 + lib/netlink-socket.c | 87 +--- 2 files changed, 81 insertions(+), 7 deletions(-) diff --git a/datapath-windows/include/OvsDpInterfaceExt.h b/datapath-windows/include/OvsDpInterfaceExt.h index 73dfcbe..ab2088a 100644 --- a/datapath-windows/include/OvsDpInterfaceExt.h +++ b/datapath-windows/include/OvsDpInterfaceExt.h @@ -70,6 +70,7 @@ /* Commands available under the OVS_WIN_CONTROL_FAMILY. */ enum ovs_win_control_cmd { OVS_CTRL_CMD_WIN_GET_PID, +OVS_CTRL_CMD_WIN_PEND_REQ }; #endif /* __OVS_DP_INTERFACE_EXT_H_ */ diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index a6be186..ea02757 100644 --- a/lib/netlink-socket.c +++ b/lib/netlink-socket.c @@ -80,6 +80,7 @@ static int get_sock_pid_from_kernel(struct nl_sock *sock); struct nl_sock { #ifdef _WIN32 HANDLE handle; +OVERLAPPED overlapped; #else int fd; #endif @@ -139,18 +140,26 @@ nl_sock_create(int protocol, struct nl_sock **sockp) sock = xmalloc(sizeof *sock); #ifdef _WIN32 -sock->handle = CreateFileA(".\\OpenVSwitchDevice", - GENERIC_READ | GENERIC_WRITE, - FILE_SHARE_READ | FILE_SHARE_WRITE, - NULL, OPEN_EXISTING, - FILE_ATTRIBUTE_NORMAL, NULL); - -int last_error = GetLastError(); +sock->handle = CreateFile(OVS_DEVICE_NAME_USER, + GENERIC_READ | GENERIC_WRITE, + FILE_SHARE_READ | FILE_SHARE_WRITE, + NULL, OPEN_EXISTING, + FILE_FLAG_OVERLAPPED, NULL); if (sock->handle == INVALID_HANDLE_VALUE) { +int last_error = GetLastError(); +VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); +goto error; +} + +memset(&sock->overlapped, 0, sizeof sock->overlapped); +sock->overlapped.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL); +if (sock->overlapped.hEvent == NULL) { +int last_error = GetLastError(); VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); goto error; } + #else sock->fd = socket(AF_NETLINK, SOCK_RAW, protocol); if (sock->fd < 0) { @@ -221,6 +230,9 @@ error: } } #ifdef _WIN32 +if (sock->overlapped.hEvent) { +CloseHandle(sock->overlapped.hEvent); +} if (sock->handle != INVALID_HANDLE_VALUE) { CloseHandle(sock->handle); } @@ -248,6 +260,9 @@ nl_sock_destroy(struct nl_sock *sock) { if (sock) { #ifdef _WIN32 +if (sock->overlapped.hEvent) { +CloseHandle(sock->overlapped.hEvent); +} CloseHandle(sock->handle); #else close(sock->fd); @@ -1040,12 +1055,70 @@ nl_dump_done(struct nl_dump *dump) return status == EOF ? 0 : status; } +#ifdef _WIN32 +/* Pend an I/O request in the driver. The driver completes the I/O whenever +* an event or a packet is ready to be read. Once the I/O is completed +* the overlapped structure event associated with the pending I/O will be set +*/ +static int +pend_io_request(const struct nl_sock *sock) +{ +struct ofpbuf request; +uint64_t request_stub[128]; +struct ovs_header *ovs_header; +struct nlmsghdr *nlmsg; +uint32_t seq; +int retval; +int error; +DWORD bytes; +OVERLAPPED *overlapped = &sock->overlapped; + +int ovs_msg_size = sizeof (struct nlmsghdr) + sizeof (struct genlmsghdr) + + sizeof (struct ovs_header); + +ofpbuf_use_stub(&request, request_stub, sizeof request_stub); + +seq = nl_sock_allocate_seq(sock, 1); +nl_msg_put_genlmsghdr(&request, 0, OVS_WIN_NL_CTRL_FAMILY_ID, 0, + OVS_CTRL_CMD_WIN_PEND_REQ, OVS_WIN_CONTROL_VERSION); +nlmsg = nl_msg_nlmsghdr(&request); +nlmsg->nlmsg_seq = seq; + +ovs_header = ofpbuf_put_uninit(&request, sizeof *ovs_header); +ovs_header->dp_ifindex = 0; + +if (!DeviceIoControl(sock->handle, OVS_IOCTL_WRITE, +ofpbuf_data(&request), ofpbuf_size(&request), +NULL, 0, &bytes, overlapped)) { +error = GetLastError(); +/* Check if the I/O got pended */ +if (error != ERROR_IO_INCOMPLETE && error != ERROR_IO_PENDING) { +VLOG_ERR("nl_sock_wait failed - %s\n", ovs_format_message(error)); +retval = EINVAL; +goto done; +} +} +else { +/* The I/O was completed synchronously
Re: [ovs-dev] [PATCH v2] Windows NetLink Socket - Support for asynchronous event notification
Hi Alin, Thanks for review. Acked-by: Ankur Sharma Regards, Ankur From: dev on behalf of Eitan Eliahu Sent: Friday, September 5, 2014 4:48 PM To: dev@openvswitch.org Subject: [ovs-dev] [PATCH v2] Windows NetLink Socket - Support for asynchronous event notification We keep an outstanding, out of band, I/O request in the driver at all time. Once an event generated the driver queues the event message, completes the pending I/O and unblocks the calling thread through setting the event in the overlapped structure n the NL socket. The thread will read all all event messages synchronous through the call of nl_sock_recv() Addressing review comments in v2. Signed-off-by: Eitan Eliahu elia...@vmware.com --- datapath-windows/include/OvsDpInterfaceExt.h | 1 + lib/netlink-socket.c | 87 +--- 2 files changed, 81 insertions(+), 7 deletions(-) diff --git a/datapath-windows/include/OvsDpInterfaceExt.h b/datapath-windows/include/OvsDpInterfaceExt.h index 73dfcbe..ab2088a 100644 --- a/datapath-windows/include/OvsDpInterfaceExt.h +++ b/datapath-windows/include/OvsDpInterfaceExt.h @@ -70,6 +70,7 @@ /* Commands available under the OVS_WIN_CONTROL_FAMILY. */ enum ovs_win_control_cmd { OVS_CTRL_CMD_WIN_GET_PID, +OVS_CTRL_CMD_WIN_PEND_REQ }; #endif /* __OVS_DP_INTERFACE_EXT_H_ */ diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index a6be186..ea02757 100644 --- a/lib/netlink-socket.c +++ b/lib/netlink-socket.c @@ -80,6 +80,7 @@ static int get_sock_pid_from_kernel(struct nl_sock *sock); struct nl_sock { #ifdef _WIN32 HANDLE handle; +OVERLAPPED overlapped; #else int fd; #endif @@ -139,18 +140,26 @@ nl_sock_create(int protocol, struct nl_sock **sockp) sock = xmalloc(sizeof *sock); #ifdef _WIN32 -sock->handle = CreateFileA(".\\OpenVSwitchDevice", - GENERIC_READ | GENERIC_WRITE, - FILE_SHARE_READ | FILE_SHARE_WRITE, - NULL, OPEN_EXISTING, - FILE_ATTRIBUTE_NORMAL, NULL); - -int last_error = GetLastError(); +sock->handle = CreateFile(OVS_DEVICE_NAME_USER, + GENERIC_READ | GENERIC_WRITE, + FILE_SHARE_READ | FILE_SHARE_WRITE, + NULL, OPEN_EXISTING, + FILE_FLAG_OVERLAPPED, NULL); if (sock->handle == INVALID_HANDLE_VALUE) { +int last_error = GetLastError(); +VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); +goto error; +} + +memset(&sock->overlapped, 0, sizeof sock->overlapped); +sock->overlapped.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL); +if (sock->overlapped.hEvent == NULL) { +int last_error = GetLastError(); VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); goto error; } + #else sock->fd = socket(AF_NETLINK, SOCK_RAW, protocol); if (sock->fd < 0) { @@ -221,6 +230,9 @@ error: } } #ifdef _WIN32 +if (sock->overlapped.hEvent) { +CloseHandle(sock->overlapped.hEvent); +} if (sock->handle != INVALID_HANDLE_VALUE) { CloseHandle(sock->handle); } @@ -248,6 +260,9 @@ nl_sock_destroy(struct nl_sock *sock) { if (sock) { #ifdef _WIN32 +if (sock->overlapped.hEvent) { +CloseHandle(sock->overlapped.hEvent); +} CloseHandle(sock->handle); #else close(sock->fd); @@ -1040,12 +1055,70 @@ nl_dump_done(struct nl_dump *dump) return status == EOF ? 0 : status; } +#ifdef _WIN32 +/* Pend an I/O request in the driver. The driver completes the I/O whenever +* an event or a packet is ready to be read. Once the I/O is completed +* the overlapped structure event associated with the pending I/O will be set +*/ +static int +pend_io_request(const struct nl_sock *sock) +{ +struct ofpbuf request; +uint64_t request_stub[128]; +struct ovs_header *ovs_header; +struct nlmsghdr *nlmsg; +uint32_t seq; +int retval; +int error; +DWORD bytes; +OVERLAPPED *overlapped = &sock->overlapped; + +int ovs_msg_size = sizeof (struct nlmsghdr) + sizeof (struct genlmsghdr) + + sizeof (struct ovs_header); + +ofpbuf_use_stub(&request, request_stub, sizeof request_stub); + +seq = nl_sock_allocate_seq(sock, 1); +nl_msg_put_genlmsghdr(&request, 0, OVS_WIN_NL_CTRL_FAMILY_ID, 0, + OVS_CTRL_CMD_WIN_PEND_REQ, OVS_WIN_CONTROL_VERSION); +nlmsg = nl_msg_nlmsghdr(&request); +nlmsg->nlmsg_seq = seq; + +ovs_header = ofpbuf_put_uninit(&request, sizeof *ovs_header); +ovs_header->dp_ifindex = 0; + +if (!DeviceIoControl(sock->handle, OVS_IOCTL_WRITE, +ofpbuf_data(&request), ofpbuf_size(&request), +NULL, 0, &bytes, overlapped)) { +error = GetLastError(); +/*
Re: [ovs-dev] [PATCH v4 3/4] lib/odp: Use masked set actions.
On Aug 12, 2014, at 3:45 PM, Ben Pfaff wrote: > On Mon, Aug 11, 2014 at 09:15:00AM -0700, Jarno Rajahalme wrote: >> Signed-off-by: Jarno Rajahalme > > The amount of redundancy in the code at this point is starting to give > me a headache. How about a pattern like this (provided as an > incremental)? > I see what you mean :-) I see we can use these getters and setters also elsewhere in the file to cut down code duplication even more. v5 coming up shortly. Jarno ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2] Windows NetLink Socket - Support for asynchronous event notification
Hi Eitan Beside some style issue in: +#ifdef _WIN32 +/* Pend an I/O request in the driver. The driver completes the I/O whenever +* an event or a packet is ready to be read. Once the I/O is completed +* the overlapped structure event associated with the pending I/O will be set +*/ Acked-by: Alin Gabriel Serdean -Mesaj original- De la: dev [mailto:dev-boun...@openvswitch.org] În numele Eitan Eliahu Trimis: Friday, September 5, 2014 4:49 PM Către: dev@openvswitch.org Subiect: [ovs-dev] [PATCH v2] Windows NetLink Socket - Support for asynchronous event notification We keep an outstanding, out of band, I/O request in the driver at all time. Once an event generated the driver queues the event message, completes the pending I/O and unblocks the calling thread through setting the event in the overlapped structure n the NL socket. The thread will read all all event messages synchronous through the call of nl_sock_recv() Addressing review comments in v2. Signed-off-by: Eitan Eliahu elia...@vmware.com --- datapath-windows/include/OvsDpInterfaceExt.h | 1 + lib/netlink-socket.c | 87 +--- 2 files changed, 81 insertions(+), 7 deletions(-) diff --git a/datapath-windows/include/OvsDpInterfaceExt.h b/datapath-windows/include/OvsDpInterfaceExt.h index 73dfcbe..ab2088a 100644 --- a/datapath-windows/include/OvsDpInterfaceExt.h +++ b/datapath-windows/include/OvsDpInterfaceExt.h @@ -70,6 +70,7 @@ /* Commands available under the OVS_WIN_CONTROL_FAMILY. */ enum ovs_win_control_cmd { OVS_CTRL_CMD_WIN_GET_PID, +OVS_CTRL_CMD_WIN_PEND_REQ }; #endif /* __OVS_DP_INTERFACE_EXT_H_ */ diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index a6be186..ea02757 100644 --- a/lib/netlink-socket.c +++ b/lib/netlink-socket.c @@ -80,6 +80,7 @@ static int get_sock_pid_from_kernel(struct nl_sock *sock); struct nl_sock { #ifdef _WIN32 HANDLE handle; +OVERLAPPED overlapped; #else int fd; #endif @@ -139,18 +140,26 @@ nl_sock_create(int protocol, struct nl_sock **sockp) sock = xmalloc(sizeof *sock); #ifdef _WIN32 -sock->handle = CreateFileA(".\\OpenVSwitchDevice", - GENERIC_READ | GENERIC_WRITE, - FILE_SHARE_READ | FILE_SHARE_WRITE, - NULL, OPEN_EXISTING, - FILE_ATTRIBUTE_NORMAL, NULL); - -int last_error = GetLastError(); +sock->handle = CreateFile(OVS_DEVICE_NAME_USER, + GENERIC_READ | GENERIC_WRITE, + FILE_SHARE_READ | FILE_SHARE_WRITE, + NULL, OPEN_EXISTING, + FILE_FLAG_OVERLAPPED, NULL); if (sock->handle == INVALID_HANDLE_VALUE) { +int last_error = GetLastError(); +VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); +goto error; +} + +memset(&sock->overlapped, 0, sizeof sock->overlapped); +sock->overlapped.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL); +if (sock->overlapped.hEvent == NULL) { +int last_error = GetLastError(); VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); goto error; } + #else sock->fd = socket(AF_NETLINK, SOCK_RAW, protocol); if (sock->fd < 0) { @@ -221,6 +230,9 @@ error: } } #ifdef _WIN32 +if (sock->overlapped.hEvent) { +CloseHandle(sock->overlapped.hEvent); +} if (sock->handle != INVALID_HANDLE_VALUE) { CloseHandle(sock->handle); } @@ -248,6 +260,9 @@ nl_sock_destroy(struct nl_sock *sock) { if (sock) { #ifdef _WIN32 +if (sock->overlapped.hEvent) { +CloseHandle(sock->overlapped.hEvent); +} CloseHandle(sock->handle); #else close(sock->fd); @@ -1040,12 +1055,70 @@ nl_dump_done(struct nl_dump *dump) return status == EOF ? 0 : status; } +#ifdef _WIN32 +/* Pend an I/O request in the driver. The driver completes the I/O +whenever +* an event or a packet is ready to be read. Once the I/O is completed +* the overlapped structure event associated with the pending I/O will +be set */ static int pend_io_request(const struct nl_sock *sock) { +struct ofpbuf request; +uint64_t request_stub[128]; +struct ovs_header *ovs_header; +struct nlmsghdr *nlmsg; +uint32_t seq; +int retval; +int error; +DWORD bytes; +OVERLAPPED *overlapped = &sock->overlapped; + +int ovs_msg_size = sizeof (struct nlmsghdr) + sizeof (struct genlmsghdr) + + sizeof (struct ovs_header); + +ofpbuf_use_stub(&request, request_stub, sizeof request_stub); + +seq = nl_sock_allocate_seq(sock, 1); +nl_msg_put_genlmsghdr(&request, 0, OVS_WIN_NL_CTRL_FAMILY_ID, 0, + OVS_CTRL_CMD_WIN_PEND_REQ, OVS_WIN_CONTROL_VERSION); +nlmsg = nl_msg_nlmsghdr(&request); +nlmsg->nlmsg_seq = seq; + +ovs_header = o
Re: [ovs-dev] [PATCH v2] Windows NetLink Socket - Support for asynchronous event notification
Hi Alin, I'm sorry I don't follow you on this one. Which style issue you refer to? I would like to fix it before checking in. Thanks, Eitan -Original Message- From: Alin Serdean [mailto:aserd...@cloudbasesolutions.com] Sent: Friday, September 05, 2014 11:03 AM To: Eitan Eliahu; dev@openvswitch.org Subject: RE: [ovs-dev] [PATCH v2] Windows NetLink Socket - Support for asynchronous event notification Hi Eitan Beside some style issue in: +#ifdef _WIN32 +/* Pend an I/O request in the driver. The driver completes the I/O +whenever +* an event or a packet is ready to be read. Once the I/O is completed +* the overlapped structure event associated with the pending I/O will +be set */ Acked-by: Alin Gabriel Serdean -Mesaj original- De la: dev [mailto:dev-boun...@openvswitch.org] În numele Eitan Eliahu Trimis: Friday, September 5, 2014 4:49 PM Către: dev@openvswitch.org Subiect: [ovs-dev] [PATCH v2] Windows NetLink Socket - Support for asynchronous event notification We keep an outstanding, out of band, I/O request in the driver at all time. Once an event generated the driver queues the event message, completes the pending I/O and unblocks the calling thread through setting the event in the overlapped structure n the NL socket. The thread will read all all event messages synchronous through the call of nl_sock_recv() Addressing review comments in v2. Signed-off-by: Eitan Eliahu elia...@vmware.com --- datapath-windows/include/OvsDpInterfaceExt.h | 1 + lib/netlink-socket.c | 87 +--- 2 files changed, 81 insertions(+), 7 deletions(-) diff --git a/datapath-windows/include/OvsDpInterfaceExt.h b/datapath-windows/include/OvsDpInterfaceExt.h index 73dfcbe..ab2088a 100644 --- a/datapath-windows/include/OvsDpInterfaceExt.h +++ b/datapath-windows/include/OvsDpInterfaceExt.h @@ -70,6 +70,7 @@ /* Commands available under the OVS_WIN_CONTROL_FAMILY. */ enum ovs_win_control_cmd { OVS_CTRL_CMD_WIN_GET_PID, +OVS_CTRL_CMD_WIN_PEND_REQ }; #endif /* __OVS_DP_INTERFACE_EXT_H_ */ diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index a6be186..ea02757 100644 --- a/lib/netlink-socket.c +++ b/lib/netlink-socket.c @@ -80,6 +80,7 @@ static int get_sock_pid_from_kernel(struct nl_sock *sock); struct nl_sock { #ifdef _WIN32 HANDLE handle; +OVERLAPPED overlapped; #else int fd; #endif @@ -139,18 +140,26 @@ nl_sock_create(int protocol, struct nl_sock **sockp) sock = xmalloc(sizeof *sock); #ifdef _WIN32 -sock->handle = CreateFileA(".\\OpenVSwitchDevice", - GENERIC_READ | GENERIC_WRITE, - FILE_SHARE_READ | FILE_SHARE_WRITE, - NULL, OPEN_EXISTING, - FILE_ATTRIBUTE_NORMAL, NULL); - -int last_error = GetLastError(); +sock->handle = CreateFile(OVS_DEVICE_NAME_USER, + GENERIC_READ | GENERIC_WRITE, + FILE_SHARE_READ | FILE_SHARE_WRITE, + NULL, OPEN_EXISTING, + FILE_FLAG_OVERLAPPED, NULL); if (sock->handle == INVALID_HANDLE_VALUE) { +int last_error = GetLastError(); +VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); +goto error; +} + +memset(&sock->overlapped, 0, sizeof sock->overlapped); +sock->overlapped.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL); +if (sock->overlapped.hEvent == NULL) { +int last_error = GetLastError(); VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); goto error; } + #else sock->fd = socket(AF_NETLINK, SOCK_RAW, protocol); if (sock->fd < 0) { @@ -221,6 +230,9 @@ error: } } #ifdef _WIN32 +if (sock->overlapped.hEvent) { +CloseHandle(sock->overlapped.hEvent); +} if (sock->handle != INVALID_HANDLE_VALUE) { CloseHandle(sock->handle); } @@ -248,6 +260,9 @@ nl_sock_destroy(struct nl_sock *sock) { if (sock) { #ifdef _WIN32 +if (sock->overlapped.hEvent) { +CloseHandle(sock->overlapped.hEvent); +} CloseHandle(sock->handle); #else close(sock->fd); @@ -1040,12 +1055,70 @@ nl_dump_done(struct nl_dump *dump) return status == EOF ? 0 : status; } +#ifdef _WIN32 +/* Pend an I/O request in the driver. The driver completes the I/O +whenever +* an event or a packet is ready to be read. Once the I/O is completed +* the overlapped structure event associated with the pending I/O will +be set */ static int pend_io_request(const struct nl_sock *sock) { +struct ofpbuf request; +uint64_t request_stub[128]; +struct ovs_header *ovs_header; +struct nlmsghdr *nlmsg; +uint32_t seq; +int retval; +int error; +DWORD bytes; +OVERLAPPED *overlapped = &sock->overlapped; + +int ovs_msg_size = sizeof (struct nlmsghdr) + sizeof (struct genlmsg
[ovs-dev] [PATCH v3] Windows NetLink Socket - Support for asynchronous event notification
We keep an outstanding, out of band, I/O request in the driver at all time. Once an event generated the driver queues the event message, completes the pending I/O and unblocks the calling thread through setting the event in the overlapped structure n the NL socket. The thread will read all all event messages synchronous through the call of nl_sock_recv() Addressing review comments in v2. Signed-off-by: Eitan Eliahu elia...@vmware.com --- datapath-windows/include/OvsDpInterfaceExt.h | 1 + lib/netlink-socket.c | 87 +--- 2 files changed, 81 insertions(+), 7 deletions(-) diff --git a/datapath-windows/include/OvsDpInterfaceExt.h b/datapath-windows/include/OvsDpInterfaceExt.h index 73dfcbe..ab2088a 100644 --- a/datapath-windows/include/OvsDpInterfaceExt.h +++ b/datapath-windows/include/OvsDpInterfaceExt.h @@ -70,6 +70,7 @@ /* Commands available under the OVS_WIN_CONTROL_FAMILY. */ enum ovs_win_control_cmd { OVS_CTRL_CMD_WIN_GET_PID, +OVS_CTRL_CMD_WIN_PEND_REQ }; #endif /* __OVS_DP_INTERFACE_EXT_H_ */ diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index a6be186..61f840a 100644 --- a/lib/netlink-socket.c +++ b/lib/netlink-socket.c @@ -80,6 +80,7 @@ static int get_sock_pid_from_kernel(struct nl_sock *sock); struct nl_sock { #ifdef _WIN32 HANDLE handle; +OVERLAPPED overlapped; #else int fd; #endif @@ -139,18 +140,26 @@ nl_sock_create(int protocol, struct nl_sock **sockp) sock = xmalloc(sizeof *sock); #ifdef _WIN32 -sock->handle = CreateFileA(".\\OpenVSwitchDevice", - GENERIC_READ | GENERIC_WRITE, - FILE_SHARE_READ | FILE_SHARE_WRITE, - NULL, OPEN_EXISTING, - FILE_ATTRIBUTE_NORMAL, NULL); - -int last_error = GetLastError(); +sock->handle = CreateFile(OVS_DEVICE_NAME_USER, + GENERIC_READ | GENERIC_WRITE, + FILE_SHARE_READ | FILE_SHARE_WRITE, + NULL, OPEN_EXISTING, + FILE_FLAG_OVERLAPPED, NULL); if (sock->handle == INVALID_HANDLE_VALUE) { +int last_error = GetLastError(); +VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); +goto error; +} + +memset(&sock->overlapped, 0, sizeof sock->overlapped); +sock->overlapped.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL); +if (sock->overlapped.hEvent == NULL) { +int last_error = GetLastError(); VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); goto error; } + #else sock->fd = socket(AF_NETLINK, SOCK_RAW, protocol); if (sock->fd < 0) { @@ -221,6 +230,9 @@ error: } } #ifdef _WIN32 +if (sock->overlapped.hEvent) { +CloseHandle(sock->overlapped.hEvent); +} if (sock->handle != INVALID_HANDLE_VALUE) { CloseHandle(sock->handle); } @@ -248,6 +260,9 @@ nl_sock_destroy(struct nl_sock *sock) { if (sock) { #ifdef _WIN32 +if (sock->overlapped.hEvent) { +CloseHandle(sock->overlapped.hEvent); +} CloseHandle(sock->handle); #else close(sock->fd); @@ -1040,12 +1055,70 @@ nl_dump_done(struct nl_dump *dump) return status == EOF ? 0 : status; } +#ifdef _WIN32 +/* Pend an I/O request in the driver. The driver completes the I/O whenever + * an event or a packet is ready to be read. Once the I/O is completed + * the overlapped structure event associated with the pending I/O will be set + */ +static int +pend_io_request(const struct nl_sock *sock) +{ +struct ofpbuf request; +uint64_t request_stub[128]; +struct ovs_header *ovs_header; +struct nlmsghdr *nlmsg; +uint32_t seq; +int retval; +int error; +DWORD bytes; +OVERLAPPED *overlapped = &sock->overlapped; + +int ovs_msg_size = sizeof (struct nlmsghdr) + sizeof (struct genlmsghdr) + + sizeof (struct ovs_header); + +ofpbuf_use_stub(&request, request_stub, sizeof request_stub); + +seq = nl_sock_allocate_seq(sock, 1); +nl_msg_put_genlmsghdr(&request, 0, OVS_WIN_NL_CTRL_FAMILY_ID, 0, + OVS_CTRL_CMD_WIN_PEND_REQ, OVS_WIN_CONTROL_VERSION); +nlmsg = nl_msg_nlmsghdr(&request); +nlmsg->nlmsg_seq = seq; + +ovs_header = ofpbuf_put_uninit(&request, sizeof *ovs_header); +ovs_header->dp_ifindex = 0; + +if (!DeviceIoControl(sock->handle, OVS_IOCTL_WRITE, + ofpbuf_data(&request), ofpbuf_size(&request), + NULL, 0, &bytes, overlapped)) { +error = GetLastError(); +/* Check if the I/O got pended */ +if (error != ERROR_IO_INCOMPLETE && error != ERROR_IO_PENDING) { +VLOG_ERR("nl_sock_wait failed - %s\n", ovs_format_message(error)); +retval = EINVAL; +goto done; +} +} +else { +/
Re: [ovs-dev] [PATCH v3] Windows NetLink Socket - Support for asynchronous event notification
Acked-by: Alin Gabriel Serdean Acked-by: Ankur Sharma -Original Message- From: Eitan Eliahu [mailto:elia...@vmware.com] Sent: Friday, September 05, 2014 7:39 PM To: dev@openvswitch.org Cc: Eitan Eliahu Subject: [PATCH v3] Windows NetLink Socket - Support for asynchronous event notification We keep an outstanding, out of band, I/O request in the driver at all time. Once an event generated the driver queues the event message, completes the pending I/O and unblocks the calling thread through setting the event in the overlapped structure n the NL socket. The thread will read all all event messages synchronous through the call of nl_sock_recv() Addressing review comments in v2. Signed-off-by: Eitan Eliahu elia...@vmware.com --- datapath-windows/include/OvsDpInterfaceExt.h | 1 + lib/netlink-socket.c | 87 +--- 2 files changed, 81 insertions(+), 7 deletions(-) diff --git a/datapath-windows/include/OvsDpInterfaceExt.h b/datapath-windows/include/OvsDpInterfaceExt.h index 73dfcbe..ab2088a 100644 --- a/datapath-windows/include/OvsDpInterfaceExt.h +++ b/datapath-windows/include/OvsDpInterfaceExt.h @@ -70,6 +70,7 @@ /* Commands available under the OVS_WIN_CONTROL_FAMILY. */ enum ovs_win_control_cmd { OVS_CTRL_CMD_WIN_GET_PID, +OVS_CTRL_CMD_WIN_PEND_REQ }; #endif /* __OVS_DP_INTERFACE_EXT_H_ */ diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index a6be186..61f840a 100644 --- a/lib/netlink-socket.c +++ b/lib/netlink-socket.c @@ -80,6 +80,7 @@ static int get_sock_pid_from_kernel(struct nl_sock *sock); struct nl_sock { #ifdef _WIN32 HANDLE handle; +OVERLAPPED overlapped; #else int fd; #endif @@ -139,18 +140,26 @@ nl_sock_create(int protocol, struct nl_sock **sockp) sock = xmalloc(sizeof *sock); #ifdef _WIN32 -sock->handle = CreateFileA(".\\OpenVSwitchDevice", - GENERIC_READ | GENERIC_WRITE, - FILE_SHARE_READ | FILE_SHARE_WRITE, - NULL, OPEN_EXISTING, - FILE_ATTRIBUTE_NORMAL, NULL); - -int last_error = GetLastError(); +sock->handle = CreateFile(OVS_DEVICE_NAME_USER, + GENERIC_READ | GENERIC_WRITE, + FILE_SHARE_READ | FILE_SHARE_WRITE, + NULL, OPEN_EXISTING, + FILE_FLAG_OVERLAPPED, NULL); if (sock->handle == INVALID_HANDLE_VALUE) { +int last_error = GetLastError(); +VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); +goto error; +} + +memset(&sock->overlapped, 0, sizeof sock->overlapped); +sock->overlapped.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL); +if (sock->overlapped.hEvent == NULL) { +int last_error = GetLastError(); VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); goto error; } + #else sock->fd = socket(AF_NETLINK, SOCK_RAW, protocol); if (sock->fd < 0) { @@ -221,6 +230,9 @@ error: } } #ifdef _WIN32 +if (sock->overlapped.hEvent) { +CloseHandle(sock->overlapped.hEvent); +} if (sock->handle != INVALID_HANDLE_VALUE) { CloseHandle(sock->handle); } @@ -248,6 +260,9 @@ nl_sock_destroy(struct nl_sock *sock) { if (sock) { #ifdef _WIN32 +if (sock->overlapped.hEvent) { +CloseHandle(sock->overlapped.hEvent); +} CloseHandle(sock->handle); #else close(sock->fd); @@ -1040,12 +1055,70 @@ nl_dump_done(struct nl_dump *dump) return status == EOF ? 0 : status; } +#ifdef _WIN32 +/* Pend an I/O request in the driver. The driver completes the I/O +whenever + * an event or a packet is ready to be read. Once the I/O is completed + * the overlapped structure event associated with the pending I/O will +be set */ static int pend_io_request(const struct nl_sock *sock) { +struct ofpbuf request; +uint64_t request_stub[128]; +struct ovs_header *ovs_header; +struct nlmsghdr *nlmsg; +uint32_t seq; +int retval; +int error; +DWORD bytes; +OVERLAPPED *overlapped = &sock->overlapped; + +int ovs_msg_size = sizeof (struct nlmsghdr) + sizeof (struct genlmsghdr) + + sizeof (struct ovs_header); + +ofpbuf_use_stub(&request, request_stub, sizeof request_stub); + +seq = nl_sock_allocate_seq(sock, 1); +nl_msg_put_genlmsghdr(&request, 0, OVS_WIN_NL_CTRL_FAMILY_ID, 0, + OVS_CTRL_CMD_WIN_PEND_REQ, OVS_WIN_CONTROL_VERSION); +nlmsg = nl_msg_nlmsghdr(&request); +nlmsg->nlmsg_seq = seq; + +ovs_header = ofpbuf_put_uninit(&request, sizeof *ovs_header); +ovs_header->dp_ifindex = 0; + +if (!DeviceIoControl(sock->handle, OVS_IOCTL_WRITE, + ofpbuf_data(&request), ofpbuf_size(&request), + NULL, 0, &bytes, overlapped)) { +erro
Re: [ovs-dev] [RFC] Proposal for enhanced select groups
On Thu, Sep 4, 2014 at 12:28 AM, Simon Horman wrote: > On Tue, Sep 02, 2014 at 07:20:30PM -0700, Pravin Shelar wrote: >> On Tue, Sep 2, 2014 at 6:55 PM, Jesse Gross wrote: >> > On Mon, Sep 1, 2014 at 1:10 AM, Simon Horman >> > wrote: >> >> On Thu, Aug 28, 2014 at 10:12:49AM +0900, Simon Horman wrote: >> >>> On Wed, Aug 27, 2014 at 03:03:53PM -0500, Jesse Gross wrote: >> >>> > On Wed, Aug 27, 2014 at 11:51 AM, Ben Pfaff wrote: >> >>> > > On Wed, Aug 27, 2014 at 10:26:14AM +0900, Simon Horman wrote: >> >>> > >> On Fri, Aug 22, 2014 at 08:30:08AM -0700, Ben Pfaff wrote: >> >>> > >> > On Fri, Aug 22, 2014 at 09:19:41PM +0900, Simon Horman wrote: >> >>> > >> What we would like to do is to provide something generally useful >> >>> > >> which may be used as appropriate to: >> >>> > > >> >>> > > I'm going to skip past these ideas, which do sound interesting, >> >>> > > because >> >>> > > I think that they're more for Pravin and Jesse than for me. I hope >> >>> > > that >> >>> > > they will provide some reactions to them. >> >>> > >> >>> > For the hardware offloading piece in particular, I would take a look >> >>> > at the discussion that has been going on in the netdev mailing list. I >> >>> > think the general consensus (to the extent that there is one) is that >> >>> > the hardware offload interface should be a block outside of OVS and >> >>> > then OVS (mostly likely from userspace) configures it. >> >>> >> >>> Thanks, I am now digesting that conversation. >> >> >> >> A lively conversation indeed. >> >> >> >> We are left with two questions for you: >> >> >> >> 1. Would you look at a proposal (I have some rough code that even works) >> >>for a select group action in the datapath prior to the finalisation >> >>of the question of offloads infrastructure in the kernel? >> >> >> >>From our point of view we would ultimately like to use such an action >> >> to >> >>offload to hardware. But it seems that there might be use-cases (not >> >> the >> >>one that I have rough code for) where such an action may be useful. For >> >>example to allow parts of IPVS to be used to provide stateful load >> >>balancing. >> >> >> >>Put another: It doesn't seem that a select group action is dependent on >> >>an offloads tough there are cases where they could be used together. >> > >> > I agree that this is orthogonal to offloading and seems fine to do >> > now. It seems particularly nice if we can use IPVS in a clean way, >> > similar to what is currently being worked on for connection tracking. >> > >> > I guess I'm not entirely sure how you plan to offload this to hardware >> > so it's hard to say how it would intersect in the future. However, the >> > current plan is to have offloading be directed for a higher point >> > (i.e. userspace) and have the OVS kernel module remain a software path >> > so probably it doesn't really matter. >> > >> > However, I'll Pravin comment since he'll be the one reviewing the code. > > Ok, my reading of the recent offload thread, which is somewhat clouded by > preconceptions, is that offloads could be handled by hooks in the datapath. > But I understand other ideas are also under discussion. Indeed it is > more clear to me that other ideas are under discussion now that you have > pointed it out. Thanks. I'm curious about about what exactly you are trying to offload though. Is it the actual group selection operation? Is it the whole datapath and these use cases happen to contain groups? >> I agree it is good to integrate datapath with IPVS. I would like to >> see the design proposal. > > So far I have got as far as a prototype select group action for the > datapath. In its current incarnation it just implements a hash, > using the RSS hash. > > The attributes of a select group action are one or more nested > bucket attributes. And bucket attributes contain a weight attribute > and nested action attributes. I have it in mind to add a selection method > attribute to the selection group attribute, as per my proposal for Open > Flow[1]. > > As such the current hash usage of a hash to select buckets is not > particularly important as I would like to support provision of > implementations of multiple selection methods. > > I have not yet fleshed out an IPVS proposal. But my general idea > is that when the datapath executes a select group action for > an IPVS group that it would call the IPVS scheduler (the IPVS term > for its connection tracker) to determine where to forward a packet. > > On this IPVS side this would probably require adding support for zones, > so that the entries relating to OVS would be separate from anything > else it is doing. This sounds an awful lot like a cross between how bonding is implemented (which I think is pretty much the same as the RSS hash backed version that you describe above) and an IPVS version of the connection tracking proposal that Justin sent out recently. Both of these use recirculation as the "select group" action. I
Re: [ovs-dev] [PATCH v4 4/4] datapath: Allow masks for set actions.
On Aug 11, 2014, at 5:39 PM, Jesse Gross wrote: > On Mon, Aug 11, 2014 at 9:15 AM, Jarno Rajahalme > wrote: >> Masked set action allows more megaflow wildcarding. Masked set action >> is now supported for all writeable key types, except for the tunnel >> key. >> >> The set tunnel action is an exception as any input tunnel info is >> cleared before action processing starts, so there is no tunnel info to >> mask. >> >> The kernel module converts all (non-tunnel) set actions to masked set >> actions. This makes action processing more uniform, and results in >> less branching and duplicating the action processing code. >> >> Signed-off-by: Jarno Rajahalme > > This triggers some sparse warnings that would be nice to avoid: > > CHECK /home/jesse/openvswitch/datapath/linux/actions.c > /home/jesse/openvswitch/datapath/linux/actions.c:498:44: warning: > incorrect type in return expression (different base types) > /home/jesse/openvswitch/datapath/linux/actions.c:498:44:expected bool > /home/jesse/openvswitch/datapath/linux/actions.c:498:44:got > restricted __be32 > Thanks! >> diff --git a/datapath/actions.c b/datapath/actions.c >> index 25c5d77..d57d779 100644 >> --- a/datapath/actions.c >> +++ b/datapath/actions.c >> @@ -432,35 +462,47 @@ static int set_ipv4(struct sk_buff *skb, const struct >> ovs_key_ipv4 *ipv4_key) >> >>nh = ip_hdr(skb); >> >> - if (ipv4_key->ipv4_src != nh->saddr) { >> - set_ip_addr(skb, nh, &nh->saddr, ipv4_key->ipv4_src); >> - flow_key_set_ipv4_src(skb, ipv4_key->ipv4_src); >> - } >> + /* Setting an IP addresses is typically only a side effect of >> +* matching on them in the current userspace implementation, so it >> +* makes sense to check if the value actually changed. */ >> + if (mask->ipv4_src) { >> + new_addr = MASKED(nh->saddr, key->ipv4_src, mask->ipv4_src); >> >> - if (ipv4_key->ipv4_dst != nh->daddr) { > > Here we are checking first that the mask is non-zero and then later > that the values are different. In other places we do only the second > step. What is the rationale? > If checksumming is required, then it makes sense to check if the value actually changed. If masking is not very cheap, then it makes sense to check if the mask is non-zero before doing the masking, especially if it can be expected that the mask is typically zero. >From the perspective of masked set operation the cleanest thing to do is to >just check the mask, as that carries the intent to change the field value. >However, in the current userspace implementation we set the mask also when >matching on the field, not just when it is set, which makes it rather likely >that the value does not actually change for fields like IP destination >address. Therefore that additional cheek for the actual change. In other places (like in when setting ports) there are fewer fields, so the likelihood that a mask for a field is zero is lower so I figured there is no need to check for the mask values, as masking with a zero mask will keep the current value. I’ll add comment explaining this. >> +/* Mask is at the midpoint of the data. */ >> +#define get_mask(a, type) \ >> + ((const type *)((const char *)nla_data(a) + nla_len(a))) > > This doesn't seem right to me. From a netlink perspective, the > attribute length covers the whole thing, both the value and mask. So > won't this always find the end of the attribute, not the middle? Oops :-) This should work better: #define get_mask(a, type) ((const type *)nla_data(a) + 1) > >> diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c >> index e525c9d..937f4d6 100644 >> --- a/datapath/flow_netlink.c >> +++ b/datapath/flow_netlink.c >> @@ -1543,42 +1563,62 @@ static int validate_set(const struct nlattr *a, >>break; >> >>case OVS_KEY_ATTR_TUNNEL: >> - *set_tun = true; >> + if (masked) >> + return -EINVAL; /* Masked tunnel set not supported. >> */ >> + *skip_copy = true; >>err = validate_and_copy_set_tun(a, sfa); >>if (err) >>return err; >> - break; >> + return 0; > > Returning 0 here makes me a little nervous - if we ever add more > validation at the end of the function, it might be silently skipped. Reverted, and added a check below so that we do not convert set tunnels to masked sets. > >>case OVS_KEY_ATTR_IPV4: > [...] >> - if (ipv4_key->ipv4_frag != flow_key->ip.frag) >> - return -EINVAL; >> + /* Non-writeable fields. */ >> + if (mask->ipv4_proto || mask->ipv4_frag) >> + return -EINVAL; >> + } else { >> + if (!flow_key->ip.proto) >> + return -EINVAL; > > I believe tha
Re: [ovs-dev] [PATCH 1/4] ovs-numa: Replace name 'cpu_socket' with 'numa_node'.
On Thu, Sep 4, 2014 at 3:17 PM, Alex Wang wrote: > 'numa' and 'socket' are currently used interchangeably in ovs-numa. > But they are not always equivalent as some platform can have multiple > sockets on a numa node. To avoid confusion, this commit renames all > the 'cpu_socket' to 'numa_node'. > > Signed-off-by: Alex Wang LGTM Acked-by: Pravin B Shelar ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/4] ovs-numa: Relax the ovs_numa_*() input argument check.
On Thu, Sep 4, 2014 at 3:17 PM, Alex Wang wrote: > Many of the ovs_numa_*() functions abort the program when the > input cpu socket or core id is invalid. This commit relaxes > the input check and makes these functions return OVS_*_UNSPEC > when the check fails. > > Signed-off-by: Alex Wang Due to CPU hot plugging, the range based check for core id and numa id is not good enough, but it can improved later on. Otherwise looks good. Acked-by: Pravin B Shelar ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 4/4] ovs-numa: Add module description.
On Thu, Sep 4, 2014 at 3:17 PM, Alex Wang wrote: > Add a short description of the module and its assumption. > > Signed-off-by: Alex Wang LGTM Acked-by: Pravin B Shelar ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3/4] ovs-numa: Add function for getting numa node id from core id.
On Thu, Sep 4, 2014 at 3:17 PM, Alex Wang wrote: > Signed-off-by: Alex Wang LGTM Acked-by: Pravin B Shelar ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [ovs-dev 1/3] ovs-dev.py: Add aggressive compile optimization options.
These options don't make sense when building portable code, but when using the dev script, OVS is built on the same system it's run on. They make a small difference in the OVS DPDK testing, hence their addition. Signed-off-by: Ethan Jackson --- utilities/ovs-dev.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py index 496f6dc..e454e18 100755 --- a/utilities/ovs-dev.py +++ b/utilities/ovs-dev.py @@ -63,7 +63,7 @@ def conf(): "--with-logdir=%s/log" % ROOT, "--with-rundir=%s/run" % ROOT, "--enable-silent-rules", "--with-dbdir=" + ROOT, "--silent"] -cflags = "-g -fno-omit-frame-pointer" +cflags = "-g -fno-omit-frame-pointer -march=native" if options.werror: configure.append("--enable-Werror") @@ -83,8 +83,6 @@ def conf(): cflags += " -O%d" % options.optimize -ENV["CFLAGS"] = cflags - _sh("./boot.sh") try: @@ -93,6 +91,7 @@ def conf(): pass # Directory exists. os.chdir(BUILD_GCC) +ENV["CFLAGS"] = cflags + " -finline-limit=5" _sh(*(configure + ["--with-linux=/lib/modules/%s/build" % uname()])) try: @@ -114,6 +113,7 @@ def conf(): pass # Directory exists. ENV["CC"] = "clang" +ENV["CFLAGS"] = cflags os.chdir(BUILD_CLANG) _sh(*configure) -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [ovs-dev 2/3] ovs-dev.py: Support additional optimization flags.
They may or may not make a difference, but there's no reason not to support passing them. Signed-off-by: Ethan Jackson --- utilities/ovs-dev.py | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py index e454e18..3686391 100755 --- a/utilities/ovs-dev.py +++ b/utilities/ovs-dev.py @@ -81,7 +81,7 @@ def conf(): if options.optimize is None: options.optimize = 0 -cflags += " -O%d" % options.optimize +cflags += " -O%s" % str(options.optimize) _sh("./boot.sh") @@ -362,10 +362,13 @@ def main(): help="configure the man documentation install directory") group.add_option("--with-dpdk", dest="with_dpdk", metavar="DPDK_BUILD", help="built with dpdk libraries located at DPDK_BUILD"); +parser.add_option_group(group) -for i in range(4): -group.add_option("--O%d" % i, dest="optimize", action="store_const", - const=i, help="compile with -O%d" % i) +group = optparse.OptionGroup(parser, "Optimization Flags") +for i in ["s", "g"] + range(4) + ["fast"]: +group.add_option("--O%s" % str(i), dest="optimize", + action="store_const", const=i, + help="compile with -O%s" % str(i)) parser.add_option_group(group) group = optparse.OptionGroup(parser, "check") -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [ovs-dev 3/3] ovs-dev.py: Support running the clang binaries.
They have slightly different support characteristics, so it's nice to easily switch between them for testing. Signed-off-by: Ethan Jackson --- utilities/ovs-dev.py | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py index 3686391..491d5ff 100755 --- a/utilities/ovs-dev.py +++ b/utilities/ovs-dev.py @@ -26,14 +26,15 @@ OVS_SRC = HOME + "/ovs" ROOT = HOME + "/root" BUILD_GCC = OVS_SRC + "/_build-gcc" BUILD_CLANG = OVS_SRC + "/_build-clang" -PATH = "%(ovs)s/utilities:%(ovs)s/ovsdb:%(ovs)s/vswitchd" % {"ovs": BUILD_GCC} - -ENV["PATH"] = PATH + ":" + ENV["PATH"] options = None parser = None commands = [] +def set_path(build): +PATH = "%(ovs)s/utilities:%(ovs)s/ovsdb:%(ovs)s/vswitchd" % {"ovs": build} + +ENV["PATH"] = PATH + ":" + ENV["PATH"] def _sh(*args, **kwargs): print "--> " + " ".join(args) @@ -236,7 +237,8 @@ def run(): _sh("ovs-vsctl --no-wait set Open_vSwitch %s ovs_version=%s" % (root_uuid, version)) -cmd = [BUILD_GCC + "/vswitchd/ovs-vswitchd"] +build = BUILD_CLANG if options.clang else BUILD_GCC +cmd = [build + "/vswitchd/ovs-vswitchd"] if options.dpdk: cmd.append("--dpdk") @@ -387,6 +389,9 @@ def main(): group.add_option("--dpdk", dest="dpdk", action="callback", callback=parse_subargs, help="run ovs-vswitchd with dpdk subopts (ended by --)") +group.add_option("--clang", dest="clang", action="store_true", + help="Use binaries built by clang") + parser.add_option_group(group) options, args = parser.parse_args() @@ -396,6 +401,11 @@ def main(): print "Unknown argument " + arg doc() +if options.clang: +set_path(BUILD_CLANG) +else: +set_path(BUILD_GCC) + try: os.chdir(OVS_SRC) except OSError: -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v4 4/4] datapath: Allow masks for set actions.
On Sep 5, 2014, at 2:12 PM, Jarno Rajahalme wrote: >>> >>> case OVS_KEY_ATTR_IPV4: >> [...] >>> - if (ipv4_key->ipv4_frag != flow_key->ip.frag) >>> - return -EINVAL; >>> + /* Non-writeable fields. */ >>> + if (mask->ipv4_proto || mask->ipv4_frag) >>> + return -EINVAL; >>> + } else { >>> + if (!flow_key->ip.proto) >>> + return -EINVAL; >> >> I believe that this check on ip.proto is being used to verify that the >> IP header is actually present, so this would mean that we can write >> off the end of the packet in the masked case. > > But this is at flow set-up time, and the mask could still wildcard the > ip.proto field, I checked this and flow_key is the masked flow key, so it can not be wildcarded, sorry for the confusion. However, it is conceivable that a userspace app wants to set a flow to match on all IP packets with eth_type(0x0800), and then e.g. set(ipv4(tos=0)). So, I’d like to get rid of the ip.proto check anyway! Is the make_writeable() check sufficient? Jarno > so it could be zero or missing in an actual packet. Doesn’t the > make_writeable take care of the verification?: > > static int set_ipv4(struct sk_buff *skb, const struct ovs_key_ipv4 *key, > const struct ovs_key_ipv4 *mask) > { > struct iphdr *nh; > __be32 new_addr; > int err; > > err = make_writable(skb, skb_network_offset(skb) + >sizeof(struct iphdr)); > if (unlikely(err)) > return err; > > nh = ip_hdr(skb); > > >> >>> case OVS_KEY_ATTR_IPV6: >> [...] >>> - if (ipv6_key->ipv6_frag != flow_key->ip.frag) >>> - return -EINVAL; >>> + /* Non-writeable fields. */ >>> + if (mask->ipv6_proto || mask->ipv6_frag) >>> + return -EINVAL; >>> + >>> + /* Invalid bits in the flow label mask? */ >>> + if (ntohl(mask->ipv6_label) & 0xFFF0) >>> + return -EINVAL; >>> + } else { >>> + if (!flow_key->ip.proto) >>> + return -EINVAL; >> >> Same issue with header validation here. >> >>> @@ -1605,12 +1649,43 @@ static int validate_set(const struct nlattr *a, >>> + /* Convert non-masked set actions to masked set actions. >>> +* Tunnel set action returns before getting here. */ >>> + if (!masked) { >>> + int start, len = key_len * 2; >>> + struct nlattr *at; >>> + >>> + *skip_copy = true; >>> + >>> + start = add_nested_action_start(sfa, >>> + OVS_ACTION_ATTR_SET_MASKED); >>> + if (start < 0) >>> + return start; >>> + >>> + at = reserve_sfa_size(sfa, nla_attr_size(len)); >>> + if (IS_ERR(at)) >>> + return PTR_ERR(at); >>> + >>> + at->nla_type = key_type; >>> + at->nla_len = nla_attr_size(len); >> >> I think this can be simplified by using add_action() or __add_action(). >> > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/4] ovs-numa: Relax the ovs_numa_*() input argument check.
Thx for pointing it out, I'll make a note in the comment. On Fri, Sep 5, 2014 at 2:19 PM, Pravin Shelar wrote: > On Thu, Sep 4, 2014 at 3:17 PM, Alex Wang wrote: > > Many of the ovs_numa_*() functions abort the program when the > > input cpu socket or core id is invalid. This commit relaxes > > the input check and makes these functions return OVS_*_UNSPEC > > when the check fails. > > > > Signed-off-by: Alex Wang > > Due to CPU hot plugging, the range based check for core id and numa id > is not good enough, but it can improved later on. > Otherwise looks good. > > Acked-by: Pravin B Shelar > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [v8 2/2] datapath: Implement recirc action without recursion
Since kernel stack is limited in size, it is not wise to using recursive function with large stack frames. This patch provides an alternative implementation of recirc action without using recursion. A per CPU fixed sized, 'deferred action FIFO', is used to store either recirc or sample actions encountered during execution of an action list. Not executing recirc or sample action in place, but rather execute them laster as 'deferred actions' avoids recursion. Deferred actions are only executed after all other actions has been executed, including the ones triggered by loopback from the kernel network stack. The size of the private FIFO, currently set to 20, limits the number of total 'deferred actions' any one packet can accumulate. Signed-off-by: Andy Zhou --- v8->v7: Always free skb from execute_recirc() Dynamically allocating per CPU action fifos v6->v7: Always clone the packet key v5->v6: Remove ovs_ prefix for internal symbols. Remove actions.h Rename ovs_exec_actions_count to exec_actions_limit Rename ovs_process_deferred_packets() to process_deferred_actions() v4->v5: Reset fifo after processing deferred actions move private data structures from actions.h to actions.c remove action_fifo init functions, since default percpu data will be zero. --- datapath/actions.c | 169 +++- datapath/datapath.c | 5 ++ datapath/datapath.h | 3 + 3 files changed, 162 insertions(+), 15 deletions(-) diff --git a/datapath/actions.c b/datapath/actions.c index 0a22e55..95f1e51 100644 --- a/datapath/actions.c +++ b/datapath/actions.c @@ -40,12 +40,80 @@ #include "vlan.h" #include "vport.h" +struct deferred_action { + struct sk_buff *skb; + const struct nlattr *actions; + + /* Store pkt_key clone when creating deferred action. */ + struct sw_flow_key pkt_key; +}; + +#define DEFERRED_ACTION_FIFO_SIZE 10 +struct action_fifo { + int head; + int tail; + /* Deferred action fifo queue storage. */ + struct deferred_action fifo[DEFERRED_ACTION_FIFO_SIZE]; +}; + +static struct action_fifo *action_fifos; +#define EXEC_ACTIONS_LEVEL_LIMIT 4 /* limit used to detect packet + looping by the network stack */ +static DEFINE_PER_CPU(int, exec_actions_level); + +static void action_fifo_init(struct action_fifo *fifo) +{ + fifo->head = 0; + fifo->tail = 0; +} + +static bool action_fifo_is_empty(struct action_fifo *fifo) +{ + return (fifo->head == fifo->tail); +} + +static struct deferred_action * +action_fifo_get(struct action_fifo *fifo) +{ + if (action_fifo_is_empty(fifo)) + return NULL; + + return &fifo->fifo[fifo->tail++]; +} + +static struct deferred_action * +action_fifo_put(struct action_fifo *fifo) +{ + if (fifo->head >= DEFERRED_ACTION_FIFO_SIZE - 1) + return NULL; + + return &fifo->fifo[fifo->head++]; +} + static void flow_key_clone(struct sk_buff *skb, struct sw_flow_key *new_key) { *new_key = *OVS_CB(skb)->pkt_key; OVS_CB(skb)->pkt_key = new_key; } +/* Return True iff fifo is not full */ +static bool add_deferred_actions(struct sk_buff *skb, +const struct nlattr *attr) +{ + struct action_fifo *fifo; + struct deferred_action *da; + + fifo = this_cpu_ptr(action_fifos); + da = action_fifo_put(fifo); + if (da) { + da->skb = skb; + da->actions = attr; + flow_key_clone(skb, &da->pkt_key); + } + + return (da != NULL); +} + static void flow_key_set_recirc_id(struct sk_buff *skb, u32 recirc_id) { OVS_CB(skb)->pkt_key->recirc_id = recirc_id; @@ -689,7 +757,6 @@ static bool last_action(const struct nlattr *a, int rem) static int sample(struct datapath *dp, struct sk_buff *skb, const struct nlattr *attr) { - struct sw_flow_key sample_key; const struct nlattr *acts_list = NULL; const struct nlattr *a; int rem; @@ -728,10 +795,15 @@ static int sample(struct datapath *dp, struct sk_buff *skb, /* Skip the sample action when out of memory. */ return 0; - flow_key_clone(skb, &sample_key); + if (!add_deferred_actions(skb, a)) { + if (net_ratelimit()) + pr_warn("%s: deferred actions limit reached, dropping sample action\n", + ovs_dp_name(dp)); - /* do_execute_actions() will consume the cloned skb. */ - return do_execute_actions(dp, skb, a, rem); + kfree_skb(skb); + } + + return 0; } static void execute_hash(struct sk_buff *skb, const struct nlattr *attr) @@ -750,7 +822,7 @@ static void execute_hash(struct sk_buff *skb, const struct nlattr *attr) } static int execute_set_action
[ovs-dev] [v8 1/2] datapath: Remove recirc stack depth limit check
Future patches will change the recirc action implementation to not using recursion. The stack depth detection is no longer necessary. Signed-off-by: Andy Zhou Acked-by: Pravin B Shelar --- datapath/actions.c | 63 - datapath/datapath.c | 6 ++--- datapath/datapath.h | 4 ++-- datapath/vport.c| 2 +- 4 files changed, 10 insertions(+), 65 deletions(-) diff --git a/datapath/actions.c b/datapath/actions.c index 43ca2a0..0a22e55 100644 --- a/datapath/actions.c +++ b/datapath/actions.c @@ -831,7 +831,7 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb, } flow_key_set_recirc_id(skb, nla_get_u32(a)); - ovs_dp_process_packet(skb, true); + ovs_dp_process_packet(skb); return 0; } @@ -924,63 +924,8 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb, return 0; } -/* We limit the number of times that we pass into execute_actions() - * to avoid blowing out the stack in the event that we have a loop. - * - * Each loop adds some (estimated) cost to the kernel stack. - * The loop terminates when the max cost is exceeded. - * */ -#define RECIRC_STACK_COST 1 -#define DEFAULT_STACK_COST 4 -/* Allow up to 4 regular services, and up to 3 recirculations */ -#define MAX_STACK_COST (DEFAULT_STACK_COST * 4 + RECIRC_STACK_COST * 3) - -struct loop_counter { - u8 stack_cost; /* loop stack cost. */ - bool looping; /* Loop detected? */ -}; - -static DEFINE_PER_CPU(struct loop_counter, loop_counters); - -static int loop_suppress(struct datapath *dp, struct sw_flow_actions *actions) -{ - if (net_ratelimit()) - pr_warn("%s: flow loop detected, dropping\n", - ovs_dp_name(dp)); - actions->actions_len = 0; - return -ELOOP; -} - /* Execute a list of actions against 'skb'. */ -int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb, - struct sw_flow_actions *acts, bool recirc) -{ - const u8 stack_cost = recirc ? RECIRC_STACK_COST : DEFAULT_STACK_COST; - struct loop_counter *loop; - int error; - - /* Check whether we've looped too much. */ - loop = &__get_cpu_var(loop_counters); - loop->stack_cost += stack_cost; - if (unlikely(loop->stack_cost > MAX_STACK_COST)) - loop->looping = true; - if (unlikely(loop->looping)) { - error = loop_suppress(dp, acts); - kfree_skb(skb); - goto out_loop; - } - - error = do_execute_actions(dp, skb, acts->actions, acts->actions_len); - - /* Check whether sub-actions looped too much. */ - if (unlikely(loop->looping)) - error = loop_suppress(dp, acts); - -out_loop: - /* Decrement loop stack cost. */ - loop->stack_cost -= stack_cost; - if (!loop->stack_cost) - loop->looping = false; - - return error; +int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb, struct sw_flow_actions *acts) +{ + return do_execute_actions(dp, skb, acts->actions, acts->actions_len); } diff --git a/datapath/datapath.c b/datapath/datapath.c index b6eadef..a668222 100644 --- a/datapath/datapath.c +++ b/datapath/datapath.c @@ -251,7 +251,7 @@ void ovs_dp_detach_port(struct vport *p) } /* Must be called with rcu_read_lock. */ -void ovs_dp_process_packet(struct sk_buff *skb, bool recirc) +void ovs_dp_process_packet(struct sk_buff *skb) { const struct vport *p = OVS_CB(skb)->input_vport; struct sw_flow_key *pkt_key = OVS_CB(skb)->pkt_key; @@ -283,7 +283,7 @@ void ovs_dp_process_packet(struct sk_buff *skb, bool recirc) ovs_flow_stats_update(flow, pkt_key->tp.flags, skb); sf_acts = rcu_dereference(flow->sf_acts); - ovs_execute_actions(dp, skb, sf_acts, recirc); + ovs_execute_actions(dp, skb, sf_acts); stats_counter = &stats->n_hit; out: @@ -581,7 +581,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info) sf_acts = rcu_dereference(flow->sf_acts); local_bh_disable(); - err = ovs_execute_actions(dp, packet, sf_acts, false); + err = ovs_execute_actions(dp, packet, sf_acts); local_bh_enable(); rcu_read_unlock(); diff --git a/datapath/datapath.h b/datapath/datapath.h index e414225..eba2fc4 100644 --- a/datapath/datapath.h +++ b/datapath/datapath.h @@ -188,7 +188,7 @@ extern struct notifier_block ovs_dp_device_notifier; extern struct genl_family dp_vport_genl_family; extern struct genl_multicast_group ovs_dp_vport_multicast_group; -void ovs_dp_process_packet(struct sk_buff *, bool recirc); +void ovs_dp_process_packet(struct sk_buff *c); void ovs_dp_detach_port(struct vport *); int ovs_dp_upcall(struct datapath *, struct sk_buff *, const struct dp_upcall_info *); @@ -198,7 +198,7 @@ struct sk_buff *ovs_vport_cmd_build_inf
Re: [ovs-dev] [PATCH 4/4] ovs-numa: Add module description.
Thanks, applied all to master On Fri, Sep 5, 2014 at 2:20 PM, Pravin Shelar wrote: > On Thu, Sep 4, 2014 at 3:17 PM, Alex Wang wrote: > > Add a short description of the module and its assumption. > > > > Signed-off-by: Alex Wang > > > LGTM > Acked-by: Pravin B Shelar > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v4 4/4] datapath: Allow masks for set actions.
On Sep 5, 2014, at 2:26 PM, Jarno Rajahalme wrote: > > On Sep 5, 2014, at 2:12 PM, Jarno Rajahalme wrote: > case OVS_KEY_ATTR_IPV4: >>> [...] - if (ipv4_key->ipv4_frag != flow_key->ip.frag) - return -EINVAL; + /* Non-writeable fields. */ + if (mask->ipv4_proto || mask->ipv4_frag) + return -EINVAL; + } else { + if (!flow_key->ip.proto) + return -EINVAL; >>> >>> I believe that this check on ip.proto is being used to verify that the >>> IP header is actually present, so this would mean that we can write >>> off the end of the packet in the masked case. >> >> But this is at flow set-up time, and the mask could still wildcard the >> ip.proto field, > > I checked this and flow_key is the masked flow key, so it can not be > wildcarded, sorry for the confusion. However, it is conceivable that a > userspace app wants to set a flow to match on all IP packets with > eth_type(0x0800), and then e.g. set(ipv4(tos=0)). So, I’d like to get rid of > the ip.proto check anyway! > > Is the make_writeable() check sufficient? Just discussed this with Pravin, planning to check the extracted packet key to make sure the specific header exists in the packet. Jarno ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v4 4/4] datapath: Allow masks for set actions.
Jesse, I’ll send a new patch series soon, but here are the fixes to the comments below for your viewing pleasure: Jarno diff --git a/datapath/actions.c b/datapath/actions.c index 0cb7c9e..243b672 100644 --- a/datapath/actions.c +++ b/datapath/actions.c @@ -510,7 +510,7 @@ static int set_ipv4(struct sk_buff *skb, const struct ovs_key_ipv4 *key, static bool is_ipv6_mask_nonzero(const __be32 addr[4]) { - return addr[0] | addr[1] | addr[2] | addr[3]; + return !!(addr[0] | addr[1] | addr[2] | addr[3]); } static int set_ipv6(struct sk_buff *skb, const struct ovs_key_ipv6 *key, @@ -565,7 +565,7 @@ static int set_ipv6(struct sk_buff *skb, const struct ovs_key_ipv6 *key, if (mask->ipv6_tclass) { ipv6_change_dsfield(nh, ~mask->ipv6_tclass, key->ipv6_tclass); flow_key_set_ip_tos(skb, ipv6_get_dsfield(nh)); - } + } if (mask->ipv6_label) { set_ipv6_fl(nh, ntohl(key->ipv6_label), ntohl(mask->ipv6_label)); @@ -599,6 +599,7 @@ static int set_udp(struct sk_buff *skb, const struct ovs_key_udp *key, return err; uh = udp_hdr(skb); +/* Either of the masks is non-zero, so do not bother checking them. */ src = MASKED(uh->source, key->udp_src, mask->udp_src); dst = MASKED(uh->dest, key->udp_dst, mask->udp_dst); @@ -824,8 +825,7 @@ static int execute_set_action(struct sk_buff *skb, const struct nlattr *a) } /* Mask is at the midpoint of the data. */ -#define get_mask(a, type) \ - ((const type *)((const char *)nla_data(a) + nla_len(a))) +#define get_mask(a, type) ((const type *)nla_data(a) + 1) static int execute_masked_set_action(struct sk_buff *skb, const struct nlattr *a) diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c index faabeb2..722640a 100644 --- a/datapath/flow_netlink.c +++ b/datapath/flow_netlink.c @@ -325,6 +325,20 @@ static bool is_all_zero(const u8 *fp, size_t size) return true; } +static bool is_all_ones(const u8 *fp, size_t size) +{ + int i; + + if (!fp) + return false; + + for (i = 0; i < size; i++) + if (~fp[i]) + return false; + + return true; +} + static int __parse_flow_nlattrs(const struct nlattr *attr, const struct nlattr *a[], u64 *attrsp, bool nz) @@ -1646,13 +1660,17 @@ static int validate_set(const struct nlattr *a, err = validate_and_copy_set_tun(a, sfa); if (err) return err; - return 0; + break; case OVS_KEY_ATTR_IPV4: if (eth_type != htons(ETH_P_IP)) return -EINVAL; + if (!flow_key->ip.proto) + return -EINVAL; + ipv4_key = nla_data(ovs_key); + if (masked) { const struct ovs_key_ipv4 *mask = ipv4_key + 1; @@ -1660,9 +1678,6 @@ static int validate_set(const struct nlattr *a, if (mask->ipv4_proto || mask->ipv4_frag) return -EINVAL; } else { - if (!flow_key->ip.proto) - return -EINVAL; - if (ipv4_key->ipv4_proto != flow_key->ip.proto) return -EINVAL; @@ -1675,6 +1690,9 @@ static int validate_set(const struct nlattr *a, if (eth_type != htons(ETH_P_IPV6)) return -EINVAL; + if (!flow_key->ip.proto) + return -EINVAL; + ipv6_key = nla_data(ovs_key); if (masked) { const struct ovs_key_ipv6 *mask = ipv6_key + 1; @@ -1687,9 +1705,6 @@ static int validate_set(const struct nlattr *a, if (ntohl(mask->ipv6_label) & 0xFFF0) return -EINVAL; } else { - if (!flow_key->ip.proto) - return -EINVAL; - if (ipv6_key->ipv6_proto != flow_key->ip.proto) return -EINVAL; @@ -1734,35 +1749,36 @@ static int validate_set(const struct nlattr *a, return -EINVAL; } - /* Convert non-masked set actions to masked set actions. -* Tunnel set action returns before getting here. */ - if (!masked) { + /* Convert non-masked non-tunnel set actions to masked set actions. */ + if (!masked && key_type != OVS_KEY_ATTR_TUNNEL) { int start, len = key_len * 2; struct nlattr *at; *skip_copy = true; - start = add_nested_action_start(sfa, - OVS_ACTION_ATTR_SET_MASKED); +
[ovs-dev] [PATCH v5 09/13] lib/odp-util: Fix mapping to Netlink frag mask.
The frag member in the Netlink interface is an uint8_t enumeration type, not a bitrfield, so it should always be either fully masked or not masked at all. Signed-off-by: Jarno Rajahalme --- lib/odp-util.c | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/lib/odp-util.c b/lib/odp-util.c index 8ac43c5..3c398ce 100644 --- a/lib/odp-util.c +++ b/lib/odp-util.c @@ -2485,20 +2485,20 @@ odp_flow_from_string(const char *s, const struct simap *port_names, static uint8_t ovs_to_odp_frag(uint8_t nw_frag) { -return (nw_frag == 0 ? OVS_FRAG_TYPE_NONE - : nw_frag == FLOW_NW_FRAG_ANY ? OVS_FRAG_TYPE_FIRST - : OVS_FRAG_TYPE_LATER); +return !(nw_frag & FLOW_NW_FRAG_ANY) ? OVS_FRAG_TYPE_NONE +: nw_frag & FLOW_NW_FRAG_LATER ? OVS_FRAG_TYPE_LATER +: OVS_FRAG_TYPE_FIRST; } +/* + * Netlink interface 'enum ovs_frag_type' is an 8-bit enumeration type, not a + * set of flags or bitfields. Hence, if the struct flow nw_frag mask, which is + * a set of bits, has the FLOW_NW_FRAG_ANY as zero, we must use a zero mask for + * the netlink frag field, and all ones mask otherwise. */ static uint8_t ovs_to_odp_frag_mask(uint8_t nw_frag_mask) { -uint8_t frag_mask = ~(OVS_FRAG_TYPE_FIRST | OVS_FRAG_TYPE_LATER); - -frag_mask |= (nw_frag_mask & FLOW_NW_FRAG_ANY) ? OVS_FRAG_TYPE_FIRST : 0; -frag_mask |= (nw_frag_mask & FLOW_NW_FRAG_LATER) ? OVS_FRAG_TYPE_LATER : 0; - -return frag_mask; +return (nw_frag_mask & FLOW_NW_FRAG_ANY) ? UINT8_MAX : 0; } static void @@ -2858,6 +2858,7 @@ odp_to_ovs_frag(uint8_t odp_frag, struct flow *flow) return false; } +flow->nw_frag = 0; if (odp_frag != OVS_FRAG_TYPE_NONE) { flow->nw_frag |= FLOW_NW_FRAG_ANY; if (odp_frag == OVS_FRAG_TYPE_LATER) { -- 1.7.10.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v5 05/13] lib/odp-util: Add tunnel tp_src, tp_dst parsing and formatting.
tp_src and tp_dst fields were recently added to struct flow_tnl, but parsing and printing was missing. Signed-off-by: Jarno Rajahalme --- lib/odp-util.c | 38 -- tests/bfd.at|6 ++-- tests/odp.at| 16 +-- tests/tunnel.at | 82 +++ 4 files changed, 81 insertions(+), 61 deletions(-) diff --git a/lib/odp-util.c b/lib/odp-util.c index c3d213c..d205473 100644 --- a/lib/odp-util.c +++ b/lib/odp-util.c @@ -1318,19 +1318,26 @@ format_odp_key_attr(const struct nlattr *a, const struct nlattr *ma, ds_put_format(ds, "tun_id=%#"PRIx64"/%#"PRIx64 ",src="IP_FMT"/"IP_FMT",dst="IP_FMT"/"IP_FMT ",tos=%#"PRIx8"/%#"PRIx8",ttl=%"PRIu8"/%#"PRIx8 + ",tp_src=%"PRIu16"/%#"PRIx16 + ",tp_dst=%"PRIu16"/%#"PRIx16 ",flags(", ntohll(tun_key.tun_id), ntohll(tun_mask.tun_id), IP_ARGS(tun_key.ip_src), IP_ARGS(tun_mask.ip_src), IP_ARGS(tun_key.ip_dst), IP_ARGS(tun_mask.ip_dst), tun_key.ip_tos, tun_mask.ip_tos, - tun_key.ip_ttl, tun_mask.ip_ttl); + tun_key.ip_ttl, tun_mask.ip_ttl, + tun_key.tp_src, tun_mask.tp_src, + tun_key.tp_dst, tun_mask.tp_dst); } else { ds_put_format(ds, "tun_id=0x%"PRIx64",src="IP_FMT",dst="IP_FMT"," - "tos=0x%"PRIx8",ttl=%"PRIu8",flags(", + "tos=0x%"PRIx8",ttl=%"PRIu8"," + "tp_src=%"PRIu16",tp_dst=%"PRIu16"," + "flags(", ntohll(tun_key.tun_id), IP_ARGS(tun_key.ip_src), IP_ARGS(tun_key.ip_dst), - tun_key.ip_tos, tun_key.ip_ttl); + tun_key.ip_tos, tun_key.ip_ttl, + tun_key.tp_src, tun_key.tp_dst); } if (ma && ~tun_mask.flags & FLOW_TNL_F_MASK) { /* Partially masked. */ format_flags_masked(ds, NULL, flow_tun_flag_to_string, @@ -1923,6 +1930,7 @@ parse_odp_key_mask_attr(const char *s, const struct simap *port_names, { uint64_t tun_id, tun_id_mask; +int tp_src, tp_src_mask, tp_dst, tp_dst_mask; struct flow_tnl tun_key, tun_key_mask; int n = -1; @@ -1930,21 +1938,29 @@ parse_odp_key_mask_attr(const char *s, const struct simap *port_names, memset(&tun_key_mask, 0, sizeof tun_key_mask); if (mask && ovs_scan(s, "tunnel(tun_id=%"SCNi64"/%"SCNi64"," - "src="IP_SCAN_FMT"/"IP_SCAN_FMT",dst="IP_SCAN_FMT - "/"IP_SCAN_FMT",tos=%"SCNi8"/%"SCNi8"," - "ttl=%"SCNi8"/%"SCNi8",flags%n", + "src="IP_SCAN_FMT"/"IP_SCAN_FMT"," + "dst="IP_SCAN_FMT"/"IP_SCAN_FMT"," + "tos=%"SCNi8"/%"SCNi8"," + "ttl=%"SCNi8"/%"SCNi8"," + "tp_src=%i/%i,tp_dst=%i/%i,flags%n", &tun_id, &tun_id_mask, IP_SCAN_ARGS(&tun_key.ip_src), IP_SCAN_ARGS(&tun_key_mask.ip_src), IP_SCAN_ARGS(&tun_key.ip_dst), IP_SCAN_ARGS(&tun_key_mask.ip_dst), &tun_key.ip_tos, &tun_key_mask.ip_tos, - &tun_key.ip_ttl, &tun_key_mask.ip_ttl, &n)) { + &tun_key.ip_ttl, &tun_key_mask.ip_ttl, + &tp_src, &tp_src_mask, &tp_dst, &tp_dst_mask, + &n)) { int res; uint32_t flags, fmask; tun_key.tun_id = htonll(tun_id); tun_key_mask.tun_id = htonll(tun_id_mask); +tun_key.tp_src = htons(tp_src); +tun_key_mask.tp_src = htons(tp_src_mask); +tun_key.tp_dst = htons(tp_dst); +tun_key_mask.tp_dst = htons(tp_dst_mask); res = parse_flags(&s[n], flow_tun_flag_to_string, &flags, FLOW_TNL_F_MASK, &fmask); tun_key.flags = flags; @@ -1965,14 +1981,18 @@ parse_odp_key_mask_attr(const char *s, const struct simap *port_names, return n; } else if (ovs_scan(s, "tunnel(tun_id=%"SCNi64"," "src="IP_SCAN_FMT",dst="IP_SCAN_FMT -",tos=%"SCNi8",ttl=%"SCNi8",flags%n", &tun_id, +",tos=%"SCNi8",ttl=%"SCNi8",tp_src=%i,tp_dst=%i," +"flags%n", &tun_id, IP_SCAN_ARGS(&tun_key.ip_src),
[ovs-dev] [PATCH v5 04/13] lib: Unify flags parsing and formatting.
Use the "+-" syntax more uniformly when printing masked flags, and use the syntax of delimited 1-flags also for formatting fully masked TCP flags. The "+-" syntax only deals with masked flags, but if there are many of those, the printout becomes long and confusing. Typically there are many flags only when flags are fully masked, but even then most of them are zeros, so it makes sense to print the flags that are set (ones) and omit the zero flags. Signed-off-by: Jarno Rajahalme --- lib/flow.h|2 + lib/match.c | 11 ++- lib/odp-util.c| 180 - lib/odp-util.h|8 +++ tests/ofp-print.at| 16 ++--- tests/ofproto-dpif.at | 92 - 6 files changed, 205 insertions(+), 104 deletions(-) diff --git a/lib/flow.h b/lib/flow.h index 9c59037..e4c3b34 100644 --- a/lib/flow.h +++ b/lib/flow.h @@ -67,6 +67,8 @@ BUILD_ASSERT_DECL(FLOW_NW_FRAG_LATER == NX_IP_FRAG_LATER); #define FLOW_TNL_F_KEY (1 << 2) #define FLOW_TNL_F_OAM (1 << 3) +#define FLOW_TNL_F_MASK ((1 << 4) - 1) + const char *flow_tun_flag_to_string(uint32_t flags); /* Maximum number of supported MPLS labels. */ diff --git a/lib/match.c b/lib/match.c index 8364906..9d1f737 100644 --- a/lib/match.c +++ b/lib/match.c @@ -1182,9 +1182,16 @@ match_format(const struct match *match, struct ds *s, unsigned int priority) } if (is_ip_any(f) && f->nw_proto == IPPROTO_TCP && wc->masks.tcp_flags) { uint16_t mask = TCP_FLAGS(wc->masks.tcp_flags); + if (mask == TCP_FLAGS(OVS_BE16_MAX)) { -ds_put_format(s, "tcp_flags=0x%03"PRIx16",", ntohs(f->tcp_flags)); -} else { +ds_put_cstr(s, "tcp_flags="); +if (f->tcp_flags) { +format_flags(s, packet_tcp_flag_to_string, ntohs(f->tcp_flags), + '|'); +} else { +ds_put_cstr(s, "0x000"); /* Zero flags. */ +} +} else if (mask) { format_flags_masked(s, "tcp_flags", packet_tcp_flag_to_string, ntohs(f->tcp_flags), mask); } diff --git a/lib/odp-util.c b/lib/odp-util.c index dde97e5..c3d213c 100644 --- a/lib/odp-util.c +++ b/lib/odp-util.c @@ -207,22 +207,96 @@ slow_path_reason_to_explanation(enum slow_path_reason reason) static int parse_flags(const char *s, const char *(*bit_to_string)(uint32_t), -uint32_t *res) +uint32_t *res_flags, uint32_t allowed, uint32_t *res_mask) { uint32_t result = 0; -int n = 0; +int n; -if (s[n] != '(') { +/* Parse masked flags in numeric format? */ +if (res_mask && ovs_scan(s, "%"SCNi32"/%"SCNi32")%n", + res_flags, res_mask, &n) && n > 0) { +if (*res_flags & ~allowed || *res_mask & ~allowed) { +return -EINVAL; +} +return n; +} + +if (*s != '(') { return -EINVAL; } -n++; +n = 1; +if (res_mask && (s[n] == '+' || s[n] == '-')) { +uint32_t flags = 0, mask = 0; + +/* Parse masked flags. */ +while (s[n] != ')') { +bool set; +uint32_t bit; +int name_len; + +if (s[n] == '+') { +set = true; +} else if (s[n] == '-') { +set = false; +} else { +return -EINVAL; +} +n++; + +name_len = strcspn(s + n, "+-)"); + +for (bit = 1; bit; bit <<= 1) { +const char *fname = bit_to_string(bit); +size_t len; + +if (!fname) { +continue; +} + +len = strlen(fname); +if (len != name_len) { +continue; +} +if (!strncmp(s + n, fname, len)) { +if (mask & bit) { +/* bit already set. */ +return -EINVAL; +} +if (!(bit & allowed)) { +return -EINVAL; +} +if (set) { +flags |= bit; +} +mask |= bit; +break; +} +} + +if (!bit) { +return -EINVAL; /* Unknown flag name */ +} +s += name_len; +} +n++; + +*res_flags = flags; +*res_mask = mask; +return n; +} + +/* Parse unmasked flags. If a flag is present, it is set, otherwise + * it is not set. */ while (s[n] != ')') { unsigned long long int flags; uint32_t bit; int n0; if (ovs_scan(&s[n], "%lli%n", &flags, &n0)) { +if (flags & ~allowed) { +return -EINVAL; +} n += n0 + (s[n + n0] == '
[ovs-dev] [PATCH v5 06/13] lib/odp-util: Skip ignored fields when parsing and formatting.
When a whole field of a key value is ignored, skip it when formatting the key, and allow it to be left out when parsing the key from a string. However, when the unmasked bits have non-zero values (as in keys received from a datapath), or when the 'verbose' formatting is requested those are still formatted, as it may help in debugging. Now the named key fields can also be given in arbitrary order. Duplicate field values are not checked for, so the last one will remain in effect. Signed-off-by: Jarno Rajahalme --- lib/odp-util.c| 1830 + tests/dpif-netdev.at |4 +- tests/odp.at | 59 +- tests/ofproto-dpif.at | 16 +- tests/tunnel.at | 68 +- 5 files changed, 853 insertions(+), 1124 deletions(-) diff --git a/lib/odp-util.c b/lib/odp-util.c index d205473..b5c8c2b 100644 --- a/lib/odp-util.c +++ b/lib/odp-util.c @@ -213,7 +213,7 @@ parse_flags(const char *s, const char *(*bit_to_string)(uint32_t), int n; /* Parse masked flags in numeric format? */ -if (res_mask && ovs_scan(s, "%"SCNi32"/%"SCNi32")%n", +if (res_mask && ovs_scan(s, "%"SCNi32"/%"SCNi32"%n", res_flags, res_mask, &n) && n > 0) { if (*res_flags & ~allowed || *res_mask & ~allowed) { return -EINVAL; @@ -221,12 +221,9 @@ parse_flags(const char *s, const char *(*bit_to_string)(uint32_t), return n; } -if (*s != '(') { -return -EINVAL; -} -n = 1; +n = 0; -if (res_mask && (s[n] == '+' || s[n] == '-')) { +if (res_mask && (*s == '+' || *s == '-')) { uint32_t flags = 0, mask = 0; /* Parse masked flags. */ @@ -279,7 +276,6 @@ parse_flags(const char *s, const char *(*bit_to_string)(uint32_t), } s += name_len; } -n++; *res_flags = flags; *res_mask = mask; @@ -326,7 +322,6 @@ parse_flags(const char *s, const char *(*bit_to_string)(uint32_t), return -EINVAL; } } -n++; *res_flags = result; if (res_mask) { @@ -425,14 +420,27 @@ format_odp_userspace_action(struct ds *ds, const struct nlattr *attr) } static void -format_vlan_tci(struct ds *ds, ovs_be16 vlan_tci) +format_vlan_tci(struct ds *ds, ovs_be16 tci, ovs_be16 mask, bool verbose) { -ds_put_format(ds, "vid=%"PRIu16",pcp=%d", - vlan_tci_to_vid(vlan_tci), - vlan_tci_to_pcp(vlan_tci)); -if (!(vlan_tci & htons(VLAN_CFI))) { -ds_put_cstr(ds, ",cfi=0"); +if (verbose || vlan_tci_to_vid(tci) || vlan_tci_to_vid(mask)) { +ds_put_format(ds, "vid=%"PRIu16, vlan_tci_to_vid(tci)); +if (vlan_tci_to_vid(mask) != VLAN_VID_MASK) { /* Partially masked. */ +ds_put_format(ds, "/0x%"PRIx16, vlan_tci_to_vid(mask)); +}; +ds_put_char(ds, ','); +} +if (verbose || vlan_tci_to_pcp(tci) || vlan_tci_to_pcp(mask)) { +ds_put_format(ds, "pcp=%d", vlan_tci_to_pcp(tci)); +if (vlan_tci_to_pcp(mask) != (VLAN_PCP_MASK >> VLAN_PCP_SHIFT)) { +ds_put_format(ds, "/0x%x", vlan_tci_to_pcp(mask)); +} +ds_put_char(ds, ','); +} +if (!(tci & htons(VLAN_CFI))) { +ds_put_cstr(ds, "cfi=0"); +ds_put_char(ds, ','); } +ds_chomp(ds, ','); } static void @@ -543,9 +551,9 @@ format_odp_action(struct ds *ds, const struct nlattr *a) mask->nla_len = attr->nla_len = NLA_HDRLEN + size; memcpy(attr + 1, (char *)(a + 1), size); memcpy(mask + 1, (char *)(a + 1) + size, size); -format_odp_key_attr(attr, mask, NULL, ds, true); +format_odp_key_attr(attr, mask, NULL, ds, false); } else { -format_odp_key_attr(a, NULL, NULL, ds, true); +format_odp_key_attr(a, NULL, NULL, ds, false); } ds_put_cstr(ds, ")"); break; @@ -560,7 +568,7 @@ format_odp_action(struct ds *ds, const struct nlattr *a) if (vlan->vlan_tpid != htons(ETH_TYPE_VLAN)) { ds_put_format(ds, "tpid=0x%04"PRIx16",", ntohs(vlan->vlan_tpid)); } -format_vlan_tci(ds, vlan->vlan_tci); +format_vlan_tci(ds, vlan->vlan_tci, OVS_BE16_MAX, false); ds_put_char(ds, ')'); break; case OVS_ACTION_ATTR_POP_VLAN: @@ -660,7 +668,7 @@ parse_odp_userspace_action(const char *s, struct ofpbuf *actions) cookie.sflow.output = output; user_data = &cookie; user_data_size = sizeof cookie.sflow; -} else if (ovs_scan(&s[n], ",slow_path%n", +} else if (ovs_scan(&s[n], ",slow_path(%n", &n1)) { int res; @@ -672,10 +680,10 @@ parse_odp_userspace_action(const char *s, struct ofpbuf *actions) res = parse_flags(&s[n], slow_path_reason_to_string, &cookie.slow_path.reason, SLOW_
[ovs-dev] [PATCH v5 02/13] lib/util: Change is_all_zeros and is_all_ones to take a void *.
is_all_zeros() and is_all_ones() operate on bytes, but just like with memset, it is easier to use if the first argument is a void *. Signed-off-by: Jarno Rajahalme --- lib/meta-flow.c | 14 +++--- lib/odp-util.c |7 +++ lib/util.c |6 -- lib/util.h |4 ++-- 4 files changed, 16 insertions(+), 15 deletions(-) diff --git a/lib/meta-flow.c b/lib/meta-flow.c index b76c11d..3b82e62 100644 --- a/lib/meta-flow.c +++ b/lib/meta-flow.c @@ -1057,8 +1057,8 @@ mf_is_mask_valid(const struct mf_field *mf, const union mf_value *mask) { switch (mf->maskable) { case MFM_NONE: -return (is_all_zeros((const uint8_t *) mask, mf->n_bytes) || -is_all_ones((const uint8_t *) mask, mf->n_bytes)); +return (is_all_zeros(mask, mf->n_bytes) || +is_all_ones(mask, mf->n_bytes)); case MFM_FULLY: return true; @@ -1904,7 +1904,7 @@ mf_is_zero(const struct mf_field *mf, const struct flow *flow) union mf_value value; mf_get_value(mf, flow, &value); -return is_all_zeros((const uint8_t *) &value, mf->n_bytes); +return is_all_zeros(&value, mf->n_bytes); } /* Makes 'match' wildcard field 'mf'. @@ -2130,10 +2130,10 @@ mf_set(const struct mf_field *mf, const union mf_value *value, const union mf_value *mask, struct match *match) { -if (!mask || is_all_ones((const uint8_t *) mask, mf->n_bytes)) { +if (!mask || is_all_ones(mask, mf->n_bytes)) { mf_set_value(mf, value, match); return mf->usable_protocols; -} else if (is_all_zeros((const uint8_t *) mask, mf->n_bytes)) { +} else if (is_all_zeros(mask, mf->n_bytes)) { mf_set_wild(mf, match); return OFPUTIL_P_ANY; } @@ -2861,10 +2861,10 @@ mf_format(const struct mf_field *mf, struct ds *s) { if (mask) { -if (is_all_zeros((const uint8_t *) mask, mf->n_bytes)) { +if (is_all_zeros(mask, mf->n_bytes)) { ds_put_cstr(s, "ANY"); return; -} else if (is_all_ones((const uint8_t *) mask, mf->n_bytes)) { +} else if (is_all_ones(mask, mf->n_bytes)) { mask = NULL; } } diff --git a/lib/odp-util.c b/lib/odp-util.c index 475502b..5c17f06 100644 --- a/lib/odp-util.c +++ b/lib/odp-util.c @@ -706,7 +706,7 @@ parse_odp_action(const char *s, const struct simap *port_names, size = nl_attr_get_size(mask); if (size == nl_attr_get_size(key)) { /* Change to masked set action if not fully masked. */ -if (!is_all_ones((uint8_t *)(mask + 1), size)) { +if (!is_all_ones(mask + 1, size)) { key->nla_len += size; ofpbuf_put(actions, mask + 1, size); /* 'actions' may have been reallocated by ofpbuf_put(). */ @@ -1097,7 +1097,7 @@ odp_mask_attr_is_exact(const struct nlattr *ma) | FLOW_TNL_F_OAM)) { /* The flags are exact match, check the remaining fields. */ tun_mask.flags = 0x; -is_exact = is_all_ones((uint8_t *)&tun_mask, +is_exact = is_all_ones(&tun_mask, offsetof(struct flow_tnl, ip_ttl)); } } else { @@ -3341,8 +3341,7 @@ parse_l2_5_onward(const struct nlattr *attrs[OVS_KEY_ATTR_MAX + 1], memcpy(flow->arp_sha, nd_key->nd_sll, ETH_ADDR_LEN); memcpy(flow->arp_tha, nd_key->nd_tll, ETH_ADDR_LEN); if (is_mask) { -if (!is_all_zeros((const uint8_t *) nd_key, - sizeof *nd_key) && +if (!is_all_zeros(nd_key, sizeof *nd_key) && (flow->tp_src != htons(0x) || flow->tp_dst != htons(0x))) { return ODP_FIT_ERROR; diff --git a/lib/util.c b/lib/util.c index 4493ee0..f3e47b1 100644 --- a/lib/util.c +++ b/lib/util.c @@ -1025,8 +1025,9 @@ const uint8_t count_1bits_8[256] = { /* Returns true if the 'n' bytes starting at 'p' are zeros. */ bool -is_all_zeros(const uint8_t *p, size_t n) +is_all_zeros(const void *p_, size_t n) { +const uint8_t *p = p_; size_t i; for (i = 0; i < n; i++) { @@ -1039,8 +1040,9 @@ is_all_zeros(const uint8_t *p, size_t n) /* Returns true if the 'n' bytes starting at 'p' are 0xff. */ bool -is_all_ones(const uint8_t *p, size_t n) +is_all_ones(const void *p_, size_t n) { +const uint8_t *p = p_; size_t i; for (i = 0; i < n; i++) { diff --git a/lib/util.h b/lib/util.h index dc34ee5..261b4b3 100644 --- a/lib/util.h +++ b/lib/util.h @@ -485,8 +485,8 @@ leftmost_1bit_idx(uint32_t x) return x ? log_2_floor(x) : 32; } -bool is_all_zeros(const uint8_t *, size_t); -bool is_all_ones(const uint8_t *, size_t); +bool is_all_zeros(const void *, size_t); +bool is_all_ones(const void *, size_t); void bitwi
[ovs-dev] [PATCH v5 03/13] lib/odp-util: Refine odp_mask_attr_is_exact().
Some attributes are exact matches even when all bits are not ones. Make odp_mask_attr_is_exact() to return true if the mask is set for all the bits we actually care about. Signed-off-by: Jarno Rajahalme --- lib/odp-util.c | 42 +++--- 1 file changed, 27 insertions(+), 15 deletions(-) diff --git a/lib/odp-util.c b/lib/odp-util.c index 5c17f06..dde97e5 100644 --- a/lib/odp-util.c +++ b/lib/odp-util.c @@ -1080,26 +1080,38 @@ odp_mask_attr_is_wildcard(const struct nlattr *ma) static bool odp_mask_attr_is_exact(const struct nlattr *ma) { -bool is_exact = false; +bool is_exact; enum ovs_key_attr attr = nl_attr_type(ma); -if (attr == OVS_KEY_ATTR_TUNNEL) { -/* XXX this is a hack for now. Should change - * the exact match dection to per field - * instead of per attribute. - */ +if (attr == OVS_KEY_ATTR_TCP_FLAGS) { +is_exact = TCP_FLAGS_BE16(nl_attr_get_be16(ma)) == htons(0x0fff); +} else if (attr == OVS_KEY_ATTR_IPV6) { +const struct ovs_key_ipv6 *mask = nl_attr_get(ma); + +is_exact = +(mask->ipv6_label & htonl(IPV6_LABEL_MASK)) == htonl(IPV6_LABEL_MASK) +&& mask->ipv6_proto == UINT8_MAX +&& mask->ipv6_tclass == UINT8_MAX +&& mask->ipv6_hlimit == UINT8_MAX +&& mask->ipv6_frag == UINT8_MAX +&& ipv6_mask_is_exact((const struct in6_addr *)mask->ipv6_src) +&& ipv6_mask_is_exact((const struct in6_addr *)mask->ipv6_dst); +} else if (attr == OVS_KEY_ATTR_TUNNEL) { struct flow_tnl tun_mask; + memset(&tun_mask, 0, sizeof tun_mask); odp_tun_key_from_attr(ma, &tun_mask); -if (tun_mask.flags == (FLOW_TNL_F_KEY - | FLOW_TNL_F_DONT_FRAGMENT - | FLOW_TNL_F_CSUM - | FLOW_TNL_F_OAM)) { -/* The flags are exact match, check the remaining fields. */ -tun_mask.flags = 0x; -is_exact = is_all_ones(&tun_mask, - offsetof(struct flow_tnl, ip_ttl)); -} +is_exact = tun_mask.flags == (FLOW_TNL_F_KEY + | FLOW_TNL_F_DONT_FRAGMENT + | FLOW_TNL_F_CSUM + | FLOW_TNL_F_OAM) +&& tun_mask.tun_id == OVS_BE64_MAX +&& tun_mask.ip_src == OVS_BE32_MAX +&& tun_mask.ip_dst == OVS_BE32_MAX +&& tun_mask.ip_tos == UINT8_MAX +&& tun_mask.ip_ttl == UINT8_MAX +&& tun_mask.tp_src == OVS_BE16_MAX +&& tun_mask.tp_dst == OVS_BE16_MAX; } else { is_exact = is_all_ones(nl_attr_get(ma), nl_attr_get_size(ma)); } -- 1.7.10.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v5 08/13] lib/odp: Use masked set actions.
Signed-off-by: Jarno Rajahalme --- v5: Using pattern with less code duplication suggested by Ben. lib/odp-util.c | 487 ++ lib/odp-util.h |5 +- ofproto/ofproto-dpif-xlate.c | 15 +- tests/ofproto-dpif.at| 66 +++--- tests/tunnel.at |8 +- 5 files changed, 344 insertions(+), 237 deletions(-) diff --git a/lib/odp-util.c b/lib/odp-util.c index b5c8c2b..8ac43c5 100644 --- a/lib/odp-util.c +++ b/lib/odp-util.c @@ -1167,43 +1167,67 @@ odp_mask_attr_is_wildcard(const struct nlattr *ma) } static bool -odp_mask_attr_is_exact(const struct nlattr *ma) +odp_mask_is_exact(enum ovs_key_attr attr, const void *mask, size_t size) { -bool is_exact; -enum ovs_key_attr attr = nl_attr_type(ma); - if (attr == OVS_KEY_ATTR_TCP_FLAGS) { -is_exact = TCP_FLAGS(nl_attr_get_be16(ma)) == TCP_FLAGS(OVS_BE16_MAX); -} else if (attr == OVS_KEY_ATTR_IPV6) { -const struct ovs_key_ipv6 *mask = nl_attr_get(ma); +return TCP_FLAGS(*(ovs_be16 *)mask) == TCP_FLAGS(OVS_BE16_MAX); +} +if (attr == OVS_KEY_ATTR_IPV6) { +const struct ovs_key_ipv6 *ipv6_mask = mask; -is_exact = -((mask->ipv6_label & htonl(IPV6_LABEL_MASK)) +return +((ipv6_mask->ipv6_label & htonl(IPV6_LABEL_MASK)) == htonl(IPV6_LABEL_MASK)) -&& mask->ipv6_proto == UINT8_MAX -&& mask->ipv6_tclass == UINT8_MAX -&& mask->ipv6_hlimit == UINT8_MAX -&& mask->ipv6_frag == UINT8_MAX -&& ipv6_mask_is_exact((const struct in6_addr *)mask->ipv6_src) -&& ipv6_mask_is_exact((const struct in6_addr *)mask->ipv6_dst); -} else if (attr == OVS_KEY_ATTR_TUNNEL) { -struct flow_tnl tun_mask; +&& ipv6_mask->ipv6_proto == UINT8_MAX +&& ipv6_mask->ipv6_tclass == UINT8_MAX +&& ipv6_mask->ipv6_hlimit == UINT8_MAX +&& ipv6_mask->ipv6_frag == UINT8_MAX +&& ipv6_mask_is_exact((const struct in6_addr *)ipv6_mask->ipv6_src) +&& ipv6_mask_is_exact((const struct in6_addr *)ipv6_mask->ipv6_dst); +} +if (attr == OVS_KEY_ATTR_TUNNEL) { +const struct flow_tnl *tun_mask = mask; + +return tun_mask->flags == FLOW_TNL_F_MASK +&& tun_mask->tun_id == OVS_BE64_MAX +&& tun_mask->ip_src == OVS_BE32_MAX +&& tun_mask->ip_dst == OVS_BE32_MAX +&& tun_mask->ip_tos == UINT8_MAX +&& tun_mask->ip_ttl == UINT8_MAX +&& tun_mask->tp_src == OVS_BE16_MAX +&& tun_mask->tp_dst == OVS_BE16_MAX; +} + +if (attr == OVS_KEY_ATTR_ARP) { +/* ARP key has padding, ignore it. */ +BUILD_ASSERT_DECL(sizeof(struct ovs_key_arp) == 24); +BUILD_ASSERT_DECL(offsetof(struct ovs_key_arp, arp_tha) == 10 + 6); +size = offsetof(struct ovs_key_arp, arp_tha) + ETH_ADDR_LEN; +ovs_assert(((uint16_t *)mask)[size/2] == 0); +} +return is_all_ones(mask, size); +} + +static bool +odp_mask_attr_is_exact(const struct nlattr *ma) +{ +struct flow_tnl tun_mask; +enum ovs_key_attr attr = nl_attr_type(ma); +const void *mask; +size_t size; + +if (attr == OVS_KEY_ATTR_TUNNEL) { memset(&tun_mask, 0, sizeof tun_mask); odp_tun_key_from_attr(ma, &tun_mask); -is_exact = tun_mask.flags == FLOW_TNL_F_MASK -&& tun_mask.tun_id == OVS_BE64_MAX -&& tun_mask.ip_src == OVS_BE32_MAX -&& tun_mask.ip_dst == OVS_BE32_MAX -&& tun_mask.ip_tos == UINT8_MAX -&& tun_mask.ip_ttl == UINT8_MAX -&& tun_mask.tp_src == OVS_BE16_MAX -&& tun_mask.tp_dst == OVS_BE16_MAX; +mask = &tun_mask; +size = sizeof tun_mask; } else { -is_exact = is_all_ones(nl_attr_get(ma), nl_attr_get_size(ma)); +mask = nl_attr_get(ma); +size = nl_attr_get_size(ma); } -return is_exact; +return odp_mask_is_exact(attr, mask, size); } void @@ -3525,14 +3549,6 @@ commit_masked_set_action(struct ofpbuf *odp_actions, nl_msg_end_nested(odp_actions, offset); } -void -odp_put_pkt_mark_action(const uint32_t pkt_mark, -struct ofpbuf *odp_actions) -{ -commit_set_action(odp_actions, OVS_KEY_ATTR_SKB_MARK, &pkt_mark, - sizeof(pkt_mark)); -} - /* If any of the flow key data that ODP actions can modify are different in * 'base->tunnel' and 'flow->tunnel', appends a set_tunnel ODP action to * 'odp_actions' that change the flow tunneling information in key from @@ -3553,29 +3569,63 @@ commit_odp_tunnel_action(const struct flow *flow, struct flow *base, } } -static void -commit_set_ether_addr_action(const struct flow *flow, struct flow *base, - struct ofpbuf *odp_actions, - struct flow_wildcards *wc) +static b
[ovs-dev] [PATCH v5 12/13] datapath: Relax flow set-up time validations.
Allow setting of fields without matching on the same fields. Field existence check is done on set action execution time instead, using the extracted flow key. Signed-off-by: Jarno Rajahalme --- datapath/actions.c | 21 - datapath/flow_netlink.c | 46 +++--- 2 files changed, 35 insertions(+), 32 deletions(-) diff --git a/datapath/actions.c b/datapath/actions.c index 243b672..3b2de3f 100644 --- a/datapath/actions.c +++ b/datapath/actions.c @@ -470,6 +470,10 @@ static int set_ipv4(struct sk_buff *skb, const struct ovs_key_ipv4 *key, __be32 new_addr; int err; + /* eth.type is checked at flow install time already. */ + if (!OVS_CB(skb)->pkt_key->ip.proto) + return -EINVAL; + err = make_writable(skb, skb_network_offset(skb) + sizeof(struct iphdr)); if (unlikely(err)) @@ -519,6 +523,10 @@ static int set_ipv6(struct sk_buff *skb, const struct ovs_key_ipv6 *key, struct ipv6hdr *nh; int err; + /* eth.type is checked at flow install time already. */ + if (!OVS_CB(skb)->pkt_key->ip.proto) + return -EINVAL; + err = make_writable(skb, skb_network_offset(skb) + sizeof(struct ipv6hdr)); if (unlikely(err)) @@ -593,6 +601,10 @@ static int set_udp(struct sk_buff *skb, const struct ovs_key_udp *key, __be16 src, dst; int err; + /* Must have extracted ports. */ + if (!OVS_CB(skb)->pkt_key->tp.src && !OVS_CB(skb)->pkt_key->tp.dst) + return -EINVAL; + err = make_writable(skb, skb_transport_offset(skb) + sizeof(struct udphdr)); if (unlikely(err)) @@ -634,6 +646,10 @@ static int set_tcp(struct sk_buff *skb, const struct ovs_key_tcp *key, __be16 src, dst; int err; + /* Must have extracted ports. */ + if (!OVS_CB(skb)->pkt_key->tp.src && !OVS_CB(skb)->pkt_key->tp.dst) + return -EINVAL; + err = make_writable(skb, skb_transport_offset(skb) + sizeof(struct tcphdr)); if (unlikely(err)) @@ -664,6 +680,10 @@ static int set_sctp(struct sk_buff *skb, const struct ovs_key_sctp *key, __le32 old_correct_csum, new_csum, old_csum; int err; + /* Must have extracted ports. */ + if (!OVS_CB(skb)->pkt_key->tp.src && !OVS_CB(skb)->pkt_key->tp.dst) + return -EINVAL; + err = make_writable(skb, sctphoff + sizeof(struct sctphdr)); if (unlikely(err)) return err; @@ -821,7 +841,6 @@ static int execute_set_action(struct sk_buff *skb, const struct nlattr *a) } return -EINVAL; - } /* Mask is at the midpoint of the data. */ diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c index ec32a00..09aaf41 100644 --- a/datapath/flow_netlink.c +++ b/datapath/flow_netlink.c @@ -1511,16 +1511,6 @@ static int validate_and_copy_sample(const struct nlattr *attr, return 0; } -static int validate_tp_port(const struct sw_flow_key *flow_key, - __be16 eth_type) -{ - if ((eth_type == htons(ETH_P_IP) || eth_type == htons(ETH_P_IPV6)) && - (flow_key->tp.src || flow_key->tp.dst)) - return 0; - - return -EINVAL; -} - void ovs_match_init(struct sw_flow_match *match, struct sw_flow_key *key, struct sw_flow_mask *mask) @@ -1666,9 +1656,6 @@ static int validate_set(const struct nlattr *a, if (eth_type != htons(ETH_P_IP)) return -EINVAL; - if (!flow_key->ip.proto) - return -EINVAL; - ipv4_key = nla_data(ovs_key); if (masked) { @@ -1690,9 +1677,6 @@ static int validate_set(const struct nlattr *a, if (eth_type != htons(ETH_P_IPV6)) return -EINVAL; - if (!flow_key->ip.proto) - return -EINVAL; - ipv6_key = nla_data(ovs_key); if (masked) { const struct ovs_key_ipv6 *mask = ipv6_key + 1; @@ -1717,35 +1701,35 @@ static int validate_set(const struct nlattr *a, break; case OVS_KEY_ATTR_TCP: - if (flow_key->ip.proto != IPPROTO_TCP) + if (eth_type != htons(ETH_P_IP) && + eth_type != htons(ETH_P_IPV6)) return -EINVAL; - err = validate_tp_port(flow_key, eth_type); - if (err) - return err; + if (flow_key->ip.proto != IPPROTO_TCP) + return -EINVAL; break; case OVS_KEY_ATTR_UDP: - if (flow_key->ip.proto != IPPROTO_UDP) + if (eth_type != htons(ETH_P_IP) && +
[ovs-dev] [PATCH v5 07/13] ofproto: Probe for masked set action support.
Signed-off-by: Jarno Rajahalme Reviewed-by: YAMAMOTO Takashi Acked-by: Ben Pfaff --- v5: rebase, no other changes. ofproto/ofproto-dpif-xlate.c | 18 + ofproto/ofproto-dpif-xlate.h |3 ++- ofproto/ofproto-dpif.c | 58 +- 3 files changed, 72 insertions(+), 7 deletions(-) diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index 6e33a27..02b9538 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -106,6 +106,10 @@ struct xbridge { /* Number of MPLS label stack entries that the datapath supports * in matches. */ size_t max_mpls_depth; + +/* True if the datapath supports masked data in OVS_ACTION_ATTR_SET + * actions. */ +bool masked_set_action; }; struct xbundle { @@ -367,7 +371,8 @@ static void xlate_xbridge_set(struct xbridge *xbridge, bool forward_bpdu, bool has_in_band, bool enable_recirc, bool variable_length_userdata, - size_t max_mpls_depth); + size_t max_mpls_depth, + bool masked_set_action); static void xlate_xbundle_set(struct xbundle *xbundle, enum port_vlan_mode vlan_mode, int vlan, unsigned long *trunks, bool use_priority_tags, @@ -431,7 +436,8 @@ xlate_xbridge_set(struct xbridge *xbridge, bool forward_bpdu, bool has_in_band, bool enable_recirc, bool variable_length_userdata, - size_t max_mpls_depth) + size_t max_mpls_depth, + bool masked_set_action) { if (xbridge->ml != ml) { mac_learning_unref(xbridge->ml); @@ -477,6 +483,7 @@ xlate_xbridge_set(struct xbridge *xbridge, xbridge->enable_recirc = enable_recirc; xbridge->variable_length_userdata = variable_length_userdata; xbridge->max_mpls_depth = max_mpls_depth; +xbridge->masked_set_action = masked_set_action; } static void @@ -552,7 +559,7 @@ xlate_xbridge_copy(struct xbridge *xbridge) xbridge->ipfix, xbridge->netflow, xbridge->frag, xbridge->forward_bpdu, xbridge->has_in_band, xbridge->enable_recirc, xbridge->variable_length_userdata, - xbridge->max_mpls_depth); + xbridge->max_mpls_depth, xbridge->masked_set_action); LIST_FOR_EACH (xbundle, list_node, &xbridge->xbundles) { xlate_xbundle_copy(new_xbridge, xbundle); } @@ -706,7 +713,8 @@ xlate_ofproto_set(struct ofproto_dpif *ofproto, const char *name, bool forward_bpdu, bool has_in_band, bool enable_recirc, bool variable_length_userdata, - size_t max_mpls_depth) + size_t max_mpls_depth, + bool masked_set_action) { struct xbridge *xbridge; @@ -726,7 +734,7 @@ xlate_ofproto_set(struct ofproto_dpif *ofproto, const char *name, xlate_xbridge_set(xbridge, dpif, miss_rule, no_packet_in_rule, ml, stp, ms, mbridge, sflow, ipfix, netflow, frag, forward_bpdu, has_in_band, enable_recirc, variable_length_userdata, - max_mpls_depth); + max_mpls_depth, masked_set_action); } static void diff --git a/ofproto/ofproto-dpif-xlate.h b/ofproto/ofproto-dpif-xlate.h index b0dfc18..b9f95bc 100644 --- a/ofproto/ofproto-dpif-xlate.h +++ b/ofproto/ofproto-dpif-xlate.h @@ -153,7 +153,8 @@ void xlate_ofproto_set(struct ofproto_dpif *, const char *name, enum ofp_config_flags, bool forward_bpdu, bool has_in_band, bool enable_recirc, bool variable_length_userdata, - size_t mpls_label_stack_length); + size_t mpls_label_stack_length, + bool masked_set_action); void xlate_remove_ofproto(struct ofproto_dpif *); void xlate_bundle_set(struct ofproto_dpif *, struct ofbundle *, diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index a635ca6..e2b746d 100644 --- a/ofproto/ofproto-dpif.c +++ b/ofproto/ofproto-dpif.c @@ -263,6 +263,10 @@ struct dpif_backer { * False if the datapath supports only 8-byte (or shorter) userdata. */ bool variable_length_userdata; +/* True if the datapath supports masked data in OVS_ACTION_ATTR_SET + * actions. */ +bool masked_set_action; + /* Maximum number of MPLS label stack entries that the datapath supports * in a match */ size_t max_mpls_depth; @@ -602,7 +606,8 @@ type_run(const char *type) connmgr_has_in_band(ofproto->up.connmgr), ofproto->backer->enable_recirc,
[ovs-dev] [PATCH v5 13/13] datapath: Relax match_validate.
When userspace inserts masked flows, it is not necessary to demand that flows matching in a known ethertype also must have the corresponding key, as a missing key will be automatically wildcarded. For example, if a drop flow dropping all UDP packets is installed, this patch allows the userspace application to not specify the OVS_KEY_ATTR_UDP key. Signed-off-by: Jarno Rajahalme --- datapath/flow_netlink.c | 23 --- 1 file changed, 23 deletions(-) diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c index 09aaf41..690f0e9 100644 --- a/datapath/flow_netlink.c +++ b/datapath/flow_netlink.c @@ -114,7 +114,6 @@ static void update_range(struct sw_flow_match *match, static bool match_validate(const struct sw_flow_match *match, u64 key_attrs, u64 mask_attrs) { - u64 key_expected = 1ULL << OVS_KEY_ATTR_ETHERNET; u64 mask_allowed = key_attrs; /* At most allow all key attributes */ /* The following mask attributes allowed only if they @@ -139,39 +138,32 @@ static bool match_validate(const struct sw_flow_match *match, /* Check key attributes. */ if (match->key->eth.type == htons(ETH_P_ARP) || match->key->eth.type == htons(ETH_P_RARP)) { - key_expected |= 1ULL << OVS_KEY_ATTR_ARP; if (match->mask && (match->mask->key.eth.type == htons(0x))) mask_allowed |= 1ULL << OVS_KEY_ATTR_ARP; } if (eth_p_mpls(match->key->eth.type)) { - key_expected |= 1ULL << OVS_KEY_ATTR_MPLS; if (match->mask && (match->mask->key.eth.type == htons(0x))) mask_allowed |= 1ULL << OVS_KEY_ATTR_MPLS; } if (match->key->eth.type == htons(ETH_P_IP)) { - key_expected |= 1ULL << OVS_KEY_ATTR_IPV4; if (match->mask && (match->mask->key.eth.type == htons(0x))) mask_allowed |= 1ULL << OVS_KEY_ATTR_IPV4; if (match->key->ip.frag != OVS_FRAG_TYPE_LATER) { if (match->key->ip.proto == IPPROTO_UDP) { - key_expected |= 1ULL << OVS_KEY_ATTR_UDP; if (match->mask && (match->mask->key.ip.proto == 0xff)) mask_allowed |= 1ULL << OVS_KEY_ATTR_UDP; } if (match->key->ip.proto == IPPROTO_SCTP) { - key_expected |= 1ULL << OVS_KEY_ATTR_SCTP; if (match->mask && (match->mask->key.ip.proto == 0xff)) mask_allowed |= 1ULL << OVS_KEY_ATTR_SCTP; } if (match->key->ip.proto == IPPROTO_TCP) { - key_expected |= 1ULL << OVS_KEY_ATTR_TCP; - key_expected |= 1ULL << OVS_KEY_ATTR_TCP_FLAGS; if (match->mask && (match->mask->key.ip.proto == 0xff)) { mask_allowed |= 1ULL << OVS_KEY_ATTR_TCP; mask_allowed |= 1ULL << OVS_KEY_ATTR_TCP_FLAGS; @@ -179,7 +171,6 @@ static bool match_validate(const struct sw_flow_match *match, } if (match->key->ip.proto == IPPROTO_ICMP) { - key_expected |= 1ULL << OVS_KEY_ATTR_ICMP; if (match->mask && (match->mask->key.ip.proto == 0xff)) mask_allowed |= 1ULL << OVS_KEY_ATTR_ICMP; } @@ -187,26 +178,21 @@ static bool match_validate(const struct sw_flow_match *match, } if (match->key->eth.type == htons(ETH_P_IPV6)) { - key_expected |= 1ULL << OVS_KEY_ATTR_IPV6; if (match->mask && (match->mask->key.eth.type == htons(0x))) mask_allowed |= 1ULL << OVS_KEY_ATTR_IPV6; if (match->key->ip.frag != OVS_FRAG_TYPE_LATER) { if (match->key->ip.proto == IPPROTO_UDP) { - key_expected |= 1ULL << OVS_KEY_ATTR_UDP; if (match->mask && (match->mask->key.ip.proto == 0xff)) mask_allowed |= 1ULL << OVS_KEY_ATTR_UDP; } if (match->key->ip.proto == IPPROTO_SCTP) { - key_expected |= 1ULL << OVS_KEY_ATTR_SCTP; if (match->mask && (match->mask->key.ip.proto == 0xff)) mask_allowed |= 1ULL << OVS_KEY_ATTR_SCTP; } if (match->key->ip.proto == IPPROTO_TCP) { - key_expected |= 1ULL << OVS_KEY_ATTR_TCP; -
[ovs-dev] [PATCH v5 01/13] lib/odp: Masked set action execution and printing.
Add a new action type OVS_ACTION_ATTR_SET_MASKED, and support for parsing, printing, and committing them. Masked set actions add a mask, immediately following the netlink attribute data, within the netlink attribute itself. Thus the key attribute size for a masked set action is exactly double of the non-masked set action. Signed-off-by: Jarno Rajahalme --- v5: Addressed Ben's comments, moved OVS_KEY_ATTR_SET_MASKED comment to this patch. datapath/linux/compat/include/linux/openvswitch.h | 10 + lib/dpif-netdev.c |1 + lib/dpif.c|1 + lib/odp-execute.c | 241 +++-- lib/odp-util.c| 77 ++- lib/odp-util.h|3 + tests/odp.at | 22 ++ 7 files changed, 332 insertions(+), 23 deletions(-) diff --git a/datapath/linux/compat/include/linux/openvswitch.h b/datapath/linux/compat/include/linux/openvswitch.h index 9c18b3b..6910dc4 100644 --- a/datapath/linux/compat/include/linux/openvswitch.h +++ b/datapath/linux/compat/include/linux/openvswitch.h @@ -592,6 +592,12 @@ struct ovs_action_hash { * @OVS_ACTION_ATTR_SET: Replaces the contents of an existing header. The * single nested %OVS_KEY_ATTR_* attribute specifies a header to modify and its * value. + * @OVS_ACTION_ATTR_SET_MASKED: Replaces the contents of an existing header. A + * nested %OVS_KEY_ATTR_* attribute specifies a header to modify, its value, + * and a mask. For every bit set in the mask, the corresponding bit value + * is copied from the value to the packet header field, rest of the bits are + * left unchanged. The non-masked value bits must be passed in as zeroes. + * Masking is not supported for the %OVS_KEY_ATTR_TUNNEL attribute. * @OVS_ACTION_RECIRC: Recirculate within the data path. * @OVS_ACTION_HASH: Compute and set flow hash value. * @OVS_ACTION_ATTR_PUSH_MPLS: Push a new MPLS label stack entry onto the @@ -621,6 +627,10 @@ enum ovs_action_attr { OVS_ACTION_ATTR_HASH, /* struct ovs_action_hash. */ OVS_ACTION_ATTR_PUSH_MPLS,/* struct ovs_action_push_mpls. */ OVS_ACTION_ATTR_POP_MPLS, /* __be16 ethertype. */ + OVS_ACTION_ATTR_SET_MASKED, /* One nested OVS_KEY_ATTR_* including + * data immediately followed by a mask. + * The data must be zero for the unmasked + * bits. */ __OVS_ACTION_ATTR_MAX }; diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 869fb55..409c9bf 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -2558,6 +2558,7 @@ dp_execute_cb(void *aux_, struct dpif_packet **packets, int cnt, case OVS_ACTION_ATTR_PUSH_MPLS: case OVS_ACTION_ATTR_POP_MPLS: case OVS_ACTION_ATTR_SET: +case OVS_ACTION_ATTR_SET_MASKED: case OVS_ACTION_ATTR_SAMPLE: case OVS_ACTION_ATTR_UNSPEC: case __OVS_ACTION_ATTR_MAX: diff --git a/lib/dpif.c b/lib/dpif.c index b2a1972..bf2c5f9 100644 --- a/lib/dpif.c +++ b/lib/dpif.c @@ -1041,6 +1041,7 @@ dpif_execute_helper_cb(void *aux_, struct dpif_packet **packets, int cnt, case OVS_ACTION_ATTR_PUSH_MPLS: case OVS_ACTION_ATTR_POP_MPLS: case OVS_ACTION_ATTR_SET: +case OVS_ACTION_ATTR_SET_MASKED: case OVS_ACTION_ATTR_SAMPLE: case OVS_ACTION_ATTR_UNSPEC: case __OVS_ACTION_ATTR_MAX: diff --git a/lib/odp-execute.c b/lib/odp-execute.c index bfcd6ac..aba6a19 100644 --- a/lib/odp-execute.c +++ b/lib/odp-execute.c @@ -17,6 +17,8 @@ #include #include "odp-execute.h" +#include +#include #include #include @@ -31,16 +33,110 @@ #include "unaligned.h" #include "util.h" +/* Masked copy of an ethernet address. 'src' is already properly masked. */ static void -odp_eth_set_addrs(struct ofpbuf *packet, - const struct ovs_key_ethernet *eth_key) +ether_addr_copy_masked(uint8_t *dst, const uint8_t *src, + const uint8_t *mask) +{ +int i; + +for (i = 0; i < ETH_ADDR_LEN; i++) { +dst[i] = src[i] | (dst[i] & ~mask[i]); +} +} + +static void +odp_eth_set_addrs(struct ofpbuf *packet, const struct ovs_key_ethernet *key, + const struct ovs_key_ethernet *mask) { struct eth_header *eh = ofpbuf_l2(packet); if (eh) { -memcpy(eh->eth_src, eth_key->eth_src, sizeof eh->eth_src); -memcpy(eh->eth_dst, eth_key->eth_dst, sizeof eh->eth_dst); +if (!mask) { +memcpy(eh->eth_src, key->eth_src, sizeof eh->eth_src); +memcpy(eh->eth_dst, key->eth_dst, sizeof eh->eth_dst); +} else { +ether_addr_copy_masked(eh->eth_src, key->eth_src, mask->eth_src); +ether_addr_copy_masked(eh->eth_dst, key->eth_dst, mask->eth_dst); +} +} +} + +static void +odp_set_ipv4(struct ofpbuf
[ovs-dev] [PATCH v5 11/13] datapath: Allow masks for set actions.
Masked set action allows more megaflow wildcarding. Masked set action is now supported for all writeable key types, except for the tunnel key. The set tunnel action is an exception as any input tunnel info is cleared before action processing starts, so there is no tunnel info to mask. The kernel module converts all (non-tunnel) set actions to masked set actions. This makes action processing more uniform, and results in less branching and duplicating the action processing code. When returning actions to userspace, the fully masked set actions are converted to normal set actions. To make this mapping consistent, we reject fully masked set actions at flow set-up time. Signed-off-by: Jarno Rajahalme --- v5: Fixed issues spotted by Jesse. datapath/actions.c | 333 ++- datapath/flow_netlink.c | 161 --- 2 files changed, 357 insertions(+), 137 deletions(-) diff --git a/datapath/actions.c b/datapath/actions.c index 43ca2a0..243b672 100644 --- a/datapath/actions.c +++ b/datapath/actions.c @@ -225,9 +225,15 @@ static int pop_mpls(struct sk_buff *skb, const __be16 ethertype) return 0; } -static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse) +/* 'KEY' must not have any bits set outside of the 'MASK' */ +#define MASKED(OLD, KEY, MASK) ((KEY) | ((OLD) & ~(MASK))) +#define SET_MASKED(OLD, KEY, MASK) (OLD) = MASKED(OLD, KEY, MASK) + +static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse, + const __be32 *mask) { __be32 *stack = (__be32 *)mac_header_end(skb); + __be32 lse = MASKED(*stack, *mpls_lse, *mask); int err; err = make_writable(skb, skb->mac_len + MPLS_HLEN); @@ -235,13 +241,13 @@ static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse) return err; if (skb->ip_summed == CHECKSUM_COMPLETE) { - __be32 diff[] = { ~(*stack), *mpls_lse }; + __be32 diff[] = { ~(*stack), lse }; skb->csum = ~csum_partial((char *)diff, sizeof(diff), ~skb->csum); } - *stack = *mpls_lse; - flow_key_set_mpls_top_lse(skb, *stack); + *stack = lse; + flow_key_set_mpls_top_lse(skb, lse); return 0; } @@ -331,8 +337,21 @@ static int push_vlan(struct sk_buff *skb, const struct ovs_action_push_vlan *vla return 0; } +/* 'src' is already properly masked. */ +static void ether_addr_copy_masked(u8 *dst_, const u8 *src_, const u8 *mask_) +{ + u16 *dst = (u16 *)dst_; + const u16 *src = (const u16 *)src_; + const u16 *mask = (const u16 *)mask_; + + SET_MASKED(dst[0], src[0], mask[0]); + SET_MASKED(dst[1], src[1], mask[1]); + SET_MASKED(dst[2], src[2], mask[2]); +} + static int set_eth_addr(struct sk_buff *skb, - const struct ovs_key_ethernet *eth_key) + const struct ovs_key_ethernet *key, + const struct ovs_key_ethernet *mask) { int err; err = make_writable(skb, ETH_HLEN); @@ -341,13 +360,15 @@ static int set_eth_addr(struct sk_buff *skb, skb_postpull_rcsum(skb, eth_hdr(skb), ETH_ALEN * 2); - ether_addr_copy(eth_hdr(skb)->h_source, eth_key->eth_src); - ether_addr_copy(eth_hdr(skb)->h_dest, eth_key->eth_dst); + ether_addr_copy_masked(eth_hdr(skb)->h_source, key->eth_src, + mask->eth_src); + ether_addr_copy_masked(eth_hdr(skb)->h_dest, key->eth_dst, + mask->eth_dst); ovs_skb_postpush_rcsum(skb, eth_hdr(skb), ETH_ALEN * 2); - flow_key_set_eth_src(skb, eth_key->eth_src); - flow_key_set_eth_dst(skb, eth_key->eth_dst); + flow_key_set_eth_src(skb, eth_hdr(skb)->h_source); + flow_key_set_eth_dst(skb, eth_hdr(skb)->h_dest); return 0; } @@ -405,6 +426,15 @@ static void update_ipv6_checksum(struct sk_buff *skb, u8 l4_proto, } } +static void mask_ipv6_addr(const __be32 old[4], const __be32 addr[4], + const __be32 mask[4], __be32 masked[4]) +{ + masked[0] = MASKED(old[0], addr[0], mask[0]); + masked[1] = MASKED(old[1], addr[1], mask[1]); + masked[2] = MASKED(old[2], addr[2], mask[2]); + masked[3] = MASKED(old[3], addr[3], mask[3]); +} + static void set_ipv6_addr(struct sk_buff *skb, u8 l4_proto, __be32 addr[4], const __be32 new_addr[4], bool recalculate_csum) @@ -416,28 +446,28 @@ static void set_ipv6_addr(struct sk_buff *skb, u8 l4_proto, memcpy(addr, new_addr, sizeof(__be32[4])); } -static void set_ipv6_tc(struct ipv6hdr *nh, u8 tc) +static void set_ipv6_fl(struct ipv6hdr *nh, u32 fl, u32 mask) { - nh->priority = tc >> 4; - nh->flow_lbl[0] = (nh->flow_lbl[0] & 0x0F) | ((tc & 0x0F) << 4); + /* Bits 21-24 are always unmasked, so this re
[ovs-dev] [PATCH v5 10/13] lib/odp-util: Reduce duplicated code.
Signed-off-by: Jarno Rajahalme --- lib/odp-util.c | 217 1 file changed, 94 insertions(+), 123 deletions(-) diff --git a/lib/odp-util.c b/lib/odp-util.c index 3c398ce..967ed27 100644 --- a/lib/odp-util.c +++ b/lib/odp-util.c @@ -2483,23 +2483,43 @@ odp_flow_from_string(const char *s, const struct simap *port_names, } static uint8_t -ovs_to_odp_frag(uint8_t nw_frag) +ovs_to_odp_frag(uint8_t nw_frag, bool is_mask) { +if (is_mask) { +/* Netlink interface 'enum ovs_frag_type' is an 8-bit enumeration type, + * not a set of flags or bitfields. Hence, if the struct flow nw_frag + * mask, which is a set of bits, has the FLOW_NW_FRAG_ANY as zero, we + * must use a zero mask for the netlink frag field, and all ones mask + * otherwise. */ +return (nw_frag & FLOW_NW_FRAG_ANY) ? UINT8_MAX : 0; +} return !(nw_frag & FLOW_NW_FRAG_ANY) ? OVS_FRAG_TYPE_NONE : nw_frag & FLOW_NW_FRAG_LATER ? OVS_FRAG_TYPE_LATER : OVS_FRAG_TYPE_FIRST; } -/* - * Netlink interface 'enum ovs_frag_type' is an 8-bit enumeration type, not a - * set of flags or bitfields. Hence, if the struct flow nw_frag mask, which is - * a set of bits, has the FLOW_NW_FRAG_ANY as zero, we must use a zero mask for - * the netlink frag field, and all ones mask otherwise. */ -static uint8_t -ovs_to_odp_frag_mask(uint8_t nw_frag_mask) -{ -return (nw_frag_mask & FLOW_NW_FRAG_ANY) ? UINT8_MAX : 0; -} +static void get_ethernet_key(const struct flow *, struct ovs_key_ethernet *); +static void put_ethernet_key(const struct ovs_key_ethernet *, struct flow *); +static void get_ipv4_key(const struct flow *, struct ovs_key_ipv4 *, + bool is_mask); +static void put_ipv4_key(const struct ovs_key_ipv4 *, struct flow *, + bool is_mask); +static void get_ipv6_key(const struct flow *, struct ovs_key_ipv6 *, + bool is_mask); +static void put_ipv6_key(const struct ovs_key_ipv6 *, struct flow *, + bool is_mask); +static void get_arp_key(const struct flow *, struct ovs_key_arp *); +static void put_arp_key(const struct ovs_key_arp *, struct flow *); + +/* These share the same layout. */ +union ovs_key_tp { +struct ovs_key_tcp tcp; +struct ovs_key_udp udp; +struct ovs_key_sctp sctp; +}; + +static void get_tp_key(const struct flow *, union ovs_key_tp *); +static void put_tp_key(const union ovs_key_tp *, struct flow *); static void odp_flow_key_from_flow__(struct ofpbuf *buf, const struct flow *flow, @@ -2531,8 +2551,7 @@ odp_flow_key_from_flow__(struct ofpbuf *buf, const struct flow *flow, eth_key = nl_msg_put_unspec_uninit(buf, OVS_KEY_ATTR_ETHERNET, sizeof *eth_key); -memcpy(eth_key->eth_src, data->dl_src, ETH_ADDR_LEN); -memcpy(eth_key->eth_dst, data->dl_dst, ETH_ADDR_LEN); +get_ethernet_key(data, eth_key); if (flow->vlan_tci != htons(0) || flow->dl_type == htons(ETH_TYPE_VLAN)) { if (export_mask) { @@ -2574,37 +2593,20 @@ odp_flow_key_from_flow__(struct ofpbuf *buf, const struct flow *flow, ipv4_key = nl_msg_put_unspec_uninit(buf, OVS_KEY_ATTR_IPV4, sizeof *ipv4_key); -ipv4_key->ipv4_src = data->nw_src; -ipv4_key->ipv4_dst = data->nw_dst; -ipv4_key->ipv4_proto = data->nw_proto; -ipv4_key->ipv4_tos = data->nw_tos; -ipv4_key->ipv4_ttl = data->nw_ttl; -ipv4_key->ipv4_frag = export_mask ? ovs_to_odp_frag_mask(data->nw_frag) - : ovs_to_odp_frag(data->nw_frag); +get_ipv4_key(data, ipv4_key, export_mask); } else if (flow->dl_type == htons(ETH_TYPE_IPV6)) { struct ovs_key_ipv6 *ipv6_key; ipv6_key = nl_msg_put_unspec_uninit(buf, OVS_KEY_ATTR_IPV6, sizeof *ipv6_key); -memcpy(ipv6_key->ipv6_src, &data->ipv6_src, sizeof ipv6_key->ipv6_src); -memcpy(ipv6_key->ipv6_dst, &data->ipv6_dst, sizeof ipv6_key->ipv6_dst); -ipv6_key->ipv6_label = data->ipv6_label; -ipv6_key->ipv6_proto = data->nw_proto; -ipv6_key->ipv6_tclass = data->nw_tos; -ipv6_key->ipv6_hlimit = data->nw_ttl; -ipv6_key->ipv6_frag = export_mask ? ovs_to_odp_frag_mask(data->nw_frag) - : ovs_to_odp_frag(data->nw_frag); +get_ipv6_key(data, ipv6_key, export_mask); } else if (flow->dl_type == htons(ETH_TYPE_ARP) || flow->dl_type == htons(ETH_TYPE_RARP)) { struct ovs_key_arp *arp_key; -arp_key = nl_msg_put_unspec_zero(buf, OVS_KEY_ATTR_ARP, - sizeof *arp_key); -arp_key->arp_sip = data->nw_src; -arp_key->arp_tip = data->nw_dst; -arp_key->arp_op = htons(data->nw_proto); -memcpy(arp_key-
Re: [ovs-dev] [v8 2/2] datapath: Implement recirc action without recursion
On Fri, Sep 5, 2014 at 2:52 PM, Andy Zhou wrote: > Since kernel stack is limited in size, it is not wise to using > recursive function with large stack frames. > > This patch provides an alternative implementation of recirc action > without using recursion. > > A per CPU fixed sized, 'deferred action FIFO', is used to store either > recirc or sample actions encountered during execution of an action > list. Not executing recirc or sample action in place, but rather execute > them laster as 'deferred actions' avoids recursion. > > Deferred actions are only executed after all other actions has been > executed, including the ones triggered by loopback from the kernel > network stack. > > The size of the private FIFO, currently set to 20, limits the number > of total 'deferred actions' any one packet can accumulate. > > Signed-off-by: Andy Zhou > Thanks for fixing all issue. Acked-by: Pravin B Shelar I have couple of minor comments below. > --- > v8->v7: > Always free skb from execute_recirc() > Dynamically allocating per CPU action fifos > > v6->v7: > Always clone the packet key > > v5->v6: > Remove ovs_ prefix for internal symbols. > Remove actions.h > Rename ovs_exec_actions_count to exec_actions_limit > Rename ovs_process_deferred_packets() to > process_deferred_actions() > > v4->v5: > Reset fifo after processing deferred actions > move private data structures from actions.h to actions.c > remove action_fifo init functions, since default percpu data >will be zero. > --- > datapath/actions.c | 169 > +++- > datapath/datapath.c | 5 ++ > datapath/datapath.h | 3 + > 3 files changed, 162 insertions(+), 15 deletions(-) > > diff --git a/datapath/actions.c b/datapath/actions.c > index 0a22e55..95f1e51 100644 > --- a/datapath/actions.c > +++ b/datapath/actions.c > @@ -40,12 +40,80 @@ > #include "vlan.h" > #include "vport.h" > > +struct deferred_action { > + struct sk_buff *skb; > + const struct nlattr *actions; > + > + /* Store pkt_key clone when creating deferred action. */ > + struct sw_flow_key pkt_key; > +}; > + > +#define DEFERRED_ACTION_FIFO_SIZE 10 > +struct action_fifo { > + int head; > + int tail; > + /* Deferred action fifo queue storage. */ > + struct deferred_action fifo[DEFERRED_ACTION_FIFO_SIZE]; > +}; > + > +static struct action_fifo *action_fifos; > +#define EXEC_ACTIONS_LEVEL_LIMIT 4 /* limit used to detect packet > + looping by the network stack */ > +static DEFINE_PER_CPU(int, exec_actions_level); > + > +static void action_fifo_init(struct action_fifo *fifo) > +{ > + fifo->head = 0; > + fifo->tail = 0; > +} > + > +static bool action_fifo_is_empty(struct action_fifo *fifo) > +{ > + return (fifo->head == fifo->tail); > +} > + > +static struct deferred_action * > +action_fifo_get(struct action_fifo *fifo) > +{ > + if (action_fifo_is_empty(fifo)) > + return NULL; > + > + return &fifo->fifo[fifo->tail++]; > +} > + > +static struct deferred_action * > +action_fifo_put(struct action_fifo *fifo) > +{ > + if (fifo->head >= DEFERRED_ACTION_FIFO_SIZE - 1) > + return NULL; > + > + return &fifo->fifo[fifo->head++]; > +} > + > static void flow_key_clone(struct sk_buff *skb, struct sw_flow_key *new_key) > { > *new_key = *OVS_CB(skb)->pkt_key; > OVS_CB(skb)->pkt_key = new_key; > } > > +/* Return True iff fifo is not full */ Typo. > +static bool add_deferred_actions(struct sk_buff *skb, > +const struct nlattr *attr) > +{ > + struct action_fifo *fifo; > + struct deferred_action *da; > + > + fifo = this_cpu_ptr(action_fifos); > + da = action_fifo_put(fifo); > + if (da) { > + da->skb = skb; > + da->actions = attr; > + flow_key_clone(skb, &da->pkt_key); > + } > + > + return (da != NULL); > +} > + > static void flow_key_set_recirc_id(struct sk_buff *skb, u32 recirc_id) > { > OVS_CB(skb)->pkt_key->recirc_id = recirc_id; > @@ -689,7 +757,6 @@ static bool last_action(const struct nlattr *a, int rem) > static int sample(struct datapath *dp, struct sk_buff *skb, > const struct nlattr *attr) > { > - struct sw_flow_key sample_key; > const struct nlattr *acts_list = NULL; > const struct nlattr *a; > int rem; > @@ -728,10 +795,15 @@ static int sample(struct datapath *dp, struct sk_buff > *skb, > /* Skip the sample action when out of memory. */ > return 0; > > - flow_key_clone(skb, &sample_key); > + if (!add_deferred_actions(skb, a)) { > + if (net_ratelimit()) > + pr_warn("%s: deferred actions limit reached, dropping > sample action\n",
[ovs-dev] [PATCH 2/2] dpif-netdev: Store miniflow length in exact match cache
This optimization is done to avoid calling count_1bits(), which, if the popcnt istruction, is not available might is slow. popcnt may not be available because: - We are running on old hardware - (more likely) We're using a generic build (i.e. packaged OVS from a distro), not tuned for the specific CPU Signed-off-by: Daniele Di Proietto --- This commit improves 1-flow UDP 64-bytes packets test throughput by 6% (compiled without -march=native) --- lib/dpif-netdev.c | 22 ++ 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 112cd5a..ea14838 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -129,6 +129,7 @@ struct netdev_flow_key { struct emc_entry { uint32_t hash; +uint32_t mf_len; struct netdev_flow_key mf; struct dp_netdev_flow *flow; }; @@ -398,6 +399,7 @@ emc_cache_init(struct emc_cache *flow_cache) for (i = 0; i < ARRAY_SIZE(flow_cache->entries); i++) { flow_cache->entries[i].flow = NULL; flow_cache->entries[i].hash = 0; +flow_cache->entries[i].mf_len = 0; miniflow_initialize(&flow_cache->entries[i].mf.flow, flow_cache->entries[i].mf.buf); } @@ -1171,11 +1173,10 @@ netdev_flow_key_size(uint32_t flow_u32s) /* Used to compare 'netdev_flow_key's (miniflows) in the exact match cache. */ static inline bool netdev_flow_key_equal(const struct netdev_flow_key *a, - const struct netdev_flow_key *b) + const struct netdev_flow_key *b, + uint32_t size) { -uint32_t size = count_1bits(a->flow.map); - -return !memcmp(a, b, netdev_flow_key_size(size)); +return !memcmp(a, b, size); } static inline void @@ -1183,7 +1184,7 @@ netdev_flow_key_clone(struct netdev_flow_key *dst, const struct netdev_flow_key *src, uint32_t size) { -memcpy(dst, src, netdev_flow_key_size(size)); +memcpy(dst, src, size); } static inline bool @@ -1217,8 +1218,11 @@ emc_change_entry(struct emc_entry *ce, struct dp_netdev_flow *flow, } } if (mf) { -netdev_flow_key_clone(&ce->mf, mf, count_1bits(mf->flow.map)); +uint32_t mf_len = netdev_flow_key_size(count_1bits(mf->flow.map)); + +netdev_flow_key_clone(&ce->mf, mf, mf_len); ce->hash = hash; +ce->mf_len = mf_len; } } @@ -1232,7 +1236,8 @@ emc_insert(struct emc_cache *cache, const struct miniflow *mf, uint32_t hash, EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, hash) { if (current_entry->hash == hash && netdev_flow_key_equal(¤t_entry->mf, - miniflow_to_netdev_flow_key(mf))) { + miniflow_to_netdev_flow_key(mf), + current_entry->mf_len)) { /* We found the entry with the 'mf' miniflow */ emc_change_entry(current_entry, flow, NULL, 0); @@ -1263,7 +1268,8 @@ emc_lookup(struct emc_cache *cache, const struct miniflow *mf, uint32_t hash) EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, hash) { if (current_entry->hash == hash && emc_entry_alive(current_entry) && netdev_flow_key_equal(¤t_entry->mf, - miniflow_to_netdev_flow_key(mf))) { + miniflow_to_netdev_flow_key(mf), + current_entry->mf_len)) { /* We found the entry with the 'mf' miniflow */ return current_entry->flow; -- 2.1.0.rc1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 1/2] dpif-netdev: Introduce netdev_flow_key_* functions
netdev_flow_key is a miniflow with the following constraints: 1) It is used only inside dpif-netdev.c. 2) It always has inline values. 3) It contains only miniflows created by miniflow_extract(). Therefore, by using these new functions instead of the miniflow_* ones, we get the following (performance related) benefits: - Because of (1) the functions can be inlined. - Because of (2) and (3) the netdev_flow_key can be treated as POD. Specifically, because of (3), we can do comparisons with memcmp, since if the map is different the miniflow must be different. Signed-off-by: Daniele Di Proietto --- lib/dpif-netdev.c | 68 ++- 1 file changed, 62 insertions(+), 6 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 869fb55..112cd5a 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -87,7 +87,7 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex) static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600); -/* Stores a miniflow */ +/* Stores a miniflow with inline values */ /* There are fields in the flow structure that we never use. Therefore we can * save a few words of memory */ @@ -1133,6 +1133,59 @@ static bool dp_netdev_flow_ref(struct dp_netdev_flow *flow) return ovs_refcount_try_ref_rcu(&flow->ref_cnt); } +/* netdev_flow_key utilities. + * + * netdev_flow_key is basically a miniflow. We use these functions + * (netdev_flow_key_clone, netdev_flow_key_equal, ...) instead of the miniflow + * functions (miniflow_clone_inline, miniflow_equal, ...), because: + * + * - Since we are dealing exclusively with miniflows created by + * miniflow_extract(), if the map is different the miniflow is different. + * Therefore we can be faster by comparing the map and the miniflow in a + * single memcmp(). + * _ netdev_flow_key's miniflow has always inline values. + * - These functions can be inlined by the compiler. + * + * The following assertions make sure that what we're doing with miniflow is + * safe + */ +BUILD_ASSERT_DECL(offsetof(struct miniflow, inline_values) + == sizeof(uint64_t)); +BUILD_ASSERT_DECL(offsetof(struct netdev_flow_key, flow) == 0); + +static inline struct netdev_flow_key * +miniflow_to_netdev_flow_key(const struct miniflow *mf) +{ +return (struct netdev_flow_key *) CONST_CAST(struct miniflow *, mf); +} + +/* Given the number of bits set in the miniflow map, returns the size of the + * netdev_flow key */ +static inline uint32_t +netdev_flow_key_size(uint32_t flow_u32s) +{ +return MINIFLOW_VALUES_SIZE(flow_u32s) + + offsetof(struct miniflow, inline_values); +} + +/* Used to compare 'netdev_flow_key's (miniflows) in the exact match cache. */ +static inline bool +netdev_flow_key_equal(const struct netdev_flow_key *a, + const struct netdev_flow_key *b) +{ +uint32_t size = count_1bits(a->flow.map); + +return !memcmp(a, b, netdev_flow_key_size(size)); +} + +static inline void +netdev_flow_key_clone(struct netdev_flow_key *dst, + const struct netdev_flow_key *src, + uint32_t size) +{ +memcpy(dst, src, netdev_flow_key_size(size)); +} + static inline bool emc_entry_alive(struct emc_entry *ce) { @@ -1150,7 +1203,7 @@ emc_clear_entry(struct emc_entry *ce) static inline void emc_change_entry(struct emc_entry *ce, struct dp_netdev_flow *flow, - const struct miniflow *mf, uint32_t hash) + const struct netdev_flow_key *mf, uint32_t hash) { if (ce->flow != flow) { if (ce->flow) { @@ -1164,7 +1217,7 @@ emc_change_entry(struct emc_entry *ce, struct dp_netdev_flow *flow, } } if (mf) { -miniflow_clone_inline(&ce->mf.flow, mf, count_1bits(mf->map)); +netdev_flow_key_clone(&ce->mf, mf, count_1bits(mf->flow.map)); ce->hash = hash; } } @@ -1178,7 +1231,8 @@ emc_insert(struct emc_cache *cache, const struct miniflow *mf, uint32_t hash, EMC_FOR_EACH_POS_WITH_HASH(cache, current_entry, hash) { if (current_entry->hash == hash -&& miniflow_equal(¤t_entry->mf.flow, mf)) { +&& netdev_flow_key_equal(¤t_entry->mf, + miniflow_to_netdev_flow_key(mf))) { /* We found the entry with the 'mf' miniflow */ emc_change_entry(current_entry, flow, NULL, 0); @@ -1197,7 +1251,8 @@ emc_insert(struct emc_cache *cache, const struct miniflow *mf, uint32_t hash, /* We didn't find the miniflow in the cache. * The 'to_be_replaced' entry is where the new flow will be stored */ -emc_change_entry(to_be_replaced, flow, mf, hash); +emc_change_entry(to_be_replaced, flow, miniflow_to_netdev_flow_key(mf), + hash); } static inline struct dp_netdev_flow * @@ -1207,7 +1262,8 @@ emc_lookup(struct emc_cache *cache, const struct miniflow *mf, uint32_t hash) EMC_FOR
Re: [ovs-dev] [v8 2/2] datapath: Implement recirc action without recursion
Thanks. Pushed to master with suggested changes. On Fri, Sep 5, 2014 at 4:23 PM, Pravin Shelar wrote: > On Fri, Sep 5, 2014 at 2:52 PM, Andy Zhou wrote: >> Since kernel stack is limited in size, it is not wise to using >> recursive function with large stack frames. >> >> This patch provides an alternative implementation of recirc action >> without using recursion. >> >> A per CPU fixed sized, 'deferred action FIFO', is used to store either >> recirc or sample actions encountered during execution of an action >> list. Not executing recirc or sample action in place, but rather execute >> them laster as 'deferred actions' avoids recursion. >> >> Deferred actions are only executed after all other actions has been >> executed, including the ones triggered by loopback from the kernel >> network stack. >> >> The size of the private FIFO, currently set to 20, limits the number >> of total 'deferred actions' any one packet can accumulate. >> >> Signed-off-by: Andy Zhou >> > Thanks for fixing all issue. > Acked-by: Pravin B Shelar > > I have couple of minor comments below. > >> --- >> v8->v7: >> Always free skb from execute_recirc() >> Dynamically allocating per CPU action fifos >> >> v6->v7: >> Always clone the packet key >> >> v5->v6: >> Remove ovs_ prefix for internal symbols. >> Remove actions.h >> Rename ovs_exec_actions_count to exec_actions_limit >> Rename ovs_process_deferred_packets() to >> process_deferred_actions() >> >> v4->v5: >> Reset fifo after processing deferred actions >> move private data structures from actions.h to actions.c >> remove action_fifo init functions, since default percpu data >>will be zero. >> --- >> datapath/actions.c | 169 >> +++- >> datapath/datapath.c | 5 ++ >> datapath/datapath.h | 3 + >> 3 files changed, 162 insertions(+), 15 deletions(-) >> >> diff --git a/datapath/actions.c b/datapath/actions.c >> index 0a22e55..95f1e51 100644 >> --- a/datapath/actions.c >> +++ b/datapath/actions.c >> @@ -40,12 +40,80 @@ >> #include "vlan.h" >> #include "vport.h" >> >> +struct deferred_action { >> + struct sk_buff *skb; >> + const struct nlattr *actions; >> + >> + /* Store pkt_key clone when creating deferred action. */ >> + struct sw_flow_key pkt_key; >> +}; >> + >> +#define DEFERRED_ACTION_FIFO_SIZE 10 >> +struct action_fifo { >> + int head; >> + int tail; >> + /* Deferred action fifo queue storage. */ >> + struct deferred_action fifo[DEFERRED_ACTION_FIFO_SIZE]; >> +}; >> + >> +static struct action_fifo *action_fifos; >> +#define EXEC_ACTIONS_LEVEL_LIMIT 4 /* limit used to detect packet >> + looping by the network stack */ >> +static DEFINE_PER_CPU(int, exec_actions_level); >> + >> +static void action_fifo_init(struct action_fifo *fifo) >> +{ >> + fifo->head = 0; >> + fifo->tail = 0; >> +} >> + >> +static bool action_fifo_is_empty(struct action_fifo *fifo) >> +{ >> + return (fifo->head == fifo->tail); >> +} >> + >> +static struct deferred_action * >> +action_fifo_get(struct action_fifo *fifo) >> +{ >> + if (action_fifo_is_empty(fifo)) >> + return NULL; >> + >> + return &fifo->fifo[fifo->tail++]; >> +} >> + >> +static struct deferred_action * >> +action_fifo_put(struct action_fifo *fifo) >> +{ >> + if (fifo->head >= DEFERRED_ACTION_FIFO_SIZE - 1) >> + return NULL; >> + >> + return &fifo->fifo[fifo->head++]; >> +} >> + >> static void flow_key_clone(struct sk_buff *skb, struct sw_flow_key *new_key) >> { >> *new_key = *OVS_CB(skb)->pkt_key; >> OVS_CB(skb)->pkt_key = new_key; >> } >> >> +/* Return True iff fifo is not full */ > Typo. > >> +static bool add_deferred_actions(struct sk_buff *skb, >> +const struct nlattr *attr) >> +{ >> + struct action_fifo *fifo; >> + struct deferred_action *da; >> + >> + fifo = this_cpu_ptr(action_fifos); >> + da = action_fifo_put(fifo); >> + if (da) { >> + da->skb = skb; >> + da->actions = attr; >> + flow_key_clone(skb, &da->pkt_key); >> + } >> + >> + return (da != NULL); >> +} >> + >> static void flow_key_set_recirc_id(struct sk_buff *skb, u32 recirc_id) >> { >> OVS_CB(skb)->pkt_key->recirc_id = recirc_id; >> @@ -689,7 +757,6 @@ static bool last_action(const struct nlattr *a, int rem) >> static int sample(struct datapath *dp, struct sk_buff *skb, >> const struct nlattr *attr) >> { >> - struct sw_flow_key sample_key; >> const struct nlattr *acts_list = NULL; >> const struct nlattr *a; >> int rem; >> @@ -728,10 +795,15 @@ static int sample(struct datapath *dp, struct sk_buff >> *skb, >> /* Skip the sample action when out of memory. */