Hello,
while doing robustness tests on OVS with the DPDK datapath, I'm experiencing
vswitchd terminating due to a buffer overrun in DPDK's
fdset_event_dispatch().
The underlying root cause, I'm quite sure, lies in our modified version
of OpenStack or in test cases where some vhostuser ports happen to be left
behind when VMs with more than 16 vNICs are deleted using "nova delete".
Thus, the test currently leads to an accumulation of vhostuser sockets, and
the termination is triggered when more than roughly 500 sockets are open at
the same time (accumulated sockets + sockets for 4 VMs x 24 vNICs/VM).
Neither OVS (2.4.0, with corrections from the 2.4 branch) nor DPDK (2.1.0)
properly protects against a situation where file descriptor numbers exceed
what can be held by the fd_set bit mask, as far as I can determine. The
question, then, is whether this is a bug or whether I'm missing something.
I'm well aware that this is an unlikely scenario, but I do prefer graceful
handling if possible.
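
Just to illustrate the mechanism outside of OVS/DPDK: a process only needs
enough open descriptors for fd numbers to pass FD_SETSIZE, and from that
point on fd_set operations on those descriptors go out of bounds. The
following stand-alone sketch (not taken from either code base) shows how
easily that boundary is reached once the fd limit allows it:

#include <stdio.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/resource.h>

int
main(void)
{
    struct rlimit rl;

    /* Raise the soft fd limit to the hard limit so that descriptor
     * numbers can grow past FD_SETSIZE at all. */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
        rl.rlim_cur = rl.rlim_max;
        setrlimit(RLIMIT_NOFILE, &rl);
    }

    for (;;) {
        int fd = open("/dev/null", O_RDONLY);

        if (fd < 0) {
            perror("open");
            break;
        }
        if (fd >= FD_SETSIZE) {
            /* From here on, FD_SET()/FD_ISSET() on this descriptor
             * would touch memory outside the fd_set bit mask. */
            printf("fd %d >= FD_SETSIZE (%d)\n", fd, FD_SETSIZE);
            break;
        }
    }
    return 0;
}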
1) Should there be a limit to the number of vhostuser ports that can be
created in OVS, or should the limit be dictated by DPDK only?
2) In rte_vhost_driver_register() there seems to be an implicit assumption
that the file descriptor number and the vserver_cnt are related, which,
in general, does not hold. Can this check be improved?
3) Should fdset_event_dispatch() protect against events from file
descriptors greater than or equal to FD_SETSIZE (=__FD_SETSIZE=1024)?
(A small illustration of these sizes follows right after the questions.)
What is your opinion?
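
For reference on the numbers in question 3: with glibc, fd_set is a fixed
1024-bit (128-byte) array, so FD_ISSET(1024, ...) already reads one bit past
its end. A tiny check of this assumption:

#include <stdio.h>
#include <sys/select.h>

int
main(void)
{
    /* With glibc, FD_SETSIZE is 1024 and fd_set is 128 bytes, i.e.
     * exactly 1024 bits; descriptor 1024 has no bit of its own. */
    printf("FD_SETSIZE = %d, sizeof(fd_set) = %zu bytes\n",
           FD_SETSIZE, sizeof(fd_set));
    return 0;
}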
---
In short, what happens is roughly that a vhostuser port is registered in
(OVS, netdev-dpdk.c):
static int
netdev_dpdk_vhost_user_construct(struct netdev *netdev_)
{
    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
    int err;

    ovs_mutex_lock(&dpdk_mutex);

    /* Take the name of the vhost-user port and append it to the location
     * where the socket is to be created, then register the socket.
     */
    snprintf(netdev->vhost_id, sizeof(netdev->vhost_id), "%s/%s",
             vhost_sock_dir, netdev_->name);

    err = rte_vhost_driver_register(netdev->vhost_id);
    ....
}
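
If an OVS-side limit were wanted (question 1 above), this construct function
would be the natural place for it, before the call into DPDK. The counter
and cap below are purely hypothetical names used only to sketch the idea;
nothing like them exists in netdev-dpdk.c today:

    /* Hypothetical sketch: cap vhost-user ports on the OVS side instead
     * of relying only on DPDK's MAX_VHOST_SERVER. 'dpdk_vhost_port_count'
     * and 'OVS_VHOST_MAX_PORTS' are made-up names for illustration. */
    if (dpdk_vhost_port_count >= OVS_VHOST_MAX_PORTS) {
        VLOG_ERR("vhost-user port limit (%d) reached, not creating '%s'",
                 OVS_VHOST_MAX_PORTS, netdev_->name);
        ovs_mutex_unlock(&dpdk_mutex);
        return EBUSY;
    }
    dpdk_vhost_port_count++;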
The call to rte_vhost_driver_register() ends up in (DPDK,
vhost-net-user.c):
int
rte_vhost_driver_register(const char *path)
{
    struct vhost_server *vserver;

    pthread_mutex_lock(&g_vhost_server.server_mutex);

    if (ops == NULL)
        ops = get_virtio_net_callbacks();

    if (g_vhost_server.vserver_cnt == MAX_VHOST_SERVER) {
        RTE_LOG(ERR, VHOST_CONFIG,
            "error: the number of servers reaches maximum\n");
        pthread_mutex_unlock(&g_vhost_server.server_mutex);
        return -1;
    }
    ...
}
Here a check is done against MAX_VHOST_SERVER (=1024), and the number of
vhostuser ports has not yet reached that maximum. But the process can still
have more file descriptors open than it has vhostuser sockets, so the
descriptor numbers themselves can be far larger than vserver_cnt.
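
A stricter check for question 2 would therefore have to look at the
descriptor number itself rather than at vserver_cnt. The fragment below is
only a sketch of that idea, imagined at the point (elided above) where the
listen socket has been created; 'sockfd' is a hypothetical name, not the
variable actually used in vhost-net-user.c:

    /* Sketch only: vserver_cnt says nothing about how large the fd
     * number is, so validate the descriptor itself. */
    if (sockfd >= FD_SETSIZE) {
        RTE_LOG(ERR, VHOST_CONFIG,
            "error: socket fd %d exceeds FD_SETSIZE (%d) and cannot be "
            "handled by the select()-based fdset\n", sockfd, FD_SETSIZE);
        close(sockfd);
        pthread_mutex_unlock(&g_vhost_server.server_mutex);
        return -1;
    }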
Then we come to the actual failure, where an event on a file
descriptor (fd=1024) leads to a memory access outside of the
fd_set bit mask, at (DPDK, fd_man.c, the FD_ISSET line marked by "---->"):
void
fdset_event_dispatch(struct fdset *pfdset)
{
    fd_set rfds, wfds;
    int i, maxfds;
    struct fdentry *pfdentry;
    int num = MAX_FDS;
    fd_cb rcb, wcb;
    void *dat;
    int fd;
    int remove1, remove2;
    int ret;

    if (pfdset == NULL)
        return;

    while (1) {
        struct timeval tv;
        tv.tv_sec = 1;
        tv.tv_usec = 0;
        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        pthread_mutex_lock(&pfdset->fd_mutex);

        maxfds = fdset_fill(&rfds, &wfds, pfdset);

        pthread_mutex_unlock(&pfdset->fd_mutex);

        /*
         * When select is blocked, other threads might unregister
         * listenfds from and register new listenfds into fdset.
         * When select returns, the entries for listenfds in the fdset
         * might have been updated. It is ok if there is unwanted call
         * for new listenfds.
         */
        ret = select(maxfds + 1, &rfds, &wfds, NULL, &tv);
        if (ret <= 0)
            continue;

        for (i = 0; i < num; i++) {
            remove1 = remove2 = 0;
            pthread_mutex_lock(&pfdset->fd_mutex);
            pfdentry = &pfdset->fd[i];
            fd = pfdentry->fd;
            rcb = pfdentry->rcb;
            wcb = pfdentry->wcb;
            dat = pfdentry->dat;
            pfdentry->busy = 1;
            pthread_mutex_unlock(&pfdset->fd_mutex);
---->       if (fd >= 0 && FD_ISSET(fd, &rfds) && rcb)
                rcb(fd, dat, &remove1);
            if (fd >= 0 && FD_ISSET(fd, &wfds) && wcb)
                wcb(fd, dat, &remove2);
            ...
        }
    }
}
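
For question 3, the minimal protection I have in mind is a bound check next
to the existing fd >= 0 test, roughly as sketched below (an illustration of
the idea, not a tested patch). The same bound would presumably also be
needed where the descriptor is added to the fdset, so that select() is never
asked to watch it in the first place:

            /* Sketch: fd_set only covers descriptors 0 .. FD_SETSIZE-1,
             * so skip anything outside that range instead of calling
             * FD_ISSET() on it. */
            if (fd >= 0 && fd < FD_SETSIZE && FD_ISSET(fd, &rfds) && rcb)
                rcb(fd, dat, &remove1);
            if (fd >= 0 && fd < FD_SETSIZE && FD_ISSET(fd, &wfds) && wcb)
                wcb(fd, dat, &remove2);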
Regards,
Patrik