On 4/29/2019 3:02 PM, Burakov, Anatoly wrote: > On 29-Apr-19 2:53 PM, Ferruh Yigit wrote: >> On 4/25/2019 6:17 PM, Herakliusz Lipiec wrote: >>> When secondary to primary process synchronization occours >>> there is no check for number of fds which could cause buffer overrun. >>> >>> Bugzilla ID: 252 >>> Fixes: c9aa56edec8e ("net/tap: access primary process queues from >>> secondary") >>> Cc: rasl...@mellanox.com >>> Cc: sta...@dpdk.org >>> >>> Signed-off-by: Herakliusz Lipiec <herakliusz.lip...@intel.com> >>> --- >>> drivers/net/tap/rte_eth_tap.c | 13 +++++++++++-- >>> 1 file changed, 11 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c >>> index e9fda8cf6..4a2ef5ce7 100644 >>> --- a/drivers/net/tap/rte_eth_tap.c >>> +++ b/drivers/net/tap/rte_eth_tap.c >>> @@ -2111,6 +2111,10 @@ tap_mp_attach_queues(const char *port_name, struct >>> rte_eth_dev *dev) >>> TAP_LOG(DEBUG, "Received IPC reply for %s", reply_param->port_name); >>> >>> /* Attach the queues from received file descriptors */ >>> + if (reply_param->rxq_count + reply_param->txq_count != reply->num_fds) { >>> + TAP_LOG(ERR, "Unexpected number of fds received"); >>> + return -1; >>> + } >> >> Is there a way this can happen? If not I suggest remove the check. > > Normally no, but theoretically this can trigger a buffer overrun if not > checked. After all, something could either fail on the other side, or > someone could send a fake message :) This data is coming from an > external source, so we need to sanity-check it.
Both sender and receiver are in the same driver, primary and secondary application paths, there is no communication with external source, and I don't see any code path that will cause this failure. After above said, this is just an additional reasonable check and not in the data path, so having this won't hurt, I don't object to have it. > >> >>> dev->data->nb_rx_queues = reply_param->rxq_count; >>> dev->data->nb_tx_queues = reply_param->txq_count; >>> fd_iterator = 0; >>> @@ -2151,12 +2155,16 @@ tap_mp_sync_queues(const struct rte_mp_msg >>> *request, const void *peer) >>> /* Fill file descriptors for all queues */ >>> reply.num_fds = 0; >>> reply_param->rxq_count = 0; >>> + if (dev->data->nb_rx_queues + dev->data->nb_tx_queues > >>> + RTE_MP_MAX_FD_NUM){ >>> + TAP_LOG(ERR, "Number of rx/tx queues exceeds max number of >>> fds"); >>> + return -1; >>> + } >> >> +1 for the check. >> But what it does when return "-1", not send a message at all? If so would it >> be >> better to send and error message back instead of waiting the receiver to >> timeout? > > There will be a different patch fixing this specific issue. Probably > this patch would need to be rebased on top of that. +1 to fix this issue but I assume it won't be for this release, so can get this patch now and this part can be updated with mentioned patch. > >> >>> for (queue = 0; queue < dev->data->nb_rx_queues; queue++) { >>> reply.fds[reply.num_fds++] = process_private->rxq_fds[queue]; >>> reply_param->rxq_count++; >>> } >>> RTE_ASSERT(reply_param->rxq_count == dev->data->nb_rx_queues); >>> - RTE_ASSERT(reply_param->txq_count == dev->data->nb_tx_queues); >>> RTE_ASSERT(reply.num_fds <= RTE_MP_MAX_FD_NUM); >> >> Since there is dynamic check above for "RTE_MP_MAX_FD_NUM", we can remove >> this >> assert I think. >> >>> >>> reply_param->txq_count = 0; >>> @@ -2164,7 +2172,8 @@ tap_mp_sync_queues(const struct rte_mp_msg *request, >>> const void *peer) >>> reply.fds[reply.num_fds++] = process_private->txq_fds[queue]; >>> reply_param->txq_count++; >>> } >>> - >>> + RTE_ASSERT(reply_param->txq_count == dev->data->nb_tx_queues); >>> + RTE_ASSERT(reply.num_fds <= RTE_MP_MAX_FD_NUM); >> >> Same for this assert, we can remove it. >> And as syntax, please keep the empty line before next block. >> >>> /* Send reply */ >>> strlcpy(reply.name, request->name, sizeof(reply.name)); >>> strlcpy(reply_param->port_name, request_param->port_name, >>> >> >> > >