[dpdk-dev] [PATCH 0/4] virtio support for container
Hi Amit,

On 1/13/2016 11:00 PM, Amit Tomer wrote:
> Hello,
>
>> You can use below patch for l2fwd to send out an arp packet when it gets
>> started.
> I tried to send out arp packet using this patch but buffer allocation
> for arp packets itself gets failed:
>
> m = rte_pktmbuf_alloc(mp);
>
> Return a NULL Value.

Can you send out how you start this l2fwd program?

Thanks,
Jianfeng

>
> Thanks,
> Amit.
[dpdk-dev] [PATCH v2 0/6] Support VxLAN & NVGRE checksum off-load on X550
This patch set adds VxLAN & NVGRE checksum off-load support. Both RX and TX
checksum off-load can be used for VxLAN & NVGRE. The VxLAN port can also be
configured; that is implemented in this patch set as well.

Wenzhuo Lu (6):
  lib/librte_ether: change function name of tunnel port config
  i40e: rename the tunnel port config functions
  ixgbe: support UDP tunnel port config
  ixgbe: support VxLAN & NVGRE RX checksum off-load
  ixgbe: support VxLAN & NVGRE TX checksum off-load
  doc: update release note for VxLAN & NVGRE checksum off-load support

 app/test-pmd/cmdline.c                 |  6 ++-
 doc/guides/rel_notes/release_2_3.rst   |  8 +++
 drivers/net/i40e/i40e_ethdev.c         | 22
 drivers/net/ixgbe/ixgbe_ethdev.c       | 95 ++
 drivers/net/ixgbe/ixgbe_rxtx.c         | 63 ++
 drivers/net/ixgbe/ixgbe_rxtx.h         |  6 ++-
 examples/tep_termination/vxlan_setup.c |  2 +-
 lib/librte_ether/rte_ethdev.c          | 45
 lib/librte_ether/rte_ethdev.h          | 18 +++
 lib/librte_mbuf/rte_mbuf.c             |  1 +
 lib/librte_mbuf/rte_mbuf.h             |  3 ++
 11 files changed, 244 insertions(+), 25 deletions(-)

-- 
1.9.3
[dpdk-dev] [PATCH v2 1/6] lib/librte_ether: change function name of tunnel port config
The names of the tunnel port configuration functions are not accurate.
They are tunnel_add/del and are better named tunnel_port_add/del.
Because renaming them directly may be an ABI change, the new functions are
added without removing the old ones. The old ones will be removed in the
next release, after an ABI change announcement.

Signed-off-by: Wenzhuo Lu
---
 app/test-pmd/cmdline.c                 |  6 +++--
 examples/tep_termination/vxlan_setup.c |  2 +-
 lib/librte_ether/rte_ethdev.c          | 45 ++
 lib/librte_ether/rte_ethdev.h          | 18 ++
 4 files changed, 68 insertions(+), 3 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 73298c9..4e71e90 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -6780,9 +6780,11 @@ cmd_tunnel_udp_config_parsed(void *parsed_result,
 		tunnel_udp.prot_type = RTE_TUNNEL_TYPE_VXLAN;
 
 	if (!strcmp(res->what, "add"))
-		ret = rte_eth_dev_udp_tunnel_add(res->port_id, &tunnel_udp);
+		ret = rte_eth_dev_udp_tunnel_port_add(res->port_id,
+						      &tunnel_udp);
 	else
-		ret = rte_eth_dev_udp_tunnel_delete(res->port_id, &tunnel_udp);
+		ret = rte_eth_dev_udp_tunnel_port_delete(res->port_id,
+							 &tunnel_udp);
 
 	if (ret < 0)
 		printf("udp tunneling add error: (%s)\n", strerror(-ret));
diff --git a/examples/tep_termination/vxlan_setup.c b/examples/tep_termination/vxlan_setup.c
index 51ad133..8836603 100644
--- a/examples/tep_termination/vxlan_setup.c
+++ b/examples/tep_termination/vxlan_setup.c
@@ -191,7 +191,7 @@ vxlan_port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 	/* Configure UDP port for UDP tunneling */
 	tunnel_udp.udp_port = udp_port;
 	tunnel_udp.prot_type = RTE_TUNNEL_TYPE_VXLAN;
-	retval = rte_eth_dev_udp_tunnel_add(port, &tunnel_udp);
+	retval = rte_eth_dev_udp_tunnel_port_add(port, &tunnel_udp);
 	if (retval < 0)
 		return retval;
 	rte_eth_macaddr_get(port, &ports_eth_addr[port]);
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index ed971b4..74428f4 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -1987,6 +1987,28 @@ rte_eth_dev_udp_tunnel_add(uint8_t port_id,
 }
 
 int
+rte_eth_dev_udp_tunnel_port_add(uint8_t port_id,
+				struct rte_eth_udp_tunnel *udp_tunnel)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	if (udp_tunnel == NULL) {
+		RTE_PMD_DEBUG_TRACE("Invalid udp_tunnel parameter\n");
+		return -EINVAL;
+	}
+
+	if (udp_tunnel->prot_type >= RTE_TUNNEL_TYPE_MAX) {
+		RTE_PMD_DEBUG_TRACE("Invalid tunnel type\n");
+		return -EINVAL;
+	}
+
+	dev = &rte_eth_devices[port_id];
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->udp_tunnel_port_add, -ENOTSUP);
+	return (*dev->dev_ops->udp_tunnel_port_add)(dev, udp_tunnel);
+}
+
+int
 rte_eth_dev_udp_tunnel_delete(uint8_t port_id,
 			      struct rte_eth_udp_tunnel *udp_tunnel)
 {
@@ -2010,6 +2032,29 @@ rte_eth_dev_udp_tunnel_delete(uint8_t port_id,
 }
 
 int
+rte_eth_dev_udp_tunnel_port_delete(uint8_t port_id,
+				   struct rte_eth_udp_tunnel *udp_tunnel)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (udp_tunnel == NULL) {
+		RTE_PMD_DEBUG_TRACE("Invalid udp_tunnel parameter\n");
+		return -EINVAL;
+	}
+
+	if (udp_tunnel->prot_type >= RTE_TUNNEL_TYPE_MAX) {
+		RTE_PMD_DEBUG_TRACE("Invalid tunnel type\n");
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->udp_tunnel_port_del, -ENOTSUP);
+	return (*dev->dev_ops->udp_tunnel_port_del)(dev, udp_tunnel);
+}
+
+int
 rte_eth_led_on(uint8_t port_id)
 {
 	struct rte_eth_dev *dev;
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index bada8ad..2e064f4 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1261,6 +1261,14 @@ typedef int (*eth_set_eeprom_t)(struct rte_eth_dev *dev,
 				struct rte_dev_eeprom_info *info);
 /**< @internal Program eeprom data */
 
+typedef int (*eth_udp_tunnel_port_add_t)(struct rte_eth_dev *dev,
+					 struct rte_eth_udp_tunnel *tunnel_udp);
+/**< @internal Add tunneling UDP port */
+
+typedef int (*eth_udp_tunnel_port_del_t)(struct rte_eth_dev *dev,
+					 struct rte_eth_udp_tunnel *tunnel_udp);
+/**< @internal Delete tunneling UDP port */
+
 #ifdef RTE_NIC_BYPASS
 
 enum {
@@ -1443
[dpdk-dev] [PATCH v2 2/6] i40e: rename the tunnel port config functions
As the names of the tunnel port config functions are not accurate, change
them from tunnel_add/del to tunnel_port_add/del, and support both the old
and the new rte ops.

Signed-off-by: Wenzhuo Lu
---
 drivers/net/i40e/i40e_ethdev.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index bf6220d..b0335f5 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -369,10 +369,10 @@ static int i40e_dev_rss_hash_update(struct rte_eth_dev *dev,
 				    struct rte_eth_rss_conf *rss_conf);
 static int i40e_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
 				      struct rte_eth_rss_conf *rss_conf);
-static int i40e_dev_udp_tunnel_add(struct rte_eth_dev *dev,
-				struct rte_eth_udp_tunnel *udp_tunnel);
-static int i40e_dev_udp_tunnel_del(struct rte_eth_dev *dev,
-				struct rte_eth_udp_tunnel *udp_tunnel);
+static int i40e_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
+					struct rte_eth_udp_tunnel *udp_tunnel);
+static int i40e_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
+					struct rte_eth_udp_tunnel *udp_tunnel);
 static int i40e_ethertype_filter_set(struct i40e_pf *pf,
 			struct rte_eth_ethertype_filter *filter,
 			bool add);
@@ -467,8 +467,10 @@ static const struct eth_dev_ops i40e_eth_dev_ops = {
 	.reta_query                   = i40e_dev_rss_reta_query,
 	.rss_hash_update              = i40e_dev_rss_hash_update,
 	.rss_hash_conf_get            = i40e_dev_rss_hash_conf_get,
-	.udp_tunnel_add               = i40e_dev_udp_tunnel_add,
-	.udp_tunnel_del               = i40e_dev_udp_tunnel_del,
+	.udp_tunnel_add               = i40e_dev_udp_tunnel_port_add,
+	.udp_tunnel_del               = i40e_dev_udp_tunnel_port_del,
+	.udp_tunnel_port_add          = i40e_dev_udp_tunnel_port_add,
+	.udp_tunnel_port_del          = i40e_dev_udp_tunnel_port_del,
 	.filter_ctrl                  = i40e_dev_filter_ctrl,
 	.rxq_info_get                 = i40e_rxq_info_get,
 	.txq_info_get                 = i40e_txq_info_get,
@@ -5976,8 +5978,8 @@ i40e_del_vxlan_port(struct i40e_pf *pf, uint16_t port)
 
 /* Add UDP tunneling port */
 static int
-i40e_dev_udp_tunnel_add(struct rte_eth_dev *dev,
-			struct rte_eth_udp_tunnel *udp_tunnel)
+i40e_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
+			     struct rte_eth_udp_tunnel *udp_tunnel)
 {
 	int ret = 0;
 	struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
@@ -6007,8 +6009,8 @@ i40e_dev_udp_tunnel_add(struct rte_eth_dev *dev,
 
 /* Remove UDP tunneling port */
 static int
-i40e_dev_udp_tunnel_del(struct rte_eth_dev *dev,
-			struct rte_eth_udp_tunnel *udp_tunnel)
+i40e_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
+			     struct rte_eth_udp_tunnel *udp_tunnel)
 {
 	int ret = 0;
 	struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
-- 
1.9.3
[dpdk-dev] [PATCH v2 4/6] ixgbe: support VxLAN & NVGRE RX checksum off-load
X550 will do VxLAN & NVGRE RX checksum off-load automatically. This patch
exposes the result of the checksum off-load.

Signed-off-by: Wenzhuo Lu
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 11 ++-
 lib/librte_mbuf/rte_mbuf.c     |  1 +
 lib/librte_mbuf/rte_mbuf.h     |  1 +
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 52a263c..512ac3a 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1003,6 +1003,8 @@ rx_desc_status_to_pkt_flags(uint32_t rx_status)
 static inline uint64_t
 rx_desc_error_to_pkt_flags(uint32_t rx_status)
 {
+	uint64_t pkt_flags;
+
 	/*
 	 * Bit 31: IPE, IPv4 checksum error
 	 * Bit 30: L4I, L4I integrity error
@@ -1011,8 +1013,15 @@ rx_desc_error_to_pkt_flags(uint32_t rx_status)
 		0,  PKT_RX_L4_CKSUM_BAD, PKT_RX_IP_CKSUM_BAD,
 		PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD
 	};
-	return error_to_pkt_flags_map[(rx_status >>
+	pkt_flags = error_to_pkt_flags_map[(rx_status >>
 		IXGBE_RXDADV_ERR_CKSUM_BIT) & IXGBE_RXDADV_ERR_CKSUM_MSK];
+
+	if ((rx_status & IXGBE_RXD_STAT_OUTERIPCS) &&
+	    (rx_status & IXGBE_RXDADV_ERR_OUTERIPER)) {
+		pkt_flags |= PKT_RX_OUTER_IP_CKSUM_BAD;
+	}
+
+	return pkt_flags;
 }
 
 /*
diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index c18b438..5d4af39 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -260,6 +260,7 @@ const char *rte_get_rx_ol_flag_name(uint64_t mask)
 	/* case PKT_RX_MAC_ERR: return "PKT_RX_MAC_ERR"; */
 	case PKT_RX_IEEE1588_PTP: return "PKT_RX_IEEE1588_PTP";
 	case PKT_RX_IEEE1588_TMST: return "PKT_RX_IEEE1588_TMST";
+	case PKT_RX_OUTER_IP_CKSUM_BAD: return "PKT_RX_OUTER_IP_CKSUM_BAD";
 	default: return NULL;
 	}
 }
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f234ac9..5ad5e59 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -98,6 +98,7 @@ extern "C" {
 #define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match. */
 #define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
 #define PKT_RX_QINQ_PKT      (1ULL << 15) /**< RX packet with double VLAN stripped. */
+#define PKT_RX_OUTER_IP_CKSUM_BAD (1ULL << 16) /**< Outer IP cksum of RX pkt. is not OK. */
 /* add new RX flags here */
 
 /* add new TX flags here */
-- 
1.9.3
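
Note: the RX-side logic of this patch can be tried in isolation with the
stand-alone sketch below. It is only an illustration, not the driver code.
The descriptor bit positions and the two inner-checksum flag values are
assumptions made for the sketch; only bit 16 for PKT_RX_OUTER_IP_CKSUM_BAD
is taken from the patch itself. The function name error_to_pkt_flags() is
hypothetical.

```c
#include <stdint.h>

/* Inner-checksum flag bits: placeholder values, assumed for this sketch. */
#define PKT_RX_L4_CKSUM_BAD        (1ULL << 3)
#define PKT_RX_IP_CKSUM_BAD        (1ULL << 4)
/* Bit 16 is the value introduced by the patch. */
#define PKT_RX_OUTER_IP_CKSUM_BAD  (1ULL << 16)

/* Descriptor layout: assumed values, standing in for the IXGBE_* macros. */
#define ERR_CKSUM_BIT   30          /* bits 31:30 carry IPE and L4I */
#define ERR_CKSUM_MSK   3
#define STAT_OUTERIPCS  0x00000100u /* outer IP checksum was calculated */
#define ERR_OUTERIPER   0x04000000u /* outer IP checksum error */

/* Map a descriptor status/error word to mbuf-style RX flags. */
static uint64_t error_to_pkt_flags(uint32_t staterr)
{
	static const uint64_t map[4] = {
		0, PKT_RX_L4_CKSUM_BAD, PKT_RX_IP_CKSUM_BAD,
		PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD
	};
	uint64_t flags = map[(staterr >> ERR_CKSUM_BIT) & ERR_CKSUM_MSK];

	/* The outer error bit only counts if the NIC actually checked it. */
	if ((staterr & STAT_OUTERIPCS) && (staterr & ERR_OUTERIPER))
		flags |= PKT_RX_OUTER_IP_CKSUM_BAD;

	return flags;
}
```

An application would then test (flags & PKT_RX_OUTER_IP_CKSUM_BAD) on each
received packet, in the same way it already tests the inner checksum flags.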
[dpdk-dev] [PATCH v2 3/6] ixgbe: support UDP tunnel port config
Add UDP tunnel port add/del support on ixgbe. For now, only VxLAN port
configuration is supported. Although the VxLAN port has a default value of
4789, it can be changed, so VxLAN port configuration is supported to
accommodate that.
Note that the default value of the VxLAN port in ixgbe NICs is 0, so
please set it when using VxLAN off-load.

Signed-off-by: Wenzhuo Lu
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 95
 1 file changed, 95 insertions(+)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 4c4c6df..c04edde 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -337,6 +337,10 @@ static int ixgbe_timesync_read_time(struct rte_eth_dev *dev,
 				   struct timespec *timestamp);
 static int ixgbe_timesync_write_time(struct rte_eth_dev *dev,
 				   const struct timespec *timestamp);
+static int ixgbe_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
+					 struct rte_eth_udp_tunnel *udp_tunnel);
+static int ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
+					 struct rte_eth_udp_tunnel *udp_tunnel);
 
 /*
  * Define VF Stats MACRO for Non "cleared on read" register
@@ -495,6 +499,10 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = {
 	.timesync_adjust_time = ixgbe_timesync_adjust_time,
 	.timesync_read_time   = ixgbe_timesync_read_time,
 	.timesync_write_time  = ixgbe_timesync_write_time,
+	.udp_tunnel_add       = ixgbe_dev_udp_tunnel_port_add,
+	.udp_tunnel_del       = ixgbe_dev_udp_tunnel_port_del,
+	.udp_tunnel_port_add  = ixgbe_dev_udp_tunnel_port_add,
+	.udp_tunnel_port_del  = ixgbe_dev_udp_tunnel_port_del,
 };
 
 /*
@@ -6191,6 +6199,93 @@ ixgbe_dev_get_dcb_info(struct rte_eth_dev *dev,
 	return 0;
 }
 
+#define DEFAULT_VXLAN_PORT 4789
+
+/* on x550, there's only one register for VxLAN UDP port.
+ * So, we cannot add or del the port. We only update it.
+ */
+static int
+ixgbe_update_vxlan_port(struct ixgbe_hw *hw,
+			uint16_t port)
+{
+	IXGBE_WRITE_REG(hw, IXGBE_VXLANCTRL, port);
+	IXGBE_WRITE_FLUSH(hw);
+
+	return 0;
+}
+
+/* Add UDP tunneling port */
+static int
+ixgbe_dev_udp_tunnel_port_add(struct rte_eth_dev *dev,
+			      struct rte_eth_udp_tunnel *udp_tunnel)
+{
+	int ret = 0;
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+	if (hw->mac.type != ixgbe_mac_X550 &&
+	    hw->mac.type != ixgbe_mac_X550EM_x) {
+		return -ENOTSUP;
+	}
+
+	if (udp_tunnel == NULL)
+		return -EINVAL;
+
+	switch (udp_tunnel->prot_type) {
+	case RTE_TUNNEL_TYPE_VXLAN:
+		/* cannot add a port, update the port value */
+		ret = ixgbe_update_vxlan_port(hw, udp_tunnel->udp_port);
+		break;
+
+	case RTE_TUNNEL_TYPE_GENEVE:
+	case RTE_TUNNEL_TYPE_TEREDO:
+		PMD_DRV_LOG(ERR, "Tunnel type is not supported now.");
+		ret = -1;
+		break;
+
+	default:
+		PMD_DRV_LOG(ERR, "Invalid tunnel type");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
+/* Remove UDP tunneling port */
+static int
+ixgbe_dev_udp_tunnel_port_del(struct rte_eth_dev *dev,
+			      struct rte_eth_udp_tunnel *udp_tunnel)
+{
+	int ret = 0;
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+
+	if (hw->mac.type != ixgbe_mac_X550 &&
+	    hw->mac.type != ixgbe_mac_X550EM_x) {
+		return -ENOTSUP;
+	}
+
+	if (udp_tunnel == NULL)
+		return -EINVAL;
+
+	switch (udp_tunnel->prot_type) {
+	case RTE_TUNNEL_TYPE_VXLAN:
+		/* cannot del the port, reset it to default */
+		ret = ixgbe_update_vxlan_port(hw, DEFAULT_VXLAN_PORT);
+		break;
+	case RTE_TUNNEL_TYPE_GENEVE:
+	case RTE_TUNNEL_TYPE_TEREDO:
+		PMD_DRV_LOG(ERR, "Tunnel type is not supported now.");
+		ret = -1;
+		break;
+	default:
+		PMD_DRV_LOG(ERR, "Invalid tunnel type");
+		ret = -1;
+		break;
+	}
+
+	return ret;
+}
+
 static struct rte_driver rte_ixgbe_driver = {
 	.type = PMD_PDEV,
 	.init = rte_ixgbe_pmd_init,
-- 
1.9.3
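
Note: because X550 has a single VxLAN UDP port register, "add" overwrites
and "del" resets. The sketch below models just that behaviour so it can be
reasoned about without hardware. struct fake_hw and both function names
are hypothetical stand-ins, not ixgbe symbols; DEFAULT_VXLAN_PORT is the
value from the patch.

```c
#include <stdint.h>

#define DEFAULT_VXLAN_PORT 4789

/* Hypothetical stand-in for the NIC: one register holds the VxLAN port. */
struct fake_hw {
	uint16_t vxlan_port_reg; /* 0 after power-on, per the commit message */
};

/* "Add" cannot keep several ports; it replaces whatever was configured. */
static int vxlan_port_add(struct fake_hw *hw, uint16_t port)
{
	hw->vxlan_port_reg = port;
	return 0;
}

/* "Del" cannot clear the entry; it rewrites the well-known default. */
static int vxlan_port_del(struct fake_hw *hw)
{
	hw->vxlan_port_reg = DEFAULT_VXLAN_PORT;
	return 0;
}
```

Two consequences follow from this model: a second add silently replaces
the first port, and after a del the register holds 4789 rather than the
power-on value 0.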
[dpdk-dev] [PATCH v2 5/6] ixgbe: support VxLAN & NVGRE TX checksum off-load
This patch adds VxLAN & NVGRE TX checksum off-load. When the outer IP
header checksum off-load flag is set, the context descriptor is set up to
enable this checksum off-load.

Signed-off-by: Wenzhuo Lu
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 52 ++
 drivers/net/ixgbe/ixgbe_rxtx.h |  6 -
 lib/librte_mbuf/rte_mbuf.h     |  2 ++
 3 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 512ac3a..fea2495 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -85,7 +85,8 @@
 		PKT_TX_VLAN_PKT |		 \
 		PKT_TX_IP_CKSUM |		 \
 		PKT_TX_L4_MASK |		 \
-		PKT_TX_TCP_SEG)
+		PKT_TX_TCP_SEG |		 \
+		PKT_TX_OUTER_IP_CKSUM)
 
 static inline struct rte_mbuf *
 rte_rxmbuf_alloc(struct rte_mempool *mp)
@@ -364,9 +365,11 @@ ixgbe_set_xmit_ctx(struct ixgbe_tx_queue *txq,
 	uint32_t ctx_idx;
 	uint32_t vlan_macip_lens;
 	union ixgbe_tx_offload tx_offload_mask;
+	uint32_t seqnum_seed = 0;
 
 	ctx_idx = txq->ctx_curr;
-	tx_offload_mask.data = 0;
+	tx_offload_mask.data[0] = 0;
+	tx_offload_mask.data[1] = 0;
 	type_tucmd_mlhl = 0;
 
 	/* Specify which HW CTX to upload. */
@@ -430,9 +433,20 @@ ixgbe_set_xmit_ctx(struct ixgbe_tx_queue *txq,
 		}
 	}
 
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM) {
+		tx_offload_mask.outer_l3_len |= ~0;
+		tx_offload_mask.outer_l2_len |= ~0;
+		seqnum_seed |= tx_offload.outer_l3_len
+			       << IXGBE_ADVTXD_OUTER_IPLEN;
+		seqnum_seed |= tx_offload.outer_l2_len
+			       << IXGBE_ADVTXD_TUNNEL_LEN;
+	}
+
 	txq->ctx_cache[ctx_idx].flags = ol_flags;
-	txq->ctx_cache[ctx_idx].tx_offload.data =
-		tx_offload_mask.data & tx_offload.data;
+	txq->ctx_cache[ctx_idx].tx_offload.data[0] =
+		tx_offload_mask.data[0] & tx_offload.data[0];
+	txq->ctx_cache[ctx_idx].tx_offload.data[1] =
+		tx_offload_mask.data[1] & tx_offload.data[1];
 	txq->ctx_cache[ctx_idx].tx_offload_mask = tx_offload_mask;
 
 	ctx_txd->type_tucmd_mlhl = rte_cpu_to_le_32(type_tucmd_mlhl);
@@ -441,7 +455,7 @@ ixgbe_set_xmit_ctx(struct ixgbe_tx_queue *txq,
 	vlan_macip_lens |= ((uint32_t)tx_offload.vlan_tci << IXGBE_ADVTXD_VLAN_SHIFT);
 	ctx_txd->vlan_macip_lens = rte_cpu_to_le_32(vlan_macip_lens);
 	ctx_txd->mss_l4len_idx   = rte_cpu_to_le_32(mss_l4len_idx);
-	ctx_txd->seqnum_seed     = 0;
+	ctx_txd->seqnum_seed     = seqnum_seed;
 }
 
 /*
@@ -454,16 +468,24 @@ what_advctx_update(struct ixgbe_tx_queue *txq, uint64_t flags,
 {
 	/* If match with the current used context */
 	if (likely((txq->ctx_cache[txq->ctx_curr].flags == flags) &&
-		(txq->ctx_cache[txq->ctx_curr].tx_offload.data ==
-		(txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data & tx_offload.data)))) {
+		(txq->ctx_cache[txq->ctx_curr].tx_offload.data[0] ==
+		(txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data[0]
+		 & tx_offload.data[0])) &&
+		(txq->ctx_cache[txq->ctx_curr].tx_offload.data[1] ==
+		(txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data[1]
+		 & tx_offload.data[1])))) {
 			return txq->ctx_curr;
 	}
 
 	/* What if match with the next context */
 	txq->ctx_curr ^= 1;
 	if (likely((txq->ctx_cache[txq->ctx_curr].flags == flags) &&
-		(txq->ctx_cache[txq->ctx_curr].tx_offload.data ==
-		(txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data & tx_offload.data)))) {
+		(txq->ctx_cache[txq->ctx_curr].tx_offload.data[0] ==
+		(txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data[0]
+		 & tx_offload.data[0])) &&
+		(txq->ctx_cache[txq->ctx_curr].tx_offload.data[1] ==
+		(txq->ctx_cache[txq->ctx_curr].tx_offload_mask.data[1]
+		 & tx_offload.data[1])))) {
 			return txq->ctx_curr;
 	}
 
@@ -492,6 +514,12 @@ tx_desc_ol_flags_to_cmdtype(uint64_t ol_flags)
 		cmdtype |= IXGBE_ADVTXD_DCMD_VLE;
 	if (ol_flags & PKT_TX_TCP_SEG)
 		cmdtype |= IXGBE_ADVTXD_DCMD_TSE;
+	if (ol_flags & PKT_TX_OUTER_IP_CKSUM)
+		cmdtype |= (1 << IXGBE_ADVTXD_OUTERIPCS_SHIFT);
+	if (ol_flags & PKT_TX_VXLAN_PKT)
+		cmdtype &= ~(1 << IXGBE_ADVTXD_TUNNEL_TYPE_NVGRE);
+	else
+		cmdtype |= (1 << IXGBE_ADVTXD_TUNNEL_TYPE_NVGRE);
 	return cmdtype;
 }
 
@@ -588,8 +616,10 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	uint64_t tx_ol_req;
 	uint32_t ctx = 0;
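
Note: the two-slot context cache lookup that what_advctx_update() performs,
now over two 64-bit words instead of one, can be sketched on its own as
below. This is a simplified model under assumptions, not the driver code:
the struct layouts, the names ctx_matches() and ctx_lookup(), and the
CTX_MISS return value are all hypothetical.

```c
#include <stdint.h>

/* Offload fields viewed as two raw 64-bit words, as in the patch. */
union tx_offload {
	uint64_t data[2];
};

struct ctx_entry {
	uint64_t flags;            /* offload flags the context was built for */
	union tx_offload offload;  /* field values, stored already masked */
	union tx_offload mask;     /* which fields matter for this context */
};

struct tx_queue {
	uint32_t ctx_curr;         /* index of the most recently used slot */
	struct ctx_entry cache[2];
};

#define CTX_MISS 2 /* hypothetical "no cached context matches" value */

/* A slot matches if the flags are equal and every masked word is equal. */
static int ctx_matches(const struct ctx_entry *e, uint64_t flags,
		       const union tx_offload *o)
{
	return e->flags == flags &&
	       e->offload.data[0] == (e->mask.data[0] & o->data[0]) &&
	       e->offload.data[1] == (e->mask.data[1] & o->data[1]);
}

/* Try the current slot, then the other one; report a miss otherwise. */
static uint32_t ctx_lookup(struct tx_queue *q, uint64_t flags,
			   union tx_offload o)
{
	if (ctx_matches(&q->cache[q->ctx_curr], flags, &o))
		return q->ctx_curr;

	q->ctx_curr ^= 1;
	if (ctx_matches(&q->cache[q->ctx_curr], flags, &o))
		return q->ctx_curr;

	return CTX_MISS;
}
```

Comparing both words matters once the outer L2/L3 lengths live in the
second word: two packets with identical inner headers but different outer
header lengths must not share a context descriptor.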
[dpdk-dev] [PATCH v2 6/6] doc: update release note for VxLAN & NVGRE checksum off-load support
Signed-off-by: Wenzhuo Lu
---
 doc/guides/rel_notes/release_2_3.rst | 8
 1 file changed, 8 insertions(+)

diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index 99de186..5dce7fb 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -4,6 +4,14 @@ DPDK Release 2.3
 New Features
 
+* **Added support for VxLAN & NVGRE checksum off-load on X550.**
+
+  * Added support for VxLAN & NVGRE RX/TX checksum off-load on
+    X550. RX/TX checksum off-load is provided on both inner and
+    outer IP header and TCP header.
+  * Added functions to support VxLAN port configuration. The
+    default VxLAN port number is 4789 but this can be updated
+    programmatically.
 
 Resolved Issues
 ---
-- 
1.9.3
[dpdk-dev] Getting error while running DPDK test app on X-Gene1
Could you show what exists in /sys/bus/pci/devices/:01:00.0/ ?

Thanks,
Michael

On 1/13/2016 6:23 PM, Ankit Jindal wrote:
> Hi,
>
> We are trying to run dpdk on our arm64 based SOC having Intel 10G
> ixgbe PCIe card plugged. While running any test app, we are getting
> following error.
>
> EAL: PCI device :01:00.0 on NUMA socket 0
> EAL: probe driver: 8086:10fb rte_ixgbe_pmd
> EAL: Cannot open /sys/bus/pci/devices/:01:00.0/resource0: No such
> file or directory
> EAL: Error - exiting with code: 1
> Cause: Requested device :01:00.0 cannot be used
>
> Below are the details on modules, hugepages and device binding.
> root@arm64:~# lsmod
> Module Size Used by
> rte_kni 292795 0
> igb_uio 4338 0
> ixgbe 184456 0
>
> root@arm64:~/dpdk# cat
> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> 2048
>
> root@arm64:~/dpdk# ./tools/dpdk_nic_bind.py --status
>
> Network devices using DPDK-compatible driver
>
> :01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
> drv=igb_uio unused=
> :01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
> drv=igb_uio unused=
>
> Network devices using kernel driver
> ===
>
>
> Other network devices
> =
>
> root@arm64:~/dpdk#
>
> Thanks,
> Ankit
>
[dpdk-dev] Getting error while running DPDK test app on X-Gene1
On Wed, Jan 13, 2016 at 03:52:01PM +0530, Ankit Jindal wrote:
> Hi,
>
> We are trying to run dpdk on our arm64 based SOC having Intel 10G
> ixgbe PCIe card plugged. While running any test app, we are getting
> following error.
>
> EAL: PCI device :01:00.0 on NUMA socket 0
> EAL: probe driver: 8086:10fb rte_ixgbe_pmd
> EAL: Cannot open /sys/bus/pci/devices/:01:00.0/resource0: No such
> file or directory
> EAL: Error - exiting with code: 1
> Cause: Requested device :01:00.0 cannot be used

The PCI resource creation patch is not yet part of the mainline arm64
kernel. The following patch should fix the problem.
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358906.html

Jerin

>
> Below are the details on modules, hugepages and device binding.
> root@arm64:~# lsmod
> Module Size Used by
> rte_kni 292795 0
> igb_uio 4338 0
> ixgbe 184456 0
>
> root@arm64:~/dpdk# cat
> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> 2048
>
> root@arm64:~/dpdk# ./tools/dpdk_nic_bind.py --status
>
> Network devices using DPDK-compatible driver
>
> :01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
> drv=igb_uio unused=
> :01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
> drv=igb_uio unused=
>
> Network devices using kernel driver
> ===
>
>
> Other network devices
> =
>
> root@arm64:~/dpdk#
>
> Thanks,
> Ankit
[dpdk-dev] [PATCH v2 0/7] virtio 1.0 enabling for virtio pmd driver
On 2016/01/12 15:58, Yuanhan Liu wrote:
> v2: - fix a data corruption reported by Qian, due to hdr size mismatch.
>       check details at patch 5.
>
>     - Add missing config_irq and isr reading support from v1.
>
>     - fix comments from v1.
>
> Almost all difference comes from virtio 1.0 are the PCI layout change:
> the major configuration structures are stored at bar space, and their
> location is stored at corresponding pci cap structure. Reading/parsing
> them is one of the major work of patch 7.
>
> To make handling virtio v1.0 and v0.95 co-exist well, this patch set
> introduces a virtio_pci_ops structure, to add another layer so that
> we could keep those vtpci_foo_bar "APIs". With that, we could do the
> minimum change to add virtio 1.0 support.
>
>
> ---
> Yuanhan Liu (7):
>   virtio: don't set vring address again at queue startup
>   virtio: introduce struct virtio_pci_ops
>   virtio: move left pci stuff to virtio_pci.c
>   viritio: switch to 64 bit features
>   virtio: retrieve hdr_size from hw->vtnet_hdr_size
>   eal: pci: export pci_map_device
>   virtio: add 1.0 support
>
>  doc/guides/rel_notes/release_2_3.rst            |   3 +
>  drivers/net/virtio/virtio_ethdev.c              | 301 +-
>  drivers/net/virtio/virtio_ethdev.h              |   3 +-
>  drivers/net/virtio/virtio_pci.c                 | 768
> +++-
>  drivers/net/virtio/virtio_pci.h                 | 102 +++-
>  drivers/net/virtio/virtio_rxtx.c                |  21 +-
>  drivers/net/virtio/virtqueue.h                  |   4 +-
>  lib/librte_eal/bsdapp/eal/eal_pci.c             |   2 +-
>  lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   6 +
>  lib/librte_eal/common/eal_common_pci.c          |   2 +-
>  lib/librte_eal/common/eal_private.h             |  11 -
>  lib/librte_eal/common/include/rte_pci.h         |  11 +
>  lib/librte_eal/linuxapp/eal/eal_pci.c           |   2 +-
>  lib/librte_eal/linuxapp/eal/rte_eal_version.map |   6 +
>  14 files changed, 899 insertions(+), 343 deletions(-)
>

Hi Yuanhan and Jianfeng,

Thanks for the great patches.
I want to use the VIRTIO-1.0 feature for my virtio container patch, because
it will solve the 44 bit memory address limitation.
(So far, a legacy virtio-net device only accepts queue addresses below
(1 << (32 + 12)).)

I have a few comments about rebasing the virtio container patches onto
these patches.

1. VIRTIO_READ_REG_X

So far, VIRTIO_READ_REG_1/2/4 are defined in virtio_pci.h.
But these macros are only referenced by virtio_pci.c.
How about moving the macros to virtio_pci.c?

2. Abstraction of read/write accesses.

It may be difficult to cleanly rebase my patches onto these patches,
because virtio_read_caps() is not abstracted.
Let me describe it in more detail.
So far, we need to handle the 3 virtio-net devices below:
 - physical virtio-net device.
 - virtual virtio-net device in virtio-net PMD. (Jianfeng's patch)
 - virtual virtio-net device in QEMU. (my patch)

Almost all code of the virtio-net PMD can be shared between the
different cases above.
Probably the big difference is how to access the configuration space.

Yuanhan's patch introduces an abstraction layer to hide the configuration
space layout and how to access it.
Is it possible to separate them?
I guess it would be nice to abstract the "access method" separately from
the "configuration space layout".
Probably the access method will be defined by "eth_dev->dev_type" and the
PMD name like "eth_cvio".
And the "configuration space layout" will be defined by the capability
list of the PCI configuration layout.

For example, if the access methods below are abstracted separately and
the current "virtio_pci.c" is implemented on this abstraction, we can
easily re-use virtio_read_caps().
 - how to read/write virtio configuration space.
 - how to mmap PCI configuration space.
 - how to read/(write) PCI configuration space.

Thanks,
Tetsuya
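
Note: the 44-bit limit mentioned above follows from a 32-bit page frame
number combined with a 12-bit page shift. A minimal sketch of the check,
assuming those two widths and using hypothetical names
(LEGACY_PFN_BITS, LEGACY_PAGE_SHIFT, legacy_queue_addr_ok), could look
like this:

```c
#include <stdint.h>

/* Assumed widths: 32-bit page frame number, 4 KiB pages (shift of 12). */
#define LEGACY_PFN_BITS   32
#define LEGACY_PAGE_SHIFT 12

/*
 * Return 1 if a ring of ring_size bytes starting at addr lies entirely
 * below 1 << (32 + 12), i.e. 16 TiB, so that its page frame number fits
 * in a 32-bit register; return 0 otherwise.
 */
static int legacy_queue_addr_ok(uint64_t addr, uint64_t ring_size)
{
	uint64_t last_byte = addr + ring_size - 1;

	return (last_byte >> (LEGACY_PFN_BITS + LEGACY_PAGE_SHIFT)) == 0;
}
```

A typical user-space mapping address such as 0x7f0000000000 fails this
check, which is why passing virtual addresses to a legacy device needs
either a low mapping or the 64-bit addresses of virtio 1.0.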
[dpdk-dev] [PATCH v2 0/7] virtio 1.0 enabling for virtio pmd driver
On Thu, Jan 14, 2016 at 01:27:37PM +0900, Tetsuya Mukawa wrote:
> On 2016/01/12 15:58, Yuanhan Liu wrote:
> > v2: - fix a data corruption reported by Qian, due to hdr size mismatch.
> >       check details at patch 5.
> >
> >     - Add missing config_irq and isr reading support from v1.
> >
> >     - fix comments from v1.
> >
> > Almost all difference comes from virtio 1.0 are the PCI layout change:
> > the major configuration structures are stored at bar space, and their
> > location is stored at corresponding pci cap structure. Reading/parsing
> > them is one of the major work of patch 7.
> >
> > To make handling virtio v1.0 and v0.95 co-exist well, this patch set
> > introduces a virtio_pci_ops structure, to add another layer so that
> > we could keep those vtpci_foo_bar "APIs". With that, we could do the
> > minimum change to add virtio 1.0 support.
> >
> >
>
> Hi Yuanhan and Jianfeng,
>
> Thanks for great patches.
> I want to use VIRTIO-1.0 feature for my virtio container patch, because
> it will solve 44 bit memory address limitation.
> (So far, legacy virtio-net device only receives queue address under (1
> << (32 + 12)).)
>
> I have a few comments to rebase virtio container patches on this patches.
>
> 1. VIRTIO_READ_REG_X
>
> So far, VIRTIO_READ_REG_1/2/4 are defined in virtio_pci.h.
> But these macros are only referred by virtio_pci.c.
> How about moving the macros to virtio_pci.c?

Jianfeng had the same suggestion. I could do that in the next version.

> 2. Abstraction of read/write accesses.
>
> It may be difficult to cleanly rebase my patches on this patches,
> because virtio_read_caps() is not abstracted.

I don't think we can, or need to, abstract virtio_read_caps() here, as it
detects whether the device is a legacy or a modern (virtio 1.0) virtio
device. If virtio_read_caps() fails, either because the PCI mapping failed
or because of a malformed PCI layout, we fall back to legacy (virtio 0.95)
handling, using I/O port read/write to do the configuration.

> Let me describe it more.
> So far, we need to handle below 3 virtio-net devices..
> - physical virtio-net device.
> - virtual virtio-net device in virtio-net PMD. (Jianfeng's patch)
> - virtual virtio-net device in QEMU. (my patch)
>
> Almost all code of the virtio-net PMD can be shared between above
> different cases.
> Probably big difference is how to access to configuration space.
>
> Yuanhan's patch introduces an abstraction layer to hide configuration
> space layout and how to access it.

Actually, I didn't introduce an abstraction for PCI device access.
It's just a simple "if ... else ..." case here: for legacy virtio, use
I/O port read/write (the VIRTIO_READ/WRITE_REG_X macros); for modern
virtio, use direct mapped memory read/write access (modern_read/writex).

> Is it possible to separate?

As stated, there is no mix, therefore no need for separation. But you
could add another access abstraction layer, and assign it properly later,
say, by checking eth_dev->dev_type as you suggested below.

> I guess "access method" will be nice to be abstracted separately from
> "configuration space layout".
> Probably access method will be defined by "eth_dev->dev_type" and the
> PMD name like "eth_cvio".
> And "configuration space layout" will be defined by capability list of
> PCI configuration layout.
>
> For example, if access method like below are abstracted separately and
> current "virtio_pci.c" is implemented on this abstraction, we can easily
> re-use virtio_read_caps().
> - how to read/write virtio configuration space.

It's abstracted by virtio_pci_ops.

> - how to mmap PCI configuration space.
> - how to read/(write) PCI configuration space.

For now, that is actually done by EAL, or by functions provided by EAL.

I haven't read your (or Jianfeng's) code yet, but it seems that you need
to implement another set of functions for the above needs of your virtio
device. If so, I'd suggest that you (or Jianfeng) do the abstraction on
top of my patchset: what I am fairly sure of is that I should not add
that abstraction here, simply because it has nothing to do with enabling
virtio 1.0.

	--yliu
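
Note: the selection described in this reply (try the modern layout first,
fall back to legacy when capability parsing fails, then go through an ops
table) can be sketched as below. This is a simplified model under
assumptions, not the virtio PMD code: struct pci_ops, struct hw,
read_caps() and pci_init() are hypothetical stand-ins for
virtio_pci_ops, the hw struct, virtio_read_caps() and the init path.

```c
#include <stddef.h>
#include <stdint.h>

struct hw;

/* Hypothetical ops table: one entry per way of reaching device config. */
struct pci_ops {
	const char *name;
	uint8_t (*read8)(struct hw *hw, size_t off);
};

struct hw {
	const struct pci_ops *ops;
	uint8_t *mapped_cfg;       /* non-NULL once modern caps are mapped */
	uint8_t ioport_space[64];  /* stand-in for legacy I/O port registers */
};

/* Legacy: register access through (simulated) I/O ports. */
static uint8_t legacy_read8(struct hw *hw, size_t off)
{
	return hw->ioport_space[off];
}

/* Modern: plain loads from the memory-mapped configuration structure. */
static uint8_t modern_read8(struct hw *hw, size_t off)
{
	return hw->mapped_cfg[off];
}

static const struct pci_ops legacy_ops = { "legacy", legacy_read8 };
static const struct pci_ops modern_ops = { "modern", modern_read8 };

/* Stand-in for capability parsing: 0 on success, negative on failure. */
static int read_caps(struct hw *hw)
{
	return hw->mapped_cfg != NULL ? 0 : -1;
}

/* Pick the ops once at init; callers never branch on the device type. */
static void pci_init(struct hw *hw)
{
	hw->ops = (read_caps(hw) == 0) ? &modern_ops : &legacy_ops;
}
```

A further access layer for virtual devices, as suggested in the reply,
would add more ops instances selected at the same point, without touching
the callers.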
[dpdk-dev] VFIO no-iommu
On Wed, Dec 16, 2015 at 12:38 PM, Alex Williamson wrote:
>
> So it works. Is it acceptable? Useful? Sufficiently complete? Does
> it imply deprecating the uio interface? I believe the feature that
> started this discussion was support for MSI/X interrupts so that VFs
> can support some kind of interrupt (uio only supports INTx since it
> doesn't allow DMA). Implementing that would be the ultimate test of
> whether this provides dpdk with not only a more consistent interface,
> but the feature dpdk wants that's missing in uio. Thanks,
>

Hi Alex,

Sorry for jumping in. Just being curious: how does VFIO No-IOMMU mode
support DMA from userspace drivers? If I understand correctly, due to the
absence of an IOMMU, the pcidev has to use a physaddr to start a DMA
transaction, but how is it supposed to get the physaddr from userspace
drivers: /proc//pagemap or something else?

--
Thanks,
Jike
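
Note: the thread excerpt does not answer this question. Purely as an
illustration of the pagemap option the question mentions, the sketch below
shows how a Linux pagemap entry is decoded into a physical address. It
assumes the documented entry format (bit 63 = page present, bits 0-54 =
page frame number); the function names are hypothetical. Recent kernels
hide the PFN field from unprivileged readers, and the page must stay
pinned for the result to remain valid for DMA.

```c
#include <stdint.h>

#define PAGEMAP_PRESENT  (1ULL << 63)        /* page is present in RAM */
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: frame number */

/* Byte offset, within the pagemap file, of the entry describing vaddr. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	return (vaddr / page_size) * sizeof(uint64_t);
}

/*
 * Turn one 64-bit pagemap entry into a physical address for vaddr.
 * Returns UINT64_MAX when the page is not present.
 */
static uint64_t pagemap_entry_to_phys(uint64_t entry, uint64_t vaddr,
				      uint64_t page_size)
{
	if (!(entry & PAGEMAP_PRESENT))
		return UINT64_MAX;

	return (entry & PAGEMAP_PFN_MASK) * page_size + (vaddr % page_size);
}
```

A caller would open the pagemap file of the process, read 8 bytes at
pagemap_offset(vaddr, page_size) and pass them to pagemap_entry_to_phys().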
[dpdk-dev] [PATCH v2 0/7] virtio 1.0 enabling for virtio pmd driver
Hi Tetsuya,

On 1/14/2016 12:27 PM, Tetsuya Mukawa wrote:
> On 2016/01/12 15:58, Yuanhan Liu wrote:
> Hi Yuanhan and Jianfeng,
>
> Thanks for great patches.
> I want to use VIRTIO-1.0 feature for my virtio container patch, because
> it will solve 44 bit memory address limitation.
> (So far, legacy virtio-net device only receives queue address under (1
> << (32 + 12)).)

I suppose you are referring to the code below:

    /*
     * Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
     * and only accepts 32 bit page frame number.
     * Check if the allocated physical memory exceeds 16TB.
     */
    if ((mz->phys_addr + vq->vq_ring_size - 1) >>
            (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
        PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
        rte_free(vq);
        return -ENOMEM;
    }

So you don't need to add an extra cmd option, right?

>
> I have a few comments to rebase virtio container patches on this patches.
>
> 1. VIRTIO_READ_REG_X
>
> So far, VIRTIO_READ_REG_1/2/4 are defined in virtio_pci.h.
> But these macros are only referred by virtio_pci.c.
> How about moving the macros to virtio_pci.c?

+1 for this.

>
> 2. Abstraction of read/write accesses.
>
> It may be difficult to cleanly rebase my patches on this patches,
> because virtio_read_caps() is not abstracted.
> Let me describe it more.
> So far, we need to handle below 3 virtio-net devices..
> - physical virtio-net device.
> - virtual virtio-net device in virtio-net PMD. (Jianfeng's patch)
> - virtual virtio-net device in QEMU. (my patch)
>
> Almost all code of the virtio-net PMD can be shared between above
> different cases.
> Probably big difference is how to access to configuration space.
>
> Yuanhan's patch introduces an abstraction layer to hide configuration
> space layout and how to access it.
> Is it possible to separate?
> I guess "access method" will be nice to be abstracted separately from
> "configuration space layout".
> Probably access method will be defined by "eth_dev->dev_type" and the
> PMD name like "eth_cvio".
> And "configuration space layout" will be defined by capability list of
> PCI configuration layout.
>
> For example, if access method like below are abstracted separately and
> current "virtio_pci.c" is implemented on this abstraction, we can easily
> re-use virtio_read_caps().
> - how to read/write virtio configuration space.
> - how to mmap PCI configuration space.
> - how to read/(write) PCI configuration space.

I basically agree with you. We have two dimensions here:

                                    legacy            modern
 physical virtio device:            Use virtio_read_caps_phys() to distinguish
 virtual virtio device (Tetsuya):   Use virtio_read_caps_virt() to distinguish
 virtual virtio device (Jianfeng):  does not need a "configuration space
                                    layout", no need to distinguish

So in vtpci_init(), we need to test "eth_dev->dev_type" first:

vtpci_init() {
    if (eth_dev->dev_type == RTE_ETH_DEV_PCI) {
        if (virtio_read_caps_phys()) {
            // modern
        } else {
            // legacy
        }
    } else {
        if (Tetsuya's way) {
            if (virtio_read_caps_virt()) {
                // modern
            } else {
                // legacy
            }
        } else {
            // Jianfeng's way
        }
    }
}

And from Yuanhan's angle, I think he does not need to address this problem.
What do you think?

Thanks,
Jianfeng

>
> Thanks,
> Tetsuya
[dpdk-dev] [PATCH v2 0/7] virtio 1.0 enabling for virtio pmd driver
On 2016/01/14 15:09, Tan, Jianfeng wrote:
>
> Hi Tetsuya,
>
> On 1/14/2016 12:27 PM, Tetsuya Mukawa wrote:
>> On 2016/01/12 15:58, Yuanhan Liu wrote:
>>
>> Hi Yuanhan and Jianfeng,
>>
>> Thanks for great patches.
>> I want to use VIRTIO-1.0 feature for my virtio container patch, because
>> it will solve 44 bit memory address limitation.
>> (So far, legacy virtio-net device only receives queue address under (1
>> << (32 + 12)).)
>
> I suppose you are specifying the code below:
>
>     /*
>      * Virtio PCI device VIRTIO_PCI_QUEUE_PF register is 32bit,
>      * and only accepts 32 bit page frame number.
>      * Check if the allocated physical memory exceeds 16TB.
>      */
>     if ((mz->phys_addr + vq->vq_ring_size - 1) >>
>             (VIRTIO_PCI_QUEUE_ADDR_SHIFT + 32)) {
>         PMD_INIT_LOG(ERR, "vring address shouldn't be above 16TB!");
>         rte_free(vq);
>         return -ENOMEM;
>     }
>
> So you don't need to add extra cmd option, right?

Yes, this is the code.
In our case, instead of using physical address, virtual address will be
used, right?

Problem is that virtual address will be over (1 << 44) without
specifying some mmap options.
Probably we need to specify "MAP_FIXED" option while mmapping EAL memory
to get lower address.

But if we can use VIRTIO-1.0, we can specify 64bit address, then we
don't need to do something tricky mmaping.

>
>>
>> I have a few comments to rebase virtio container patches on this
>> patches.
>>
>> 1. VIRTIO_READ_REG_X
>>
>> So far, VIRTIO_READ_REG_1/2/4 are defined in virtio_pci.h.
>> But these macros are only referred by virtio_pci.c.
>> How about moving the macros to virtio_pci.c?
>
> +1 for this.
>
>>
>> 2. Abstraction of read/write accesses.
>>
>> It may be difficult to cleanly rebase my patches on this patches,
>> because virtio_read_caps() is not abstracted.
>> Let me describe it more.
>> So far, we need to handle below 3 virtio-net devices..
>> - physical virtio-net device.
>> - virtual virtio-net device in virtio-net PMD. (Jianfeng's patch)
>> - virtual virtio-net device in QEMU. (my patch)
>>
>> Almost all code of the virtio-net PMD can be shared between above
>> different cases.
>> Probably big difference is how to access to configuration space.
>>
>> Yuanhan's patch introduces an abstraction layer to hide configuration
>> space layout and how to access it.
>> Is it possible to separate?
>> I guess "access method" will be nice to be abstracted separately from
>> "configuration space layout".
>> Probably access method will be defined by "eth_dev->dev_type" and the
>> PMD name like "eth_cvio".
>> And "configuration space layout" will be defined by capability list of
>> PCI configuration layout.
>>
>> For example, if access method like below are abstracted separately and
>> current "virtio_pci.c" is implemented on this abstraction, we can easily
>> re-use virtio_read_caps().
>> - how to read/write virtio configuration space.
>> - how to mmap PCI configuration space.
>> - how to read/(write) PCI configuration space.
>
>
> I basically agree with you. We have two dimensions here:
>
>                                      legacy    modern
> physical virtio device:              Use virtio_read_caps_phys() to distinguish
> virtual virtio device (Tetsuya):     Use virtio_read_caps_virt() to distinguish
> virtual virtio device (Jianfeng):    does not need a "configuration
>                                      space layout", no need to distinguish
>
> So in vtpci_init(), we needs to test "eth_dev->dev_type" firstly
>
>     vtpci_init() {
>         if (eth_dev->dev_type == RTE_ETH_DEV_PCI) {
>             if (virtio_read_caps_phys()) {
>                 // modern
>             } else {
>                 // legacy
>             }
>         } else {
>             if (Tetsuya's way) {
>                 if (virtio_read_caps_virt()) {
>                     // modern
>                 } else {
>                     // legacy
>                 }
>             } else {
>                 // Jianfeng's way
>             }
>         }
>     }
>
> And from Yuanhan's angle, I think he does not need to address this
> problem. How do you think?

Yes, I agree he doesn't need.

Firstly, I have implemented like above, then I noticed that
virtio_read_caps_phy() and virtio_read_caps_virt() are same except for
access method.
Anyway, I guess abstracting access method is not so difficult.
If you are OK, I want to send RFC on Yuanhan's patch. Is it OK?

Thanks,
Tetsuya
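The 16TB (1 << 44) limit discussed in this message comes from the legacy queue address register holding only a 32-bit page frame number with a 12-bit shift. A minimal sketch of that arithmetic follows, assuming the shift is 12 as implied by "(1 << (32 + 12))"; the macro name QUEUE_ADDR_SHIFT and the helper legacy_ring_addr_ok() are hypothetical and not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror VIRTIO_PCI_QUEUE_ADDR_SHIFT (4 KiB pages). */
#define QUEUE_ADDR_SHIFT 12

/*
 * Hypothetical helper: true when the last byte of the ring still has a
 * page frame number that fits in a 32-bit register, i.e. the whole ring
 * lies below 1 << (32 + 12) = 16TB. ring_size must be non-zero.
 */
static bool
legacy_ring_addr_ok(uint64_t ring_addr, uint64_t ring_size)
{
	uint64_t last = ring_addr + ring_size - 1;

	return (last >> (QUEUE_ADDR_SHIFT + 32)) == 0;
}
```

A typical user-space virtual address such as 0x7f0000000000 fails this check, which is why using virtual addresses with a legacy device would need mmap tricks, while a 64-bit queue address avoids the problem.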
[dpdk-dev] [PATCH v2 0/7] virtio 1.0 enabling for virtio pmd driver
On Thu, Jan 14, 2016 at 02:09:18PM +0800, Tan, Jianfeng wrote:
...
> I basically agree with you. We have two dimensions here:
>
> legacy modern
> physical virtio device: Use virtio_read_caps_phys() to
> distinguish
> virtual virtio device (Tetsuya): Use virtio_read_caps_virt() to
> distinguish
> virtual virtio device (Jianfeng):does not need a "configuration space
> layout", no need to distinguish

I guess you meant to build a form or something, but seems you failed :)

>
> So in vtpci_init(), we needs to test "eth_dev->dev_type" firstly
>
>     vtpci_init() {
>         if (eth_dev->dev_type == RTE_ETH_DEV_PCI) {
>             if (virtio_read_caps_phys()) {
>                 // modern
>             } else {
>                 // legacy
>             }
>         } else {
>             if (Tetsuya's way) {
>                 if (virtio_read_caps_virt()) {
>                     // modern
>                 } else {
>                     // legacy
>                 }
>             } else {
>                 // Jianfeng's way
>             }
>         }
>     }

Normally, I'd like to hide the details inside virtio_read_caps(): I don't
want similar codes to be appeared twice.

And if it can be simply done by "if (eth_dev->dev_type == ...)", I'd like
to do it in this way. If not, introducing another set of operation
abstractions as suggested in my another email might be a better option.

> And from Yuanhan's angle, I think he does not need to address this problem.

Yep; it just has nothing to do with this patch set.

	--yliu
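One way to read the "access method" versus "configuration space layout" split discussed in this thread is sketched below. It is only an illustration under stated assumptions: struct cfg_access_ops, count_vendor_caps() and the memory-backed back-end are hypothetical names, not code from any of the patches. The only facts relied on are the standard PCI ones: the first capability pointer lives at offset 0x34, each capability starts with an ID byte followed by a next pointer, and vendor-specific capabilities use ID 0x09.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PCI_CAPABILITY_LIST 0x34 /* offset of the first capability pointer */
#define PCI_CAP_ID_VNDR     0x09 /* vendor-specific capability ID */

/* Hypothetical access-method table: one instance per device flavour. */
struct cfg_access_ops {
	int (*read)(void *ctx, void *buf, size_t len, size_t off);
};

/*
 * Layout parsing shared by every access method: walk the capability
 * list and count vendor-specific entries. Simplified: it does not mask
 * reserved pointer bits and only guards against loops with a counter.
 */
static int
count_vendor_caps(const struct cfg_access_ops *ops, void *ctx)
{
	uint8_t pos, hdr[2];
	int n = 0, guard = 48;

	if (ops->read(ctx, &pos, 1, PCI_CAPABILITY_LIST) < 0)
		return -1;

	while (pos != 0 && guard-- > 0) {
		if (ops->read(ctx, hdr, 2, pos) < 0)
			return -1;
		if (hdr[0] == PCI_CAP_ID_VNDR)
			n++;
		pos = hdr[1]; /* next capability, 0 terminates the list */
	}
	return n;
}

/* One possible back-end: a config space emulated in a 256-byte buffer. */
static int
mem_cfg_read(void *ctx, void *buf, size_t len, size_t off)
{
	const uint8_t *space = ctx;

	if (off + len > 256)
		return -1;
	memcpy(buf, space + off, len);
	return 0;
}

static const struct cfg_access_ops mem_cfg_ops = { .read = mem_cfg_read };
```

With this shape, a physical device and a virtual one would each supply their own read callback, and the capability-walking code would exist only once.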
[dpdk-dev] librte_power w/ intel_pstate cpufreq governor
On Tue, Jan 12, 2016 at 03:17:21PM +, Zhang, Helin wrote:
> Hi Matthew
>
> Yes, you have indicated out the key, the power management module has changed
> or upgraded.
> Could you help to try the legacy one to see if it still works, as indicated
> in your link?

I can do this, but according to the documents I am reading, the old Power
Management module is secretly stubbed out / no-opped inside of the Skylake CPU
core, and the core manages its own clockrate internally every 1 msec instead
of every 30 msec with input from the OS (Intel Speed Shift technology).

If this is true, then I suspect there is no point to getting it to work again
with either the old frequency driver or the new driver, because the chip would
not listen to it. So then it seems like it makes sense to skip the clock
adjustment callbacks on Skylake and take extra stuff out of the fastpath code.

> Taking control of the governor from kernel to user space, might need one
> more checks before that. But it is actually not a big issue, as user can
> switch it back to anything via 'echo'.

I think it's a bit bigger issue, as it leaves the chip in full-power mode
without really warning anybody, instead of the standard default adaptive mode.

> Yes, it seems that librte_power is out of date for a while. It is not easy
> to track all the kernel versions. Now we have good chance to do that, as you
> have reported issues. Let's have a look on the new power management
> mechanism and then see if we can do something.

Yes, let me know how I could help. I don't know very much yet. My machine is
Skylake Core i7-6700k. Unfortunately I think I am in trouble here, because
there is no whitepaper on the Intel website for Intel Speed Shift technology
at all.

> Really thanks to your questions!

I am looking forward to getting some answers figured out together.

> Regards,
> Helin

Matthew.
[dpdk-dev] librte_power w/ intel_pstate cpufreq governor
On Thu, Jan 14, 2016 at 02:03:55AM -0500, Matthew Hall wrote:
> Yes, let me know how I could help. I don't know very much yet. My machine is
> Skylake Core i7-6700k. Unfortunately I think I am in trouble here, because
> there is no whitepaper on the Intel website for Intel Speed Shift technology
> at all.

This is the closest thing I could find:

http://wccftech.com/idf15-intel-skylake-analysis-cpu-gpu-microarchitecture-ddr4-memory-impact/4/

Some copy of a presentation from Intel IDF15.

Can somebody at Intel help me to find more papers or the right instruction or
architecture manuals for HWP (Hardware P-State) feature?

Matthew.
[dpdk-dev] librte_power w/ intel_pstate cpufreq governor
> -----Original Message-----
> From: Matthew Hall [mailto:mhall at mhcomputing.net]
> Sent: Thursday, January 14, 2016 3:04 PM
> To: Zhang, Helin
> Cc: dev at dpdk.org; Liang, Cunming; Zhou, Danny
> Subject: Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor
>
> On Tue, Jan 12, 2016 at 03:17:21PM +, Zhang, Helin wrote:
> > Hi Matthew
> >
> > Yes, you have indicated out the key, the power management module has
> > changed or upgraded.
> > Could you help to try the legacy one to see if it still works, as
> > indicated in your link?
>
> I can do this, but according to the documents I am reading, the old Power
> Management module is secretly stubbed out / no-opped inside of the
> Skylake CPU core, and the core manages its own clockrate internally every 1
> msec instead of every 30 msec with input from the OS (Intel Speed Shift
> technology).
>
> If this is true, then I suspect there is no point to getting it to work
> again with either the old frequency driver or the new driver, because the
> chip would not listen to it. So then it seems like it makes sense to skip
> the clock adjustment callbacks on Skylake and take extra stuff out of the
> fastpath code.

That's disappointing if Skylake is like that. Let's learn more about it
first, and then check if we can fix that.
But in addition, DPDK provides an interrupt based packet receiving
mechanism; could that be one of your choices?
For now, I am afraid that I don't have time for it, as we are all focusing
on the next release development. If no objection, I will find time later
(maybe in a month) to investigate that. Of course, please try to
investigate that from your side.

>
> > Taking control of the governor from kernel to user space, might need
> > one more checks before that. But it is actually not a big issue, as
> > user can switch it back to anything via 'echo'.
>
> I think it's a bit bigger issue, as it leaves the chip in full-power mode
> without really warning anybody, instead of the standard default adaptive
> mode.

That's always there; for example, DPDK can exit accidentally, without
restoring anything. Then you can have the similar issue again.

>
> > Yes, it seems that librte_power is out of date for a while. It is not
> > easy to track all the kernel versions. Now we have good chance to do
> > that, as you have reported issues. Let's have a look on the new power
> > management mechanism and then see if we can do something.
>
> Yes, let me know how I could help. I don't know very much yet. My machine
> is Skylake Core i7-6700k. Unfortunately I think I am in trouble here,
> because there is no whitepaper on the Intel website for Intel Speed Shift
> technology at all.

It seems that you are so important for Intel. :)
I don't have Skylake in hand. :(
Anyway, I will try to find time for that, and hopefully will find something
or a solution.
Thank you very much for the great job!

Regards,
Helin

>
> > Really thanks to your questions!
>
> I am looking forward to getting some answers figured out together.
>
> > Regards,
> > Helin
>
> Matthew.
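The "one more check" before taking over the governor, mentioned in this thread, could look roughly like the sketch below. It is not librte_power code: is_intel_pstate() and lcore_uses_intel_pstate() are hypothetical helpers, and the sketch assumes the usual Linux cpufreq sysfs file scaling_driver is present for the CPU in question.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if name (optionally newline-terminated) is "intel_pstate". */
static int
is_intel_pstate(const char *name)
{
	size_t len = strcspn(name, "\n");

	return len == strlen("intel_pstate") &&
	       strncmp(name, "intel_pstate", len) == 0;
}

/*
 * Hypothetical pre-check: 1 if the CPU is driven by intel_pstate,
 * 0 if another driver is in use, -1 if sysfs could not be read.
 */
static int
lcore_uses_intel_pstate(unsigned int cpu)
{
	char path[128], buf[64];
	FILE *f;
	int ret;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_driver", cpu);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;

	ret = fgets(buf, sizeof(buf), f) != NULL ? is_intel_pstate(buf) : -1;
	fclose(f);
	return ret;
}
```

A caller could skip the governor switch, or at least log a warning, when this returns 1, rather than silently leaving the chip in a non-default mode.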
[dpdk-dev] [PATCH v3 1/8] virtio: don't set vring address again at queue startup
As we have already set up it at virtio_dev_queue_setup(), and a vq
restart will not reset the settings.

Signed-off-by: Yuanhan Liu
---
 drivers/net/virtio/virtio_rxtx.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 74b39ef..b7267c0 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -339,11 +339,6 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 		vq_update_avail_idx(vq);
 
 		PMD_INIT_LOG(DEBUG, "Allocated %d bufs", nbufs);
-
-		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-			vq->vq_queue_index);
-		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
 	} else if (queue_type == VTNET_TQ) {
 		if (use_simple_rxtx) {
 			int mid_idx = vq->vq_nentries >> 1;
@@ -362,16 +357,6 @@ virtio_dev_vring_start(struct virtqueue *vq, int queue_type)
 			for (i = mid_idx; i < vq->vq_nentries; i++)
 				vq->vq_ring.avail->ring[i] = i;
 		}
-
-		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-			vq->vq_queue_index);
-		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
-	} else {
-		VIRTIO_WRITE_REG_2(vq->hw, VIRTIO_PCI_QUEUE_SEL,
-			vq->vq_queue_index);
-		VIRTIO_WRITE_REG_4(vq->hw, VIRTIO_PCI_QUEUE_PFN,
-			vq->mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT);
 	}
 }
 
-- 
1.9.0
[dpdk-dev] [PATCH v3 0/8] virtio 1.0 enabling for virtio pmd driver
v3: - export pci_unmap_device as well; and invoke it at virtio uninit stage.
    - fixed same data corruption bug reported by Qian in simple rxtx code path.
    - move VIRTIO_READ/WRITE_REG_X to virtio_pci.c

v2: - fix a data corruption reported by Qian, due to hdr size mismatch.
      check details at patch 5.
    - Add missing config_irq and isr reading support from v1.
    - fix comments from v1.

Almost all differences that come with virtio 1.0 are in the PCI layout:
the major configuration structures are stored in BAR space, and their
locations are stored in the corresponding PCI capability structures.
Reading/parsing them is one of the major pieces of work in patch 7.

To make handling of virtio v1.0 and v0.95 co-exist well, this patch set
introduces a virtio_pci_ops structure, adding another layer so that we
can keep those vtpci_foo_bar "APIs". With that, we can add virtio 1.0
support with minimal changes.

Rough test guide

Firstly, you need to get a QEMU that supports virtio 1.0 (say, v2.5),
then add the option "disable-modern=false" to the QEMU virtio-net-pci
device to enable virtio 1.0 (which is disabled by default).

And if you see something like the following from 'lspci -v', it means
virtio 1.0 is indeed enabled:

    00:04.0 Ethernet controller: Red Hat, Inc Virtio network device
            Subsystem: Red Hat, Inc Device 0001
            Physical Slot: 4
            Flags: bus master, fast devsel, latency 0, IRQ 11
            I/O ports at c040 [size=64]
            Memory at febf1000 (32-bit, non-prefetchable) [size=4K]
            Memory at fe00 (64-bit, prefetchable) [size=8M]
            Expansion ROM at feb8 [disabled] [size=256K]
            Capabilities: [98] MSI-X: Enable+ Count=6 Masked-
    ==>     Capabilities: [84] Vendor Specific Information: Len=14
    ==>     Capabilities: [70] Vendor Specific Information: Len=14
    ==>     Capabilities: [60] Vendor Specific Information: Len=10
    ==>     Capabilities: [50] Vendor Specific Information: Len=10
    ==>     Capabilities: [40] Vendor Specific Information: Len=10
            Kernel driver in use: virtio-pci
            Kernel modules: virtio_pci

After that, there is nothing special compared to the old virtio 0.95
pmd driver.

---
Yuanhan Liu (8):
  virtio: don't set vring address again at queue startup
  virtio: introduce struct virtio_pci_ops
  virtio: move left pci stuff to virtio_pci.c
  viritio: switch to 64 bit features
  virtio: retrieve hdr_size from hw->vtnet_hdr_size
  eal: pci: export pci_[un]map_device
  virtio: add 1.0 support
  virtio: move VIRTIO_READ/WRITE_REG_X into virtio_pci.c

 doc/guides/rel_notes/release_2_3.rst            |   3 +
 drivers/net/virtio/virtio_ethdev.c              | 302 +
 drivers/net/virtio/virtio_ethdev.h              |   3 +-
 drivers/net/virtio/virtio_pci.c                 | 787 +++-
 drivers/net/virtio/virtio_pci.h                 | 120 +++-
 drivers/net/virtio/virtio_rxtx.c                |  21 +-
 drivers/net/virtio/virtio_rxtx_simple.c         |  12 +-
 drivers/net/virtio/virtqueue.h                  |   4 +-
 lib/librte_eal/bsdapp/eal/eal_pci.c             |   4 +-
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |   7 +
 lib/librte_eal/common/eal_common_pci.c          |   4 +-
 lib/librte_eal/common/eal_private.h             |  18 -
 lib/librte_eal/common/include/rte_pci.h         |  27 +
 lib/librte_eal/linuxapp/eal/eal_pci.c           |   4 +-
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |   7 +
 15 files changed, 946 insertions(+), 377 deletions(-)

-- 
1.9.0
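The extra layer described in this cover letter, where each exported vtpci function forwards to a per-device ops table, can be pictured with the stripped-down sketch below. The types are simplified stand-ins, not the driver's: struct hw_sketch, struct pci_ops_sketch and the two status fields are hypothetical, and the real struct virtio_pci_ops carries more callbacks.

```c
#include <stdint.h>

struct hw_sketch; /* simplified stand-in for struct virtio_hw */

/* Hypothetical, reduced ops table: only the status accessors. */
struct pci_ops_sketch {
	uint8_t (*get_status)(struct hw_sketch *hw);
	void    (*set_status)(struct hw_sketch *hw, uint8_t status);
};

struct hw_sketch {
	const struct pci_ops_sketch *ops;
	uint8_t legacy_status; /* stands in for the legacy I/O port register */
	uint8_t modern_status; /* stands in for the common cfg in BAR space */
};

static uint8_t legacy_get_status(struct hw_sketch *hw) { return hw->legacy_status; }
static void legacy_set_status(struct hw_sketch *hw, uint8_t s) { hw->legacy_status = s; }
static uint8_t modern_get_status(struct hw_sketch *hw) { return hw->modern_status; }
static void modern_set_status(struct hw_sketch *hw, uint8_t s) { hw->modern_status = s; }

static const struct pci_ops_sketch legacy_ops = {
	.get_status = legacy_get_status,
	.set_status = legacy_set_status,
};

static const struct pci_ops_sketch modern_ops = {
	.get_status = modern_get_status,
	.set_status = modern_set_status,
};

/* Pick the ops table once at init; callers never branch on the version. */
static void
hw_sketch_init(struct hw_sketch *hw, int modern)
{
	hw->ops = modern ? &modern_ops : &legacy_ops;
	hw->legacy_status = 0;
	hw->modern_status = 0;
}

/* Wrappers in the vtpci_foo_bar style: plain forwarding to the ops table. */
static uint8_t
sketch_get_status(struct hw_sketch *hw)
{
	return hw->ops->get_status(hw);
}

static void
sketch_set_status(struct hw_sketch *hw, uint8_t status)
{
	hw->ops->set_status(hw, status);
}
```

The design choice is that the rest of the driver keeps calling the same wrappers, and only the init path decides which table is installed.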
[dpdk-dev] [PATCH v3 3/8] virtio: move left pci stuff to virtio_pci.c
virtio_pci.c is a more proper place for pci stuff; virtio_ethdev.c is not. Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 265 +--- drivers/net/virtio/virtio_pci.c| 270 - 2 files changed, 270 insertions(+), 265 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 6c1d3a0..b57224d 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -36,10 +36,6 @@ #include #include #include -#ifdef RTE_EXEC_ENV_LINUXAPP -#include -#include -#endif #include #include @@ -955,260 +951,6 @@ virtio_negotiate_features(struct virtio_hw *hw) hw->guest_features); } -#ifdef RTE_EXEC_ENV_LINUXAPP -static int -parse_sysfs_value(const char *filename, unsigned long *val) -{ - FILE *f; - char buf[BUFSIZ]; - char *end = NULL; - - f = fopen(filename, "r"); - if (f == NULL) { - PMD_INIT_LOG(ERR, "%s(): cannot open sysfs value %s", -__func__, filename); - return -1; - } - - if (fgets(buf, sizeof(buf), f) == NULL) { - PMD_INIT_LOG(ERR, "%s(): cannot read sysfs value %s", -__func__, filename); - fclose(f); - return -1; - } - *val = strtoul(buf, &end, 0); - if ((buf[0] == '\0') || (end == NULL) || (*end != '\n')) { - PMD_INIT_LOG(ERR, "%s(): cannot parse sysfs value %s", -__func__, filename); - fclose(f); - return -1; - } - fclose(f); - return 0; -} - -static int get_uio_dev(struct rte_pci_addr *loc, char *buf, unsigned int buflen, - unsigned int *uio_num) -{ - struct dirent *e; - DIR *dir; - char dirname[PATH_MAX]; - - /* depending on kernel version, uio can be located in uio/uioX -* or uio:uioX */ - snprintf(dirname, sizeof(dirname), -SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/uio", -loc->domain, loc->bus, loc->devid, loc->function); - dir = opendir(dirname); - if (dir == NULL) { - /* retry with the parent directory */ - snprintf(dirname, sizeof(dirname), -SYSFS_PCI_DEVICES "/" PCI_PRI_FMT, -loc->domain, loc->bus, loc->devid, loc->function); - dir = opendir(dirname); - - if (dir == NULL) { - 
PMD_INIT_LOG(ERR, "Cannot opendir %s", dirname); - return -1; - } - } - - /* take the first file starting with "uio" */ - while ((e = readdir(dir)) != NULL) { - /* format could be uio%d ...*/ - int shortprefix_len = sizeof("uio") - 1; - /* ... or uio:uio%d */ - int longprefix_len = sizeof("uio:uio") - 1; - char *endptr; - - if (strncmp(e->d_name, "uio", 3) != 0) - continue; - - /* first try uio%d */ - errno = 0; - *uio_num = strtoull(e->d_name + shortprefix_len, &endptr, 10); - if (errno == 0 && endptr != (e->d_name + shortprefix_len)) { - snprintf(buf, buflen, "%s/uio%u", dirname, *uio_num); - break; - } - - /* then try uio:uio%d */ - errno = 0; - *uio_num = strtoull(e->d_name + longprefix_len, &endptr, 10); - if (errno == 0 && endptr != (e->d_name + longprefix_len)) { - snprintf(buf, buflen, "%s/uio:uio%u", dirname, -*uio_num); - break; - } - } - closedir(dir); - - /* No uio resource found */ - if (e == NULL) { - PMD_INIT_LOG(ERR, "Could not find uio resource"); - return -1; - } - - return 0; -} - -static int -virtio_has_msix(const struct rte_pci_addr *loc) -{ - DIR *d; - char dirname[PATH_MAX]; - - snprintf(dirname, sizeof(dirname), -SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/msi_irqs", -loc->domain, loc->bus, loc->devid, loc->function); - - d = opendir(dirname); - if (d) - closedir(d); - - return (d != NULL); -} - -/* Extract I/O port numbers from sysfs */ -static int virtio_resource_init_by_uio(struct rte_pci_device *pci_dev) -{ - char dirname[PATH_MAX]; - char filename[PATH_MAX]; - unsigned long start, size; - unsigned int uio_num; - - if (get_uio_dev(&pci_dev->addr, dirname, sizeof(dirname), &uio_num) < 0) - return -1; - - /* get portio size */ - snprintf(filename, sizeof(filename), -"%s/portio
[dpdk-dev] [PATCH v3 2/8] virtio: introduce struct virtio_pci_ops
Introduce struct virtio_pci_ops, to let legacy virtio (v0.95) and modern virtio (1.0) have different implementation regarding to a specific pci action, such as read host status. With that, this patch reimplements all exported pci functions, in a way like: vtpci_foo_bar(struct virtio_hw *hw) { hw->vtpci_ops->foo_bar(hw); } So that we need pay attention to those pci related functions only while adding virtio 1.0 support. This patch introduced a new vtpci function, vtpci_init(), to do proper virtio pci settings. It's pretty simple so far: just sets hw->vtpci_ops to legacy_ops as we don't support 1.0 yet. Signed-off-by: Yuanhan Liu --- v2: extra whitespace line removing, and comment on "reading status after reset". rename the badly taken op name "set_irq" to "set_config_irq". --- drivers/net/virtio/virtio_ethdev.c | 22 ++ drivers/net/virtio/virtio_pci.c| 158 ++--- drivers/net/virtio/virtio_pci.h| 27 +++ drivers/net/virtio/virtqueue.h | 2 +- 4 files changed, 166 insertions(+), 43 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index d928339..6c1d3a0 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -272,9 +272,7 @@ virtio_dev_queue_release(struct virtqueue *vq) { if (vq) { hw = vq->hw; - /* Select and deactivate the queue */ - VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vq->vq_queue_index); - VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN, 0); + hw->vtpci_ops->del_queue(hw, vq); rte_free(vq->sw_ring); rte_free(vq); @@ -295,15 +293,13 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, struct virtio_hw *hw = dev->data->dev_private; struct virtqueue *vq = NULL; - /* Write the virtqueue index to the Queue Select Field */ - VIRTIO_WRITE_REG_2(hw, VIRTIO_PCI_QUEUE_SEL, vtpci_queue_idx); - PMD_INIT_LOG(DEBUG, "selecting queue: %u", vtpci_queue_idx); + PMD_INIT_LOG(DEBUG, "setting up queue: %u", vtpci_queue_idx); /* * Read the virtqueue size from the Queue Size field * Always power of 
2 and if 0 virtqueue does not exist */ - vq_size = VIRTIO_READ_REG_2(hw, VIRTIO_PCI_QUEUE_NUM); + vq_size = hw->vtpci_ops->get_queue_num(hw, vtpci_queue_idx); PMD_INIT_LOG(DEBUG, "vq_size: %u nb_desc:%u", vq_size, nb_desc); if (vq_size == 0) { PMD_INIT_LOG(ERR, "%s: virtqueue does not exist", __func__); @@ -436,12 +432,8 @@ int virtio_dev_queue_setup(struct rte_eth_dev *dev, memset(vq->virtio_net_hdr_mz->addr, 0, PAGE_SIZE); } - /* -* Set guest physical address of the virtqueue -* in VIRTIO_PCI_QUEUE_PFN config register of device -*/ - VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_QUEUE_PFN, - mz->phys_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT); + hw->vtpci_ops->setup_queue(hw, vq); + *pvq = vq; return 0; } @@ -950,7 +942,7 @@ virtio_negotiate_features(struct virtio_hw *hw) hw->guest_features); /* Read device(host) feature bits */ - host_features = VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES); + host_features = hw->vtpci_ops->get_features(hw); PMD_INIT_LOG(DEBUG, "host_features before negotiate = %x", host_features); @@ -1287,6 +1279,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev) pci_dev = eth_dev->pci_dev; + vtpci_init(pci_dev, hw); + if (virtio_resource_init(pci_dev) < 0) return -1; diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c index 2245bec..9930efa 100644 --- a/drivers/net/virtio/virtio_pci.c +++ b/drivers/net/virtio/virtio_pci.c @@ -34,12 +34,11 @@ #include "virtio_pci.h" #include "virtio_logs.h" +#include "virtqueue.h" -static uint8_t vtpci_get_status(struct virtio_hw *); - -void -vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset, - void *dst, int length) +static void +legacy_read_dev_config(struct virtio_hw *hw, uint64_t offset, + void *dst, int length) { uint64_t off; uint8_t *d; @@ -60,9 +59,9 @@ vtpci_read_dev_config(struct virtio_hw *hw, uint64_t offset, } } -void -vtpci_write_dev_config(struct virtio_hw *hw, uint64_t offset, - void *src, int length) +static void +legacy_write_dev_config(struct virtio_hw *hw, uint64_t 
offset, + void *src, int length) { uint64_t off; uint8_t *s; @@ -83,30 +82,133 @@ vtpci_write_dev_config(struct virtio_hw *hw, uint64_t offset, } } +static uint32_t +legacy_get_features(struct virtio_hw *hw) +{ + return VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES); +} + +static void +legacy_set_fea
[dpdk-dev] [PATCH v3 4/8] viritio: switch to 64 bit features
Switch to 64 bit features, that virtio 1.0 supports. While legacy virtio only supports 32 bit features, here we complain aloud and quit when trying to setting > 32 bit features for legacy device. Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_ethdev.c | 8 drivers/net/virtio/virtio_pci.c| 15 ++- drivers/net/virtio/virtio_pci.h| 12 ++-- 3 files changed, 20 insertions(+), 15 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index b57224d..94e0c4a 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -930,16 +930,16 @@ virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on) static void virtio_negotiate_features(struct virtio_hw *hw) { - uint32_t host_features; + uint64_t host_features; /* Prepare guest_features: feature that driver wants to support */ hw->guest_features = VIRTIO_PMD_GUEST_FEATURES; - PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %x", + PMD_INIT_LOG(DEBUG, "guest_features before negotiate = %"PRIx64, hw->guest_features); /* Read device(host) feature bits */ host_features = hw->vtpci_ops->get_features(hw); - PMD_INIT_LOG(DEBUG, "host_features before negotiate = %x", + PMD_INIT_LOG(DEBUG, "host_features before negotiate = %"PRIx64, host_features); /* @@ -947,7 +947,7 @@ virtio_negotiate_features(struct virtio_hw *hw) * guest feature bits. 
*/ hw->guest_features = vtpci_negotiate_features(hw, host_features); - PMD_INIT_LOG(DEBUG, "features after negotiate = %x", + PMD_INIT_LOG(DEBUG, "features after negotiate = %"PRIx64, hw->guest_features); } diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c index 03d623b..5eed57e 100644 --- a/drivers/net/virtio/virtio_pci.c +++ b/drivers/net/virtio/virtio_pci.c @@ -87,15 +87,20 @@ legacy_write_dev_config(struct virtio_hw *hw, uint64_t offset, } } -static uint32_t +static uint64_t legacy_get_features(struct virtio_hw *hw) { return VIRTIO_READ_REG_4(hw, VIRTIO_PCI_HOST_FEATURES); } static void -legacy_set_features(struct virtio_hw *hw, uint32_t features) +legacy_set_features(struct virtio_hw *hw, uint64_t features) { + if ((features >> 32) != 0) { + PMD_DRV_LOG(ERR, + "only 32 bit features are allowed for legacy virtio!"); + return; + } VIRTIO_WRITE_REG_4(hw, VIRTIO_PCI_GUEST_FEATURES, features); } @@ -451,10 +456,10 @@ vtpci_write_dev_config(struct virtio_hw *hw, uint64_t offset, hw->vtpci_ops->write_dev_cfg(hw, offset, src, length); } -uint32_t -vtpci_negotiate_features(struct virtio_hw *hw, uint32_t host_features) +uint64_t +vtpci_negotiate_features(struct virtio_hw *hw, uint64_t host_features) { - uint32_t features; + uint64_t features; /* * Limit negotiated features to what the driver, virtqueue, and diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index ee7d265..3fd86f6 100644 --- a/drivers/net/virtio/virtio_pci.h +++ b/drivers/net/virtio/virtio_pci.h @@ -175,8 +175,8 @@ struct virtio_pci_ops { uint8_t (*get_status)(struct virtio_hw *hw); void(*set_status)(struct virtio_hw *hw, uint8_t status); - uint32_t (*get_features)(struct virtio_hw *hw); - void (*set_features)(struct virtio_hw *hw, uint32_t features); + uint64_t (*get_features)(struct virtio_hw *hw); + void (*set_features)(struct virtio_hw *hw, uint64_t features); uint8_t (*get_isr)(struct virtio_hw *hw); @@ -191,7 +191,7 @@ struct virtio_pci_ops { 
struct virtio_hw { struct virtqueue *cvq; uint32_tio_base; - uint32_tguest_features; + uint64_tguest_features; uint32_tmax_tx_queues; uint32_tmax_rx_queues; uint16_tvtnet_hdr_size; @@ -271,9 +271,9 @@ outl_p(unsigned int data, unsigned int port) outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg static inline int -vtpci_with_feature(struct virtio_hw *hw, uint32_t bit) +vtpci_with_feature(struct virtio_hw *hw, uint64_t bit) { - return (hw->guest_features & (1u << bit)) != 0; + return (hw->guest_features & (1ULL << bit)) != 0; } /* @@ -286,7 +286,7 @@ void vtpci_reinit_complete(struct virtio_hw *); void vtpci_set_status(struct virtio_hw *, uint8_t); -uint32_t vtpci_negotiate_features(struct virtio_hw *, uint32_t); +uint64_t vtpci_negotiate_features(struct virtio_hw *, uint64_t); void vtpci_write_dev_config(struct virtio_hw *, uint64_t, void *, int); -- 1.9.0
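The two changes in this patch, the 1ULL shift in vtpci_with_feature() and the rejection of high feature bits on legacy devices, can be exercised in isolation with the sketch below. The helper names are hypothetical, and unlike the patch's legacy_set_features(), which logs and returns void, this version returns an error code so the behaviour can be checked. VIRTIO_F_VERSION_1 being bit 32 is taken from the virtio 1.0 specification.

```c
#include <stdint.h>

#define VIRTIO_F_VERSION_1 32 /* first feature bit above the legacy 32 */

/*
 * Test a feature bit in a 64-bit mask. The shift must be done on a
 * 64-bit constant: "1u << 32" would be undefined on a 32-bit unsigned.
 * bit must be below 64.
 */
static int
with_feature(uint64_t features, unsigned int bit)
{
	return (features & (1ULL << bit)) != 0;
}

/*
 * Hypothetical variant of the legacy setter: refuse anything that does
 * not fit the 32-bit guest features register, otherwise store it.
 * Returns 0 on success, -1 when the features are rejected.
 */
static int
legacy_set_features_sketch(uint64_t features, uint32_t *reg)
{
	if ((features >> 32) != 0)
		return -1;

	*reg = (uint32_t)features;
	return 0;
}
```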
[dpdk-dev] [PATCH v3 5/8] virtio: retrieve hdr_size from hw->vtnet_hdr_size
The mergeable virtio net hdr format has been the standard and the only virtio net hdr format since virtio 1.0. Therefore, we can not hardcode hdr_size to "sizeof(struct virtio_net_hdr)" any more at virtio_recv_pkts(), otherwise, there would be a mismatch of hdr size from rte_vhost_enqueue_burst() and virtio_recv_pkts(), leading a packet corruption. Instead, we should retrieve it from hw->vtnet_hdr_size; we will do proper settings at eth_virtio_dev_init() in later patches. Signed-off-by: Yuanhan Liu --- v3: retrieve hdr_size from hw->vtnet_hdr_size for simple rxtx code path as well: it should not break anything, as simple rx and mergeable rx still will not co-exist. --- drivers/net/virtio/virtio_rxtx.c| 6 -- drivers/net/virtio/virtio_rxtx_simple.c | 12 ++-- 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c index b7267c0..41a1366 100644 --- a/drivers/net/virtio/virtio_rxtx.c +++ b/drivers/net/virtio/virtio_rxtx.c @@ -560,7 +560,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) struct rte_mbuf *rcv_pkts[VIRTIO_MBUF_BURST_SZ]; int error; uint32_t i, nb_enqueued; - const uint32_t hdr_size = sizeof(struct virtio_net_hdr); + uint32_t hdr_size; nb_used = VIRTQUEUE_NUSED(rxvq); @@ -580,6 +580,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) hw = rxvq->hw; nb_rx = 0; nb_enqueued = 0; + hdr_size = hw->vtnet_hdr_size; for (i = 0; i < num ; i++) { rxm = rcv_pkts[i]; @@ -664,7 +665,7 @@ virtio_recv_mergeable_pkts(void *rx_queue, uint32_t seg_num; uint16_t extra_idx; uint32_t seg_res; - const uint32_t hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf); + uint32_t hdr_size; nb_used = VIRTQUEUE_NUSED(rxvq); @@ -682,6 +683,7 @@ virtio_recv_mergeable_pkts(void *rx_queue, seg_num = 0; extra_idx = 0; seg_res = 0; + hdr_size = hw->vtnet_hdr_size; while (i < nb_used) { struct virtio_net_hdr_mrg_rxbuf *header; diff --git 
a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index ff3c11a..3e66e8b 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -81,9 +81,9 @@ virtqueue_enqueue_recv_refill_simple(struct virtqueue *vq, start_dp = vq->vq_ring.desc; start_dp[desc_idx].addr = (uint64_t)((uintptr_t)cookie->buf_physaddr + - RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr)); + RTE_PKTMBUF_HEADROOM - vq->hw->vtnet_hdr_size); start_dp[desc_idx].len = cookie->buf_len - - RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr); + RTE_PKTMBUF_HEADROOM + vq->hw->vtnet_hdr_size; vq->vq_free_cnt--; vq->vq_avail_idx++; @@ -120,9 +120,9 @@ virtio_rxq_rearm_vec(struct virtqueue *rxvq) start_dp[i].addr = (uint64_t)((uintptr_t)sw_ring[i]->buf_physaddr + - RTE_PKTMBUF_HEADROOM - sizeof(struct virtio_net_hdr)); + RTE_PKTMBUF_HEADROOM - rxvq->hw->vtnet_hdr_size); start_dp[i].len = sw_ring[i]->buf_len - - RTE_PKTMBUF_HEADROOM + sizeof(struct virtio_net_hdr); + RTE_PKTMBUF_HEADROOM + rxvq->hw->vtnet_hdr_size; } rxvq->vq_avail_idx += RTE_VIRTIO_VPMD_RX_REARM_THRESH; @@ -175,8 +175,8 @@ virtio_recv_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, len_adjust = _mm_set_epi16( 0, 0, 0, - (uint16_t) -sizeof(struct virtio_net_hdr), - 0, (uint16_t) -sizeof(struct virtio_net_hdr), + (uint16_t) -rxvq->hw->vtnet_hdr_size, + 0, (uint16_t) -rxvq->hw->vtnet_hdr_size, 0, 0); if (unlikely(nb_pkts < RTE_VIRTIO_DESC_PER_LOOP)) -- 1.9.0
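The header size mismatch this patch fixes is a 10 byte versus 12 byte difference. The sketch below shows the two header layouts and one plausible way the size could be chosen; the selection rule in pick_hdr_size() is an assumption based on the commit message (the real setting is done at eth_virtio_dev_init() in a later patch), and the struct and function names are stand-ins rather than the driver's.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct virtio_net_hdr: 10 bytes. */
struct net_hdr {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Stand-in for struct virtio_net_hdr_mrg_rxbuf: 12 bytes. */
struct net_hdr_mrg_rxbuf {
	struct net_hdr hdr;
	uint16_t num_buffers;
};

/*
 * Assumed rule: virtio 1.0 always uses the mergeable layout, legacy
 * uses it only when mergeable rx buffers were negotiated.
 */
static size_t
pick_hdr_size(int version_1, int mrg_rxbuf)
{
	if (version_1 || mrg_rxbuf)
		return sizeof(struct net_hdr_mrg_rxbuf);
	return sizeof(struct net_hdr);
}

/*
 * Descriptor address used by the simple rx refill in the diff: the
 * header is placed immediately before the packet data in the headroom.
 */
static uint64_t
rx_desc_addr(uint64_t buf_physaddr, uint32_t headroom, size_t hdr_size)
{
	return buf_physaddr + headroom - hdr_size;
}
```

If the receive path assumed 10 bytes while the other side wrote 12, every packet would start 2 bytes off, which matches the corruption described in the commit message.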
[dpdk-dev] [PATCH v3 6/8] eal: pci: export pci_[un]map_device
Normally we could set RTE_PCI_DRV_NEED_MAPPING flag so that eal will invoke pci_map_device internally for us. From that point view, there is no need to export pci_map_device. However, for virtio pmd driver, which is designed to work without binding UIO (or something similar first), pci_map_device() will fail, which ends up with virtio pmd driver being skipped. Therefore, we can not set RTE_PCI_DRV_NEED_MAPPING blindly at virtio pmd driver. Therefore, this patch exports pci_map_device, and let virtio pmd call it when necessary. Cc: David Marchand Signed-off-by: Yuanhan Liu --- v3: - export pci_unmap_device as well - Add few more comments about rte_eal_pci_map_device(). --- lib/librte_eal/bsdapp/eal/eal_pci.c | 4 ++-- lib/librte_eal/bsdapp/eal/rte_eal_version.map | 7 +++ lib/librte_eal/common/eal_common_pci.c | 4 ++-- lib/librte_eal/common/eal_private.h | 18 - lib/librte_eal/common/include/rte_pci.h | 27 + lib/librte_eal/linuxapp/eal/eal_pci.c | 4 ++-- lib/librte_eal/linuxapp/eal/rte_eal_version.map | 7 +++ 7 files changed, 47 insertions(+), 24 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c index 6c21fbd..95c32c1 100644 --- a/lib/librte_eal/bsdapp/eal/eal_pci.c +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c @@ -93,7 +93,7 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev __rte_unused) /* Map pci device */ int -pci_map_device(struct rte_pci_device *dev) +rte_eal_pci_map_device(struct rte_pci_device *dev) { int ret = -1; @@ -115,7 +115,7 @@ pci_map_device(struct rte_pci_device *dev) /* Unmap pci device */ void -pci_unmap_device(struct rte_pci_device *dev) +rte_eal_pci_unmap_device(struct rte_pci_device *dev) { /* try unmapping the NIC resources */ switch (dev->kdrv) { diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map index 9d7adf1..1b28170 100644 --- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map @@ -135,3 
+135,10 @@ DPDK_2.2 { rte_xen_dom0_supported; } DPDK_2.1; + +DPDK_2.3 { + global: + + rte_eal_pci_map_device; + rte_eal_pci_unmap_device; +} DPDK_2.2; diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c index dcfe947..96d5113 100644 --- a/lib/librte_eal/common/eal_common_pci.c +++ b/lib/librte_eal/common/eal_common_pci.c @@ -188,7 +188,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d pci_config_space_set(dev); #endif /* map resources for devices that use igb_uio */ - ret = pci_map_device(dev); + ret = rte_eal_pci_map_device(dev); if (ret != 0) return ret; } else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND && @@ -254,7 +254,7 @@ rte_eal_pci_detach_dev(struct rte_pci_driver *dr, if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) /* unmap resources for devices that use igb_uio */ - pci_unmap_device(dev); + rte_eal_pci_unmap_device(dev); return 0; } diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index 072e672..2342fa1 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -165,24 +165,6 @@ struct rte_pci_device; int pci_unbind_kernel_driver(struct rte_pci_device *dev); /** - * Map this device - * - * This function is private to EAL. - * - * @return - * 0 on success, negative on error and positive if no driver - * is found for the device. - */ -int pci_map_device(struct rte_pci_device *dev); - -/** - * Unmap this device - * - * This function is private to EAL. - */ -void pci_unmap_device(struct rte_pci_device *dev); - -/** * Map the PCI resource of a PCI device in virtual memory * * This function is private to EAL. 
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h index 334c12e..2224109 100644 --- a/lib/librte_eal/common/include/rte_pci.h +++ b/lib/librte_eal/common/include/rte_pci.h @@ -485,6 +485,33 @@ int rte_eal_pci_read_config(const struct rte_pci_device *device, */ int rte_eal_pci_write_config(const struct rte_pci_device *device, const void *buf, size_t len, off_t offset); +/** + * Map the PCI device resources in user space virtual memory address + * + * Note that driver should not call this function when flag + * RTE_PCI_DRV_NEED_MAPPING is set, as EAL will do that for + * you when it's on. + * + * @param dev + * A pointer to a rte_pci_device structure describing the device + * to use + * + * @return + * 0 on success, negative on error and po
[dpdk-dev] [PATCH v3 7/8] virtio: add 1.0 support
A modern (v1.0) virtio pci device defines several pci capabilities. Each cap has a configuration structure corresponding to it, and the cap.bar and cap.offset fields tell us where to find it.

First, we map the pci resources with rte_eal_pci_map_device(). We can then easily locate a cfg structure by:

    cfg_addr = dev->mem_resource[cap.bar].addr + cap.offset;

Therefore, the entry point for enabling modern (v1.0) pci device support is to iterate over the pci capability list and locate the configs we care about. They are:

- common cfg

  For generic virtio and virtqueue configuration, such as setting/getting features, enabling a specific queue, and so on.

- notify cfg

  Combined with `queue_notify_off' from common cfg, we can use it to notify a specific virt queue.

- device cfg

  Where the virtio_net_config structure is located.

- isr cfg

  Where to read isr (interrupt status).

If any of the above caps is not found, we fall back to the legacy virtio handling.

On success, hw->vtpci_ops is assigned to modern_ops, where all operations are implemented by reading/writing one (or a few) specific configuration fields from the above 4 cfg structures. And that's basically how this patch works.

Besides those changes, virtio 1.0 introduces a new status field, FEATURES_OK, which is set after feature negotiation is done.

Last, set the VIRTIO_F_VERSION_1 feature flag.

Signed-off-by: Yuanhan Liu
---
v2:
- re-read status after setting FEATURES_OK to make sure status is set correctly.
- Add isr reading and config irq setting support.
- Define some pci macro on our own to not get the dependency of linux/pci_regs.h, as there should be no such file at non-Linux platform v3: - invoke rte_eal_pci_unmap_device() at uninit stage --- doc/guides/rel_notes/release_2_3.rst | 3 + drivers/net/virtio/virtio_ethdev.c | 25 ++- drivers/net/virtio/virtio_ethdev.h | 3 +- drivers/net/virtio/virtio_pci.c | 335 ++- drivers/net/virtio/virtio_pci.h | 67 +++ drivers/net/virtio/virtqueue.h | 2 + 6 files changed, 430 insertions(+), 5 deletions(-) diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index 99de186..c390d97 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -4,6 +4,9 @@ DPDK Release 2.3 New Features +* **Virtio 1.0 support.** + + Enabled virtio 1.0 support for virtio pmd driver. Resolved Issues --- diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index 94e0c4a..deb0382 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -927,7 +927,7 @@ virtio_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on) return virtio_send_command(hw->cvq, &ctrl, &len, 1); } -static void +static int virtio_negotiate_features(struct virtio_hw *hw) { uint64_t host_features; @@ -949,6 +949,22 @@ virtio_negotiate_features(struct virtio_hw *hw) hw->guest_features = vtpci_negotiate_features(hw, host_features); PMD_INIT_LOG(DEBUG, "features after negotiate = %"PRIx64, hw->guest_features); + + if (hw->modern) { + if (!vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) { + PMD_INIT_LOG(ERR, + "VIRTIO_F_VERSION_1 features is not enabled."); + return -1; + } + vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_FEATURES_OK); + if (!(vtpci_get_status(hw) & VIRTIO_CONFIG_STATUS_FEATURES_OK)) { + PMD_INIT_LOG(ERR, + "failed to set FEATURES_OK status!"); + return -1; + } + } + + return 0; } /* @@ -1032,7 +1048,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev) /* Tell the host we've known how 
to drive the device. */ vtpci_set_status(hw, VIRTIO_CONFIG_STATUS_DRIVER); - virtio_negotiate_features(hw); + if (virtio_negotiate_features(hw) < 0) + return -1; /* If host does not support status then disable LSC */ if (!vtpci_with_feature(hw, VIRTIO_NET_F_STATUS)) @@ -1043,7 +1060,8 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev) rx_func_get(eth_dev); /* Setting up rx_header size for the device */ - if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) + if (vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF) || + vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr_mrg_rxbuf); else hw->vtnet_hdr_size = sizeof(struct virtio_net_hdr); @@ -1159,6 +1177,7 @@ eth_virtio_dev_uninit(struct rte_eth_dev *eth_dev) rte_intr_callback_unregister(&pci_dev->intr_handle, virtio_interrupt_handle
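The FEATURES_OK handshake described in this patch can be sketched roughly as below. This is a simplified model, not the driver code: demo_dev, demo_set_status and demo_negotiate are hypothetical names, and the device side (the accept_features flag) is faked so the example is self-contained. The constants follow the virtio 1.0 spec as I understand it (FEATURES_OK is status bit 0x08, VIRTIO_F_VERSION_1 is feature bit 32).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_STATUS_FEATURES_OK 0x08   /* status bit, per virtio 1.0 */
#define DEMO_F_VERSION_1        32     /* feature bit number */

/* Hypothetical model of a device plus the driver's view of it. */
struct demo_dev {
    uint64_t host_features;
    uint64_t guest_features;
    uint8_t  status;
    bool     accept_features;  /* fake device-side decision */
};

/* A device that rejects the feature set simply does not latch FEATURES_OK. */
static void
demo_set_status(struct demo_dev *d, uint8_t bits)
{
    if ((bits & DEMO_STATUS_FEATURES_OK) && !d->accept_features)
        bits &= (uint8_t)~DEMO_STATUS_FEATURES_OK;
    d->status |= bits;
}

/* Returns 0 on success, -1 if the device cannot be driven as a modern one. */
static int
demo_negotiate(struct demo_dev *d, uint64_t driver_features)
{
    assert(d != NULL);
    d->guest_features = d->host_features & driver_features;

    /* A modern device must offer (and we must accept) VERSION_1. */
    if (!(d->guest_features & (1ULL << DEMO_F_VERSION_1)))
        return -1;

    /* Set FEATURES_OK, then re-read status to confirm it stuck. */
    demo_set_status(d, DEMO_STATUS_FEATURES_OK);
    if (!(d->status & DEMO_STATUS_FEATURES_OK))
        return -1;

    return 0;
}
```

The re-read after setting the bit is the point of the v2 changelog item: writing FEATURES_OK alone does not prove the device accepted the negotiated features.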
[dpdk-dev] [PATCH v3 8/8] virtio: move VIRTIO_READ/WRITE_REG_X into virtio_pci.c
virtio_pci.c become the only file references those macros; move them there. Signed-off-by: Yuanhan Liu --- drivers/net/virtio/virtio_pci.c | 19 +++ drivers/net/virtio/virtio_pci.h | 18 -- 2 files changed, 19 insertions(+), 18 deletions(-) diff --git a/drivers/net/virtio/virtio_pci.c b/drivers/net/virtio/virtio_pci.c index 9b62013..8e26f00 100644 --- a/drivers/net/virtio/virtio_pci.c +++ b/drivers/net/virtio/virtio_pci.c @@ -49,6 +49,25 @@ #define PCI_CAPABILITY_LIST0x34 #define PCI_CAP_ID_VNDR0x09 + +#define VIRTIO_PCI_REG_ADDR(hw, reg) \ + (unsigned short)((hw)->io_base + (reg)) + +#define VIRTIO_READ_REG_1(hw, reg) \ + inb((VIRTIO_PCI_REG_ADDR((hw), (reg +#define VIRTIO_WRITE_REG_1(hw, reg, value) \ + outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg + +#define VIRTIO_READ_REG_2(hw, reg) \ + inw((VIRTIO_PCI_REG_ADDR((hw), (reg +#define VIRTIO_WRITE_REG_2(hw, reg, value) \ + outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg + +#define VIRTIO_READ_REG_4(hw, reg) \ + inl((VIRTIO_PCI_REG_ADDR((hw), (reg +#define VIRTIO_WRITE_REG_4(hw, reg, value) \ + outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg + static void legacy_read_dev_config(struct virtio_hw *hw, uint64_t offset, void *dst, int length) diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index 6ade642..5400ebd 100644 --- a/drivers/net/virtio/virtio_pci.h +++ b/drivers/net/virtio/virtio_pci.h @@ -318,24 +318,6 @@ outl_p(unsigned int data, unsigned int port) } #endif -#define VIRTIO_PCI_REG_ADDR(hw, reg) \ - (unsigned short)((hw)->io_base + (reg)) - -#define VIRTIO_READ_REG_1(hw, reg) \ - inb((VIRTIO_PCI_REG_ADDR((hw), (reg -#define VIRTIO_WRITE_REG_1(hw, reg, value) \ - outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg - -#define VIRTIO_READ_REG_2(hw, reg) \ - inw((VIRTIO_PCI_REG_ADDR((hw), (reg -#define VIRTIO_WRITE_REG_2(hw, reg, value) \ - outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg - -#define 
VIRTIO_READ_REG_4(hw, reg) \ - inl((VIRTIO_PCI_REG_ADDR((hw), (reg -#define VIRTIO_WRITE_REG_4(hw, reg, value) \ - outl_p((unsigned int)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg - static inline int vtpci_with_feature(struct virtio_hw *hw, uint64_t bit) { -- 1.9.0
[dpdk-dev] [PATCH v3 6/8] eal: pci: export pci_[un]map_device
On Thu, Jan 14, 2016 at 03:42:50PM +0800, Yuanhan Liu wrote: > Normally we could set RTE_PCI_DRV_NEED_MAPPING flag so that eal will > invoke pci_map_device internally for us. From that point view, there > is no need to export pci_map_device. > > However, for virtio pmd driver, which is designed to work without > binding UIO (or something similar first), pci_map_device() will fail, > which ends up with virtio pmd driver being skipped. Therefore, we can > not set RTE_PCI_DRV_NEED_MAPPING blindly at virtio pmd driver. > > Therefore, this patch exports pci_map_device, and let virtio pmd > call it when necessary. > > Cc: David Marchand > Signed-off-by: Yuanhan Liu Oops, forgot to carry the tested-by from Santosh. Tested-By: Santosh Shukla --yliu
[dpdk-dev] librte_power w/ intel_pstate cpufreq governor
On Thu, Jan 14, 2016 at 07:15:51AM +, Zhang, Helin wrote: > That's disappointing if Skylake is like that. Let's have a learning first, > and then check if we can fix that. But in addition, DPDK provide interrupt > based packet receiving mechanism, can it be one of your choice? Maybe I am wrong. But I could not disprove what the Linux p_state driver Documentation file and other places claimed, which is that the clockrate control is no-opped, because the white papers on Intel HWP are not findable in the Intel website, or by using Google with the operator "site:intel.com". The IRQ based part is still enabled and works quite well in a very trivial test so far... but the clockrate callback handlers are null and the governor setting gets corrupted, both due to failed init of librte_power. So I will have to rebuild DPDK with the librte_power ACPI + KVM init commented out and the fastpath clockrate callback functions commented out of course. It is minor so I can do it to see what will happen. > If no objection, I will find time later (may be in a month) to investigate > that. Of cause, please try to investigate that from your side. Agreed. > That's always there, for example, DPDK can exit accidently, without caring > anything. Then you can have the similar issue again. Of course, it could. But if there was some kind of shutdown function, at least then I could call it from the signal handler I already have which closes the ports (this prevents nasty port lockups on virtio-net port DMA memory zones which can happen on future runs otherwise). > It seems that you are so important for Intel. :) I don't have Skylake in > hand. :( :) Hahaha... newegg.com to the rescue. I guess we need to be sure there is some program to test the stuff in DPDK for the new kernels and hardware. It appears we are pretty far behind now... I saw several threads about things that were behind just today. > Regards, > Helin Matthew.
[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support
On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
> Modern (v1.0) virtio pci device defines several pci capabilities.
[snip]
> +static void
> +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
> +{
> +    modern_write16(1, vq->notify_addr);
> +}

Does virtio 1.0 only support MMIO? MMIO has a longer VMEXIT latency than port I/O.

[snip]
[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support
On Thu, Jan 14, 2016 at 07:47:17AM +, Xie, Huawei wrote:
> On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
> > Modern (v1.0) virtio pci device defines several pci capabilities.
> [snip]
> > +static void
> > +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
> > +{
> > +    modern_write16(1, vq->notify_addr);
> > +}
>
> Does virtio 1.0 only supports MMIO? MMIO has long VMEXIT latency than
> PORT IO.

Virtio 1.0 supports three transport layers, including MMIO and PCI, and we use only PCI in our pmd driver.

--yliu
[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support
On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
> Modern (v1.0) virtio pci device defines several pci capabilities.
> Each cap has a configure structure corresponding to it, and the
> cap.bar and cap.offset fields tell us where to find it.
>
> Firstly, we map the pci resources by rte_eal_pci_map_device().
> We then could easily locate to a cfg structure by:

s/locate to/locate/

>
> cfg_addr = dev->mem_resources[cap.bar].addr + cap.offset;
>
> Therefore, the entrance of enabling modern (v1.0) pci device support
> is to iterate the pci capability lists, and to locate some configs
> we care; and they are:
>
> - common cfg
>
> For generic virtio and virtuqueu configuration, such as setting/getting

Typo: virtqueue.

> features, enabling a specific queue, and so on.
>
> - nofity cfg
>
> Combining with `queue_notify_off' from common cfg, we could use it to
> notify a specific virt queue.
>
> - device cfg
>
> Where virtio_net_config structure locates.

s/locates/is located/

> If any of above cap is not found, we fallback to the legacy virtio
> [SNIP]
>
>
>
> +#define MODERN_READ_DEF(nr_bits, type) \
> +static inline type \
> +modern_read##nr_bits(type *addr) \
> +{ \
> +    return *(volatile type *)addr; \
> +}
> +
> +#define MODERN_WRITE_DEF(nr_bits, type) \
> +static inline void \
> +modern_write##nr_bits(type val, type *addr) \
> +{ \
> +    *(volatile type *)addr = val; \
> +}
> +
> +MODERN_READ_DEF (8, uint8_t)
> +MODERN_WRITE_DEF(8, uint8_t)
> +
> +MODERN_READ_DEF (16, uint16_t)
> +MODERN_WRITE_DEF(16, uint16_t)
> +
> +MODERN_READ_DEF (32, uint32_t)
> +MODERN_WRITE_DEF(32, uint32_t)
> +
> +static inline void
> +modern_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
> +{
> +    modern_write32((uint32_t)val, lo);
> +    modern_write32(val >> 32, hi);
> +}
> +

These are normal MMIO read/write operations. ioread8/16/32/64, or just readXX, would be more meaningful names here.

> +static void
[SNIP]
> +
> +static void
> +modern_write_dev_config(struct virtio_hw *hw, uint64_t offset,
> +    void *src, int length)

Define src as const.

[snip]
>
> +static inline void *
> +get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)

No explicit inline for non-performance-critical functions.

> +{
> +    uint8_t bar= cap->bar;
> +    uint32_t length = cap->length;
> +    uint32_t offset = cap->offset;
> +    uint8_t *base;
> +
> +    if (unlikely(bar > 5)) {

Avoid bare numeric constants where possible.

No likely/unlikely for non-performance-critical functions.

> +        PMD_INIT_LOG(ERR, "invalid bar: %u", bar);
> +        return NULL;
> +    }
> +
> +    if (unlikely(offset + length > dev->mem_resource[bar].len)) {
> +        PMD_INIT_LOG(ERR,
> +            "invalid cap: overflows bar space: %u > %"PRIu64,
> +            offset + length, dev->mem_resource[bar].len);
> +        return NULL;
> +    }
> +
> +    base = dev->mem_resource[bar].addr;
> +    if (unlikely(base == NULL)) {
> +        PMD_INIT_LOG(ERR, "bar %u base addr is NULL", bar);
> +        return NULL;
> +    }
> +
> +    return base + offset;
> +}
> +
> +static int
> +virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
> +{
> +    uint8_t pos;
> +    struct virtio_pci_cap cap;
> +    int ret;
> +
> +    if (rte_eal_pci_map_device(dev) < 0) {
> +        PMD_INIT_LOG(DEBUG, "failed to map pci device!");

s/DEBUG/ERR/

> +        return -1;
> +    }
> +
> +    ret = rte_eal_pci_read_config(dev, &pos, 1, PCI_CAPABILITY_LIST);
> +
[snip]
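For reference, here is one way get_cfg_addr() might look with the review comments above applied: a named constant instead of the bare 5, no unlikely(), and no explicit inline. This is only a sketch on stand-in types (demo_mem_resource, demo_pci_cap and DEMO_PCI_MAX_BAR are made-up names, not the EAL definitions). It also does the bounds check in 64 bits, since offset + length on two uint32_t values could wrap; that last change is my own assumption and was not raised in the review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PCI_MAX_BAR 6   /* a PCI function has BARs 0..5 */

/* Stand-ins for the EAL resource and virtio capability structures. */
struct demo_mem_resource {
    void    *addr;
    uint64_t len;
};

struct demo_pci_cap {
    uint8_t  bar;
    uint32_t offset;
    uint32_t length;
};

/* Returns the address of the cfg structure, or NULL if the cap is invalid. */
static void *
demo_get_cfg_addr(const struct demo_mem_resource *res,
                  const struct demo_pci_cap *cap)
{
    assert(res != NULL && cap != NULL);

    if (cap->bar >= DEMO_PCI_MAX_BAR)
        return NULL;

    /* Widen before adding so a huge offset cannot wrap around. */
    if ((uint64_t)cap->offset + cap->length > res[cap->bar].len)
        return NULL;

    if (res[cap->bar].addr == NULL)
        return NULL;

    return (uint8_t *)res[cap->bar].addr + cap->offset;
}
```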
[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support
On 1/14/2016 3:49 PM, Yuanhan Liu wrote:
> On Thu, Jan 14, 2016 at 07:47:17AM +, Xie, Huawei wrote:
>> On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
>>> Modern (v1.0) virtio pci device defines several pci capabilities.
>> [snip]
>>> +static void
>>> +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
>>> +{
>>> +    modern_write16(1, vq->notify_addr);
>>> +}
>> Does virtio 1.0 only supports MMIO? MMIO has long VMEXIT latency than
>> PORT IO.
> Virtio 1.0 supports three transport layer, including MMIO and PCI. And
> we use PCI only in our pmd driver.

I don't mean the MMIO transport; I mean the use of memory-mapped I/O for configuration.

>
> --yliu
>
[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support
On Thu, Jan 14, 2016 at 07:51:08AM +, Xie, Huawei wrote: > On 1/14/2016 3:49 PM, Yuanhan Liu wrote: > > On Thu, Jan 14, 2016 at 07:47:17AM +, Xie, Huawei wrote: > >> On 1/12/2016 2:58 PM, Yuanhan Liu wrote: > >>> Modern (v1.0) virtio pci device defines several pci capabilities. > >> [snip] > >>> +static void > >>> +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue > >>> *vq) > >>> +{ > >>> + modern_write16(1, vq->notify_addr); > >>> +} > >> Does virtio 1.0 only supports MMIO? MMIO has long VMEXIT latency than > >> PORT IO. > > Virtio 1.0 supports three transport layer, including MMIO and PCI. And > > we use PCI only in our pmd driver. > > I don't mean that MMIO but use memory mapped IO for configuration. Then, yes. --yliu
[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support
On 1/14/2016 3:58 PM, Yuanhan Liu wrote:
> On Thu, Jan 14, 2016 at 07:51:08AM +, Xie, Huawei wrote:
>> On 1/14/2016 3:49 PM, Yuanhan Liu wrote:
>>> On Thu, Jan 14, 2016 at 07:47:17AM +, Xie, Huawei wrote:
>>>> On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
>>>>> Modern (v1.0) virtio pci device defines several pci capabilities.
>>>> [snip]
>>>>> +static void
>>>>> +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct virtqueue *vq)
>>>>> +{
>>>>> +    modern_write16(1, vq->notify_addr);
>>>>> +}
>>>> Does virtio 1.0 only supports MMIO? MMIO has long VMEXIT latency than
>>>> PORT IO.
>>> Virtio 1.0 supports three transport layer, including MMIO and PCI. And
>>> we use PCI only in our pmd driver.
>> I don't mean that MMIO but use memory mapped IO for configuration.
> Then, yes.

00:03.0 Ethernet controller: Red Hat, Inc Virtio network device
    Subsystem: Red Hat, Inc Device 0001
    Physical Slot: 3
    Flags: bus master, fast devsel, latency 0, IRQ 10
    I/O ports at c100 [size=32]
    Memory at febd1000 (32-bit, non-prefetchable) [size=4K]
    Memory at fe00 (64-bit, prefetchable) [size=8M]
    Expansion ROM at feb4 [disabled] [size=256K]
    Capabilities: [98] MSI-X: Enable+ Count=3 Masked-
    Capabilities: [84] Vendor Specific Information: Len=14
    Capabilities: [70] Vendor Specific Information: Len=14
    Capabilities: [60] Vendor Specific Information: Len=10
    Capabilities: [50] Vendor Specific Information: Len=10
    Capabilities: [40] Vendor Specific Information: Len=10
    Kernel driver in use: igb_uio
    Kernel modules: virtio_pci

c100 is still there.

For the notification, try PORT IO if possible.

>
> --yliu
>
[dpdk-dev] VFIO no-iommu
On Thu, Jan 14, 2016 at 2:52 PM, Alex Williamson wrote: > On Thu, 2016-01-14 at 14:03 +0800, Jike Song wrote: >> On Wed, Dec 16, 2015 at 12:38 PM, Alex Williamson >> wrote: >> > >> > So it works. Is it acceptable? Useful? Sufficiently complete? Does >> > it imply deprecating the uio interface? I believe the feature that >> > started this discussion was support for MSI/X interrupts so that VFs >> > can support some kind of interrupt (uio only supports INTx since it >> > doesn't allow DMA). Implementing that would be the ultimate test of >> > whether this provides dpdk with not only a more consistent interface, >> > but the feature dpdk wants that's missing in uio. Thanks, >> > >> Hi Alex, >> >> Sorry for jumping in. Just being curious, how does VFIO No-IOMMU mode >> support DMA from userspace drivers? If I understand correctly, due to >> the absence of IOMMU, pcidev has to use physaddr to start a DMA >> transaction, but how it is supposed to get physaddr from userspace >> drivers, /proc//pagemap or something else? > > Hi Jike, > > vfio no-iommu does nothing to facilitate DMA mappings, UIO didn't > either and DPDK managed to work with that. vfio no-iommu is meant to > be an enabler and provide a consistent vfio device model (with MSI/X > interrupts), but fundamentally the idea of a non-iommu protected, user > owned device capable of DMA is unsupportable. This is why vfio no- > iommu taints the kernel. With that in mind, one of the design goals is > to introduce as little code as possible for vfio no-iommu. A new vfio > iommu backend with pinning and virt-to-bus translation goes against > that design goal. I don't know the details of how DPDK did this with > UIO, but the same lack of DMA mapping facilities is present with vfio > no-iommu. It really just brings the vfio device model, nothing more. > Thanks, > > Alex Thanks! 
- that addressed my question :) By the way, my previous assumption (consulting /proc//pagemap) apparently doesn't work: one cannot assume a userspace buffer is physically contiguous. -- Thanks, Jike
[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support
On Thu, Jan 14, 2016 at 08:08:28AM +, Xie, Huawei wrote: > On 1/14/2016 3:58 PM, Yuanhan Liu wrote: > > On Thu, Jan 14, 2016 at 07:51:08AM +, Xie, Huawei wrote: > >> On 1/14/2016 3:49 PM, Yuanhan Liu wrote: > >>> On Thu, Jan 14, 2016 at 07:47:17AM +, Xie, Huawei wrote: > On 1/12/2016 2:58 PM, Yuanhan Liu wrote: > > Modern (v1.0) virtio pci device defines several pci capabilities. > [snip] > > +static void > > +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct > > virtqueue *vq) > > +{ > > + modern_write16(1, vq->notify_addr); > > +} > Does virtio 1.0 only supports MMIO? MMIO has long VMEXIT latency than > PORT IO. > >>> Virtio 1.0 supports three transport layer, including MMIO and PCI. And > >>> we use PCI only in our pmd driver. > >> I don't mean that MMIO but use memory mapped IO for configuration. > > Then, yes. > > 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device > Subsystem: Red Hat, Inc Device 0001 > Physical Slot: 3 > Flags: bus master, fast devsel, latency 0, IRQ 10 > I/O ports at c100 [size=32] > Memory at febd1000 (32-bit, non-prefetchable) [size=4K] > Memory at fe00 (64-bit, prefetchable) [size=8M] > Expansion ROM at feb4 [disabled] [size=256K] > Capabilities: [98] MSI-X: Enable+ Count=3 Masked- > Capabilities: [84] Vendor Specific Information: Len=14 > Capabilities: [70] Vendor Specific Information: Len=14 > Capabilities: [60] Vendor Specific Information: Len=10 > Capabilities: [50] Vendor Specific Information: Len=10 > Capabilities: [40] Vendor Specific Information: Len=10 > Kernel driver in use: igb_uio > Kernel modules: virtio_pci > > c100 is still there. Yes, > For the notification, try PORT IO if possible. But it doesn't seem right to me to mix legacy registers in modern pci device. --yliu
[dpdk-dev] Getting error while running DPDK test app on X-Gene1
On 1/14/2016 12:15 PM, Jerin Jacob wrote: > On Wed, Jan 13, 2016 at 03:52:01PM +0530, Ankit Jindal wrote: >> Hi, >> >> We are trying to run dpdk on our arm64 based SOC having Intel 10G >> ixgbe PCIe card plugged. While running any test app, we are getting >> following error. >> >> EAL: PCI device :01:00.0 on NUMA socket 0 >> EAL: probe driver: 8086:10fb rte_ixgbe_pmd >> EAL: Cannot open /sys/bus/pci/devices/:01:00.0/resource0: No such >> file or directory >> EAL: Error - exiting with code: 1 >> Cause: Requested device :01:00.0 cannot be used > > pci resource creation patch is not yet part of the arm64 mainline kernel. > The following patch should fix the problem. > > http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358906.html > > Jerin What's the status of your arm kernel patch? Thanks, Michael >> Below are the details on modules, hugepages and device binding. >> root at arm64:~# lsmod >> Module Size Used by >> rte_kni 292795 0 >> igb_uio 4338 0 >> ixgbe 184456 0 >> >> root at arm64:~/dpdk# cat >> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages >> 2048 >> >> root at arm64:~/dpdk# ./tools/dpdk_nic_bind.py --status >> >> Network devices using DPDK-compatible driver >> >> :01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' >> drv=igb_uio unused= >> :01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' >> drv=igb_uio unused= >> >> Network devices using kernel driver >> === >> >> >> Other network devices >> = >> >> root at arm64:~/dpdk# >> >> Thanks, >> Ankit
[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support
On 1/14/2016 4:21 PM, Yuanhan Liu wrote: > On Thu, Jan 14, 2016 at 08:08:28AM +, Xie, Huawei wrote: >> On 1/14/2016 3:58 PM, Yuanhan Liu wrote: >>> On Thu, Jan 14, 2016 at 07:51:08AM +, Xie, Huawei wrote: On 1/14/2016 3:49 PM, Yuanhan Liu wrote: > On Thu, Jan 14, 2016 at 07:47:17AM +, Xie, Huawei wrote: >> On 1/12/2016 2:58 PM, Yuanhan Liu wrote: >>> Modern (v1.0) virtio pci device defines several pci capabilities. >> [snip] >>> +static void >>> +modern_notify_queue(struct virtio_hw *hw __rte_unused, struct >>> virtqueue *vq) >>> +{ >>> + modern_write16(1, vq->notify_addr); >>> +} >> Does virtio 1.0 only supports MMIO? MMIO has long VMEXIT latency than >> PORT IO. > Virtio 1.0 supports three transport layer, including MMIO and PCI. And > we use PCI only in our pmd driver. I don't mean that MMIO but use memory mapped IO for configuration. >>> Then, yes. >> 00:03.0 Ethernet controller: Red Hat, Inc Virtio network device >> Subsystem: Red Hat, Inc Device 0001 >> Physical Slot: 3 >> Flags: bus master, fast devsel, latency 0, IRQ 10 >> I/O ports at c100 [size=32] >> Memory at febd1000 (32-bit, non-prefetchable) [size=4K] >> Memory at fe00 (64-bit, prefetchable) [size=8M] >> Expansion ROM at feb4 [disabled] [size=256K] >> Capabilities: [98] MSI-X: Enable+ Count=3 Masked- >> Capabilities: [84] Vendor Specific Information: Len=14 >> Capabilities: [70] Vendor Specific Information: Len=14 >> Capabilities: [60] Vendor Specific Information: Len=10 >> Capabilities: [50] Vendor Specific Information: Len=10 >> Capabilities: [40] Vendor Specific Information: Len=10 >> Kernel driver in use: igb_uio >> Kernel modules: virtio_pci >> >> c100 is still there. > Yes, > >> For the notification, try PORT IO if possible. > But it doesn't seem right to me to mix legacy registers in modern pci device. On TLB and cache miss, this could cause plenty of cycles. Considering that our current focus is dpdkvhost which doesn't need notification, let us revisit this later. > > --yliu >
[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support
Sigh... I have just sent out v3 ...

On Thu, Jan 14, 2016 at 07:50:00AM +, Xie, Huawei wrote:
> On 1/12/2016 2:58 PM, Yuanhan Liu wrote:
> > +static inline void
> > +modern_write64_twopart(uint64_t val, uint32_t *lo, uint32_t *hi)
> > +{
> > +    modern_write32((uint32_t)val, lo);
> > +    modern_write32(val >> 32, hi);
> > +}
> > +
>
> This is normal mmio read/write operation. ioread8/16/32/64 or just
> readxx is more meaningful name here.

I just want to make them look related to modern devices, which they are.

> > +static void
> [SNIP]
> > +
> > +static void
> > +modern_write_dev_config(struct virtio_hw *hw, uint64_t offset,
> > +    void *src, int length)
>
> define src as const

Okay.

>
> [snip]
> >
> > +static inline void *
> > +get_cfg_addr(struct rte_pci_device *dev, struct virtio_pci_cap *cap)
>
> No explicit inline for non performance critical functions.

Okay.

>
> > +{
> > +    uint8_t bar= cap->bar;
> > +    uint32_t length = cap->length;
> > +    uint32_t offset = cap->offset;
> > +    uint8_t *base;
> > +
> > +    if (unlikely(bar > 5)) {
> Don't use constant value number whenever possible

I normally don't bother to define a macro for a number that is used only once, especially for well-known ones. Say, I won't define

    #define UINT8_MAX_VALUE 0xff

>
> No likely/unlikely for non performance critical functions

Makes sense.

> > +    if (rte_eal_pci_map_device(dev) < 0) {
> > +        PMD_INIT_LOG(DEBUG, "failed to map pci device!");
>
> s /DEBUG/ERR/

It's not an error; it's expected, say, when no UIO is bound.

--yliu
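As a side note on the modern_write64_twopart() helper quoted above, the idea of writing a 64-bit field as two 32-bit halves can be sketched on its own like this. The demo_ names are invented for the example, and plain variables stand in for the mapped device registers; the real helper writes to addresses inside the mapped common cfg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 32-bit store through a volatile pointer, as an MMIO write would be. */
static void
demo_write32(uint32_t val, volatile uint32_t *addr)
{
    *addr = val;
}

/* Write a 64-bit value as two 32-bit stores: low half first, then high. */
static void
demo_write64_twopart(uint64_t val, volatile uint32_t *lo,
                     volatile uint32_t *hi)
{
    assert(lo != NULL && hi != NULL);
    demo_write32((uint32_t)val, lo);
    demo_write32((uint32_t)(val >> 32), hi);
}
```

Taking separate lo and hi pointers, rather than one 64-bit pointer, matches the common cfg layout where such fields are exposed as a pair of 32-bit registers.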
[dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new behavior switch to fdir
Hi, Rahul Just another thought, please consider about it: Add a new flow type like #define RTE_ETH_FLOW_IPV6_UDP_EX17 +#define RTE_ETH_FLOW_RAW_PKT 18 Then add a new item in rte_eth_fdir_flow union rte_eth_fdir_flow { struct rte_eth_l2_flow l2_flow; struct rte_eth_udpv4_flow udp4_flow; struct rte_eth_tcpv4_flow tcp4_flow; struct rte_eth_sctpv4_flow sctp4_flow; struct rte_eth_ipv4_flow ip4_flow; struct rte_eth_udpv6_flow udp6_flow; struct rte_eth_tcpv6_flow tcp6_flow; struct rte_eth_sctpv6_flow sctp6_flow; struct rte_eth_ipv6_flow ipv6_flow; struct rte_eth_mac_vlan_flow mac_vlan_flow; struct rte_eth_tunnel_flow tunnel_flow; + uint8_t raw_pkt[80]; }; Then add mask item in rte_eth_fdir_input: struct rte_eth_fdir_input { uint16_t flow_type; union rte_eth_fdir_flow flow; + union rte_eth_fdir_flow flow_mask; /**< Flow fields to match, dependent on flow_type */ struct rte_eth_fdir_flow_ext flow_ext; /**< Additional fields to match */ }; Then the filter can be added just in a format of raw packet, it looks generic, and even other NIC can use this too. Thanks Jingjing > -Original Message- > From: Wu, Jingjing > Sent: Wednesday, January 13, 2016 9:17 PM > To: Rahul Lakkireddy > Cc: dev at dpdk.org; Felix Marti; Kumar A S; Nirranjan Kirubaharan > Subject: RE: [dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new > behavior switch to fdir >
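To make the raw packet plus mask idea above a bit more concrete, here is a small software model of how such a filter could be evaluated. Everything in it is an assumption for illustration: demo_raw_filter and demo_raw_match are invented names, and how a given NIC would actually interpret the 80 raw bytes and the mask is not specified in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RAW_PKT_LEN 80   /* size of raw_pkt[] in the proposal above */

/* Hypothetical filter: bytes to match and a per-bit mask selecting them. */
struct demo_raw_filter {
    uint8_t spec[DEMO_RAW_PKT_LEN];
    uint8_t mask[DEMO_RAW_PKT_LEN];
};

/*
 * A packet matches when every masked bit equals the spec. Bytes past the
 * end of a short packet are treated as zero (an assumption of this model).
 */
static bool
demo_raw_match(const struct demo_raw_filter *f, const uint8_t *pkt,
               size_t pkt_len)
{
    size_t i;

    assert(f != NULL && pkt != NULL);
    for (i = 0; i < DEMO_RAW_PKT_LEN; i++) {
        uint8_t byte = (i < pkt_len) ? pkt[i] : 0;

        if ((byte ^ f->spec[i]) & f->mask[i])
            return false;
    }
    return true;
}
```

For example, setting mask bytes 12 and 13 to 0xff and spec bytes 12 and 13 to 0x08 0x00 would match on the Ethernet type field only and ignore the rest of the frame.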
[dpdk-dev] [PATCH 0/4] virtio support for container
Hello, > Can you send out how you start this l2fwd program? This is how I run the l2fwd program: CMD ["/usr/src/dpdk/examples/l2fwd/build/l2fwd", "-c", "0x3", "-n", "4","--no-pci", ,"--no-huge","--vdev=eth_cvio0,queue_num=256,rx=1,tx=1,cq=0,path=/usr/src/dpdk/usvhost", "--", "-p", "0x1"] I tried passing "-m 1024" to it, but that causes l2fwd to be killed even before it can connect to the usvhost socket. Do I need to create hugepages from inside the Docker container to make use of hugepages? Thanks, Amit.
[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode
Hi Stephen, > > +/* IOMMU types we support */ > > +static const struct vfio_iommu_type iommu_types[] = { > > + /* x86 IOMMU, otherwise known as type 1 */ > > + { VFIO_TYPE1_IOMMU, "Type 1", > &vfio_iommu_type1_dma_map}, > > + /* IOMMU-less mode */ > > + { VFIO_NOIOMMU_IOMMU, "No-IOMMU", > &vfio_iommu_noiommu_dma_map}, > > +}; > > + > > Nit.. Why full-tab indent here? Readability mainly... at least it's more readable to me that way. I can change that if necessary. Thanks, Anatoly
[dpdk-dev] Proposal for a big eal / ethdev cleanup
Hello all, Here is a proposal of a big cleanup in ethdev (cryptodev would have to follow) and eal structures. This is something I wanted to do for quite some time and the arrival of a new bus makes me think we need it. This is an alternative to what Jan proposed [1]. ABI is most likely broken with this, but I think this discussion can come later. First some context on how dpdk is initialized at the moment : Let's imagine a system with one ixgbe pci device and take some part of ixgbe driver as an example. static struct eth_driver rte_ixgbe_pmd = { .pci_drv = { .name = "rte_ixgbe_pmd", .id_table = pci_id_ixgbe_map, .drv_flags = RTE_PCI_DRV_NEED_MAPPING | RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_DETACHABLE, }, .eth_dev_init = eth_ixgbe_dev_init, .eth_dev_uninit = eth_ixgbe_dev_uninit, .dev_private_size = sizeof(struct ixgbe_adapter), }; static int rte_ixgbe_pmd_init(const char *name __rte_unused, const char *params __rte_unused) { PMD_INIT_FUNC_TRACE(); rte_eth_driver_register(&rte_ixgbe_pmd); return 0; } static struct rte_driver rte_ixgbe_driver = { .type = PMD_PDEV, .init = rte_ixgbe_pmd_init, }; PMD_REGISTER_DRIVER(rte_ixgbe_driver) DPDK initialisation goes as follows (focusing on ixgbe driver): PMD_REGISTER_DRIVER(rte_ixgbe_driver) which adds it to dev_driver_list rte_eal_init() -> rte_eal_pci_init() -> rte_eal_pci_scan() which fills pci_device_list -> rte_eal_dev_init() -> for each rte_driver (first vdev, then pdev), call driver->init() so here rte_ixgbe_pmd_init(NULL, NULL) -> rte_eth_driver_register(&rte_ixgbe_pmd); -> fills rte_ixgbe_pmd.pci_drv.devinit = rte_eth_dev_init -> call rte_eal_pci_register() which adds it to pci_driver_list -> rte_eal_pci_probe() -> for each rte_pci_device found in rte_eal_pci_scan(), and for all rte_pci_driver registered, call devinit(dr, dev), so here rte_eth_dev_init(dr, dev) -> creates a new eth_dev (which is a rte_eth_dev), then adds reference to passed dev pointer (which is a rte_pci_device), to passed dr pointer (which is a 
rte_pci_driver cast as a eth_driver) -> call eth_drv->eth_dev_init(eth_dev) so here eth_ixgbe_dev_init(eth_dev) -> fills other parts of eth_dev -> rte_eth_copy_pci_info(eth_dev, pci_dev) By the way, when invoking ethdev init, the pci-specific stuff is only handled in functions called from the drivers themselves, which already know that they are dealing with pci resources. Later in the life of the application, ethdev api is called for hotplug. int rte_eth_dev_attach(const char *devargs, uint8_t *port_id); A devargs is used to identify a vdev/pdev driver and call it to create a new rte_eth_dev. Current code goes as far as parsing devargs to understand if this is a pci device or a vdev. This part should be moved to eal since this is where all the "bus" logic is. So now, what I had in mind is something like this. It is far from perfect and I certainly did some shortcuts in my reasoning. Generic device/driver - introduce a rte_device structure, - a rte_device has a name, that identifies it in a unique way across all buses, maybe something like pci::00:01.0, and for vdev, vdev:name - a rte_device references a rte_driver, - a rte_device references devargs - a rte_device embeds a intr_handle - rte_device objects are created by "buses" - a function to retrieve rte_device objects based on name is added - current rte_driver does not need to know about the pmd_type (pdev/vdev), this is only a way to order drivers init in eal, we could use the rte_driver names to order them or maybe remove this ordering - init and uninit functions are changed to take a pointer to a rte_device rte_device and rte_driver structures are at the "bus" level. Those are the basic structures we will build the other objects on. 
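To make the generic device/driver idea above a bit more concrete, here is a minimal sketch. The struct layouts, the field names, find_device() and to_pci_device() are all assumptions made for illustration; the proposal only names rte_device and rte_driver and does not fix any of these details. The sketch shows a lookup by unique name, and one way a bus-specific device could embed the generic one so that bus code can get back from the generic pointer to its own object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct rte_device;

/* Hypothetical generic driver: init takes the generic device. */
struct rte_driver {
	const char *name;
	int (*init)(struct rte_device *dev);
};

/* Hypothetical generic device, created by a "bus". */
struct rte_device {
	const char *name;          /* unique across buses, e.g. "vdev:name" */
	struct rte_driver *driver; /* driver bound to this device, if any */
};

/* Hypothetical bus-specific device that embeds the generic one. */
struct rte_pci_device {
	struct rte_device device;
	unsigned int bus_specific_id; /* placeholder for PCI-only data */
};

/* Recover the enclosing object from a pointer to its embedded member. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct rte_pci_device *
to_pci_device(struct rte_device *dev)
{
	assert(dev != NULL);
	return container_of_sketch(dev, struct rte_pci_device, device);
}

/* Retrieve a device by its unique name; returns NULL if none matches. */
static struct rte_device *
find_device(struct rte_device **devs, size_t n, const char *name)
{
	assert(devs != NULL && name != NULL);

	for (size_t i = 0; i < n; i++)
		if (strcmp(devs[i]->name, name) == 0)
			return devs[i];
	return NULL;
}
```

Embedding rather than pointing keeps the generic and bus-specific parts in a single allocation, and the conversion back is plain pointer arithmetic, which is presumably why the Linux driver model uses the same pattern.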
Impact on PCI device/driver - rte_pci_device is modified to embed a rte_device (embedding makes it possible later to cast the rte_device and get the rte_pci_device in pci specific functions) - no need for a rte_pci_driver reference in rte_pci_device, since we have the rte_device driver - rte_pci_driver is modified to embed a rte_driver - no more devinit and devuninit functions in rte_pci_driver, they can be moved as init / uninit functions in rte_driver - pci scan code creates rte_pci_device objects, associates them to rte_pci_driver, fills the driver field of the rte_driver then pass them to rte_driver init function. rte_pci_device and rte_pci_driver are specific implementation of rte_device and rte_driver. There are there to maintain pci private methods, create upper layer objects (ethdev / crypto) etc.. Impact on vdev - introduce a rte_vdev_driver structure - a rte_vdev_driver embeds a rte_driver - a rte_vdev_driver has a priv size for vdev objects creation - no need for a vdev device object, this is specific to vdev drivers - eal
[dpdk-dev] Getting error while running DPDK test app on X-Gene1
On Thu, Jan 14, 2016 at 9:45 AM, Jerin Jacob wrote: > On Wed, Jan 13, 2016 at 03:52:01PM +0530, Ankit Jindal wrote: >> Hi, >> >> We are trying to run dpdk on our arm64 based SOC having Intel 10G >> ixgbe PCIe card plugged. While running any test app, we are getting >> following error. >> >> EAL: PCI device :01:00.0 on NUMA socket 0 >> EAL: probe driver: 8086:10fb rte_ixgbe_pmd >> EAL: Cannot open /sys/bus/pci/devices/:01:00.0/resource0: No such >> file or directory >> EAL: Error - exiting with code: 1 >> Cause: Requested device :01:00.0 cannot be used > > > pci resource creation patch is not yet part of the arm64 mainline kernel. > The following patch should fix the problem. > > http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358906.html Thanks, it fixed the problem. Thanks, Ankit > > Jerin > >> >> Below are the details on modules, hugepages and device binding. >> root at arm64:~# lsmod >> Module Size Used by >> rte_kni 292795 0 >> igb_uio 4338 0 >> ixgbe 184456 0 >> >> root at arm64:~/dpdk# cat >> /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages >> 2048 >> >> root at arm64:~/dpdk# ./tools/dpdk_nic_bind.py --status >> >> Network devices using DPDK-compatible driver >> >> :01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection' >> drv=igb_uio unused= >> :01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection' >> drv=igb_uio unused= >> >> Network devices using kernel driver >> === >> >> >> Other network devices >> = >> >> root at arm64:~/dpdk# >> >> Thanks, >> Ankit
[dpdk-dev] [PATCH 0/4] virtio support for container
Hi Amit, On 1/14/2016 5:34 PM, Amit Tomer wrote: > Hello, > >> Can you send out how you start this l2fwd program? > This is how, I run l2fwd program. > > CMD ["/usr/src/dpdk/examples/l2fwd/build/l2fwd", "-c", "0x3", "-n", > "4","--no-pci", > ,"--no-huge","--vdev=eth_cvio0,queue_num=256,rx=1,tx=1,cq=0,path=/usr/src/dpdk/usvhost", > "--", "-p", "0x1"] In this way, you can only get 64M memory. I believe it's too small to create a l2fwd_pktmbuf_pool in l2fwd. > I tried passing "-m 1024" to it but It causes l2fwd killed even before > it could connect to usvhost socket. In my patch, when --no-huge is specified, I change the previous anonymous mmap into file-backed memory in /dev/shm. And usually, Docker mounts a 64MB-size tmpfs there, so you cannot use -m 1024. If you want to do that, use -v to substitute the 64MB tmpfs with a bigger tmpfs. > > Do I need to create Hugepages from Inside Docker container to make use > of Hugepages? Not necessary. But if you want to use hugepages inside Docker, use the -v option to map a hugetlbfs into containers. Most importantly, you have indeed uncovered a bug here. The current implementation cannot work with tmpfs, because it lacks ftruncate() between open() and mmap(). It turns out that although mmap() succeeds, the memory cannot be touched. However, this is not a problem for hugetlbfs. I don't know why they differ in that way. In all, if you want to use no-huge, please add ftruncate(); I'll fix this in the next version. Thanks, Jianfeng > > Thanks, > Amit.
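The missing step described above can be sketched as follows: size the file with ftruncate() after open() and before mmap(), so that the mapped range is actually backed by the file. map_file_backed() is a hypothetical helper written for this illustration, not the function from the patch; on a regular or tmpfs file, touching mapped pages that lie beyond the end of the file raises SIGBUS, which would match the "mmap() succeeds but the memory cannot be touched" symptom.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or open) the file at path and map len bytes of it read/write.
 * Hypothetical helper for illustration. Returns NULL on failure.
 */
static void *
map_file_backed(const char *path, size_t len)
{
	void *addr;
	int fd;

	assert(path != NULL && len > 0);

	fd = open(path, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
		return NULL;

	/* Grow the file to len bytes so every mapped page is backed by it. */
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return NULL;
	}

	addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping stays valid after the descriptor is closed */

	return addr == MAP_FAILED ? NULL : addr;
}
```

A caller would pass a path on the tmpfs mount (for example something under /dev/shm) and later release the memory with munmap() and unlink(). Note that the tmpfs must still be large enough for len, which is the 64MB limit mentioned above.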
[dpdk-dev] Proposal for a big eal / ethdev cleanup
Hello David, nice to see that things are moving... On Thu, 14 Jan 2016 11:38:16 +0100 David Marchand wrote: > Hello all, > > Here is a proposal of a big cleanup in ethdev (cryptodev would have to > follow) and eal structures. > This is something I wanted to do for quite some time and the arrival of > a new bus makes me think we need it. > > This is an alternative to what Jan proposed [1]. As I need to extend DPDK by a non-PCI bus system, I would prefer any such working solution :). By [1], you probably mean: [1] http://comments.gmane.org/gmane.comp.networking.dpdk.devel/30973 (I didn't find it in the e-mail.) > > ABI is most likely broken with this, but I think this discussion can come > later. In my initial approach I was trying to stay as API and ABI backwards compatible as possible, to be acceptable upstream. As just a few people have shown their interest in these changes, I consider the ABI compatibility very important. I can see that your approach does not care too much... Otherwise, it looks good to me. It is closer to the Linux drivers infra, so it seems to be clearer than the current one. > > > First some context on how dpdk is initialized at the moment : > > Let's imagine a system with one ixgbe pci device and take some > part of ixgbe driver as an example. > > static struct eth_driver rte_ixgbe_pmd = { > .pci_drv = { > .name = "rte_ixgbe_pmd", > .id_table = pci_id_ixgbe_map, > .drv_flags = RTE_PCI_DRV_NEED_MAPPING | > RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_DETACHABLE, > }, > .eth_dev_init = eth_ixgbe_dev_init, > .eth_dev_uninit = eth_ixgbe_dev_uninit, > .dev_private_size = sizeof(struct ixgbe_adapter), > }; Note that the biggest issue here is that the eth_driver has no way to distinguish between PCI and other subsystems. There is no field that helps the generic ethdev code (librte_ether) to decide what bus we are on (and it needs to know it in the current DPDK). Another point is that the ethdev infra should not know about the underlying bus infra. 
The question is whether we do a big clean up (extract PCI-bus code out) or a small clean up (give the ethdev infra a hint which bus system it deals with). > > static int > rte_ixgbe_pmd_init(const char *name __rte_unused, const char *params > __rte_unused) > { > PMD_INIT_FUNC_TRACE(); > rte_eth_driver_register(&rte_ixgbe_pmd); > return 0; > } > > static struct rte_driver rte_ixgbe_driver = { > .type = PMD_PDEV, > .init = rte_ixgbe_pmd_init, > }; > > PMD_REGISTER_DRIVER(rte_ixgbe_driver) > > > DPDK initialisation goes as follows (focusing on ixgbe driver): > > PMD_REGISTER_DRIVER(rte_ixgbe_driver) which adds it to dev_driver_list > > rte_eal_init() > -> rte_eal_pci_init() > -> rte_eal_pci_scan() which fills pci_device_list > > -> rte_eal_dev_init() > -> for each rte_driver (first vdev, then pdev), call driver->init() > so here rte_ixgbe_pmd_init(NULL, NULL) >-> rte_eth_driver_register(&rte_ixgbe_pmd); > -> fills rte_ixgbe_pmd.pci_drv.devinit = rte_eth_dev_init > -> call rte_eal_pci_register() which adds it to pci_driver_list > > -> rte_eal_pci_probe() > -> for each rte_pci_device found in rte_eal_pci_scan(), and for all > rte_pci_driver registered, call devinit(dr, dev), > so here rte_eth_dev_init(dr, dev) >-> creates a new eth_dev (which is a rte_eth_dev), then adds > reference to passed dev pointer (which is a rte_pci_device), to > passed dr pointer (which is a rte_pci_driver cast as a eth_driver) >-> call eth_drv->eth_dev_init(eth_dev) > so here eth_ixgbe_dev_init(eth_dev) > -> fills other parts of eth_dev > -> rte_eth_copy_pci_info(eth_dev, pci_dev) > > > By the way, when invoking ethdev init, the pci-specific stuff is only > handled in functions called from the drivers themselves, which already > know that they are dealing with pci resources. This is an important note as it allows to (almost) avoid touching the current drivers. > > > Later in the life of the application, ethdev api is called for hotplug. 
> > int rte_eth_dev_attach(const char *devargs, uint8_t *port_id); > > A devargs is used to identify a vdev/pdev driver and call it to create a > new rte_eth_dev. > Current code goes as far as parsing devargs to understand if this is a > pci device or a vdev. > This part should be moved to eal since this is where all the "bus" logic > is. Parsing of devargs is quite wierd - I mean whitelisting and blacklisting. At the moment, it guesses whether the given argument is a PCI device identification or an arbitrary string (vdev). It is not easy to extend this reliably. OK, I can see you are addressing this below. > > > > So now, what I had in mind is something like this. > It is far from perfect and I certainly did some shortcuts in my > reasoning. > > > Generic device/driver > > - introduce
[dpdk-dev] [PATCH 0/4] virtio support for container
Hello, > > Not necessary. But if you want to use hugepages inside Docker, use -v option > to map a hugetlbfs into containers. I modified the Docker command line in order to make use of hugetlbfs: CMD ["/usr/src/dpdk/examples/l2fwd/build/l2fwd", "-c", "0x3", "-n", "4","--no-pci", "--socket-mem","512", "--vdev=eth_cvio0,queue_num=256,rx=1,tx=1,cq=0,path=/var/run/usvhost", "--", "-p", "0x1"] Then I run docker: docker run -i -t --privileged -v /dev/hugepages:/dev/hugepages -v /home/ubuntu/backup/usvhost:/var/run/usvhost l6 But this is what I see: EAL: Support maximum 128 logical core(s) by configuration. EAL: Detected 48 lcore(s) EAL: Setting up physically contiguous memory... EAL: Failed to find phys addr for 2 MB pages PANIC in rte_eal_init(): Cannot init memory 1: [/usr/src/dpdk/examples/l2fwd/build/l2fwd(rte_dump_stack+0x20) [0x48ea78]] This is from the host: # mount | grep hugetlbfs hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime) none on /dev/hugepages type hugetlbfs (rw,relatime) #cat /proc/meminfo | grep Huge AnonHugePages:548864 kB HugePages_Total:4096 HugePages_Free: 1024 HugePages_Rsvd:0 HugePages_Surp:0 Hugepagesize: 2048 kB What am I doing wrong here? Thanks, Amit
[dpdk-dev] UX Bug in Sphinx HTML Layout for Programming Guide (and maybe other guides?)
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matthew Hall > Sent: Wednesday, January 13, 2016 5:26 PM > To: dev at dpdk.org > Subject: [dpdk-dev] UX Bug in Sphinx HTML Layout for Programming Guide > (and maybe other guides?) > > When you go to this link: > > http://dpdk.org/doc/guides/prog_guide/perf_opt_guidelines.html > > There is a bug in the Sphinx layout, where the subchapters of a chapter > are invisible even after the chapter is clicked. Hi Matthew, It seems to be an issue with the way the documentation section headings are structured and with the new "ReadTheDocs" theme that we introduced for the DPDK 2.2 documentation. Basically, because the heading underline formats are not consistent they don't show up as subsections in the sidebar. The previous themes used for the docs were more forgiving about this. In theory the subsections should show up. See this simplified example: http://imgur.com/qkgxGvX I'll look into fixing it. In the meantime use the index.html pages of the various docs to navigate, e.g. http://dpdk.org/doc/guides/prog_guide/index.html John. --
[dpdk-dev] [PATCH 00/17] Update ixgbe base code
2015-11-20 15:17, Wenzhuo Lu: > Note: > Release note is not updated for the target of this patch is R2.3. > Send these patchs in case someone may hit related issues on new > platforms. Please update the release notes now. > Wenzhuo Lu (17): > ixgbe/base: update README > ixgbe/base: avoid needless PHY access on copper phys > ixgbe/base: do not wait for signature rule addition > ixgbe/base: use mvals values instead of defines > ixgbe/base: add Single-port Sage Pond device ID > ixgbe/base: remove driver config of KX4 PHY > ixgbe/base: add Flow Control Ethertype to ETQF filter list > ixgbe/base: add KR mode support > ixgbe/base: add flow director drop queue > ixgbe/base: check mac type for iosf and phy > ixgbe/base: configure x550 MDIO clock speed > ixgbe/base: fill at least min credits to a TC credit refills > ixgbe/base: support new thermal alarm > ixgbe/base: add new iXFI configuration helper function > ixgbe/base: prevent KR PHY reset in init > ixgbe/base: new defines for FW > ixgbe: add new device X550T1 Some titles have been reworded. Please try to advertise the scope and the bug in the title instead of the deep technical solution. Example: "fill at least min credits to a TC credit refills" is replaced by "fix Tx hang in CEE mode". Applied, thanks
[dpdk-dev] [PATCH 0/4] Optimize memcpy for AVX512 platforms
This patch set optimizes DPDK memcpy for AVX512 platforms, to make full utilization of hardware resources and deliver high performance. In current DPDK, memcpy holds a large proportion of execution time in libs like Vhost, especially for large packets, and this patch can bring considerable benefits. The implementation is based on the current DPDK memcpy framework, some background introduction can be found in these threads: http://dpdk.org/ml/archives/dev/2014-November/008158.html http://dpdk.org/ml/archives/dev/2015-January/011800.html Code changes are: 1. Read CPUID to check if AVX512 is supported by CPU 2. Predefine AVX512 macro if AVX512 is enabled by compiler 3. Implement AVX512 memcpy and choose the right implementation based on predefined macros 4. Decide alignment unit for memcpy perf test based on predefined macros Zhihong Wang (4): lib/librte_eal: Identify AVX512 CPU flag mk: Predefine AVX512 macro for compiler lib/librte_eal: Optimize memcpy for AVX512 platforms app/test: Adjust alignment unit for memcpy perf test app/test/test_memcpy_perf.c| 6 + .../common/include/arch/x86/rte_cpuflags.h | 2 + .../common/include/arch/x86/rte_memcpy.h | 247 - mk/rte.cpuflags.mk | 4 + 4 files changed, 255 insertions(+), 4 deletions(-) -- 2.5.0
[dpdk-dev] [PATCH 1/4] lib/librte_eal: Identify AVX512 CPU flag
Read CPUID to check if AVX512 is supported by CPU. Signed-off-by: Zhihong Wang --- lib/librte_eal/common/include/arch/x86/rte_cpuflags.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h b/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h index dd56553..89c0d9d 100644 --- a/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h +++ b/lib/librte_eal/common/include/arch/x86/rte_cpuflags.h @@ -131,6 +131,7 @@ enum rte_cpu_flag_t { RTE_CPUFLAG_ERMS, /**< ERMS */ RTE_CPUFLAG_INVPCID,/**< INVPCID */ RTE_CPUFLAG_RTM,/**< Transactional memory */ + RTE_CPUFLAG_AVX512F,/**< AVX512F */ /* (EAX 8001h) ECX features */ RTE_CPUFLAG_LAHF_SAHF, /**< LAHF_SAHF */ @@ -238,6 +239,7 @@ static const struct feature_entry cpu_feature_table[] = { FEAT_DEF(ERMS, 0x0007, 0, RTE_REG_EBX, 8) FEAT_DEF(INVPCID, 0x0007, 0, RTE_REG_EBX, 10) FEAT_DEF(RTM, 0x0007, 0, RTE_REG_EBX, 11) + FEAT_DEF(AVX512F, 0x0007, 0, RTE_REG_EBX, 16) FEAT_DEF(LAHF_SAHF, 0x8001, 0, RTE_REG_ECX, 0) FEAT_DEF(LZCNT, 0x8001, 0, RTE_REG_ECX, 4) -- 2.5.0
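For context, the table entry added by this patch says that AVX512F is reported in CPUID leaf 7, sub-leaf 0, register EBX, bit 16. A standalone sketch of the same check, outside the DPDK flag table, could look like the following. It assumes a GCC- or clang-style cpuid.h that provides __get_cpuid_count(); reg_bit_is_set() and cpu_has_avx512f() are names invented for this example. It only reports what the CPU advertises, not whether the OS has enabled the wider register state.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define AVX512F_EBX_BIT 16 /* CPUID.(EAX=7,ECX=0):EBX bit 16 */

/* Return 1 if the given bit of a 32-bit register value is set, else 0. */
static int
reg_bit_is_set(uint32_t reg, unsigned int bit)
{
	assert(bit < 32);
	return (int)((reg >> bit) & 1u);
}

/* Return 1 if the CPU advertises AVX512F, 0 otherwise (or on non-x86). */
static int
cpu_has_avx512f(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid_count() returns 0 when leaf 7 is not supported. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return reg_bit_is_set(ebx, AVX512F_EBX_BIT);
#else
	return 0;
#endif
}
```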
[dpdk-dev] [PATCH 2/4] mk: Predefine AVX512 macro for compiler
Predefine AVX512 macro if AVX512 is enabled by compiler. Signed-off-by: Zhihong Wang --- mk/rte.cpuflags.mk | 4 1 file changed, 4 insertions(+) diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk index 28f203b..19a3e7e 100644 --- a/mk/rte.cpuflags.mk +++ b/mk/rte.cpuflags.mk @@ -89,6 +89,10 @@ ifneq ($(filter $(AUTO_CPUFLAGS),__AVX2__),) CPUFLAGS += AVX2 endif +ifneq ($(filter $(AUTO_CPUFLAGS),__AVX512F__),) +CPUFLAGS += AVX512F +endif + # IBM Power CPU flags ifneq ($(filter $(AUTO_CPUFLAGS),__PPC64__),) CPUFLAGS += PPC64 -- 2.5.0
[dpdk-dev] [PATCH 3/4] lib/librte_eal: Optimize memcpy for AVX512 platforms
Implement AVX512 memcpy and choose the right implementation based on predefined macros, to make full utilization of hardware resources and deliver high performance. In current DPDK, memcpy holds a large proportion of execution time in libs like Vhost, especially for large packets, and this patch can bring considerable benefits for AVX512 platforms. The implementation is based on the current DPDK memcpy framework, some background introduction can be found in these threads: http://dpdk.org/ml/archives/dev/2014-November/008158.html http://dpdk.org/ml/archives/dev/2015-January/011800.html Signed-off-by: Zhihong Wang --- .../common/include/arch/x86/rte_memcpy.h | 247 - 1 file changed, 243 insertions(+), 4 deletions(-) diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h index 6a57426..fee954a 100644 --- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h +++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h @@ -37,7 +37,7 @@ /** * @file * - * Functions for SSE/AVX/AVX2 implementation of memcpy(). + * Functions for SSE/AVX/AVX2/AVX512 implementation of memcpy(). */ #include @@ -67,7 +67,246 @@ extern "C" { static inline void * rte_memcpy(void *dst, const void *src, size_t n) __attribute__((always_inline)); -#ifdef RTE_MACHINE_CPUFLAG_AVX2 +#ifdef RTE_MACHINE_CPUFLAG_AVX512F + +/** + * AVX512 implementation below + */ + +/** + * Copy 16 bytes from one location to another, + * locations should not overlap. + */ +static inline void +rte_mov16(uint8_t *dst, const uint8_t *src) +{ + __m128i xmm0; + + xmm0 = _mm_loadu_si128((const __m128i *)src); + _mm_storeu_si128((__m128i *)dst, xmm0); +} + +/** + * Copy 32 bytes from one location to another, + * locations should not overlap. 
+ */ +static inline void +rte_mov32(uint8_t *dst, const uint8_t *src) +{ + __m256i ymm0; + + ymm0 = _mm256_loadu_si256((const __m256i *)src); + _mm256_storeu_si256((__m256i *)dst, ymm0); +} + +/** + * Copy 64 bytes from one location to another, + * locations should not overlap. + */ +static inline void +rte_mov64(uint8_t *dst, const uint8_t *src) +{ + __m512i zmm0; + + zmm0 = _mm512_loadu_si512((const void *)src); + _mm512_storeu_si512((void *)dst, zmm0); +} + +/** + * Copy 128 bytes from one location to another, + * locations should not overlap. + */ +static inline void +rte_mov128(uint8_t *dst, const uint8_t *src) +{ + rte_mov64(dst + 0 * 64, src + 0 * 64); + rte_mov64(dst + 1 * 64, src + 1 * 64); +} + +/** + * Copy 256 bytes from one location to another, + * locations should not overlap. + */ +static inline void +rte_mov256(uint8_t *dst, const uint8_t *src) +{ + rte_mov64(dst + 0 * 64, src + 0 * 64); + rte_mov64(dst + 1 * 64, src + 1 * 64); + rte_mov64(dst + 2 * 64, src + 2 * 64); + rte_mov64(dst + 3 * 64, src + 3 * 64); +} + +/** + * Copy 128-byte blocks from one location to another, + * locations should not overlap. + */ +static inline void +rte_mov128blocks(uint8_t *dst, const uint8_t *src, size_t n) +{ + __m512i zmm0, zmm1; + + while (n >= 128) { + zmm0 = _mm512_loadu_si512((const void *)(src + 0 * 64)); + n -= 128; + zmm1 = _mm512_loadu_si512((const void *)(src + 1 * 64)); + src = src + 128; + _mm512_storeu_si512((void *)(dst + 0 * 64), zmm0); + _mm512_storeu_si512((void *)(dst + 1 * 64), zmm1); + dst = dst + 128; + } +} + +/** + * Copy 512-byte blocks from one location to another, + * locations should not overlap. 
+ */ +static inline void +rte_mov512blocks(uint8_t *dst, const uint8_t *src, size_t n) +{ + __m512i zmm0, zmm1, zmm2, zmm3, zmm4, zmm5, zmm6, zmm7; + + while (n >= 512) { + zmm0 = _mm512_loadu_si512((const void *)(src + 0 * 64)); + n -= 512; + zmm1 = _mm512_loadu_si512((const void *)(src + 1 * 64)); + zmm2 = _mm512_loadu_si512((const void *)(src + 2 * 64)); + zmm3 = _mm512_loadu_si512((const void *)(src + 3 * 64)); + zmm4 = _mm512_loadu_si512((const void *)(src + 4 * 64)); + zmm5 = _mm512_loadu_si512((const void *)(src + 5 * 64)); + zmm6 = _mm512_loadu_si512((const void *)(src + 6 * 64)); + zmm7 = _mm512_loadu_si512((const void *)(src + 7 * 64)); + src = src + 512; + _mm512_storeu_si512((void *)(dst + 0 * 64), zmm0); + _mm512_storeu_si512((void *)(dst + 1 * 64), zmm1); + _mm512_storeu_si512((void *)(dst + 2 * 64), zmm2); + _mm512_storeu_si512((void *)(dst + 3 * 64), zmm3); + _mm512_storeu_si512((void *)(dst + 4 * 64), zmm4); + _mm512_storeu_si512((void *)(dst + 5 * 64), zmm5); + _mm512_storeu_si512((void *)(dst + 6 * 64), zmm6); + _mm512_store
[dpdk-dev] [PATCH 4/4] app/test: Adjust alignment unit for memcpy perf test
Decide alignment unit for memcpy perf test based on predefined macros. Signed-off-by: Zhihong Wang --- app/test/test_memcpy_perf.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/app/test/test_memcpy_perf.c b/app/test/test_memcpy_perf.c index 754828e..73babec 100644 --- a/app/test/test_memcpy_perf.c +++ b/app/test/test_memcpy_perf.c @@ -79,7 +79,13 @@ static size_t buf_sizes[TEST_VALUE_RANGE]; #define TEST_BATCH_SIZE 100 /* Data is aligned on this many bytes (power of 2) */ +#ifdef RTE_MACHINE_CPUFLAG_AVX512F +#define ALIGNMENT_UNIT 64 +#elif RTE_MACHINE_CPUFLAG_AVX2 #define ALIGNMENT_UNIT 32 +#else /* RTE_MACHINE_CPUFLAG */ +#define ALIGNMENT_UNIT 16 +#endif /* RTE_MACHINE_CPUFLAG */ /* * Pointers used in performance tests. The two large buffers are for uncached -- 2.5.0
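As a small illustration of what the alignment unit is used for, the sketch below selects the unit from the same RTE_MACHINE_CPUFLAG_* macros and rounds an address up to it. It is a variation, not the patch itself: it uses #elif defined(...) rather than a bare #elif, which behaves the same when the flag is passed as -DNAME but also tolerates a macro defined with an empty value. align_up() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pick the alignment unit to match the widest vector register in use. */
#if defined(RTE_MACHINE_CPUFLAG_AVX512F)
#define ALIGNMENT_UNIT 64 /* 512-bit ZMM registers */
#elif defined(RTE_MACHINE_CPUFLAG_AVX2)
#define ALIGNMENT_UNIT 32 /* 256-bit YMM registers */
#else
#define ALIGNMENT_UNIT 16 /* 128-bit XMM registers */
#endif

/* Round addr up to the next multiple of ALIGNMENT_UNIT. */
static uintptr_t
align_up(uintptr_t addr)
{
	/* The mask trick below only works for powers of two. */
	assert((ALIGNMENT_UNIT & (ALIGNMENT_UNIT - 1)) == 0);

	return (addr + (ALIGNMENT_UNIT - 1)) & ~(uintptr_t)(ALIGNMENT_UNIT - 1);
}
```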
[dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new behavior switch to fdir
Hi Jingjing, On Thursday, January 01/14/16, 2016 at 00:48:17 -0800, Wu, Jingjing wrote: > Hi, Rahul > > Just another thought, please consider about it: > > Add a new flow type like > > #define RTE_ETH_FLOW_IPV6_UDP_EX17 > +#define RTE_ETH_FLOW_RAW_PKT 18 > > Then add a new item in rte_eth_fdir_flow > union rte_eth_fdir_flow { > struct rte_eth_l2_flow l2_flow; > struct rte_eth_udpv4_flow udp4_flow; > struct rte_eth_tcpv4_flow tcp4_flow; > struct rte_eth_sctpv4_flow sctp4_flow; > struct rte_eth_ipv4_flow ip4_flow; > struct rte_eth_udpv6_flow udp6_flow; > struct rte_eth_tcpv6_flow tcp6_flow; > struct rte_eth_sctpv6_flow sctp6_flow; > struct rte_eth_ipv6_flow ipv6_flow; > struct rte_eth_mac_vlan_flow mac_vlan_flow; > struct rte_eth_tunnel_flow tunnel_flow; > + uint8_t raw_pkt[80]; > }; > > Then add mask item in rte_eth_fdir_input: > > struct rte_eth_fdir_input { > uint16_t flow_type; > union rte_eth_fdir_flow flow; > + union rte_eth_fdir_flow flow_mask; > /**< Flow fields to match, dependent on flow_type */ > struct rte_eth_fdir_flow_ext flow_ext; > /**< Additional fields to match */ > }; > > Then the filter can be added just in a format of raw packet, it looks > generic, and even other NIC can use this too. > > Thanks > Jingjing This approach seems generic enough to allow any vendor specific data to be passed in filter as well. However, 80 seems to be too low for multiple flow types that can be combined in the same filter rule. I think size of 256 seems reasonable. Could the same thing be done for action arguments as well? Can we add the same generic info to rte_eth_fdir_action too? struct rte_eth_fdir_action { uint16_t rx_queue; enum rte_eth_fdir_behavior behavior; enum rte_eth_fdir_status report_status; uint8_t flex_off; + uint8_t behavior_arg[256]; }; This way, we can pass vendor specific action arguments too. What do you think? 
Also, now if we take this approach then, I am wondering, that all vendors would need to document their own vendor-specific format of taking filter match and filter action arguments, right? And probably, even come up with their own example application showing how to apply filters via dpdk on their card? Thanks, Rahul
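To illustrate what a raw-packet pattern plus a mask of the same size would express, here is a software sketch of the comparison: a packet matches when it agrees with the pattern on every bit that is set in the mask. raw_pkt_match() and RAW_PKT_LEN are names invented for this example; how a given NIC actually evaluates such a filter is vendor specific and not something this thread defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAW_PKT_LEN 80 /* size from the suggestion above; 256 was also proposed */

/*
 * Return 1 if pkt agrees with pattern on every bit set in mask, else 0.
 * Bytes past the end of the packet are treated as zero.
 */
static int
raw_pkt_match(const uint8_t *pkt, size_t pkt_len,
	      const uint8_t pattern[RAW_PKT_LEN],
	      const uint8_t mask[RAW_PKT_LEN])
{
	size_t i;

	assert(pkt != NULL && pattern != NULL && mask != NULL);

	for (i = 0; i < RAW_PKT_LEN; i++) {
		uint8_t b = i < pkt_len ? pkt[i] : 0;

		/* Any differing bit that the mask cares about is a mismatch. */
		if ((b ^ pattern[i]) & mask[i])
			return 0;
	}
	return 1;
}
```

With an all-zero mask every packet matches, which is one reason a documented per-vendor format for the pattern and mask would still be needed, as raised above.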
[dpdk-dev] [PATCH v4 00/14] Add virtio support for arm/arm64
Hi, This v4 patch uses vfio-noiommu-way to access virtio-net pci interface. Tested for arm64 thunderX platform. Patch builds for x86/i386/arm/armv8/thunderX. Tested with testpmd application. Refer to the v3 [1] cover letter for the dependency description. Steps to enable vfio-noiommu mode: - modprobe vfio-pci echo 1 > /sys/module/vfio/parameters/enable_unsafe_* - then bind ./tools/dpdk_nic_bind.py -b vfio-pci :00:03.0 - Testpmd application to try out: ./app/testpmd -c 0x3 -n 4 -- -i --portmask=0x0 --nb-cores=1 --port-topology=chained On the host side, ping the tapX interface and observe pkt_cnt on the guest side. For patch history from v1-->v3, please refer to the v3 cover letter [1]. v3--> v4: - Incorporated v3 review comments. Thanks to Stephen, Yuan, Bruce for the comments! - Tested for Huawei patch series titled "[PATCH v2 0/4] fix the issue that DPDK takes over virtio device blindly". - Patch no 11 and 13 are test-only patches used for this patch series [Anatoly/ Yuan] Major changes in the series: - Introducing vfio interface parse api in virtio pmd driver - Added vfio device specific private header in struct virtio_hw{} - Dummy in/outb x86-style api, just to avoid build errors for non-x86 arch in vfio mode. - VIRTIO_REG_RD/WR API(s) are now able to do rd/wr for both interfaces, i.e. for vfio and igb_uio/ioport bar. Tested both modes on x86, and vfio only on arm64 (non-x86) archs. To try out the complete patch set without cut-n-paste pain, clone this [2]. Thanks!
[1] http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/31117 [2] https://github.com/sshukla82/dpdk.git branch master-virtio-vfio-v4 Anatoly Burakov (1): vfio: Support for no-IOMMU mode Santosh Shukla (12): virtio: Introduce config RTE_VIRTIO_INC_VECTOR config: i686: set RTE_VIRTIO_INC_VECTOR=n linuxapp: eal: arm: Always return 0 for rte_eal_iopl_init() linuxapp/vfio: ignore mapping for ioport region virtio_pci.h: build fix for sys/io.h for non-x86 arch eal: pci: vfio: add rd/wr func for pci bar space virtio: vfio: add api support to rd/wr ioport bar virtio: pci: extend virtio pci rw api for vfio interface virtio: ethdev: check for vfio interface virtio: pci: add dummy func definition for in/outb for non-x86 arch config: armv7/v8: Enable RTE_LIBRTE_VIRTIO_PMD virtio: enable vfio in pmd driver Yuanhan Liu (1): eal: pci: export pci_[un]map_device config/common_linuxapp |1 + config/defconfig_arm-armv7a-linuxapp-gcc|4 +- config/defconfig_arm64-armv8a-linuxapp-gcc |4 +- config/defconfig_i686-native-linuxapp-gcc |1 + config/defconfig_i686-native-linuxapp-icc |1 + drivers/net/virtio/Makefile |2 +- drivers/net/virtio/virtio_ethdev.c | 124 ++- drivers/net/virtio/virtio_pci.h | 128 +-- drivers/net/virtio/virtio_rxtx.c|7 + drivers/net/virtio/virtio_vfio_rw.h | 107 + lib/librte_eal/bsdapp/eal/eal_pci.c |4 +- lib/librte_eal/bsdapp/eal/rte_eal_version.map |7 + lib/librte_eal/common/eal_common_pci.c |4 +- lib/librte_eal/common/eal_private.h | 18 --- lib/librte_eal/common/include/rte_pci.h | 65 lib/librte_eal/linuxapp/eal/Makefile|1 + lib/librte_eal/linuxapp/eal/eal.c |2 + lib/librte_eal/linuxapp/eal/eal_pci.c | 41 - lib/librte_eal/linuxapp/eal/eal_pci_init.h | 28 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 191 --- lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c | 84 ++ lib/librte_eal/linuxapp/eal/eal_vfio.h |5 + lib/librte_eal/linuxapp/eal/rte_eal_version.map |7 + 23 files changed, 740 insertions(+), 96 deletions(-) create mode 100644 drivers/net/virtio/virtio_vfio_rw.h 
create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c -- 1.7.9.5
[dpdk-dev] [PATCH v4 01/14] virtio: Introduce config RTE_VIRTIO_INC_VECTOR
virtio_recv_pkts_vec and the other virtio vector APIs are written for SSE/AVX instructions. For arm64 in particular, a virtio vector implementation does not exist yet (todo), so the virtio PMD driver won't build for targets like i686 and arm64. Setting RTE_VIRTIO_INC_VECTOR=n lets the driver build for non-SSE/AVX targets, where it works in non-vectored virtio mode.

Signed-off-by: Santosh Shukla
---
 config/common_linuxapp           | 1 +
 drivers/net/virtio/Makefile      | 2 +-
 drivers/net/virtio/virtio_rxtx.c | 7 +++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/config/common_linuxapp b/config/common_linuxapp
index 74bc515..8677697 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -274,6 +274,7 @@ CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_RX=n
 CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_TX=n
 CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DRIVER=n
 CONFIG_RTE_LIBRTE_VIRTIO_DEBUG_DUMP=n
+CONFIG_RTE_VIRTIO_INC_VECTOR=y

 #
 # Compile burst-oriented VMXNET3 PMD driver
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 43835ba..25a842d 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -50,7 +50,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtqueue.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_ethdev.c
-SRCS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio_rxtx_simple.c
+SRCS-$(CONFIG_RTE_VIRTIO_INC_VECTOR) += virtio_rxtx_simple.c

 # this lib depends upon:
 DEPDIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += lib/librte_eal lib/librte_ether
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 74b39ef..23be1ff 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -438,7 +438,9 @@ virtio_dev_rx_queue_setup(struct rte_eth_dev *dev,

 	dev->data->rx_queues[queue_idx] = vq;

+#ifdef RTE_VIRTIO_INC_VECTOR
 	virtio_rxq_vec_setup(vq);
+#endif

 	return 0;
 }
@@ -464,7 +466,10 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			const struct rte_eth_txconf *tx_conf)
 {
 	uint8_t vtpci_queue_idx = 2 * queue_idx + VTNET_SQ_TQ_QUEUE_IDX;
+
+#ifdef RTE_VIRTIO_INC_VECTOR
 	struct virtio_hw *hw = dev->data->dev_private;
+#endif
 	struct virtqueue *vq;
 	uint16_t tx_free_thresh;
 	int ret;
@@ -477,6 +482,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		return -EINVAL;
 	}

+#ifdef RTE_VIRTIO_INC_VECTOR
 	/* Use simple rx/tx func if single segment and no offloads */
 	if ((tx_conf->txq_flags & VIRTIO_SIMPLE_FLAGS) == VIRTIO_SIMPLE_FLAGS &&
 	     !vtpci_with_feature(hw, VIRTIO_NET_F_MRG_RXBUF)) {
@@ -485,6 +491,7 @@ virtio_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		dev->rx_pkt_burst = virtio_recv_pkts_vec;
 		use_simple_rxtx = 1;
 	}
+#endif

 	ret = virtio_dev_queue_setup(dev, VTNET_TQ, queue_idx, vtpci_queue_idx,
 			nb_desc, socket_id, &vq);
--
1.7.9.5
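For readers who want the effect of the new config switch in isolation, here is a minimal sketch of how a compile-time flag can gate the vector rx path. It is not part of the patch: recv_scalar, recv_vector and select_rx_burst are hypothetical stand-ins for the driver's burst functions and for the selection logic in virtio_dev_tx_queue_setup().

```c
#include <stdint.h>

/* Hypothetical stand-ins for the scalar and vector rx burst functions. */
typedef uint16_t (*rx_burst_fn)(void);

static uint16_t recv_scalar(void) { return 0; }

#ifdef RTE_VIRTIO_INC_VECTOR
static uint16_t recv_vector(void) { return 1; }
#endif

/*
 * Pick the rx burst function. The vector path is only considered when
 * the build enables RTE_VIRTIO_INC_VECTOR, the queue uses the simple
 * flags, and mergeable rx buffers are off.
 */
static rx_burst_fn select_rx_burst(int simple_flags_ok, int mrg_rxbuf)
{
#ifdef RTE_VIRTIO_INC_VECTOR
	if (simple_flags_ok && !mrg_rxbuf)
		return recv_vector;
#else
	(void)simple_flags_ok;
	(void)mrg_rxbuf;
#endif
	return recv_scalar;
}
```

With the flag undefined, the vector function is never compiled or referenced, which is the property that lets targets without SSE/AVX build.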
[dpdk-dev] [PATCH v4 02/14] config: i686: set RTE_VIRTIO_INC_VECTOR=n
The i686 target config (for example config/defconfig_i686-native-linuxapp-gcc) states "Vectorized PMD is not supported on 32-bit", so set RTE_VIRTIO_INC_VECTOR to 'n' for the i686 targets.

Signed-off-by: Santosh Shukla
---
 config/defconfig_i686-native-linuxapp-gcc | 1 +
 config/defconfig_i686-native-linuxapp-icc | 1 +
 2 files changed, 2 insertions(+)

diff --git a/config/defconfig_i686-native-linuxapp-gcc b/config/defconfig_i686-native-linuxapp-gcc
index a90de9b..a4b1c49 100644
--- a/config/defconfig_i686-native-linuxapp-gcc
+++ b/config/defconfig_i686-native-linuxapp-gcc
@@ -49,3 +49,4 @@ CONFIG_RTE_LIBRTE_KNI=n
 # Vectorized PMD is not supported on 32-bit
 #
 CONFIG_RTE_IXGBE_INC_VECTOR=n
+CONFIG_RTE_VIRTIO_INC_VECTOR=n
diff --git a/config/defconfig_i686-native-linuxapp-icc b/config/defconfig_i686-native-linuxapp-icc
index c021321..f8eb6ad 100644
--- a/config/defconfig_i686-native-linuxapp-icc
+++ b/config/defconfig_i686-native-linuxapp-icc
@@ -49,3 +49,4 @@ CONFIG_RTE_LIBRTE_KNI=n
 # Vectorized PMD is not supported on 32-bit
 #
 CONFIG_RTE_IXGBE_INC_VECTOR=n
+CONFIG_RTE_VIRTIO_INC_VECTOR=n
--
1.7.9.5
[dpdk-dev] [PATCH v4 03/14] linuxapp: eal: arm: Always return 0 for rte_eal_iopl_init()
The iopl() syscall is not supported on Linux for ARM/ARM64, so rte_eal_iopl_init() always returns 0 on those architectures.

Signed-off-by: Santosh Shukla
Suggested-by: Stephen Hemminger
Acked-by: Jan Viktorin
---
v3->v4:
- moved #ifdef to #elif, as suggested by Stephen.

 lib/librte_eal/linuxapp/eal/eal.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 635ec36..a2a3485 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -715,6 +715,8 @@ rte_eal_iopl_init(void)
 	if (iopl(3) != 0)
 		return -1;
 	return 0;
+#elif defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
+	return 0; /* iopl syscall not supported for ARM/ARM64 */
 #else
 	return -1;
 #endif
--
1.7.9.5
[dpdk-dev] [PATCH v4 04/14] linuxapp/vfio: ignore mapping for ioport region
vfio_pci_mmap() tries to map all PCI BARs. I/O port regions are not mappable through vfio in the kernel, so skip mmapping for I/O port BARs.

Signed-off-by: Santosh Shukla
---
v3->v4: per review comments from Stephen and Yuan.
- moved the ioport_bar variable declaration to the top of the function
- rearranged the log message to fit within 80 columns

 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 74f91ba..abde779 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -573,6 +573,7 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
 	struct pci_map *maps;
 	uint32_t msix_table_offset = 0;
 	uint32_t msix_table_size = 0;
+	uint32_t ioport_bar;

 	dev->intr_handle.fd = -1;
 	dev->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
@@ -760,6 +761,25 @@ pci_vfio_map_resource(struct rte_pci_device *dev)
 			return -1;
 		}

+		/* chk for io port region */
+		ret = pread64(vfio_dev_fd, &ioport_bar, sizeof(ioport_bar),
+			      VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX)
+			      + PCI_BASE_ADDRESS_0 + i*4);
+
+		if (ret != sizeof(ioport_bar)) {
+			RTE_LOG(ERR, EAL,
+				"Cannot read command (%x) from config space!\n",
+				PCI_BASE_ADDRESS_0 + i*4);
+			return -1;
+		}
+
+		if (ioport_bar & PCI_BASE_ADDRESS_SPACE_IO) {
+			RTE_LOG(INFO, EAL,
+				"Ignore mapping IO port bar(%d) addr: %x\n",
+				i, ioport_bar);
+			continue;
+		}
+
 		/* skip non-mmapable BARs */
 		if ((reg.flags & VFIO_REGION_INFO_FLAG_MMAP) == 0)
 			continue;
--
1.7.9.5
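The BAR test in the hunk above boils down to two small computations: the config-space offset of BAR i, and a check of bit 0 of the BAR value. A minimal sketch follows, assuming the usual PCI register layout. The _SKETCH constants carry the values that linux/pci_regs.h gives PCI_BASE_ADDRESS_0 and PCI_BASE_ADDRESS_SPACE_IO, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Same values as PCI_BASE_ADDRESS_0 / PCI_BASE_ADDRESS_SPACE_IO. */
#define BAR0_CFG_OFFSET_SKETCH 0x10
#define BAR_SPACE_IO_SKETCH    0x01

/* Config-space offset of BAR i (each BAR register is 4 bytes wide). */
static inline unsigned int bar_cfg_offset(int i)
{
	return BAR0_CFG_OFFSET_SKETCH + (unsigned int)i * 4;
}

/* Bit 0 of a BAR is set for I/O port space and clear for memory space. */
static inline int bar_is_ioport(uint32_t bar)
{
	return (bar & BAR_SPACE_IO_SKETCH) != 0;
}
```

A BAR value such as 0xc001 would be skipped by the patch's loop, while a memory BAR such as 0xfebd0000 would go on to the mmap path.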
[dpdk-dev] [PATCH v4 05/14] virtio_pci.h: build fix for sys/io.h for non-x86 arch
Make sure sys/io.h is included only on x86 architectures. This fixes the build error on arm/arm64.

Signed-off-by: Santosh Shukla
---
 drivers/net/virtio/virtio_pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 47f722a..8b5b031 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -40,8 +40,10 @@
 #include
 #include
 #else
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
 #include
 #endif
+#endif

 #include
--
1.7.9.5
[dpdk-dev] [PATCH v4 06/14] eal: pci: vfio: add rd/wr func for pci bar space
Introducing below api for pci bar space rd/wr. Currently used for pci iobar rd/wr. Api's are: - rte_eal_pci_read_bar - rte_eal_pci_write_bar virtio when used for vfio-mode then virtio driver will use these api to do rd/wr operation on ioport pci bar. Signed-off-by: Santosh Shukla --- v3->v4: - Using RTE_SET_USED(_var_) for unused variable for !VFIO_PRESENT case. As per v3 review comment from Bruce. lib/librte_eal/common/include/rte_pci.h| 38 lib/librte_eal/linuxapp/eal/eal_pci.c | 37 +++ lib/librte_eal/linuxapp/eal/eal_pci_init.h |6 + lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 28 4 files changed, 109 insertions(+) diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h index 334c12e..53437cc 100644 --- a/lib/librte_eal/common/include/rte_pci.h +++ b/lib/librte_eal/common/include/rte_pci.h @@ -471,6 +471,44 @@ int rte_eal_pci_read_config(const struct rte_pci_device *device, void *buf, size_t len, off_t offset); /** + * Read PCI bar space. + * + * @param device + * A pointer to a rte_pci_device structure describing the device + * to use + * @param buf + * A data buffer where the bytes should be read into + * @param len + * The length of the data buffer. + * @param offset + * The offset into PCI bar space + * @param bar_idx + * The pci bar index (valid range is 0..5) + */ +int rte_eal_pci_read_bar(const struct rte_pci_device *device, +void *buf, size_t len, off_t offset, int bar_idx); + +/** + * Write PCI bar space. + * + * @param device + * A pointer to a rte_pci_device structure describing the device + * to use + * @param buf + * A data buffer containing the bytes should be written + * @param len + * The length of the data buffer. + * @param offset + * The offset into PCI config space + * @param bar_idx + * The pci bar index (valid range is 0..5) +*/ +int rte_eal_pci_write_bar(const struct rte_pci_device *device, + const void *buf, size_t len, off_t offset, + int bar_idx); + + +/** * Write PCI config space. 
* * @param device diff --git a/lib/librte_eal/linuxapp/eal/eal_pci.c b/lib/librte_eal/linuxapp/eal/eal_pci.c index bc5b5be..8c1a49d 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci.c @@ -621,6 +621,43 @@ int rte_eal_pci_write_config(const struct rte_pci_device *device, } } +int rte_eal_pci_read_bar(const struct rte_pci_device *device, +void *buf, size_t len, off_t offset, +int bar_idx) + +{ +#ifdef VFIO_PRESENT + const struct rte_intr_handle *intr_handle = &device->intr_handle; + return pci_vfio_read_bar(intr_handle, buf, len, offset, bar_idx); +#else + /* UIO's not applicable */ + RTE_SET_USED(device); + RTE_SET_USED(buf); + RTE_SET_USED(len); + RTE_SET_USED(offset); + RTE_SET_USED(bar_idx); + return 0; +#endif +} + +int rte_eal_pci_write_bar(const struct rte_pci_device *device, + const void *buf, size_t len, off_t offset, + int bar_idx) +{ +#ifdef VFIO_PRESENT + const struct rte_intr_handle *intr_handle = &device->intr_handle; + return pci_vfio_write_bar(intr_handle, buf, len, offset, bar_idx); +#else + /* UIO's not applicable */ + RTE_SET_USED(device); + RTE_SET_USED(buf); + RTE_SET_USED(len); + RTE_SET_USED(offset); + RTE_SET_USED(bar_idx); + return 0; +#endif +} + /* Init the PCI EAL subsystem */ int rte_eal_pci_init(void) diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/eal_pci_init.h index a17c708..3bc592b 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h +++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h @@ -68,6 +68,12 @@ int pci_vfio_read_config(const struct rte_intr_handle *intr_handle, int pci_vfio_write_config(const struct rte_intr_handle *intr_handle, const void *buf, size_t len, off_t offs); +int pci_vfio_read_bar(const struct rte_intr_handle *intr_handle, + void *buf, size_t len, off_t offs, int bar_idx); + +int pci_vfio_write_bar(const struct rte_intr_handle *intr_handle, + const void *buf, size_t len, off_t offs, int bar_idx); + /* map VFIO resource prototype */ 
int pci_vfio_map_resource(struct rte_pci_device *dev); int pci_vfio_get_group_fd(int iommu_group_fd); diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c index abde779..df407ef 100644 --- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c +++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c @@ -93,6 +93,34 @@ pci_vfio_write_config(const struct rte_intr_handle *intr_handle, VFIO_GET_REGION_ADDR(VFIO_PCI_CONFIG_REGION_INDEX)
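The body of pci_vfio_read_bar() is cut off in this archive copy, so the following is only a sketch of how a BAR read through the vfio device fd could look. It assumes the same region-offset scheme as the VFIO_GET_REGION_ADDR macro (region index shifted left by 40) and a 64-bit off_t. vfio_bar_file_offset and bar_read_sketch are hypothetical names, not the patch's functions.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Region index is encoded in the upper bits of the file offset. */
#define REGION_SHIFT_SKETCH 40

/* File offset on the vfio device fd for byte 'offs' inside BAR 'bar_idx'. */
static inline uint64_t vfio_bar_file_offset(int bar_idx, uint64_t offs)
{
	return ((uint64_t)bar_idx << REGION_SHIFT_SKETCH) + offs;
}

/*
 * Read 'len' bytes from a BAR. Returns the byte count on success and
 * -1 on a failed or short read. Assumes off_t is 64 bits wide.
 */
static inline int bar_read_sketch(int vfio_dev_fd, void *buf, size_t len,
				  uint64_t offs, int bar_idx)
{
	ssize_t n = pread(vfio_dev_fd, buf, len,
			  (off_t)vfio_bar_file_offset(bar_idx, offs));

	return n == (ssize_t)len ? (int)n : -1;
}
```

Returning the byte count matches how the ioport_in*() helpers later in the series treat a result <= 0 as failure.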
[dpdk-dev] [PATCH v4 07/14] virtio: vfio: add api support to rd/wr ioport bar
For vfio case - Use pread/pwrite api to access virtio ioport space. Signed-off-by: Santosh Shukla Signed-off-by: Rizwan Ansari Signed-off-by: Rakesh Krishnamurthy --- v3->v4: - Corrected debug error message for oub_ class of apis - renamed file from virtio_vfio.h to virtio_vfio_rw.h - Removed #ifdef , #else clutter so that now no #else condition to handle for! this file will work for all the arch which is using vfio interface. drivers/net/virtio/virtio_vfio_rw.h | 107 +++ 1 file changed, 107 insertions(+) create mode 100644 drivers/net/virtio/virtio_vfio_rw.h diff --git a/drivers/net/virtio/virtio_vfio_rw.h b/drivers/net/virtio/virtio_vfio_rw.h new file mode 100644 index 000..80a67f4 --- /dev/null +++ b/drivers/net/virtio/virtio_vfio_rw.h @@ -0,0 +1,107 @@ +/* + * BSD LICENSE + * + * Copyright(c) 2016 Cavium Networks. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + *THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + *"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + *LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + *A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + *OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + *SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + *LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + *DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + *THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + *(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + *OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + */ +#ifndef _VIRTIO_VFIO_RW_H_ +#define _VIRTIO_VFIO_RW_H_ + +#if defined(RTE_EAL_VFIO) && defined(RTE_LIBRTE_EAL_LINUXAPP) + +#include +#include +#include "virtio_logs.h" + +/* vfio rd/rw virtio apis */ +static inline void ioport_inb(const struct rte_pci_device *pci_dev, + uint8_t reg, uint8_t *val) +{ + if (rte_eal_pci_read_bar(pci_dev, (uint8_t *)val, sizeof(uint8_t), reg, +0) <= 0) { + PMD_DRV_LOG(ERR, "Can't read from PCI bar space"); + return; + } +} + +static inline void ioport_inw(const struct rte_pci_device *pci_dev, + uint16_t reg, uint16_t *val) +{ + if (rte_eal_pci_read_bar(pci_dev, (uint16_t *)val, sizeof(uint16_t), +reg, 0) <= 0) { + PMD_DRV_LOG(ERR, "Can't read from PCI bar space"); + return; + } +} + +static inline void ioport_inl(const struct rte_pci_device *pci_dev, + uint32_t reg, uint32_t *val) +{ + if (rte_eal_pci_read_bar(pci_dev, (uint32_t *)val, sizeof(uint32_t), +reg, 0) <= 0) { + PMD_DRV_LOG(ERR, "Can't read from PCI bar space"); + return; + } +} + +static inline void ioport_outb_p(const struct rte_pci_device *pci_dev, +uint8_t reg, uint8_t val) +{ + if (rte_eal_pci_write_bar(pci_dev, (uint8_t *)&val, sizeof(uint8_t), + reg, 0) <= 0) { + PMD_DRV_LOG(ERR, "Can't write to PCI bar space"); + return; + } +} + + +static inline void ioport_outw_p(const struct rte_pci_device *pci_dev, +uint16_t reg, uint16_t val) +{ + if (rte_eal_pci_write_bar(pci_dev, (uint16_t *)&val, sizeof(uint16_t), + reg, 0) <= 0) { + 
PMD_DRV_LOG(ERR, "Can't write to PCI bar space"); + return; + } +} + + +static inline void ioport_outl_p(const struct rte_pci_device *pci_dev, +uint32_t reg, uint32_t val) +{ + if (rte_eal_pci_write_bar(pci_dev, (uint32_t *)&val, sizeof(uint32_t), + reg, 0) <= 0) { + PMD_DRV_LOG(ERR, "Can't write to PCI bar space"); + return; + } +} + +#endif /* RTE_EAL_VFIO && RTE_XX_EAL_LINUXAPP */ +#endif /* _VIRTIO_VFI
[dpdk-dev] [PATCH v4 08/14] virtio: pci: extend virtio pci rw api for vfio interface
So far virtio handle rw access for uio / ioport interface, This patch to extend the support for vfio interface. For that introducing private struct virtio_vfio_dev{ - is_vfio - pci_dev }; Signed-off-by: Santosh Shukla --- v3->v4: - Removed #indef RTE_EAL_VFIO and made it arch agnostic such now virtio_pci rd/wr api to handle both vfio and ig_uio/ioport interfaces, depending upon is_vfio flags set or unset. - Tested for x86 for igb_uio and vfio interface, also tested for arm64 for vfio interface. drivers/net/virtio/virtio_pci.h | 84 --- 1 file changed, 70 insertions(+), 14 deletions(-) diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h index 8b5b031..8526c07 100644 --- a/drivers/net/virtio/virtio_pci.h +++ b/drivers/net/virtio/virtio_pci.h @@ -46,6 +46,8 @@ #endif #include +#include +#include "virtio_vfio_rw.h" struct virtqueue; @@ -165,6 +167,14 @@ struct virtqueue; */ #define VIRTIO_MAX_VIRTQUEUES 8 +/* For vfio only */ +struct virtio_vfio_dev { + boolis_vfio;/* True: vfio i/f, +* False: not a vfio i/f +*/ + struct rte_pci_device *pci_dev; /* vfio dev */ +}; + struct virtio_hw { struct virtqueue *cvq; uint32_tio_base; @@ -176,6 +186,7 @@ struct virtio_hw { uint8_t use_msix; uint8_t started; uint8_t mac_addr[ETHER_ADDR_LEN]; + struct virtio_vfio_dev dev; }; /* @@ -231,20 +242,65 @@ outl_p(unsigned int data, unsigned int port) #define VIRTIO_PCI_REG_ADDR(hw, reg) \ (unsigned short)((hw)->io_base + (reg)) -#define VIRTIO_READ_REG_1(hw, reg) \ - inb((VIRTIO_PCI_REG_ADDR((hw), (reg -#define VIRTIO_WRITE_REG_1(hw, reg, value) \ - outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg - -#define VIRTIO_READ_REG_2(hw, reg) \ - inw((VIRTIO_PCI_REG_ADDR((hw), (reg -#define VIRTIO_WRITE_REG_2(hw, reg, value) \ - outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg - -#define VIRTIO_READ_REG_4(hw, reg) \ - inl((VIRTIO_PCI_REG_ADDR((hw), (reg -#define VIRTIO_WRITE_REG_4(hw, reg, value) \ - outl_p((unsigned int)(value), 
(VIRTIO_PCI_REG_ADDR((hw), (reg +#define VIRTIO_READ_REG_1(hw, reg) \ +({ \ + uint8_t ret;\ + struct virtio_vfio_dev *vdev; \ + (vdev) = (&(hw)->dev); \ + (((vdev)->is_vfio) ?\ + (ioport_inb(((vdev)->pci_dev), reg, &ret)) :\ + ((ret) = (inb((VIRTIO_PCI_REG_ADDR((hw), (reg))); \ + ret;\ +}) + +#define VIRTIO_WRITE_REG_1(hw, reg, value) \ +({ \ + struct virtio_vfio_dev *vdev; \ + (vdev) = (&(hw)->dev); \ + (((vdev)->is_vfio) ?\ + (ioport_outb_p(((vdev)->pci_dev), reg, (uint8_t)(value))) : \ + (outb_p((unsigned char)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg)); \ +}) + +#define VIRTIO_READ_REG_2(hw, reg) \ +({ \ + uint16_t ret; \ + struct virtio_vfio_dev *vdev; \ + (vdev) = (&(hw)->dev); \ + (((vdev)->is_vfio) ?\ + (ioport_inw(((vdev)->pci_dev), reg, &ret)) :\ + ((ret) = (inw((VIRTIO_PCI_REG_ADDR((hw), (reg))); \ + ret;\ +}) + +#define VIRTIO_WRITE_REG_2(hw, reg, value) \ +({ \ + struct virtio_vfio_dev *vdev; \ + (vdev) = (&(hw)->dev); \ + (((vdev)->is_vfio) ?\ + (ioport_outw_p(((vdev)->pci_dev), reg, (uint16_t)(value))) :\ + (outw_p((unsigned short)(value), (VIRTIO_PCI_REG_ADDR((hw), (reg)); \ +}) + +#define VIRTIO_READ_REG_4(hw, reg) \ +({ \ + uint32_t ret; \ + struct vi
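As an aside, the is_vfio dispatch done by these macros can also be written as a static inline function, which avoids GCC statement expressions and a ?: that mixes a void call with an assignment. This is only a sketch and not what the patch does: struct hw_sketch, read_reg_1 and the two fake back ends are hypothetical stand-ins for struct virtio_hw, VIRTIO_READ_REG_1, ioport_inb() and inb().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct virtio_hw. */
struct hw_sketch {
	bool is_vfio;                         /* true: vfio i/f */
	uint32_t io_base;                     /* ioport base for the non-vfio path */
	uint8_t (*vfio_read8)(uint32_t reg);  /* stand-in for ioport_inb() */
	uint8_t (*port_read8)(uint16_t port); /* stand-in for inb() */
};

/* Fake back ends so the sketch runs without hardware access. */
static uint8_t fake_vfio_read8(uint32_t reg) { return (uint8_t)(reg + 1); }
static uint8_t fake_port_read8(uint16_t port) { return (uint8_t)(port & 0xff); }

/* Runtime dispatch on is_vfio, the same decision the macro makes. */
static inline uint8_t read_reg_1(const struct hw_sketch *hw, uint32_t reg)
{
	if (hw->is_vfio)
		return hw->vfio_read8(reg);
	return hw->port_read8((uint16_t)(hw->io_base + reg));
}
```

The 2- and 4-byte accessors and the write side would follow the same pattern.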
[dpdk-dev] [PATCH v4 09/14] virtio: ethdev: check for vfio interface
Introducing api to check interface type is vfio or not, if interface is vfio then update struct virtio_vfio_dev {}. Those two apis are: - virtio_chk_for_vfio - virtio_hw_init_by_vfio Signed-off-by: Santosh Shukla --- v3->v4: - Removed RTE_PCI_DRV_NEED_MAPPING drv flag (as per Review comment from Stephen and Suggested by Yuan) - Introducing vfio interface parsing api which will set/unset is_vfio flag at runtime. drivers/net/virtio/virtio_ethdev.c | 112 +++- 1 file changed, 110 insertions(+), 2 deletions(-) diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c index d928339..8f2260f 100644 --- a/drivers/net/virtio/virtio_ethdev.c +++ b/drivers/net/virtio/virtio_ethdev.c @@ -1202,6 +1202,105 @@ static int virtio_resource_init(struct rte_pci_device *pci_dev) return virtio_resource_init_by_ioports(pci_dev); } +static int virtio_chk_for_vfio(struct rte_pci_device *pci_dev) +{ + /* +* 1. check whether vfio-noiommu mode is enabled +* 2. verify pci device attached to vfio-noiommu driver +* root at arm64:/sys/bus/pci/drivers/vfio-pci/:00:01.0/iommu_group# +* > cat name +* > vfio-noiommu +*/ + + /* 1. Chk for vfio: noiommu mode set or not in kernel driver */ + struct rte_pci_addr *loc; + FILE *fp; + const char *path = "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"; + char filename[PATH_MAX] = {0}; + char buf[PATH_MAX] = {0}; + + fp = fopen(path, "r"); + if (fp == NULL) { + PMD_INIT_LOG(ERR, "can't open %s\n", path); + return -1; + } + + if (fread(buf, sizeof(char), 1, fp) != 1) { + PMD_INIT_LOG(ERR, "can't read from file %s\n", path); + fclose(fp); + return -1; + } + + if (strncmp(buf, "Y", 1) != 0) { + PMD_INIT_LOG(ERR, "[%s]: vfio: noiommu mode not set\n", path); + fclose(fp); + return -1; + } + + fclose(fp); + + /* 2. 
Verify pci device attached to vfio-noiommu driver */ + + /* 2.1 chk whether attached driver is vfio-noiommu or not */ + loc = &pci_dev->addr; + snprintf(filename, sizeof(filename), +SYSFS_PCI_DEVICES "/" PCI_PRI_FMT "/iommu_group/name", +loc->domain, loc->bus, loc->devid, loc->function); + + /* check for vfio-noiommu */ + fp = fopen(filename, "r"); + if (fp == NULL) { + PMD_INIT_LOG(ERR, "can't open %s\n", filename); + return -1; + } + + if (fread(buf, sizeof(char), sizeof("vfio-noiommu"), fp) != + sizeof("vfio-noiommu")) { + PMD_INIT_LOG(ERR, "can't read from file %s\n", filename); + fclose(fp); + return -1; + } + + if (strncmp(buf, "vfio-noiommu", strlen("vfio-noiommu")) != 0) { + PMD_INIT_LOG(ERR, "not a vfio-noiommu driver\n"); + fclose(fp); + return -1; + } + + fclose(fp); + + /* todo: vfio interrupt handling */ + return 0; +} + +/* Init virtio by vfio-way */ +static int virtio_hw_init_by_vfio(struct virtio_hw *hw, + struct rte_pci_device *pci_dev) +{ + struct virtio_vfio_dev *vdev; + + vdev = &hw->dev; + if (virtio_chk_for_vfio(pci_dev) < 0) { + vdev->is_vfio = false; + vdev->pci_dev = NULL; + return -1; + } + + /* .. 
So attached interface is vfio */ + vdev->is_vfio = true; + vdev->pci_dev = pci_dev; + + /* For debug use only */ + const struct rte_intr_handle *intr_handle; + RTE_SET_USED(intr_handle); /* to keep compilar happy */ + intr_handle = &pci_dev->intr_handle; + PMD_INIT_LOG(DEBUG, "vdev->pci_dev %p intr_handle %p vfio_dev_fd %d\n", +vdev->pci_dev, intr_handle, +intr_handle->vfio_dev_fd); + + return 0; +} + #else static int virtio_has_msix(const struct rte_pci_addr *loc __rte_unused) @@ -1215,6 +1314,13 @@ static int virtio_resource_init(struct rte_pci_device *pci_dev __rte_unused) /* no setup required */ return 0; } + +static int virtio_hw_init_by_vfio(struct virtio_hw *hw __rte_unused, + struct rte_pci_device *pci_dev __rte_unused) +{ + /* NA */ + return 0; +} #endif /* @@ -1287,8 +1393,10 @@ eth_virtio_dev_init(struct rte_eth_dev *eth_dev) pci_dev = eth_dev->pci_dev; - if (virtio_resource_init(pci_dev) < 0) - return -1; + if (virtio_hw_init_by_vfio(hw, pci_dev) < 0) { + if (virtio_resource_init(pci_dev) < 0) + return -1; + } hw->use_msix = virtio_has_msix(&pci_dev->addr); hw->io_base = (uint32_t)(uintptr_t)pci_dev-
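virtio_chk_for_vfio() reads both sysfs attributes with fread() and fixed byte counts; the second read asks for sizeof("vfio-noiommu") bytes, so it depends on the file holding the name plus a trailing newline. A line-based read is one possible alternative. The sketch below is an assumption about how that could look, not a proposed replacement from the thread; read_first_line and file_says are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a text file into buf, with the trailing
 * newline stripped. Returns 0 on success, -1 on error.
 */
static int read_first_line(const char *path, char *buf, size_t len)
{
	FILE *fp = fopen(path, "r");

	if (fp == NULL)
		return -1;
	if (fgets(buf, (int)len, fp) == NULL) {
		fclose(fp);
		return -1;
	}
	fclose(fp);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Returns 1 if the file's first line equals 'expected', 0 if not, -1 on error. */
static int file_says(const char *path, const char *expected)
{
	char buf[64];

	if (read_first_line(path, buf, sizeof(buf)) < 0)
		return -1;
	return strcmp(buf, expected) == 0;
}
```

Under this sketch the two checks would become file_says(path, "Y") for enable_unsafe_noiommu_mode and file_says(filename, "vfio-noiommu") for the iommu_group name.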
[dpdk-dev] [PATCH v4 10/14] virtio: pci: add dummy func definition for in/outb for non-x86 arch
On non-x86 architectures the compiler reports build errors for the in/out port APIs. Add dummy implementations so that the build passes.

Note: for virtio to work on non-x86 architectures, RTE_EAL_VFIO is the only supported method; RTE_EAL_IGB_UIO is not supported there.

Virtio support by architecture and interface:

ARCH     IGB_UIO   VFIO
x86      Y         Y
ARM64    N/A       Y
PPC_64   N/A       Y (not tested, but likely to work since vfio is arch independent)

Signed-off-by: Santosh Shukla
---
v4:
- dummy inb/outb functions for non-x86 archs, intended to get rid of the build error.

 drivers/net/virtio/virtio_pci.h | 42 +++
 1 file changed, 42 insertions(+)

diff --git a/drivers/net/virtio/virtio_pci.h b/drivers/net/virtio/virtio_pci.h
index 8526c07..600260a 100644
--- a/drivers/net/virtio/virtio_pci.h
+++ b/drivers/net/virtio/virtio_pci.h
@@ -239,6 +239,48 @@ outl_p(unsigned int data, unsigned int port)
 }
 #endif

+#if !defined(RTE_ARCH_X86_64) && !defined(RTE_ARCH_I686) && \
+	defined(RTE_EXEC_ENV_LINUXAPP)
+static inline uint8_t inb(unsigned long addr __rte_unused)
+{
+	PMD_INIT_LOG(ERR, "inb() not supported for this RTE_ARCH\n");
+	return 0;
+}
+
+static inline uint16_t inw(unsigned long addr __rte_unused)
+{
+	PMD_INIT_LOG(ERR, "inw() not supported for this RTE_ARCH\n");
+	return 0;
+}
+
+static inline uint32_t inl(unsigned long addr __rte_unused)
+{
+	PMD_INIT_LOG(ERR, "inl() not supported for this RTE_ARCH\n");
+	return 0;
+}
+
+static inline void
+outb_p(unsigned char data __rte_unused, unsigned int port __rte_unused)
+{
+	PMD_INIT_LOG(ERR, "outb_p() not supported for this RTE_ARCH\n");
+	return;
+}
+
+static inline void
+outw_p(unsigned short data __rte_unused, unsigned int port __rte_unused)
+{
+	PMD_INIT_LOG(ERR, "outw_p() not supported for this RTE_ARCH\n");
+	return;
+}
+
+static inline void
+outl_p(unsigned int data __rte_unused, unsigned int port __rte_unused)
+{
+	PMD_INIT_LOG(ERR, "outl_p() not supported for this RTE_ARCH\n");
+	return;
+}
+#endif
+
 #define VIRTIO_PCI_REG_ADDR(hw, reg) \
 	(unsigned short)((hw)->io_base + (reg))

--
1.7.9.5
[dpdk-dev] [PATCH v4 11/14] config: armv7/v8: Enable RTE_LIBRTE_VIRTIO_PMD
Enable RTE_LIBRTE_VIRTIO_PMD for armv7/v8 and set RTE_VIRTIO_INC_VECTOR=n. Builds successfully for armv7/v8.

Signed-off-by: Santosh Shukla
---
v2->v4:
- Removed explicit setting of VIRTIO_PMD in configs (per review comment from Jiangbo)

 config/defconfig_arm-armv7a-linuxapp-gcc   | 4 +++-
 config/defconfig_arm64-armv8a-linuxapp-gcc | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
index cbebd64..9f852ce 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -43,6 +43,9 @@ CONFIG_RTE_FORCE_INTRINSICS=y
 CONFIG_RTE_TOOLCHAIN="gcc"
 CONFIG_RTE_TOOLCHAIN_GCC=y

+# Disable VIRTIO VECTOR support
+CONFIG_RTE_VIRTIO_INC_VECTOR=n
+
 # ARM doesn't have support for vmware TSC map
 CONFIG_RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT=n

@@ -70,7 +73,6 @@ CONFIG_RTE_LIBRTE_I40E_PMD=n
 CONFIG_RTE_LIBRTE_IXGBE_PMD=n
 CONFIG_RTE_LIBRTE_MLX4_PMD=n
 CONFIG_RTE_LIBRTE_MPIPE_PMD=n
-CONFIG_RTE_LIBRTE_VIRTIO_PMD=n
 CONFIG_RTE_LIBRTE_VMXNET3_PMD=n
 CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
 CONFIG_RTE_LIBRTE_PMD_BNX2X=n
diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
index 504f3ed..1a638b3 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -45,8 +45,10 @@ CONFIG_RTE_TOOLCHAIN_GCC=y

 CONFIG_RTE_CACHE_LINE_SIZE=64

+# Disable VIRTIO VECTOR support
+CONFIG_RTE_VIRTIO_INC_VECTOR=n
+
 CONFIG_RTE_IXGBE_INC_VECTOR=n
-CONFIG_RTE_LIBRTE_VIRTIO_PMD=n
 CONFIG_RTE_LIBRTE_IVSHMEM=n
 CONFIG_RTE_LIBRTE_FM10K_PMD=n
 CONFIG_RTE_LIBRTE_I40E_PMD=n
--
1.7.9.5
[dpdk-dev] [PATCH v4 12/14] eal: pci: export pci_[un]map_device
From: Yuanhan Liu Normally we could set RTE_PCI_DRV_NEED_MAPPING flag so that eal will invoke pci_map_device internally for us. From that point view, there is no need to export pci_map_device. However, for virtio pmd driver, which is designed to work without binding UIO (or something similar first), pci_map_device() will fail, which ends up with virtio pmd driver being skipped. Therefore, we can not set RTE_PCI_DRV_NEED_MAPPING blindly at virtio pmd driver. Therefore, this patch exports pci_map_device, and let virtio pmd call it when necessary. Cc: David Marchand Signed-off-by: Yuanhan Liu Tested-by: Santosh Shukla --- - Pulled Yuan v3 patch, just for testing and other user to try out complete set and test it on HW i.e. x86 and non-x86 both. It works for VFIO mode seemlessly. lib/librte_eal/bsdapp/eal/eal_pci.c |4 ++-- lib/librte_eal/bsdapp/eal/rte_eal_version.map |7 ++ lib/librte_eal/common/eal_common_pci.c |4 ++-- lib/librte_eal/common/eal_private.h | 18 --- lib/librte_eal/common/include/rte_pci.h | 27 +++ lib/librte_eal/linuxapp/eal/eal_pci.c |4 ++-- lib/librte_eal/linuxapp/eal/rte_eal_version.map |7 ++ 7 files changed, 47 insertions(+), 24 deletions(-) diff --git a/lib/librte_eal/bsdapp/eal/eal_pci.c b/lib/librte_eal/bsdapp/eal/eal_pci.c index 6c21fbd..95c32c1 100644 --- a/lib/librte_eal/bsdapp/eal/eal_pci.c +++ b/lib/librte_eal/bsdapp/eal/eal_pci.c @@ -93,7 +93,7 @@ pci_unbind_kernel_driver(struct rte_pci_device *dev __rte_unused) /* Map pci device */ int -pci_map_device(struct rte_pci_device *dev) +rte_eal_pci_map_device(struct rte_pci_device *dev) { int ret = -1; @@ -115,7 +115,7 @@ pci_map_device(struct rte_pci_device *dev) /* Unmap pci device */ void -pci_unmap_device(struct rte_pci_device *dev) +rte_eal_pci_unmap_device(struct rte_pci_device *dev) { /* try unmapping the NIC resources */ switch (dev->kdrv) { diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map index 9d7adf1..1b28170 100644 --- 
a/lib/librte_eal/bsdapp/eal/rte_eal_version.map +++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map @@ -135,3 +135,10 @@ DPDK_2.2 { rte_xen_dom0_supported; } DPDK_2.1; + +DPDK_2.3 { + global: + + rte_eal_pci_map_device; + rte_eal_pci_unmap_device; +} DPDK_2.2; diff --git a/lib/librte_eal/common/eal_common_pci.c b/lib/librte_eal/common/eal_common_pci.c index dcfe947..96d5113 100644 --- a/lib/librte_eal/common/eal_common_pci.c +++ b/lib/librte_eal/common/eal_common_pci.c @@ -188,7 +188,7 @@ rte_eal_pci_probe_one_driver(struct rte_pci_driver *dr, struct rte_pci_device *d pci_config_space_set(dev); #endif /* map resources for devices that use igb_uio */ - ret = pci_map_device(dev); + ret = rte_eal_pci_map_device(dev); if (ret != 0) return ret; } else if (dr->drv_flags & RTE_PCI_DRV_FORCE_UNBIND && @@ -254,7 +254,7 @@ rte_eal_pci_detach_dev(struct rte_pci_driver *dr, if (dr->drv_flags & RTE_PCI_DRV_NEED_MAPPING) /* unmap resources for devices that use igb_uio */ - pci_unmap_device(dev); + rte_eal_pci_unmap_device(dev); return 0; } diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index 072e672..2342fa1 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -165,24 +165,6 @@ struct rte_pci_device; int pci_unbind_kernel_driver(struct rte_pci_device *dev); /** - * Map this device - * - * This function is private to EAL. - * - * @return - * 0 on success, negative on error and positive if no driver - * is found for the device. - */ -int pci_map_device(struct rte_pci_device *dev); - -/** - * Unmap this device - * - * This function is private to EAL. - */ -void pci_unmap_device(struct rte_pci_device *dev); - -/** * Map the PCI resource of a PCI device in virtual memory * * This function is private to EAL. 
diff --git a/lib/librte_eal/common/include/rte_pci.h b/lib/librte_eal/common/include/rte_pci.h index 53437cc..0c667ff 100644 --- a/lib/librte_eal/common/include/rte_pci.h +++ b/lib/librte_eal/common/include/rte_pci.h @@ -523,6 +523,33 @@ int rte_eal_pci_write_bar(const struct rte_pci_device *device, */ int rte_eal_pci_write_config(const struct rte_pci_device *device, const void *buf, size_t len, off_t offset); +/** + * Map the PCI device resources in user space virtual memory address + * + * Note that driver should not call this function when flag + * RTE_PCI_DRV_NEED_MAPPING is set, as EAL will do that for + * you when it's on. + * + * @param dev + * A pointer to a rte_pc
[dpdk-dev] [PATCH v4 13/14] virtio: enable vfio in pmd driver
Use the mapping API, i.e. rte_eal_pci_map_device(), to create the vfio container and group and to obtain the vfio-dev-fd for the virtio-net-pci interface. The vfio_dev_fd is later used for virtio device rd/wr operations.

Signed-off-by: Santosh Shukla
---
v3->v4:
- Per Stephen's comment, removed the static driver flag RTE_PCI_XXX_XX_NEED_MAPPING; the virtio pmd driver now initializes the vfio interface at runtime.

 drivers/net/virtio/virtio_ethdev.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 8f2260f..ce03a24 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -497,6 +497,8 @@ virtio_dev_close(struct rte_eth_dev *dev)
 	hw->started = 0;
 	virtio_dev_free_mbufs(dev);
 	virtio_free_queues(dev);
+
+	/* For vfio case : hotunplug/unmap not supported (todo) */
 }

 static void
@@ -1286,6 +1288,16 @@ static int virtio_hw_init_by_vfio(struct virtio_hw *hw,
 		return -1;
 	}

+	/*
+	 * pci_map_device used not to actually map ioport region but
+	 * create vfio container/group and vfio-dev-fd for _this_
+	 * virtio interface.
+	 */
+	if (rte_eal_pci_map_device(pci_dev) != 0) {
+		PMD_INIT_LOG(ERR, "vfio pci mapping failed for ioport bar\n");
+		return -1;
+	}
+
 	/* .. So attached interface is vfio */
 	vdev->is_vfio = true;
 	vdev->pci_dev = pci_dev;
--
1.7.9.5
[dpdk-dev] [PATCH v4 14/14] vfio: Support for no-IOMMU mode
From: Anatoly Burakov

This commit adds a generic mechanism to support multiple IOMMU types.
For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special VFIO mode
that doesn't use IOMMU at all), but it's easily extended by adding
necessary definitions into eal_pci_init.h and a DMA mapping function to
eal_pci_vfio_dma.c.

Since the type 1 IOMMU module is no longer necessary to have VFIO, we fix
the module check to check for vfio-pci instead. It's not ideal and
triggers VFIO checks more often (and thus produces more error output,
which was the reason behind the module check in the first place), so we
compensate for that by providing more verbose logging, indicating
whether VFIO initialization has succeeded or failed.

Signed-off-by: Anatoly Burakov
Signed-off-by: Santosh Shukla
Tested-by: Santosh Shukla
---
- Pulled Anatoly's patch just for testing, and so that other users can use
  the full patchset to try out / experiment with. This patchset works for me :).

 lib/librte_eal/linuxapp/eal/Makefile           |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci_init.h     |  22
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c     | 143 +++-
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c |  84 ++
 lib/librte_eal/linuxapp/eal/eal_vfio.h         |   5 +
 5 files changed, 202 insertions(+), 53 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
index 26eced5..5c9e9d9 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -59,6 +59,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_dma.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index 3bc592b..068800c 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -112,6 +112,28 @@ struct vfio_config {
 	struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
 };
 
+/* function pointer typedef for DMA mapping functions */
+typedef int (*vfio_dma_func_t)(int);
+
+/* Structure to hold supported IOMMU types */
+struct vfio_iommu_type {
+	int type_id;
+	const char *name;
+	vfio_dma_func_t dma_map_func;
+};
+
+/* function prototypes for different IOMMU types */
+int vfio_iommu_type1_dma_map(int container_fd);
+int vfio_iommu_noiommu_dma_map(int container_fd);
+
+/* IOMMU types we support */
+static const struct vfio_iommu_type iommu_types[] = {
+	/* x86 IOMMU, otherwise known as type 1 */
+	{ VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
+	/* IOMMU-less mode */
+	{ VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
+};
+
 #endif
 
 #endif /* EAL_PCI_INIT_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index df407ef..5c5ccea 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -72,6 +72,7 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)
 
 /* per-process VFIO config */
@@ -236,42 +237,58 @@ pci_vfio_set_bus_master(int dev_fd)
 	return 0;
 }
 
-/* set up DMA mappings */
-static int
-pci_vfio_setup_dma_maps(int vfio_container_fd)
-{
-	const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-	int i, ret;
-
-	ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
-			VFIO_TYPE1_IOMMU);
-	if (ret) {
-		RTE_LOG(ERR, EAL, " cannot set IOMMU type, "
-				"error %i (%s)\n", errno, strerror(errno));
-		return -1;
+/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */
+static const struct vfio_iommu_type *
+pci_vfio_set_iommu_type(int vfio_container_fd) {
+	unsigned idx;
+	for (idx = 0; idx < RTE_DIM(iommu_types); idx++) {
+		const struct vfio_iommu_type *t = &iommu_types[idx];
+
+		int ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+				t->type_id);
+		if (!ret) {
+			RTE_LOG(NOTICE, EAL, " using IOMMU type %d (%s)\n",
+					t->type_id, t->name);
+			return t;
+		}
+		/* not an error, there may
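The table-driven selection introduced by the patch above (walk a static array of IOMMU types, try each one, keep the first that the kernel accepts) can be sketched outside of DPDK. This is only an illustration under stated assumptions: every `demo_`-prefixed name, the `try_set` callback that stands in for `ioctl(fd, VFIO_SET_IOMMU, type)`, and the numeric type IDs are placeholders, not code or values taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder type IDs for the sketch; real code takes them from <linux/vfio.h>. */
enum { DEMO_TYPE1_IOMMU = 1, DEMO_NOIOMMU_IOMMU = 8 };

/* Same shape as the patch's vfio_dma_func_t: takes a container fd. */
typedef int (*demo_dma_func_t)(int container_fd);

struct demo_iommu_type {
	int type_id;
	const char *name;
	demo_dma_func_t dma_map_func;
};

/* Dummy DMA mapping functions; a real one would program the IOMMU. */
static int demo_type1_dma_map(int container_fd) { (void)container_fd; return 0; }
static int demo_noiommu_dma_map(int container_fd) { (void)container_fd; return 0; }

/* Supported types, in order of preference. */
static const struct demo_iommu_type demo_iommu_types[] = {
	{ DEMO_TYPE1_IOMMU, "Type 1", demo_type1_dma_map },
	{ DEMO_NOIOMMU_IOMMU, "No-IOMMU", demo_noiommu_dma_map },
};

/* Hypothetical stand-in for the ioctl: returns 0 if the type is accepted. */
typedef int (*demo_set_iommu_t)(int container_fd, int type_id);

/* Return the first type the callback accepts, or NULL if none is. */
static const struct demo_iommu_type *
demo_pick_iommu_type(int container_fd, demo_set_iommu_t try_set)
{
	size_t idx;

	assert(try_set != NULL);
	for (idx = 0; idx < sizeof(demo_iommu_types) / sizeof(demo_iommu_types[0]); idx++) {
		const struct demo_iommu_type *t = &demo_iommu_types[idx];

		if (try_set(container_fd, t->type_id) == 0)
			return t;
		/* a rejected type is not an error; fall through to the next one */
	}
	return NULL;
}

/* Simulated kernels: one that only accepts no-IOMMU, one that accepts nothing. */
static int demo_only_noiommu(int fd, int type_id)
{
	(void)fd;
	return type_id == DEMO_NOIOMMU_IOMMU ? 0 : -1;
}

static int demo_nothing_supported(int fd, int type_id)
{
	(void)fd;
	(void)type_id;
	return -1;
}
```

A caller would then invoke `t->dma_map_func(container_fd)` on the returned entry. Adding a new IOMMU type in this scheme means adding one table row and one mapping function, which matches the extensibility the commit message describes.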
[dpdk-dev] [PATCH v2] app/testpmd Fix max_socket detection
On Wed, Jan 13, 2016 at 02:23:36PM -0800, Stephen Hurd wrote:
> Previously, max_socket was set to the highest numbered socket with
> an enabled lcore. The intent is to set it to the highest socket
> regardless of it being enabled.
>

Can you clarify why this change is necessary? Is it causing a bug somewhere?

thanks,
/Bruce
[dpdk-dev] [PATCH v2 0/5] virtio: Tx performance improvements
On 1/6/2016 8:04 PM, Thomas Monjalon wrote:
> 2016-01-05 08:10, Xie, Huawei:
>> On 10/26/2015 10:06 PM, Xie, Huawei wrote:
>>> On 10/19/2015 1:16 PM, Stephen Hemminger wrote:
>>>> This is a tested version of the virtio Tx performance improvements
>>>> that I posted earlier on the list, and described at the DPDK Userspace
>>>> meeting in Dublin. Together they get a 25% performance improvement for
>>>> both small packet and large multi-segment packet case when testing
>>>> from DPDK guest application to Linux KVM host.
>>>>
>>>> Stephen Hemminger (5):
>>>>   virtio: clean up space checks on xmit
>>>>   virtio: don't use unlikely for normal tx stuff
>>>>   virtio: use indirect ring elements
>>>>   virtio: use any layout on transmit
>>>>   virtio: optimize transmit enqueue
>>> There is one open question: why the mergeable header is used in the tx
>>> path. Since the old implementation also uses the mergeable header in the
>>> tx path if this feature is negotiated, I chose to ack the patch and
>>> address this later, if not now.
>>>
>>> Acked-by: Huawei Xie
>> Thomas:
>> This patch isn't in the patchwork. Does Stephen need to send a new one?
> Yes please, I cannot find them in patchwork.

ping
[dpdk-dev] [PATCH 0/4] Optimize memcpy for AVX512 platforms
On Thu, 14 Jan 2016 01:13:18 -0500 Zhihong Wang wrote:
> This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> utilization of hardware resources and deliver high performance.
>
> In current DPDK, memcpy holds a large proportion of execution time in
> libs like Vhost, especially for large packets, and this patch can bring
> considerable benefits.
>
> The implementation is based on the current DPDK memcpy framework, some
> background introduction can be found in these threads:
> http://dpdk.org/ml/archives/dev/2014-November/008158.html
> http://dpdk.org/ml/archives/dev/2015-January/011800.html
>
> Code changes are:
>
> 1. Read CPUID to check if AVX512 is supported by CPU
>
> 2. Predefine AVX512 macro if AVX512 is enabled by compiler
>
> 3. Implement AVX512 memcpy and choose the right implementation based on
> predefined macros
>
> 4. Decide alignment unit for memcpy perf test based on predefined macros
>
> Zhihong Wang (4):
>   lib/librte_eal: Identify AVX512 CPU flag
>   mk: Predefine AVX512 macro for compiler
>   lib/librte_eal: Optimize memcpy for AVX512 platforms
>   app/test: Adjust alignment unit for memcpy perf test
>
>  app/test/test_memcpy_perf.c                        |   6 +
>  .../common/include/arch/x86/rte_cpuflags.h         |   2 +
>  .../common/include/arch/x86/rte_memcpy.h           | 247 -
>  mk/rte.cpuflags.mk                                 |   4 +
>  4 files changed, 255 insertions(+), 4 deletions(-)
>

This really looks like code that could benefit from GCC function
multiversioning. The current cpuflags model is useless/flawed in real
product deployment.
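To make the contrast concrete: the patch set picks the memcpy implementation at build time from predefined macros, whereas the reply argues for choosing it at run time on the machine that actually executes the binary. The sketch below is a hand-written version of that run-time dispatch, not GCC's function multiversioning attribute itself and not code from the patch set. The `demo_` names are hypothetical, the "wide" variant is just a placeholder that calls `memcpy`, and the `__builtin_cpu_supports("avx512f")` check assumes a GCC-compatible compiler on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*demo_copy_func_t)(void *dst, const void *src, size_t n);

/* Baseline variant, usable on any CPU. */
static void *demo_copy_generic(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Placeholder for a wide-vector variant; behaves identically in this sketch. */
static void *demo_copy_wide(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Choose a variant from what the running CPU reports, not from build flags. */
static demo_copy_func_t demo_select_copy(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	__builtin_cpu_init();
	if (__builtin_cpu_supports("avx512f"))
		return demo_copy_wide;
#endif
	return demo_copy_generic;
}

/* Resolve once on first use, then call through the cached pointer. */
static void *demo_memcpy(void *dst, const void *src, size_t n)
{
	static demo_copy_func_t impl;

	assert(dst != NULL && src != NULL);
	if (impl == NULL)
		impl = demo_select_copy();
	return impl(dst, src, n);
}
```

The design point is that one binary can be shipped to machines with and without AVX512. Compiler-generated multiversioning achieves the same effect without the hand-written selector, at the cost of depending on toolchain and loader support.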
[dpdk-dev] Problem with Intel i40e XL710 dpdk driver
Hi,

I am seeing "Failed to init adminq: -54" or admin queue timeouts while
initializing the admin queue for the Intel XL710 (i40e) NIC. (The Intel
server is an E5-2670.)

First things first. I am running the latest firmware. The kernel module
is not loaded, and yes, the card works with the i40e kernel driver
(latest or otherwise). The problem occurs with DPDK 2.0/2.1 as well as
with the latest stable. So there's that.

I have done a bunch of debugging and here are my findings.

With the card configured in 2x40g or 4x10g mode, initializing PCI
function 0 (port 0) _ALWAYS_ succeeds. Initializing any of the remaining
ports always fails. Even if I unbind igb_uio from port 0 and bind only
port 1, or ports 2,3,4 in 4x10g mode, it fails.

Since it works with the kernel driver, I tried to see if there were
differences in the way registers are set up by the i40e driver in the
kernel and in DPDK. They look mostly the same, with some subtle
differences. I couldn't find much, and for the few differences I did
catch, I brought the DPDK code in line with the kernel driver and it
still failed.

While stepping through gdb all the way from EAL PCI to the PCI UIO map
to eth_i40e_dev_init, up to the failure to obtain the firmware revision
for port 1 during i40e_init_adminq, I confirmed that the memory map was
right for the PCI device. hw->hw_addr looks correct for port 1: it
correlates with the uio1 map and with the physical address reported by
lspci and by the kernel driver (when the working kernel driver is used).

However, the admin queue does not seem to process any request for port 1.
Note that port 0 always works, and it's the same code for the others
with a different EAL dev/hw instance. For other ports such as port 1,
after the adminq registers and memory map are set up correctly, the
driver always fails to obtain the firmware revision: the head register
at 0x80300 reads 0 in i40e_asq_done and so never matches next_in_use,
which starts at 1. As a result i40e_asq_done always reports pending
(false). This is retried a certain number of times, with i40e_init_adminq
resetting the AQ in between, but it ultimately gives up.

Thoughts? I am wondering if you guys have seen this and have a fix or
patch that is not upstream yet.

The failure described above is enclosed below. (This is with the card in
4x10g mode, but the failure is the same in 2x40g mode. Port 0 always
succeeds but subsequent ports fail. The result is the same even with
port 0 not bound and starting with the initialization of ports 2,3,4,
which always fails.)

EAL: lcore 1 is ready (tid=6bd30700;cpuset=[1])
EAL: PCI device :01:00.0 on NUMA socket 0
EAL: probe driver: 8086:1521 rte_igb_pmd
EAL: Not managed by a supported kernel driver, skipped
EAL: PCI device :01:00.1 on NUMA socket 0
EAL: probe driver: 8086:1521 rte_igb_pmd
EAL: Not managed by a supported kernel driver, skipped
EAL: PCI device :83:00.0 on NUMA socket 1
EAL: probe driver: 8086:1583 rte_i40e_pmd
EAL: PCI memory mapped at 0x7f2f8000
EAL: PCI memory mapped at 0x7f2f8080
PMD: eth_i40e_dev_init(): FW 4.40 API 1.4 NVM 04.05.03 eetrack 80001dca
PMD: i40e_pf_parameter_init(): Max supported VSIs:34
PMD: i40e_pf_parameter_init(): PF queue pairs:64
PMD: i40e_pf_parameter_init(): Max VMDQ VSI num:34
PMD: i40e_pf_parameter_init(): VMDQ queue pairs:4
EAL: PCI device :83:00.1 on NUMA socket 1
EAL: probe driver: 8086:1583 rte_i40e_pmd
EAL: PCI memory mapped at 0x7f2f80808000
EAL: PCI memory mapped at 0x7f2f81008000
PMD: eth_i40e_dev_init(): Failed to init adminq: -54
EAL: Error - exiting with code: 1
  Cause: Requested device :83:00.1 cannot be used

Regards,
-Karthick
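The completion check described in the report (a command counts as done once the head register has caught up with the driver's next index) and the bounded retry around it can be modelled in a few lines. This is a toy model built only from that description, not the i40e driver code: the `demo_` structures and functions are hypothetical, the "register" is a plain struct field, and the firmware is simulated by a callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the admin send queue state. */
struct demo_asq {
	uint32_t head;       /* stand-in for the head register advanced by firmware */
	uint32_t next_index; /* index the report calls next_in_use */
};

/* Done when firmware has consumed everything the driver posted. */
static bool demo_asq_done(const struct demo_asq *q)
{
	return q->head == q->next_index;
}

/* Simulated firmware step, called once per poll. */
typedef void (*demo_fw_step_t)(struct demo_asq *q);

/* Poll up to max_polls times; return polls used, or -1 on timeout. */
static int demo_asq_wait(struct demo_asq *q, demo_fw_step_t fw_step, int max_polls)
{
	int i;

	assert(q != NULL && max_polls > 0);
	for (i = 0; i < max_polls; i++) {
		if (fw_step != NULL)
			fw_step(q);
		if (demo_asq_done(q))
			return i + 1;
	}
	return -1;
}

/* Firmware that processes the command, as observed on port 0. */
static void demo_fw_responsive(struct demo_asq *q)
{
	q->head = q->next_index;
}

/* Firmware that never advances head, as observed on the other ports. */
static void demo_fw_silent(struct demo_asq *q)
{
	(void)q;
}
```

In this model the failing ports correspond to `demo_fw_silent`: head stays at 0 while the next index is 1, so every poll reports pending until the retry budget runs out. That is consistent with the symptom in the report, but it says nothing about why the firmware does not respond.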