[dpdk-dev] [PATCH v2 24/33] i40e/base: get pf_id from HW rather than PCI function
Acked-by: Jingjing Wu > -Original Message- > From: Zhang, Helin > Sent: Thursday, April 30, 2015 11:04 PM > To: dev at dpdk.org > Cc: Cao, Min; Xu, Qian Q; Wu, Jingjing; Liu, Jijiang; Kenguva, Monica; Patel, > Rashmin N; Murray, Steven J; Nelson, Shannon; Zhang, Helin > Subject: [PATCH v2 24/33] i40e/base: get pf_id from HW rather than PCI > function > > Getting the pf_id from the function number was a good place to start, but > when the PF was setup in pass-thru mode, the PCI bus/device/function was > virtualized and the number in the VM is different from the number in the > bare metal. This caused HW configuration issues when the wrong pf_id was > used to set up the HMC and other structures. The PF_FUNC_RID register has > the real bus/device/function information as configured by the BIOS, so use > that for a better number. > > Signed-off-by: Helin Zhang > --- > lib/librte_pmd_i40e/i40e/i40e_common.c | 23 +++ > 1 file changed, 11 insertions(+), 12 deletions(-) > > diff --git a/lib/librte_pmd_i40e/i40e/i40e_common.c > b/lib/librte_pmd_i40e/i40e/i40e_common.c > index 705b9dd..03980b9 100644 > --- a/lib/librte_pmd_i40e/i40e/i40e_common.c > +++ b/lib/librte_pmd_i40e/i40e/i40e_common.c > @@ -592,7 +592,7 @@ enum i40e_status_code i40e_validate_mac_addr(u8 > *mac_addr) enum i40e_status_code i40e_init_shared_code(struct i40e_hw > *hw) { > enum i40e_status_code status = I40E_SUCCESS; > - u32 reg; > + u32 port, ari, func_rid; > > DEBUGFUNC("i40e_init_shared_code"); > > @@ -607,18 +607,17 @@ enum i40e_status_code > i40e_init_shared_code(struct i40e_hw *hw) > > hw->phy.get_link_info = true; > > - /* Determine port number */ > - reg = rd32(hw, I40E_PFGEN_PORTNUM); > - reg = ((reg & I40E_PFGEN_PORTNUM_PORT_NUM_MASK) >> > -I40E_PFGEN_PORTNUM_PORT_NUM_SHIFT); > - hw->port = (u8)reg; > - > - /* Determine the PF number based on the PCI fn */ > - reg = rd32(hw, I40E_GLPCI_CAPSUP); > - if (reg & I40E_GLPCI_CAPSUP_ARI_EN_MASK) > - hw->pf_id = (u8)((hw->bus.device << 3) | hw->bus.func); > + /* Determine port number and PF number*/ > + port = (rd32(hw, I40E_PFGEN_PORTNUM) & > I40E_PFGEN_PORTNUM_PORT_NUM_MASK) > +>> > I40E_PFGEN_PORTNUM_PORT_NUM_SHIFT; > + hw->port = (u8)port; > + ari = (rd32(hw, I40E_GLPCI_CAPSUP) & > I40E_GLPCI_CAPSUP_ARI_EN_MASK) >> > + > I40E_GLPCI_CAPSUP_ARI_EN_SHIFT; > + func_rid = rd32(hw, I40E_PF_FUNC_RID); > + if (ari) > + hw->pf_id = (u8)(func_rid & 0xff); > else > - hw->pf_id = (u8)hw->bus.func; > + hw->pf_id = (u8)(func_rid & 0x7); > > status = i40e_init_nvm(hw); > return status; > -- > 1.8.1.4
[dpdk-dev] [PATCH v2 04/33] i40e/base: rename 'err' to 'perrno'
Acked-by: Jingjing Wu > -Original Message- > From: Zhang, Helin > Sent: Thursday, April 30, 2015 11:03 PM > To: dev at dpdk.org > Cc: Cao, Min; Xu, Qian Q; Wu, Jingjing; Liu, Jijiang; Kenguva, Monica; Patel, > Rashmin N; Murray, Steven J; Nelson, Shannon; Zhang, Helin > Subject: [PATCH v2 04/33] i40e/base: rename 'err' to 'perrno' > > To be consistent with the original base driver, the variable name of 'err' > should be renamed to 'perrno'. > > Signed-off-by: Helin Zhang > --- > lib/librte_pmd_i40e/i40e/i40e_nvm.c | 118 ++ > -- > 1 file changed, 59 insertions(+), 59 deletions(-) > > diff --git a/lib/librte_pmd_i40e/i40e/i40e_nvm.c > b/lib/librte_pmd_i40e/i40e/i40e_nvm.c > index 73b8997..2b70508 100644 > --- a/lib/librte_pmd_i40e/i40e/i40e_nvm.c > +++ b/lib/librte_pmd_i40e/i40e/i40e_nvm.c > @@ -481,25 +481,25 @@ i40e_validate_nvm_checksum_exit: > > STATIC enum i40e_status_code i40e_nvmupd_state_init(struct i40e_hw > *hw, > struct i40e_nvm_access > *cmd, > - u8 *bytes, int *err); > + u8 *bytes, int *perrno); > STATIC enum i40e_status_code i40e_nvmupd_state_reading(struct i40e_hw > *hw, > struct i40e_nvm_access > *cmd, > - u8 *bytes, int *err); > + u8 *bytes, int *perrno); > STATIC enum i40e_status_code i40e_nvmupd_state_writing(struct i40e_hw > *hw, > struct i40e_nvm_access > *cmd, > - u8 *bytes, int *err); > + u8 *bytes, int *perrno); > STATIC enum i40e_nvmupd_cmd i40e_nvmupd_validate_command(struct > i40e_hw *hw, > struct i40e_nvm_access > *cmd, > - int *err); > + int *perrno); > STATIC enum i40e_status_code i40e_nvmupd_nvm_erase(struct i40e_hw > *hw, > struct i40e_nvm_access > *cmd, > -int *err); > +int *perrno); > STATIC enum i40e_status_code i40e_nvmupd_nvm_write(struct i40e_hw > *hw, > struct i40e_nvm_access > *cmd, > -u8 *bytes, int *err); > +u8 *bytes, int *perrno); > STATIC enum i40e_status_code i40e_nvmupd_nvm_read(struct i40e_hw > *hw, > struct i40e_nvm_access > *cmd, > - u8 *bytes, int *err); > + u8 *bytes, int *perrno); > STATIC inline u8 i40e_nvmupd_get_module(u32 val) { > return (u8)(val & I40E_NVM_MOD_PNT_MASK); @@ -514,38 > +514,38 @@ STATIC inline u8 i40e_nvmupd_get_transaction(u32 val) > * @hw: pointer to hardware structure > * @cmd: pointer to nvm update command > * @bytes: pointer to the data buffer > - * @err: pointer to return error code > + * @perrno: pointer to return error code > * > * Dispatches command depending on what update state is current > **/ > enum i40e_status_code i40e_nvmupd_command(struct i40e_hw *hw, > struct i40e_nvm_access *cmd, > - u8 *bytes, int *err) > + u8 *bytes, int *perrno) > { > enum i40e_status_code status; > > DEBUGFUNC("i40e_nvmupd_command"); > > /* assume success */ > - *err = 0; > + *perrno = 0; > > switch (hw->nvmupd_state) { > case I40E_NVMUPD_STATE_INIT: > - status = i40e_nvmupd_state_init(hw, cmd, bytes, err); > + status = i40e_nvmupd_state_init(hw, cmd, bytes, perrno); > break; > > case I40E_NVMUPD_STATE_READING: > - status = i40e_nvmupd_state_reading(hw, cmd, bytes, err); > + status = i40e_nvmupd_state_reading(hw, cmd, bytes, > perrno); > break; > > case I40E_NVMUPD_STATE_WRITING: > - status = i40e_nvmupd_state_writing(hw, cmd, bytes, err); > + status = i40e_nvmupd_state_writing(hw, cmd, bytes, perrno); > break; > > default: > /* invalid state, should never happen */ > status = I40E_NOT_SUPPORTED; > - *err = -ESRCH; > + *perrno = -ESRCH; > break; > } > return status; > @@ -556,29 +556,29 @@ enum i40e_status_code > i40e_nvmupd_command(struct i40e_hw *hw, > * @hw: pointer to hardware structure > * @cmd: pointer t
[dpdk-dev] [PATCH v2 03/33] i40e: adjustment of register definitions and relevant
Acked-by: Jingjing Wu > -Original Message- > From: Zhang, Helin > Sent: Thursday, April 30, 2015 11:03 PM > To: dev at dpdk.org > Cc: Cao, Min; Xu, Qian Q; Wu, Jingjing; Liu, Jijiang; Kenguva, Monica; Patel, > Rashmin N; Murray, Steven J; Nelson, Shannon; Zhang, Helin > Subject: [PATCH v2 03/33] i40e: adjustment of register definitions and > relevant > > Some macros of register definitions or relevant are added, modified or > deleted. In detail, they are as follows. > - I40E_PRTDCB_RUPTQ > - I40E_GLGEN_GPIO_CTL > - I40E_GLGEN_MDIO_CTRL > - I40E_GLGEN_RSTENA_EMP > - I40E_GLPCI_LATCT > - I40E_GLTPH_CTRL > - I40E_GLPRT_BPRCH > - I40E_GLPRT_TDPC > - I40E_GLSCD_QUANTA > Also reading the register of I40E_GLPRT_TDPC is deleted as its definition is > deleted. > > Signed-off-by: Helin Zhang > --- > lib/librte_pmd_i40e/i40e/i40e_register.h | 52 > > lib/librte_pmd_i40e/i40e_ethdev.c| 3 -- > 2 files changed, 26 insertions(+), 29 deletions(-) > > v2 changes: > Removed anything about Fortpark or FPGA as they shouldn't be there. > > diff --git a/lib/librte_pmd_i40e/i40e/i40e_register.h > b/lib/librte_pmd_i40e/i40e/i40e_register.h > index 888c3c3..c8a8d77 100644 > --- a/lib/librte_pmd_i40e/i40e/i40e_register.h > +++ b/lib/librte_pmd_i40e/i40e/i40e_register.h > @@ -318,6 +318,10 @@ POSSIBILITY OF SUCH DAMAGE. > #define I40E_PRTDCB_RUP2TC_UP6TC_MASK I40E_MASK(0x7, > I40E_PRTDCB_RUP2TC_UP6TC_SHIFT) #define > I40E_PRTDCB_RUP2TC_UP7TC_SHIFT 21 #define > I40E_PRTDCB_RUP2TC_UP7TC_MASK I40E_MASK(0x7, > I40E_PRTDCB_RUP2TC_UP7TC_SHIFT) > +#define I40E_PRTDCB_RUPTQ(_i) (0x00122400 + ((_i) * 32)) /* > _i=0...7 */ /* Reset: CORER */ > +#define I40E_PRTDCB_RUPTQ_MAX_INDEX7 > +#define I40E_PRTDCB_RUPTQ_RXQNUM_SHIFT 0 #define > +I40E_PRTDCB_RUPTQ_RXQNUM_MASK I40E_MASK(0x3FFF, > +I40E_PRTDCB_RUPTQ_RXQNUM_SHIFT) > #define I40E_PRTDCB_TC2PFC 0x001C0980 /* Reset: CORER */ > #define I40E_PRTDCB_TC2PFC_TC2PFC_SHIFT 0 #define > I40E_PRTDCB_TC2PFC_TC2PFC_MASK I40E_MASK(0xFF, > I40E_PRTDCB_TC2PFC_TC2PFC_SHIFT) @@ -429,6 +433,8 @@ POSSIBILITY > OF SUCH DAMAGE. > #define I40E_GLGEN_GPIO_CTL_OUT_DEFAULT_MASK I40E_MASK(0x1, > I40E_GLGEN_GPIO_CTL_OUT_DEFAULT_SHIFT) > #define I40E_GLGEN_GPIO_CTL_PHY_PIN_NAME_SHIFT 20 #define > I40E_GLGEN_GPIO_CTL_PHY_PIN_NAME_MASK I40E_MASK(0x3F, > I40E_GLGEN_GPIO_CTL_PHY_PIN_NAME_SHIFT) > +#define I40E_GLGEN_GPIO_CTL_PRT_BIT_MAP_SHIFT 26 > +#define I40E_GLGEN_GPIO_CTL_PRT_BIT_MAP_MASK I40E_MASK(0xF, > I40E_GLGEN_GPIO_CTL_PRT_BIT_MAP_SHIFT) > #define I40E_GLGEN_GPIO_SET 0x00088184 /* Reset: POR */ > #define I40E_GLGEN_GPIO_SET_GPIO_INDX_SHIFT 0 #define > I40E_GLGEN_GPIO_SET_GPIO_INDX_MASK I40E_MASK(0x1F, > I40E_GLGEN_GPIO_SET_GPIO_INDX_SHIFT) > @@ -492,7 +498,9 @@ POSSIBILITY OF SUCH DAMAGE. > #define I40E_GLGEN_MDIO_CTRL_CONTMDC_SHIFT 17 > #define I40E_GLGEN_MDIO_CTRL_CONTMDC_MASK I40E_MASK(0x1, > I40E_GLGEN_MDIO_CTRL_CONTMDC_SHIFT) > #define I40E_GLGEN_MDIO_CTRL_LEGACY_RSVD1_SHIFT 18 -#define > I40E_GLGEN_MDIO_CTRL_LEGACY_RSVD1_MASK I40E_MASK(0x3FFF, > I40E_GLGEN_MDIO_CTRL_LEGACY_RSVD1_SHIFT) > +#define I40E_GLGEN_MDIO_CTRL_LEGACY_RSVD1_MASK > I40E_MASK(0x7FF, > +I40E_GLGEN_MDIO_CTRL_LEGACY_RSVD1_SHIFT) > +#define I40E_GLGEN_MDIO_CTRL_LEGACY_RSVD0_SHIFT 29 #define > +I40E_GLGEN_MDIO_CTRL_LEGACY_RSVD0_MASK I40E_MASK(0x7, > +I40E_GLGEN_MDIO_CTRL_LEGACY_RSVD0_SHIFT) > #define I40E_GLGEN_MDIO_I2C_SEL(_i)(0x000881C0 + ((_i) * 4)) > /* _i=0...3 */ /* Reset: POR */ > #define I40E_GLGEN_MDIO_I2C_SEL_MAX_INDEX 3 > #define I40E_GLGEN_MDIO_I2C_SEL_MDIO_I2C_SEL_SHIFT 0 @@ -556,9 > +564,6 @@ POSSIBILITY OF SUCH DAMAGE. > #define I40E_GLGEN_RSTCTL_GRSTDEL_MASK I40E_MASK(0x3F, > I40E_GLGEN_RSTCTL_GRSTDEL_SHIFT) > #define I40E_GLGEN_RSTCTL_ECC_RST_ENA_SHIFT 8 #define > I40E_GLGEN_RSTCTL_ECC_RST_ENA_MASK I40E_MASK(0x1, > I40E_GLGEN_RSTCTL_ECC_RST_ENA_SHIFT) > -#define I40E_GLGEN_RSTENA_EMP 0x000B818C /* Reset: POR > */ > -#define I40E_GLGEN_RSTENA_EMP_EMP_RST_ENA_SHIFT 0 -#define > I40E_GLGEN_RSTENA_EMP_EMP_RST_ENA_MASK I40E_MASK(0x1, > I40E_GLGEN_RSTENA_EMP_EMP_RST_ENA_SHIFT) > #define I40E_GLGEN_RTRIG 0x000B8190 /* Reset: CORER */ > #define I40E_GLGEN_RTRIG_CORER_SHIFT 0 > #define I40E_GLGEN_RTRIG_CORER_MASK I40E_MASK(0x1, > I40E_GLGEN_RTRIG_CORER_SHIFT) > @@ -1074,7 +1079,7 @@ POSSIBILITY OF SUCH DAMAGE. > #define I40E_PFINT_RATEN_INTERVAL_MASK I40E_MASK(0x3F, > I40E_PFINT_RATEN_INTERVAL_SHIFT) > #define I40E_PFINT_RATEN_INTRL_ENA_SHIFT 6 #define > I40E_PFINT_RATEN_INTRL_ENA_MASK I40E_MASK(0x1, > I40E_PFINT_RATEN_INTRL_ENA_SHIFT) > -#define I40E_PFINT_STAT_CTL0 0x00038400 /* Reset: PFR */ > +#define I40E_PFINT_STAT_CTL0 0x00038400 /* Reset: CORER > */ > #define I40E_PFINT_STAT_CTL0_OTHER_ITR_INDX_SHIFT 2 #define > I40E_PFINT_STAT_CTL0_OTHER_ITR_INDX_MASK I40E_MASK(0x3
[dpdk-dev] [PATCH v2 01/33] i40e: copyright update
Acked-by: Jingjing Wu > -Original Message- > From: Zhang, Helin > Sent: Thursday, April 30, 2015 11:03 PM > To: dev at dpdk.org > Cc: Cao, Min; Xu, Qian Q; Wu, Jingjing; Liu, Jijiang; Kenguva, Monica; Patel, > Rashmin N; Murray, Steven J; Nelson, Shannon; Zhang, Helin > Subject: [PATCH v2 01/33] i40e: copyright update > > Copyright is updated. > > Signed-off-by: Helin Zhang > --- > lib/librte_pmd_i40e/Makefile | 2 +- > lib/librte_pmd_i40e/i40e/i40e_adminq.c | 2 +- > lib/librte_pmd_i40e/i40e/i40e_adminq.h | 2 +- > lib/librte_pmd_i40e/i40e/i40e_adminq_cmd.h | 2 +- > lib/librte_pmd_i40e/i40e/i40e_alloc.h | 2 +- > lib/librte_pmd_i40e/i40e/i40e_common.c | 2 +- > lib/librte_pmd_i40e/i40e/i40e_dcb.c| 2 +- > lib/librte_pmd_i40e/i40e/i40e_dcb.h| 2 +- > lib/librte_pmd_i40e/i40e/i40e_diag.c | 2 +- > lib/librte_pmd_i40e/i40e/i40e_diag.h | 2 +- > lib/librte_pmd_i40e/i40e/i40e_hmc.c| 2 +- > lib/librte_pmd_i40e/i40e/i40e_hmc.h| 2 +- > lib/librte_pmd_i40e/i40e/i40e_lan_hmc.c| 2 +- > lib/librte_pmd_i40e/i40e/i40e_lan_hmc.h| 2 +- > lib/librte_pmd_i40e/i40e/i40e_nvm.c| 2 +- > lib/librte_pmd_i40e/i40e/i40e_osdep.h | 2 +- > lib/librte_pmd_i40e/i40e/i40e_prototype.h | 2 +- > lib/librte_pmd_i40e/i40e/i40e_register.h | 2 +- > lib/librte_pmd_i40e/i40e/i40e_status.h | 2 +- > lib/librte_pmd_i40e/i40e/i40e_type.h | 2 +- > lib/librte_pmd_i40e/i40e/i40e_virtchnl.h | 2 +- > lib/librte_pmd_i40e/i40e_ethdev.c | 2 +- > lib/librte_pmd_i40e/i40e_ethdev.h | 2 +- > lib/librte_pmd_i40e/i40e_ethdev_vf.c | 2 +- > lib/librte_pmd_i40e/i40e_fdir.c| 2 +- > lib/librte_pmd_i40e/i40e_logs.h| 2 +- > lib/librte_pmd_i40e/i40e_pf.c | 2 +- > lib/librte_pmd_i40e/i40e_pf.h | 2 +- > lib/librte_pmd_i40e/i40e_rxtx.c| 2 +- > lib/librte_pmd_i40e/i40e_rxtx.h| 2 +- > 30 files changed, 30 insertions(+), 30 deletions(-) > > diff --git a/lib/librte_pmd_i40e/Makefile b/lib/librte_pmd_i40e/Makefile > index 64bab16..86be3f7 100644 > --- a/lib/librte_pmd_i40e/Makefile > +++ b/lib/librte_pmd_i40e/Makefile > @@ -1,6 +1,6 @@ > # BSD LICENSE > # > -# Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > +# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. > # All rights reserved. > # > # Redistribution and use in source and binary forms, with or without > diff --git a/lib/librte_pmd_i40e/i40e/i40e_adminq.c > b/lib/librte_pmd_i40e/i40e/i40e_adminq.c > index e098ed6..e8e762f 100644 > --- a/lib/librte_pmd_i40e/i40e/i40e_adminq.c > +++ b/lib/librte_pmd_i40e/i40e/i40e_adminq.c > @@ -1,6 +1,6 @@ > > /* > ** > > -Copyright (c) 2013 - 2014, Intel Corporation > +Copyright (c) 2013 - 2015, Intel Corporation > All rights reserved. > > Redistribution and use in source and binary forms, with or without diff --git > a/lib/librte_pmd_i40e/i40e/i40e_adminq.h > b/lib/librte_pmd_i40e/i40e/i40e_adminq.h > index ea611bd..a8c6afe 100644 > --- a/lib/librte_pmd_i40e/i40e/i40e_adminq.h > +++ b/lib/librte_pmd_i40e/i40e/i40e_adminq.h > @@ -1,6 +1,6 @@ > > /* > ** > > -Copyright (c) 2013 - 2014, Intel Corporation > +Copyright (c) 2013 - 2015, Intel Corporation > All rights reserved. > > Redistribution and use in source and binary forms, with or without diff --git > a/lib/librte_pmd_i40e/i40e/i40e_adminq_cmd.h > b/lib/librte_pmd_i40e/i40e/i40e_adminq_cmd.h > index 5ea9b7d..0fe9d1c 100644 > --- a/lib/librte_pmd_i40e/i40e/i40e_adminq_cmd.h > +++ b/lib/librte_pmd_i40e/i40e/i40e_adminq_cmd.h > @@ -1,6 +1,6 @@ > > /* > ** > > -Copyright (c) 2013 - 2014, Intel Corporation > +Copyright (c) 2013 - 2015, Intel Corporation > All rights reserved. > > Redistribution and use in source and binary forms, with or without diff --git > a/lib/librte_pmd_i40e/i40e/i40e_alloc.h > b/lib/librte_pmd_i40e/i40e/i40e_alloc.h > index 6e81cd5..38c2f65 100644 > --- a/lib/librte_pmd_i40e/i40e/i40e_alloc.h > +++ b/lib/librte_pmd_i40e/i40e/i40e_alloc.h > @@ -1,6 +1,6 @@ > > /* > ** > > -Copyright (c) 2013 - 2014, Intel Corporation > +Copyright (c) 2013 - 2015, Intel Corporation > All rights reserved. > > Redistribution and use in source and binary forms, with or without diff --git > a/lib/librte_pmd_i40e/i40e/i40e_common.c > b/lib/librte_pmd_i40e/i40e/i40e_common.c > index ffaa777..23f14c1 100644 > --- a/lib/librte_pmd_i40e/i40e/i40e_common.c > +++ b/lib/librte_pmd_i40e/i40e/i40e_common.c > @@ -1,6 +1,6 @@ > > /* > ** > > -Copyright (c) 2013 - 2014, Intel
[dpdk-dev] [PATCH v2 02/33] i40e: disable setting of phy configuration
Acked-by: Jingjing Wu > -Original Message- > From: Zhang, Helin > Sent: Thursday, April 30, 2015 11:03 PM > To: dev at dpdk.org > Cc: Cao, Min; Xu, Qian Q; Wu, Jingjing; Liu, Jijiang; Kenguva, Monica; Patel, > Rashmin N; Murray, Steven J; Nelson, Shannon; Zhang, Helin > Subject: [PATCH v2 02/33] i40e: disable setting of phy configuration > > There was a known link issue on 40G ports on NVM version (FVL3E), when > setting phy configuration. As a workaround, setting of phy configuration > should be disabled. The impact is that the link cannot be forcedly configured, > which doesn't affect any feature functions. > The workaround can be removed when a formal fix is ready later. > > Signed-off-by: Helin Zhang > --- > lib/librte_pmd_i40e/i40e_ethdev.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c > b/lib/librte_pmd_i40e/i40e_ethdev.c > index 40c90d7..49d1067 100644 > --- a/lib/librte_pmd_i40e/i40e_ethdev.c > +++ b/lib/librte_pmd_i40e/i40e_ethdev.c > @@ -791,6 +791,10 @@ i40e_phy_conf_link(struct i40e_hw *hw, uint8_t > abilities, uint8_t force_speed) > I40E_LINK_SPEED_100MB; > int ret = -ENOTSUP; > > + /* Skip it on 40G interfaces, as a workaround for the link issue */ > + if (i40e_is_40G_device(hw->device_id)) > + return I40E_SUCCESS; > + > status = i40e_aq_get_phy_capabilities(hw, false, false, &phy_ab, > NULL); > if (status) > -- > 1.8.1.4
[dpdk-dev] [PATCH v2 07/33] i40e: replacement of 'i40e_debug_read_register()'
Acked-by: Jingjing Wu > -Original Message- > From: Zhang, Helin > Sent: Thursday, April 30, 2015 11:03 PM > To: dev at dpdk.org > Cc: Cao, Min; Xu, Qian Q; Wu, Jingjing; Liu, Jijiang; Kenguva, Monica; Patel, > Rashmin N; Murray, Steven J; Nelson, Shannon; Zhang, Helin > Subject: [PATCH v2 07/33] i40e: replacement of > 'i40e_debug_read_register()' > > As base driver provides 'i40e_aq_debug_read_register()', the same > functional interface of 'i40e_debug_read_register()' can be replaced. > > Signed-off-by: Helin Zhang > --- > lib/librte_pmd_i40e/i40e/i40e_common.c| 35 > +++ > lib/librte_pmd_i40e/i40e/i40e_prototype.h | 3 +++ > lib/librte_pmd_i40e/i40e_ethdev.c | 22 ++- > 3 files changed, 40 insertions(+), 20 deletions(-) > > diff --git a/lib/librte_pmd_i40e/i40e/i40e_common.c > b/lib/librte_pmd_i40e/i40e/i40e_common.c > index db24b36..4722614 100644 > --- a/lib/librte_pmd_i40e/i40e/i40e_common.c > +++ b/lib/librte_pmd_i40e/i40e/i40e_common.c > @@ -2358,6 +2358,41 @@ enum i40e_status_code > i40e_aq_send_msg_to_vf(struct i40e_hw *hw, u16 vfid, } > > /** > + * i40e_aq_debug_read_register > + * @hw: pointer to the hw struct > + * @reg_addr: register address > + * @reg_val: register value > + * @cmd_details: pointer to command details structure or NULL > + * > + * Read the register using the admin queue commands **/ enum > +i40e_status_code i40e_aq_debug_read_register(struct i40e_hw *hw, > + u32 reg_addr, u64 *reg_val, > + struct i40e_asq_cmd_details *cmd_details) { > + struct i40e_aq_desc desc; > + struct i40e_aqc_debug_reg_read_write *cmd_resp = > + (struct i40e_aqc_debug_reg_read_write > *)&desc.params.raw; > + enum i40e_status_code status; > + > + if (reg_val == NULL) > + return I40E_ERR_PARAM; > + > + i40e_fill_default_direct_cmd_desc(&desc, > i40e_aqc_opc_debug_read_reg); > + > + cmd_resp->address = CPU_TO_LE32(reg_addr); > + > + status = i40e_asq_send_command(hw, &desc, NULL, 0, cmd_details); > + > + if (status == I40E_SUCCESS) { > + *reg_val = ((u64)LE32_TO_CPU(cmd_resp->value_high) << > 32) | > +(u64)LE32_TO_CPU(cmd_resp->value_low); > + } > + > + return status; > +} > + > +/** > * i40e_aq_debug_write_register > * @hw: pointer to the hw struct > * @reg_addr: register address > diff --git a/lib/librte_pmd_i40e/i40e/i40e_prototype.h > b/lib/librte_pmd_i40e/i40e/i40e_prototype.h > index 755733d..2165ac8 100644 > --- a/lib/librte_pmd_i40e/i40e/i40e_prototype.h > +++ b/lib/librte_pmd_i40e/i40e/i40e_prototype.h > @@ -91,6 +91,9 @@ enum i40e_status_code > i40e_aq_get_firmware_version(struct i40e_hw *hw, enum > i40e_status_code i40e_aq_debug_write_register(struct i40e_hw *hw, > u32 reg_addr, u64 reg_val, > struct i40e_asq_cmd_details *cmd_details); > +enum i40e_status_code i40e_aq_debug_read_register(struct i40e_hw *hw, > + u32 reg_addr, u64 *reg_val, > + struct i40e_asq_cmd_details *cmd_details); > enum i40e_status_code i40e_aq_set_phy_debug(struct i40e_hw *hw, u8 > cmd_flags, > struct i40e_asq_cmd_details *cmd_details); > enum i40e_status_code i40e_aq_set_default_vsi(struct i40e_hw *hw, u16 > vsi_id, diff --git a/lib/librte_pmd_i40e/i40e_ethdev.c > b/lib/librte_pmd_i40e/i40e_ethdev.c > index 3d45429..96700e4 100644 > --- a/lib/librte_pmd_i40e/i40e_ethdev.c > +++ b/lib/librte_pmd_i40e/i40e_ethdev.c > @@ -5623,25 +5623,6 @@ i40e_pctype_to_flowtype(enum > i40e_filter_pctype pctype) > return flowtype_table[pctype]; > } > > -static int > -i40e_debug_read_register(struct i40e_hw *hw, uint32_t addr, uint64_t *val) > -{ > - struct i40e_aq_desc desc; > - enum i40e_status_code status; > - > - i40e_fill_default_direct_cmd_desc(&desc, > i40e_aqc_opc_debug_read_reg); > - desc.params.internal.param1 = rte_cpu_to_le_32(addr); > - status = i40e_asq_send_command(hw, &desc, NULL, 0, NULL); > - if (status < 0) > - return status; > - > - *val = ((uint64_t)(rte_le_to_cpu_32(desc.params.internal.param2)) > << > - (CHAR_BIT * sizeof(uint32_t))) + > - > rte_le_to_cpu_32(desc.params.internal.param3); > - > - return status; > -} > - > /* > * On X710, performance number is far from the expectation on recent > firmware > * versions; on XL710, performance number is also far from the expectation > on @@ -5692,7 +5673,8 @@ i40e_configure_registers(struct i40e_hw *hw) > > I40E_GL_SWR_PM_UP_THR_EF_VALUE; > } > > - ret = i40e_debug_read_register(hw, reg_table[i].addr, ®); > + ret = i40e_aq_debug_read_register(hw, reg_table[i].addr, > + ®, NULL); > if (r
[dpdk-dev] [PATCH v2 15/33] i40e/base: replacement of DEBUGOUT() with i40e_debug()
Acked-by: Jingjing Wu > -Original Message- > From: Zhang, Helin > Sent: Thursday, April 30, 2015 11:03 PM > To: dev at dpdk.org > Cc: Cao, Min; Xu, Qian Q; Wu, Jingjing; Liu, Jijiang; Kenguva, Monica; Patel, > Rashmin N; Murray, Steven J; Nelson, Shannon; Zhang, Helin > Subject: [PATCH v2 15/33] i40e/base: replacement of DEBUGOUT() with > i40e_debug() > > To support better debug information printing, all DEBUGOUT() are replaced > by i40e_debug(). In addition, the NVM update state strings are added in > debug information. > > Signed-off-by: Helin Zhang > --- > lib/librte_pmd_i40e/Makefile| 1 + > lib/librte_pmd_i40e/i40e/i40e_nvm.c | 77 > - > 2 files changed, 59 insertions(+), 19 deletions(-) > > diff --git a/lib/librte_pmd_i40e/Makefile b/lib/librte_pmd_i40e/Makefile > index 75b5120..22f0716 100644 > --- a/lib/librte_pmd_i40e/Makefile > +++ b/lib/librte_pmd_i40e/Makefile > @@ -68,6 +68,7 @@ CFLAGS_BASE_DRIVER += -Wno-missing-field- > initializers CFLAGS_BASE_DRIVER += -Wno-pointer-to-int-cast > CFLAGS_BASE_DRIVER += -Wno-format-nonliteral CFLAGS_BASE_DRIVER += > -Wno-format-security > +CFLAGS_BASE_DRIVER += -Wno-unused-variable > > ifeq ($(shell test $(GCC_VERSION) -ge 44 && echo 1), 1) > CFLAGS_BASE_DRIVER += -Wno-unused-but-set-variable diff --git > a/lib/librte_pmd_i40e/i40e/i40e_nvm.c > b/lib/librte_pmd_i40e/i40e/i40e_nvm.c > index 55d0bed..f1a1e88 100644 > --- a/lib/librte_pmd_i40e/i40e/i40e_nvm.c > +++ b/lib/librte_pmd_i40e/i40e/i40e_nvm.c > @@ -82,7 +82,7 @@ enum i40e_status_code i40e_init_nvm(struct i40e_hw > *hw) > } else { /* Blank programming mode */ > nvm->blank_nvm_mode = true; > ret_code = I40E_ERR_NVM_BLANK_MODE; > - DEBUGOUT("NVM init error: unsupported blank mode.\n"); > + i40e_debug(hw, I40E_DEBUG_NVM, "NVM init error: > unsupported blank > +mode.\n"); > } > > return ret_code; > @@ -186,7 +186,7 @@ static enum i40e_status_code > i40e_poll_sr_srctl_done_bit(struct i40e_hw *hw) > i40e_usec_delay(5); > } > if (ret_code == I40E_ERR_TIMEOUT) > - DEBUGOUT("Done bit in GLNVM_SRCTL not set"); > + i40e_debug(hw, I40E_DEBUG_NVM, "Done bit in > GLNVM_SRCTL not set"); > return ret_code; > } > > @@ -705,6 +705,22 @@ STATIC inline u8 > i40e_nvmupd_get_transaction(u32 val) > return (u8)((val & I40E_NVM_TRANS_MASK) >> > I40E_NVM_TRANS_SHIFT); } > > +STATIC const char *i40e_nvm_update_state_str[] = { > + "I40E_NVMUPD_INVALID", > + "I40E_NVMUPD_READ_CON", > + "I40E_NVMUPD_READ_SNT", > + "I40E_NVMUPD_READ_LCB", > + "I40E_NVMUPD_READ_SA", > + "I40E_NVMUPD_WRITE_ERA", > + "I40E_NVMUPD_WRITE_CON", > + "I40E_NVMUPD_WRITE_SNT", > + "I40E_NVMUPD_WRITE_LCB", > + "I40E_NVMUPD_WRITE_SA", > + "I40E_NVMUPD_CSUM_CON", > + "I40E_NVMUPD_CSUM_SA", > + "I40E_NVMUPD_CSUM_LCB", > +}; > + > /** > * i40e_nvmupd_command - Process an NVM update command > * @hw: pointer to hardware structure > @@ -740,6 +756,8 @@ enum i40e_status_code > i40e_nvmupd_command(struct i40e_hw *hw, > > default: > /* invalid state, should never happen */ > + i40e_debug(hw, I40E_DEBUG_NVM, > +"NVMUPD: no such state %d\n", hw- > >nvmupd_state); > status = I40E_NOT_SUPPORTED; > *perrno = -ESRCH; > break; > @@ -900,6 +918,9 @@ STATIC enum i40e_status_code > i40e_nvmupd_state_reading(struct i40e_hw *hw, > break; > > default: > + i40e_debug(hw, I40E_DEBUG_NVM, > +"NVMUPD: bad cmd %s in reading state.\n", > +i40e_nvm_update_state_str[upd_cmd]); > status = I40E_NOT_SUPPORTED; > *perrno = -ESRCH; > break; > @@ -1035,8 +1056,9 @@ STATIC enum i40e_nvmupd_cmd > i40e_nvmupd_validate_command(struct i40e_hw *hw, > /* limits on data size */ > if ((cmd->data_size < 1) || > (cmd->data_size > I40E_NVMUPD_MAX_DATA)) { > - DEBUGOUT1("i40e_nvmupd_validate_command > data_size %d\n", > - cmd->data_size); > + i40e_debug(hw, I40E_DEBUG_NVM, > +"i40e_nvmupd_validate_command > data_size %d\n", > +cmd->data_size); > *perrno = -EFAULT; > return I40E_NVMUPD_INVALID; > } > @@ -1088,12 +1110,16 @@ STATIC enum i40e_nvmupd_cmd > i40e_nvmupd_validate_command(struct i40e_hw *hw, > } > break; > } > + i40e_debug(hw, I40E_DEBUG_NVM, "%s state %d > nvm_release_on_hold %d\n", > +i40e_nvm_update_state_str[upd_cmd], > +hw->nvmupd_state, > +hw->aq.nvm_release_on_done); > > if (upd_cmd == I40E_NVMUPD_INVALID) { > *perrno = -EFAULT; > - DEBUGOUT2( > -
[dpdk-dev] [PATCH v2 05/33] i40e/base: support of building both PF and VF driver together
Acked-by: Jingjing Wu > -Original Message- > From: Zhang, Helin > Sent: Thursday, April 30, 2015 11:03 PM > To: dev at dpdk.org > Cc: Cao, Min; Xu, Qian Q; Wu, Jingjing; Liu, Jijiang; Kenguva, Monica; Patel, > Rashmin N; Murray, Steven J; Nelson, Shannon; Zhang, Helin > Subject: [PATCH v2 05/33] i40e/base: support of building both PF and VF > driver together > > Macros of PF_DRIVER, VF_DRIVER and INTEGRATED_VF were defined to > support building both PF and VF driver together. PF_DRIVER needs to be > defined if a build is for PF only, while VF_DRIVER for VF only. PF_DRIVER, > VF_DRIVER and INTEGRATED_VF are all needed for building both PF and VF > driver together. > > Signed-off-by: Helin Zhang > --- > lib/librte_pmd_i40e/Makefile | 2 +- > lib/librte_pmd_i40e/i40e/i40e_adminq.c| 20 ++-- > lib/librte_pmd_i40e/i40e/i40e_common.c| 6 -- > lib/librte_pmd_i40e/i40e/i40e_prototype.h | 4 ++-- > 4 files changed, 17 insertions(+), 15 deletions(-) > > diff --git a/lib/librte_pmd_i40e/Makefile b/lib/librte_pmd_i40e/Makefile > index 86be3f7..75b5120 100644 > --- a/lib/librte_pmd_i40e/Makefile > +++ b/lib/librte_pmd_i40e/Makefile > @@ -37,7 +37,7 @@ include $(RTE_SDK)/mk/rte.vars.mk LIB = > librte_pmd_i40e.a > > CFLAGS += -O3 > -CFLAGS += $(WERROR_FLAGS) > +CFLAGS += $(WERROR_FLAGS) -DPF_DRIVER -DVF_DRIVER - > DINTEGRATED_VF > > EXPORT_MAP := rte_pmd_i40e_version.map > > diff --git a/lib/librte_pmd_i40e/i40e/i40e_adminq.c > b/lib/librte_pmd_i40e/i40e/i40e_adminq.c > index e8e762f..bbc6b65 100644 > --- a/lib/librte_pmd_i40e/i40e/i40e_adminq.c > +++ b/lib/librte_pmd_i40e/i40e/i40e_adminq.c > @@ -37,7 +37,7 @@ POSSIBILITY OF SUCH DAMAGE. > #include "i40e_adminq.h" > #include "i40e_prototype.h" > > -#ifndef VF_DRIVER > +#ifdef PF_DRIVER > /** > * i40e_is_nvm_update_op - return true if this is an NVM update operation > * @desc: API request descriptor > @@ -48,7 +48,7 @@ STATIC INLINE bool i40e_is_nvm_update_op(struct > i40e_aq_desc *desc) > desc->opcode == > CPU_TO_LE16(i40e_aqc_opc_nvm_update)); > } > > -#endif /* VF_DRIVER */ > +#endif /* PF_DRIVER */ > /** > * i40e_adminq_init_regs - Initialize AdminQ registers > * @hw: pointer to the hardware structure @@ -559,7 +559,7 @@ enum > i40e_status_code i40e_shutdown_arq(struct i40e_hw *hw) enum > i40e_status_code i40e_init_adminq(struct i40e_hw *hw) { > enum i40e_status_code ret_code; > -#ifndef VF_DRIVER > +#ifdef PF_DRIVER > u16 eetrack_lo, eetrack_hi; > int retry = 0; > #endif > @@ -593,7 +593,7 @@ enum i40e_status_code i40e_init_adminq(struct > i40e_hw *hw) > if (ret_code != I40E_SUCCESS) > goto init_adminq_free_asq; > > -#ifndef VF_DRIVER > +#ifdef PF_DRIVER > /* There are some cases where the firmware may not be quite > ready >* for AdminQ operations, so we retry the AdminQ setup a few times >* if we see timeouts in this first AQ call. > @@ -633,13 +633,13 @@ enum i40e_status_code i40e_init_adminq(struct > i40e_hw *hw) > > I40E_HMC_PROFILE_DEFAULT, > 0, > NULL); > +#endif /* PF_DRIVER */ > ret_code = I40E_SUCCESS; > > -#endif /* VF_DRIVER */ > /* success! */ > goto init_adminq_exit; > > -#ifndef VF_DRIVER > +#ifdef PF_DRIVER > init_adminq_free_arq: > i40e_shutdown_arq(hw); > #endif > @@ -772,7 +772,7 @@ enum i40e_status_code > i40e_asq_send_command(struct i40e_hw *hw, > goto asq_send_command_exit; > } > > -#ifndef VF_DRIVER > +#ifdef PF_DRIVER > if (i40e_is_nvm_update_op(desc) && hw->aq.nvm_busy) { > i40e_debug(hw, I40E_DEBUG_AQ_MESSAGE, "AQTX: NVM > busy.\n"); > status = I40E_ERR_NVM; > @@ -931,11 +931,11 @@ enum i40e_status_code > i40e_asq_send_command(struct i40e_hw *hw, > status = I40E_ERR_ADMIN_QUEUE_TIMEOUT; > } > > -#ifndef VF_DRIVER > +#ifdef PF_DRIVER > if (!status && i40e_is_nvm_update_op(desc)) > hw->aq.nvm_busy = true; > > -#endif /* VF_DRIVER */ > +#endif /* PF_DRIVER */ > asq_send_command_error: > i40e_release_spinlock(&hw->aq.asq_spinlock); > asq_send_command_exit: > @@ -1053,7 +1053,7 @@ clean_arq_element_out: > *pending = (ntc > ntu ? hw->aq.arq.count : 0) + (ntu - ntc); > i40e_release_spinlock(&hw->aq.arq_spinlock); > > -#ifndef VF_DRIVER > +#ifdef PF_DRIVER > if (i40e_is_nvm_update_op(&e->desc)) { > hw->aq.nvm_busy = false; > if (hw->aq.nvm_release_on_done) { > diff --git a/lib/librte_pmd_i40e/i40e/i40e_common.c > b/lib/librte_pmd_i40e/i40e/i40e_common.c > index 23f14c1..491ffa8 100644 > --- a/lib/librte_pmd_i40e/i40e/i40e_common.c > +++ b/lib/librte_pmd_i40e/i40e/i40e_common.c > @@ -43,7 +43,7 @@ POSSIBILITY OF SUCH DAMAGE. > * This function sets the mac type of the adapter based on the > * vendor ID and
[dpdk-dev] Beyond DPDK 2.0
On 5/7/15, 9:05 AM, "Avi Kivity" wrote: >On 05/07/2015 06:49 PM, Wiles, Keith wrote: >> >> On 5/7/15, 8:33 AM, "Avi Kivity" wrote: >> >>> On 05/07/2015 06:27 PM, Wiles, Keith wrote: On 5/7/15, 7:02 AM, "Avi Kivity" wrote: > On Wed, Apr 22, 2015 at 6:11 PM, O'Driscoll, Tim > > wrote: > >> Does anybody have any input or comments on this? >> >> >>> -Original Message- >>> From: O'Driscoll, Tim >>> Sent: Thursday, April 16, 2015 11:39 AM >>> To: dev at dpdk.org >>> Subject: Beyond DPDK 2.0 >>> >>> Following the launch of DPDK by Intel as an internal development >>> project, the launch of dpdk.org by 6WIND in 2013, and the first >>>DPDK >> RPM >>> packages for Fedora in 2014, 6WIND, Red Hat and Intel would like to >>> prepare for future releases after DPDK 2.0 by starting a discussion >>> on >>> its evolution. Anyone is welcome to join this initiative. >>> >>> Since then, the project has grown significantly: >>> -The number of commits and mailing list posts has increased >>> steadily. >>> -Support has been added for a wide range of new NICs (Mellanox >>> support submitted by 6WIND, Cisco VIC, Intel i40e and fm10k etc.). >>> -DPDK is now supported on multiple architectures (IBM Power >> support >>> in DPDK 1.8, Tile support submitted by EZchip but not yet reviewed >>>or >>> applied). >>> >>> While this is great progress, we need to make sure that the project >>> is >>> structured in a way that enables it to continue to grow. To achieve >>> this, 6WIND, Red Hat and Intel would like to start a discussion >>>about >>> the future of the project, so that we can agree and establish >> processes >>> that satisfy the needs of the current and future DPDK community. >>> >>> We're very interested in hearing the views of everybody in the >>> community. In addition to debate on the mailing list, we'll also >>> schedule community calls to discuss this. >>> >>> >>> Project Goals >>> - >>> >>> Some topics to be considered for the DPDK project include: >>> -Project Charter: The charter of the DPDK project should be >> clearly >>> defined, and should explain the limits of DPDK (what it does and >>>does >>> not cover). This does not mean that we would be stuck with a >>>singular >>> charter for all time, but the direction and intent of the project >> should >>> be well understood. > One problem we've seen with dpdk is that it is a framework, not a > library: > it wants to create threads, manage memory, and generally take over. > This > is a problem for us, as we are writing a framework (seastar, [1]) and > need > to create threads, manage memory, and generally take over ourselves. > > Perhaps dpdk can be split into two layers, a library layer that only > provides mechanisms, and a framework layer that glues together those > mechanisms and applies a policy, trading in generality for ease of >use. The DPDK system is somewhat divided now between the EAL, PMDS and utility functions like malloc/rings/? The problem I see is the PMDs need a framework to be usable and the EAL plus the ethdev layers provide that support today. Setting up and initializing the DPDK system is pretty clean just call the EAL init routines along with the pool creates and the basic configs for the PMDs/hardware. Once the system is inited one can create new threads and not requiring anyone to use DPDK launch routines. Maybe I am not understanding your needs can you explain more? >>> An initialization routine that accepts argc/argv can hardly be called >>> clean. >> You want a config file or structure initialization design? If that is >>the >> case you can contribute that support as another way to initialize DPDK. > >A config file would be even worse. But we are discussing why >dpdk-as-a-framework is detrimental, not new ways for me to contribute. In a way you stated argc/argv was not a clean, I was only suggesting (more I was asking) what you would like to see? The contribute part was just an example of how you or anyone can help make DPDK better. I wanted to understand why argc/argv was not a clan way for your needs. > >>> In seastar, we have our own malloc() (since seastar is sharded we can >>> provide a faster thread-unsafe malloc implementation). We also have >>>our >>> own threading, and since dpdk is an optional component in seastar, dpdk >>> support requires code duplication. >> DPDK replies one the huge page support for allocation to get the >> performance, do you also not require huge page support. > >Sorry, is this a question? Please rephrase. Sorry, auto correct got me and trying to answer quickly before a meeting. DPDK uses huge pages to get the best
[dpdk-dev] Beyond DPDK 2.0
Hi Luke On 5/7/15, 8:34 AM, "Luke Gorrie" wrote: >On 7 May 2015 at 16:02, Avi Kivity wrote: > >> One problem we've seen with dpdk is that it is a framework, not a >>library: >> it wants to create threads, manage memory, and generally take over. >>This >> is a problem for us, as we are writing a framework (seastar, [1]) and >>need >> to create threads, manage memory, and generally take over ourselves. >> > >That is also broadly why we don't currently use DPDK in Snabb Switch [1]. > >There is a bunch of functionality in DPDK that would be tempting for us to >use and contribute back to: device drivers, SIMD routines, data >structures, >and so on. I think that we would do this if they were available piecemeal >as stand-alone libi40e, libsimd, liblpn, etc. > >The whole DPDK platform/framework is too much for us to adopt though. Some >aspects of it are in conflict with our goals and it is an all-or-nothing >proposition. So for now we are staying self-sufficient even when it means >writing our own ixgbe replacement, etc. > >Having said that we are able to share code that doesn't require linking >into our address space e.g. vhost-user and potentially the bifurcated >drivers in the future. That seems like a nice direction for things to be >going in and a way to collaborate even without our directly linking with >DPDK. Would the shared library support in DPDK be useful here? I know it still links in a dynamic way. I believe DPDK is much like your snabbswitch as it provides a basic system to run networking applications, in your case a vSwitch like design. The design has some parts that are standalone, but to be effective they require other parts of DPDK to work correctly. If you have some suggestion as to how DPDK could be split up and maintain its features and performance I would like to understand how. Regards, ++Keith > >[1] https://github.com/lukego/snabbswitch/blob/README/README.md
[dpdk-dev] Beyond DPDK 2.0
On 8 May 2015 at 06:16, Wiles, Keith wrote: > The PMDs or drivers would not be useful without DPDK MBUFS IMO > Surprisingly perhaps, I would find them very useful. To me there are two parts to a driver: the hardware setup and the transmit/receive. The hardware setup is complex and generic. You have to read a thousand-page data sheet and then write code to initialize the hardware, setup queues, enable promisc/multicast, enable features you want like vmdq or flow director, and so on. You need to accumulate workarounds for hard-to-test problems like cards being discovered with unsuitable values in their EEPROM. There is not much intellectual value in this code being written more than once. I would like to see this hardware setup code shared between many projects. That code does not depend on a specific mbuf struct. Sharing could be done with an embeddable PMD library, with a bifurcated driver in the kernel, with the SR-IOV PF/VF model, or surely other ways too. These all have limited applicability today. The transmit/receive part, on the other hand, seems very application-dependent. This part depends on the specific mbuf struct and the way you are developing your application around it. You will need to write code to suit your design for using scatter/gather, allowed sizes of individual buffers, the granularity at which you are keeping track of checksum validity, how you use TSO/LRO, how you use interrupts, how you batch work together, and so on. This is easy or hard depending on how simple or complex the application is. I am not so interested in sharing this code. I think that different applications will legitimately have different designs - including mbuf structs - and they all need code that suits their own design. I think there is a lot of value in people being creative in these areas and trying different things. So while Avi might only mean that he wants to allocate the bytes for his mbufs himself, on our side we want to design our own mbuf struct. The cost of that today is to write our own device drivers from scratch but for now that seems justified. Going forward if there were a simpler mechanism that reduced our workload and gave us access to more hardware - libixgbe, libi40e, etc - that would be extremely interesting to us. I suppose that another background question is whether the DPDK community are chiefly concerned with advancing DPDK as a platform and a brand or are broadly keen to develop and share code that is useful in diverse networking projects. (Is this whole discussion off-topic for dpdk-devel?) This is one of the many reasons why I would love to use parts of DPDK but do not want to use all of it. (We also allocate our HugeTLBs differently, etc, because we have different priorities.)
[dpdk-dev] [PATCH v2 00/33] i40e base driver update
Acked-By: Jijiang Liu > -Original Message- > From: Zhang, Helin > Sent: Thursday, April 30, 2015 11:03 PM > To: dev at dpdk.org > Cc: Cao, Min; Xu, Qian Q; Wu, Jingjing; Liu, Jijiang; Kenguva, Monica; Patel, > Rashmin N; Murray, Steven J; Nelson, Shannon; Zhang, Helin > Subject: [PATCH v2 00/33] i40e base driver update > > To support firmware version 'FVL3E', i40e base driver should be updated. > Together with necessary modifications to i40e Poll Mode Driver, it mainly > includes the base driver update which contains additional enhancements, > fixes, changes for future use and so on. The details are listed as follows. > > v2 changes: > Removed anything about Fortpark or FPGA as they shouldn't be there. > Removed anything specifically for Solaris as they are not needed. > Split patches into smaller per fixes as suggested. > > Helin Zhang (33): > i40e: copyright update > i40e: disable setting of phy configuration > i40e: adjustment of register definitions and relevant > i40e/base: rename 'err' to 'perrno' > i40e/base: support of building both PF and VF driver together > i40e/base: support of CEE DCBX on recent firmware versions > i40e: replacement of 'i40e_debug_read_register()' > i40e/base: rework of 'i40e_hmc_get_object_va' > i40e/base: update of shadow RAM read/write functions > i40e/base: catch NVM write semaphore timeout and retry > i40e/base: check for AQ timeout in aq_rc decode > i40e/base: fix up NVM update sm error handling > i40e/base: enhancement of polling NVM semaphore > i40e/base: enhancements of NVM checksum calculation > i40e/base: replacement of DEBUGOUT() with i40e_debug() > i40e/base: add fw build info to AQ data > i40e/base: define and use i40e_is_vf() > i40e/base: grab NVM devstarter version not image version > i40e/base: enhancements on adminq init and sending asq command > i40e/base: i40e_aq_get_link_info() should be used directly > i40e/base: add new interfaces for future use > i40e/base: update of get/set LED functions > i40e/base: clean up sparse complaint in i40e_debug_aq > i40e/base: get pf_id from HW rather than PCI function > i40e/base: find partition_id in npar mode, and disable FCOE by default > i40e/base: Reassign incorrect PHY type as a workaround for a FW issue > i40e/base: add AOC phy types to case statement in get_media_type > i40e/base: support for iSCSI capability > i40e/base: set FLAG_RD when sending driver version to FW > i40e/base: future proof some sizeof calls > i40e/base: add more virtual channel operations for future use > i40e/base: rework of structures and macros for future use > i40e/base: modifications for future use > > lib/librte_pmd_i40e/Makefile | 5 +- > lib/librte_pmd_i40e/i40e/i40e_adminq.c | 48 +-- > lib/librte_pmd_i40e/i40e/i40e_adminq.h | 15 +- > lib/librte_pmd_i40e/i40e/i40e_adminq_cmd.h | 183 +- > lib/librte_pmd_i40e/i40e/i40e_alloc.h | 2 +- > lib/librte_pmd_i40e/i40e/i40e_common.c | 535 > +-- > lib/librte_pmd_i40e/i40e/i40e_dcb.c| 263 +- > lib/librte_pmd_i40e/i40e/i40e_dcb.h| 22 +- > lib/librte_pmd_i40e/i40e/i40e_diag.c | 2 +- > lib/librte_pmd_i40e/i40e/i40e_diag.h | 2 +- > lib/librte_pmd_i40e/i40e/i40e_hmc.c| 2 +- > lib/librte_pmd_i40e/i40e/i40e_hmc.h| 2 +- > lib/librte_pmd_i40e/i40e/i40e_lan_hmc.c| 33 +- > lib/librte_pmd_i40e/i40e/i40e_lan_hmc.h| 2 +- > lib/librte_pmd_i40e/i40e/i40e_nvm.c| 555 > ++--- > lib/librte_pmd_i40e/i40e/i40e_osdep.h | 2 +- > lib/librte_pmd_i40e/i40e/i40e_prototype.h | 37 +- > lib/librte_pmd_i40e/i40e/i40e_register.h | 54 +-- > lib/librte_pmd_i40e/i40e/i40e_status.h | 2 +- > lib/librte_pmd_i40e/i40e/i40e_type.h | 97 +++-- > lib/librte_pmd_i40e/i40e/i40e_virtchnl.h | 43 ++- > lib/librte_pmd_i40e/i40e_ethdev.c | 31 +- > lib/librte_pmd_i40e/i40e_ethdev.h | 2 +- > lib/librte_pmd_i40e/i40e_ethdev_vf.c | 2 +- > lib/librte_pmd_i40e/i40e_fdir.c| 2 +- > lib/librte_pmd_i40e/i40e_logs.h| 2 +- > lib/librte_pmd_i40e/i40e_pf.c | 2 +- > lib/librte_pmd_i40e/i40e_pf.h | 2 +- > lib/librte_pmd_i40e/i40e_rxtx.c| 2 +- > lib/librte_pmd_i40e/i40e_rxtx.h| 2 +- > 30 files changed, 1513 insertions(+), 440 deletions(-) > > -- > 1.8.1.4
[dpdk-dev] [PULL REQUEST] i40e base driver update
The following changes since commit cddae880b69155f76efa3241d02437fc69fade45: ixgbe: use scattered Rx with bulk allocation (2015-05-07 19:19:18 +0200) are available in the git repository at: helin at dpdk.org:dpdk-i40e-next.git master for you to fetch changes up to c73e796e3b1c15ac8ae66a0c18877c7fd966b2d4: i40e/base: modifications for future use (2015-05-08 02:07:06 -0400) Helin Zhang (33): i40e: copyright update i40e: disable setting of phy configuration i40e: adjustment of register definitions and relevant i40e/base: rename 'err' to 'perrno' i40e/base: support of building both PF and VF driver together i40e/base: support of CEE DCBX on recent firmware versions i40e: replacement of 'i40e_debug_read_register()' i40e/base: rework of 'i40e_hmc_get_object_va' i40e/base: update of shadow RAM read/write functions i40e/base: catch NVM write semaphore timeout and retry i40e/base: check for AQ timeout in aq_rc decode i40e/base: fix up NVM update sm error handling i40e/base: enhancement of polling NVM semaphore i40e/base: enhancements of NVM checksum calculation i40e/base: replacement of DEBUGOUT() with i40e_debug() i40e/base: add fw build info to AQ data i40e/base: define and use i40e_is_vf() i40e/base: grab NVM devstarter version not image version i40e/base: enhancements on adminq init and sending asq command i40e/base: i40e_aq_get_link_info() should be used directly i40e/base: add new interfaces for future use i40e/base: update of get/set LED functions i40e/base: clean up sparse complaint in i40e_debug_aq i40e/base: get pf_id from HW rather than PCI function i40e/base: find partition_id in npar mode, and disable FCOE by default i40e/base: Reassign incorrect PHY type as a workaround for a FW issue i40e/base: add AOC phy types to case statement in get_media_type i40e/base: support for iSCSI capability i40e/base: set FLAG_RD when sending driver version to FW i40e/base: future proof some sizeof calls i40e/base: add more virtual channel operations for future use i40e/base: rework of structures and macros for future use i40e/base: modifications for future use lib/librte_pmd_i40e/Makefile | 5 +- lib/librte_pmd_i40e/i40e/i40e_adminq.c | 48 +-- lib/librte_pmd_i40e/i40e/i40e_adminq.h | 15 +- lib/librte_pmd_i40e/i40e/i40e_adminq_cmd.h | 183 +- lib/librte_pmd_i40e/i40e/i40e_alloc.h | 2 +- lib/librte_pmd_i40e/i40e/i40e_common.c | 535 +-- lib/librte_pmd_i40e/i40e/i40e_dcb.c| 263 +- lib/librte_pmd_i40e/i40e/i40e_dcb.h| 22 +- lib/librte_pmd_i40e/i40e/i40e_diag.c | 2 +- lib/librte_pmd_i40e/i40e/i40e_diag.h | 2 +- lib/librte_pmd_i40e/i40e/i40e_hmc.c| 2 +- lib/librte_pmd_i40e/i40e/i40e_hmc.h| 2 +- lib/librte_pmd_i40e/i40e/i40e_lan_hmc.c| 33 +- lib/librte_pmd_i40e/i40e/i40e_lan_hmc.h| 2 +- lib/librte_pmd_i40e/i40e/i40e_nvm.c| 555 ++--- lib/librte_pmd_i40e/i40e/i40e_osdep.h | 2 +- lib/librte_pmd_i40e/i40e/i40e_prototype.h | 37 +- lib/librte_pmd_i40e/i40e/i40e_register.h | 54 +-- lib/librte_pmd_i40e/i40e/i40e_status.h | 2 +- lib/librte_pmd_i40e/i40e/i40e_type.h | 97 +++-- lib/librte_pmd_i40e/i40e/i40e_virtchnl.h | 43 ++- lib/librte_pmd_i40e/i40e_ethdev.c | 31 +- lib/librte_pmd_i40e/i40e_ethdev.h | 2 +- lib/librte_pmd_i40e/i40e_ethdev_vf.c | 2 +- lib/librte_pmd_i40e/i40e_fdir.c| 2 +- lib/librte_pmd_i40e/i40e_logs.h| 2 +- lib/librte_pmd_i40e/i40e_pf.c | 2 +- lib/librte_pmd_i40e/i40e_pf.h | 2 +- lib/librte_pmd_i40e/i40e_rxtx.c| 2 +- lib/librte_pmd_i40e/i40e_rxtx.h| 2 +- 30 files changed, 1513 insertions(+), 440 deletions(-)
[dpdk-dev] Beyond DPDK 2.0
On Fri, May 08, 2015 at 07:29:39AM +0200, Luke Gorrie wrote: > On 8 May 2015 at 06:16, Wiles, Keith wrote: > > > The PMDs or drivers would not be useful without DPDK MBUFS IMO > > > > Surprisingly perhaps, I would find them very useful. > > To me there are two parts to a driver: the hardware setup and the > transmit/receive. > > The hardware setup is complex and generic. You have to read a thousand-page > data sheet and then write code to initialize the hardware, setup queues, > enable promisc/multicast, enable features you want like vmdq or flow > director, and so on. You need to accumulate workarounds for hard-to-test > problems like cards being discovered with unsuitable values in their > EEPROM. There is not much intellectual value in this code being written > more than once. For the Intel NIC drivers, the hardware setup part used in DPDK is based off the other Intel drivers for other OS's. The code you are interested in should therefore be contained within the subfolders off each individual PMD. As you point out below, the mbuf specific part is only present in the files in the top-level PMD folder with the DPDK-specific RX/TX and queue setup routines. Regards, /Bruce > > I would like to see this hardware setup code shared between many projects. > That code does not depend on a specific mbuf struct. Sharing could be done > with an embeddable PMD library, with a bifurcated driver in the kernel, > with the SR-IOV PF/VF model, or surely other ways too. These all have > limited applicability today. > > The transmit/receive part, on the other hand, seems very > application-dependent. This part depends on the specific mbuf struct and > the way you are developing your application around it. You will need to > write code to suit your design for using scatter/gather, allowed sizes of > individual buffers, the granularity at which you are keeping track of > checksum validity, how you use TSO/LRO, how you use interrupts, how you > batch work together, and so on. This is easy or hard depending on how > simple or complex the application is. > > I am not so interested in sharing this code. I think that different > applications will legitimately have different designs - including mbuf > structs - and they all need code that suits their own design. I think there > is a lot of value in people being creative in these areas and trying > different things. > > So while Avi might only mean that he wants to allocate the bytes for his > mbufs himself, on our side we want to design our own mbuf struct. The cost > of that today is to write our own device drivers from scratch but for now > that seems justified. Going forward if there were a simpler mechanism that > reduced our workload and gave us access to more hardware - libixgbe, > libi40e, etc - that would be extremely interesting to us. > > I suppose that another background question is whether the DPDK community > are chiefly concerned with advancing DPDK as a platform and a brand or are > broadly keen to develop and share code that is useful in diverse networking > projects. (Is this whole discussion off-topic for dpdk-devel?) > > This is one of the many reasons why I would love to use parts of DPDK but > do not want to use all of it. (We also allocate our HugeTLBs differently, > etc, because we have different priorities.)
[dpdk-dev] [RFC PATCH 0/2] Move PMDs out of lib directory
On Thu, May 07, 2015 at 10:11:15PM +0100, Wiles, Keith wrote: > > > On 5/7/15, 9:04 AM, "Bruce Richardson" wrote: > > >On Thu, May 07, 2015 at 05:45:20PM +0200, Marc Sune wrote: > >> > >> > >> On 07/05/15 17:35, Bruce Richardson wrote: > >> >The "lib" directory is getting very crowded, with both general libs and > >> >poll mode drivers in it. This patch set proposes to move the PMDs out > >>of the > >> >lib folder and to put them in a separate "pmds" folder. This should > >>help > >> >with code browse-ability as the number of libs, and pmds increases. > >> > > >> >Comments or objections? > >> > > >> >Bruce Richardson (2): > >> > pmds: Use relative rather than absolute paths > >> > pmds: move pmds from lib to separate pmd dir > >> > > >> > create mode 100644 pmds/librte_pmd_xenvirt/rte_mempool_gntalloc.c > >> > create mode 100644 pmds/librte_pmd_xenvirt/rte_xen_lib.c > >> > create mode 100644 pmds/librte_pmd_xenvirt/rte_xen_lib.h > >> > create mode 100644 pmds/librte_pmd_xenvirt/virtio_logs.h > >> > create mode 100644 pmds/librte_pmd_xenvirt/virtqueue.h > >> > > >> > >> But at the end they are also libraries. What about something like: > >> > >> * libs/core <= fundamental libraries (eal, mbuf rings...) > >> * libs/pmds <= all pmds > >> > >> And other feature-group oriented, higher level lib, directories (not > >>sure > >> right now how to better classify them right now): > >> * libs/processing <= packet processing > >> * libs/utils > >> ... > >> > >Yes, they are all just libs, so we could make "pmds" be a sub-dir of the > >lib > >folder. I prefer the shorter path myself, but if others want a multi-level > >hierarchy it's no big deal. > > I like the dpdk/pmds as dpdk/lib/pmds is a bit longer, but I also see if > we want to move the pmds to other repo(s) in the future it would be easier > (I think) to have the subtree at the top. To me pmds are not really > libraries as I think of libc or libcrypto or something along that path. > > The PMDs need to be plug able and they maybe more like loadable modules > then libraries in the future. > > > > >For the other libs, I'm not sure we need to split them up, and I also > >think > >that trying to divide them into categories - and what those categories > >should > >be could - cause endless discussion. However, maybe I'm overly > >pessimistic... :-) > > I agree with Bruce here we just need the PMDS split out for now. However, if in future we might want to split out the other libs into categories, it may make more sense to have the pmds as a subfolder lib. I still think I prefer having pmds at the top level of the tree, though. /Bruce
[dpdk-dev] Beyond DPDK 2.0
Hi Bruce, On 8 May 2015 at 11:06, Bruce Richardson wrote: > For the Intel NIC drivers, the hardware setup part used in DPDK is based > off > the other Intel drivers for other OS's. The code you are interested in > should > therefore be contained within the subfolders off each individual PMD. As > you point > out below, the mbuf specific part is only present in the files in the > top-level > PMD folder with the DPDK-specific RX/TX and queue setup routines. Interesting! How could one embed these Intel drivers (igb, ixgbe, i40e, ...) into new programs? If there is documentation, a platform-agnostic master repository, etc, that would be really interesting. I have the impression as an outsider that the various incarnations of these drivers (Linux, FreeBSD, DPDK) are loosely synchronized forks maintained at considerable effort by each project. If there is actually a common core that is easy to adopt, I am interested! (If dpdk-devel is the wrong mailing list for this discussion then perhaps you could reply with Cc: to a more suitable one and I will subscribe there.) Cheers, -Luke
[dpdk-dev] Beyond DPDK 2.0
On Fri, May 08, 2015 at 11:32:04AM +0200, Luke Gorrie wrote: > Hi Bruce, > > On 8 May 2015 at 11:06, Bruce Richardson > wrote: > > > For the Intel NIC drivers, the hardware setup part used in DPDK is based > > off > > the other Intel drivers for other OS's. The code you are interested in > > should > > therefore be contained within the subfolders off each individual PMD. As > > you point > > out below, the mbuf specific part is only present in the files in the > > top-level > > PMD folder with the DPDK-specific RX/TX and queue setup routines. > > > Interesting! > > How could one embed these Intel drivers (igb, ixgbe, i40e, ...) into new > programs? > > If there is documentation, a platform-agnostic master repository, etc, that > would be really interesting. > > I have the impression as an outsider that the various incarnations of these > drivers (Linux, FreeBSD, DPDK) are loosely synchronized forks maintained at > considerable effort by each project. If there is actually a common core > that is easy to adopt, I am interested! > > (If dpdk-devel is the wrong mailing list for this discussion then perhaps > you could reply with Cc: to a more suitable one and I will subscribe there.) > > Cheers, > -Luke The code in those directories is "common" code that is maintained by Intel - which is why you see repeated comments about not modifying it for DPDK. It is just contained in it's own subfolder in each DPDK driver for easier updating off the internal Intel baseline. /Bruce
[dpdk-dev] Beyond DPDK 2.0
On 8 May 2015 at 11:42, Bruce Richardson wrote: > The code in those directories is "common" code that is maintained by Intel > - > which is why you see repeated comments about not modifying it for DPDK. It > is > just contained in it's own subfolder in each DPDK driver for easier > updating > off the internal Intel baseline. > Thanks for pointing this out to me, Bruce. Food for thought. Cheers, -Luke
[dpdk-dev] Beyond DPDK 2.0
> Sounds like you want something like libc, but DPDK is a system like a user > space OS more then it is a collection of functions that are independent > like strlen, strcpy, memcpy, printf or ... Some parts of DPDK are > independent and can be used as you suggest, but the real performance > sections are tied together. > > >> Regards, > >> ++Keith This is indeed quite a statement. DPDK is not just a bunch of NIC drivers, but "a user space OS" (DPDK 1.0 had a baremetal boot: why did it disappeared?). Why Linux or Windows do not integrate DPDK concepts to catch up performance wise? Is it something so deep like the "Big Kernel Lock" that took so many years to get rid of? My assumption is that all current kernels have been built with one implicit hypothesis: the memory is much faster than cpu. This is the opposite today. DPDK internal structure has been adapted to the new paradigm where the TLBs, the memory bandwidth are the scarce resources to manage. So I guess Linux and Windows will not be able to integrate DPDK concepts for performance anytime soon, if ever... Reading the list carefully, I expect disk block PMDs (and block framework?) to come next. Beyond DPDK 2.0: is it time to accept the fact that DPDK community is actually paving the way to the next generation lightweight, high performance, para-virtualized OS? Is it a DPDK task? Another project ? Should we rename DPDK to PVDK? - HK
[dpdk-dev] Beyond DPDK 2.0
On Fri, May 08, 2015 at 12:26:39PM +0200, Hobywan Kenoby wrote: > > > Sounds like you want something like libc, but DPDK is a system like a user > > space OS more then it is a collection of functions that are independent > > like strlen, strcpy, memcpy, printf or ... Some parts of DPDK are > > independent and can be used as you suggest, but the real performance > > sections are tied together. > > > > >> Regards, > > >> ++Keith > > This is indeed quite a statement. DPDK is not just a > bunch of NIC drivers, but "a user space OS" (DPDK 1.0 had a baremetal > boot: why did it disappeared?). > > > > Why Linux or Windows do not integrate DPDK concepts to > catch up performance wise? Is it something so deep like the "Big > Kernel Lock" that took so many years to get rid of? > Some optimizations are being looked at in the kernel (more deeply ingrained use of accelerators/offloads like cam management/flow steering, checksum & encap offloads, tx batching, etc) Those are features which the kernel can opportunistically take advantage of in a hardware agnostic fashion. Some optimizations simply aren't worth the effort to take into a general purpose OS that seeks to support multiple arches. Many of the DPDK optimizations utilize instruction families like AVX or SSE, which, while potentially useful in some situations have equal potential to be catastrophic to non network-i/o bound workloads. > > > My assumption is that all current kernels have been > built with one implicit hypothesis: the memory is much faster than cpu. This > is Thats not entirely true. Or more to the point, its not true in any way thats relevant to a comparison between DPDK and the Linux network stack. Linux is as careful with its cache management as DPDK is (arguably more so, as it has to juggle multiple workloads instead of the single purpose workload that DPDK is designed for). The difference is that Linux often has to ignore some performance improvements because it has the additional responsibiilty of providing secruity and isolation to multiple processes on multiple architectures. > the opposite today. DPDK internal structure has been adapted to the new > paradigm where the TLBs, the memory bandwidth are the scarce resources to > manage. So I > guess Linux and Windows will not be able to integrate DPDK concepts for > performance anytime soon, if ever... > This is the case with every bit of software. Memory bandwidth is always a scarce resoruce to manage. The difference is that general purpose operating systems consider protection/layering/isolation to be of equal or greater importance than performance. Tradeoffs have to be made. Linux in general strives to isolate hardware from applications both functionally and physically so as to ensure that there is minimal risk in one process adversely affecting the other. The tradeoff is that the Linux device model can't just do anything it wants to improve performance. Converserly, DPDK is all about performance. Up until recently (and likely still somewhat in the future), you have to rebuild your application every time you move to a new version of DPDK, because the API fluctuated with every release to eek out additional performance. The DPDK can optimize using vectorized x86 instructions and other cpu specific features througout because it is in the position to only worry about a very narrow field of architectures. > > > Reading the list carefully, I expect disk block PMDs > (and block framework?) to come next. > > > > Beyond DPDK 2.0: is it time to accept the fact that > DPDK community is actually paving the way to the next generation lightweight, > high performance, para-virtualized OS? Is it a DPDK task? Another project ? > Should we rename DPDK to PVDK? > > > > - HK > >
[dpdk-dev] Beyond DPDK 2.0
Hi Luke, On 5/7/15, 10:29 PM, "Luke Gorrie" wrote: >On 8 May 2015 at 06:16, Wiles, Keith wrote: > >The PMDs or drivers would not be useful without DPDK MBUFS IMO > > > > > >Surprisingly perhaps, I would find them very useful. > > >To me there are two parts to a driver: the hardware setup and the >transmit/receive. > > >The hardware setup is complex and generic. You have to read a >thousand-page data sheet and then write code to initialize the hardware, >setup queues, enable promisc/multicast, enable features you want like >vmdq or flow director, and so on. You need to accumulate > workarounds for hard-to-test problems like cards being discovered with >unsuitable values in their EEPROM. There is not much intellectual value >in this code being written more than once. > > >I would like to see this hardware setup code shared between many >projects. That code does not depend on a specific mbuf struct. Sharing >could be done with an embeddable PMD library, with a bifurcated driver in >the kernel, with the SR-IOV PF/VF model, or > surely other ways too. These all have limited applicability today. > > >The transmit/receive part, on the other hand, seems very >application-dependent. This part depends on the specific mbuf struct and >the way you are developing your application around it. You will need to >write code to suit your design for using scatter/gather, > allowed sizes of individual buffers, the granularity at which you are >keeping track of checksum validity, how you use TSO/LRO, how you use >interrupts, how you batch work together, and so on. This is easy or hard >depending on how simple or complex the application > is. > > >I am not so interested in sharing this code. I think that different >applications will legitimately have different designs - including mbuf >structs - and they all need code that suits their own design. I think >there is a lot of value in people being creative > in these areas and trying different things. > > >So while Avi might only mean that he wants to allocate the bytes for his >mbufs himself, on our side we want to design our own mbuf struct. The >cost of that today is to write our own device drivers from scratch but >for now that seems justified. Going forward > if there were a simpler mechanism that reduced our workload and gave us >access to more hardware - libixgbe, libi40e, etc - that would be >extremely interesting to us. I think I see your point about hardware setup and handling packets from the rings as it would be nice to allow others to utilize those parts of the code. The drivers (I believe) are mostly from FreeBSD and changed to be our PMDs, which to me they are fairly generic in some cases. I will have a look at the drivers when I get back home. In the past I have written drivers using the your suggestion around we have a upper and lower layer the lower layer is all hardware specific and the upper layer is all around the network stack interface. My point is we should be able to split the two and possible provide you the lower layer APIs in a cleaner way. > > > >I suppose that another background question is whether the DPDK community >are chiefly concerned with advancing DPDK as a platform and a brand or >are broadly keen to develop and share code that is useful in diverse >networking projects. (Is this whole discussion > off-topic for dpdk-devel?) I would suggested you are correct DPDK as platform is more how it started and is going, but it does not mean we can not move is a slightly different direction to help other access the parts which are more generic. Regards, ++Keith > > >This is one of the many reasons why I would love to use parts of DPDK but >do not want to use all of it. (We also allocate our HugeTLBs differently, >etc, because we have different priorities.) > > > > > > >
[dpdk-dev] [PATCH v2] Add toeplitz hash algorithm used by RSS
Hi Andrey, OK, so be it. Thus in case you want to distribute (or just calculate hash based on non standart tuple) - use your own tuple and own hash key (length of tuple and key - responsible of the programmer). In case you want to emulate NIC RSS - use union rte_thash_tuple (still needs to be updated with new NICs input tuples) and NIC RSS hash key. P.S Thanks for reviews. Regards, Vladimir 2015-05-07 14:38 GMT+03:00 Chilikin, Andrey : > Hi Vladimir, > > > > Yes, at the moment NICs support limited input sets for hash calculation, > but why limit SW for the same sets if it can be done in more general way > and be easily scalable for HW updates? Using limited input set for RSS is > not a feature of Toeplitz hash, but limitation of HW. I believe that > general Toeplitz function will be more appropriate ? it will cover input > sets currently supported by HW and also will be easily scalable for future > HW. Also, talking about different NICs ? Niantic and Fortville, for > example, have hash keys of different length, so rte_softrss() function > should take into account hash key?s length as well. > > Regards, > > Andrey > > > > > > *From:* Vladimir Medvedkin [mailto:medvedkinv at gmail.com] > *Sent:* Thursday, May 7, 2015 11:28 AM > *To:* Chilikin, Andrey > *Cc:* dev at dpdk.org > *Subject:* Re: [dpdk-dev] [PATCH v2] Add toeplitz hash algorithm used by > RSS > > > > Hi Andrey, > > The main goal of this new functions is to calculate the hash which is > equal to the hash of the NIC. > According to XL710 datasheet table 7-5 for sctp input set consists of > IP4-S, IP4-D, SCTP-Verification-Tag. I don't see any NIC that uses QinQ or > single vlan tag, ip proto number, tunnel id, vxlan, etc for calculating RSS > hash. If it appear we can always update union rte_thash_tuple. > I think it should be like: > > struct rte_ports { > uint16_t dport; > uint16_t sport; > }; > > union rte_thash_l4 { > struct rte_ports ports; > uint32_tsctp_tag; > }; > struct rte_ipv4_tuple { > uint32_tsrc_addr; > uint32_tdst_addr; > union rte_thash_l4 l4; > }; > > If it is necessary to distribute packets according to non standart tuples > I think it's more appropriate to use crc32 or jhash because of speed. > rte_softrss_be consumes 400-500 clocks for each 4-byte input at E3 > 1230v1 at 3.2GHz. This means for ipv4+tcp it consumes ~1500 clocks. > > If you or someone still think you need general toeplitz hash I'll add it. > > Regards, > > Vladimir > > > > > > 2015-05-05 19:03 GMT+03:00 Chilikin, Andrey : > > Hi Vladimir, > > Why limit Toeplitz hash calculation to predefined tuples and length? > Should it be more general, something like > rte_softrss_be(void *input, uint32_t input_len, const uint8_t *rss_key) to > enable hash calculation for an input of any size? It would be useful for > distributing packets using some non-standard tuples, like hashing on QinQ > or adding IP protocol to hash calculation to separate UDP and TCP flows or > even some other fields from a packet, for example, tunnel ID from VXLAN > headers. By the way, i40e already supports RSS for SCTP in addition to TCP > and UDP and includes Verification Tag as well as SCTP source and > destination ports for RSS hash. > > Regards, > Andrey > > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladimir > > Medvedkin > > Sent: Tuesday, May 5, 2015 2:20 PM > > To: dev at dpdk.org > > Subject: [dpdk-dev] [PATCH v2] Add toeplitz hash algorithm used by RSS > > > > Software implementation of the Toeplitz hash function used by RSS. > > Can be used either for packet distribution on single queue NIC or for > > simulating of RSS computation on specific NIC (for example after GRE > header > > decapsulating). > > > > v2 changes > > - Add ipv6 support > > - Various style fixes > > > > Signed-off-by: Vladimir Medvedkin > > --- > > lib/librte_hash/Makefile| 1 + > > lib/librte_hash/rte_thash.h | 209 > > > > 2 files changed, 210 insertions(+) > > create mode 100644 lib/librte_hash/rte_thash.h > > > > diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile index > > 3696cb1..981230b 100644 > > --- a/lib/librte_hash/Makefile > > +++ b/lib/librte_hash/Makefile > > @@ -49,6 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += rte_fbk_hash.c > > SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h SYMLINK- > > $(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_crc.h SYMLINK- > > $(CONFIG_RTE_LIBRTE_HASH)-include += rte_jhash.h > > +SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h > > SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_fbk_hash.h > > > > # this lib needs eal > > diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h > new file > > mode 100644 index 000..42c7bf6 > > --- /dev/null > > +++ b/lib/librte_hash/rte_thash.h > > @@ -0,0 +1,209 @@ > > +/*- > > + * BSD LICENSE > > + * >
[dpdk-dev] [PATCH v2] Add toeplitz hash algorithm used by RSS
Software implementation of the Toeplitz hash function used by RSS. Can be used either for packet distribution on single queue NIC or for simulating of RSS computation on specific NIC (for example after GRE header decapsulating). v3 changes - Rework API to be more generic - Add sctp_tag into tuple v2 changes - Add ipv6 support - Various style fixes Signed-off-by: Vladimir Medvedkin --- lib/librte_hash/Makefile| 1 + lib/librte_hash/rte_thash.h | 207 2 files changed, 208 insertions(+) create mode 100644 lib/librte_hash/rte_thash.h diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile index 3696cb1..981230b 100644 --- a/lib/librte_hash/Makefile +++ b/lib/librte_hash/Makefile @@ -49,6 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += rte_fbk_hash.c SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_crc.h SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_jhash.h +SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_fbk_hash.h # this lib needs eal diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h new file mode 100644 index 000..5d5111b --- /dev/null +++ b/lib/librte_hash/rte_thash.h @@ -0,0 +1,207 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_THASH_H +#define _RTE_THASH_H + +/** + * @file + * + * toeplitz hash functions. + */ + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * Software implementation of the Toeplitz hash function used by RSS. + * Can be used either for packet distribution on single queue NIC + * or for simulating of RSS computation on specific NIC (for example + * after GRE header decapsulating) + */ + +#include +#include +#include + +#ifdef __SSE3__ +static const __m128i bswap_mask = {0x0405060700010203, 0x0C0D0E0F08090A0B}; +#endif + +#define RTE_THASH_V4_L3 2 /*calculate hash of ipv4 header only*/ +#define RTE_THASH_V4_L4 3 /*calculate hash of ipv4 + transport headers*/ +#define RTE_THASH_V6_L3 8 /*calculate hash of ipv6 header only */ +#define RTE_THASH_V6_L4 9 /*calculate hash of ipv6 + transport headers */ + +struct rte_ports { +uint16_t dport; +uint16_t sport; +}; + +union rte_thash_l4 { +struct rte_ports ports; +uint32_tsctp_tag; +}; + +/** + * IPv4 tuple + * addreses and ports/sctp_tag have to be CPU byte order + */ +struct rte_ipv4_tuple { + uint32_tsrc_addr; + uint32_tdst_addr; + union rte_thash_l4 l4; +}; + +/** + * IPv6 tuple + * Addresses have to be filled by rte_thash_load_v6_addr() + * ports/sctp_tag have to be CPU byte order + */ +struct rte_ipv6_tuple { + uint8_t src_addr[16]; + uint8_t dst_addr[16]; + union rte_thash_l4 l4; +}; + +union rte_thash_tuple { + struct rte_ipv4_tuple v4; + struct rte_ipv6_tuple v6; +} __attribute__((aligned(16))); + +/** + * Prepare special converted key to use with rte_softrss_be() + * @param orig + * pointer to original RSS key + * @param targ + * pointer to target RSS key + * @param len + * RSS key length + */ +static inline void +rte_convert_rss_key(const uint32_t *orig, uint32_t *tar
[dpdk-dev] Beyond DPDK 2.0
On Fri, 8 May 2015 14:44:17 + "Wiles, Keith" wrote: > Hi Luke, > > On 5/7/15, 10:29 PM, "Luke Gorrie" wrote: > > >On 8 May 2015 at 06:16, Wiles, Keith wrote: > > > >The PMDs or drivers would not be useful without DPDK MBUFS IMO > > > > > > > > > > > >Surprisingly perhaps, I would find them very useful. > > > > > >To me there are two parts to a driver: the hardware setup and the > >transmit/receive. > > > > > >The hardware setup is complex and generic. You have to read a > >thousand-page data sheet and then write code to initialize the hardware, > >setup queues, enable promisc/multicast, enable features you want like > >vmdq or flow director, and so on. You need to accumulate > > workarounds for hard-to-test problems like cards being discovered with > >unsuitable values in their EEPROM. There is not much intellectual value > >in this code being written more than once. > > > > > >I would like to see this hardware setup code shared between many > >projects. That code does not depend on a specific mbuf struct. Sharing > >could be done with an embeddable PMD library, with a bifurcated driver in > >the kernel, with the SR-IOV PF/VF model, or > > surely other ways too. These all have limited applicability today. > > > > > >The transmit/receive part, on the other hand, seems very > >application-dependent. This part depends on the specific mbuf struct and > >the way you are developing your application around it. You will need to > >write code to suit your design for using scatter/gather, > > allowed sizes of individual buffers, the granularity at which you are > >keeping track of checksum validity, how you use TSO/LRO, how you use > >interrupts, how you batch work together, and so on. This is easy or hard > >depending on how simple or complex the application > > is. > > > > > >I am not so interested in sharing this code. I think that different > >applications will legitimately have different designs - including mbuf > >structs - and they all need code that suits their own design. I think > >there is a lot of value in people being creative > > in these areas and trying different things. > > > > > >So while Avi might only mean that he wants to allocate the bytes for his > >mbufs himself, on our side we want to design our own mbuf struct. The > >cost of that today is to write our own device drivers from scratch but > >for now that seems justified. Going forward > > if there were a simpler mechanism that reduced our workload and gave us > >access to more hardware - libixgbe, libi40e, etc - that would be > >extremely interesting to us. > > I think I see your point about hardware setup and handling packets from > the rings as it would be nice to allow others to utilize those parts of > the code. The drivers (I believe) are mostly from FreeBSD and changed to > be our PMDs, which to me they are fairly generic in some cases. I will > have a look at the drivers when I get back home. In the past I have > written drivers using the your suggestion around we have a upper and lower > layer the lower layer is all hardware specific and the upper layer is all > around the network stack interface. My point is we should be able to split > the two and possible provide you the lower layer APIs in a cleaner way. The point is this is BSD code, you can do with it what you will. But the DPDK community doesn't have to care about changes breaking your proprietary application. That is the problem with the whole concept of making DPDK drivers a separate component. It makes them immutable and unmaintainable. Developers don't want to be responsible for code that is used outside its original scope.
[dpdk-dev] Beyond DPDK 2.0
On Fri, 8 May 2015 09:31:34 -0400 Neil Horman wrote: > On Fri, May 08, 2015 at 12:26:39PM +0200, Hobywan Kenoby wrote: > > > > > Sounds like you want something like libc, but DPDK is a system like a user > > > space OS more then it is a collection of functions that are independent > > > like strlen, strcpy, memcpy, printf or ... Some parts of DPDK are > > > independent and can be used as you suggest, but the real performance > > > sections are tied together. > > > > > > >> Regards, > > > >> ++Keith > > > > This is indeed quite a statement. DPDK is not just a > > bunch of NIC drivers, but "a user space OS" (DPDK 1.0 had a baremetal > > boot: why did it disappeared?). > > > > > > > > Why Linux or Windows do not integrate DPDK concepts to > > catch up performance wise? Is it something so deep like the "Big > > Kernel Lock" that took so many years to get rid of? > > > Some optimizations are being looked at in the kernel (more deeply ingrained > use > of accelerators/offloads like cam management/flow steering, checksum & encap > offloads, tx batching, etc) Those are features which the kernel can > opportunistically take advantage of in a hardware agnostic fashion. > > Some optimizations simply aren't worth the effort to take into a general > purpose > OS that seeks to support multiple arches. Many of the DPDK optimizations > utilize instruction families like AVX or SSE, which, while potentially useful > in > some situations have equal potential to be catastrophic to non network-i/o > bound > workloads. > > > > > > My assumption is that all current kernels have been > > built with one implicit hypothesis: the memory is much faster than cpu. > > This is > Thats not entirely true. Or more to the point, its not true in any way thats > relevant to a comparison between DPDK and the Linux network stack. Linux is > as > careful with its cache management as DPDK is (arguably more so, as it has to > juggle multiple workloads instead of the single purpose workload that DPDK is > designed for). The difference is that Linux often has to ignore some > performance improvements because it has the additional responsibiilty of > providing secruity and isolation to multiple processes on multiple > architectures. > > > the opposite today. DPDK internal structure has been adapted to the new > > paradigm where the TLBs, the memory bandwidth are the scarce resources to > > manage. So I > > guess Linux and Windows will not be able to integrate DPDK concepts for > > performance anytime soon, if ever... > > > This is the case with every bit of software. Memory bandwidth is always a > scarce resoruce to manage. The difference is that general purpose operating > systems consider protection/layering/isolation to be of equal or greater > importance than performance. Tradeoffs have to be made. Linux in general > strives to isolate hardware from applications both functionally and physically > so as to ensure that there is minimal risk in one process adversely affecting > the other. The tradeoff is that the Linux device model can't just do anything > it wants to improve performance. > > Converserly, DPDK is all about performance. Up until recently (and likely > still > somewhat in the future), you have to rebuild your application every time you > move to a new version of DPDK, because the API fluctuated with every release > to > eek out additional performance. The DPDK can optimize using vectorized x86 > instructions and other cpu specific features througout because it is in the > position to only worry about a very narrow field of architectures. > > > > > > > Reading the list carefully, I expect disk block PMDs > > (and block framework?) to come next. > > > > > > > > Beyond DPDK 2.0: is it time to accept the fact that > > DPDK community is actually paving the way to the next generation > > lightweight, > > high performance, para-virtualized OS? Is it a DPDK task? Another project ? > > Should we rename DPDK to PVDK? The difference is DPDK doesn't care about being general purpose: - scheduler, that is the applications problem - locking, the application must be bound to cpus or do its own locking - buffer management, up to the application. - memory protection (haha) Any operating system provides an abstraction that makes programming easier. If you strip away those abstractions, then sure things go faster but it is less safe and harder. Linux is about providing safe abstraction. If you want an OS that doesn't do that, look to Cloudius or the other DIY environments like DPDK. This is not a new concept. Oracle and other DBMS vendors have been asking for the OS to get out of the way for years, but then customers find that things like filesystems are convenient necessities.
[dpdk-dev] [RFC PATCH 0/2] dynamic memzones
Please NOTE that this series is meant to illustrate an idea/approach and start discussion on the topic. Current implemetation allows reserving/creating memzones but not the opposite (unreserve/delete). This affects mempools and other memzone based objects. >From my point of view, implementing unreserve functionality for memzones would look like malloc over memsegs. Thus, this approach moves malloc inside eal (which in turn removes a circular dependency), where malloc heaps are composed of memsegs. We keep both malloc and memzone APIs as they are, but memzones allocate its memory by calling malloc_heap_alloc (there would be some ABI changes, see below). Some extra functionality is required in malloc to allow for boundary constrained memory requests. In summary, currently malloc is based on memzones, and with this approach memzones are based on malloc. An alternative would be to move malloc internals (malloc_heap, malloc_elem) to the eal, but keeping the malloc library as is, where malloc is based on memzones. This way we could avoid ABI changes while keeping the existing circular dependency between malloc and eal. TODOs: - Implement memzone_unreserve, simply call rte_malloc_free. - Implement mempool_delete, simply call rte_memzone_unreserve. - Init heaps with all available memsegs at once. - Review symbols in version map. ABI changes: - Removed support for rte_memzone_reserve_ with len=0 (not needed?). - Removed librte_malloc as single library (linker script as work around?). IDEAS FOR FUTURE WORK: - More control over requested memory, ie. shared/private, phys_contig, etc. One of the goals would be trying to reduce the need of physically contiguous memory when not required. - Attach/unattach hugepages at runtime (faster VM migration). - Improve malloc algorithm? ie. jemalloc (or any other). Any comments/toughts and/or different approaches are welcome. Sergio Gonzalez Monroy (2): eal: move librte_malloc to eal/common eal: memzone allocated by malloc config/common_bsdapp| 9 +- config/common_linuxapp | 9 +- lib/Makefile| 1 - lib/librte_acl/Makefile | 2 +- lib/librte_eal/bsdapp/eal/Makefile | 4 +- lib/librte_eal/bsdapp/eal/rte_eal_version.map | 18 ++ lib/librte_eal/common/Makefile | 1 + lib/librte_eal/common/eal_common_memzone.c | 233 ++-- lib/librte_eal/common/include/rte_malloc.h | 342 lib/librte_eal/common/include/rte_malloc_heap.h | 4 +- lib/librte_eal/common/include/rte_memory.h | 1 + lib/librte_eal/common/malloc_elem.c | 342 lib/librte_eal/common/malloc_elem.h | 192 + lib/librte_eal/common/malloc_heap.c | 287 lib/librte_eal/common/malloc_heap.h | 70 + lib/librte_eal/common/rte_malloc.c | 259 ++ lib/librte_eal/linuxapp/eal/Makefile| 4 +- lib/librte_eal/linuxapp/eal/rte_eal_version.map | 18 ++ lib/librte_hash/Makefile| 2 +- lib/librte_lpm/Makefile | 2 +- lib/librte_malloc/Makefile | 52 lib/librte_malloc/malloc_elem.c | 320 -- lib/librte_malloc/malloc_elem.h | 190 - lib/librte_malloc/malloc_heap.c | 209 --- lib/librte_malloc/malloc_heap.h | 70 - lib/librte_malloc/rte_malloc.c | 260 -- lib/librte_malloc/rte_malloc.h | 342 lib/librte_malloc/rte_malloc_version.map| 19 -- lib/librte_mempool/Makefile | 2 - lib/librte_pmd_af_packet/Makefile | 1 - lib/librte_pmd_bond/Makefile| 1 - lib/librte_pmd_e1000/Makefile | 2 +- lib/librte_pmd_enic/Makefile| 2 +- lib/librte_pmd_fm10k/Makefile | 2 +- lib/librte_pmd_i40e/Makefile| 2 +- lib/librte_pmd_ixgbe/Makefile | 2 +- lib/librte_pmd_mlx4/Makefile| 1 - lib/librte_pmd_null/Makefile| 1 - lib/librte_pmd_pcap/Makefile| 1 - lib/librte_pmd_virtio/Makefile | 2 +- lib/librte_pmd_vmxnet3/Makefile | 2 +- lib/librte_pmd_xenvirt/Makefile | 2 +- lib/librte_port/Makefile| 1 - lib/librte_ring/Makefile| 3 +- lib/librte_table/Makefile | 1 - 45 files changed, 1571 insertions(+), 1719 deletions(-) create mode 100644 lib/librte_eal/common/include/rte_malloc.h create mode 100644 lib/librte_eal/common/malloc_elem.c create mode 1
[dpdk-dev] [RFC PATCH 2/2] eal: memzone allocated by malloc
In the current memory hierarchy, memsegs are groups of physically contiguous hugepages, memzone are slices of memsegs and malloc further slices memzones into smaller memory chunks. This patch modifies malloc so it slices/partitions memsegs instead of memzones. Thus memzones would call malloc internally for memoy allocation while maintaining its ABI. The only exception is the reserving a memzone with len=0 is not supported anymore. Signed-off-by: Sergio Gonzalez Monroy --- lib/librte_eal/common/eal_common_memzone.c | 233 ++-- lib/librte_eal/common/include/rte_malloc_heap.h | 4 +- lib/librte_eal/common/include/rte_memory.h | 1 + lib/librte_eal/common/malloc_elem.c | 60 -- lib/librte_eal/common/malloc_elem.h | 14 +- lib/librte_eal/common/malloc_heap.c | 188 +-- lib/librte_eal/common/malloc_heap.h | 4 +- lib/librte_eal/common/rte_malloc.c | 7 +- 8 files changed, 207 insertions(+), 304 deletions(-) diff --git a/lib/librte_eal/common/eal_common_memzone.c b/lib/librte_eal/common/eal_common_memzone.c index 888f9e5..3dc8133 100644 --- a/lib/librte_eal/common/eal_common_memzone.c +++ b/lib/librte_eal/common/eal_common_memzone.c @@ -50,11 +50,10 @@ #include #include +#include "malloc_heap.h" +#include "malloc_elem.h" #include "eal_private.h" -/* internal copy of free memory segments */ -static struct rte_memseg *free_memseg = NULL; - static inline const struct rte_memzone * memzone_lookup_thread_unsafe(const char *name) { @@ -88,53 +87,12 @@ rte_memzone_reserve(const char *name, size_t len, int socket_id, len, socket_id, flags, RTE_CACHE_LINE_SIZE); } -/* - * Helper function for memzone_reserve_aligned_thread_unsafe(). - * Calculate address offset from the start of the segment. - * Align offset in that way that it satisfy istart alignmnet and - * buffer of the requested length would not cross specified boundary. - */ -static inline phys_addr_t -align_phys_boundary(const struct rte_memseg *ms, size_t len, size_t align, - size_t bound) -{ - phys_addr_t addr_offset, bmask, end, start; - size_t step; - - step = RTE_MAX(align, bound); - bmask = ~((phys_addr_t)bound - 1); - - /* calculate offset to closest alignment */ - start = RTE_ALIGN_CEIL(ms->phys_addr, align); - addr_offset = start - ms->phys_addr; - - while (addr_offset + len < ms->len) { - - /* check, do we meet boundary condition */ - end = start + len - (len != 0); - if ((start & bmask) == (end & bmask)) - break; - - /* calculate next offset */ - start = RTE_ALIGN_CEIL(start + 1, step); - addr_offset = start - ms->phys_addr; - } - - return (addr_offset); -} - static const struct rte_memzone * memzone_reserve_aligned_thread_unsafe(const char *name, size_t len, int socket_id, unsigned flags, unsigned align, unsigned bound) { struct rte_mem_config *mcfg; - unsigned i = 0; - int memseg_idx = -1; - uint64_t addr_offset, seg_offset = 0; size_t requested_len; - size_t memseg_len = 0; - phys_addr_t memseg_physaddr; - void *memseg_addr; /* get pointer to global configuration */ mcfg = rte_eal_get_configuration()->mem_config; @@ -166,10 +124,10 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len, if (align < RTE_CACHE_LINE_SIZE) align = RTE_CACHE_LINE_SIZE; - - /* align length on cache boundary. Check for overflow before doing so */ - if (len > SIZE_MAX - RTE_CACHE_LINE_MASK) { - rte_errno = EINVAL; /* requested size too big */ + /* align length on cache boundary. Check for overflow before doing so +* FIXME need to update API doc regarding len value*/ + if ((len > SIZE_MAX - RTE_CACHE_LINE_MASK) || (len == 0)){ + rte_errno = EINVAL; return NULL; } @@ -186,123 +144,29 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len, return NULL; } - /* find the smallest segment matching requirements */ - for (i = 0; i < RTE_MAX_MEMSEG; i++) { - /* last segment */ - if (free_memseg[i].addr == NULL) - break; - - /* empty segment, skip it */ - if (free_memseg[i].len == 0) - continue; - - /* bad socket ID */ - if (socket_id != SOCKET_ID_ANY && - free_memseg[i].socket_id != SOCKET_ID_ANY && - socket_id != free_memseg[i].socket_id) - continue; - - /* -* calculate offset to closest alignment that -* meets boundary conditions. -*/ -
[dpdk-dev] [RFC PATCH 1/2] eal: move librte_malloc to eal/common
This patch moves the malloc library inside the eal. This is the first step towards using malloc to allocate memory directly from memsegs. Thus, memzones would allocate memory through malloc, allowing unreserve/free memzones. Signed-off-by: Sergio Gonzalez Monroy --- config/common_bsdapp| 9 +- config/common_linuxapp | 9 +- lib/Makefile| 1 - lib/librte_acl/Makefile | 2 +- lib/librte_eal/bsdapp/eal/Makefile | 4 +- lib/librte_eal/bsdapp/eal/rte_eal_version.map | 18 ++ lib/librte_eal/common/Makefile | 1 + lib/librte_eal/common/include/rte_malloc.h | 342 lib/librte_eal/common/malloc_elem.c | 320 ++ lib/librte_eal/common/malloc_elem.h | 190 + lib/librte_eal/common/malloc_heap.c | 209 +++ lib/librte_eal/common/malloc_heap.h | 70 + lib/librte_eal/common/rte_malloc.c | 260 ++ lib/librte_eal/linuxapp/eal/Makefile| 4 +- lib/librte_eal/linuxapp/eal/rte_eal_version.map | 18 ++ lib/librte_hash/Makefile| 2 +- lib/librte_lpm/Makefile | 2 +- lib/librte_malloc/Makefile | 52 lib/librte_malloc/malloc_elem.c | 320 -- lib/librte_malloc/malloc_elem.h | 190 - lib/librte_malloc/malloc_heap.c | 209 --- lib/librte_malloc/malloc_heap.h | 70 - lib/librte_malloc/rte_malloc.c | 260 -- lib/librte_malloc/rte_malloc.h | 342 lib/librte_malloc/rte_malloc_version.map| 19 -- lib/librte_mempool/Makefile | 2 - lib/librte_pmd_af_packet/Makefile | 1 - lib/librte_pmd_bond/Makefile| 1 - lib/librte_pmd_e1000/Makefile | 2 +- lib/librte_pmd_enic/Makefile| 2 +- lib/librte_pmd_fm10k/Makefile | 2 +- lib/librte_pmd_i40e/Makefile| 2 +- lib/librte_pmd_ixgbe/Makefile | 2 +- lib/librte_pmd_mlx4/Makefile| 1 - lib/librte_pmd_null/Makefile| 1 - lib/librte_pmd_pcap/Makefile| 1 - lib/librte_pmd_virtio/Makefile | 2 +- lib/librte_pmd_vmxnet3/Makefile | 2 +- lib/librte_pmd_xenvirt/Makefile | 2 +- lib/librte_port/Makefile| 1 - lib/librte_ring/Makefile| 3 +- lib/librte_table/Makefile | 1 - 42 files changed, 1450 insertions(+), 1501 deletions(-) create mode 100644 lib/librte_eal/common/include/rte_malloc.h create mode 100644 lib/librte_eal/common/malloc_elem.c create mode 100644 lib/librte_eal/common/malloc_elem.h create mode 100644 lib/librte_eal/common/malloc_heap.c create mode 100644 lib/librte_eal/common/malloc_heap.h create mode 100644 lib/librte_eal/common/rte_malloc.c delete mode 100644 lib/librte_malloc/Makefile delete mode 100644 lib/librte_malloc/malloc_elem.c delete mode 100644 lib/librte_malloc/malloc_elem.h delete mode 100644 lib/librte_malloc/malloc_heap.c delete mode 100644 lib/librte_malloc/malloc_heap.h delete mode 100644 lib/librte_malloc/rte_malloc.c delete mode 100644 lib/librte_malloc/rte_malloc.h delete mode 100644 lib/librte_malloc/rte_malloc_version.map diff --git a/config/common_bsdapp b/config/common_bsdapp index c2374c0..8f74e7b 100644 --- a/config/common_bsdapp +++ b/config/common_bsdapp @@ -97,6 +97,8 @@ CONFIG_RTE_LOG_LEVEL=8 CONFIG_RTE_LOG_HISTORY=256 CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n +CONFIG_RTE_MALLOC_DEBUG=n +CONFIG_RTE_MALLOC_MEMZONE_SIZE=11M # # FreeBSD contiguous memory driver settings @@ -295,13 +297,6 @@ CONFIG_RTE_LIBRTE_TIMER=y CONFIG_RTE_LIBRTE_TIMER_DEBUG=n # -# Compile librte_malloc -# -CONFIG_RTE_LIBRTE_MALLOC=y -CONFIG_RTE_LIBRTE_MALLOC_DEBUG=n -CONFIG_RTE_MALLOC_MEMZONE_SIZE=11M - -# # Compile librte_cfgfile # CONFIG_RTE_LIBRTE_CFGFILE=y diff --git a/config/common_linuxapp b/config/common_linuxapp index 0078dc9..78ce1e7 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -100,6 +100,8 @@ CONFIG_RTE_EAL_ALLOW_INV_SOCKET_ID=n CONFIG_RTE_EAL_ALWAYS_PANIC_ON_ERROR=n CONFIG_RTE_EAL_IGB_UIO=y CONFIG_RTE_EAL_VFIO=y +CONFIG_RTE_MALLOC_DEBUG=n +CONFIG_RTE_MALLOC_MEMZONE_SIZE=11M # # Special configurations in PCI Config Space for high performance @@ -302,13 +304,6 @@ CONFIG_RTE_LIBRTE_TIMER=y CONFIG_RTE_LIBRTE_TIMER_DEBUG=n # -# Compile librte_malloc -# -CONFIG_RTE_LIBRTE_MALLOC=y -CONFIG_RTE_LIBRTE_MALLOC_DEBUG=n -CONFIG_RTE_MALLOC_MEMZONE_SIZE=11M -
[dpdk-dev] [PATCH v2] Implement rte_memcmp with AVX/SSE instructions.
Background: After preliminary discussion with John (Zhihong) and Tim from Intel it was decided that it would be beneficial to use AVX/SSE instructions for memcmp similar to memcpy being implemeneted. In addition, we decided to use librte_hash as a test candidate to test both functionality and performance. Currently memcmp in librte_hash is used for key comparisons whose length can vary and max key length is defined to 64 bytes. Preliminary tests on memory comparison alone shows using AVX/SSE instructions takes 1/3rd CPU ticks compared with regular memcmp function. Furthermore, hash_perf_autotest shows better results in all categories. Please note that memory comparison is a small portion in hash functionality and CPU Ticks/Op is for hash operations (Add on Empty, Add update, Lookup). Only hash lookup results are shown below. I can send complete results if interested. Test was conducted on Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, Ubuntu 14.04, x86_64, 16GB DDR3 system. PS: I would like to keep "rte_memcmp" simple with return codes 0 - match 1 - no-match since usage in DPDK is for equality or inequality and I have not seen any instance where less-than/greater-than comparison is needed. Hence "if (unlikely(...))" portion in the code will probably be removed and it will be made specific to DPDK rather than being generic. /*Existing code**/ *** Hash table performance test results *** Hash Func. , Operation , Key size (bytes), Entries, Entries per bucket, Errors , Avg. bucket entries, Ticks/Op. rte_hash_crc, Lookup , 16 , 1024 , 1 , 1 , 0.00 , 88.55 rte_hash_crc, Lookup , 16 , 1024 , 2 , 1 , 0.00 , 99.28 rte_hash_crc, Lookup , 16 , 1024 , 4 , 1 , 0.00 , 106.73 rte_hash_crc, Lookup , 16 , 1024 , 8 , 1 , 0.00 , 126.99 rte_hash_crc, Lookup , 16 , 1024 , 16, 1 , 0.00 , 159.80 rte_hash_crc, Lookup , 16 , 1048576, 1 , 51 , 0.01 , 175.23 rte_hash_crc, Lookup , 16 , 1048576, 2 , 2 , 0.02 , 171.24 rte_hash_crc, Lookup , 16 , 1048576, 4 , 0 , 0.04 , 145.48 rte_hash_crc, Lookup , 16 , 1048576, 8 , 0 , 0.08 , 162.35 rte_hash_crc, Lookup , 16 , 1048576, 16, 0 , 0.15 , 182.42 jhash , Lookup , 16 , 1048576, 1 , 33 , 0.01 , 219.71 jhash , Lookup , 16 , 1048576, 2 , 1 , 0.02 , 216.44 jhash , Lookup , 16 , 1048576, 4 , 0 , 0.04 , 188.29 jhash , Lookup , 16 , 1048576, 8 , 0 , 0.08 , 203.70 jhash , Lookup , 16 , 1048576, 16, 0 , 0.15 , 229.50 /**New AVX/SSE code**/ Hash Func. , Operation , Key size (bytes), Entries, Entries per bucket, Errors , Avg. bucket entries, Ticks/Op. rte_hash_crc, Lookup , 16 , 1024 , 1 , 1 , 0.00 , 85.69 rte_hash_crc, Lookup , 16 , 1024 , 2 , 1 , 0.00 , 93.95 rte_hash_crc, Lookup , 16 , 1024 , 4 , 1 , 0.00 , 102.80 rte_hash_crc, Lookup , 16 , 1024 , 8 , 1 , 0.00 , 122.60 rte_hash_crc, Lookup , 16 , 1024 , 16, 1 , 0.00 , 156.58 rte_hash_crc, Lookup , 16 , 1048576, 1 , 41 , 0.01 , 156.84 rte_hash_crc, Lookup , 16 , 1048576, 2 , 0 , 0.02 , 157.90 rte_hash_crc, Lookup , 16 , 1048576, 4 , 0 , 0.04 , 134.92 rte_hash_crc, Lookup , 16 , 1048576, 8 , 0 , 0.08 , 150.99 rte_hash_crc, Lookup , 16 , 1048576, 16, 0 , 0.15 , 174.08 jhash , Lookup , 16 , 1048576, 1 , 45 , 0.01 , 212.03 jhash , Lookup , 16 , 1048576, 2 , 2 , 0.02 , 210.65 jhash , Lookup , 16 , 1048576, 4 , 0
[dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instructions.
This patch replaces memcmp in librte_hash with rte_memcmp which is implemented with AVX/SSE instructions. Preliminary results on Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz, Ubuntu 14.04 x86_64 shows comparisons using AVX/SSE instructions taking 1/3rd CPU ticks for 16, 32, 48 and 64 bytes comparison. In addition, hash_perf_autotest results shows using new comparison function results in faster completion of hash operations than existing memcmp in all categories. Signed-off-by: Ravi Kerur --- app/test/test_hash_perf.c | 36 +- .../common/include/arch/ppc_64/rte_memcmp.h| 62 +++ .../common/include/arch/x86/rte_memcmp.h | 421 + lib/librte_eal/common/include/generic/rte_memcmp.h | 131 +++ lib/librte_hash/rte_hash.c | 59 ++- 5 files changed, 675 insertions(+), 34 deletions(-) create mode 100644 lib/librte_eal/common/include/arch/ppc_64/rte_memcmp.h create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcmp.h create mode 100644 lib/librte_eal/common/include/generic/rte_memcmp.h diff --git a/app/test/test_hash_perf.c b/app/test/test_hash_perf.c index 6eabb21..6887629 100644 --- a/app/test/test_hash_perf.c +++ b/app/test/test_hash_perf.c @@ -440,7 +440,7 @@ run_single_tbl_perf_test(const struct rte_hash *h, hash_operation func, uint32_t *invalid_pos_count) { uint64_t begin, end, ticks = 0; - uint8_t *key = NULL; + uint8_t * volatile key = NULL; uint32_t *bucket_occupancies = NULL; uint32_t num_buckets, i, j; int32_t pos; @@ -547,30 +547,30 @@ run_tbl_perf_test(struct tbl_perf_test_params *params) case ADD_UPDATE: num_iterations = params->num_iterations; params->num_iterations = params->entries; - run_single_tbl_perf_test(handle, rte_hash_add_key, params, - &avg_occupancy, &invalid_pos); - params->num_iterations = num_iterations; ticks = run_single_tbl_perf_test(handle, rte_hash_add_key, params, &avg_occupancy, &invalid_pos); + params->num_iterations = num_iterations; + ticks += run_single_tbl_perf_test(handle, rte_hash_add_key, + params, &avg_occupancy, &invalid_pos); break; case DELETE: num_iterations = params->num_iterations; params->num_iterations = params->entries; - run_single_tbl_perf_test(handle, rte_hash_add_key, params, - &avg_occupancy, &invalid_pos); + ticks = run_single_tbl_perf_test(handle, rte_hash_add_key, + params, &avg_occupancy, &invalid_pos); params->num_iterations = num_iterations; - ticks = run_single_tbl_perf_test(handle, rte_hash_del_key, + ticks += run_single_tbl_perf_test(handle, rte_hash_del_key, params, &avg_occupancy, &invalid_pos); break; case LOOKUP: num_iterations = params->num_iterations; params->num_iterations = params->entries; - run_single_tbl_perf_test(handle, rte_hash_add_key, params, - &avg_occupancy, &invalid_pos); + ticks = run_single_tbl_perf_test(handle, rte_hash_add_key, + params, &avg_occupancy, &invalid_pos); params->num_iterations = num_iterations; - ticks = run_single_tbl_perf_test(handle, rte_hash_lookup, + ticks += run_single_tbl_perf_test(handle, rte_hash_lookup, params, &avg_occupancy, &invalid_pos); break; default: return -1; @@ -623,10 +623,15 @@ static int run_all_tbl_perf_tests(void) static void run_hash_func_test(rte_hash_function f, uint32_t init_val, uint32_t key_len) { - static uint8_t key[RTE_HASH_KEY_LENGTH_MAX]; + static uint8_t * volatile key; uint64_t ticks = 0, start, end; unsigned i, j; + key = rte_zmalloc("func hash key", + key_len * sizeof(uint8_t), 16); + if (key == NULL) + return; + for (i = 0; i < HASHTEST_ITERATIONS; i++) { for (j = 0; j < key_len; j++) @@ -638,8 +643,11 @@ static void run_hash_func_test(rte_hash_function f, uint32_t init_val, ticks += end - start; } - printf("%-12s, %-18u, %-13u, %.02f\n", get_hash_name(f), (unsigned) key_len, - (unsigned) init_val, (double)ticks / HASHTEST_ITERATIONS); + rte_free(key); + + printf("%-12s, %-18u, %-13u, %.02f\n", + get_hash_name(f), (unsigned) key_len, (unsigned) init_val, + (double)ticks / HASHTEST_ITERATIONS); } /* @@ -687,7 +695,7 @@ fbk_hash_perf
[dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instructions.
On Fri, May 8, 2015 at 4:19 PM, Ravi Kerur wrote: > This patch replaces memcmp in librte_hash with rte_memcmp which is > implemented with AVX/SSE instructions. > > +static inline int > +rte_memcmp(const void *_src_1, const void *_src_2, size_t n) > +{ > + const uint8_t *src_1 = (const uint8_t *)_src_1; > + const uint8_t *src_2 = (const uint8_t *)_src_2; > + int ret = 0; > + > + if (n & 0x80) > + return rte_cmp128(src_1, src_2); > + > + if (n & 0x40) > + return rte_cmp64(src_1, src_2); > + > + if (n & 0x20) { > + ret = rte_cmp32(src_1, src_2); > + n -= 0x20; > + src_1 += 0x20; > + src_2 += 0x20; > + } > > Pardon me for butting in, but this seems incorrect for the first two cases listed above, as the function as written will only compare the first 128 or 64 bytes of each source and return the result. The pattern expressed in the 32 byte case appears more correct, as it compares the first 32 bytes and then lets later pieces of the function handle the smaller remaining bits of the sources. Also, if this function is to handle arbitrarily large source data, the 128 byte case needs to be in a loop. What am I missing? -- Matt Laswell infinite io, inc. laswell at infiniteio.com
[dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instructions.
On Fri, May 8, 2015 at 3:29 PM, Matt Laswell wrote: > > > On Fri, May 8, 2015 at 4:19 PM, Ravi Kerur wrote: > >> This patch replaces memcmp in librte_hash with rte_memcmp which is >> implemented with AVX/SSE instructions. >> >> +static inline int >> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n) >> +{ >> + const uint8_t *src_1 = (const uint8_t *)_src_1; >> + const uint8_t *src_2 = (const uint8_t *)_src_2; >> + int ret = 0; >> + >> + if (n & 0x80) >> + return rte_cmp128(src_1, src_2); >> + >> + if (n & 0x40) >> + return rte_cmp64(src_1, src_2); >> + >> + if (n & 0x20) { >> + ret = rte_cmp32(src_1, src_2); >> + n -= 0x20; >> + src_1 += 0x20; >> + src_2 += 0x20; >> + } >> >> > Pardon me for butting in, but this seems incorrect for the first two cases > listed above, as the function as written will only compare the first 128 or > 64 bytes of each source and return the result. The pattern expressed in > the 32 byte case appears more correct, as it compares the first 32 bytes > and then lets later pieces of the function handle the smaller remaining > bits of the sources. Also, if this function is to handle arbitrarily large > source data, the 128 byte case needs to be in a loop. > > What am I missing? > Current max hash key length supported is 64 bytes, hence no comparison is done after 64 bytes. 128 bytes comparison is added to measure performance only and there is no use-case as of now. With the current use-cases its not required but if there is a need to handle large arbitrary data upto 128 bytes it can be modified. > > -- > Matt Laswell > infinite io, inc. > laswell at infiniteio.com > >
[dpdk-dev] [PATCH v2] Clean up rte_memcpy.h file
Any inputs here? No functionality change just cleanup. I have run "make test" and "memcpy_perf_autotest". I have not noticed any changes in numbers. On Mon, Apr 20, 2015 at 1:33 PM, Ravi Kerur wrote: > Remove unnecessary type casting in functions. > > Tested on Ubuntu (14.04 x86_64) with "make test". > "make test" results match the results with baseline. > "Memcpy perf" results match the results with baseline. > > Signed-off-by: Ravi Kerur > --- > .../common/include/arch/x86/rte_memcpy.h | 340 > +++-- > 1 file changed, 175 insertions(+), 165 deletions(-) > > diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h > b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h > index 6a57426..839d4ec 100644 > --- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h > +++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h > @@ -106,8 +106,8 @@ rte_mov32(uint8_t *dst, const uint8_t *src) > static inline void > rte_mov64(uint8_t *dst, const uint8_t *src) > { > - rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32); > - rte_mov32((uint8_t *)dst + 1 * 32, (const uint8_t *)src + 1 * 32); > + rte_mov32(dst + 0 * 32, src + 0 * 32); > + rte_mov32(dst + 1 * 32, src + 1 * 32); > } > > /** > @@ -117,10 +117,10 @@ rte_mov64(uint8_t *dst, const uint8_t *src) > static inline void > rte_mov128(uint8_t *dst, const uint8_t *src) > { > - rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32); > - rte_mov32((uint8_t *)dst + 1 * 32, (const uint8_t *)src + 1 * 32); > - rte_mov32((uint8_t *)dst + 2 * 32, (const uint8_t *)src + 2 * 32); > - rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32); > + rte_mov32(dst + 0 * 32, src + 0 * 32); > + rte_mov32(dst + 1 * 32, src + 1 * 32); > + rte_mov32(dst + 2 * 32, src + 2 * 32); > + rte_mov32(dst + 3 * 32, src + 3 * 32); > } > > /** > @@ -130,14 +130,14 @@ rte_mov128(uint8_t *dst, const uint8_t *src) > static inline void > rte_mov256(uint8_t *dst, const uint8_t *src) > { > - rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32); > - rte_mov32((uint8_t *)dst + 1 * 32, (const uint8_t *)src + 1 * 32); > - rte_mov32((uint8_t *)dst + 2 * 32, (const uint8_t *)src + 2 * 32); > - rte_mov32((uint8_t *)dst + 3 * 32, (const uint8_t *)src + 3 * 32); > - rte_mov32((uint8_t *)dst + 4 * 32, (const uint8_t *)src + 4 * 32); > - rte_mov32((uint8_t *)dst + 5 * 32, (const uint8_t *)src + 5 * 32); > - rte_mov32((uint8_t *)dst + 6 * 32, (const uint8_t *)src + 6 * 32); > - rte_mov32((uint8_t *)dst + 7 * 32, (const uint8_t *)src + 7 * 32); > + rte_mov32(dst + 0 * 32, src + 0 * 32); > + rte_mov32(dst + 1 * 32, src + 1 * 32); > + rte_mov32(dst + 2 * 32, src + 2 * 32); > + rte_mov32(dst + 3 * 32, src + 3 * 32); > + rte_mov32(dst + 4 * 32, src + 4 * 32); > + rte_mov32(dst + 5 * 32, src + 5 * 32); > + rte_mov32(dst + 6 * 32, src + 6 * 32); > + rte_mov32(dst + 7 * 32, src + 7 * 32); > } > > /** > @@ -150,13 +150,16 @@ rte_mov64blocks(uint8_t *dst, const uint8_t *src, > size_t n) > __m256i ymm0, ymm1; > > while (n >= 64) { > - ymm0 = _mm256_loadu_si256((const __m256i *)((const uint8_t > *)src + 0 * 32)); > + > + ymm0 = _mm256_loadu_si256((const __m256i *)(src + 0 * 32)); > + ymm1 = _mm256_loadu_si256((const __m256i *)(src + 1 * 32)); > + > + _mm256_storeu_si256((__m256i *)(dst + 0 * 32), ymm0); > + _mm256_storeu_si256((__m256i *)(dst + 1 * 32), ymm1); > + > n -= 64; > - ymm1 = _mm256_loadu_si256((const __m256i *)((const uint8_t > *)src + 1 * 32)); > - src = (const uint8_t *)src + 64; > - _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 0 * 32), > ymm0); > - _mm256_storeu_si256((__m256i *)((uint8_t *)dst + 1 * 32), > ymm1); > - dst = (uint8_t *)dst + 64; > + src = src + 64; > + dst = dst + 64; > } > } > > @@ -170,34 +173,39 @@ rte_mov256blocks(uint8_t *dst, const uint8_t *src, > size_t n) > __m256i ymm0, ymm1, ymm2, ymm3, ymm4, ymm5, ymm6, ymm7; > > while (n >= 256) { > - ymm0 = _mm256_loadu_si256((const __m256i *)((const uint8_t > *)src + 0 * 32)); > + > + ymm0 = _mm256_loadu_si256((const __m256i *)(src + 0 * 32)); > + ymm1 = _mm256_loadu_si256((const __m256i *)(src + 1 * 32)); > + ymm2 = _mm256_loadu_si256((const __m256i *)(src + 2 * 32)); > + ymm3 = _mm256_loadu_si256((const __m256i *)(src + 3 * 32)); > + ymm4 = _mm256_loadu_si256((const __m256i *)(src + 4 * 32)); > + ymm5 = _mm256_loadu_si256((const __m256i *)(src + 5 * 32)); > + ymm6 = _mm256_loadu_si256((const __m256i *)(src + 6 * 32)); > + ymm7 = _mm256_l
[dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE instructions.
On Fri, May 8, 2015 at 5:54 PM, Ravi Kerur wrote: > > > On Fri, May 8, 2015 at 3:29 PM, Matt Laswell > wrote: > >> >> >> On Fri, May 8, 2015 at 4:19 PM, Ravi Kerur wrote: >> >>> This patch replaces memcmp in librte_hash with rte_memcmp which is >>> implemented with AVX/SSE instructions. >>> >>> +static inline int >>> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n) >>> +{ >>> + const uint8_t *src_1 = (const uint8_t *)_src_1; >>> + const uint8_t *src_2 = (const uint8_t *)_src_2; >>> + int ret = 0; >>> + >>> + if (n & 0x80) >>> + return rte_cmp128(src_1, src_2); >>> + >>> + if (n & 0x40) >>> + return rte_cmp64(src_1, src_2); >>> + >>> + if (n & 0x20) { >>> + ret = rte_cmp32(src_1, src_2); >>> + n -= 0x20; >>> + src_1 += 0x20; >>> + src_2 += 0x20; >>> + } >>> >>> >> Pardon me for butting in, but this seems incorrect for the first two >> cases listed above, as the function as written will only compare the first >> 128 or 64 bytes of each source and return the result. The pattern >> expressed in the 32 byte case appears more correct, as it compares the >> first 32 bytes and then lets later pieces of the function handle the >> smaller remaining bits of the sources. Also, if this function is to handle >> arbitrarily large source data, the 128 byte case needs to be in a loop. >> >> What am I missing? >> > > Current max hash key length supported is 64 bytes, hence no comparison is > done after 64 bytes. 128 bytes comparison is added to measure performance > only and there is no use-case as of now. With the current use-cases its not > required but if there is a need to handle large arbitrary data upto 128 > bytes it can be modified. > Ah, gotcha. I misunderstood and thought that this was meant to be a generic AVX/SSE enabled memcmp() replacement, and that the use of it in rte_hash was meant merely as a test case. If it's more limited than that, carry on, though you might want to make a note of it in the documentation. I suspect others will misinterpret the name as I did. -- Matt Laswell infinite io, inc. laswell at infiniteio.com