[dpdk-dev] fm10k

2016-12-14 Thread Ruth Christen
Hey guys,



I'm using dpdk 16.4 and fm10k card. According to the code, there's no support 
for disabling vlan stripping and VLAN QinQ in pmd fm10k.

Does anybody know why? If there's any way to work-around it, or when is a 
behavior change expected?

I need my VF to receive the packets with the  VLAN headers.

Even if it's possible to change this configurations through the linux kernel 
fm10k driver?



Thanks!


Re: [dpdk-dev] [PATCH v2 12/12] drivers: update PMDs to use rte_driver probe and remove

2016-12-14 Thread Shreyansh Jain

On Tuesday 13 December 2016 07:07 PM, Shreyansh Jain wrote:

These callbacks now act as first layer of PCI interfaces from the Bus.
Bus probe would enter the PMDs through the rte_driver->probe/remove
callbacks, falling to rte_xxx_driver->probe/remove (Currently, all the
drivers are rte_pci_driver).

Signed-off-by: Shreyansh Jain 
---
 drivers/net/bnx2x/bnx2x_ethdev.c| 8 
 drivers/net/bnxt/bnxt_ethdev.c  | 4 
 drivers/net/cxgbe/cxgbe_ethdev.c| 4 
 drivers/net/e1000/em_ethdev.c   | 4 
 drivers/net/e1000/igb_ethdev.c  | 8 
 drivers/net/ena/ena_ethdev.c| 4 
 drivers/net/enic/enic_ethdev.c  | 4 
 drivers/net/fm10k/fm10k_ethdev.c| 4 
 drivers/net/i40e/i40e_ethdev.c  | 4 
 drivers/net/i40e/i40e_ethdev_vf.c   | 4 
 drivers/net/ixgbe/ixgbe_ethdev.c| 8 
 drivers/net/mlx4/mlx4.c | 4 +++-
 drivers/net/mlx5/mlx5.c | 1 +
 drivers/net/nfp/nfp_net.c   | 4 
 drivers/net/qede/qede_ethdev.c  | 8 
 drivers/net/szedata2/rte_eth_szedata2.c | 4 
 drivers/net/thunderx/nicvf_ethdev.c | 4 
 drivers/net/virtio/virtio_ethdev.c  | 2 ++
 drivers/net/vmxnet3/vmxnet3_ethdev.c| 4 
 19 files changed, 86 insertions(+), 1 deletion(-)




drivers/crypto/qat/rte_qat_cryptodev.c should also be changed for this. 
It seems to be only PCI registered PMD. All others are VDEV.


I will send a v3 soon to fix this.

@Jan, would you be looking in the VDEV part or should I start with that? [1]

[1] http://dpdk.org/ml/archives/dev/2016-November/050443.html



Re: [dpdk-dev] [PATCH 00/28] introduce I/O device memory read/write operations

2016-12-14 Thread Bruce Richardson
On Wed, Dec 14, 2016 at 10:53:57AM +0800, Yuanhan Liu wrote:
> On Wed, Dec 14, 2016 at 07:25:30AM +0530, Jerin Jacob wrote:
> > patchset 14-28: Replace the raw readl/writel in the drivers with
> > new rte_read[b/w/l/q], rte_write[b/w/l/q] eal abstraction
> 
> Instead of rte_read[b/w/l/q], there is another typical naming style:
> rte_read[8/16/32/64]. Any preferences? If you ask me, I'd prefer the
> later.
> 
I think I prefer the latter too, as it aligns with our naming of atomic
functions and our use of uint16_t etc. types.

/Bruce


Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-14 Thread Kevin Traynor
hi Adrien, sorry for the delay

<...>


 Is it expected that the application or pmd will provide locking between
 these functions if required? I think it's going to have to be the app.
>>>
>>> Locking is indeed expected to be performed by applications. This API only
>>> documents places where locking would make sense if necessary and expected
>>> behavior.
>>>
>>> Like all control path APIs, this one assumes a single control thread.
>>> Applications must take the necessary precautions.
>>
>> If you look at OVS now it's quite possible that you have 2 rx queues
>> serviced by different threads, that would also install the flow rules in
>> the software flow caches - possibly that could extend to adding hardware
>> flows. There could also be another thread that is querying for stats. So
>> anything that can be done to minimise the locking would be helpful -
>> maybe query() could be atomic and not require any locking?
> 
> I think we need basic functions with as few constraints as possible on PMDs
> first, this API being somewhat complex to implement on their side. That
> covers the common use case where applications have a single control thread
> or otherwise perform locking on their own.
> 
> Once the basics are there for most PMDs, we may add new functions, items,
> properties and actions that provide additional constraints (timing,
> multi-threading and so on), which remain to be defined according to
> feedback. It is designed to be extended without causing ABI breakage.

I think Sugesh and I are trying to foresee some of the issues that may
arise when integrating with something like OVS. OTOH it's
hard/impossible to say what will be needed exactly in the API right now
to make it suitable for OVS.

So, I'm ok with the approach you are taking by exposing a basic API
but I think there should be an expectation that it may not be sufficient
for a project like OVS to integrate in and may take several
iterations/extensions - don't go anywhere!

> 
> As for query(), let's see how PMDs handle it first. A race between query()
> and create() on a given device is almost unavoidable without locking, same
> for queries that reset counters in a given flow rule. Basic parallel queries
> should not cause any harm otherwise, although this cannot be guaranteed yet.

You still have a race if there is locking, except it is for the lock,
but it has the same effect. The downside of my suggestion is that all
the PMDs would need to guarantee they could gets stats atomically - I'm
not sure if they can or it's too restrictive.

> 

<...>

>>
>>>
> +
> +/**
> + * Destroy a flow rule on a given port.
> + *
> + * Failure to destroy a flow rule handle may occur when other flow rules
> + * depend on it, and destroying it would result in an inconsistent state.
> + *
> + * This function is only guaranteed to succeed if handles are destroyed 
> in
> + * reverse order of their creation.

 How can the application find this information out on error?
>>>
>>> Without maintaining a list, they cannot. The specified case is the only
>>> possible guarantee. That does not mean PMDs should not do their best to
>>> destroy flow rules, only that ordering must remain consistent in case of
>>> inability to destroy one.
>>>
>>> What do you suggest?
>>
>> I think if the app cannot remove a specific rule it may want to remove
>> all rules and deal with flows in software for a time. So once the app
>> knows it fails that should be enough.
> 
> OK, then since destruction may return an error already, is it fine?
> Applications may call rte_flow_flush() (not supposed to fail unless there is
> a serious issue, abort() in that case) and switch to SW fallback.

yes, it's fine.

> 

<...>

> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +rte_flow_query(uint8_t port_id,
> +struct rte_flow *flow,
> +enum rte_flow_action_type action,
> +void *data,
> +struct rte_flow_error *error);
> +
> +#ifdef __cplusplus
> +}
> +#endif

 I don't see a way to dump all the rules for a port out. I think this is
 neccessary for degbugging. You could have a look through dpif.h in OVS
 and see how dpif_flow_dump_next() is used, it might be a good reference.
>>>
>>> DPDK does not maintain flow rules and, depending on hardware capabilities
>>> and level of compliance, PMDs do not necessarily do it either, particularly
>>> since it requires space and application probably have a better method to
>>> store these pointers for their own needs.
>>
>> understood
>>
>>>
>>> What you see here is only a PMD interface. Depending on applications needs,
>>> generic helper functions built on top of these may be added to manage flow
>>> rules in the future.
>>
>> I'm thinking of the case where so

Re: [dpdk-dev] [PATCH 04/13] acl: allow zero verdict

2016-12-14 Thread Ananyev, Konstantin
Hi Michal,

> -Original Message-
> From: Michal Miroslaw [mailto:mirq-li...@rere.qmqm.pl]
> Sent: Wednesday, December 14, 2016 2:11 AM
> To: Ananyev, Konstantin 
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 04/13] acl: allow zero verdict
> 
> On Tue, Dec 13, 2016 at 09:55:12PM +, Ananyev, Konstantin wrote:
> >
> >
> > > -Original Message-
> > > From: Michal Miroslaw [mailto:mirq-li...@rere.qmqm.pl]
> > > Sent: Tuesday, December 13, 2016 6:03 PM
> > > To: Ananyev, Konstantin 
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH 04/13] acl: allow zero verdict
> > >
> > > On Tue, Dec 13, 2016 at 05:27:31PM +, Ananyev, Konstantin wrote:
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Michal Miroslaw [mailto:mirq-li...@rere.qmqm.pl]
> > > > > Sent: Tuesday, December 13, 2016 4:14 PM
> > > > > To: Ananyev, Konstantin 
> > > > > Cc: dev@dpdk.org
> > > > > Subject: Re: [dpdk-dev] [PATCH 04/13] acl: allow zero verdict
> > > > >
> > > > > On Tue, Dec 13, 2016 at 03:13:42PM +, Ananyev, Konstantin wrote:
> > > > > [...]
> > > > > > > > > > > Subject: [dpdk-dev] [PATCH 04/13] acl: allow zero verdict
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Michał Mirosław 
> > > > > > > > > > > 
> > > > > > > > > > > ---
> > > > > > > > > > >  lib/librte_acl/rte_acl.c | 3 +--
> > > > > > > > > > >  lib/librte_acl/rte_acl.h | 2 --
> > > > > > > > > > >  lib/librte_table/rte_table_acl.c | 2 +-
> > > > > > > > > > >  3 files changed, 2 insertions(+), 5 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/lib/librte_acl/rte_acl.c 
> > > > > > > > > > > b/lib/librte_acl/rte_acl.c
> > > > > > > > > > > index 8b7e92c..d1f40be 100644
> > > > > > > > > > > --- a/lib/librte_acl/rte_acl.c
> > > > > > > > > > > +++ b/lib/librte_acl/rte_acl.c
> > > > > > > > > > > @@ -313,8 +313,7 @@ acl_check_rule(const struct 
> > > > > > > > > > > rte_acl_rule_data *rd)
> > > > > > > > > > >   if ((RTE_LEN2MASK(RTE_ACL_MAX_CATEGORIES, 
> > > > > > > > > > > typeof(rd->category_mask)) &
> > > > > > > > > > >   rd->category_mask) == 0 ||
> > > > > > > > > > >   rd->priority > RTE_ACL_MAX_PRIORITY ||
> > > > > > > > > > > - rd->priority < RTE_ACL_MIN_PRIORITY ||
> > > > > > > > > > > - rd->userdata == 
> > > > > > > > > > > RTE_ACL_INVALID_USERDATA)
> > > > > > > > > > > + rd->priority < RTE_ACL_MIN_PRIORITY)
> > > > > > > > > > >   return -EINVAL;
> > > > > > > > > > >   return 0;
> > > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > I am not sure, how it supposed to work properly?
> > > > > > > > > > Zero value is reserved and ifnicates that no match were 
> > > > > > > > > > found for that input.
> > > > > > > > >
> > > > > > > > > This is actually in use by us. In our use we don't need to 
> > > > > > > > > differentiate
> > > > > > > > > matching a rule with zero verdict vs not matching a rule at 
> > > > > > > > > all. I also
> > > > > > > > > have a patch that changes the value returned in non-matching 
> > > > > > > > > case, but
> > > > > > > > > it's in "dirty hack" state, as of yet.
> > > > > > > >
> > > > > > > > With that chane rte_acl_classify() might produce invalid 
> > > > > > > > results.
> > > > > > > > Even if you don't need it (I still don't understand how) , it 
> > > > > > > > doesn't mean other people
> > > > > > > > don't  need it either and it is ok to change it.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > The ACL code does not treat zero userdata specially, so this 
> > > > > > > > > is only
> > > > > > > > > a policy choice and as such would be better to be made by the 
> > > > > > > > > user.
> > > > > > > >
> > > > > > > > I believe it does.
> > > > > > > > userdata==0 is a reserved value.
> > > > > > > > When rte_acl_clasify() returns 0 for that particular input, it 
> > > > > > > > means 'no matches were found'.
> > > > > > >
> > > > > > > Dear Konstantin,
> > > > > > >
> > > > > > > Can you describe how the ACL code treats zero specially? I could 
> > > > > > > not find
> > > > > > > anything, really. The only thing I found is that iff I use zero 
> > > > > > > userdata
> > > > > > > in a rule I won't be able to differentiate a case where it 
> > > > > > > matched from
> > > > > > > a case where no rule matched.
> > > > > >
> > > > > > Yes, that's what I am talking about.
> > > > > >
> > > > > > > If I all my rules have non-zero userdata,
> > > > > > > then this patch changes nothing.
> > > > > >
> > > > > > Ok, then why do you remove a code that does checking for invalid 
> > > > > > userdata==0?
> > > > > > That supposed to prevent user to setup invalid value by mistake.
> > > > > >
> > > > > >  But if I have a table where 0 means drop
> > > > > > > (default-drop policy) then being able to use zero userdata in 
> > > > > > > DROP rules
> > > > > > > makes the ACLs just that more useful.
> > > > > >
> > > > > > Ok,

Re: [dpdk-dev] [PATCH v2 1/6] eventdev: introduce event driven programming model

2016-12-14 Thread Jerin Jacob
On Thu, Dec 08, 2016 at 11:02:16AM +, Van Haaren, Harry wrote:
> > From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com]
> > Sent: Thursday, December 8, 2016 1:24 AM
> > To: Van Haaren, Harry 
> 
> 
> 
> > > * Operation and sched_type *increased* to 4 bits each (from previous 
> > > value of 2) to
> > allow future expansion without ABI changes
> > 
> > Anyway it will break ABI if we add new operation. I would propose to keep 
> > 4bit
> > reserved and add it when required.
> 
> Ok sounds good. I'll suggest to move it to the middle between operation or 
> sched type, which would allow expanding operation without ABI breaks. On 
> expanding the field would remain in the same place with the same bits 
> available in that place (no ABI break), but new bits can be added into the 
> currently reserved space.

OK. We will move the rsvd field as you suggested.

> 
> 
> > > * Restore flow_id to 24 bits of a 32 bit int (previous size was 20 bits)
> > > * sub-event-type reduced to 4 bits (previous value was 8 bits). Can we 
> > > think of
> > situations where 16 values for application specified identifiers of each 
> > event-type is
> > genuinely not enough?
> > One packet will not go beyond 16 stages but an application may have more 
> > stages and
> > each packet may go mutually exclusive stages. For example,
> > 
> > packet 0: stagex_0 ->stagex_1
> > packet 1: stagey_0 ->stagey_1
> > 
> > In that sense, IMO, more than 16 is required.(AFIAK, VPP has any much 
> > larger limit on
> > number of stages)
> 
> My understanding was that stages are linked to event queues, so the 
> application can determine the stage the packet comes from by reading queue_id?

That is one way of doing it. But it is limited to number of queues
therefore scalability issues.Another approach is through
sub_event_type scheme without depended on the number of queues.

> 
> I'm not opposed to having an 8 bit sub_event_type, but it seems unnecessarily 
> large from my point of view. If you have a use for it, I'm ok with 8 bits.

OK

> 
> 
> > > In my opinion this structure layout is more balanced, and will perform 
> > > better due to
> > less loads that will need masking to access the required value.
> > OK. Considering more balanced layout and above points. I propose following 
> > scheme(based on
> > your input)
> > 
> > union {
> > uint64_t event;
> > struct {
> > uint32_t flow_id: 20;
> > uint32_t sub_event_type : 8;
> > uint32_t event_type : 4;
> > 
> > uint8_t rsvd: 4; /* for future additions */
> > uint8_t operation  : 2; /* new fwd drop */
> > uint8_t sched_type : 2;
> > 
> > uint8_t queue_id;
> > uint8_t priority;
> > uint8_t impl_opaque;
> > };
> > };
> > 
> > Feedback and improvements welcomed,
> 
> 
> So incorporating my latest suggestions on moving fields around, excluding 
> sub_event_type *size* changes:
> 
> union {
>   uint64_t event;
>   struct {
>   uint32_t flow_id: 20;
>   uint32_t event_type : 4;
>   uint32_t sub_event_type : 8; /* 8 bits now naturally aligned */

Just one suggestion here. I am not sure about the correct split between
number of bits to represent flow_id and sub_event_type fields. And its
connected in our implementation, so I propose to move sub_event_type up so
that future flow_id/sub_event_type bit field size change request wont
impact our implementation. Since it is represented as 32bit as whole, I
don't think there is an alignment issue.

So incorporating my latest suggestions on moving sub_event_type field around:

union {
uint64_t event;
struct {
uint32_t flow_id: 20;
uint32_t sub_event_type : 8;
uint32_t event_type : 4;

uint8_t operation  : 2; /* new fwd drop */
uint8_t rsvd: 4; /* for future additions, can be expanded into 
without ABI break */
uint8_t sched_type : 2;

uint8_t queue_id;
uint8_t priority;
uint8_t impl_opaque;
};
};



Re: [dpdk-dev] [PATCH 00/28] introduce I/O device memory read/write operations

2016-12-14 Thread Jerin Jacob
On Wed, Dec 14, 2016 at 10:53:57AM +0800, Yuanhan Liu wrote:
> On Wed, Dec 14, 2016 at 07:25:30AM +0530, Jerin Jacob wrote:
> > patchset 14-28: Replace the raw readl/writel in the drivers with
> > new rte_read[b/w/l/q], rte_write[b/w/l/q] eal abstraction
> 
> Instead of rte_read[b/w/l/q], there is another typical naming style:
> rte_read[8/16/32/64]. Any preferences? If you ask me, I'd prefer the
> later.

No strong opinion here. The rte_read[b/w/l/q] naming style is from Linux
kernel. I will change to rte_read[8/16/32/64] in v2 if there is no
objection.




Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-14 Thread Adrien Mazarguil
Hi Kevin,

On Wed, Dec 14, 2016 at 11:48:04AM +, Kevin Traynor wrote:
> hi Adrien, sorry for the delay
> 
> <...>
> 
> 
>  Is it expected that the application or pmd will provide locking between
>  these functions if required? I think it's going to have to be the app.
> >>>
> >>> Locking is indeed expected to be performed by applications. This API only
> >>> documents places where locking would make sense if necessary and expected
> >>> behavior.
> >>>
> >>> Like all control path APIs, this one assumes a single control thread.
> >>> Applications must take the necessary precautions.
> >>
> >> If you look at OVS now it's quite possible that you have 2 rx queues
> >> serviced by different threads, that would also install the flow rules in
> >> the software flow caches - possibly that could extend to adding hardware
> >> flows. There could also be another thread that is querying for stats. So
> >> anything that can be done to minimise the locking would be helpful -
> >> maybe query() could be atomic and not require any locking?
> > 
> > I think we need basic functions with as few constraints as possible on PMDs
> > first, this API being somewhat complex to implement on their side. That
> > covers the common use case where applications have a single control thread
> > or otherwise perform locking on their own.
> > 
> > Once the basics are there for most PMDs, we may add new functions, items,
> > properties and actions that provide additional constraints (timing,
> > multi-threading and so on), which remain to be defined according to
> > feedback. It is designed to be extended without causing ABI breakage.
> 
> I think Sugesh and I are trying to foresee some of the issues that may
> arise when integrating with something like OVS. OTOH it's
> hard/impossible to say what will be needed exactly in the API right now
> to make it suitable for OVS.
> 
> So, I'm ok with the approach you are taking by exposing a basic API
> but I think there should be an expectation that it may not be sufficient
> for a project like OVS to integrate in and may take several
> iterations/extensions - don't go anywhere!
> 
> > 
> > As for query(), let's see how PMDs handle it first. A race between query()
> > and create() on a given device is almost unavoidable without locking, same
> > for queries that reset counters in a given flow rule. Basic parallel queries
> > should not cause any harm otherwise, although this cannot be guaranteed yet.
> 
> You still have a race if there is locking, except it is for the lock,
> but it has the same effect. The downside of my suggestion is that all
> the PMDs would need to guarantee they could gets stats atomically - I'm
> not sure if they can or it's too restrictive.
> 
> > 
> 
> <...>
> 
> >>
> >>>
> > +
> > +/**
> > + * Destroy a flow rule on a given port.
> > + *
> > + * Failure to destroy a flow rule handle may occur when other flow 
> > rules
> > + * depend on it, and destroying it would result in an inconsistent 
> > state.
> > + *
> > + * This function is only guaranteed to succeed if handles are 
> > destroyed in
> > + * reverse order of their creation.
> 
>  How can the application find this information out on error?
> >>>
> >>> Without maintaining a list, they cannot. The specified case is the only
> >>> possible guarantee. That does not mean PMDs should not do their best to
> >>> destroy flow rules, only that ordering must remain consistent in case of
> >>> inability to destroy one.
> >>>
> >>> What do you suggest?
> >>
> >> I think if the app cannot remove a specific rule it may want to remove
> >> all rules and deal with flows in software for a time. So once the app
> >> knows it fails that should be enough.
> > 
> > OK, then since destruction may return an error already, is it fine?
> > Applications may call rte_flow_flush() (not supposed to fail unless there is
> > a serious issue, abort() in that case) and switch to SW fallback.
> 
> yes, it's fine.
> 
> > 
> 
> <...>
> 
> > + * @param[out] error
> > + *   Perform verbose error reporting if not NULL.
> > + *
> > + * @return
> > + *   0 on success, a negative errno value otherwise and rte_errno is 
> > set.
> > + */
> > +int
> > +rte_flow_query(uint8_t port_id,
> > +  struct rte_flow *flow,
> > +  enum rte_flow_action_type action,
> > +  void *data,
> > +  struct rte_flow_error *error);
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> 
>  I don't see a way to dump all the rules for a port out. I think this is
>  neccessary for degbugging. You could have a look through dpif.h in OVS
>  and see how dpif_flow_dump_next() is used, it might be a good reference.
> >>>
> >>> DPDK does not maintain flow rules and, depending on hardware capabilities
> >>> and level of compliance, PMDs do not necessarily do it either, 
> >>> particularly
> >>> 

Re: [dpdk-dev] [PATCH 19/28] net/ena: use eal I/O device memory read/write API

2016-12-14 Thread Jan Mędala
Despite the issue with naming convention (either it will be writel or
write32), I'm fine with this change and new API.

Acked-by: Jan Medala 

  Jan

2016-12-14 2:55 GMT+01:00 Jerin Jacob :

> From: Santosh Shukla 
>
> Replace the raw I/O device memory read/write access with eal
> abstraction for I/O device memory read/write access to fix
> portability issues across different architectures.
>
> Signed-off-by: Santosh Shukla 
> Signed-off-by: Jerin Jacob 
> CC: Jan Medala 
> CC: Jakub Palider 
> ---
>  drivers/net/ena/base/ena_plat_dpdk.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ena/base/ena_plat_dpdk.h
> b/drivers/net/ena/base/ena_plat_dpdk.h
> index 87c3bf1..4db07c7 100644
> --- a/drivers/net/ena/base/ena_plat_dpdk.h
> +++ b/drivers/net/ena/base/ena_plat_dpdk.h
> @@ -50,6 +50,7 @@
>  #include 
>
>  #include 
> +#include 
>
>  typedef uint64_t u64;
>  typedef uint32_t u32;
> @@ -226,12 +227,12 @@ typedef uint64_t dma_addr_t;
>
>  static inline void writel(u32 value, volatile void  *addr)
>  {
> -   *(volatile u32 *)addr = value;
> +   rte_writel(value, addr);
>  }
>
>  static inline u32 readl(const volatile void *addr)
>  {
> -   return *(const volatile u32 *)addr;
> +   return rte_readl(addr);
>  }
>
>  #define ENA_REG_WRITE32(value, reg) writel((value), (reg))
> --
> 2.5.5
>
>


[dpdk-dev] No packets received if burst is too small in rte_eth_rx_burst

2016-12-14 Thread tom . barbette
Hi list,

Between 2.2.0 and 16.04 (up to at least 16.07.2 if not current), with the XL710 
controller I do not get any packet when calling rte_eth_rx_burst if nb_pkts is 
too small. I would say smaller than 32. The input rate is not big, if that 
helps. But It should definitely get at least one packet per second.

Any ideas? Is that a bug or expected behaviour? Could be caused by other ABI 
changes?

Thanks,
Tom


Re: [dpdk-dev] [PATCH v2 1/6] eventdev: introduce event driven programming model

2016-12-14 Thread Bruce Richardson
On Wed, Dec 14, 2016 at 06:43:58PM +0530, Jerin Jacob wrote:
> On Thu, Dec 08, 2016 at 11:02:16AM +, Van Haaren, Harry wrote:
> > > From: Jerin Jacob [mailto:jerin.ja...@caviumnetworks.com]
> > > Sent: Thursday, December 8, 2016 1:24 AM
> > > To: Van Haaren, Harry 
> > 
> > 
> > 
> > > > * Operation and sched_type *increased* to 4 bits each (from previous 
> > > > value of 2) to
> > > allow future expansion without ABI changes
> > > 
> > > Anyway it will break ABI if we add new operation. I would propose to keep 
> > > 4bit
> > > reserved and add it when required.
> > 
> > Ok sounds good. I'll suggest to move it to the middle between operation or 
> > sched type, which would allow expanding operation without ABI breaks. On 
> > expanding the field would remain in the same place with the same bits 
> > available in that place (no ABI break), but new bits can be added into the 
> > currently reserved space.
> 
> OK. We will move the rsvd field as you suggested.
> 
> > 
> > 
> > > > * Restore flow_id to 24 bits of a 32 bit int (previous size was 20 bits)
> > > > * sub-event-type reduced to 4 bits (previous value was 8 bits). Can we 
> > > > think of
> > > situations where 16 values for application specified identifiers of each 
> > > event-type is
> > > genuinely not enough?
> > > One packet will not go beyond 16 stages but an application may have more 
> > > stages and
> > > each packet may go mutually exclusive stages. For example,
> > > 
> > > packet 0: stagex_0 ->stagex_1
> > > packet 1: stagey_0 ->stagey_1
> > > 
> > > In that sense, IMO, more than 16 is required.(AFIAK, VPP has any much 
> > > larger limit on
> > > number of stages)
> > 
> > My understanding was that stages are linked to event queues, so the 
> > application can determine the stage the packet comes from by reading 
> > queue_id?
> 
> That is one way of doing it. But it is limited to number of queues
> therefore scalability issues.Another approach is through
> sub_event_type scheme without depended on the number of queues.
> 
> > 
> > I'm not opposed to having an 8 bit sub_event_type, but it seems 
> > unnecessarily large from my point of view. If you have a use for it, I'm ok 
> > with 8 bits.
> 
> OK
> 
> > 
> > 
> > > > In my opinion this structure layout is more balanced, and will perform 
> > > > better due to
> > > less loads that will need masking to access the required value.
> > > OK. Considering more balanced layout and above points. I propose 
> > > following scheme(based on
> > > your input)
> > > 
> > >   union {
> > >   uint64_t event;
> > >   struct {
> > >   uint32_t flow_id: 20;
> > >   uint32_t sub_event_type : 8;
> > >   uint32_t event_type : 4;
> > > 
> > >   uint8_t rsvd: 4; /* for future additions */
> > >   uint8_t operation  : 2; /* new fwd drop */
> > >   uint8_t sched_type : 2;
> > > 
> > >   uint8_t queue_id;
> > >   uint8_t priority;
> > >   uint8_t impl_opaque;
> > >   };
> > >   };
> > > 
> > > Feedback and improvements welcomed,
> > 
> > 
> > So incorporating my latest suggestions on moving fields around, excluding 
> > sub_event_type *size* changes:
> > 
> > union {
> > uint64_t event;
> > struct {
> > uint32_t flow_id: 20;
> > uint32_t event_type : 4;
> > uint32_t sub_event_type : 8; /* 8 bits now naturally aligned */
> 
> Just one suggestion here. I am not sure about the correct split between
> number of bits to represent flow_id and sub_event_type fields. And its
> connected in our implementation, so I propose to move sub_event_type up so
> that future flow_id/sub_event_type bit field size change request wont
> impact our implementation. Since it is represented as 32bit as whole, I
> don't think there is an alignment issue.
> 
> So incorporating my latest suggestions on moving sub_event_type field around:
> 
> union {
>   uint64_t event;
>   struct {
>   uint32_t flow_id: 20;
>   uint32_t sub_event_type : 8;
>   uint32_t event_type : 4;
> 

The issue with the above layout is that you have an 8-bit value which
can never be accessed as a byte. With the layout above proposed by
Harry, the sub_event_type can be accessed without any bit manipultaion
operations just by doing a byte read. With the layout you propose, all
fields require masking and/or shifting to access. It won't affect the
scheduler performance for us, but it means potentially more cycles in
the app to access those fields.

/Bruce



Re: [dpdk-dev] [PATCH v2 1/6] eventdev: introduce event driven programming model

2016-12-14 Thread Bruce Richardson
On Tue, Dec 06, 2016 at 09:22:15AM +0530, Jerin Jacob wrote:
> In a polling model, lcores poll ethdev ports and associated
> rx queues directly to look for packet. In an event driven model,
> by contrast, lcores call the scheduler that selects packets for
> them based on programmer-specified criteria. Eventdev library
> adds support for event driven programming model, which offer
> applications automatic multicore scaling, dynamic load balancing,
> pipelining, packet ingress order maintenance and
> synchronization services to simplify application packet processing.
> 
> By introducing event driven programming model, DPDK can support
> both polling and event driven programming models for packet processing,
> and applications are free to choose whatever model
> (or combination of the two) that best suits their needs.
> 
> This patch adds the eventdev specification header file.
> 
> Signed-off-by: Jerin Jacob 
> ---

> + *
> + * The *nb_events* parameter is the number of event objects to enqueue which 
> are
> + * supplied in the *ev* array of *rte_event* structure.
> + *
> + * The rte_event_enqueue_burst() function returns the number of
> + * events objects it actually enqueued. A return value equal to *nb_events*
> + * means that all event objects have been enqueued.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param port_id
> + *   The identifier of the event port.
> + * @param ev
> + *   Points to an array of *nb_events* objects of type *rte_event* structure
> + *   which contain the event object enqueue operations to be processed.
> + * @param nb_events
> + *   The number of event objects to enqueue, typically number of
> + *   rte_event_port_enqueue_depth() available for this port.
> + *
> + * @return
> + *   The number of event objects actually enqueued on the event device. The
> + *   return value can be less than the value of the *nb_events* parameter 
> when
> + *   the event devices queue is full or if invalid parameters are specified 
> in a
> + *   *rte_event*. If return value is less than *nb_events*, the remaining 
> events
> + *   at the end of ev[] are not consumed,and the caller has to take care of 
> them
> + *
> + * @see rte_event_port_enqueue_depth()
> + */
> +uint16_t
> +rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event 
> ev[],
> + uint16_t nb_events);
> +
One suggestion - do we want to make the ev[] array const, to disallow
drivers from modifying the events passed in? Since the event structure
is only 16B big, it should be small enough to be copied around in
scheduler instances, allow the original events to remain unmodified.

/Bruce


[dpdk-dev] [PATCH v5] vhost: allow for many vhost user ports

2016-12-14 Thread Jan Wickbom
Currently select() is used to monitor file descriptors for vhostuser
ports. This limits the number of ports possible to create since the
fd number is used as index in the fd_set and we have seen fds > 1023.
This patch changes select() to poll(). This way we can keep an
packed (pollfd) array for the fds, e.g. as many fds as the size of
the array.

Also see:
http://dpdk.org/ml/archives/dev/2016-April/037024.html

Signed-off-by: Jan Wickbom 
Reported-by: Patrik Andersson 
---

v5:
* pack of arrays moved after poll loop

v4:
* fdset_del can not shrink the array. Took care of that in sveral places
* Moved rwfds[] into struct fdset

v3:
* removed unnecessary include
* removed fdset_fill, made it functionally part of poll loop

v2:
* removed unnecessary casts
* static array replacing allocated memory

 lib/librte_vhost/fd_man.c | 234 ++
 lib/librte_vhost/fd_man.h |   2 +
 2 files changed, 135 insertions(+), 101 deletions(-)

diff --git a/lib/librte_vhost/fd_man.c b/lib/librte_vhost/fd_man.c
index 2d3eeb7..c7c2b35 100644
--- a/lib/librte_vhost/fd_man.c
+++ b/lib/librte_vhost/fd_man.c
@@ -35,93 +35,111 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 
 #include "fd_man.h"
 
+#define FDPOLLERR (POLLERR | POLLHUP | POLLNVAL)
+
 /**
- * Returns the index in the fdset for a given fd.
- * If fd is -1, it means to search for a free entry.
+ * Adjusts the highest index populated in the array of fds
  * @return
- *   index for the fd, or -1 if fd isn't in the fdset.
+ *   The new size of fdset.
  */
 static int
-fdset_find_fd(struct fdset *pfdset, int fd)
+fdset_shrink(struct fdset *pfdset)
 {
-   int i;
+   int idx;
 
-   if (pfdset == NULL)
-   return -1;
+   /* Remove all empty spots in the end */
 
-   for (i = 0; i < MAX_FDS && pfdset->fd[i].fd != fd; i++)
+   for (idx = pfdset->num - 1;
+idx >= 0 && pfdset->fd[idx].fd == -1;
+idx--)
;
 
-   return i ==  MAX_FDS ? -1 : i;
+   pfdset->num = idx + 1;
+
+   return pfdset->num;
 }
 
-static int
-fdset_find_free_slot(struct fdset *pfdset)
+/**
+ *  Moves the fd from last slot to specified slot, including
+ *  corresponding pollfd
+ */
+static void
+fdset_move_last(struct fdset *pfdset, int idx)
 {
-   return fdset_find_fd(pfdset, -1);
+   int last_idx = pfdset->num - 1;
+
+   if (idx < last_idx) {
+   pfdset->fd[idx] = pfdset->fd[last_idx];
+   pfdset->fd[last_idx].fd = -1;
+
+   pfdset->rwfds[idx] = pfdset->rwfds[last_idx];
+   pfdset->rwfds[last_idx].revents = 0;
+   }
 }
 
-static int
-fdset_add_fd(struct fdset  *pfdset, int idx, int fd,
-   fd_cb rcb, fd_cb wcb, void *dat)
-{
-   struct fdentry *pfdentry;
 
-   if (pfdset == NULL || idx >= MAX_FDS || fd >= FD_SETSIZE)
-   return -1;
+/**
+ *  Packs the two arrays in the fdset structure
+ */
+static void
+fdset_pack(struct fdset *pfdset)
+{
+   int idx;
+   int num = pfdset->num;
 
-   pfdentry = &pfdset->fd[idx];
-   pfdentry->fd = fd;
-   pfdentry->rcb = rcb;
-   pfdentry->wcb = wcb;
-   pfdentry->dat = dat;
+   for (idx = 0; idx < num; idx++) {
+   if (pfdset->fd[idx].fd == -1) {
+   num = fdset_shrink(pfdset);
+   fdset_move_last(pfdset, idx);
+   }
+   }
 
-   return 0;
+   fdset_shrink(pfdset);
 }
 
 /**
- * Fill the read/write fd_set with the fds in the fdset.
+ * Returns the index in the fdset for a given fd.
+ * If fd is -1, it means to search for a free entry.
  * @return
- *  the maximum fds filled in the read/write fd_set.
+ *   index for the fd, or -1 if fd isn't in the fdset.
  */
 static int
-fdset_fill(fd_set *rfset, fd_set *wfset, struct fdset *pfdset)
+fdset_find_fd(struct fdset *pfdset, int fd)
 {
-   struct fdentry *pfdentry;
-   int i, maxfds = -1;
-   int num = MAX_FDS;
+   int i;
 
-   if (pfdset == NULL)
-   return -1;
+   for (i = 0; i < pfdset->num && pfdset->fd[i].fd != fd; i++)
+   ;
 
-   for (i = 0; i < num; i++) {
-   pfdentry = &pfdset->fd[i];
-   if (pfdentry->fd != -1) {
-   int added = 0;
-   if (pfdentry->rcb && rfset) {
-   FD_SET(pfdentry->fd, rfset);
-   added = 1;
-   }
-   if (pfdentry->wcb && wfset) {
-   FD_SET(pfdentry->fd, wfset);
-   added = 1;
-   }
-   if (added)
-   maxfds = pfdentry->fd < maxfds ?
-   maxfds : pfdentry->fd;
-   }
-   }
-   return maxfds;
+   return i == pfdset->num ? -1 : i;
+}

Re: [dpdk-dev] [PATCH v4] net/kni: add KNI PMD

2016-12-14 Thread Ferruh Yigit
On 12/12/2016 9:59 PM, Yong Wang wrote:
>> -Original Message-
>> From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
>> Sent: Wednesday, November 30, 2016 10:12 AM
>> To: dev@dpdk.org
>> Cc: Ferruh Yigit ; Yong Wang
>> 
>> Subject: [PATCH v4] net/kni: add KNI PMD
>>
>> Add KNI PMD which wraps librte_kni for ease of use.
>>
>> KNI PMD can be used as any regular PMD to send / receive packets to the
>> Linux networking stack.
>>
>> Signed-off-by: Ferruh Yigit 
>> ---
>>
>> v4:
>> * allow only single queue
>> * use driver.name as name
>>
>> v3:
>> * rebase on top of latest master
>>
>> v2:
>> * updated driver name eth_kni -> net_kni
>> ---
>>  config/common_base  |   1 +
>>  config/common_linuxapp  |   1 +
>>  drivers/net/Makefile|   1 +
>>  drivers/net/kni/Makefile|  63 +
>>  drivers/net/kni/rte_eth_kni.c   | 462
>> 
>>  drivers/net/kni/rte_pmd_kni_version.map |   4 +
>>  mk/rte.app.mk   |  10 +-
>>  7 files changed, 537 insertions(+), 5 deletions(-)
>>  create mode 100644 drivers/net/kni/Makefile
>>  create mode 100644 drivers/net/kni/rte_eth_kni.c
>>  create mode 100644 drivers/net/kni/rte_pmd_kni_version.map
>>
>> diff --git a/config/common_base b/config/common_base
>> index 4bff83a..3385879 100644
>> --- a/config/common_base
>> +++ b/config/common_base
>> @@ -543,6 +543,7 @@ CONFIG_RTE_PIPELINE_STATS_COLLECT=n
>>  # Compile librte_kni
>>  #
>>  CONFIG_RTE_LIBRTE_KNI=n
>> +CONFIG_RTE_LIBRTE_PMD_KNI=n
>>  CONFIG_RTE_KNI_KMOD=n
>>  CONFIG_RTE_KNI_PREEMPT_DEFAULT=y
>>  CONFIG_RTE_KNI_VHOST=n
>> diff --git a/config/common_linuxapp b/config/common_linuxapp
>> index 2483dfa..2ecd510 100644
>> --- a/config/common_linuxapp
>> +++ b/config/common_linuxapp
>> @@ -39,6 +39,7 @@ CONFIG_RTE_EAL_IGB_UIO=y
>>  CONFIG_RTE_EAL_VFIO=y
>>  CONFIG_RTE_KNI_KMOD=y
>>  CONFIG_RTE_LIBRTE_KNI=y
>> +CONFIG_RTE_LIBRTE_PMD_KNI=y
>>  CONFIG_RTE_LIBRTE_VHOST=y
>>  CONFIG_RTE_LIBRTE_PMD_VHOST=y
>>  CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
>> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
>> index bc93230..c4771cd 100644
>> --- a/drivers/net/Makefile
>> +++ b/drivers/net/Makefile
>> @@ -41,6 +41,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic
>>  DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
>>  DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
>>  DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
>> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += kni
>>  DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
>>  DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
>>  DIRS-$(CONFIG_RTE_LIBRTE_MPIPE_PMD) += mpipe
>> diff --git a/drivers/net/kni/Makefile b/drivers/net/kni/Makefile
>> new file mode 100644
>> index 000..0b7cf91
>> --- /dev/null
>> +++ b/drivers/net/kni/Makefile
>> @@ -0,0 +1,63 @@
>> +#   BSD LICENSE
>> +#
>> +#   Copyright(c) 2016 Intel Corporation. All rights reserved.
>> +#
>> +#   Redistribution and use in source and binary forms, with or without
>> +#   modification, are permitted provided that the following conditions
>> +#   are met:
>> +#
>> +# * Redistributions of source code must retain the above copyright
>> +#   notice, this list of conditions and the following disclaimer.
>> +# * Redistributions in binary form must reproduce the above copyright
>> +#   notice, this list of conditions and the following disclaimer in
>> +#   the documentation and/or other materials provided with the
>> +#   distribution.
>> +# * Neither the name of Intel Corporation nor the names of its
>> +#   contributors may be used to endorse or promote products derived
>> +#   from this software without specific prior written permission.
>> +#
>> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
>> CONTRIBUTORS
>> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT
>> NOT
>> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
>> FITNESS FOR
>> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
>> COPYRIGHT
>> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
>> INCIDENTAL,
>> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
>> NOT
>> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
>> OF USE,
>> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
>> AND ON ANY
>> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
>> TORT
>> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
>> THE USE
>> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
>> DAMAGE.
>> +
>> +include $(RTE_SDK)/mk/rte.vars.mk
>> +
>> +#
>> +# library name
>> +#
>> +LIB = librte_pmd_kni.a
>> +
>> +CFLAGS += -O3
>> +CFLAGS += $(WERROR_FLAGS)
>> +LDLIBS += -lpthread
>> +
>> +EXPORT_MAP := rte_pmd_kni_version.map
>> +
>> +LIBABIVER := 1
>> +
>> +#
>> +# all source are stored in SRCS-y
>> +#
>> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += rte_eth_kni.c
>> +
>> +#
>> +# Export include files
>>

Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API

2016-12-14 Thread Kevin Traynor
On 12/14/2016 01:54 PM, Adrien Mazarguil wrote:

>>
>>> + * @param[out] error
>>> + *   Perform verbose error reporting if not NULL.
>>> + *
>>> + * @return
>>> + *   0 on success, a negative errno value otherwise and rte_errno is 
>>> set.
>>> + */
>>> +int
>>> +rte_flow_query(uint8_t port_id,
>>> +  struct rte_flow *flow,
>>> +  enum rte_flow_action_type action,
>>> +  void *data,
>>> +  struct rte_flow_error *error);
>>> +
>>> +#ifdef __cplusplus
>>> +}
>>> +#endif
>>
>> I don't see a way to dump all the rules for a port out. I think this is
>> neccessary for degbugging. You could have a look through dpif.h in OVS
>> and see how dpif_flow_dump_next() is used, it might be a good reference.
>
> DPDK does not maintain flow rules and, depending on hardware capabilities
> and level of compliance, PMDs do not necessarily do it either, 
> particularly
> since it requires space and application probably have a better method to
> store these pointers for their own needs.

 understood

>
> What you see here is only a PMD interface. Depending on applications 
> needs,
> generic helper functions built on top of these may be added to manage flow
> rules in the future.

 I'm thinking of the case where something goes wrong and I want to get a
 dump of all the flow rules from hardware, not query the rules I think I
 have. I don't see a way to do it or something to build a helper on top of?
>>>
>>> Generic helper functions would exist on top of this API and would likely
>>> maintain a list of flow rules themselves. The dump in that case would be
>>> entirely implemented in software. I think that recovering flow rules from HW
>>> may be complicated in many cases (even without taking storage allocation and
>>> rules conversion issues into account), therefore if there is really a need
>>> for it, we could perhaps add a dump() function that PMDs are free to
>>> implement later.
>>>
>>
>> ok. Maybe there are some more generic stats that can be got from the
>> hardware that would help debugging that would suffice, like total flow
>> rule hits/misses (i.e. not on a per flow rule basis).
>>
>> You can get this from the software flow caches and it's widely used for
>> debugging. e.g.
>>
>> pmd thread numa_id 0 core_id 3:
>>  emc hits:0
>>  megaflow hits:0
>>  avg. subtable lookups per hit:0.00
>>  miss:0
>>
> 
> Perhaps a rule such as the following could do the trick:
> 
>  group: 42 (or priority 42)
>  pattern: void
>  actions: count / passthru
> 
> Assuming useful flow rules are defined with higher priorities (using lower
> group ID or priority level) and provide a terminating action, this one would
> count all packets that were not caught by them.
> 
> That is one example to illustrate how "global" counters can be requested by
> applications.
> 
> Otherwise you could just make sure all rules contain mark / flag actions, in
> which case mbufs would tell directly if they went through them or need
> additional SW processing.
> 

ok, sounds like there's some options at least to work with on this which
is good. thanks.


Re: [dpdk-dev] No packets received if burst is too small in rte_eth_rx_burst

2016-12-14 Thread Bruce Richardson
On Wed, Dec 14, 2016 at 04:13:53PM +0100, tom.barbe...@ulg.ac.be wrote:
> Hi list,
> 
> Between 2.2.0 and 16.04 (up to at least 16.07.2 if not current), with the 
> XL710 controller I do not get any packet when calling rte_eth_rx_burst if 
> nb_pkts is too small. I would say smaller than 32. The input rate is not big, 
> if that helps. But It should definitely get at least one packet per second.
> 
> Any ideas? Is that a bug or expected behaviour? Could be caused by other ABI 
> changes?
> 
Does this issue still occur even if you disable the vector driver in
your build-time configuration?

/Bruce


Re: [dpdk-dev] [PATCH 10/13] KNI: provided netif name's source is user-space

2016-12-14 Thread Ferruh Yigit
Hi Michal,

On 12/13/2016 1:08 AM, Michał Mirosław wrote:
> Signed-off-by: Michał Mirosław 
> ---
>  lib/librte_eal/linuxapp/kni/kni_misc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c 
> b/lib/librte_eal/linuxapp/kni/kni_misc.c
> index 497db9b..f0247aa 100644
> --- a/lib/librte_eal/linuxapp/kni/kni_misc.c
> +++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
> @@ -363,8 +363,8 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
>   up_read(&knet->kni_list_lock);
>  
>   net_dev = alloc_netdev(sizeof(struct kni_dev), dev_info.name,
> -#ifdef NET_NAME_UNKNOWN
> - NET_NAME_UNKNOWN,
> +#ifdef NET_NAME_USER
> + NET_NAME_USER,

Probably NET_NAME_USER fits better to definition.
But functionally, I wonder if you have a use case to use
"name_assign_type" ?

>  #endif
>   kni_net_init);
>   if (net_dev == NULL) {
> 



[dpdk-dev] [PATCH v2] acl: allow zero verdict

2016-12-14 Thread Michał Mirosław
Signed-off-by: Michał Mirosław 
---
v2: fixes prog_guide and ACL tests

---
 app/test/test_acl.c  | 13 -
 doc/guides/prog_guide/packet_classif_access_ctrl.rst |  3 ++-
 lib/librte_acl/rte_acl.c |  3 +--
 lib/librte_acl/rte_acl.h |  2 --
 lib/librte_table/rte_table_acl.c |  2 +-
 5 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/app/test/test_acl.c b/app/test/test_acl.c
index be744ec..c6b511f 100644
--- a/app/test/test_acl.c
+++ b/app/test/test_acl.c
@@ -1357,19 +1357,6 @@ test_invalid_rules(void)
goto err;
}
 
-   rule.dst_mask_len = 0;
-   rule.src_mask_len = 0;
-   rule.data.userdata = 0;
-
-   /* try adding this rule (it should fail because userdata is invalid) */
-   ret = rte_acl_ipv4vlan_add_rules(acx, &rule, 1);
-   if (ret == 0) {
-   printf("Line %i: Adding a rule with invalid user data "
-   "should have failed!\n", __LINE__);
-   rte_acl_free(acx);
-   return -1;
-   }
-
rte_acl_free(acx);
 
return 0;
diff --git a/doc/guides/prog_guide/packet_classif_access_ctrl.rst 
b/doc/guides/prog_guide/packet_classif_access_ctrl.rst
index 5fd3d34..a6bee9b 100644
--- a/doc/guides/prog_guide/packet_classif_access_ctrl.rst
+++ b/doc/guides/prog_guide/packet_classif_access_ctrl.rst
@@ -329,8 +329,9 @@ When creating a set of rules, for each rule, additional 
information must be supp
 Each set could be assigned its own category and by combining them into a 
single database,
 one lookup returns a result for each of the four sets.
 
-*   **userdata**: A user-defined field that could be any value except zero.
+*   **userdata**: A user-defined value.
 For each category, a successful match returns the userdata field of the 
highest priority matched rule.
+When no rules match, returned value is zero.
 
 .. note::
 
diff --git a/lib/librte_acl/rte_acl.c b/lib/librte_acl/rte_acl.c
index 8b7e92c..d1f40be 100644
--- a/lib/librte_acl/rte_acl.c
+++ b/lib/librte_acl/rte_acl.c
@@ -313,8 +313,7 @@ acl_check_rule(const struct rte_acl_rule_data *rd)
if ((RTE_LEN2MASK(RTE_ACL_MAX_CATEGORIES, typeof(rd->category_mask)) &
rd->category_mask) == 0 ||
rd->priority > RTE_ACL_MAX_PRIORITY ||
-   rd->priority < RTE_ACL_MIN_PRIORITY ||
-   rd->userdata == RTE_ACL_INVALID_USERDATA)
+   rd->priority < RTE_ACL_MIN_PRIORITY)
return -EINVAL;
return 0;
 }
diff --git a/lib/librte_acl/rte_acl.h b/lib/librte_acl/rte_acl.h
index caa91f7..b53179a 100644
--- a/lib/librte_acl/rte_acl.h
+++ b/lib/librte_acl/rte_acl.h
@@ -120,8 +120,6 @@ enum {
RTE_ACL_MIN_PRIORITY = 0,
 };
 
-#defineRTE_ACL_INVALID_USERDATA0
-
 #defineRTE_ACL_MASKLEN_TO_BITMASK(v, s)\
 ((v) == 0 ? (v) : (typeof(v))((uint64_t)-1 << ((s) * CHAR_BIT - (v
 
diff --git a/lib/librte_table/rte_table_acl.c b/lib/librte_table/rte_table_acl.c
index 8f1f8ce..94b69a9 100644
--- a/lib/librte_table/rte_table_acl.c
+++ b/lib/librte_table/rte_table_acl.c
@@ -792,7 +792,7 @@ rte_table_acl_lookup(
 
pkts_mask &= ~pkt_mask;
 
-   if (action_table_pos != RTE_ACL_INVALID_USERDATA) {
+   if (action_table_pos != 0) {
pkts_out_mask |= pkt_mask;
entries[pkt_pos] = (void *)
&acl->memory[action_table_pos *
-- 
2.10.2



[dpdk-dev] [PATCH] acl: remove invalid test

2016-12-14 Thread Michał Mirosław
rte_acl_add_rules() has no way of checking rule size.

This was hidden because the test effectively checked that
adding a rule with userdata == 0 failed.

Signed-off-by: Michał Mirosław 
---
 app/test/test_acl.c | 20 
 1 file changed, 20 deletions(-)

diff --git a/app/test/test_acl.c b/app/test/test_acl.c
index 28955f0..be744ec 100644
--- a/app/test/test_acl.c
+++ b/app/test/test_acl.c
@@ -1515,26 +1515,6 @@ test_invalid_parameters(void)
/* free ACL context */
rte_acl_free(acx);
 
-   /* set wrong rule_size so that adding any rules would fail */
-   param.rule_size = RTE_ACL_IPV4VLAN_RULE_SZ + 4;
-   acx = rte_acl_create(¶m);
-   if (acx == NULL) {
-   printf("Line %i: ACL context creation failed!\n", __LINE__);
-   return -1;
-   }
-
-   /* try adding a rule with size different from context rule_size */
-   result = rte_acl_ipv4vlan_add_rules(acx, &rule, 1);
-   if (result == 0) {
-   printf("Line %i: Adding an invalid sized rule "
-   "should have failed!\n", __LINE__);
-   rte_acl_free(acx);
-   return -1;
-   }
-
-   /* free ACL context */
-   rte_acl_free(acx);
-
 
/**
 * rte_acl_ipv4vlan_build
-- 
2.10.2



Re: [dpdk-dev] [PATCH 10/13] KNI: provided netif name's source is user-space

2016-12-14 Thread Michał Mirosław
On Wed, Dec 14, 2016 at 05:06:23PM +, Ferruh Yigit wrote:
> Hi Michal,
> 
> On 12/13/2016 1:08 AM, Michał Mirosław wrote:
> > Signed-off-by: Michał Mirosław 
> > ---
> >  lib/librte_eal/linuxapp/kni/kni_misc.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c 
> > b/lib/librte_eal/linuxapp/kni/kni_misc.c
> > index 497db9b..f0247aa 100644
> > --- a/lib/librte_eal/linuxapp/kni/kni_misc.c
> > +++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
> > @@ -363,8 +363,8 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
> > up_read(&knet->kni_list_lock);
> >  
> > net_dev = alloc_netdev(sizeof(struct kni_dev), dev_info.name,
> > -#ifdef NET_NAME_UNKNOWN
> > -   NET_NAME_UNKNOWN,
> > +#ifdef NET_NAME_USER
> > +   NET_NAME_USER,
> 
> Probably NET_NAME_USER fits better to definition.
> But functionally, I wonder if you have a use case to use
> "name_assign_type" ?

I just read what NET_NAME_UNKNOWN/NET_NAME_USER is supposed to do.
UNKNOWN is for "old" drivers that don't know the source of the name.
USER is for a name, that comes from a user and as such is not to be
renamed by udev.

Best Regards,
Michał Mirosław


Re: [dpdk-dev] [PATCH 11/13] KNI: guard against unterminated dev_info.name leading to BUG in alloc_netdev()

2016-12-14 Thread Ferruh Yigit
On 12/13/2016 1:08 AM, Michał Mirosław wrote:
> Signed-off-by: Michał Mirosław 
> ---

Acked-by: Ferruh Yigit 

I guess no harm doing extra check on user input.


Re: [dpdk-dev] [PATCH 10/13] KNI: provided netif name's source is user-space

2016-12-14 Thread Ferruh Yigit
On 12/14/2016 5:19 PM, Michał Mirosław wrote:
> On Wed, Dec 14, 2016 at 05:06:23PM +, Ferruh Yigit wrote:
>> Hi Michal,
>>
>> On 12/13/2016 1:08 AM, Michał Mirosław wrote:
>>> Signed-off-by: Michał Mirosław 
>>> ---
>>>  lib/librte_eal/linuxapp/kni/kni_misc.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/lib/librte_eal/linuxapp/kni/kni_misc.c 
>>> b/lib/librte_eal/linuxapp/kni/kni_misc.c
>>> index 497db9b..f0247aa 100644
>>> --- a/lib/librte_eal/linuxapp/kni/kni_misc.c
>>> +++ b/lib/librte_eal/linuxapp/kni/kni_misc.c
>>> @@ -363,8 +363,8 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
>>> up_read(&knet->kni_list_lock);
>>>  
>>> net_dev = alloc_netdev(sizeof(struct kni_dev), dev_info.name,
>>> -#ifdef NET_NAME_UNKNOWN
>>> -   NET_NAME_UNKNOWN,
>>> +#ifdef NET_NAME_USER
>>> +   NET_NAME_USER,
>>
>> Probably NET_NAME_USER fits better to definition.
>> But functionally, I wonder if you have a use case to use
>> "name_assign_type" ?
> 
> I just read what NET_NAME_UNKNOWN/NET_NAME_USER is supposed to do.
> UNKNOWN is for "old" drivers that don't know the source of the name.
> USER is for a name, that comes from a user and as such is not to be
> renamed by udev.

That is what I wonder if you are doing anything special in userspace
related iface name. But it seems the patch is to make it more proper, I
have no objection.

> 
> Best Regards,
> Michał Mirosław
> 



Re: [dpdk-dev] [PATCH 10/13] KNI: provided netif name's source is user-space

2016-12-14 Thread Ferruh Yigit
On 12/14/2016 5:35 PM, Ferruh Yigit wrote:
> On 12/14/2016 5:19 PM, Michał Mirosław wrote:
>> On Wed, Dec 14, 2016 at 05:06:23PM +, Ferruh Yigit wrote:
>>> Hi Michal,
>>>
>>> On 12/13/2016 1:08 AM, Michał Mirosław wrote:
 Signed-off-by: Michał Mirosław 
 ---

Acked-by: Ferruh Yigit 



Re: [dpdk-dev] [PATCH 11/13] KNI: guard against unterminated dev_info.name leading to BUG in alloc_netdev()

2016-12-14 Thread Michał Mirosław
On Wed, Dec 14, 2016 at 05:33:11PM +, Ferruh Yigit wrote:
> On 12/13/2016 1:08 AM, Michał Mirosław wrote:
> > Signed-off-by: Michał Mirosław 
> > ---
> 
> Acked-by: Ferruh Yigit 
> 
> I guess no harm doing extra check on user input.

This actually prevents an OOPS that happens when you try to create KNI with
too long name.

Best Regards,
Michał Mirosław


Re: [dpdk-dev] [PATCH 11/13] KNI: guard against unterminated dev_info.name leading to BUG in alloc_netdev()

2016-12-14 Thread Ferruh Yigit
On 12/14/2016 5:37 PM, Michał Mirosław wrote:
> On Wed, Dec 14, 2016 at 05:33:11PM +, Ferruh Yigit wrote:
>> On 12/13/2016 1:08 AM, Michał Mirosław wrote:
>>> Signed-off-by: Michał Mirosław 
>>> ---
>>
>> Acked-by: Ferruh Yigit 
>>
>> I guess no harm doing extra check on user input.
> 
> This actually prevents an OOPS that happens when you try to create KNI with
> too long name.

Thanks for fixing then.



Re: [dpdk-dev] [PATCH 07/13] pcap: fix timestamps in output pcap file

2016-12-14 Thread Ferruh Yigit
On 12/13/2016 1:08 AM, Michał Mirosław wrote:
> From: Piotr Bartosiewicz 
> 
> Signed-off-by: Michał Mirosław 
> ---

Acked-by: Ferruh Yigit 



[dpdk-dev] [PATCH v4] null: fake PMD capabilities

2016-12-14 Thread Michał Mirosław
From: Paweł Małachowski 

This allows for testing code which needs offloads to be supported.

Signed-off-by: Michał Mirosław 
---
 drivers/net/null/rte_eth_null.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/net/null/rte_eth_null.c b/drivers/net/null/rte_eth_null.c
index 836d982..e60516f 100644
--- a/drivers/net/null/rte_eth_null.c
+++ b/drivers/net/null/rte_eth_null.c
@@ -284,6 +284,9 @@ eth_tx_queue_setup(struct rte_eth_dev *dev, uint16_t 
tx_queue_id,
return 0;
 }
 
+static void
+eth_dev_void_ok(struct rte_eth_dev *dev __rte_unused) { return; }
+
 
 static void
 eth_dev_info(struct rte_eth_dev *dev,
@@ -304,6 +307,19 @@ eth_dev_info(struct rte_eth_dev *dev,
dev_info->pci_dev = NULL;
dev_info->reta_size = internals->reta_size;
dev_info->flow_type_rss_offloads = internals->flow_type_rss_offloads;
+   /* We hereby declare we can RX offload VLAN-s out of thin air (they are 
not there)
+*/
+   dev_info->rx_offload_capa = DEV_RX_OFFLOAD_VLAN_STRIP |
+   DEV_RX_OFFLOAD_QINQ_STRIP;
+   /* We promise we will do any TX offloads before passing packets to 
/dev/null
+*/
+   dev_info->tx_offload_capa = DEV_TX_OFFLOAD_VLAN_INSERT |
+   DEV_TX_OFFLOAD_IPV4_CKSUM | DEV_TX_OFFLOAD_UDP_CKSUM |
+   DEV_TX_OFFLOAD_TCP_CKSUM | DEV_TX_OFFLOAD_SCTP_CKSUM |
+   DEV_TX_OFFLOAD_TCP_TSO | DEV_TX_OFFLOAD_UDP_TSO |
+   DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM | DEV_TX_OFFLOAD_QINQ_INSERT |
+   DEV_TX_OFFLOAD_VXLAN_TNL_TSO | DEV_TX_OFFLOAD_GRE_TNL_TSO |
+   DEV_TX_OFFLOAD_IPIP_TNL_TSO | DEV_TX_OFFLOAD_GENEVE_TNL_TSO;
 }
 
 static void
@@ -477,7 +493,12 @@ static const struct eth_dev_ops ops = {
.reta_update = eth_rss_reta_update,
.reta_query = eth_rss_reta_query,
.rss_hash_update = eth_rss_hash_update,
-   .rss_hash_conf_get = eth_rss_hash_conf_get
+   .rss_hash_conf_get = eth_rss_hash_conf_get,
+   /* Fake our capabilities */
+   .promiscuous_enable   = eth_dev_void_ok,
+   .promiscuous_disable  = eth_dev_void_ok,
+   .allmulticast_enable  = eth_dev_void_ok,
+   .allmulticast_disable = eth_dev_void_ok
 };
 
 int
-- 
2.10.2



Re: [dpdk-dev] [PATCH v4] net/kni: add KNI PMD

2016-12-14 Thread Yong Wang
> -Original Message-
> From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
> Sent: Wednesday, December 14, 2016 8:00 AM
> To: Yong Wang ; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v4] net/kni: add KNI PMD
> 
> On 12/12/2016 9:59 PM, Yong Wang wrote:
> >> -Original Message-
> >> From: Ferruh Yigit [mailto:ferruh.yi...@intel.com]
> >> Sent: Wednesday, November 30, 2016 10:12 AM
> >> To: dev@dpdk.org
> >> Cc: Ferruh Yigit ; Yong Wang
> >> 
> >> Subject: [PATCH v4] net/kni: add KNI PMD
> >>
> >> Add KNI PMD which wraps librte_kni for ease of use.
> >>
> >> KNI PMD can be used as any regular PMD to send / receive packets to the
> >> Linux networking stack.
> >>
> >> Signed-off-by: Ferruh Yigit 
> >> ---
> >>
> >> v4:
> >> * allow only single queue
> >> * use driver.name as name
> >>
> >> v3:
> >> * rebase on top of latest master
> >>
> >> v2:
> >> * updated driver name eth_kni -> net_kni
> >> ---
> >>  config/common_base  |   1 +
> >>  config/common_linuxapp  |   1 +
> >>  drivers/net/Makefile|   1 +
> >>  drivers/net/kni/Makefile|  63 +
> >>  drivers/net/kni/rte_eth_kni.c   | 462
> >> 
> >>  drivers/net/kni/rte_pmd_kni_version.map |   4 +
> >>  mk/rte.app.mk   |  10 +-
> >>  7 files changed, 537 insertions(+), 5 deletions(-)
> >>  create mode 100644 drivers/net/kni/Makefile
> >>  create mode 100644 drivers/net/kni/rte_eth_kni.c
> >>  create mode 100644 drivers/net/kni/rte_pmd_kni_version.map
> >>
> >> diff --git a/config/common_base b/config/common_base
> >> index 4bff83a..3385879 100644
> >> --- a/config/common_base
> >> +++ b/config/common_base
> >> @@ -543,6 +543,7 @@ CONFIG_RTE_PIPELINE_STATS_COLLECT=n
> >>  # Compile librte_kni
> >>  #
> >>  CONFIG_RTE_LIBRTE_KNI=n
> >> +CONFIG_RTE_LIBRTE_PMD_KNI=n
> >>  CONFIG_RTE_KNI_KMOD=n
> >>  CONFIG_RTE_KNI_PREEMPT_DEFAULT=y
> >>  CONFIG_RTE_KNI_VHOST=n
> >> diff --git a/config/common_linuxapp b/config/common_linuxapp
> >> index 2483dfa..2ecd510 100644
> >> --- a/config/common_linuxapp
> >> +++ b/config/common_linuxapp
> >> @@ -39,6 +39,7 @@ CONFIG_RTE_EAL_IGB_UIO=y
> >>  CONFIG_RTE_EAL_VFIO=y
> >>  CONFIG_RTE_KNI_KMOD=y
> >>  CONFIG_RTE_LIBRTE_KNI=y
> >> +CONFIG_RTE_LIBRTE_PMD_KNI=y
> >>  CONFIG_RTE_LIBRTE_VHOST=y
> >>  CONFIG_RTE_LIBRTE_PMD_VHOST=y
> >>  CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
> >> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> >> index bc93230..c4771cd 100644
> >> --- a/drivers/net/Makefile
> >> +++ b/drivers/net/Makefile
> >> @@ -41,6 +41,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic
> >>  DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
> >>  DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
> >>  DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
> >> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += kni
> >>  DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
> >>  DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
> >>  DIRS-$(CONFIG_RTE_LIBRTE_MPIPE_PMD) += mpipe
> >> diff --git a/drivers/net/kni/Makefile b/drivers/net/kni/Makefile
> >> new file mode 100644
> >> index 000..0b7cf91
> >> --- /dev/null
> >> +++ b/drivers/net/kni/Makefile
> >> @@ -0,0 +1,63 @@
> >> +#   BSD LICENSE
> >> +#
> >> +#   Copyright(c) 2016 Intel Corporation. All rights reserved.
> >> +#
> >> +#   Redistribution and use in source and binary forms, with or without
> >> +#   modification, are permitted provided that the following conditions
> >> +#   are met:
> >> +#
> >> +# * Redistributions of source code must retain the above copyright
> >> +#   notice, this list of conditions and the following disclaimer.
> >> +# * Redistributions in binary form must reproduce the above copyright
> >> +#   notice, this list of conditions and the following disclaimer in
> >> +#   the documentation and/or other materials provided with the
> >> +#   distribution.
> >> +# * Neither the name of Intel Corporation nor the names of its
> >> +#   contributors may be used to endorse or promote products derived
> >> +#   from this software without specific prior written permission.
> >> +#
> >> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
> >> CONTRIBUTORS
> >> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING,
> BUT
> >> NOT
> >> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> >> FITNESS FOR
> >> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> >> COPYRIGHT
> >> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
> >> INCIDENTAL,
> >> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
> BUT
> >> NOT
> >> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> LOSS
> >> OF USE,
> >> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
> >> AND ON ANY
> >> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR
> >> TORT
> >> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
> OF
> >> THE USE
> >> +#   OF THIS

[dpdk-dev] KNI broken again with 4.9 kernel

2016-12-14 Thread Stephen Hemminger
/build/lib/librte_eal/linuxapp/kni/igb_main.c:2317:21: error: initialization 
from incompatible pointer type [-Werror=incompatible-pointer-types]
  .ndo_set_vf_vlan = igb_ndo_set_vf_vlan,
 ^~~

I am sure Ferruh Yigit will fix it.

Which raises a couple of questions:
 1. Why is DPDK still keeping KNI support for Intel specific ethtool 
functionality.
This always breaks, is code bloat, and means a 3rd copy of base code 
(Linux, DPDK PMD, + KNI)

 2. Why is KNI not upstream?
If not acceptable due to security or supportablity then why does it still 
exist?

 3. If not upstream, then maintainer should track upstream kernel changes and 
fix DPDK before
kernel is released.  The ABI is normally set early in the rc cycle weeks 
before release.


[dpdk-dev] [PATCH 0/2] support for Hyper-V VMBUS

2016-12-14 Thread Stephen Hemminger
This is the core changes required to support VMBUS.
It does cause some ABI change to ethdev etc.

Stephen Hemminger (2):
  ethdev: increase length ethernet device internal name
  hyperv: VMBUS support infrastucture

 doc/guides/rel_notes/deprecation.rst|   3 +
 lib/librte_eal/common/Makefile  |   2 +-
 lib/librte_eal/common/eal_common_devargs.c  |   7 +
 lib/librte_eal/common/eal_common_options.c  |  38 ++
 lib/librte_eal/common/eal_internal_cfg.h|   3 +-
 lib/librte_eal/common/eal_options.h |   6 +
 lib/librte_eal/common/eal_private.h |   5 +
 lib/librte_eal/common/include/rte_devargs.h |   8 +
 lib/librte_eal/common/include/rte_vmbus.h   | 247 
 lib/librte_eal/linuxapp/eal/Makefile|   6 +
 lib/librte_eal/linuxapp/eal/eal.c   |  11 +
 lib/librte_eal/linuxapp/eal/eal_vmbus.c | 906 
 lib/librte_ether/rte_ethdev.c   |  90 +++
 lib/librte_ether/rte_ethdev.h   |  34 +-
 mk/rte.app.mk   |   1 +
 15 files changed, 1362 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_vmbus.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_vmbus.c

-- 
2.10.2



[dpdk-dev] [PATCH 1/2] ethdev: increase length ethernet device internal name

2016-12-14 Thread Stephen Hemminger
Allow sufficicent space for UUID in string form (36+1).
Needed to use UUID with Hyper-V

Signed-off-by: Stephen Hemminger 
---
 doc/guides/rel_notes/deprecation.rst | 3 +++
 lib/librte_ether/rte_ethdev.h| 6 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/deprecation.rst 
b/doc/guides/rel_notes/deprecation.rst
index 2d17bc6..b83f23a 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -58,6 +58,9 @@ Deprecation Notices
   ``port`` field, may be moved or removed as part of this mbuf work. A
   ``timestamp`` will also be added.
 
+* ethdev: for 17.02 the size of internal device name will be increased
+  to 40 characters to allow for storing UUID.
+
 * The mbuf flags PKT_RX_VLAN_PKT and PKT_RX_QINQ_PKT are deprecated and
   are respectively replaced by PKT_RX_VLAN_STRIPPED and
   PKT_RX_QINQ_STRIPPED, that are better described. The old flags and
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 9678179..3c85e33 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1652,7 +1652,11 @@ struct rte_eth_dev_sriov {
 };
 #define RTE_ETH_DEV_SRIOV(dev) ((dev)->data->sriov)
 
-#define RTE_ETH_NAME_MAX_LEN (32)
+/*
+ * Internal identifier length
+ * Sufficiently large to allow for UUID or PCI address
+ */
+#define RTE_ETH_NAME_MAX_LEN 40
 
 /**
  * @internal
-- 
2.10.2



[dpdk-dev] [PATCH 2/2] hyperv: VMBUS support infrastucture

2016-12-14 Thread Stephen Hemminger
Generalize existing bus support to handle VMBUS in Hyper-V.
Most of the code is based of existing model for PCI, the difference
is how bus is represented in sysfs and how addressing works.

This is based on earlier code contributed by Brocade.
It supports only 4.9 or later versions of the Linux kernel
at this time (not older kernels or BSD).

Signed-off-by: Stephen Hemminger 
---
 lib/librte_eal/common/Makefile  |   2 +-
 lib/librte_eal/common/eal_common_devargs.c  |   7 +
 lib/librte_eal/common/eal_common_options.c  |  38 ++
 lib/librte_eal/common/eal_internal_cfg.h|   3 +-
 lib/librte_eal/common/eal_options.h |   6 +
 lib/librte_eal/common/eal_private.h |   5 +
 lib/librte_eal/common/include/rte_devargs.h |   8 +
 lib/librte_eal/common/include/rte_vmbus.h   | 247 
 lib/librte_eal/linuxapp/eal/Makefile|   6 +
 lib/librte_eal/linuxapp/eal/eal.c   |  11 +
 lib/librte_eal/linuxapp/eal/eal_vmbus.c | 906 
 lib/librte_ether/rte_ethdev.c   |  90 +++
 lib/librte_ether/rte_ethdev.h   |  28 +-
 mk/rte.app.mk   |   1 +
 14 files changed, 1354 insertions(+), 4 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_vmbus.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_vmbus.c

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index a92c984..9254bae 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -33,7 +33,7 @@ include $(RTE_SDK)/mk/rte.vars.mk
 
 INC := rte_branch_prediction.h rte_common.h
 INC += rte_debug.h rte_eal.h rte_errno.h rte_launch.h rte_lcore.h
-INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h
+INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h rte_vmbus.h
 INC += rte_per_lcore.h rte_random.h
 INC += rte_tailq.h rte_interrupts.h rte_alarm.h
 INC += rte_string_fns.h rte_version.h
diff --git a/lib/librte_eal/common/eal_common_devargs.c 
b/lib/librte_eal/common/eal_common_devargs.c
index e403717..934ca84 100644
--- a/lib/librte_eal/common/eal_common_devargs.c
+++ b/lib/librte_eal/common/eal_common_devargs.c
@@ -113,6 +113,13 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char 
*devargs_str)
goto fail;
 
break;
+   case RTE_DEVTYPE_WHITELISTED_VMBUS:
+   case RTE_DEVTYPE_BLACKLISTED_VMBUS:
+#ifdef RTE_LIBRTE_HV_PMD
+   if (uuid_parse(buf, devargs->uuid) == 0)
+   break;
+#endif
+   goto fail;
}
 
free(buf);
diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 6ca8af1..6aea87d 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -95,6 +95,11 @@ eal_long_options[] = {
{OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
{OPT_VMWARE_TSC_MAP,0, NULL, OPT_VMWARE_TSC_MAP_NUM   },
{OPT_XEN_DOM0,  0, NULL, OPT_XEN_DOM0_NUM },
+#ifdef RTE_LIBRTE_HV_PMD
+   {OPT_NO_VMBUS,  0, NULL, OPT_NO_VMBUS_NUM },
+   {OPT_VMBUS_BLACKLIST,   1, NULL, OPT_VMBUS_BLACKLIST_NUM  },
+   {OPT_VMBUS_WHITELIST,   1, NULL, OPT_VMBUS_WHITELIST_NUM  },
+#endif
{0, 0, NULL, 0}
 };
 
@@ -855,6 +860,21 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_pci = 1;
break;
 
+#ifdef RTE_LIBRTE_HV_PMD
+   case OPT_NO_VMBUS_NUM:
+   conf->no_vmbus = 1;
+   break;
+   case OPT_VMBUS_BLACKLIST_NUM:
+   if (rte_eal_devargs_add(RTE_DEVTYPE_BLACKLISTED_VMBUS,
+   optarg) < 0)
+   return -1;
+   break;
+   case OPT_VMBUS_WHITELIST_NUM:
+   if (rte_eal_devargs_add(RTE_DEVTYPE_WHITELISTED_VMBUS,
+   optarg) < 0)
+   return -1;
+   break;
+#endif
case OPT_NO_HPET_NUM:
conf->no_hpet = 1;
break;
@@ -987,6 +1007,14 @@ eal_check_common_options(struct internal_config 
*internal_cfg)
return -1;
}
 
+#ifdef RTE_LIBRTE_HV_PMD
+   if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_VMBUS) != 0 &&
+   rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_VMBUS) != 0) 
{
+   RTE_LOG(ERR, EAL, "Options vmbus blacklist and whitelist "
+   "cannot be used at the same time\n");
+   return -1;
+   }
+#endif
return 0;
 }
 
@@ -1036,5 +1064,15 @@ eal_common_usage(void)
   "  --"OPT_NO_PCI"Disable PCI\n"
   "  --"OPT_NO_HPET"   Disable HPET\n"
   "  --"OPT_NO_SHCONF" No shared config (mmap'd files)\n"
+#ifdef RTE_LIBRTE_HV_PMD
+  "  --"OPT_NO_VMBUS"  Disable VM

Re: [dpdk-dev] [PATCH 26/28] net/virtio: use eal I/O device memory read/write API

2016-12-14 Thread Santosh Shukla
On Wed, Dec 14, 2016 at 11:02:23AM +0800, Yuanhan Liu wrote:
> On Wed, Dec 14, 2016 at 07:25:56AM +0530, Jerin Jacob wrote:
> >   * Following macros are derived from linux/pci_regs.h, however,
> >   * we can't simply include that header here, as there is no such
> > @@ -320,37 +322,37 @@ static const struct virtio_pci_ops legacy_ops = {
> >  static inline uint8_t
> >  io_read8(uint8_t *addr)
> >  {
> > -   return *(volatile uint8_t *)addr;
> > +   return rte_readb(addr);
> >  }
> 
> Oh, one more comments: why not replacing io_read8 with rte_readb(),
> and do similar for others? Then we don't have to define those wrappers.
> 
> I think you can also do something similar for other patches?

Make sense for the virtio-pci case where API name io_read/write as good as
rte_read/write. However, IMO for other drivers for example ADF_CSR_RD/WR
improves code readability compared to plain rte_read/write.

Also IMO replacing code incident like below 
 
static inline void writel(unsigned int val, volatile void __iomem *addr)
{
-*(volatile unsigned int *)addr = val;
+rte_writel(val, addr);
}

with direct rte_read/write more appropriate. does above said make sense
to you?
If so then I will take care for all such driver in V2.

--Santosh.

>   --yliu
> >  
> >  static inline void
> >  io_write8(uint8_t val, uint8_t *addr)
> >  {
> > -   *(volatile uint8_t *)addr = val;
> > +   rte_writeb(val, addr);
> >  }
> >  
> >  static inline uint16_t
> >  io_read16(uint16_t *addr)
> >  {
> > -   return *(volatile uint16_t *)addr;
> > +   return rte_readw(addr);
> >  }
> >  
> >  static inline void
> >  io_write16(uint16_t val, uint16_t *addr)
> >  {
> > -   *(volatile uint16_t *)addr = val;
> > +   rte_writew(val, addr);
> >  }
> >  
> >  static inline uint32_t
> >  io_read32(uint32_t *addr)
> >  {
> > -   return *(volatile uint32_t *)addr;
> > +   return rte_readl(addr);
> >  }
> >  
> >  static inline void
> >  io_write32(uint32_t val, uint32_t *addr)
> >  {
> > -   *(volatile uint32_t *)addr = val;
> > +   rte_writel(val, addr);
> >  }
> >  
> >  static inline void
> > -- 
> > 2.5.5


Re: [dpdk-dev] [PATCH 27/28] net/vmxnet3: use eal I/O device memory read/write API

2016-12-14 Thread Santosh Shukla
On Wed, Dec 14, 2016 at 10:55:34AM +0800, Yuanhan Liu wrote:
> On Wed, Dec 14, 2016 at 07:25:57AM +0530, Jerin Jacob wrote:
> > From: Santosh Shukla 
> > 
> > Replace the raw I/O device memory read/write access with eal
> > abstraction for I/O device memory read/write access to fix
> > portability issues across different architectures.
> > 
> > Signed-off-by: Santosh Shukla 
> > Signed-off-by: Jerin Jacob 
> > CC: Yong Wang 
> > ---
> >  drivers/net/vmxnet3/vmxnet3_ethdev.h | 14 ++
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.h 
> > b/drivers/net/vmxnet3/vmxnet3_ethdev.h
> > index 7d3b11e..5b6501b 100644
> > --- a/drivers/net/vmxnet3/vmxnet3_ethdev.h
> > +++ b/drivers/net/vmxnet3/vmxnet3_ethdev.h
> > @@ -34,6 +34,8 @@
> >  #ifndef _VMXNET3_ETHDEV_H_
> >  #define _VMXNET3_ETHDEV_H_
> >  
> > +#include 
> > +
> >  #define VMXNET3_MAX_MAC_ADDRS 1
> >  
> >  /* UPT feature to negotiate */
> > @@ -120,7 +122,11 @@ struct vmxnet3_hw {
> >  
> >  /* Config space read/writes */
> >  
> > -#define VMXNET3_PCI_REG(reg) (*((volatile uint32_t *)(reg)))
> > +#define VMXNET3_PCI_REG(reg) ({\
> > +   uint32_t __val; \
> > +   __val = rte_readl(reg); \
> > +   __val;  \
> > +})
> 
> Why not simply using rte_readl directly?
> 
>   #define VMXNET3_PCI_REG(reg)rte_readl(reg)
>

Ok.

> >  
> >  static inline uint32_t
> >  vmxnet3_read_addr(volatile void *addr)
> > @@ -128,9 +134,9 @@ vmxnet3_read_addr(volatile void *addr)
> > return VMXNET3_PCI_REG(addr);
> >  }
> >  
> > -#define VMXNET3_PCI_REG_WRITE(reg, value) do { \
> > -   VMXNET3_PCI_REG((reg)) = (value); \
> > -} while(0)
> > +#define VMXNET3_PCI_REG_WRITE(reg, value) ({   \
> > +   rte_writel(value, reg); \
> > +})
> 
> I think this could be done in one line.
>

Ok.
will take care in V2.

>   --yliu


Re: [dpdk-dev] [PATCH 02/32] drivers/common: introducing dpaa2 mc driver

2016-12-14 Thread Jerin Jacob
On Sun, Dec 04, 2016 at 11:46:57PM +0530, Hemant Agrawal wrote:
> This patch intoduces the DPAA2 MC(Management complex Driver)
> 
> This driver is common to be used by various DPAA2 net, crypto
> and other drivers
> 
> Signed-off-by: Cristian Sovaiala 
> [Hemant:rebase and conversion to library for DPDK]
> Signed-off-by: Hemant Agrawal 
> +#ifndef _FSL_MC_SYS_H
> +#define _FSL_MC_SYS_H
> +
> +#ifdef __linux_driver__
> +
> +#include 
> +#include 
> +#include 
> +
> +struct fsl_mc_io {
> + void *regs;
> +};
> +
> +#ifndef ENOTSUP
> +#define ENOTSUP  95
> +#endif
> +
> +#define ioread64(_p) readq(_p)
> +#define iowrite64(_v, _p)   writeq(_v, _p)
> +
> +#else /* __linux_driver__ */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define cpu_to_le64(x) __cpu_to_le64(x)
> +#ifndef dmb
> +#define dmb() {__asm__ __volatile__("" : : : "memory"); }
> +#endif

Better to use DPDK macros here.

> +#define __iormb()   dmb()
> +#define __iowmb()   dmb()
> +#define __arch_getq(a)  (*(volatile unsigned long *)(a))
> +#define __arch_putq(v, a)(*(volatile unsigned long *)(a) = 
> (v))
> +#define __arch_putq32(v, a)(*(volatile unsigned int *)(a) = 
> (v))
> +#define readq(c)\
> + ({ uint64_t __v = __arch_getq(c); __iormb(); __v; })
> +#define writeq(v, c) \
> + ({ uint64_t __v = v; __iowmb(); __arch_putq(__v, c); __v; })
> +#define writeq32(v, c) \
> + ({ uint32_t __v = v; __iowmb(); __arch_putq32(__v, c); __v; })
> +#define ioread64(_p) readq(_p)
> +#define iowrite64(_v, _p)   writeq(_v, _p)
> +#define iowrite32(_v, _p)   writeq32(_v, _p)

Hopefully, we can clean all this once rte_read32 and rte_write32 becomes
mainline

http://dpdk.org/dev/patchwork/patch/17935/

> +#define __iomem
> +
> +struct fsl_mc_io {
> + void *regs;
> +};
> +
> +#ifndef ENOTSUP
> +#define ENOTSUP  95
> +#endif
> +
> +/*GPP is supposed to use MC commands with low priority*/
> +#define CMD_PRI_LOW  0 /*!< Low Priority command indication */
> +
> +struct mc_command;
> +
> +int mc_send_command(struct fsl_mc_io *mc_io, struct mc_command *cmd);
> +
> +#endif /* __linux_driver__ */
> +
> +#endif /* _FSL_MC_SYS_H */
> +
> +/** User space framework uses MC Portal in shared mode. Following change
> +* introduces lock in MC FLIB
> +*/
> +
> +/**
> +* The mc_spinlock_t type.
> +*/
> +typedef struct {
> + volatile int locked; /**< lock status 0 = unlocked, 1 = locked */
> +} mc_spinlock_t;
> +
> +/**
> +* A static spinlock initializer.
> +*/
> +static mc_spinlock_t mc_portal_lock = { 0 };
> +
> +static inline void mc_pause(void) {}
> +
> +static inline void mc_spinlock_lock(mc_spinlock_t *sl)
> +{
> + while (__sync_lock_test_and_set(&sl->locked, 1))
> + while (sl->locked)
> + mc_pause();
> +}
> +
> +static inline void mc_spinlock_unlock(mc_spinlock_t *sl)
> +{
> + __sync_lock_release(&sl->locked);
> +}
> +

DPDK spinlock can be used here.


Re: [dpdk-dev] [PATCH 17/32] net/dpaa2: dpbp based mempool hw offload driver

2016-12-14 Thread Jerin Jacob
On Sun, Dec 04, 2016 at 11:47:12PM +0530, Hemant Agrawal wrote:
> DPBP represent a buffer pool instance in DPAA2-QBMAN
> HW accelerator.
> 
> All buffers needs to be programmed in the HW accelerator.
> 
> Signed-off-by: Hemant Agrawal 
> ---
>  config/defconfig_arm64-dpaa2-linuxapp-gcc |   5 +
>  drivers/net/dpaa2/Makefile|   2 +
>  drivers/net/dpaa2/base/dpaa2_hw_dpbp.c| 366 
> ++
>  drivers/net/dpaa2/base/dpaa2_hw_dpbp.h| 101 +
>  drivers/net/dpaa2/base/dpaa2_hw_pvt.h |   7 +


How about moving the external mempool driver to RTE_SDK/driver/pool.
We are planning to push our external mempool driver to driver/pool.

>  drivers/net/dpaa2/dpaa2_vfio.c|  13 +-
>  6 files changed, 493 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/dpaa2/base/dpaa2_hw_dpbp.c
>  create mode 100644 drivers/net/dpaa2/base/dpaa2_hw_dpbp.h
> 


Re: [dpdk-dev] [PATCH 16/32] net/dpaa2: dpio add support to check SOC type

2016-12-14 Thread Jerin Jacob
On Sun, Dec 04, 2016 at 11:47:11PM +0530, Hemant Agrawal wrote:
> Signed-off-by: Hemant Agrawal 
> ---
>  drivers/net/dpaa2/base/dpaa2_hw_dpio.c | 74 
> ++
>  1 file changed, 74 insertions(+)
> 
> diff --git a/drivers/net/dpaa2/base/dpaa2_hw_dpio.c 
> b/drivers/net/dpaa2/base/dpaa2_hw_dpio.c
> index 9c6eb96..3b8f87d 100644
> --- a/drivers/net/dpaa2/base/dpaa2_hw_dpio.c
> +++ b/drivers/net/dpaa2/base/dpaa2_hw_dpio.c
> @@ -70,6 +70,18 @@
>  static struct dpio_device_list *dpio_dev_list; /*!< DPIO device list */
>  static uint32_t io_space_count;
>  
> +#define ARM_CORTEX_A53   0xD03
> +#define ARM_CORTEX_A57   0xD07
> +#define ARM_CORTEX_A72   0xD08

May not be good idea to have generic ARM part number definition in driver
file.

> +
> +static int dpaa2_soc_core = ARM_CORTEX_A72;
> +
> +#define NXP_LS2085   1
> +#define NXP_LS2088   2
> +#define NXP_LS1088   3
> +
> +static int dpaa2_soc_family  = NXP_LS2088;
> +
>  /*Stashing Macros default for LS208x*/
>  static int dpaa2_core_cluster_base = 0x04;
>  static int dpaa2_cluster_sz = 2;
> @@ -101,6 +113,58 @@
>   return dpaa2_core_cluster_base + x;
>  }
>  
> +static int cpuinfo_arm(FILE *file)
> +{
> + char str[128], *pos;
> + int part = -1;
> +
> + #define ARM_CORTEX_A53_INFO "Cortex-A53"
> + #define ARM_CORTEX_A57_INFO "Cortex-A57"
> + #define ARM_CORTEX_A72_INFO "Cortex-A72"
> +
> + while (fgets(str, sizeof(str), file) != NULL) {
> + if (part >= 0)
> + break;
> + pos = strstr(str, "CPU part");
> + if (pos != NULL) {
> + pos = strchr(pos, ':');
> + if (pos != NULL)
> + sscanf(++pos, "%x", &part);
> + }
> + }
> +
> + dpaa2_soc_core = part;
> + if (part == ARM_CORTEX_A53) {
> + dpaa2_soc_family = NXP_LS1088;
> + printf("\n## Detected NXP LS108x with %s\n",
> +ARM_CORTEX_A53_INFO);
> + } else if (part == ARM_CORTEX_A57) {
> + dpaa2_soc_family = NXP_LS2085;
> + printf("\n## Detected NXP LS208x Rev1.0 with %s\n",
> +ARM_CORTEX_A57_INFO);
> + } else if (part == ARM_CORTEX_A72) {
> + dpaa2_soc_family = NXP_LS2088;
> + printf("\n## Detected NXP LS208x with %s\n",
> +ARM_CORTEX_A72_INFO);
> + }
> + return 0;
> +}
> +
> +static void
> +check_cpu_part(void)
> +{
> + FILE *stream;
> +
> + stream = fopen("/proc/cpuinfo", "r");
> + if (!stream) {
> + PMD_INIT_LOG(WARNING, "Unable to open /proc/cpuinfo\n");
> + return;
> + }
> + cpuinfo_arm(stream);
> +
> + fclose(stream);
> +}
> +
>  static int
>  configure_dpio_qbman_swp(struct dpaa2_dpio_dev *dpio_dev)
>  {
> @@ -326,6 +390,16 @@ static inline struct dpaa2_dpio_dev 
> *dpaa2_get_qbman_swp(void)
>  {
>   struct dpaa2_dpio_dev *dpio_dev;
>   struct vfio_region_info reg_info = { .argsz = sizeof(reg_info)};
> + static int first_time;
> +
> + if (!first_time) {
> + check_cpu_part();
> + if (dpaa2_soc_family == NXP_LS1088) {
> + dpaa2_core_cluster_base = 0x02;
> + dpaa2_cluster_sz = 4;
Can this device configuration information passed through dt/the means
where you are populating the fsl bus for dpio ?

if not arm64 cpu part identification code can go in arm64 common
code. Even better if we have EAL API for same. Looks like x86 similar
attribute called "model"



Re: [dpdk-dev] [PATCH 17/32] net/dpaa2: dpbp based mempool hw offload driver

2016-12-14 Thread Shreyansh Jain

On Thursday 15 December 2016 11:39 AM, Jerin Jacob wrote:

On Sun, Dec 04, 2016 at 11:47:12PM +0530, Hemant Agrawal wrote:

DPBP represent a buffer pool instance in DPAA2-QBMAN
HW accelerator.

All buffers needs to be programmed in the HW accelerator.

Signed-off-by: Hemant Agrawal 
---
 config/defconfig_arm64-dpaa2-linuxapp-gcc |   5 +
 drivers/net/dpaa2/Makefile|   2 +
 drivers/net/dpaa2/base/dpaa2_hw_dpbp.c| 366 ++
 drivers/net/dpaa2/base/dpaa2_hw_dpbp.h| 101 +
 drivers/net/dpaa2/base/dpaa2_hw_pvt.h |   7 +



How about moving the external mempool driver to RTE_SDK/driver/pool.
We are planning to push our external mempool driver to driver/pool.


I really like the idea of this separation:

So,
..drivers/net/
..drivers/crypto/
..drivers/bus/
..drivers/pool/

only concern I see for now is resolving dependency of symbols across 
this structure. for example, DPAA2 Pool would be dependent on some DPAA2 
specific objects - which then are again used in crypto/ and net/.


It is possible to have drivers/common (which DPAA2 PMD patchset is 
already doing). How are you doing that?





 drivers/net/dpaa2/dpaa2_vfio.c|  13 +-
 6 files changed, 493 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/dpaa2/base/dpaa2_hw_dpbp.c
 create mode 100644 drivers/net/dpaa2/base/dpaa2_hw_dpbp.h







Re: [dpdk-dev] [PATCH 08/32] mk/dpaa2: add the crc support to the machine type

2016-12-14 Thread Jerin Jacob
On Sun, Dec 04, 2016 at 11:47:03PM +0530, Hemant Agrawal wrote:
> Signed-off-by: Hemant Agrawal 

Acked-by: Jerin Jacob 

> ---
>  mk/machine/dpaa2/rte.vars.mk | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/mk/machine/dpaa2/rte.vars.mk b/mk/machine/dpaa2/rte.vars.mk
> index 8541633..e4735c2 100644
> --- a/mk/machine/dpaa2/rte.vars.mk
> +++ b/mk/machine/dpaa2/rte.vars.mk
> @@ -1,6 +1,7 @@
>  #   BSD LICENSE
>  #
> -#   Copyright(c) 2016 Freescale Semiconductor, Inc. All rights reserved.
> +#   Copyright (c) 2016 Freescale Semiconductor, Inc. All rights reserved.
> +#   Copyright (c) 2016 NXP. All rights reserved.
>  #
>  #   Redistribution and use in source and binary forms, with or without
>  #   modification, are permitted provided that the following conditions
> @@ -53,7 +54,7 @@
>  # CPU_CFLAGS =
>  # CPU_LDFLAGS =
>  # CPU_ASFLAGS =
> -MACHINE_CFLAGS += -march=armv8-a
> +MACHINE_CFLAGS += -march=armv8-a+crc
>  
>  ifdef CONFIG_RTE_ARCH_ARM_TUNE
>  MACHINE_CFLAGS += -mcpu=$(CONFIG_RTE_ARCH_ARM_TUNE)
> -- 
> 1.9.1
> 


Re: [dpdk-dev] [PATCH 2/2] hyperv: VMBUS support infrastucture

2016-12-14 Thread Shreyansh Jain

On Thursday 15 December 2016 05:29 AM, Stephen Hemminger wrote:

Generalize existing bus support to handle VMBUS in Hyper-V.
Most of the code is based of existing model for PCI, the difference
is how bus is represented in sysfs and how addressing works.

This is based on earlier code contributed by Brocade.
It supports only 4.9 or later versions of the Linux kernel
at this time (not older kernels or BSD).

Signed-off-by: Stephen Hemminger 
---
 lib/librte_eal/common/Makefile  |   2 +-
 lib/librte_eal/common/eal_common_devargs.c  |   7 +
 lib/librte_eal/common/eal_common_options.c  |  38 ++
 lib/librte_eal/common/eal_internal_cfg.h|   3 +-
 lib/librte_eal/common/eal_options.h |   6 +
 lib/librte_eal/common/eal_private.h |   5 +
 lib/librte_eal/common/include/rte_devargs.h |   8 +
 lib/librte_eal/common/include/rte_vmbus.h   | 247 
 lib/librte_eal/linuxapp/eal/Makefile|   6 +
 lib/librte_eal/linuxapp/eal/eal.c   |  11 +
 lib/librte_eal/linuxapp/eal/eal_vmbus.c | 906 
 lib/librte_ether/rte_ethdev.c   |  90 +++
 lib/librte_ether/rte_ethdev.h   |  28 +-
 mk/rte.app.mk   |   1 +
 14 files changed, 1354 insertions(+), 4 deletions(-)
 create mode 100644 lib/librte_eal/common/include/rte_vmbus.h
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_vmbus.c

diff --git a/lib/librte_eal/common/Makefile b/lib/librte_eal/common/Makefile
index a92c984..9254bae 100644
--- a/lib/librte_eal/common/Makefile
+++ b/lib/librte_eal/common/Makefile
@@ -33,7 +33,7 @@ include $(RTE_SDK)/mk/rte.vars.mk

 INC := rte_branch_prediction.h rte_common.h
 INC += rte_debug.h rte_eal.h rte_errno.h rte_launch.h rte_lcore.h
-INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h
+INC += rte_log.h rte_memory.h rte_memzone.h rte_pci.h rte_vmbus.h
 INC += rte_per_lcore.h rte_random.h
 INC += rte_tailq.h rte_interrupts.h rte_alarm.h
 INC += rte_string_fns.h rte_version.h
diff --git a/lib/librte_eal/common/eal_common_devargs.c 
b/lib/librte_eal/common/eal_common_devargs.c
index e403717..934ca84 100644
--- a/lib/librte_eal/common/eal_common_devargs.c
+++ b/lib/librte_eal/common/eal_common_devargs.c
@@ -113,6 +113,13 @@ rte_eal_devargs_add(enum rte_devtype devtype, const char 
*devargs_str)
goto fail;

break;
+   case RTE_DEVTYPE_WHITELISTED_VMBUS:
+   case RTE_DEVTYPE_BLACKLISTED_VMBUS:
+#ifdef RTE_LIBRTE_HV_PMD
+   if (uuid_parse(buf, devargs->uuid) == 0)
+   break;
+#endif
+   goto fail;
}

free(buf);
diff --git a/lib/librte_eal/common/eal_common_options.c 
b/lib/librte_eal/common/eal_common_options.c
index 6ca8af1..6aea87d 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -95,6 +95,11 @@ eal_long_options[] = {
{OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
{OPT_VMWARE_TSC_MAP,0, NULL, OPT_VMWARE_TSC_MAP_NUM   },
{OPT_XEN_DOM0,  0, NULL, OPT_XEN_DOM0_NUM },
+#ifdef RTE_LIBRTE_HV_PMD
+   {OPT_NO_VMBUS,  0, NULL, OPT_NO_VMBUS_NUM },
+   {OPT_VMBUS_BLACKLIST,   1, NULL, OPT_VMBUS_BLACKLIST_NUM  },
+   {OPT_VMBUS_WHITELIST,   1, NULL, OPT_VMBUS_WHITELIST_NUM  },
+#endif
{0, 0, NULL, 0}
 };

@@ -855,6 +860,21 @@ eal_parse_common_option(int opt, const char *optarg,
conf->no_pci = 1;
break;

+#ifdef RTE_LIBRTE_HV_PMD
+   case OPT_NO_VMBUS_NUM:
+   conf->no_vmbus = 1;
+   break;
+   case OPT_VMBUS_BLACKLIST_NUM:
+   if (rte_eal_devargs_add(RTE_DEVTYPE_BLACKLISTED_VMBUS,
+   optarg) < 0)
+   return -1;
+   break;
+   case OPT_VMBUS_WHITELIST_NUM:
+   if (rte_eal_devargs_add(RTE_DEVTYPE_WHITELISTED_VMBUS,
+   optarg) < 0)
+   return -1;
+   break;
+#endif
case OPT_NO_HPET_NUM:
conf->no_hpet = 1;
break;
@@ -987,6 +1007,14 @@ eal_check_common_options(struct internal_config 
*internal_cfg)
return -1;
}

+#ifdef RTE_LIBRTE_HV_PMD
+   if (rte_eal_devargs_type_count(RTE_DEVTYPE_WHITELISTED_VMBUS) != 0 &&
+   rte_eal_devargs_type_count(RTE_DEVTYPE_BLACKLISTED_VMBUS) != 0) 
{
+   RTE_LOG(ERR, EAL, "Options vmbus blacklist and whitelist "
+   "cannot be used at the same time\n");
+   return -1;
+   }
+#endif
return 0;
 }

@@ -1036,5 +1064,15 @@ eal_common_usage(void)
   "  --"OPT_NO_PCI"Disable PCI\n"
   "  --"OPT_NO_HPET"   Disable HPET\n"
   "  --"OPT_NO_SHCONF" No shared config (mmap'd files)\n"
+#ifdef RTE_LIBRTE_HV

Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on IA platform

2016-12-14 Thread Yang, Zhiyong
Hi, Thomas, Konstantin:

> -Original Message-
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Yang, Zhiyong
> Sent: Sunday, December 11, 2016 8:33 PM
> To: Ananyev, Konstantin ; Thomas
> Monjalon 
> Cc: dev@dpdk.org; yuanhan@linux.intel.com; Richardson, Bruce
> ; De Lara Guarch, Pablo
> 
> Subject: Re: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset on
> IA platform
> 
> Hi, Konstantin, Bruce:
> 
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Thursday, December 8, 2016 6:31 PM
> > To: Yang, Zhiyong ; Thomas Monjalon
> > 
> > Cc: dev@dpdk.org; yuanhan@linux.intel.com; Richardson, Bruce
> > ; De Lara Guarch, Pablo
> > 
> > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > on IA platform
> >
> >
> >
> > > -Original Message-
> > > From: Yang, Zhiyong
> > > Sent: Thursday, December 8, 2016 9:53 AM
> > > To: Ananyev, Konstantin ; Thomas
> > > Monjalon 
> > > Cc: dev@dpdk.org; yuanhan@linux.intel.com; Richardson, Bruce
> > > ; De Lara Guarch, Pablo
> > > 
> > > Subject: RE: [dpdk-dev] [PATCH 1/4] eal/common: introduce rte_memset
> > > on IA platform
> > >
> > extern void *(*__rte_memset_vector)( (void *s, int c, size_t n);
> >
> > static inline void*
> > rte_memset_huge(void *s, int c, size_t n) {
> >return __rte_memset_vector(s, c, n); }
> >
> > static inline void *
> > rte_memset(void *s, int c, size_t n)
> > {
> > If (n < XXX)
> > return rte_memset_scalar(s, c, n);
> > else
> > return rte_memset_huge(s, c, n);
> > }
> >
> > XXX could be either a define, or could also be a variable, so it can
> > be setuped at startup, depending on the architecture.
> >
> > Would that work?
> > Konstantin
> >
I have implemented the code for  choosing the functions at run time.
rte_memcpy is used more frequently, So I test it at run time. 

typedef void *(*rte_memcpy_vector_t)(void *dst, const void *src, size_t n);
extern rte_memcpy_vector_t rte_memcpy_vector;
static inline void *
rte_memcpy(void *dst, const void *src, size_t n)
{
return rte_memcpy_vector(dst, src, n);
}
In order to reduce the overhead at run time, 
I assign the function address to var rte_memcpy_vector before main() starts to 
init the var.

static void __attribute__((constructor))
rte_memcpy_init(void)
{
if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
{
rte_memcpy_vector = rte_memcpy_avx2;
}
else if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1))
{
rte_memcpy_vector = rte_memcpy_sse;
}
else
{
rte_memcpy_vector = memcpy;
}

}
I run the same virtio/vhost loopback tests without NIC.
I can see the  throughput drop  when running choosing functions at run time
compared to original code as following on the same platform(my machine is 
haswell) 
Packet size perf drop
64  -4%
256 -5.4%
1024-5%
1500-2.5%
Another thing, I run the memcpy_perf_autotest,  when N= <128, 
the rte_memcpy perf gains almost disappears
When choosing functions at run time.  For N=other numbers, the perf gains will 
become narrow.

Thanks
Zhiyong


Re: [dpdk-dev] [PATCH 17/32] net/dpaa2: dpbp based mempool hw offload driver

2016-12-14 Thread Jerin Jacob
On Thu, Dec 15, 2016 at 12:07:51PM +0530, Shreyansh Jain wrote:
> On Thursday 15 December 2016 11:39 AM, Jerin Jacob wrote:
> > On Sun, Dec 04, 2016 at 11:47:12PM +0530, Hemant Agrawal wrote:
> > > DPBP represent a buffer pool instance in DPAA2-QBMAN
> > > HW accelerator.
> > > 
> > > All buffers needs to be programmed in the HW accelerator.
> > > 
> > > Signed-off-by: Hemant Agrawal 
> > > ---
> > >  config/defconfig_arm64-dpaa2-linuxapp-gcc |   5 +
> > >  drivers/net/dpaa2/Makefile|   2 +
> > >  drivers/net/dpaa2/base/dpaa2_hw_dpbp.c| 366 
> > > ++
> > >  drivers/net/dpaa2/base/dpaa2_hw_dpbp.h| 101 +
> > >  drivers/net/dpaa2/base/dpaa2_hw_pvt.h |   7 +
> > 
> > 
> > How about moving the external mempool driver to RTE_SDK/driver/pool.
> > We are planning to push our external mempool driver to driver/pool.
> 
> I really like the idea of this separation:
> 
> So,
> ..drivers/net/
> ..drivers/crypto/
> ..drivers/bus/
> ..drivers/pool/
> 
> only concern I see for now is resolving dependency of symbols across this
> structure. for example, DPAA2 Pool would be dependent on some DPAA2 specific
> objects - which then are again used in crypto/ and net/.
> 
> It is possible to have drivers/common (which DPAA2 PMD patchset is already
> doing). How are you doing that?

Same approach. driver/common/octeontx directory for common octeontx driver code



Re: [dpdk-dev] [PATCH 16/32] net/dpaa2: dpio add support to check SOC type

2016-12-14 Thread Hemant Agrawal

On 12/15/2016 12:04 PM, Jerin Jacob wrote:

On Sun, Dec 04, 2016 at 11:47:11PM +0530, Hemant Agrawal wrote:

Signed-off-by: Hemant Agrawal 
---
 drivers/net/dpaa2/base/dpaa2_hw_dpio.c | 74 ++
 1 file changed, 74 insertions(+)

diff --git a/drivers/net/dpaa2/base/dpaa2_hw_dpio.c 
b/drivers/net/dpaa2/base/dpaa2_hw_dpio.c
index 9c6eb96..3b8f87d 100644
--- a/drivers/net/dpaa2/base/dpaa2_hw_dpio.c
+++ b/drivers/net/dpaa2/base/dpaa2_hw_dpio.c
@@ -70,6 +70,18 @@
 static struct dpio_device_list *dpio_dev_list; /*!< DPIO device list */
 static uint32_t io_space_count;

+#define ARM_CORTEX_A53 0xD03
+#define ARM_CORTEX_A57 0xD07
+#define ARM_CORTEX_A72 0xD08


May not be good idea to have generic ARM part number definition in driver
file.


+
+static int dpaa2_soc_core = ARM_CORTEX_A72;
+
+#define NXP_LS2085 1
+#define NXP_LS2088 2
+#define NXP_LS1088 3
+
+static int dpaa2_soc_family  = NXP_LS2088;
+
 /*Stashing Macros default for LS208x*/
 static int dpaa2_core_cluster_base = 0x04;
 static int dpaa2_cluster_sz = 2;
@@ -101,6 +113,58 @@
return dpaa2_core_cluster_base + x;
 }

+static int cpuinfo_arm(FILE *file)
+{
+   char str[128], *pos;
+   int part = -1;
+
+   #define ARM_CORTEX_A53_INFO "Cortex-A53"
+   #define ARM_CORTEX_A57_INFO "Cortex-A57"
+   #define ARM_CORTEX_A72_INFO "Cortex-A72"
+
+   while (fgets(str, sizeof(str), file) != NULL) {
+   if (part >= 0)
+   break;
+   pos = strstr(str, "CPU part");
+   if (pos != NULL) {
+   pos = strchr(pos, ':');
+   if (pos != NULL)
+   sscanf(++pos, "%x", &part);
+   }
+   }
+
+   dpaa2_soc_core = part;
+   if (part == ARM_CORTEX_A53) {
+   dpaa2_soc_family = NXP_LS1088;
+   printf("\n## Detected NXP LS108x with %s\n",
+  ARM_CORTEX_A53_INFO);
+   } else if (part == ARM_CORTEX_A57) {
+   dpaa2_soc_family = NXP_LS2085;
+   printf("\n## Detected NXP LS208x Rev1.0 with %s\n",
+  ARM_CORTEX_A57_INFO);
+   } else if (part == ARM_CORTEX_A72) {
+   dpaa2_soc_family = NXP_LS2088;
+   printf("\n## Detected NXP LS208x with %s\n",
+  ARM_CORTEX_A72_INFO);
+   }
+   return 0;
+}
+
+static void
+check_cpu_part(void)
+{
+   FILE *stream;
+
+   stream = fopen("/proc/cpuinfo", "r");
+   if (!stream) {
+   PMD_INIT_LOG(WARNING, "Unable to open /proc/cpuinfo\n");
+   return;
+   }
+   cpuinfo_arm(stream);
+
+   fclose(stream);
+}
+
 static int
 configure_dpio_qbman_swp(struct dpaa2_dpio_dev *dpio_dev)
 {
@@ -326,6 +390,16 @@ static inline struct dpaa2_dpio_dev 
*dpaa2_get_qbman_swp(void)
 {
struct dpaa2_dpio_dev *dpio_dev;
struct vfio_region_info reg_info = { .argsz = sizeof(reg_info)};
+   static int first_time;
+
+   if (!first_time) {
+   check_cpu_part();
+   if (dpaa2_soc_family == NXP_LS1088) {
+   dpaa2_core_cluster_base = 0x02;
+   dpaa2_cluster_sz = 4;

Can this device configuration information passed through dt/the means
where you are populating the fsl bus for dpio ?

if not arm64 cpu part identification code can go in arm64 common
code. Even better if we have EAL API for same. Looks like x86 similar
attribute called "model"



This is good idea to have something equivalent in EAL. let me try to 
make an attempt on it.