RE: [RFC 2/7] eal: add generic bit manipulation macros

2024-03-04 Thread Heng Wang
Hi Mattias,
  I have a comment about the _Generic. What if the user gives uint8_t * or 
uint16_t * as the address. One improvement is that we could add a default 
branch in _Generic to throw a compiler error or assert false.
  Another question is what if nr >= sizeof(type) ? What if you do, for example, 
(uint32_t)1 << 35? Maybe we could add an assert in the implementation?

Regards,
Heng

-Original Message-
From: Mattias Rönnblom  
Sent: Saturday, March 2, 2024 2:53 PM
To: dev@dpdk.org
Cc: hof...@lysator.liu.se; Heng Wang ; Mattias Rönnblom 

Subject: [RFC 2/7] eal: add generic bit manipulation macros

Add bit-level test/set/clear/assign macros operating on both 32-bit and 64-bit 
words by means of C11 generic selection.

Signed-off-by: Mattias Rönnblom 
---
 lib/eal/include/rte_bitops.h | 81 
 1 file changed, 81 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h index 
9a368724d5..afd0f11033 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -107,6 +107,87 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr) \
+   _Generic((addr),\
+uint32_t *: rte_bit_test32,\
+uint64_t *: rte_bit_test64)(addr, nr)
+
+/**
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)  \
+   _Generic((addr),\
+uint32_t *: rte_bit_set32, \
+uint64_t *: rte_bit_set64)(addr, nr)
+
+/**
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)\
+   _Generic((addr),\
+uint32_t *: rte_bit_clear32,   \
+uint64_t *: rte_bit_clear64)(addr, nr)
+
+/**
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 
+64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)\
+   _Generic((addr),\
+uint32_t *: rte_bit_assign32,  \
+uint64_t *: rte_bit_assign64)(addr, nr, value)
+
 /**
  * Test if a particular bit in a 32-bit word is set.
  *
--
2.34.1



RE: Event device early back-pressure indication

2023-04-17 Thread Heng Wang
Hi,
  This interaction with eventdev introduces some overhead. Isn't it easier to 
just create an API to query the available credit for a certain event port?

Regards,
Heng

-Original Message-
From: Mattias Rönnblom  
Sent: Thursday, April 13, 2023 8:54 AM
To: Jerin Jacob Kollanukkaran 
Cc: timothy.mcdan...@intel.com; Hemant Agrawal ; Harry 
van Haaren ; dev@dpdk.org; Svante Järvstråt 
; Heng Wang ; Stefan 
Sundkvist ; Peter Nilsson 
; Maria Lingemark 
Subject: Event device early back-pressure indication

Hi.

Consider this situation:

An application EAL thread receives an eventdev event (or some other stimuli), 
which in turn triggers some action. This action results in a number of new 
events being prepared, and a number of associated state changes in the 
application.

On attempting to enqueue the newly created batch of RTE_EVENT_OP_NEW events, it 
turns out the system is very busy, and the event device back pressures (i.e., 
returns a short count in rte_event_enqueue_new_burst()).

The application may now be a in tough spot, in case:

A) The processing was expensive and/or difficult to reverse (e.g., destructive 
changes were made to a packet).
B) The application does not have the option to discard the events (and any 
related mbufs).

In this situation, it would be very beneficial to the application if the event 
device give could give some assurance that a future enqueue operation will 
succeed (in its entirety).

 From what I understand from today's Eventdev API, there are no good options. 
You *may* be able to do some heuristics based on a event device-specific xstat 
(to infer the event device load), but that is not even close to "good". You may 
also try some application-level buffering, but that assumes that the 
packets/state changes are going to be identical, if they are to be sent at a 
later time. It would drive complexity in the app.

One seemingly clean way to solve this issue is to allow pre-allocation of 
RTE_NEW_OP_NEW credits. The eventdev API doesn't talk about credits, but at 
least in the event device implementations I've come across use some kind of 
credit system internally.

uint16_t
rte_event_alloc_new_credits(uint8_t dev_id, uint8_t port_id, uint16_t count);

In addition to this function, the application would also need some way to 
indicate, at the point of enqueue, that the credits have already been allocated.

I don't see any need for pre-allocating credits for non-RTE_OP_NEW events. 
(Some event devices don't even use credits to track such
events.) Back pressure on RTE_OP_FORWARD usually spells disaster, in one form 
of the other.

You could use a bit in the rte_event struct for the purpose of signaling if its 
credit is pre-allocated. That would allow this change to happen, without any 
changes to the enqueue function prototypes.

However, this would require the event device to scan the event array.

I'm not sure I think there is a use case for mixing pre-allocated and 
non-pre-allocated events in the same burst.

If this burst-level separation is good enough, one could either change the 
existing rte_enqueue_new_burst() or add a new one. Something like:

uint16_t
rte_enqueue_new_burst(uint8_t dev_id, uint8_t port_id,
   const struct rte_event ev[],
   uint16_t nb_events, uint32_t flags);

#define RTE_EVENT_FLAG_PRE_CREDITS_ALLOCATED (UINT32_C(1) << 0)

A related shortcoming of the current eventdev API is that the 
new_event_threshold is tied to a port, which is impractical for applications 
which require different threshold for different kinds of events enqueued on the 
same port. One can use different ports, but that approach does not scale, since 
there may be significant memory and/or event device hardware resources tied to 
ports, and thus you cannot allow for a combinatorial explosion of ports.

This issue could be solve by allowing the application to specify the 
new_event_threshold, either per burst, or per event.

Per event doesn't make a lot of sense in practice, I think, since mixing events 
with different back pressure points will create head-of-line blocking. An early 
low-threshold event may prevent higher-indexed high threshold event in the same 
enqueue burst from being enqueued. This is the same reason it usually doesn't 
make sense to mix RTE_OP_NEW and RTE_OP_FORWARD events in the same burst.

Although the new_event_threshold seems completely orthogonal to the port to me, 
it could still serve as the default.

In case you find this a useful feature, it could be added to the credit 
allocation function.

uint16_t
rte_event_alloc_new_credits(uint8_t dev_id, uint8_t port_id, uint32_t 
new_event_threshold, uint16_t count);

If that is the only change, the user is required to pre-allocated credits to 
use a flexible new_event_threshold.

It seems to me that that might be something you can live with. Or, you add new 
enqueue_new_burst() variant wh

[dpdk-dev] Offloading L4 checksum to crypto device

2020-01-03 Thread Heng Wang
Hi,
 This is Heng from Ericsson. We are now developing a platform based on dpdk. 
The platform usually processes the ipsec packets encapsulating udp packets.
  You know, calculating udp checksum is always time consuming. We found a 
hardware solution where our crypto device can calculate the udp checksum during 
encryption.
 Unfortunately, in DPDK, I didn't find any feature flag for it. Also the API 
rte_cryptodev_configure doesn't allow us to pass any offload flag to the crypto 
device.
 I think this could be a common case to calculate something, like checksums, 
before encryption in crypto devices. Should we consider add some new feature 
flag and offload support in dpdk crypto device?

Regards,
Heng


Re: [dpdk-dev] Offloading L4 checksum to crypto device

2020-02-09 Thread Heng Wang
Hi Fiona,
  Thanks for the info. Do you have a sample application to show how it works? 
It's better to have an example of chaining udp checksum with 
encryption/decryption. I am also wondering to which version it will merge?

Regards,
Heng

-Original Message-
From: Trahe, Fiona  
Sent: den 5 februari 2020 18:23
To: Heng Wang ; Doherty, Declan 

Cc: Filip Pudak ; dev@dpdk.org; Trahe, Fiona 
; Coyle, David 
Subject: RE: Offloading L4 checksum to crypto device

Hi Heng,
This may be of interest for your use-case.
https://patches.dpdk.org/patch/65559/
Fiona

> -Original Message-
> From: dev  On Behalf Of Heng Wang
> Sent: Monday, December 23, 2019 2:28 PM
> To: Doherty, Declan 
> Cc: Filip Pudak ; dev@dpdk.org
> Subject: [dpdk-dev] Offloading L4 checksum to crypto device
> 
> Hi,
>  This is Heng from Ericsson. We are now developing a platform based on 
> dpdk. The platform usually processes the ipsec packets encapsulating udp 
> packets.
>   You know, calculating udp checksum is always time consuming. We 
> found a hardware solution where our crypto device can calculate the udp 
> checksum during encryption.
>  Unfortunately, in DPDK, I didn't find any feature flag for it. Also 
> the API rte_cryptodev_configure doesn't allow us to pass any offload flag to 
> the crypto device.
>  I think this could be a common case to calculate something, like 
> checksums, before encryption in crypto devices. Should we consider add 
> some new feature flag and offload support in dpdk crypto device?
> 
> Regards,
> Heng