date:20240325

[PATCH] doc: add tested platforms with NVIDIA NICs

2024-03-25 Thread Raslan Darawsheh

Add tested platforms with NVIDIA NICs to the 24.03 release notes.

Signed-off-by: Raslan Darawsheh 
---
 doc/guides/rel_notes/release_24_03.rst | 107 +
 1 file changed, 107 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_03.rst 
b/doc/guides/rel_notes/release_24_03.rst
index 14826ea08f..f3917c9172 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -297,3 +297,110 @@ Tested Platforms
This section is a comment. Do not overwrite or remove it.
Also, make sure to start the actual text at the margin.
===
+
+* Intel\ |reg| platforms with NVIDIA\ |reg| NICs combinations
+
+  * CPU:
+
+* Intel\ |reg| Xeon\ |reg| Gold 6154 CPU @ 3.00GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2697A v4 @ 2.60GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2697 v3 @ 2.60GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2680 v2 @ 2.80GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2670 0 @ 2.60GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2650 v4 @ 2.20GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2650 v3 @ 2.30GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2640 @ 2.50GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2650 0 @ 2.00GHz
+* Intel\ |reg| Xeon\ |reg| CPU E5-2620 v4 @ 2.10GHz
+
+  * OS:
+
+* Red Hat Enterprise Linux release 9.1 (Plow)
+* Red Hat Enterprise Linux release 8.6 (Ootpa)
+* Red Hat Enterprise Linux release 8.4 (Ootpa)
+* Red Hat Enterprise Linux Server release 7.9 (Maipo)
+* Red Hat Enterprise Linux Server release 7.6 (Maipo)
+* Ubuntu 22.04
+* Ubuntu 20.04
+* SUSE Enterprise Linux 15 SP2
+
+  * OFED:
+
+* MLNX_OFED 24.01-0.3.3.1 and above
+
+  * upstream kernel:
+
+* Linux 6.8.0 and above
+
+  * rdma-core:
+
+* rdma-core-50.0 and above
+
+  * NICs
+
+* NVIDIA\ |reg| ConnectX\ |reg|-6 Dx EN 100G MCX623106AN-CDAT (2x100G)
+
+  * Host interface: PCI Express 4.0 x16
+  * Device ID: 15b3:101d
+  * Firmware version: 22.40.1000 and above
+
+* NVIDIA\ |reg| ConnectX\ |reg|-6 Lx EN 25G MCX631102AN-ADAT (2x25G)
+
+  * Host interface: PCI Express 4.0 x8
+  * Device ID: 15b3:101f
+  * Firmware version: 26.40.1000 and above
+
+* NVIDIA\ |reg| ConnectX\ |reg|-7 200G CX713106AE-HEA_QP1_Ax (2x200G)
+
+  * Host interface: PCI Express 5.0 x16
+  * Device ID: 15b3:1021
+  * Firmware version: 28.40.1000 and above
+
+* NVIDIA\ |reg| BlueField\ |reg| SmartNIC
+
+  * NVIDIA\ |reg| BlueField\ |reg|-2 SmartNIC MT41686 - MBF2H332A-AEEOT_A1 
(2x25G)
+
+* Host interface: PCI Express 3.0 x16
+* Device ID: 15b3:a2d6
+* Firmware version: 24.40.1000 and above
+
+  * NVIDIA\ |reg| BlueField\ |reg|-3 P-Series DPU MT41692 - 900-9D3B6-00CV-AAB 
(2x200G)
+
+* Host interface: PCI Express 5.0 x16
+* Device ID: 15b3:a2dc
+* Firmware version: 32.40.1000 and above
+
+  * Embedded software:
+
+* Ubuntu 22.04
+* MLNX_OFED 24.01-0.3.3.0 and above
+* DOCA_2.6.0_BSP_4.6.0_Ubuntu_22.04-5.24-01
+* DPDK application running on ARM cores
+
+* IBM Power 9 platforms with NVIDIA\ |reg| NICs combinations
+
+  * CPU:
+
+* POWER9 2.2 (pvr 004e 1202)
+
+  * OS:
+
+* Ubuntu 20.04
+
+  * NICs:
+
+* NVIDIA\ |reg| ConnectX\ |reg|-6 Dx 100G MCX623106AN-CDAT (2x100G)
+
+  * Host interface: PCI Express 4.0 x16
+  * Device ID: 15b3:101d
+  * Firmware version: 22.40.1000 and above
+
+* NVIDIA\ |reg| ConnectX\ |reg|-7 200G CX713106AE-HEA_QP1_Ax (2x200G)
+
+  * Host interface: PCI Express 5.0 x16
+  * Device ID: 15b3:1021
+  * Firmware version: 28.40.1000 and above
+
+  * OFED:
+
+* MLNX_OFED 24.01-0.3.3.1
-- 
2.25.1

Testing report for DPDK release candidate 24.03-rc3

2024-03-25 Thread Wael Abualrub

Hi,

> -Original Message-
> From: Thomas Monjalon 
> Sent: Monday, March 18, 2024 5:46 AM
> To: annou...@dpdk.org
> Subject: release candidate 24.03-rc3
>
> A new DPDK release candidate is ready for testing:
>   https://git.dpdk.org/dpdk/tag/?id=v24.03-rc3
>
> There are 153 new patches in this snapshot.
>
> Release notes:
>   https://doc.dpdk.org/guides/rel_notes/release_24_03.html
>
> As usual, you can report any issue on https://bugs.dpdk.org/
>
> Only documentation and bug fixes should be accepted at this stage.
>
> DPDK 24.03-rc4 should be the last release candidate.
> The final release should be done on 27th if no surprise.
>
> Thank you everyone
>

The following is the testing report for 24.03-rc3.

Note: all tests are passed, and no critical issues were found.

The following is a list of tests that we ran on NVIDIA hardware this release:

- Basic functionality:
   Send and receive multiple types of traffic.
- testpmd xstats counter test.
- testpmd timestamp test.
- Changing/checking link status through testpmd.

- RTE flow tests:
 See: https://doc.dpdk.org/guides/nics/mlx5.html#supported-hardware-offloads

- RSS testing.
- VLAN filtering, stripping and insertion tests.
- Checksum and TSO tests.
- ptype reporting.
- link status interrupt using the example application link_status_interrupt 
tests.
- Interrupt mode using l3fwd-power example application tests.
- Multi process testing using multi process example applications.
- Hardware LRO tests.
- Regex tests.
- Buffer Split.
- Tx scheduling.

Kindest Regards,
Wael Abualrub

[PATCH] net/ixgbe: add param check when tx_queue or rx_queqe is null

2024-03-25 Thread keivinwang

add param check when tx_queue or rx_queqe is null.

Signed-off-by: keivinwang 
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index f6c17d4efb..245b3527db 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -2539,6 +2539,8 @@ static const struct ixgbe_txq_ops def_txq_ops = {
 void __rte_cold
 ixgbe_set_tx_function(struct rte_eth_dev *dev, struct ixgbe_tx_queue *txq)
 {
+   if (txq == NULL)
+   return;
/* Use a simple Tx queue (no offloads, no multi segs) if possible */
if ((txq->offloads == 0) &&
 #ifdef RTE_LIB_SECURITY
@@ -4953,12 +4955,13 @@ ixgbe_set_rx_function(struct rte_eth_dev *dev)
 
for (i = 0; i < dev->data->nb_rx_queues; i++) {
struct ixgbe_rx_queue *rxq = dev->data->rx_queues[i];
-
-   rxq->rx_using_sse = rx_using_sse;
+   if (rxq) {
+   rxq->rx_using_sse = rx_using_sse;
 #ifdef RTE_LIB_SECURITY
-   rxq->using_ipsec = !!(dev->data->dev_conf.rxmode.offloads &
-   RTE_ETH_RX_OFFLOAD_SECURITY);
+   rxq->using_ipsec = 
!!(dev->data->dev_conf.rxmode.offloads &
+   RTE_ETH_RX_OFFLOAD_SECURITY);
 #endif
+   }
}
 }
 
-- 
2.44.0.windows.1

RE: [PATCH] app/testpmd: fix releasing action handle flush memory

2024-03-25 Thread Bing Zhao

Hi Ferruh,

> -Original Message-
> From: Ferruh Yigit 
> Sent: Tuesday, March 19, 2024 10:21 PM
> To: Bing Zhao ; dmitry.kozl...@gmail.com;
> dev@dpdk.org
> Cc: aman.deep.si...@intel.com; yuying.zh...@intel.com; Matan Azrad
> ; sta...@dpdk.org; Dariusz Sosnowski
> 
> Subject: Re: [PATCH] app/testpmd: fix releasing action handle flush memory
> 
> External email: Use caution opening links or attachments
> 
> 
> On 3/19/2024 9:32 AM, Bing Zhao wrote:
> > The memory of the indirect action handles should be freed after being
> > destroyed in the flush. The behavior needs to be consistent with the
> > single handle destroy.
> >
> > Or else, there will be some unexpected error when the action handle is
> > destroyed for the 2nd time, for example, the port needs to be closed
> > again.
> >
> 
> Ports can be closed only once, so above reasoning is not valid, but I assume
> the purpose of this patch is to prevent memory leak, can you please clarify 
> the
> problem and impact of the patch in commit log?

To close the port twice is something I am testing internally of "errno EBUSY", 
I didn't describe it clearly.
At the same time, YES, there is some memory leak should be fixed.

> 
> 
> Also what is "single handle destroy" mentioned above?

I mean the function call port_action_handle_destroy().

> 
> The fixed commit is from a few release ago, so this is not something
> introduced in this release, do you think can this wait next release instead of
> merging for -rc4 which is more risky?

Yes, it is not something that blocking the release. The memory should be 
recycled by the system once the
application quits completely.
I will polish the commit message and send v2.

> 
> 
> > Fixes: f7352c176bbf ("app/testpmd: fix use of indirect action after
> > port close")
> > Cc: dmitry.kozl...@gmail.com
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Bing Zhao 
> > Reviewed-by: Dariusz Sosnowski 
> > ---
> >  app/test-pmd/config.c | 9 +++--
> >  1 file changed, 3 insertions(+), 6 deletions(-)
> >
> > diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index
> > ba1007ace6..f62ba90c87 100644
> > --- a/app/test-pmd/config.c
> > +++ b/app/test-pmd/config.c
> > @@ -1918,8 +1918,7 @@ port_action_handle_flush(portid_t port_id)
> >   /* Poisoning to make sure PMDs update it in case of error. */
> >   memset(&error, 0x44, sizeof(error));
> >   if (pia->handle != NULL) {
> > - ret = pia->type ==
> > -   RTE_FLOW_ACTION_TYPE_INDIRECT_LIST ?
> > + ret = pia->type == RTE_FLOW_ACTION_TYPE_INDIRECT_LIST 
> > ?
> > rte_flow_action_list_handle_destroy
> > (port_id, pia->list_handle, &error) :
> > rte_flow_action_handle_destroy @@ -1929,11
> > +1928,9 @@ port_action_handle_flush(portid_t port_id)
> >  pia->id);
> >   ret = port_flow_complain(&error);
> >   }
> > - tmp = &pia->next;
> > - } else {
> > - *tmp = pia->next;
> > - free(pia);
> >   }
> > + *tmp = pia->next;
> > + free(pia);
> >   }
> >   return ret;
> >  }

Thanks

BR. Bing

Re: [PATCH v2 1/6] ethdev: support setting lanes

2024-03-25 Thread Thomas Monjalon

25/03/2024 07:24, huangdengdui:
> 
> On 2024/3/22 21:58, Thomas Monjalon wrote:
> > 22/03/2024 08:09, Dengdui Huang:
> >> -#define RTE_ETH_LINK_SPEED_10G RTE_BIT32(8)  /**< 10 Gbps */
> >> -#define RTE_ETH_LINK_SPEED_20G RTE_BIT32(9)  /**< 20 Gbps */
> >> -#define RTE_ETH_LINK_SPEED_25G RTE_BIT32(10) /**< 25 Gbps */
> >> -#define RTE_ETH_LINK_SPEED_40G RTE_BIT32(11) /**< 40 Gbps */
> >> -#define RTE_ETH_LINK_SPEED_50G RTE_BIT32(12) /**< 50 Gbps */
> >> -#define RTE_ETH_LINK_SPEED_56G RTE_BIT32(13) /**< 56 Gbps */
> >> -#define RTE_ETH_LINK_SPEED_100GRTE_BIT32(14) /**< 100 Gbps */
> >> -#define RTE_ETH_LINK_SPEED_200GRTE_BIT32(15) /**< 200 Gbps */
> >> -#define RTE_ETH_LINK_SPEED_400GRTE_BIT32(16) /**< 400 Gbps */
> >> +#define RTE_ETH_LINK_SPEED_10GRTE_BIT32(8)  /**< 10 Gbps */
> >> +#define RTE_ETH_LINK_SPEED_20GRTE_BIT32(9)  /**< 20 Gbps 
> >> 2lanes */
> >> +#define RTE_ETH_LINK_SPEED_25GRTE_BIT32(10) /**< 25 Gbps */
> >> +#define RTE_ETH_LINK_SPEED_40GRTE_BIT32(11) /**< 40 Gbps 
> >> 4lanes */
> >> +#define RTE_ETH_LINK_SPEED_50GRTE_BIT32(12) /**< 50 Gbps */
> >> +#define RTE_ETH_LINK_SPEED_56GRTE_BIT32(13) /**< 56 Gbps 
> >> 4lanes */
> >> +#define RTE_ETH_LINK_SPEED_100G   RTE_BIT32(14) /**< 100 Gbps */
> >> +#define RTE_ETH_LINK_SPEED_200G   RTE_BIT32(15) /**< 200 Gbps 
> >> 4lanes */
> >> +#define RTE_ETH_LINK_SPEED_400G   RTE_BIT32(16) /**< 400 Gbps 
> >> 4lanes */
> >> +#define RTE_ETH_LINK_SPEED_10G_4LANES RTE_BIT32(17)  /**< 10 Gbps 
> >> 4lanes */
> >> +#define RTE_ETH_LINK_SPEED_50G_2LANES RTE_BIT32(18) /**< 50 Gbps 2 
> >> lanes */
> >> +#define RTE_ETH_LINK_SPEED_100G_2LANESRTE_BIT32(19) /**< 100 Gbps 2 
> >> lanes */
> >> +#define RTE_ETH_LINK_SPEED_100G_4LANESRTE_BIT32(20) /**< 100 Gbps 
> >> 4lanes */
> >> +#define RTE_ETH_LINK_SPEED_200G_2LANESRTE_BIT32(21) /**< 200 Gbps 
> >> 2lanes */
> >> +#define RTE_ETH_LINK_SPEED_400G_8LANESRTE_BIT32(22) /**< 400 Gbps 
> >> 8lanes */
> > 
> > I don't think it is a good idea to make this more complex.
> > It brings nothing as far as I can see, compared to having speed and lanes 
> > separated.
> > Can we have lanes information a separate value? no need for bitmask.
> > 
> Hi,Thomas, Ajit, roretzla, damodharam
> 
> I also considered the option at the beginning of the design.
> But this option is not used due to the following reasons:
> 1. For the user, ethtool couples speed and lanes.
> The result of querying the NIC capability is as follows:
> Supported link modes:
> 10baseSR4/Full
> 10baseSR2/Full
> The NIC capability is configured as follows:
> ethtool -s eth1 speed 10 lanes 4 autoneg off
> ethtool -s eth1 speed 10 lanes 2 autoneg off
> 
> Therefore, users are more accustomed to the coupling of speed and lanes.
> 
> 2. For the PHY, When the physical layer capability is configured through the 
> MDIO,
> the speed and lanes are also coupled.
> For example:
> Table 45–7—PMA/PMD control 2 register bit definitions[1]
> PMA/PMD type selection
> 1 0 0 1 0 1 0 = 100GBASE-SR2 PMA/PMD
> 0 1 0 1 1 1 1 = 100GBASE-SR4 PMA/PMD
> 
> Therefore, coupling speeds and lanes is easier to understand.
> And it is easier for the driver to report the support lanes.
> 
> In addition, the code implementation is compatible with the old version.
> When the driver does not support the lanes setting, the code does not need to 
> be modified.
> 
> So I think the speed and lanes coupling is better.

I don't think so.
You are mixing hardware implementation, user tool, and API.
Having a separate and simple API is cleaner and not more difficult to handle
in some get/set style functions.

Re: [PATCH] doc: add tested platforms with NVIDIA NICs

2024-03-25 Thread Thomas Monjalon

25/03/2024 08:28, Raslan Darawsheh:
> Add tested platforms with NVIDIA NICs to the 24.03 release notes.
> 
> Signed-off-by: Raslan Darawsheh 

Applied, thanks.

Re: meson option to customize RTE_PKTMBUF_HEADROOM patch

2024-03-25 Thread Bruce Richardson

On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote:
>So we right now (at WEKA) have a somewhat older version of DPDK that we
>have customized heavily, and I am going to to need to to make the
>headroom *dynamic* (passed in at run time, and per port.)
>We have this requirement because we need payload to be at a specific
>offset, but have to deal with different header lengths for IPv4 and now
>IPv6.
>My reason for pointing this out, is that I would dearly like if we
>could collaborate on this -- this change is going to touch pretty much
>every PMD (we don't need it on all of them as we only support a subset
>of PMDs, but its still a significant set.)
>I'm not sure if anyone else has considered such a need -- this
>particular message caught my eye as I'm looking specifically in this
>area right now.
> 
Hi

thanks for reaching out. Can you clarify a little more as to the need for
this requirement? Can you not just set the headroom value to the max needed
value for any port and use that? Is there an issue with having blank space
at the start of a buffer?

Thanks,
/Bruce

RE: [PATCH v2] doc: update LTS maintenance to 3 years

2024-03-25 Thread Morten Brørup

> From: Thomas Monjalon [mailto:tho...@monjalon.net]
> Sent: Monday, 25 March 2024 00.14
> 
> 17/01/2024 17:24, Kevin Traynor:
> > The existing official policy was to maintain LTS releases for 2 years.
> >
> > 19.11 and 20.11 LTS releases were maintained for 3 years and there was
> > not significant issues caused by code divergence from main etc.
> >
> > Update the policy to indicate 3 years maintenance for LTS releases, but
> > note that it depends on community support.
> >
> > Signed-off-by: Kevin Traynor 
> 
> More opinions, comments or acks?

It is an improvement.

Acked-by: Morten Brørup

[PATCH v3] graph: expose node context as pointers

2024-03-25 Thread Robin Jarry

In some cases, the node context data is used to store two pointers
because the data is larger than the reserved 16 bytes. Having to define
intermediate structures just to be able to cast is tedious. Add two
pointers that take the same space than ctx.

Signed-off-by: Robin Jarry 
---

Notes:
v3:

* Added __extension__ to the unnamed struct inside the union.
* Fixed C++ header checks.
* Replaced alignas() with an explicit static_assert.

v2:

* Added __extension__ (not sure where it is needed, I don't have access to 
windows).
* It still fails the header check for C++. It seems not possible to align 
an unnamed union...
  Tyler, do you have an idea about how to fix that?
* Added static_assert to ensure the anonymous union is not larger than 
RTE_NODE_CTX_SZ.

 lib/graph/rte_graph_worker_common.h | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/lib/graph/rte_graph_worker_common.h 
b/lib/graph/rte_graph_worker_common.h
index 36d864e2c14e..722e9dac0d36 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -12,7 +12,9 @@
  * process, enqueue and move streams of objects to the next nodes.
  */
 
+#include 
 #include 
+#include 
 
 #include 
 #include 
@@ -112,7 +114,19 @@ struct __rte_cache_aligned rte_node {
};
/* Fast path area  */
 #define RTE_NODE_CTX_SZ 16
-   alignas(RTE_CACHE_LINE_SIZE) uint8_t ctx[RTE_NODE_CTX_SZ]; /**< Node 
Context. */
+   /*
+* alignas(RTE_CACHE_LINE_SIZE) cannot be used for ctx since it is part 
of an unnamed union.
+* The compiler shifts the next field on the next cache line which is 
not what we want.
+* The alignment is enforced via a explcicit static asserts below.
+*/
+   union {
+   uint8_t ctx[RTE_NODE_CTX_SZ];
+   /* Convenience aliases to store pointers without complex 
casting. */
+   __extension__ struct {
+   void *ctx_ptr;
+   void *ctx_ptr2;
+   };
+   }; /**< Node Context. */
uint16_t size;  /**< Total number of objects available. */
uint16_t idx;   /**< Number of objects used. */
rte_graph_off_t off;/**< Offset of node in the graph reel. */
@@ -130,6 +144,11 @@ struct __rte_cache_aligned rte_node {
alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next 
nodes. */
 };
 
+static_assert(offsetof(struct rte_node, ctx) % RTE_CACHE_LINE_SIZE == 0,
+   "rte_node ctx must be aligned on a cache line");
+static_assert(offsetof(struct rte_node, size) - offsetof(struct rte_node, ctx) 
== RTE_NODE_CTX_SZ,
+   "rte_node context union cannot be larger than RTE_NODE_CTX_SZ");
+
 /**
  * @internal
  *
-- 
2.44.0

[PATCH v2] app/testpmd: fix releasing action handle flush memory

2024-03-25 Thread Bing Zhao

The memory of the indirect action handles should be freed after
being destroyed in the flush. The behavior needs to be consistent
with the single handle destroy port_action_handle_destroy().

Or else, there would be some memory leak when closing / detaching a
port without quitting the application. In the meanwhile, since the
action handles are already destroyed, it makes no sense to hold the
indirect action software resources anymore.

Fixes: f7352c176bbf ("app/testpmd: fix use of indirect action after port close")
Cc: dmitry.kozl...@gmail.com
Cc: sta...@dpdk.org

Signed-off-by: Bing Zhao 
Reviewed-by: Dariusz Sosnowski 
---
v2: update the description
---
 app/test-pmd/config.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index ba1007ace6..f62ba90c87 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -1918,8 +1918,7 @@ port_action_handle_flush(portid_t port_id)
/* Poisoning to make sure PMDs update it in case of error. */
memset(&error, 0x44, sizeof(error));
if (pia->handle != NULL) {
-   ret = pia->type ==
- RTE_FLOW_ACTION_TYPE_INDIRECT_LIST ?
+   ret = pia->type == RTE_FLOW_ACTION_TYPE_INDIRECT_LIST ?
  rte_flow_action_list_handle_destroy
  (port_id, pia->list_handle, &error) :
  rte_flow_action_handle_destroy
@@ -1929,11 +1928,9 @@ port_action_handle_flush(portid_t port_id)
   pia->id);
ret = port_flow_complain(&error);
}
-   tmp = &pia->next;
-   } else {
-   *tmp = pia->next;
-   free(pia);
}
+   *tmp = pia->next;
+   free(pia);
}
return ret;
 }
-- 
2.34.1

Re: [PATCH v3] graph: expose node context as pointers

2024-03-25 Thread Jerin Jacob

On Mon, Mar 25, 2024 at 3:35 PM Robin Jarry  wrote:
>
> In some cases, the node context data is used to store two pointers
> because the data is larger than the reserved 16 bytes. Having to define
> intermediate structures just to be able to cast is tedious. Add two
> pointers that take the same space than ctx.
>
> Signed-off-by: Robin Jarry 
> ---
>
> Notes:
> v3:
>
> * Added __extension__ to the unnamed struct inside the union.
> * Fixed C++ header checks.
> * Replaced alignas() with an explicit static_assert.
>
> v2:
>
> * Added __extension__ (not sure where it is needed, I don't have access 
> to windows).
> * It still fails the header check for C++. It seems not possible to align 
> an unnamed union...
>   Tyler, do you have an idea about how to fix that?
> * Added static_assert to ensure the anonymous union is not larger than 
> RTE_NODE_CTX_SZ.
>
>  lib/graph/rte_graph_worker_common.h | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/lib/graph/rte_graph_worker_common.h 
> b/lib/graph/rte_graph_worker_common.h
> index 36d864e2c14e..722e9dac0d36 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -12,7 +12,9 @@
>   * process, enqueue and move streams of objects to the next nodes.
>   */
>
> +#include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -112,7 +114,19 @@ struct __rte_cache_aligned rte_node {
> };
> /* Fast path area  */
>  #define RTE_NODE_CTX_SZ 16
> -   alignas(RTE_CACHE_LINE_SIZE) uint8_t ctx[RTE_NODE_CTX_SZ]; /**< Node 
> Context. */
> +   /*
> +* alignas(RTE_CACHE_LINE_SIZE) cannot be used for ctx since it is 
> part of an unnamed union.
> +* The compiler shifts the next field on the next cache line which is 
> not what we want.
> +* The alignment is enforced via a explcicit static asserts below.
> +*/
> +   union {
> +   uint8_t ctx[RTE_NODE_CTX_SZ];
> +   /* Convenience aliases to store pointers without complex 
> casting. */
> +   __extension__ struct {
> +   void *ctx_ptr;
> +   void *ctx_ptr2;
> +   };
> +   }; /**< Node Context. */
> uint16_t size;  /**< Total number of objects available. */
> uint16_t idx;   /**< Number of objects used. */
> rte_graph_off_t off;/**< Offset of node in the graph reel. */
> @@ -130,6 +144,11 @@ struct __rte_cache_aligned rte_node {
> alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next 
> nodes. */
>  };
>
> +static_assert(offsetof(struct rte_node, ctx) % RTE_CACHE_LINE_SIZE == 0,
> +   "rte_node ctx must be aligned on a cache line");


This will fail in 32bit machine.
https://mails.dpdk.org/archives/test-report/2024-March/623806.html

I can think of following solution to add before ctx.
RTE_MARKER fastpath __rte_cache__aligned;


> +static_assert(offsetof(struct rte_node, size) - offsetof(struct rte_node, 
> ctx) == RTE_NODE_CTX_SZ,
> +   "rte_node context union cannot be larger than RTE_NODE_CTX_SZ");
> +
>  /**
>   * @internal
>   *
> --
> 2.44.0
>

Re: [PATCH v3] graph: expose node context as pointers

2024-03-25 Thread Robin Jarry


Jerin Jacob, Mar 25, 2024 at 11:59:

> +static_assert(offsetof(struct rte_node, ctx) % RTE_CACHE_LINE_SIZE == 0,
> +   "rte_node ctx must be aligned on a cache line");


This will fail in 32bit machine.
https://mails.dpdk.org/archives/test-report/2024-March/623806.html


Hi Jerin, yes I saw that :(


I can think of following solution to add before ctx.
RTE_MARKER fastpath __rte_cache__aligned;


It will not be taken into account for MSVC. Is that OK?

Re: [PATCH v3] graph: expose node context as pointers

2024-03-25 Thread Jerin Jacob

On Mon, Mar 25, 2024 at 4:32 PM Robin Jarry  wrote:
>
> Jerin Jacob, Mar 25, 2024 at 11:59:
> > > +static_assert(offsetof(struct rte_node, ctx) % RTE_CACHE_LINE_SIZE == 0,
> > > +   "rte_node ctx must be aligned on a cache line");
> >
> >
> > This will fail in 32bit machine.
> > https://mails.dpdk.org/archives/test-report/2024-March/623806.html
>
> Hi Jerin, yes I saw that :(
>
> > I can think of following solution to add before ctx.
> > RTE_MARKER fastpath __rte_cache__aligned;
>
> It will not be taken into account for MSVC. Is that OK?

Why?. rte_mbuf has a similar scheme.
RTE_MARKER cacheline1 __rte_cache_min_aligned;

>

Re: [PATCH v3] graph: expose node context as pointers

2024-03-25 Thread Robin Jarry


Jerin Jacob, Mar 25, 2024 at 12:08:

> It will not be taken into account for MSVC. Is that OK?

Why?. rte_mbuf has a similar scheme.
RTE_MARKER cacheline1 __rte_cache_min_aligned;


RTE_MARKER* types seem not defined for the MSVC toolchain.

https://github.com/DPDK/dpdk/blob/v24.03-rc4/lib/eal/include/rte_common.h#L589-L602

Maybe I am missing something.

Re: [PATCH v3] graph: expose node context as pointers

2024-03-25 Thread Jerin Jacob

On Mon, Mar 25, 2024 at 4:45 PM Robin Jarry  wrote:
>
> Jerin Jacob, Mar 25, 2024 at 12:08:
> > > It will not be taken into account for MSVC. Is that OK?
> >
> > Why?. rte_mbuf has a similar scheme.
> > RTE_MARKER cacheline1 __rte_cache_min_aligned;
>
> RTE_MARKER* types seem not defined for the MSVC toolchain.
>
> https://github.com/DPDK/dpdk/blob/v24.03-rc4/lib/eal/include/rte_common.h#L589-L602

Hmm. Not sure, how mbuf is building for MSCV tool chain then.

Another option could be to have a helper inline function/macro to take
care of casting to make app code clean of casting.

>
> Maybe I am missing something.

Re: [PATCH v3] graph: expose node context as pointers

2024-03-25 Thread Bruce Richardson

On Mon, Mar 25, 2024 at 05:05:12PM +0530, Jerin Jacob wrote:
> On Mon, Mar 25, 2024 at 4:45 PM Robin Jarry  wrote:
> >
> > Jerin Jacob, Mar 25, 2024 at 12:08:
> > > > It will not be taken into account for MSVC. Is that OK?
> > >
> > > Why?. rte_mbuf has a similar scheme.
> > > RTE_MARKER cacheline1 __rte_cache_min_aligned;
> >
> > RTE_MARKER* types seem not defined for the MSVC toolchain.
> >
> > https://github.com/DPDK/dpdk/blob/v24.03-rc4/lib/eal/include/rte_common.h#L589-L602
> 
> Hmm. Not sure, how mbuf is building for MSCV tool chain then.
> 
> Another option could be to have a helper inline function/macro to take
> care of casting to make app code clean of casting.

The markers are being removed from DPDK and being replaced by more
portable, and more standards-conforming constructs. We should not be adding
more markers to existing structures. See [1]

/Bruce

[1] https://patches.dpdk.org/project/dpdk/list/?series=31579

Re: [PATCH v3] graph: expose node context as pointers

2024-03-25 Thread David Marchand

On Mon, Mar 25, 2024 at 12:35 PM Jerin Jacob  wrote:
>
> On Mon, Mar 25, 2024 at 4:45 PM Robin Jarry  wrote:
> >
> > Jerin Jacob, Mar 25, 2024 at 12:08:
> > > > It will not be taken into account for MSVC. Is that OK?
> > >
> > > Why?. rte_mbuf has a similar scheme.
> > > RTE_MARKER cacheline1 __rte_cache_min_aligned;
> >
> > RTE_MARKER* types seem not defined for the MSVC toolchain.

There is some work in progress to stop using those markers.
https://patchwork.dpdk.org/project/dpdk/list/?series=31579&state=*


> >
> > https://github.com/DPDK/dpdk/blob/v24.03-rc4/lib/eal/include/rte_common.h#L589-L602
>
> Hmm. Not sure, how mbuf is building for MSCV tool chain then.

Atm, MSVC builds a really small list of libraries.

http://git.dpdk.org/dpdk/tree/lib/meson.build#n71?id=v24.03-rc4

if is_ms_compiler
libraries = [
'log',
'kvargs',
'telemetry',
'eal',
'ring',
]
endif


>
> Another option could be to have a helper inline function/macro to take
> care of casting to make app code clean of casting.

That could be an option.


-- 
David Marchand

RE: [PATCH v6 01/14] examples/l3fwd: fix queue ID restriction

2024-03-25 Thread Tummala, Sivaprasad

[AMD Official Use Only - General]

Hi,

> -Original Message-
> From: David Marchand 
> Sent: Friday, March 22, 2024 9:11 PM
> To: Tummala, Sivaprasad 
> Cc: david.h...@intel.com; anatoly.bura...@intel.com; jer...@marvell.com;
> radu.nico...@intel.com; gak...@marvell.com; cristian.dumitre...@intel.com; 
> Yigit,
> Ferruh ; konstantin.anan...@huawei.com;
> step...@networkplumber.org; m...@smartsharesystems.com;
> tho...@monjalon.net; dev@dpdk.org; sta...@dpdk.org
> Subject: Re: [PATCH v6 01/14] examples/l3fwd: fix queue ID restriction
>
> Caution: This message originated from an External Source. Use proper caution
> when opening attachments, clicking links, or responding.
>
>
> Hello,
>
> On Thu, Mar 21, 2024 at 7:48 PM Sivaprasad Tummala
>  wrote:
> >
> > Currently application supports queue IDs up to 255
>
> I think it only relates to Rx queue IDs.
>
> Before this patch, the Tx queue count is already stored as a uint32_t or 
> uint16_t
> and checked against RTE_MAX_LCORE.
> So no limit on the Tx queue count side.
>
> Can you just adjust the commitlog accordingly?
OK
>
>
> (One may argue that the Tx queue count should be also checked against
> RTE_MAX_QUEUES_PER_PORT, but it is a separate issue to this patch and in
> practice, we probably always have RTE_MAX_QUEUES_PER_PORT >
> RTE_MAX_LCORE).
>
>
> > and max queues of 256 irrespective of device support.
> > This limits the number of active lcores to 256.
> >
> > The patch fixes these constraints by increasing the queue IDs to
> > support up to 65535.
>
> [snip]
>
> > diff --git a/examples/l3fwd/l3fwd_acl.c b/examples/l3fwd/l3fwd_acl.c
> > index 401692bcec..2bd63181bc 100644
> > --- a/examples/l3fwd/l3fwd_acl.c
> > +++ b/examples/l3fwd/l3fwd_acl.c
> > @@ -997,7 +997,7 @@ acl_main_loop(__rte_unused void *dummy)
> > uint64_t prev_tsc, diff_tsc, cur_tsc;
> > int i, nb_rx;
> > uint16_t portid;
> > -   uint8_t queueid;
> > +   uint16_t queueid;
> > struct lcore_conf *qconf;
> > int socketid;
> > const uint64_t drain_tsc = (rte_get_tsc_hz() + US_PER_S - 1)
> > @@ -1020,7 +1020,7 @@ acl_main_loop(__rte_unused void *dummy)
> > portid = qconf->rx_queue_list[i].port_id;
> > queueid = qconf->rx_queue_list[i].queue_id;
> > RTE_LOG(INFO, L3FWD,
> > -   " -- lcoreid=%u portid=%u rxqueueid=%hhu\n",
> > +   " -- lcoreid=%u portid=%u rxqueueid=%hu\n",
>
> Nit: should be %PRIu16 (idem in other hunks formatting a queue).
OK
>
>
> > lcore_id, portid, queueid);
> > }
> >
>
> [snip]
>
>
> > diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c index
> > 8d32ae1dd5..4d4738b92b 100644
> > --- a/examples/l3fwd/main.c
> > +++ b/examples/l3fwd/main.c
>
> [snip]
>
>
> > @@ -366,7 +366,7 @@ init_lcore_rx_queues(void)
> > nb_rx_queue = lcore_conf[lcore].n_rx_queue;
> > if (nb_rx_queue >= MAX_RX_QUEUE_PER_LCORE) {
> > printf("error: too many queues (%u) for lcore: 
> > %u\n",
> > -   (unsigned)nb_rx_queue + 1, (unsigned)lcore);
> > +   (unsigned int)nb_rx_queue + 1,
> > + (unsigned int)lcore);
>
> Nit: this does not seem related to the patch (probably a split issue, as a 
> later patch
> touches this part of the code too).
Yes, this was done to avoid checkpatch error.
>
>
> > return -1;
> > } else {
> >
> > lcore_conf[lcore].rx_queue_list[nb_rx_queue].port_id = @@ -500,6 +500,8 @@
> parse_config(const char *q_arg)
> > char *str_fld[_NUM_FLD];
> > int i;
> > unsigned size;
> > +   uint16_t max_fld[_NUM_FLD] = {USHRT_MAX,
> > +   USHRT_MAX, UCHAR_MAX};
>
> Nit: no newline.
>
> This part validates user input for the rx queue used by a lcore.
> Some later check in the example (or in ethdev) may raise an error if 
> requesting too
> many queues, but I think the limit here should be RTE_MAX_QUEUES_PER_PORT.
>
> Besides, this hunk also changes the check on max port and max lcore.
> This is something that should be left untouched at this point of the series.
>
> I would expect something like:
> uint16_t max_fld[_NUM_FLD] = {255, RTE_MAX_QUEUES_PER_PORT, 255};
I agree on the RTE_MAX_QUEUES_PER_PORT, but port_id is already uint16_t and
Hence USHRT_MAX is relevant and similarly UCHAR_MAX expands to 255.
>
>
> >
> > nb_lcore_params = 0;
> >
> > @@ -518,7 +520,8 @@ parse_config(const char *q_arg)
> > for (i = 0; i < _NUM_FLD; i++){
> > errno = 0;
> > int_fld[i] = strtoul(str_fld[i], &end, 0);
> > -   if (errno != 0 || end == str_fld[i] || int_fld[i] > 
> > 255)
> > +   if (errno != 0 || end == str_fld[i] || int_fld[i] >
> > +
> > + max_fld[i])
>
> Nit: no newline.
OK
>
> >

Re: [PATCH 0/3] support setting lanes

2024-03-25 Thread Jerin Jacob

On Fri, Mar 22, 2024 at 7:21 PM Thomas Monjalon  wrote:
>
> 22/03/2024 06:51, Jerin Jacob:
> > On Fri, Mar 22, 2024 at 10:56 AM Ajit Khaparde
> >  wrote:
> > >
> > > On Thu, Mar 21, 2024 at 9:39 PM Jerin Jacob  wrote:
> > > >
> > > > On Fri, Mar 22, 2024 at 7:58 AM huangdengdui  
> > > > wrote:
> > > > >
> > > > >
> > > > >

> > > > For example, If FW configures, 100G port as 100GBASE-SR2 then two
> > > > ethdev(port 0 and port1) will show up.
> > > > Now, assume if we expose this API and Can end user configure port 1 as
> > > > 25G lines if so,
> > > > a) What happens to port0 and it states?
> > > There should be no impact to port0.
> > >
> > > > b) Will port2, port3 will show up after issuing this API(As end user
> > > > configured 25Gx4 for 100G)? Will application needs to hotplug to get
> > > > use ports.
> > > No. The port count does not change. Nor does the number of PCI
> > > functions seen by the host. Unless designed otherwise.
> > >
> > > Changing the lane count does not change anything in physical terms.
> > > What changes is the modulation or the signaling scheme.
> > > The number of lanes which can be supported is determined by
> > > the PHY itself and the cables used and needs to be negotiated 
> > > appropriately
> > > with the remote partner - which is just like using forced Ethernet Speed
> > > instead of auto-negotiated speeds.
>
> Thanks for the explanation Ajit.
>
> > OK. It looks like platform independent then. At least cnxk driver, End
> > user cannot simplify change the line config parameters
> > while traffic is active also, it looks like other drivers need to have
> > SerDes training with remote partner while reconfiguring it.
> >
> > At least on cnxk platform, 25Gx4 on 100G will show as 4 ethdev devices.
>
> That's a strange behaviour.
> Why showing 4 ports which are not independent?

I checked SerDes + NIC configuration again. It supports both modes.
Show up as One port vs four ports.

>
> > Having said that, If other NICs support this feature without
> > disturbing current port states, I don't have an objection to this API.
>
>
>

[RFC] graph/node: feedback and future improvements

2024-03-25 Thread Robin Jarry


Hi Jerin, all,

I apologize in advance for the long email :)

I am working on a project [1] that uses rte_graph extensively. In the 
course of action, I have stumbled upon a few issues. I managed to work 
around them for now, but I'd like to get more insights about long term 
solutions.


Per Rx/Tx queue nodes
=

In the in-built nodes and in the examples, there is one ethdev_rx and 
ethdev_tx node per rx/tx queue [2].


Is there a technical reason for this design? Does it make sense to have 
only one of each ethdev_rx and ethdev_tx nodes per graph? For simplicity 
and to make dynamic rxq changes possible, I chose to have a single rx 
& tx node per graph. Do you think we could change the in-built nodes to 
support both modes ?


Having multiple instances of the same node in a graph complicates 
instantiation as it requires cloning the nodes with unique names. Also, 
it makes dynamic configuration of ports even more complicated without 
shutting down everything first since some nodes will be part of an 
active graph and there may be conflicts.


Speaking of graph reloading, I found that the in-built ethdev_tx TX 
queue id is initialized to graph_id [3]. This seems odd. If there are 
multiple rounds of graph create/destroy, the id will become invalid.


Dynamic graph and nodes construction/destruction


I need to deal with reconfiguration of the graphs at runtime. This can 
happen on multiple occasions: port addition/suppression, number of rxq 
change, rxq size change, etc.


I could not manage to reuse the in-built nodes because of the issues 
raised in the previous point.


Could we change the in-built nodes to better support dynamic reloading? 
Maybe this only applies to nodes that may appear multiple times in the 
same graph (rx/tx).


Node context data
=

There is no way to prepare node data context when calling 
rte_graph_create(). The current implementation uses global variables [4] 
but this makes it very "static".


It would be nice to pass prepared context data for every node on graph 
creation, either via a config parameter (better) or via another 
mechanism. I currently do this via a hash map but it requires a global 
hash as well which may not be the best solution.


I tried patching the graph library code myself but after struggling, 
I thought it would be best to discuss things first.


Pluggable nodes
===

Currently, the declaration of next nodes is static. In order to support 
plugins (e.g. via dlopen() or similar), could we introduce a way to 
dynamically insert a node in the graph?


I have done this using a post-init callback system but we could think of 
a different way.


Also, could we allow overriding nodes with RTE_NODE_REGISTER()? So users 
can replace the default implementation with their own if they need to.


Move data pointer of packets?
=

This final point is more a global design question. In traditional 
networking stacks, each layer of the stack receives pbuffers that are 
"adjusted" to their respective network header in the packet (i.e. the IP 
lookup function will receive a buffer that points to the beginning of 
the IP header, regardless of what was the transport protocol, plain 
ethernet, Ethernet + VLAN, IP in IP, etc.).


Would it make sense to have a similar mechanism when designing an 
application with rte_graph?


Thanks in advance for your replies.

Cheers!

--
Robin

[1] https://github.com/rjarry/brouter
[2] https://github.com/DPDK/dpdk/blob/v23.11/lib/node/ethdev_rx.c#L28
[3] https://github.com/DPDK/dpdk/blob/v23.11/lib/node/ethdev_tx.c#L56
[4] https://github.com/DPDK/dpdk/blob/v23.11/lib/node/ethdev_ctrl.c

Re: [PATCH v3] graph: expose node context as pointers

2024-03-25 Thread Robin Jarry


Jerin Jacob, Mar 25, 2024 at 12:35:

Another option could be to have a helper inline function/macro to take
care of casting to make app code clean of casting.


Would something like this be suitable?

#define RTE_NODE_CTX_PTR1(n) ((void **)(n)->ctx)[0]
#define RTE_NODE_CTX_PTR2(n) ((void **)(n)->ctx)[1]

Re: [PATCH v3] graph: expose node context as pointers

2024-03-25 Thread Jerin Jacob

On Mon, Mar 25, 2024 at 8:50 PM Robin Jarry  wrote:
>
> Jerin Jacob, Mar 25, 2024 at 12:35:
> > Another option could be to have a helper inline function/macro to take
> > care of casting to make app code clean of casting.
>
> Would something like this be suitable?
>
> #define RTE_NODE_CTX_PTR1(n) ((void **)(n)->ctx)[0]
> #define RTE_NODE_CTX_PTR2(n) ((void **)(n)->ctx)[1]

Works for me. No strong opinion about the name, RTE_NODE_CTX_AS_PTR1
may be more reflecting the intent.

>

Re: [PATCH v3] graph: expose node context as pointers

2024-03-25 Thread Robin Jarry


Jerin Jacob, Mar 25, 2024 at 16:47:

> #define RTE_NODE_CTX_PTR1(n) ((void **)(n)->ctx)[0]
> #define RTE_NODE_CTX_PTR2(n) ((void **)(n)->ctx)[1]

Works for me. No strong opinion about the name, RTE_NODE_CTX_AS_PTR1
may be more reflecting the intent.


I also thought about adding inline getter/setter functions but that's 
more code. It may be cleaner:


static inline void *rte_node_ctx_ptr1_get(struct rte_node *n) {
return ((void **)node->ctx)[0];
}
static inline void *rte_node_ctx_ptr2_get(struct rte_node *n) {
return ((void **)node->ctx)[1];
}
static inline void rte_node_ctx_ptr1_set(struct rte_node *n, void *p) {
((void **)node->ctx)[0] = p;
}
static inline void rte_node_ctx_ptr2_set(struct rte_node *n, void *p) {
((void **)node->ctx)[1] = p;
}

I don't have a strong opinion. I'll go either way.

[PATCH] graph: avoid accessing graph list when getting stats

2024-03-25 Thread Robin Jarry

In rte_graph_cluster_stats_get, the walk model of the first graph is
checked to determine if multi-core dispatch specific counters should be
updated or not. This global list is accessed without any locks.

If the global list is modified by another thread while
rte_graph_cluster_stats_get is called, it can result in undefined
behaviour.

Adding a lock would make it impossible to call
rte_graph_cluster_stats_get in packet processing code paths. Avoid
accessing the global list instead by storing a bool field in the private
rte_graph_cluster_stats structure.

Also update the default callback to avoid accessing the global list and
use a different default callback depending on the graph model.

Signed-off-by: Robin Jarry 
---
 lib/graph/graph_stats.c | 60 ++---
 1 file changed, 39 insertions(+), 21 deletions(-)

diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c
index 2fb808b21ec5..f8048e08be8a 100644
--- a/lib/graph/graph_stats.c
+++ b/lib/graph/graph_stats.c
@@ -34,6 +34,7 @@ struct __rte_cache_aligned rte_graph_cluster_stats {
uint32_t cluster_node_size; /* Size of struct cluster_node */
rte_node_t max_nodes;
int socket_id;
+   bool dispatch;
void *cookie;
size_t sz;
 
@@ -74,17 +75,16 @@ print_banner_dispatch(FILE *f)
 }
 
 static inline void
-print_banner(FILE *f)
+print_banner(FILE *f, bool dispatch)
 {
-   if 
(rte_graph_worker_model_get(STAILQ_FIRST(graph_list_head_get())->graph) ==
-   RTE_GRAPH_MODEL_MCORE_DISPATCH)
+   if (dispatch)
print_banner_dispatch(f);
else
print_banner_default(f);
 }
 
 static inline void
-print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat)
+print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat, bool 
dispatch)
 {
double objs_per_call, objs_per_sec, cycles_per_call, ts_per_hz;
const uint64_t prev_calls = stat->prev_calls;
@@ -104,8 +104,7 @@ print_node(FILE *f, const struct 
rte_graph_cluster_node_stats *stat)
objs_per_sec = ts_per_hz ? (objs - prev_objs) / ts_per_hz : 0;
objs_per_sec /= 100;
 
-   if 
(rte_graph_worker_model_get(STAILQ_FIRST(graph_list_head_get())->graph) ==
-   RTE_GRAPH_MODEL_MCORE_DISPATCH) {
+   if (dispatch) {
fprintf(f,
"|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64
"|%-15" PRIu64 "|%-15" PRIu64
@@ -123,20 +122,17 @@ print_node(FILE *f, const struct 
rte_graph_cluster_node_stats *stat)
 }
 
 static int
-graph_cluster_stats_cb(bool is_first, bool is_last, void *cookie,
+graph_cluster_stats_cb(bool dispatch, bool is_first, bool is_last, void 
*cookie,
   const struct rte_graph_cluster_node_stats *stat)
 {
FILE *f = cookie;
-   int model;
-
-   model = 
rte_graph_worker_model_get(STAILQ_FIRST(graph_list_head_get())->graph);
 
if (unlikely(is_first))
-   print_banner(f);
+   print_banner(f, dispatch);
if (stat->objs)
-   print_node(f, stat);
+   print_node(f, stat, dispatch);
if (unlikely(is_last)) {
-   if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+   if (dispatch)
boarder_model_dispatch();
else
boarder();
@@ -145,6 +141,20 @@ graph_cluster_stats_cb(bool is_first, bool is_last, void 
*cookie,
return 0;
 };
 
+static int
+graph_cluster_stats_cb_rtc(bool is_first, bool is_last, void *cookie,
+  const struct rte_graph_cluster_node_stats *stat)
+{
+   return graph_cluster_stats_cb(false, is_first, is_last, cookie, stat);
+};
+
+static int
+graph_cluster_stats_cb_dispatch(bool is_first, bool is_last, void *cookie,
+   const struct rte_graph_cluster_node_stats *stat)
+{
+   return graph_cluster_stats_cb(true, is_first, is_last, cookie, stat);
+};
+
 static struct rte_graph_cluster_stats *
 stats_mem_init(struct cluster *cluster,
   const struct rte_graph_cluster_stats_param *prm)
@@ -157,8 +167,16 @@ stats_mem_init(struct cluster *cluster,
 
/* Fix up callback */
fn = prm->fn;
-   if (fn == NULL)
-   fn = graph_cluster_stats_cb;
+   if (fn == NULL) {
+   for (int i = 0; i < cluster->nb_graphs; i++) {
+   const struct rte_graph *graph = 
cluster->graphs[i]->graph;
+   if (graph->model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+   fn = graph_cluster_stats_cb_dispatch;
+   else
+   fn = graph_cluster_stats_cb_rtc;
+   break;
+   }
+   }
 
cluster_node_size = sizeof(struct cluster_node);
/* For a given cluster, max nodes will be the max number of graphs */
@@ -350,6 +368,8 @@ rte_graph_cluster_stat

[PATCH] graph: avoid id collisions

2024-03-25 Thread Robin Jarry

The graph id is determined based on a global variable that is
incremented every time a graph is created, and decremented every time
a graph is destroyed. This only works if graphs are destroyed in the
reverse order in which they have been created.

The following code produces duplicate graph IDs which can lead to
use-after-free bugs and other undefined behaviours:

  a = rte_graph_create(...); // id=0 graph_id=1
  b = rte_graph_create(...); // id=1 graph_id=2
  rte_graph_destroy(a);  // graph_id=1
  c = rte_graph_create(...); // id=1 graph_id=2 (duplicate with b)
  rte_graph_destroy(c);  // frees memory still used by b

Remove the global counter. Make sure that the graph list is always
ordered by increasing graph ids. When creating a new graph, pick a free
id which is not allocated.

Signed-off-by: Robin Jarry 
---
 lib/graph/graph.c | 86 +--
 1 file changed, 69 insertions(+), 17 deletions(-)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 147bc6c685c5..50616580d06b 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -19,11 +19,54 @@
 
 static struct graph_head graph_list = STAILQ_HEAD_INITIALIZER(graph_list);
 static rte_spinlock_t graph_lock = RTE_SPINLOCK_INITIALIZER;
-static rte_graph_t graph_id;
-
-#define GRAPH_ID_CHECK(id) ID_CHECK(id, graph_id)
 
 /* Private functions */
+static struct graph *
+graph_from_id(rte_graph_t id)
+{
+   struct graph *graph;
+   STAILQ_FOREACH(graph, &graph_list, next) {
+   if (graph->id == id)
+   return graph;
+   }
+   rte_errno = EINVAL;
+   return NULL;
+}
+
+static rte_graph_t
+graph_next_free_id(void)
+{
+   struct graph *graph;
+   rte_graph_t id = 0;
+
+   STAILQ_FOREACH(graph, &graph_list, next) {
+   if (id < graph->id)
+   break;
+   id = graph->id + 1;
+   }
+
+   return id;
+}
+
+static void
+graph_insert_ordered(struct graph *graph)
+{
+   struct graph *after, *g;
+
+   after = NULL;
+   STAILQ_FOREACH(g, &graph_list, next) {
+   if (g->id < graph->id) {
+   after = g;
+   break;
+   }
+   }
+   if (after == NULL) {
+   STAILQ_INSERT_TAIL(&graph_list, graph, next);
+   } else {
+   STAILQ_INSERT_AFTER(&graph_list, after, graph, next);
+   }
+}
+
 struct graph_head *
 graph_list_head_get(void)
 {
@@ -279,7 +322,8 @@ rte_graph_model_mcore_dispatch_core_bind(rte_graph_t id, 
int lcore)
 {
struct graph *graph;
 
-   GRAPH_ID_CHECK(id);
+   if (graph_from_id(id) == NULL)
+   goto fail;
if (!rte_lcore_is_enabled(lcore))
SET_ERR_JMP(ENOLINK, fail, "lcore %d not enabled", lcore);
 
@@ -309,7 +353,8 @@ rte_graph_model_mcore_dispatch_core_unbind(rte_graph_t id)
 {
struct graph *graph;
 
-   GRAPH_ID_CHECK(id);
+   if (graph_from_id(id) == NULL)
+   goto fail;
STAILQ_FOREACH(graph, &graph_list, next)
if (graph->id == id)
break;
@@ -406,7 +451,7 @@ rte_graph_create(const char *name, struct rte_graph_param 
*prm)
graph->socket = prm->socket_id;
graph->src_node_count = src_node_count;
graph->node_count = graph_nodes_count(graph);
-   graph->id = graph_id;
+   graph->id = graph_next_free_id();
graph->parent_id = RTE_GRAPH_ID_INVALID;
graph->lcore_id = RTE_MAX_LCORE;
graph->num_pkt_to_capture = prm->num_pkt_to_capture;
@@ -422,8 +467,7 @@ rte_graph_create(const char *name, struct rte_graph_param 
*prm)
goto graph_mem_destroy;
 
/* All good, Lets add the graph to the list */
-   graph_id++;
-   STAILQ_INSERT_TAIL(&graph_list, graph, next);
+   graph_insert_ordered(graph);
 
graph_spinlock_unlock();
return graph->id;
@@ -467,7 +511,6 @@ rte_graph_destroy(rte_graph_t id)
graph_cleanup(graph);
STAILQ_REMOVE(&graph_list, graph, graph, next);
free(graph);
-   graph_id--;
goto done;
}
graph = tmp;
@@ -520,7 +563,7 @@ graph_clone(struct graph *parent_graph, const char *name, 
struct rte_graph_param
graph->parent_id = parent_graph->id;
graph->lcore_id = parent_graph->lcore_id;
graph->socket = parent_graph->socket;
-   graph->id = graph_id;
+   graph->id = graph_next_free_id();
 
/* Allocate the Graph fast path memory and populate the data */
if (graph_fp_mem_create(graph))
@@ -539,8 +582,7 @@ graph_clone(struct graph *parent_graph, const char *name, 
struct rte_graph_param
goto graph_mem_destroy;
 
/* All good, Lets add the graph to the list */
-   graph_id++;
-   STAILQ_INSERT_TAIL(&graph_list, graph, next);
+   graph_insert_o

Re: [PATCH v3] graph: expose node context as pointers

2024-03-25 Thread Jerin Jacob

On Mon, Mar 25, 2024 at 9:21 PM Robin Jarry  wrote:
>
> Jerin Jacob, Mar 25, 2024 at 16:47:
> > > #define RTE_NODE_CTX_PTR1(n) ((void **)(n)->ctx)[0]
> > > #define RTE_NODE_CTX_PTR2(n) ((void **)(n)->ctx)[1]
> >
> > Works for me. No strong opinion about the name, RTE_NODE_CTX_AS_PTR1
> > may be more reflecting the intent.
>
> I also thought about adding inline getter/setter functions but that's
> more code. It may be cleaner:
>
>  static inline void *rte_node_ctx_ptr1_get(struct rte_node *n) {
>  return ((void **)node->ctx)[0];
>  }
>  static inline void *rte_node_ctx_ptr2_get(struct rte_node *n) {
>  return ((void **)node->ctx)[1];
>  }
>  static inline void rte_node_ctx_ptr1_set(struct rte_node *n, void *p) {
>  ((void **)node->ctx)[0] = p;
>  }
>  static inline void rte_node_ctx_ptr2_set(struct rte_node *n, void *p) {
>  ((void **)node->ctx)[1] = p;
>  }
>
> I don't have a strong opinion. I'll go either way.

Inline is better.

>

[PATCH v4] graph: expose node context as pointers

2024-03-25 Thread Robin Jarry

In some cases, the node context data is used to store two pointers
because the data is larger than the reserved 16 bytes. Having to define
intermediate structures just to be able to cast is tedious. Add helper
inline functions to cast the node context to opaque pointers.

Signed-off-by: Robin Jarry 
---

Notes:
v4:

* Replaced the unnamed union with helper inline functions.

v3:

* Added __extension__ to the unnamed struct inside the union.
* Fixed C++ header checks.
* Replaced alignas() with an explicit static_assert.

 lib/graph/rte_graph_worker_common.h | 54 +
 1 file changed, 54 insertions(+)

diff --git a/lib/graph/rte_graph_worker_common.h 
b/lib/graph/rte_graph_worker_common.h
index 36d864e2c14e..f54f65598193 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -130,6 +130,60 @@ struct __rte_cache_aligned rte_node {
alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next 
nodes. */
 };
 
+/**
+ * Cast the first 8 bytes of node context as an opaque pointer.
+ *
+ * @param node
+ *   Pointer to the node object.
+ *
+ * @return
+ *   The opaque pointer value.
+ */
+static inline void *rte_node_ctx_ptr1_get(struct rte_node *node)
+{
+   return ((void **)node->ctx)[0];
+}
+
+/**
+ * Cast the last 8 bytes of node context as an opaque pointer.
+ *
+ * @param node
+ *   Pointer to the node object.
+ *
+ * @return
+ *   The opaque pointer value.
+ */
+static inline void *rte_node_ctx_ptr2_get(struct rte_node *node)
+{
+   return ((void **)node->ctx)[1];
+}
+
+/**
+ * Set the first 8 bytes of node context to an opaque pointer value.
+ *
+ * @param node
+ *   Pointer to the node object.
+ * @param ptr
+ *   The opaque pointer value.
+ */
+static inline void rte_node_ctx_ptr1_set(struct rte_node *node, void *ptr)
+{
+   ((void **)node->ctx)[0] = ptr;
+}
+
+/**
+ * Set the last 8 bytes of node context to an opaque pointer value.
+ *
+ * @param node
+ *   Pointer to the node object.
+ * @param ptr
+ *   The opaque pointer value.
+ */
+static inline void rte_node_ctx_ptr2_set(struct rte_node *node, void *ptr)
+{
+   ((void **)node->ctx)[1] = ptr;
+}
+
 /**
  * @internal
  *
-- 
2.44.0

Re: [PATCH v2 38/45] bus/vmbus: use rte stdatomic API

2024-03-25 Thread Tyler Retzlaff

On Fri, Mar 22, 2024 at 07:34:46PM +, Long Li wrote:
> >  static inline void
> >  vmbus_set_monitor(const struct vmbus_channel *channel, uint32_t monitor_id)
> > {
> > -   uint32_t *monitor_addr, monitor_mask;
> > +   RTE_ATOMIC(uint32_t) *monitor_addr, monitor_mask;
> 
> Does this mean monitor_mask will also change to RTE_ATOMIC(uint32_t)? 
> 
> Seems not necessary.

looks like a mistake, i will review and make clear in next revision.

thanks for spotting it.

> 
> > unsigned int trigger_index;
> > 
> > trigger_index = monitor_id / HV_MON_TRIG_LEN;
> > monitor_mask = 1u << (monitor_id % HV_MON_TRIG_LEN);
> > 
> > -   monitor_addr = &channel->monitor_page->trigs[trigger_index].pending;
> > +   monitor_addr =
> > +   (uint32_t __rte_atomic
> > +*)&channel->monitor_page->trigs[trigger_index].pending;
> > vmbus_sync_set_bit(monitor_addr, monitor_mask);  }
> > 
> > --
> > 1.8.3.1

Re: [PATCH v4] graph: expose node context as pointers

2024-03-25 Thread Jerin Jacob

On Mon, Mar 25, 2024 at 10:11 PM Robin Jarry  wrote:
>
> In some cases, the node context data is used to store two pointers
> because the data is larger than the reserved 16 bytes. Having to define
> intermediate structures just to be able to cast is tedious. Add helper
> inline functions to cast the node context to opaque pointers.
>
> Signed-off-by: Robin Jarry 
> ---
>
> Notes:
> v4:
>
> * Replaced the unnamed union with helper inline functions.
>
> v3:
>
> * Added __extension__ to the unnamed struct inside the union.
> * Fixed C++ header checks.
> * Replaced alignas() with an explicit static_assert.
>
>  lib/graph/rte_graph_worker_common.h | 54 +
>  1 file changed, 54 insertions(+)
>
> diff --git a/lib/graph/rte_graph_worker_common.h 
> b/lib/graph/rte_graph_worker_common.h
> index 36d864e2c14e..f54f65598193 100644
> --- a/lib/graph/rte_graph_worker_common.h
> +++ b/lib/graph/rte_graph_worker_common.h
> @@ -130,6 +130,60 @@ struct __rte_cache_aligned rte_node {
> alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next 
> nodes. */
>  };
>
> +/**
> + * Cast the first 8 bytes of node context as an opaque pointer.
> + *
> + * @param node
> + *   Pointer to the node object.
> + *
> + * @return
> + *   The opaque pointer value.
> + */
> +static inline void *rte_node_ctx_ptr1_get(struct rte_node *node)

New line after void*. Same for all new functions.

With that change:

Acked-by: Jerin Jacob 



> +{
> +   return ((void **)node->ctx)[0];
> +}
> +
> +/**
> + * Cast the last 8 bytes of node context as an opaque pointer.
> + *
> + * @param node
> + *   Pointer to the node object.
> + *
> + * @return
> + *   The opaque pointer value.
> + */
> +static inline void *rte_node_ctx_ptr2_get(struct rte_node *node)
> +{
> +   return ((void **)node->ctx)[1];
> +}
> +
> +/**
> + * Set the first 8 bytes of node context to an opaque pointer value.
> + *
> + * @param node
> + *   Pointer to the node object.
> + * @param ptr
> + *   The opaque pointer value.
> + */
> +static inline void rte_node_ctx_ptr1_set(struct rte_node *node, void *ptr)
> +{
> +   ((void **)node->ctx)[0] = ptr;
> +}
> +
> +/**
> + * Set the last 8 bytes of node context to an opaque pointer value.
> + *
> + * @param node
> + *   Pointer to the node object.
> + * @param ptr
> + *   The opaque pointer value.
> + */
> +static inline void rte_node_ctx_ptr2_set(struct rte_node *node, void *ptr)
> +{
> +   ((void **)node->ctx)[1] = ptr;
> +}
> +
>  /**
>   * @internal
>   *
> --
> 2.44.0
>

[RFC PATCH v3] net/iavf: support rte flow with mask for FDIR

2024-03-25 Thread Ian Stokes

From: Ananth S 

This patch supports rte flow with mask for FDIR, including
eth/ipv4/ipv6/tcp/udp flow items, where src/dst for ipv4/ipv6
and sport/dport for tcp/udp are realized by switch filter.

This patch additionally contains the fixes for the issues
identified in the patch [21.11.5 v2].

This patch is based on DPDK v21.11.5 LTS
[4e50ad4469f7c037e32de5aa3535d1cd25de0741], for customer cherry-pick.

Signed-off-by: Zhichao Zeng 
Signed-off-by: Ananth S 
---
 drivers/common/iavf/virtchnl.h |  40 -
 drivers/net/iavf/iavf_ethdev.c |  26 +++
 drivers/net/iavf/iavf_fdir.c   | 305 ++---
 drivers/net/iavf/iavf_hash.c   | 112 ++--
 drivers/net/ice/ice_ethdev.c   |  26 +++
 lib/ethdev/rte_ethdev.h|   3 +
 6 files changed, 320 insertions(+), 192 deletions(-)

diff --git a/drivers/common/iavf/virtchnl.h b/drivers/common/iavf/virtchnl.h
index 80e754a1b2..bc8f355db1 100644
--- a/drivers/common/iavf/virtchnl.h
+++ b/drivers/common/iavf/virtchnl.h
@@ -1482,6 +1482,8 @@ enum virtchnl_vfr_states {
 };
 
 #define VIRTCHNL_MAX_NUM_PROTO_HDRS32
+#define VIRTCHNL_MAX_NUM_PROTO_HDRS_W_MSK 16
+#define VIRTCHNL_MAX_SIZE_RAW_PACKET 1024
 #define PROTO_HDR_SHIFT5
 #define PROTO_HDR_FIELD_START(proto_hdr_type) \
(proto_hdr_type << PROTO_HDR_SHIFT)
@@ -1669,6 +1671,22 @@ struct virtchnl_proto_hdr {
 
 VIRTCHNL_CHECK_STRUCT_LEN(72, virtchnl_proto_hdr);
 
+struct virtchnl_proto_hdr_w_msk {
+   /* see enum virtchnl_proto_hdr_type */
+   s32 type;
+   u32 pad;
+   /**
+* binary buffer in network order for specific header type.
+* For example, if type = VIRTCHNL_PROTO_HDR_IPV4, a IPv4
+* header is expected to be copied into the buffer.
+*/
+   u8 buffer_spec[64];
+   /* binary buffer for bit-mask applied to specific header type */
+   u8 buffer_mask[64];
+};
+
+VIRTCHNL_CHECK_STRUCT_LEN(136, virtchnl_proto_hdr_w_msk);
+
 struct virtchnl_proto_hdrs {
u8 tunnel_level;
/**
@@ -1678,8 +1696,26 @@ struct virtchnl_proto_hdrs {
 * 2 - from the second inner layer
 * 
 **/
-   int count; /* the proto layers must < VIRTCHNL_MAX_NUM_PROTO_HDRS */
-   struct virtchnl_proto_hdr proto_hdr[VIRTCHNL_MAX_NUM_PROTO_HDRS];
+   int count;
+   /**
+* count must <=
+* VIRTCHNL_MAX_NUM_PROTO_HDRS + VIRTCHNL_MAX_NUM_PROTO_HDRS_W_MSK
+* count = 0 :  select raw
+* 1 < count <= VIRTCHNL_MAX_NUM_PROTO_HDRS :   select proto_hdr
+* count > VIRTCHNL_MAX_NUM_PROTO_HDRS :select proto_hdr_w_msk
+* last valid index = count - VIRTCHNL_MAX_NUM_PROTO_HDRS
+*/
+   union {
+   struct virtchnl_proto_hdr
+   proto_hdr[VIRTCHNL_MAX_NUM_PROTO_HDRS];
+   struct virtchnl_proto_hdr_w_msk
+   proto_hdr_w_msk[VIRTCHNL_MAX_NUM_PROTO_HDRS_W_MSK];
+   struct {
+   u16 pkt_len;
+   u8 spec[VIRTCHNL_MAX_SIZE_RAW_PACKET];
+   u8 mask[VIRTCHNL_MAX_SIZE_RAW_PACKET];
+   } raw;
+   };
 };
 
 VIRTCHNL_CHECK_STRUCT_LEN(2312, virtchnl_proto_hdrs);
diff --git a/drivers/net/iavf/iavf_ethdev.c b/drivers/net/iavf/iavf_ethdev.c
index e40af1316d..93d4d03eb9 100644
--- a/drivers/net/iavf/iavf_ethdev.c
+++ b/drivers/net/iavf/iavf_ethdev.c
@@ -500,6 +500,11 @@ iavf_init_rss(struct iavf_adapter *adapter)
uint16_t i, j, nb_q;
int ret;
 
+#ifdef RTE_ETHDEV_EXCL_Q
+   uint16_t q_index = 0;
+   uint16_t excl_q = RSS_RETA_EXCL_Q_ID;
+#endif /* RTE_ETHDEV_EXCL_Q */
+
rss_conf = &adapter->dev_data->dev_conf.rx_adv_conf.rss_conf;
nb_q = RTE_MIN(adapter->dev_data->nb_rx_queues,
   vf->max_rss_qregion);
@@ -525,6 +530,27 @@ iavf_init_rss(struct iavf_adapter *adapter)
j = 0;
vf->rss_lut[i] = j;
}
+
+#ifdef RTE_ETHDEV_EXCL_Q
+   if (nb_q > 1) {
+   if (excl_q > nb_q) {
+   /* if excl_q index is higher than the max_queue, assign 
default
+* RSS LUT
+*/
+   for (i = 0; i < vf->vf_res->rss_lut_size; i++)
+   vf->rss_lut[i] = i % nb_q;
+
+   } else {
+   for (i = 0, q_index = 0; i < vf->vf_res->rss_lut_size; 
i++, q_index++) {
+   /* Increment q_index to skip excl_q */
+   if (q_index % nb_q == excl_q)
+   q_index++;
+
+   vf->rss_lut[i] = q_index % (nb_q);
+   }
+   }
+   }
+#endif /* RTE_ETHDEV_EXCL_Q */
/* send virtchnl ops to configure RSS */
ret = iavf_configure_rss_lut(adapter);
if (ret)
diff

Re: meson option to customize RTE_PKTMBUF_HEADROOM patch

2024-03-25 Thread Stephen Hemminger

On Mon, 25 Mar 2024 10:01:52 +
Bruce Richardson  wrote:

> On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote:
> >So we right now (at WEKA) have a somewhat older version of DPDK that we
> >have customized heavily, and I am going to to need to to make the
> >headroom *dynamic* (passed in at run time, and per port.)
> >We have this requirement because we need payload to be at a specific
> >offset, but have to deal with different header lengths for IPv4 and now
> >IPv6.
> >My reason for pointing this out, is that I would dearly like if we
> >could collaborate on this -- this change is going to touch pretty much
> >every PMD (we don't need it on all of them as we only support a subset
> >of PMDs, but its still a significant set.)
> >I'm not sure if anyone else has considered such a need -- this
> >particular message caught my eye as I'm looking specifically in this
> >area right now.
> >   
> Hi
> 
> thanks for reaching out. Can you clarify a little more as to the need for
> this requirement? Can you not just set the headroom value to the max needed
> value for any port and use that? Is there an issue with having blank space
> at the start of a buffer?
> 
> Thanks,
> /Bruce

If you have to make such a deep change across all PMD's then maybe
it is not the best solution. What about being able to do some form of buffer
chaining or pullup.

[PATCH v12 00/14] Logging unification and enhancements

2024-03-25 Thread Stephen Hemminger

Improvements and unification of logging library (for 24.07 release).
This version works on all platforms: Linux, Windows and FreeBSD.

This is update to rework patch set. It adds several new features
to the console log output.

  * Putting a timestamp on console output which is useful for
analyzing performance of startup codes. Timestamp is optional
and must be enabled on command line.

  * Displaying console output with colors.
It uses the standard conventions used by many other Linux commands
for colorized display.  The default is to enable color if the
console output is going to a terminal. But it can be always
on or disabled by command line flag. This default was chosen
based on what dmesg(1) command does.

I find color helpful because DPDK drivers and libraries print
lots of not very useful messages. And having error messages
highlighted in bold face helps. This might also get users to
pay more attention to error messages. Many bug reports have
earlier messages that are lost because there are so many
info messages.

  * Add support for automatic detection of systemd journal
protocol. If running as systemd service will get enhanced
logging.

  * Use of syslog is optional and the meaning of the
--syslog flag has changed. The default is *not* to use
syslog. 

Add myself as maintainer for log because by now have added
more than previous authors...
Will add a release note in next release (after this is merged)

v12 - add back syslog but make it optional.
  better shims for windows (thread safe)
  split out more of the eal core bits
  fix build warnings on FreeBSD and Ubuntu

Stephen Hemminger (14):
  windows: make getopt functions have const properties
  windows: add os shim for localtime_r
  eal: make eal_log_level_parse common
  eal: do not duplicate rte_init_alert() messages
  eal: change rte_exit() output to match rte_log()
  log: move handling of syslog facility out of eal
  eal: initialize log before everything else
  log: drop syslog support, and make code common
  log: add hook for printing log messages
  log: add timestamp option
  log: add optional support of syslog
  log: add support for systemd journal
  log: colorize log output
  maintainers: add for log library

 MAINTAINERS   |   1 +
 app/test/test_eal_flags.c |  17 +
 doc/guides/linux_gsg/linux_eal_parameters.rst |  27 -
 doc/guides/prog_guide/log_lib.rst |  57 ++
 lib/eal/common/eal_common_debug.c |  11 +-
 lib/eal/common/eal_common_options.c   | 114 ++--
 lib/eal/common/eal_options.h  |   5 +
 lib/eal/freebsd/eal.c |  60 +-
 lib/eal/linux/eal.c   |  64 +-
 lib/eal/windows/eal.c |  44 +-
 lib/eal/windows/getopt.c  |  23 +-
 lib/eal/windows/include/getopt.h  |   8 +-
 lib/eal/windows/include/rte_os_shim.h |  10 +
 lib/log/log.c | 594 +-
 lib/log/log_freebsd.c |   5 +-
 lib/log/log_internal.h|  25 +-
 lib/log/log_linux.c   |  61 --
 lib/log/log_windows.c |  18 -
 lib/log/meson.build   |   5 +-
 lib/log/version.map   |   4 +-
 20 files changed, 803 insertions(+), 350 deletions(-)
 delete mode 100644 lib/log/log_linux.c
 delete mode 100644 lib/log/log_windows.c

-- 
2.43.0

[PATCH v12 01/14] windows: make getopt functions have const properties

2024-03-25 Thread Stephen Hemminger

Having different prototypes on different platforms can lead
to lots of unnecessary workarounds.  Looks like the version of
getopt used from windows was based on an older out of date
version from FreeBSD.

This patch changes getopt, getopt_long, etc to have the same const
attributes as Linux and FreeBSD. The changes are derived from
the current FreeBSD version of getopt_long.

Signed-off-by: Stephen Hemminger 
Acked-by: Tyler Retzlaff 
Acked-by: Dmitry Kozlyuk 
---
 lib/eal/windows/getopt.c | 23 ---
 lib/eal/windows/include/getopt.h |  8 
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/lib/eal/windows/getopt.c b/lib/eal/windows/getopt.c
index a1f51c6c23..50ff71b930 100644
--- a/lib/eal/windows/getopt.c
+++ b/lib/eal/windows/getopt.c
@@ -20,7 +20,7 @@
 #include 
 #include 
 
-const char*optarg; /* argument associated with option */
+char*optarg;   /* argument associated with option */
 intopterr = 1; /* if error message should be printed */
 intoptind = 1; /* index into parent argv vector */
 intoptopt = '?';   /* character checked for validity */
@@ -39,9 +39,9 @@ static void pass(const char *a) {(void) a; }
 #defineBADARG  ((*options == ':') ? (int)':' : (int)'?')
 #defineINORDER 1
 
-#defineEMSG""
+static char EMSG[] = "";
 
-static const char *place = EMSG; /* option letter processing */
+static char *place = EMSG; /* option letter processing */
 
 /* XXX: set optreset to 1 rather than these two */
 static int nonopt_start = -1; /* first non option argument (for permute) */
@@ -80,7 +80,7 @@ gcd(int a, int b)
  */
 static void
 permute_args(int panonopt_start, int panonopt_end, int opt_end,
-   char **nargv)
+   char * const *nargv)
 {
int cstart, cyclelen, i, j, ncycle, nnonopts, nopts, pos;
char *swap;
@@ -101,11 +101,12 @@ permute_args(int panonopt_start, int panonopt_end, int 
opt_end,
pos -= nnonopts;
else
pos += nopts;
+
swap = nargv[pos];
/* LINTED const cast */
-   ((char **) nargv)[pos] = nargv[cstart];
+   ((char **)(uintptr_t)nargv)[pos] = nargv[cstart];
/* LINTED const cast */
-   ((char **)nargv)[cstart] = swap;
+   ((char **)(uintptr_t)nargv)[cstart] = swap;
}
}
 }
@@ -116,7 +117,7 @@ permute_args(int panonopt_start, int panonopt_end, int 
opt_end,
  * Returns -1 if short_too is set and the option does not match long_options.
  */
 static int
-parse_long_options(char **nargv, const char *options,
+parse_long_options(char * const *nargv, const char *options,
const struct option *long_options, int *idx, int short_too)
 {
const char *current_argv;
@@ -236,7 +237,7 @@ parse_long_options(char **nargv, const char *options,
  * Parse argc/argv argument vector.  Called by user level routines.
  */
 static int
-getopt_internal(int nargc, char **nargv, const char *options,
+getopt_internal(int nargc, char *const nargv[], const char *options,
const struct option *long_options, int *idx, int flags)
 {
char *oli;  /* option letter list index */
@@ -434,7 +435,7 @@ getopt_internal(int nargc, char **nargv, const char 
*options,
  * Parse argc/argv argument vector.
  */
 int
-getopt(int nargc, char *nargv[], const char *options)
+getopt(int nargc, char *const nargv[], const char *options)
 {
return getopt_internal(nargc, nargv, options, NULL, NULL,
   FLAG_PERMUTE);
@@ -445,7 +446,7 @@ getopt(int nargc, char *nargv[], const char *options)
  * Parse argc/argv argument vector.
  */
 int
-getopt_long(int nargc, char *nargv[], const char *options,
+getopt_long(int nargc, char *const nargv[], const char *options,
const struct option *long_options, int *idx)
 {
 
@@ -458,7 +459,7 @@ getopt_long(int nargc, char *nargv[], const char *options,
  * Parse argc/argv argument vector.
  */
 int
-getopt_long_only(int nargc, char *nargv[], const char *options,
+getopt_long_only(int nargc, char *const nargv[], const char *options,
const struct option *long_options, int *idx)
 {
 
diff --git a/lib/eal/windows/include/getopt.h b/lib/eal/windows/include/getopt.h
index 6f57af454b..e4cf6873cb 100644
--- a/lib/eal/windows/include/getopt.h
+++ b/lib/eal/windows/include/getopt.h
@@ -44,7 +44,7 @@
 
 
 /** argument to current option, or NULL if it has none */
-extern const char *optarg;
+extern char *optarg;
 /** Current position in arg string.  Starts from 1.
  * Setting to 0 resets state.
  */
@@ -80,14 +80,14 @@ struct option {
 };
 
 /** Compat: getopt */
-int getopt(int argc, char *argv[], const char *options);
+int getopt(int argc, char *const

[PATCH v12 02/14] windows: add os shim for localtime_r

2024-03-25 Thread Stephen Hemminger

Windows does not have localtime_r but it does have a similar
function that can be used instead.

Signed-off-by: Stephen Hemminger 
---
 lib/eal/windows/include/rte_os_shim.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/eal/windows/include/rte_os_shim.h 
b/lib/eal/windows/include/rte_os_shim.h
index eda8113662..e9741a9df2 100644
--- a/lib/eal/windows/include/rte_os_shim.h
+++ b/lib/eal/windows/include/rte_os_shim.h
@@ -110,4 +110,14 @@ rte_clock_gettime(clockid_t clock_id, struct timespec *tp)
 }
 #define clock_gettime(clock_id, tp) rte_clock_gettime(clock_id, tp)
 
+static inline struct tm *
+rte_localtime_r(const time_t *timer, struct tm *buf)
+{
+   if (localtime_s(buf, timer) == 0)
+   return buf;
+   else
+   return NULL;
+}
+#define localtime_r(timer, buf) rte_localtime_r(timer, buf)
+
 #endif /* _RTE_OS_SHIM_ */
-- 
2.43.0

[PATCH v12 03/14] eal: make eal_log_level_parse common

2024-03-25 Thread Stephen Hemminger

The code to parse for log-level option should be same on
all OS variants.

Signed-off-by: Stephen Hemminger 
---
 lib/eal/common/eal_common_options.c | 46 +
 lib/eal/common/eal_options.h|  1 +
 lib/eal/freebsd/eal.c   | 42 --
 lib/eal/linux/eal.c | 39 
 lib/eal/windows/eal.c   | 35 --
 5 files changed, 47 insertions(+), 116 deletions(-)

diff --git a/lib/eal/common/eal_common_options.c 
b/lib/eal/common/eal_common_options.c
index e541f07939..5435399b85 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -1640,6 +1640,51 @@ eal_parse_huge_unlink(const char *arg, struct 
hugepage_file_discipline *out)
return -1;
 }
 
+/* Parse the all arguments looking for log related ones */
+int
+eal_log_level_parse(int argc, char * const argv[])
+{
+   struct internal_config *internal_conf = 
eal_get_internal_configuration();
+   int option_index, opt;
+   const int old_optind = optind;
+   const int old_optopt = optopt;
+   const int old_opterr = opterr;
+   char *old_optarg = optarg;
+#ifdef RTE_EXEC_ENV_FREEBSD
+   const int old_optreset = optreset;
+   optreset = 1;
+#endif
+
+   optind = 1;
+   opterr = 0;
+
+   while ((opt = getopt_long(argc, argv, eal_short_options,
+ eal_long_options, &option_index)) != EOF) {
+
+   switch (opt) {
+   case OPT_LOG_LEVEL_NUM:
+   if (eal_parse_common_option(opt, optarg, internal_conf) 
< 0)
+   return -1;
+   break;
+   case '?':
+   /* getopt is not happy, stop right now */
+   goto out;
+   default:
+   continue;
+   }
+   }
+out:
+   /* restore getopt lib */
+   optind = old_optind;
+   optopt = old_optopt;
+   optarg = old_optarg;
+   opterr = old_opterr;
+#ifdef RTE_EXEC_ENV_FREEBSD
+   optreset = old_optreset;
+#endif
+   return 0;
+}
+
 int
 eal_parse_common_option(int opt, const char *optarg,
struct internal_config *conf)
@@ -2173,6 +2218,7 @@ rte_vect_set_max_simd_bitwidth(uint16_t bitwidth)
return 0;
 }
 
+
 void
 eal_common_usage(void)
 {
diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h
index 3cc9cb6412..f3f2e104f6 100644
--- a/lib/eal/common/eal_options.h
+++ b/lib/eal/common/eal_options.h
@@ -96,6 +96,7 @@ enum {
 extern const char eal_short_options[];
 extern const struct option eal_long_options[];
 
+int eal_log_level_parse(int argc, char * const argv[]);
 int eal_parse_common_option(int opt, const char *argv,
struct internal_config *conf);
 int eal_option_device_parse(void);
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index bab77118e9..9825bcea0b 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -363,48 +363,6 @@ eal_get_hugepage_mem_size(void)
return (size < SIZE_MAX) ? (size_t)(size) : SIZE_MAX;
 }
 
-/* Parse the arguments for --log-level only */
-static void
-eal_log_level_parse(int argc, char **argv)
-{
-   int opt;
-   char **argvopt;
-   int option_index;
-   const int old_optind = optind;
-   const int old_optopt = optopt;
-   const int old_optreset = optreset;
-   char * const old_optarg = optarg;
-   struct internal_config *internal_conf =
-   eal_get_internal_configuration();
-
-   argvopt = argv;
-   optind = 1;
-   optreset = 1;
-
-   while ((opt = getopt_long(argc, argvopt, eal_short_options,
- eal_long_options, &option_index)) != EOF) {
-
-   int ret;
-
-   /* getopt is not happy, stop right now */
-   if (opt == '?')
-   break;
-
-   ret = (opt == OPT_LOG_LEVEL_NUM) ?
-   eal_parse_common_option(opt, optarg, internal_conf) : 0;
-
-   /* common parser is not happy */
-   if (ret < 0)
-   break;
-   }
-
-   /* restore getopt lib */
-   optind = old_optind;
-   optopt = old_optopt;
-   optreset = old_optreset;
-   optarg = old_optarg;
-}
-
 /* Parse the argument given in the command line of the application */
 static int
 eal_parse_args(int argc, char **argv)
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index fd422f1f62..bffeb1f34e 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -546,45 +546,6 @@ eal_parse_vfio_vf_token(const char *vf_token)
return -1;
 }
 
-/* Parse the arguments for --log-level only */
-static void
-eal_log_level_parse(int argc, char **argv)
-{
-   int opt;
-   char **argvopt;
-   int option_index;
-   const int old_optind = optind;
-   const

[PATCH v12 04/14] eal: do not duplicate rte_init_alert() messages

2024-03-25 Thread Stephen Hemminger

The message already goes through logging, and does not need
to be printed on stderr. Message level should be ALERT
to match function name.

Signed-off-by: Stephen Hemminger 
---
 lib/eal/freebsd/eal.c | 3 +--
 lib/eal/linux/eal.c   | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 9825bcea0b..17b56f38aa 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -529,8 +529,7 @@ rte_eal_iopl_init(void)
 
 static void rte_eal_init_alert(const char *msg)
 {
-   fprintf(stderr, "EAL: FATAL: %s\n", msg);
-   EAL_LOG(ERR, "%s", msg);
+   EAL_LOG(ALERT, "%s", msg);
 }
 
 /* Launch threads, called at application init(). */
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index bffeb1f34e..23dc26b124 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -840,8 +840,7 @@ static int rte_eal_vfio_setup(void)
 
 static void rte_eal_init_alert(const char *msg)
 {
-   fprintf(stderr, "EAL: FATAL: %s\n", msg);
-   EAL_LOG(ERR, "%s", msg);
+   EAL_LOG(ALERT, "%s", msg);
 }
 
 /*
-- 
2.43.0

[PATCH v12 05/14] eal: change rte_exit() output to match rte_log()

2024-03-25 Thread Stephen Hemminger

The rte_exit() output format confuses the timestamp and coloring
options. Change it to use be a single line with proper prefix.

Before:
[ 0.006481] EAL: Error - exiting with code: 1
  Cause: [ 0.006489] Cannot init EAL: Permission denied

After:
[ 0.006238] EAL: Error - exiting with code: 1
[ 0.006250] EAL: Cause - Cannot init EAL: Permission denied

Signed-off-by: Stephen Hemminger 
---
 lib/eal/common/eal_common_debug.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lib/eal/common/eal_common_debug.c 
b/lib/eal/common/eal_common_debug.c
index 3e77995896..3f37879144 100644
--- a/lib/eal/common/eal_common_debug.c
+++ b/lib/eal/common/eal_common_debug.c
@@ -34,17 +34,18 @@ void
 rte_exit(int exit_code, const char *format, ...)
 {
va_list ap;
+   char *msg = NULL;
 
if (exit_code != 0)
-   RTE_LOG(CRIT, EAL, "Error - exiting with code: %d\n"
-   "  Cause: ", exit_code);
+   EAL_LOG(CRIT, "Error - exiting with code: %d", exit_code);
 
va_start(ap, format);
-   rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap);
+   vasprintf(&msg, format, ap);
va_end(ap);
 
+   rte_log(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, "EAL: Cause - %s", msg);
+
if (rte_eal_cleanup() != 0 && rte_errno != EALREADY)
-   EAL_LOG(CRIT,
-   "EAL could not release all resources");
+   EAL_LOG(CRIT, "EAL could not release all resources");
exit(exit_code);
 }
-- 
2.43.0

[PATCH v12 07/14] eal: initialize log before everything else

2024-03-25 Thread Stephen Hemminger

In order for all log messages (including CPU mismatch) to
come out through the logging library, it must be initialized
as early in rte_eal_init() as possible on all platforms.

Where it was done before was likely historical based on
the support of non-OS isolated CPU's which required a shared
memory buffer; that support was dropped before DPDK was
publicly released.

Signed-off-by: Stephen Hemminger 
---
 lib/eal/freebsd/eal.c  |  8 +---
 lib/eal/linux/eal.c| 15 +--
 lib/eal/windows/eal.c  |  3 +--
 lib/log/log_freebsd.c  |  3 +--
 lib/log/log_internal.h |  2 +-
 lib/log/log_linux.c| 14 ++
 lib/log/log_windows.c  |  4 +---
 7 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 6552f9c138..ec3a0c3244 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -52,6 +52,7 @@
 #include "eal_options.h"
 #include "eal_memcfg.h"
 #include "eal_trace.h"
+#include "log_internal.h"
 
 #define MEMSIZE_IF_NO_HUGE_PAGE (64ULL * 1024ULL * 1024ULL)
 
@@ -546,6 +547,10 @@ rte_eal_init(int argc, char **argv)
bool has_phys_addr;
enum rte_iova_mode iova_mode;
 
+   /* setup log as early as possible */
+   eal_log_level_parse(argc, argv);
+   eal_log_init(getprogname());
+
/* checks if the machine is adequate */
if (!rte_cpu_is_supported()) {
rte_eal_init_alert("unsupported cpu type.");
@@ -565,9 +570,6 @@ rte_eal_init(int argc, char **argv)
/* clone argv to report out later in telemetry */
eal_save_args(argc, argv);
 
-   /* set log level as early as possible */
-   eal_log_level_parse(argc, argv);
-
if (rte_eal_cpu_init() < 0) {
rte_eal_init_alert("Cannot detect lcores.");
rte_errno = ENOTSUP;
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 3d0c34063e..0a488ee567 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -936,6 +936,11 @@ rte_eal_init(int argc, char **argv)
struct internal_config *internal_conf =
eal_get_internal_configuration();
 
+
+   /* setup log as early as possible */
+   eal_log_level_parse(argc, argv);
+   eal_log_init(program_invocation_short_name);
+
/* checks if the machine is adequate */
if (!rte_cpu_is_supported()) {
rte_eal_init_alert("unsupported cpu type.");
@@ -952,9 +957,6 @@ rte_eal_init(int argc, char **argv)
 
eal_reset_internal_config(internal_conf);
 
-   /* set log level as early as possible */
-   eal_log_level_parse(argc, argv);
-
/* clone argv to report out later in telemetry */
eal_save_args(argc, argv);
 
@@ -1106,13 +1108,6 @@ rte_eal_init(int argc, char **argv)
 #endif
}
 
-   if (eal_log_init(program_invocation_short_name) < 0) {
-   rte_eal_init_alert("Cannot init logging.");
-   rte_errno = ENOMEM;
-   rte_atomic_store_explicit(&run_once, 0, 
rte_memory_order_relaxed);
-   return -1;
-   }
-
 #ifdef VFIO_PRESENT
if (rte_eal_vfio_setup() < 0) {
rte_eal_init_alert("Cannot init VFIO");
diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
index 2519a30017..fb2920e1b8 100644
--- a/lib/eal/windows/eal.c
+++ b/lib/eal/windows/eal.c
@@ -250,9 +250,8 @@ rte_eal_init(int argc, char **argv)
char cpuset[RTE_CPU_AFFINITY_STR_LEN];
char thread_name[RTE_THREAD_NAME_SIZE];
 
-   eal_log_init(NULL);
-
eal_log_level_parse(argc, argv);
+   eal_log_init(NULL);
 
if (eal_create_cpu_map() < 0) {
rte_eal_init_alert("Cannot discover CPU and NUMA.");
diff --git a/lib/log/log_freebsd.c b/lib/log/log_freebsd.c
index 953e371bee..33a0925c43 100644
--- a/lib/log/log_freebsd.c
+++ b/lib/log/log_freebsd.c
@@ -5,8 +5,7 @@
 #include 
 #include "log_internal.h"
 
-int
+void
 eal_log_init(__rte_unused const char *id)
 {
-   return 0;
 }
diff --git a/lib/log/log_internal.h b/lib/log/log_internal.h
index cb15cdff08..d5fabd7ef7 100644
--- a/lib/log/log_internal.h
+++ b/lib/log/log_internal.h
@@ -14,7 +14,7 @@
  * Initialize the default log stream.
  */
 __rte_internal
-int eal_log_init(const char *id);
+void eal_log_init(const char *id);
 
 /*
  * Determine where log data is written when no call to rte_openlog_stream.
diff --git a/lib/log/log_linux.c b/lib/log/log_linux.c
index 47aa074da2..6d7dc8f3ab 100644
--- a/lib/log/log_linux.c
+++ b/lib/log/log_linux.c
@@ -87,18 +87,16 @@ static cookie_io_functions_t console_log_func = {
  * set the log to default function, called during eal init process,
  * once memzones are available.
  */
-int
+void
 eal_log_init(const char *id)
 {
FILE *log_stream;
 
-   log_stream = fopencookie(NULL, "w+", console_log_func);
-   if (log_stream == NULL)
-   return -1;
-
openlog(id, LOG_NDELAY | LOG_PID, log_facility);
 
-   eal_log_set_default(log_stream)

[PATCH v12 06/14] log: move handling of syslog facility out of eal

2024-03-25 Thread Stephen Hemminger

The syslog facility property is better handled in lib/log
rather than in eal. This also allows for changes to what
syslog flag means in later steps.

Signed-off-by: Stephen Hemminger 
---
 lib/eal/common/eal_common_options.c | 51 ++---
 lib/eal/freebsd/eal.c   |  5 ++-
 lib/eal/linux/eal.c |  7 ++--
 lib/eal/windows/eal.c   |  6 ++--
 lib/log/log_freebsd.c   |  2 +-
 lib/log/log_internal.h  |  5 ++-
 lib/log/log_linux.c | 47 --
 lib/log/log_windows.c   |  8 -
 lib/log/version.map |  1 +
 9 files changed, 68 insertions(+), 64 deletions(-)

diff --git a/lib/eal/common/eal_common_options.c 
b/lib/eal/common/eal_common_options.c
index 5435399b85..661b2db211 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -6,9 +6,6 @@
 #include 
 #include 
 #include 
-#ifndef RTE_EXEC_ENV_WINDOWS
-#include 
-#endif
 #include 
 #include 
 #include 
@@ -349,10 +346,6 @@ eal_reset_internal_config(struct internal_config 
*internal_cfg)
}
internal_cfg->base_virtaddr = 0;
 
-#ifdef LOG_DAEMON
-   internal_cfg->syslog_facility = LOG_DAEMON;
-#endif
-
/* if set to NONE, interrupt mode is determined automatically */
internal_cfg->vfio_intr_mode = RTE_INTR_MODE_NONE;
memset(internal_cfg->vfio_vf_token, 0,
@@ -1297,47 +1290,6 @@ eal_parse_lcores(const char *lcores)
return ret;
 }
 
-#ifndef RTE_EXEC_ENV_WINDOWS
-static int
-eal_parse_syslog(const char *facility, struct internal_config *conf)
-{
-   int i;
-   static const struct {
-   const char *name;
-   int value;
-   } map[] = {
-   { "auth", LOG_AUTH },
-   { "cron", LOG_CRON },
-   { "daemon", LOG_DAEMON },
-   { "ftp", LOG_FTP },
-   { "kern", LOG_KERN },
-   { "lpr", LOG_LPR },
-   { "mail", LOG_MAIL },
-   { "news", LOG_NEWS },
-   { "syslog", LOG_SYSLOG },
-   { "user", LOG_USER },
-   { "uucp", LOG_UUCP },
-   { "local0", LOG_LOCAL0 },
-   { "local1", LOG_LOCAL1 },
-   { "local2", LOG_LOCAL2 },
-   { "local3", LOG_LOCAL3 },
-   { "local4", LOG_LOCAL4 },
-   { "local5", LOG_LOCAL5 },
-   { "local6", LOG_LOCAL6 },
-   { "local7", LOG_LOCAL7 },
-   { NULL, 0 }
-   };
-
-   for (i = 0; map[i].name; i++) {
-   if (!strcmp(facility, map[i].name)) {
-   conf->syslog_facility = map[i].value;
-   return 0;
-   }
-   }
-   return -1;
-}
-#endif
-
 static void
 eal_log_usage(void)
 {
@@ -1663,6 +1615,7 @@ eal_log_level_parse(int argc, char * const argv[])
 
switch (opt) {
case OPT_LOG_LEVEL_NUM:
+   case OPT_SYSLOG_NUM:
if (eal_parse_common_option(opt, optarg, internal_conf) 
< 0)
return -1;
break;
@@ -1882,7 +1835,7 @@ eal_parse_common_option(int opt, const char *optarg,
 
 #ifndef RTE_EXEC_ENV_WINDOWS
case OPT_SYSLOG_NUM:
-   if (eal_parse_syslog(optarg, conf) < 0) {
+   if (eal_log_syslog(optarg) < 0) {
EAL_LOG(ERR, "invalid parameters for --"
OPT_SYSLOG);
return -1;
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 17b56f38aa..6552f9c138 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -392,8 +391,8 @@ eal_parse_args(int argc, char **argv)
goto out;
}
 
-   /* eal_log_level_parse() already handled this option */
-   if (opt == OPT_LOG_LEVEL_NUM)
+   /* eal_log_level_parse() already handled these */
+   if (opt == OPT_LOG_LEVEL_NUM || opt == OPT_LOG_SYSLOG_NUM)
continue;
 
ret = eal_parse_common_option(opt, optarg, internal_conf);
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 23dc26b124..3d0c34063e 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -610,8 +610,8 @@ eal_parse_args(int argc, char **argv)
goto out;
}
 
-   /* eal_log_level_parse() already handled this option */
-   if (opt == OPT_LOG_LEVEL_NUM)
+   /* eal_log_level_parse() already handled these options */
+   if (opt == OPT_LOG_LEVEL_NUM || opt == OPT_SYSLOG_NUM)
continue;
 
ret = eal_parse_common_option(opt, optarg, internal_conf);
@@ -1106,8 +1106,7 @@ rte_eal_init(int argc, ch

[PATCH v12 08/14] log: drop syslog support, and make code common

2024-03-25 Thread Stephen Hemminger

This patch makes the log setup code common across all platforms.

Drops syslog support for now, will come back in later patch.

Signed-off-by: Stephen Hemminger 
---
 lib/log/log.c  |  29 
 lib/log/log_internal.h |   6 ---
 lib/log/log_linux.c| 102 -
 lib/log/log_windows.c  |  22 -
 lib/log/meson.build|   5 +-
 lib/log/version.map|   1 -
 6 files changed, 21 insertions(+), 144 deletions(-)
 delete mode 100644 lib/log/log_linux.c
 delete mode 100644 lib/log/log_windows.c

diff --git a/lib/log/log.c b/lib/log/log.c
index 255f757d94..f597da2e39 100644
--- a/lib/log/log.c
+++ b/lib/log/log.c
@@ -70,12 +70,13 @@ struct log_cur_msg {
  /* per core log */
 static RTE_DEFINE_PER_LCORE(struct log_cur_msg, log_cur_msg);
 
-/* default logs */
-
 /* Change the stream that will be used by logging system */
 int
 rte_openlog_stream(FILE *f)
 {
+   if (rte_logs.file != NULL)
+   fclose(rte_logs.file);
+
rte_logs.file = f;
return 0;
 }
@@ -505,13 +506,20 @@ rte_log(uint32_t level, uint32_t logtype, const char 
*format, ...)
return ret;
 }
 
+/* Placeholder */
+int
+eal_log_syslog(const char *mode __rte_unused)
+{
+   return -1;
+}
+
 /*
- * Called by environment-specific initialization functions.
+ * Called by rte_eal_init
  */
 void
-eal_log_set_default(FILE *default_log)
+eal_log_init(const char *id __rte_unused)
 {
-   default_log_stream = default_log;
+   default_log_stream = stderr;
 
 #if RTE_LOG_DP_LEVEL >= RTE_LOG_DEBUG
RTE_LOG(NOTICE, EAL,
@@ -525,8 +533,11 @@ eal_log_set_default(FILE *default_log)
 void
 rte_eal_log_cleanup(void)
 {
-   if (default_log_stream) {
-   fclose(default_log_stream);
-   default_log_stream = NULL;
-   }
+   FILE *log_stream = rte_log_get_stream();
+
+   /* don't close stderr on the application */
+   if (log_stream != stderr)
+   fclose(log_stream);
+
+   rte_logs.file = NULL;
 }
diff --git a/lib/log/log_internal.h b/lib/log/log_internal.h
index d5fabd7ef7..3c46328e7b 100644
--- a/lib/log/log_internal.h
+++ b/lib/log/log_internal.h
@@ -16,12 +16,6 @@
 __rte_internal
 void eal_log_init(const char *id);
 
-/*
- * Determine where log data is written when no call to rte_openlog_stream.
- */
-__rte_internal
-void eal_log_set_default(FILE *default_log);
-
 /*
  * Save a log option for later.
  */
diff --git a/lib/log/log_linux.c b/lib/log/log_linux.c
deleted file mode 100644
index 6d7dc8f3ab..00
--- a/lib/log/log_linux.c
+++ /dev/null
@@ -1,102 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2010-2014 Intel Corporation
- */
-
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-
-#include "log_internal.h"
-
-static int log_facility = LOG_DAEMON;
-
-static const struct {
-   const char *name;
-   int value;
-} facilitys[] = {
-   { "auth", LOG_AUTH },
-   { "cron", LOG_CRON },
-   { "daemon", LOG_DAEMON },
-   { "ftp", LOG_FTP },
-   { "kern", LOG_KERN },
-   { "lpr", LOG_LPR },
-   { "mail", LOG_MAIL },
-   { "news", LOG_NEWS },
-   { "syslog", LOG_SYSLOG },
-   { "user", LOG_USER },
-   { "uucp", LOG_UUCP },
-   { "local0", LOG_LOCAL0 },
-   { "local1", LOG_LOCAL1 },
-   { "local2", LOG_LOCAL2 },
-   { "local3", LOG_LOCAL3 },
-   { "local4", LOG_LOCAL4 },
-   { "local5", LOG_LOCAL5 },
-   { "local6", LOG_LOCAL6 },
-   { "local7", LOG_LOCAL7 },
-};
-
-int
-eal_log_syslog(const char *name)
-{
-   unsigned int i;
-
-   for (i = 0; i < RTE_DIM(facilitys); i++) {
-   if (!strcmp(name, facilitys[i].name)) {
-   log_facility = facilitys[i].value;
-   return 0;
-   }
-   }
-   return -1;
-}
-
-/*
- * default log function
- */
-static ssize_t
-console_log_write(__rte_unused void *c, const char *buf, size_t size)
-{
-   ssize_t ret;
-
-   /* write on stderr */
-   ret = fwrite(buf, 1, size, stderr);
-   fflush(stderr);
-
-   /* Syslog error levels are from 0 to 7, so subtract 1 to convert */
-   syslog(rte_log_cur_msg_loglevel() - 1, "%.*s", (int)size, buf);
-
-   return ret;
-}
-
-static int
-console_log_close(__rte_unused void *c)
-{
-   closelog();
-   return 0;
-}
-
-static cookie_io_functions_t console_log_func = {
-   .write = console_log_write,
-   .close = console_log_close,
-};
-
-/*
- * set the log to default function, called during eal init process,
- * once memzones are available.
- */
-void
-eal_log_init(const char *id)
-{
-   FILE *log_stream;
-
-   openlog(id, LOG_NDELAY | LOG_PID, log_facility);
-
-   log_stream = fopencookie(NULL, "w+", console_log_func);
-   if (log_stream != NULL)
-   eal_log_set_default(log_stream);
-   else
-   eal_log_set_default(stderr);
-}
diff --git a/lib/log/log_windows.c

[PATCH v12 09/14] log: add hook for printing log messages

2024-03-25 Thread Stephen Hemminger

This is useful for when decorating log output for console
or journal. Provide basic version in this patch.

Signed-off-by: Stephen Hemminger 
---
 lib/log/log.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/lib/log/log.c b/lib/log/log.c
index f597da2e39..acd4c320b6 100644
--- a/lib/log/log.c
+++ b/lib/log/log.c
@@ -26,16 +26,21 @@ struct rte_log_dynamic_type {
uint32_t loglevel;
 };
 
+typedef int (*log_print_t)(FILE *f, uint32_t level, const char *fmt, va_list 
ap);
+static int log_print(FILE *f, uint32_t level, const char *format, va_list ap);
+
 /** The rte_log structure. */
 static struct rte_logs {
uint32_t type;  /**< Bitfield with enabled logs. */
uint32_t level; /**< Log level. */
FILE *file; /**< Output file set by rte_openlog_stream, or NULL. */
+   log_print_t print_func;
size_t dynamic_types_len;
struct rte_log_dynamic_type *dynamic_types;
 } rte_logs = {
.type = UINT32_MAX,
.level = RTE_LOG_DEBUG,
+   .print_func = log_print,
 };
 
 struct rte_eal_opt_loglevel {
@@ -78,6 +83,7 @@ rte_openlog_stream(FILE *f)
fclose(rte_logs.file);
 
rte_logs.file = f;
+   rte_logs.print_func = log_print;
return 0;
 }
 
@@ -484,7 +490,7 @@ rte_vlog(uint32_t level, uint32_t logtype, const char 
*format, va_list ap)
RTE_PER_LCORE(log_cur_msg).loglevel = level;
RTE_PER_LCORE(log_cur_msg).logtype = logtype;
 
-   ret = vfprintf(f, format, ap);
+   ret = (*rte_logs.print_func)(f, level, format, ap);
fflush(f);
return ret;
 }
@@ -513,6 +519,15 @@ eal_log_syslog(const char *mode __rte_unused)
return -1;
 }
 
+/* default log print function */
+__rte_format_printf(3, 0)
+static int
+log_print(FILE *f, uint32_t level __rte_unused,
+ const char *format, va_list ap)
+{
+   return vfprintf(f, format, ap);
+}
+
 /*
  * Called by rte_eal_init
  */
-- 
2.43.0

[PATCH v12 10/14] log: add timestamp option

2024-03-25 Thread Stephen Hemminger

When debugging driver or startup issues, it is useful to have
a timestamp on each message printed. The messages in syslog
already have a timestamp, but often syslog is not available
during testing.

There are multiple timestamp formats similar to Linux dmesg.
The default is time relative since startup (when first
step of logging initialization is done by constructor).
Other alternative formats are delta, ctime, reltime and iso formats.

Example:
$ dpdk-testpmd --log-timestamp -- -i
[ 0.008610] EAL: Detected CPU lcores: 8
[ 0.008634] EAL: Detected NUMA nodes: 1
[ 0.008792] EAL: Detected static linkage of DPDK
[ 0.010620] EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
[ 0.012618] EAL: Selected IOVA mode 'VA'
[ 0.016675] testpmd: No probed ethernet devices
Interactive-mode selected

Signed-off-by: Stephen Hemminger 
---
 app/test/test_eal_flags.c   |  17 +++
 doc/guides/prog_guide/log_lib.rst   |  26 
 lib/eal/common/eal_common_options.c |  14 ++-
 lib/eal/common/eal_options.h|   2 +
 lib/eal/freebsd/eal.c   |   6 +-
 lib/eal/linux/eal.c |   4 +-
 lib/eal/windows/eal.c   |   4 +-
 lib/log/log.c   | 183 +++-
 lib/log/log_internal.h  |   9 ++
 lib/log/version.map |   1 +
 10 files changed, 259 insertions(+), 7 deletions(-)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 6cb4b06757..eeb1799381 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -1055,6 +1055,14 @@ test_misc_flags(void)
const char * const argv22[] = {prgname, prefix, mp_flag,
   "--huge-worker-stack=512"};
 
+   /* Try running with --log-timestamp */
+   const char * const argv23[] = {prgname, prefix, mp_flag,
+  "--log-timestamp" };
+
+   /* Try running with --log-timestamp=iso */
+   const char * const argv24[] = {prgname, prefix, mp_flag,
+  "--log-timestamp=iso" };
+
/* run all tests also applicable to FreeBSD first */
 
if (launch_proc(argv0) == 0) {
@@ -1162,6 +1170,15 @@ test_misc_flags(void)
printf("Error - process did not run ok with 
--huge-worker-stack=size parameter\n");
goto fail;
}
+   if (launch_proc(argv23) != 0) {
+   printf("Error - process did not run ok with --log-timestamp 
parameter\n");
+   goto fail;
+   }
+   if (launch_proc(argv24) != 0) {
+   printf("Error - process did not run ok with --log-timestamp=iso 
parameter\n");
+   goto fail;
+   }
+
 
rmdir(hugepath_dir3);
rmdir(hugepath_dir2);
diff --git a/doc/guides/prog_guide/log_lib.rst 
b/doc/guides/prog_guide/log_lib.rst
index ff9d1b54a2..504eefe1d2 100644
--- a/doc/guides/prog_guide/log_lib.rst
+++ b/doc/guides/prog_guide/log_lib.rst
@@ -59,6 +59,32 @@ For example::
 
 Within an application, the same result can be got using the 
``rte_log_set_level_pattern()`` or ``rte_log_set_level_regex()`` APIs.
 
+Log timestamp
+~
+
+An optional timestamp can be added before each message
+by adding the ``--log-timestamp`` option.
+For example::
+
+   /path/to/app --log-level=lib.*:debug --log-timestamp
+
+Multiple timestamp alternative timestamp formats are available:
+
+.. csv-table:: Log time stamp format
+   :header: "Format", "Description", "Example"
+   :widths: 6, 30, 32
+
+   "ctime", "Unix ctime", "``[Wed Mar 20 07:26:12 2024]``"
+   "delta", "Offset since last", "``[<3.162373>]``"
+   "reltime", "Seconds since last or time if minute changed", "``[  
+3.001791]`` or ``[Mar20 07:26:12]``"
+   "iso", "ISO-8601", "``[2024-03-20T07:26:12−07:00]``"
+
+To prefix all console messages with ISO format time the syntax is::
+
+   /path/to/app --log-timestamp=iso
+
+
+
 Using Logging APIs to Generate Log Messages
 ---
 
diff --git a/lib/eal/common/eal_common_options.c 
b/lib/eal/common/eal_common_options.c
index 661b2db211..aa1faad45c 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -74,6 +74,7 @@ eal_long_options[] = {
{OPT_IOVA_MODE, 1, NULL, OPT_IOVA_MODE_NUM},
{OPT_LCORES,1, NULL, OPT_LCORES_NUM   },
{OPT_LOG_LEVEL, 1, NULL, OPT_LOG_LEVEL_NUM},
+   {OPT_LOG_TIMESTAMP, 2, NULL, OPT_LOG_TIMESTAMP_NUM},
{OPT_TRACE, 1, NULL, OPT_TRACE_NUM},
{OPT_TRACE_DIR, 1, NULL, OPT_TRACE_DIR_NUM},
{OPT_TRACE_BUF_SIZE,1, NULL, OPT_TRACE_BUF_SIZE_NUM   },
@@ -1616,6 +1617,7 @@ eal_log_level_parse(int argc, char * const argv[])
switch (opt) {
case OPT_LOG_LEVEL_NUM:
case OPT_SYSLOG_NUM:
+   case OPT_LOG_TIMESTAMP

[PATCH v12 11/14] log: add optional support of syslog

2024-03-25 Thread Stephen Hemminger

Log to syslog only if option is specified. And if syslog is used
then normally only log to syslog, don't duplicate output.
Also enables syslog support on FreeBSD.

Signed-off-by: Stephen Hemminger 
---
 doc/guides/linux_gsg/linux_eal_parameters.rst |  27 -
 doc/guides/prog_guide/log_lib.rst |  17 +++
 lib/eal/common/eal_common_options.c   |   2 +-
 lib/log/log.c | 101 --
 4 files changed, 111 insertions(+), 36 deletions(-)

diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst 
b/doc/guides/linux_gsg/linux_eal_parameters.rst
index ea8f381391..d86f94d8a8 100644
--- a/doc/guides/linux_gsg/linux_eal_parameters.rst
+++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
@@ -108,30 +108,3 @@ Memory-related options
 *   ``--match-allocations``
 
 Free hugepages back to system exactly as they were originally allocated.
-
-Other options
-~
-
-*   ``--syslog ``
-
-Set syslog facility. Valid syslog facilities are::
-
-auth
-cron
-daemon
-ftp
-kern
-lpr
-mail
-news
-syslog
-user
-uucp
-local0
-local1
-local2
-local3
-local4
-local5
-local6
-local7
diff --git a/doc/guides/prog_guide/log_lib.rst 
b/doc/guides/prog_guide/log_lib.rst
index 504eefe1d2..abaedc7212 100644
--- a/doc/guides/prog_guide/log_lib.rst
+++ b/doc/guides/prog_guide/log_lib.rst
@@ -83,6 +83,23 @@ To prefix all console messages with ISO format time the 
syntax is::
 
/path/to/app --log-timestamp=iso
 
+Log output
+~~
+
+If desired, messages can be redirected to syslog (on Linux and FreeBSD) with 
the ``--syslog``
+option. There are three possible settings for this option:
+
+*always*
+Redirect all log output to syslog.
+
+*auto*
+Use console if it is a terminal, and use syslog if is not.
+
+*both*
+Print to both console and syslog.
+
+If ``--syslog`` option is not specified, then only console (stderr) will be 
used.
+
 
 
 Using Logging APIs to Generate Log Messages
diff --git a/lib/eal/common/eal_common_options.c 
b/lib/eal/common/eal_common_options.c
index aa1faad45c..6f0fd151c1 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -91,7 +91,7 @@ eal_long_options[] = {
{OPT_PROC_TYPE, 1, NULL, OPT_PROC_TYPE_NUM},
{OPT_SOCKET_MEM,1, NULL, OPT_SOCKET_MEM_NUM   },
{OPT_SOCKET_LIMIT,  1, NULL, OPT_SOCKET_LIMIT_NUM },
-   {OPT_SYSLOG,1, NULL, OPT_SYSLOG_NUM   },
+   {OPT_SYSLOG,2, NULL, OPT_SYSLOG_NUM   },
{OPT_VDEV,  1, NULL, OPT_VDEV_NUM },
{OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
{OPT_VFIO_VF_TOKEN, 1, NULL, OPT_VFIO_VF_TOKEN_NUM},
diff --git a/lib/log/log.c b/lib/log/log.c
index 2dca91306e..d8974c66db 100644
--- a/lib/log/log.c
+++ b/lib/log/log.c
@@ -13,15 +13,17 @@
 #include 
 #include 
 
+#ifdef RTE_EXEC_ENV_WINDOWS
+#include 
+#else
+#include 
+#endif
+
 #include 
 #include 
 
 #include "log_internal.h"
 
-#ifdef RTE_EXEC_ENV_WINDOWS
-#include 
-#endif
-
 struct rte_log_dynamic_type {
const char *name;
uint32_t loglevel;
@@ -36,14 +38,25 @@ enum eal_log_time_format {
EAL_LOG_TIMESTAMP_ISO,
 };
 
+enum eal_log_syslog {
+   EAL_LOG_SYSLOG_NONE = 0,/* do not use syslog */
+   EAL_LOG_SYSLOG_AUTO,/* use syslog only if not a terminal */
+   EAL_LOG_SYSLOG_ALWAYS,  /* always use syslog */
+   EAL_LOG_SYSLOG_BOTH,/* log to both syslog and stderr */
+};
+
 typedef int (*log_print_t)(FILE *f, uint32_t level, const char *fmt, va_list 
ap);
 static int log_print(FILE *f, uint32_t level, const char *format, va_list ap);
 
+
 /** The rte_log structure. */
 static struct rte_logs {
uint32_t type;  /**< Bitfield with enabled logs. */
uint32_t level; /**< Log level. */
FILE *file; /**< Output file set by rte_openlog_stream, or NULL. */
+#ifndef RTE_EXEC_ENV_WINDOWS
+   enum eal_log_syslog syslog_opt;
+#endif
log_print_t print_func;
 
enum eal_log_time_format time_format;
@@ -532,9 +545,23 @@ rte_log(uint32_t level, uint32_t logtype, const char 
*format, ...)
 
 /* Placeholder */
 int
-eal_log_syslog(const char *mode __rte_unused)
+eal_log_syslog(const char *str)
 {
+#ifdef RTE_EXEC_ENV_WINDOWS
+   RTE_SET_USED(str);
return -1;
+#else
+   if (str == NULL || strcmp(str, "auto") == 0)
+   /* log to syslog only if stderr is not a terminal */
+   rte_logs.syslog_opt = EAL_LOG_SYSLOG_AUTO;
+   else if (strcmp(str, "both") == 0)
+   rte_logs.syslog_opt = EAL_LOG_SYSLOG_BOTH;
+   else if (strcmp(str, "always") == 0)
+   rte_logs.syslog_opt = EAL_LOG_SYSLOG_ALWAYS;
+   else
+

[PATCH v12 12/14] log: add support for systemd journal

2024-03-25 Thread Stephen Hemminger

If DPDK application is being run as a systemd service, then
it can use the journal protocol which allows putting more information
in the log such as priority and other information.

The use of journal protocol is automatically detected and
handled.  Rather than having a dependency on libsystemd,
just use the protocol directly as defined in:
https://systemd.io/JOURNAL_NATIVE_PROTOCOL/

Signed-off-by: Stephen Hemminger 
---
 lib/log/log.c | 128 +-
 1 file changed, 126 insertions(+), 2 deletions(-)

diff --git a/lib/log/log.c b/lib/log/log.c
index d8974c66db..0f7bdb3f25 100644
--- a/lib/log/log.c
+++ b/lib/log/log.c
@@ -17,6 +17,10 @@
 #include 
 #else
 #include 
+#include 
+#include 
+#include 
+#include 
 #endif
 
 #include 
@@ -56,6 +60,7 @@ static struct rte_logs {
FILE *file; /**< Output file set by rte_openlog_stream, or NULL. */
 #ifndef RTE_EXEC_ENV_WINDOWS
enum eal_log_syslog syslog_opt;
+   int journal_fd;
 #endif
log_print_t print_func;
 
@@ -758,6 +763,112 @@ static cookie_io_functions_t syslog_log_func = {
.write = syslog_log_write,
.close = syslog_log_close,
 };
+
+
+/*
+ * send message using journal protocol to journald
+ */
+__rte_format_printf(3, 0)
+static int
+journal_print(FILE *f __rte_unused, uint32_t level, const char *format, 
va_list ap)
+{
+   struct iovec iov[3];
+   char *buf = NULL;
+   size_t len;
+   char msg[] = "MESSAGE=";
+   char *prio;
+
+   iov[0].iov_base = msg;
+   iov[0].iov_len = strlen(msg);
+
+   len = vasprintf(&buf, format, ap);
+   if (len == 0)
+   return 0;
+
+   /* check that message ends with newline, if not add one */
+   if (buf[len - 1] != '\n') {
+   char *clone  = alloca(len + 1);
+   if (clone == NULL)
+   return 0;
+   memcpy(clone, buf, len);
+   clone[len++] = '\n';
+   buf = clone;
+   }
+
+   iov[1].iov_base = buf;
+   iov[1].iov_len = len;
+
+   /* priority value between 0 ("emerg") and 7 ("debug") */
+   len = asprintf(&prio, "PRIORITY=%i\n", level - 1);
+   iov[2].iov_base = prio;
+   iov[2].iov_len = len;
+
+   return writev(rte_logs.journal_fd, iov, 3);
+}
+
+/*
+ * Check if stderr is going to system journal.
+ * This is the documented way to handle systemd journal
+ *
+ * See: https://systemd.io/JOURNAL_NATIVE_PROTOCOL/
+ *
+ */
+static bool
+using_journal(void)
+{
+   char *jenv, *endp = NULL;
+   struct stat st;
+   unsigned long dev, ino;
+
+   jenv = getenv("JOURNAL_STREAM");
+   if (jenv == NULL)
+   return false;
+
+   if (fstat(STDERR_FILENO, &st) < 0)
+   return false;
+
+   /* systemd sets colon-separated list of device and inode number */
+   dev = strtoul(jenv, &endp, 10);
+   if (endp == NULL || *endp != ':')
+   return false;   /* missing colon */
+
+   ino = strtoul(endp + 1, NULL, 10);
+
+   return dev == st.st_dev && ino == st.st_ino;
+}
+
+/* Connect to systemd's journal service */
+static int
+open_journal(const char *id)
+{
+   char *syslog_id = NULL;
+   struct sockaddr_un sun = {
+   .sun_family = AF_UNIX,
+   .sun_path = "/run/systemd/journal/socket",
+   };
+   ssize_t len;
+   int s;
+
+   s = socket(AF_UNIX, SOCK_DGRAM, 0);
+   if (s < 0)
+   return -1;
+
+   if (connect(s, (struct sockaddr *)&sun, sizeof(sun)) < 0)
+   goto error;
+
+   /* Send syslog identifier as first message */
+   len = asprintf(&syslog_id, "SYSLOG_IDENTIFIER=%s\n", id);
+   if (len == 0)
+   goto error;
+
+   if (write(s, syslog_id, len) != len)
+   goto error;
+
+   return s;
+error:
+   close(s);
+   return -1;
+}
 #endif
 
 
@@ -765,11 +876,24 @@ static cookie_io_functions_t syslog_log_func = {
 static void
 log_output_selection(const char *id)
 {
+#ifdef RTE_EXEC_ENV_WINDOWS
RTE_SET_USED(id);
-
-#ifndef RTE_EXEC_ENV_WINDOWS
+#else
bool is_terminal = isatty(STDERR_FILENO);
 
+   /* If stderr is redirected to systemd journal then upgrade */
+   if (using_journal()) {
+   int jfd = open_journal(id);
+
+   if (jfd < 0) {
+   RTE_LOG_LINE(NOTICE, EAL, "Cannot connect to journal: 
%s",
+strerror(errno));
+   } else {
+   rte_logs.print_func = journal_print;
+   return;
+   }
+   }
+
if (!(rte_logs.syslog_opt == EAL_LOG_SYSLOG_NONE ||
  (rte_logs.syslog_opt == EAL_LOG_SYSLOG_AUTO && is_terminal))) {
int flags = LOG_NDELAY | LOG_PID;
-- 
2.43.0

[PATCH v12 13/14] log: colorize log output

2024-03-25 Thread Stephen Hemminger

Like dmesg, colorize the log output (unless redirected to file).
Timestamp is green, the subsystem is in yellow and the message
is red if urgent, boldface if an error, and normal for info and
debug messages.

Signed-off-by: Stephen Hemminger 
---
 doc/guides/prog_guide/log_lib.rst   |  16 ++-
 lib/eal/common/eal_common_options.c |   1 +
 lib/eal/common/eal_options.h|   2 +
 lib/log/log.c   | 158 +++-
 lib/log/log_internal.h  |   5 +
 lib/log/version.map |   1 +
 6 files changed, 178 insertions(+), 5 deletions(-)

diff --git a/doc/guides/prog_guide/log_lib.rst 
b/doc/guides/prog_guide/log_lib.rst
index abaedc7212..40727ebaae 100644
--- a/doc/guides/prog_guide/log_lib.rst
+++ b/doc/guides/prog_guide/log_lib.rst
@@ -59,6 +59,21 @@ For example::
 
 Within an application, the same result can be got using the 
``rte_log_set_level_pattern()`` or ``rte_log_set_level_regex()`` APIs.
 
+Color output
+
+
+The log library will highlight important messages.
+This is controlled by the ``--log-color`` option.
+he optional argument ``when`` can be ``auto``, ``never``, or ``always``.
+The default setting is ``auto`` which enables color when the output to
+``stderr`` is a terminal.
+If the ``when`` argument is omitted, it defaults to ``always``.
+
+For example to turn off all coloring::
+
+   /path/to/app --log-color=none
+
+
 Log timestamp
 ~
 
@@ -101,7 +116,6 @@ option. There are three possible settings for this option:
 If ``--syslog`` option is not specified, then only console (stderr) will be 
used.
 
 
-
 Using Logging APIs to Generate Log Messages
 ---
 
diff --git a/lib/eal/common/eal_common_options.c 
b/lib/eal/common/eal_common_options.c
index 6f0fd151c1..23b536b7a0 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -75,6 +75,7 @@ eal_long_options[] = {
{OPT_LCORES,1, NULL, OPT_LCORES_NUM   },
{OPT_LOG_LEVEL, 1, NULL, OPT_LOG_LEVEL_NUM},
{OPT_LOG_TIMESTAMP, 2, NULL, OPT_LOG_TIMESTAMP_NUM},
+   {OPT_LOG_COLOR, 1, NULL, OPT_LOG_COLOR_NUM},
{OPT_TRACE, 1, NULL, OPT_TRACE_NUM},
{OPT_TRACE_DIR, 1, NULL, OPT_TRACE_DIR_NUM},
{OPT_TRACE_BUF_SIZE,1, NULL, OPT_TRACE_BUF_SIZE_NUM   },
diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h
index e24c9eca53..5a63c1dd3a 100644
--- a/lib/eal/common/eal_options.h
+++ b/lib/eal/common/eal_options.h
@@ -37,6 +37,8 @@ enum {
OPT_LOG_LEVEL_NUM,
 #define OPT_LOG_TIMESTAMP "log-timestamp"
OPT_LOG_TIMESTAMP_NUM,
+#define OPT_LOG_COLOR"log-color"
+   OPT_LOG_COLOR_NUM,
 #define OPT_TRACE "trace"
OPT_TRACE_NUM,
 #define OPT_TRACE_DIR "trace-dir"
diff --git a/lib/log/log.c b/lib/log/log.c
index 0f7bdb3f25..26a63024be 100644
--- a/lib/log/log.c
+++ b/lib/log/log.c
@@ -23,6 +23,7 @@
 #include 
 #endif
 
+#include 
 #include 
 #include 
 
@@ -49,6 +50,12 @@ enum eal_log_syslog {
EAL_LOG_SYSLOG_BOTH,/* log to both syslog and stderr */
 };
 
+enum eal_log_color {
+   EAL_LOG_COLOR_AUTO = 0, /* default */
+   EAL_LOG_COLOR_NEVER,
+   EAL_LOG_COLOR_ALWAYS,
+};
+
 typedef int (*log_print_t)(FILE *f, uint32_t level, const char *fmt, va_list 
ap);
 static int log_print(FILE *f, uint32_t level, const char *format, va_list ap);
 
@@ -64,6 +71,7 @@ static struct rte_logs {
 #endif
log_print_t print_func;
 
+   enum eal_log_color color_mode;
enum eal_log_time_format time_format;
struct timespec started;   /* when log was initialized */
struct timespec previous;  /* when last msg was printed */
@@ -715,6 +723,74 @@ format_timestamp(char *tsbuf, size_t tsbuflen)
return 0;
 }
 
+enum color {
+   COLOR_NONE,
+   COLOR_RED,
+   COLOR_GREEN,
+   COLOR_YELLOW,
+   COLOR_BLUE,
+   COLOR_MAGENTA,
+   COLOR_CYAN,
+   COLOR_WHITE,
+   COLOR_BOLD,
+   COLOR_CLEAR
+};
+
+static const char * const color_code[] = {
+   [COLOR_NONE]= "",
+   [COLOR_RED] = "\e[31m",
+   [COLOR_GREEN]   = "\e[32m",
+   [COLOR_YELLOW]  = "\e[33m",
+   [COLOR_BLUE]= "\e[34m",
+   [COLOR_MAGENTA] = "\e[35m",
+   [COLOR_CYAN]= "\e[36m",
+   [COLOR_WHITE]   = "\e[37m",
+   [COLOR_BOLD]= "\e[1m",
+   [COLOR_CLEAR]   = "\e[0m",
+};
+
+__rte_format_printf(3, 4)
+static int color_fprintf(FILE *out, enum color color, const char *fmt, ...)
+{
+   va_list args;
+   int ret = 0;
+
+   va_start(args, fmt);
+   ret = fprintf(out, "%s", color_code[color]);
+   ret += vfprintf(out, fmt, args);
+   ret += fprintf(out, "%s", color_code[COLOR_CLEAR]);
+
+   return ret;
+}
+
+static ssize_t
+color_log_write(FILE *f, int level, char *msg)
+{

[PATCH v12 14/14] maintainers: add for log library

2024-03-25 Thread Stephen Hemminger

"You touch it you own it"
Add myself as maintainer for log library.

Signed-off-by: Stephen Hemminger 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7abb3aee49..54c28a601d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -180,6 +180,7 @@ F: app/test/test_threads.c
 F: app/test/test_version.c
 
 Logging
+M: Stephen Hemminger 
 F: lib/log/
 F: doc/guides/prog_guide/log_lib.rst
 F: app/test/test_logs.c
-- 
2.43.0

Re: [PATCH v2 1/6] ethdev: support setting lanes

2024-03-25 Thread Damodharam Ammepalli

On Mon, Mar 25, 2024 at 2:30 AM Thomas Monjalon  wrote:
>
> 25/03/2024 07:24, huangdengdui:
> >
> > On 2024/3/22 21:58, Thomas Monjalon wrote:
> > > 22/03/2024 08:09, Dengdui Huang:
> > >> -#define RTE_ETH_LINK_SPEED_10G RTE_BIT32(8)  /**< 10 Gbps */
> > >> -#define RTE_ETH_LINK_SPEED_20G RTE_BIT32(9)  /**< 20 Gbps */
> > >> -#define RTE_ETH_LINK_SPEED_25G RTE_BIT32(10) /**< 25 Gbps */
> > >> -#define RTE_ETH_LINK_SPEED_40G RTE_BIT32(11) /**< 40 Gbps */
> > >> -#define RTE_ETH_LINK_SPEED_50G RTE_BIT32(12) /**< 50 Gbps */
> > >> -#define RTE_ETH_LINK_SPEED_56G RTE_BIT32(13) /**< 56 Gbps */
> > >> -#define RTE_ETH_LINK_SPEED_100GRTE_BIT32(14) /**< 100 Gbps */
> > >> -#define RTE_ETH_LINK_SPEED_200GRTE_BIT32(15) /**< 200 Gbps */
> > >> -#define RTE_ETH_LINK_SPEED_400GRTE_BIT32(16) /**< 400 Gbps */
> > >> +#define RTE_ETH_LINK_SPEED_10GRTE_BIT32(8)  /**< 10 Gbps */
> > >> +#define RTE_ETH_LINK_SPEED_20GRTE_BIT32(9)  /**< 20 Gbps 
> > >> 2lanes */
> > >> +#define RTE_ETH_LINK_SPEED_25GRTE_BIT32(10) /**< 25 Gbps */
> > >> +#define RTE_ETH_LINK_SPEED_40GRTE_BIT32(11) /**< 40 Gbps 
> > >> 4lanes */
> > >> +#define RTE_ETH_LINK_SPEED_50GRTE_BIT32(12) /**< 50 Gbps */
> > >> +#define RTE_ETH_LINK_SPEED_56GRTE_BIT32(13) /**< 56 Gbps 
> > >> 4lanes */
> > >> +#define RTE_ETH_LINK_SPEED_100G   RTE_BIT32(14) /**< 100 Gbps */
> > >> +#define RTE_ETH_LINK_SPEED_200G   RTE_BIT32(15) /**< 200 Gbps 
> > >> 4lanes */
> > >> +#define RTE_ETH_LINK_SPEED_400G   RTE_BIT32(16) /**< 400 Gbps 
> > >> 4lanes */
> > >> +#define RTE_ETH_LINK_SPEED_10G_4LANES RTE_BIT32(17)  /**< 10 Gbps 
> > >> 4lanes */
> > >> +#define RTE_ETH_LINK_SPEED_50G_2LANES RTE_BIT32(18) /**< 50 Gbps 2 
> > >> lanes */
> > >> +#define RTE_ETH_LINK_SPEED_100G_2LANESRTE_BIT32(19) /**< 100 Gbps 2 
> > >> lanes */
> > >> +#define RTE_ETH_LINK_SPEED_100G_4LANESRTE_BIT32(20) /**< 100 Gbps 
> > >> 4lanes */
> > >> +#define RTE_ETH_LINK_SPEED_200G_2LANESRTE_BIT32(21) /**< 200 Gbps 
> > >> 2lanes */
> > >> +#define RTE_ETH_LINK_SPEED_400G_8LANESRTE_BIT32(22) /**< 400 Gbps 
> > >> 8lanes */
> > >
> > > I don't think it is a good idea to make this more complex.
> > > It brings nothing as far as I can see, compared to having speed and lanes 
> > > separated.
> > > Can we have lanes information a separate value? no need for bitmask.
> > >
> > Hi,Thomas, Ajit, roretzla, damodharam
> >
> > I also considered the option at the beginning of the design.
> > But this option is not used due to the following reasons:
> > 1. For the user, ethtool couples speed and lanes.
> > The result of querying the NIC capability is as follows:
> > Supported link modes:
> > 10baseSR4/Full
> > 10baseSR2/Full
> > The NIC capability is configured as follows:
> > ethtool -s eth1 speed 10 lanes 4 autoneg off
> > ethtool -s eth1 speed 10 lanes 2 autoneg off
> >
> > Therefore, users are more accustomed to the coupling of speed and lanes.
> >
> > 2. For the PHY, When the physical layer capability is configured through 
> > the MDIO,
> > the speed and lanes are also coupled.
> > For example:
> > Table 45–7—PMA/PMD control 2 register bit definitions[1]
> > PMA/PMD type selection
> > 1 0 0 1 0 1 0 = 100GBASE-SR2 PMA/PMD
> > 0 1 0 1 1 1 1 = 100GBASE-SR4 PMA/PMD
> >
> > Therefore, coupling speeds and lanes is easier to understand.
> > And it is easier for the driver to report the support lanes.
> >
> > In addition, the code implementation is compatible with the old version.
> > When the driver does not support the lanes setting, the code does not need 
> > to be modified.
> >
> > So I think the speed and lanes coupling is better.
>
> I don't think so.
> You are mixing hardware implementation, user tool, and API.
> Having a separate and simple API is cleaner and not more difficult to handle
> in some get/set style functions.
>
>
>

Agree with Thomas. DPDK lib/apps should be independent of HW implementation.

-- 
This electronic communication and the information and any files transmitted 
with it, or attached to it, are confidential and are intended solely for 
the use of the individual or entity to whom it is addressed and may contain 
information that is confidential, legally privileged, protected by privacy 
laws, or otherwise restricted from disclosure to anyone else. If you are 
not the intended recipient or the person responsible for delivering the 
e-mail to the intended recipient, you are hereby notified that any use, 
copying, distributing, dissemination, forwarding, printing, or copying of 
this e-mail is strictly prohibited. If you received this e-mail in error, 
please return the e-mail to the sender, delete it from your computer, and 
destroy any printed copy of it.


smime.p7s
Description: S/MIME Cryptographic Signature

Re: meson option to customize RTE_PKTMBUF_HEADROOM patch

2024-03-25 Thread Garrett D'Amore

So we need (for reasons that I don't want to get to into in too much detail) 
that our UDP payload headers are at a specific offset in the packet.

This was not a problem as long as we only used IPv4.  (We have configured 40 
bytes of headroom, which is more than any of our PMDs need by a hefty margin.)

Now that we're extending to support IPv6, we need to reduce that headroom by 20 
bytes, to preserve our UDP payload offset.

This has big ramifications for how we fragment our own upper layer messages, 
and it has been determined that updating the PMDs to allow us to change the 
headroom for this use case (on a per port basis, as we will have some ports on 
IPv4 and others on IPv6) is the least effort, but a large margin.  (Well, 
copying the frames via memcpy would be less development effort, but would be a 
performance catastrophe.)

For transmit side we don't need this, as we can simply adjust the packet as 
needed.  But for the receive side, we are kind of stuck, as the PMDs rely on 
the hard coded RTE_PKTMBUF_HEADROOM to program receive locations.

As far as header splitting, that would indeed be a much much nicer solution.

I haven't looked in the latest code to see if header splitting is even an 
option -- the version of the DPDK I'm working with is a little older (20.11) -- 
we have to update but we have other local changes and so updating is one of the 
things that we still have to do.

At any rate, the version I did look at doesn't seem to support header splits on 
any device other than FM10K.  That's not terrifically interesting for us.  We 
use Mellanox, E810 (ICE), bnxt, cloud NICs (all of them really -- ENA, 
virtio-net, etc.)   We also have a fair amount of ixgbe and i40e on client 
systems in the field.

We also, unfortunately, have an older DPDK 18 with Mellanox contributions for 
IPoverIB though I'm not sure we will try to support IPv6 there.  (We are 
working towards replacing that part of stack with UCX.)

Unless header splitting will work on all of this (excepting the IPoIB piece), 
then it's not something we can really use.
On Mar 25, 2024 at 10:20 AM -0700, Stephen Hemminger 
, wrote:
> On Mon, 25 Mar 2024 10:01:52 +
> Bruce Richardson  wrote:
>
> > On Sat, Mar 23, 2024 at 01:51:25PM -0700, Garrett D'Amore wrote:
> > > > So we right now (at WEKA) have a somewhat older version of DPDK that we
> > > > have customized heavily, and I am going to to need to to make the
> > > > headroom *dynamic* (passed in at run time, and per port.)
> > > > We have this requirement because we need payload to be at a specific
> > > > offset, but have to deal with different header lengths for IPv4 and now
> > > > IPv6.
> > > > My reason for pointing this out, is that I would dearly like if we
> > > > could collaborate on this -- this change is going to touch pretty much
> > > > every PMD (we don't need it on all of them as we only support a subset
> > > > of PMDs, but its still a significant set.)
> > > > I'm not sure if anyone else has considered such a need -- this
> > > > particular message caught my eye as I'm looking specifically in this
> > > > area right now.
> > > >
> > Hi
> >
> > thanks for reaching out. Can you clarify a little more as to the need for
> > this requirement? Can you not just set the headroom value to the max needed
> > value for any port and use that? Is there an issue with having blank space
> > at the start of a buffer?
> >
> > Thanks,
> > /Bruce
>
> If you have to make such a deep change across all PMD's then maybe
> it is not the best solution. What about being able to do some form of buffer
> chaining or pullup.

Re: [PATCH v2 1/6] ethdev: support setting lanes

2024-03-25 Thread lihuisong (C)




在 2024/3/25 17:30, Thomas Monjalon 写道:

25/03/2024 07:24, huangdengdui:

On 2024/3/22 21:58, Thomas Monjalon wrote:

22/03/2024 08:09, Dengdui Huang:

-#define RTE_ETH_LINK_SPEED_10G RTE_BIT32(8)  /**< 10 Gbps */
-#define RTE_ETH_LINK_SPEED_20G RTE_BIT32(9)  /**< 20 Gbps */
-#define RTE_ETH_LINK_SPEED_25G RTE_BIT32(10) /**< 25 Gbps */
-#define RTE_ETH_LINK_SPEED_40G RTE_BIT32(11) /**< 40 Gbps */
-#define RTE_ETH_LINK_SPEED_50G RTE_BIT32(12) /**< 50 Gbps */
-#define RTE_ETH_LINK_SPEED_56G RTE_BIT32(13) /**< 56 Gbps */
-#define RTE_ETH_LINK_SPEED_100GRTE_BIT32(14) /**< 100 Gbps */
-#define RTE_ETH_LINK_SPEED_200GRTE_BIT32(15) /**< 200 Gbps */
-#define RTE_ETH_LINK_SPEED_400GRTE_BIT32(16) /**< 400 Gbps */
+#define RTE_ETH_LINK_SPEED_10GRTE_BIT32(8)  /**< 10 Gbps */
+#define RTE_ETH_LINK_SPEED_20GRTE_BIT32(9)  /**< 20 Gbps 2lanes */
+#define RTE_ETH_LINK_SPEED_25GRTE_BIT32(10) /**< 25 Gbps */
+#define RTE_ETH_LINK_SPEED_40GRTE_BIT32(11) /**< 40 Gbps 4lanes */
+#define RTE_ETH_LINK_SPEED_50GRTE_BIT32(12) /**< 50 Gbps */
+#define RTE_ETH_LINK_SPEED_56GRTE_BIT32(13) /**< 56 Gbps 4lanes */
+#define RTE_ETH_LINK_SPEED_100G   RTE_BIT32(14) /**< 100 Gbps */
+#define RTE_ETH_LINK_SPEED_200G   RTE_BIT32(15) /**< 200 Gbps 4lanes */
+#define RTE_ETH_LINK_SPEED_400G   RTE_BIT32(16) /**< 400 Gbps 4lanes */
+#define RTE_ETH_LINK_SPEED_10G_4LANES RTE_BIT32(17)  /**< 10 Gbps 4lanes */
+#define RTE_ETH_LINK_SPEED_50G_2LANES RTE_BIT32(18) /**< 50 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_100G_2LANESRTE_BIT32(19) /**< 100 Gbps 2 lanes 
*/
+#define RTE_ETH_LINK_SPEED_100G_4LANESRTE_BIT32(20) /**< 100 Gbps 4lanes */
+#define RTE_ETH_LINK_SPEED_200G_2LANESRTE_BIT32(21) /**< 200 Gbps 2lanes */
+#define RTE_ETH_LINK_SPEED_400G_8LANESRTE_BIT32(22) /**< 400 Gbps 8lanes */

I don't think it is a good idea to make this more complex.
It brings nothing as far as I can see, compared to having speed and lanes 
separated.
Can we have lanes information a separate value? no need for bitmask.


Hi,Thomas, Ajit, roretzla, damodharam

I also considered the option at the beginning of the design.
But this option is not used due to the following reasons:
1. For the user, ethtool couples speed and lanes.
The result of querying the NIC capability is as follows:
Supported link modes:
 10baseSR4/Full
 10baseSR2/Full
The NIC capability is configured as follows:
ethtool -s eth1 speed 10 lanes 4 autoneg off
ethtool -s eth1 speed 10 lanes 2 autoneg off

Therefore, users are more accustomed to the coupling of speed and lanes.

2. For the PHY, When the physical layer capability is configured through the 
MDIO,
the speed and lanes are also coupled.
For example:
Table 45–7—PMA/PMD control 2 register bit definitions[1]
PMA/PMD type selection
 1 0 0 1 0 1 0 = 100GBASE-SR2 PMA/PMD
 0 1 0 1 1 1 1 = 100GBASE-SR4 PMA/PMD

Therefore, coupling speeds and lanes is easier to understand.
And it is easier for the driver to report the support lanes.

In addition, the code implementation is compatible with the old version.
When the driver does not support the lanes setting, the code does not need to 
be modified.

So I think the speed and lanes coupling is better.

I don't think so.
You are mixing hardware implementation, user tool, and API.
Having a separate and simple API is cleaner and not more difficult to handle
in some get/set style functions.

Having a separate and simple API is cleaner. It's good.
But supported lane capabilities have a lot to do with the specified 
speed. This is determined by releated specification.
If we add a separate API for speed lanes, it probably is hard to check 
the validity of the configuration for speed and lanes.
And the setting lane API sepparated from speed is not good for 
uniforming all PMD's behavior in ethdev layer.


The patch[1] is an example for this separate API.
I think it is not very good. It cannot tell user and PMD the follow points:
1) user don't know what lanes should or can be set for a specified speed 
on one NIC.

2) how should PMD do for a supported lanes in their HW?

Anyway, if we add setting speed lanes feature, we must report and set 
speed and lanes capabilities for user well.

otherwise, user will be more confused.

[1] https://patchwork.dpdk.org/project/dpdk/list/?series=31606

BR,
/Huisong




.

[PATCH v13 00/11] Logging unification and improvements

2024-03-25 Thread Stephen Hemminger

Improvements and unification of logging library (for 24.07 release).
This version works on all platforms: Linux, Windows and FreeBSD.

This is update to rework patch set. It adds several new features
to the console log output.

  * Putting a timestamp on console output which is useful for
analyzing performance of startup codes. Timestamp is optional
and must be enabled on command line.

  * Displaying console output with colors.
It uses the standard conventions used by many other Linux commands
for colorized display.  The default is to enable color if the
console output is going to a terminal. But it can be always
on or disabled by command line flag. This default was chosen
based on what dmesg(1) command does.

I find color helpful because DPDK drivers and libraries print
lots of not very useful messages. And having error messages
highlighted in bold face helps. This might also get users to
pay more attention to error messages. Many bug reports have
earlier messages that are lost because there are so many
info messages.

  * Add support for automatic detection of systemd journal
protocol. If running as systemd service will get enhanced
logging.

  * Use of syslog is optional and the meaning of the
--syslog flag has changed. The default is *not* to use
syslog. 

Add myself as maintainer for log because by now have added
more than previous authors...
Will add a release note in next release (after this is merged)

v13 - fix functional tests and warnings about unused result

v12 - add back syslog but make it optional.
  better shims for windows (thread safe)
  split out more of the eal core bits
  fix build warnings on FreeBSD and Ubuntu

Stephen Hemminger (11):
  windows: make getopt functions have const properties
  windows: add os shim for localtime_r
  eal: make eal_log_level_parse common
  eal: do not duplicate rte_init_alert() messages
  eal: change rte_exit() output to match rte_log()
  log: move handling of syslog facility out of eal
  eal: initialize log before everything else
  log: drop syslog support, and make code common
  log: add hook for printing log messages
  log: add timestamp option
  log: add optional support of syslog

 app/test/test_eal_flags.c |  40 ++-
 doc/guides/linux_gsg/linux_eal_parameters.rst |  27 --
 doc/guides/prog_guide/log_lib.rst |  43 +++
 lib/eal/common/eal_common_debug.c |  10 +-
 lib/eal/common/eal_common_options.c   | 116 ---
 lib/eal/common/eal_options.h  |   3 +
 lib/eal/freebsd/eal.c |  64 +---
 lib/eal/linux/eal.c   |  68 +---
 lib/eal/windows/eal.c |  49 +--
 lib/eal/windows/getopt.c  |  23 +-
 lib/eal/windows/include/getopt.h  |   8 +-
 lib/eal/windows/include/rte_os_shim.h |  10 +
 lib/log/log.c | 320 +-
 lib/log/log_freebsd.c |   5 +-
 lib/log/log_internal.h|  20 +-
 lib/log/log_linux.c   |  61 
 lib/log/log_windows.c |  18 -
 lib/log/meson.build   |   5 +-
 lib/log/version.map   |   3 +-
 19 files changed, 533 insertions(+), 360 deletions(-)
 delete mode 100644 lib/log/log_linux.c
 delete mode 100644 lib/log/log_windows.c

-- 
2.43.0

[PATCH v13 01/11] windows: make getopt functions have const properties

2024-03-25 Thread Stephen Hemminger

Having different prototypes on different platforms can lead
to lots of unnecessary workarounds.  Looks like the version of
getopt used from windows was based on an older out of date
version from FreeBSD.

This patch changes getopt, getopt_long, etc to have the same const
attributes as Linux and FreeBSD. The changes are derived from
the current FreeBSD version of getopt_long.

Signed-off-by: Stephen Hemminger 
Acked-by: Tyler Retzlaff 
Acked-by: Dmitry Kozlyuk 
---
 lib/eal/windows/getopt.c | 23 ---
 lib/eal/windows/include/getopt.h |  8 
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/lib/eal/windows/getopt.c b/lib/eal/windows/getopt.c
index a1f51c6c23..50ff71b930 100644
--- a/lib/eal/windows/getopt.c
+++ b/lib/eal/windows/getopt.c
@@ -20,7 +20,7 @@
 #include 
 #include 
 
-const char*optarg; /* argument associated with option */
+char*optarg;   /* argument associated with option */
 intopterr = 1; /* if error message should be printed */
 intoptind = 1; /* index into parent argv vector */
 intoptopt = '?';   /* character checked for validity */
@@ -39,9 +39,9 @@ static void pass(const char *a) {(void) a; }
 #defineBADARG  ((*options == ':') ? (int)':' : (int)'?')
 #defineINORDER 1
 
-#defineEMSG""
+static char EMSG[] = "";
 
-static const char *place = EMSG; /* option letter processing */
+static char *place = EMSG; /* option letter processing */
 
 /* XXX: set optreset to 1 rather than these two */
 static int nonopt_start = -1; /* first non option argument (for permute) */
@@ -80,7 +80,7 @@ gcd(int a, int b)
  */
 static void
 permute_args(int panonopt_start, int panonopt_end, int opt_end,
-   char **nargv)
+   char * const *nargv)
 {
int cstart, cyclelen, i, j, ncycle, nnonopts, nopts, pos;
char *swap;
@@ -101,11 +101,12 @@ permute_args(int panonopt_start, int panonopt_end, int 
opt_end,
pos -= nnonopts;
else
pos += nopts;
+
swap = nargv[pos];
/* LINTED const cast */
-   ((char **) nargv)[pos] = nargv[cstart];
+   ((char **)(uintptr_t)nargv)[pos] = nargv[cstart];
/* LINTED const cast */
-   ((char **)nargv)[cstart] = swap;
+   ((char **)(uintptr_t)nargv)[cstart] = swap;
}
}
 }
@@ -116,7 +117,7 @@ permute_args(int panonopt_start, int panonopt_end, int 
opt_end,
  * Returns -1 if short_too is set and the option does not match long_options.
  */
 static int
-parse_long_options(char **nargv, const char *options,
+parse_long_options(char * const *nargv, const char *options,
const struct option *long_options, int *idx, int short_too)
 {
const char *current_argv;
@@ -236,7 +237,7 @@ parse_long_options(char **nargv, const char *options,
  * Parse argc/argv argument vector.  Called by user level routines.
  */
 static int
-getopt_internal(int nargc, char **nargv, const char *options,
+getopt_internal(int nargc, char *const nargv[], const char *options,
const struct option *long_options, int *idx, int flags)
 {
char *oli;  /* option letter list index */
@@ -434,7 +435,7 @@ getopt_internal(int nargc, char **nargv, const char 
*options,
  * Parse argc/argv argument vector.
  */
 int
-getopt(int nargc, char *nargv[], const char *options)
+getopt(int nargc, char *const nargv[], const char *options)
 {
return getopt_internal(nargc, nargv, options, NULL, NULL,
   FLAG_PERMUTE);
@@ -445,7 +446,7 @@ getopt(int nargc, char *nargv[], const char *options)
  * Parse argc/argv argument vector.
  */
 int
-getopt_long(int nargc, char *nargv[], const char *options,
+getopt_long(int nargc, char *const nargv[], const char *options,
const struct option *long_options, int *idx)
 {
 
@@ -458,7 +459,7 @@ getopt_long(int nargc, char *nargv[], const char *options,
  * Parse argc/argv argument vector.
  */
 int
-getopt_long_only(int nargc, char *nargv[], const char *options,
+getopt_long_only(int nargc, char *const nargv[], const char *options,
const struct option *long_options, int *idx)
 {
 
diff --git a/lib/eal/windows/include/getopt.h b/lib/eal/windows/include/getopt.h
index 6f57af454b..e4cf6873cb 100644
--- a/lib/eal/windows/include/getopt.h
+++ b/lib/eal/windows/include/getopt.h
@@ -44,7 +44,7 @@
 
 
 /** argument to current option, or NULL if it has none */
-extern const char *optarg;
+extern char *optarg;
 /** Current position in arg string.  Starts from 1.
  * Setting to 0 resets state.
  */
@@ -80,14 +80,14 @@ struct option {
 };
 
 /** Compat: getopt */
-int getopt(int argc, char *argv[], const char *options);
+int getopt(int argc, char *const

[PATCH v13 02/11] windows: add os shim for localtime_r

2024-03-25 Thread Stephen Hemminger

Windows does not have localtime_r but it does have a similar
function that can be used instead.

Signed-off-by: Stephen Hemminger 
---
 lib/eal/windows/include/rte_os_shim.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/lib/eal/windows/include/rte_os_shim.h 
b/lib/eal/windows/include/rte_os_shim.h
index eda8113662..e9741a9df2 100644
--- a/lib/eal/windows/include/rte_os_shim.h
+++ b/lib/eal/windows/include/rte_os_shim.h
@@ -110,4 +110,14 @@ rte_clock_gettime(clockid_t clock_id, struct timespec *tp)
 }
 #define clock_gettime(clock_id, tp) rte_clock_gettime(clock_id, tp)
 
+static inline struct tm *
+rte_localtime_r(const time_t *timer, struct tm *buf)
+{
+   if (localtime_s(buf, timer) == 0)
+   return buf;
+   else
+   return NULL;
+}
+#define localtime_r(timer, buf) rte_localtime_r(timer, buf)
+
 #endif /* _RTE_OS_SHIM_ */
-- 
2.43.0

[PATCH v13 03/11] eal: make eal_log_level_parse common

2024-03-25 Thread Stephen Hemminger

The code to parse for log-level option should be same on
all OS variants.

Signed-off-by: Stephen Hemminger 
---
 lib/eal/common/eal_common_options.c | 46 +
 lib/eal/common/eal_options.h|  1 +
 lib/eal/freebsd/eal.c   | 42 --
 lib/eal/linux/eal.c | 39 
 lib/eal/windows/eal.c   | 35 --
 5 files changed, 47 insertions(+), 116 deletions(-)

diff --git a/lib/eal/common/eal_common_options.c 
b/lib/eal/common/eal_common_options.c
index e541f07939..5435399b85 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -1640,6 +1640,51 @@ eal_parse_huge_unlink(const char *arg, struct 
hugepage_file_discipline *out)
return -1;
 }
 
+/* Parse the all arguments looking for log related ones */
+int
+eal_log_level_parse(int argc, char * const argv[])
+{
+   struct internal_config *internal_conf = 
eal_get_internal_configuration();
+   int option_index, opt;
+   const int old_optind = optind;
+   const int old_optopt = optopt;
+   const int old_opterr = opterr;
+   char *old_optarg = optarg;
+#ifdef RTE_EXEC_ENV_FREEBSD
+   const int old_optreset = optreset;
+   optreset = 1;
+#endif
+
+   optind = 1;
+   opterr = 0;
+
+   while ((opt = getopt_long(argc, argv, eal_short_options,
+ eal_long_options, &option_index)) != EOF) {
+
+   switch (opt) {
+   case OPT_LOG_LEVEL_NUM:
+   if (eal_parse_common_option(opt, optarg, internal_conf) 
< 0)
+   return -1;
+   break;
+   case '?':
+   /* getopt is not happy, stop right now */
+   goto out;
+   default:
+   continue;
+   }
+   }
+out:
+   /* restore getopt lib */
+   optind = old_optind;
+   optopt = old_optopt;
+   optarg = old_optarg;
+   opterr = old_opterr;
+#ifdef RTE_EXEC_ENV_FREEBSD
+   optreset = old_optreset;
+#endif
+   return 0;
+}
+
 int
 eal_parse_common_option(int opt, const char *optarg,
struct internal_config *conf)
@@ -2173,6 +2218,7 @@ rte_vect_set_max_simd_bitwidth(uint16_t bitwidth)
return 0;
 }
 
+
 void
 eal_common_usage(void)
 {
diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h
index 3cc9cb6412..f3f2e104f6 100644
--- a/lib/eal/common/eal_options.h
+++ b/lib/eal/common/eal_options.h
@@ -96,6 +96,7 @@ enum {
 extern const char eal_short_options[];
 extern const struct option eal_long_options[];
 
+int eal_log_level_parse(int argc, char * const argv[]);
 int eal_parse_common_option(int opt, const char *argv,
struct internal_config *conf);
 int eal_option_device_parse(void);
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index bab77118e9..9825bcea0b 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -363,48 +363,6 @@ eal_get_hugepage_mem_size(void)
return (size < SIZE_MAX) ? (size_t)(size) : SIZE_MAX;
 }
 
-/* Parse the arguments for --log-level only */
-static void
-eal_log_level_parse(int argc, char **argv)
-{
-   int opt;
-   char **argvopt;
-   int option_index;
-   const int old_optind = optind;
-   const int old_optopt = optopt;
-   const int old_optreset = optreset;
-   char * const old_optarg = optarg;
-   struct internal_config *internal_conf =
-   eal_get_internal_configuration();
-
-   argvopt = argv;
-   optind = 1;
-   optreset = 1;
-
-   while ((opt = getopt_long(argc, argvopt, eal_short_options,
- eal_long_options, &option_index)) != EOF) {
-
-   int ret;
-
-   /* getopt is not happy, stop right now */
-   if (opt == '?')
-   break;
-
-   ret = (opt == OPT_LOG_LEVEL_NUM) ?
-   eal_parse_common_option(opt, optarg, internal_conf) : 0;
-
-   /* common parser is not happy */
-   if (ret < 0)
-   break;
-   }
-
-   /* restore getopt lib */
-   optind = old_optind;
-   optopt = old_optopt;
-   optreset = old_optreset;
-   optarg = old_optarg;
-}
-
 /* Parse the argument given in the command line of the application */
 static int
 eal_parse_args(int argc, char **argv)
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index fd422f1f62..bffeb1f34e 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -546,45 +546,6 @@ eal_parse_vfio_vf_token(const char *vf_token)
return -1;
 }
 
-/* Parse the arguments for --log-level only */
-static void
-eal_log_level_parse(int argc, char **argv)
-{
-   int opt;
-   char **argvopt;
-   int option_index;
-   const int old_optind = optind;
-   const

[PATCH v13 04/11] eal: do not duplicate rte_init_alert() messages

2024-03-25 Thread Stephen Hemminger

The message already goes through logging, and does not need
to be printed on stderr. Message level should be ALERT
to match function name.

Signed-off-by: Stephen Hemminger 
---
 lib/eal/freebsd/eal.c | 3 +--
 lib/eal/linux/eal.c   | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 9825bcea0b..17b56f38aa 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -529,8 +529,7 @@ rte_eal_iopl_init(void)
 
 static void rte_eal_init_alert(const char *msg)
 {
-   fprintf(stderr, "EAL: FATAL: %s\n", msg);
-   EAL_LOG(ERR, "%s", msg);
+   EAL_LOG(ALERT, "%s", msg);
 }
 
 /* Launch threads, called at application init(). */
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index bffeb1f34e..23dc26b124 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -840,8 +840,7 @@ static int rte_eal_vfio_setup(void)
 
 static void rte_eal_init_alert(const char *msg)
 {
-   fprintf(stderr, "EAL: FATAL: %s\n", msg);
-   EAL_LOG(ERR, "%s", msg);
+   EAL_LOG(ALERT, "%s", msg);
 }
 
 /*
-- 
2.43.0

[PATCH v13 05/11] eal: change rte_exit() output to match rte_log()

2024-03-25 Thread Stephen Hemminger

The rte_exit() output format confuses the timestamp and coloring
options. Change it to use be a single line with proper prefix.

Before:
[ 0.006481] EAL: Error - exiting with code: 1
  Cause: [ 0.006489] Cannot init EAL: Permission denied

After:
[ 0.006238] EAL: Error - exiting with code: 1
[ 0.006250] EAL: Cause - Cannot init EAL: Permission denied

Signed-off-by: Stephen Hemminger 
---
 lib/eal/common/eal_common_debug.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/lib/eal/common/eal_common_debug.c 
b/lib/eal/common/eal_common_debug.c
index 3e77995896..71bb5a7d34 100644
--- a/lib/eal/common/eal_common_debug.c
+++ b/lib/eal/common/eal_common_debug.c
@@ -34,17 +34,17 @@ void
 rte_exit(int exit_code, const char *format, ...)
 {
va_list ap;
+   char *msg = NULL;
 
if (exit_code != 0)
-   RTE_LOG(CRIT, EAL, "Error - exiting with code: %d\n"
-   "  Cause: ", exit_code);
+   EAL_LOG(CRIT, "Error - exiting with code: %d", exit_code);
 
va_start(ap, format);
-   rte_vlog(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, format, ap);
+   if (vasprintf(&msg, format, ap) > 0)
+   rte_log(RTE_LOG_CRIT, RTE_LOGTYPE_EAL, "EAL: Cause - %s", msg);
va_end(ap);
 
if (rte_eal_cleanup() != 0 && rte_errno != EALREADY)
-   EAL_LOG(CRIT,
-   "EAL could not release all resources");
+   EAL_LOG(CRIT, "EAL could not release all resources");
exit(exit_code);
 }
-- 
2.43.0

[PATCH v13 06/11] log: move handling of syslog facility out of eal

2024-03-25 Thread Stephen Hemminger

The syslog facility property is better handled in lib/log
rather than in eal. This also allows for changes to what
syslog flag means in later steps.

Signed-off-by: Stephen Hemminger 
---
 lib/eal/common/eal_common_options.c | 51 ++---
 lib/eal/freebsd/eal.c   |  5 ++-
 lib/eal/linux/eal.c |  7 ++--
 lib/eal/windows/eal.c   |  6 ++--
 lib/log/log_freebsd.c   |  2 +-
 lib/log/log_internal.h  |  5 ++-
 lib/log/log_linux.c | 47 --
 lib/log/log_windows.c   |  8 -
 lib/log/version.map |  1 +
 9 files changed, 68 insertions(+), 64 deletions(-)

diff --git a/lib/eal/common/eal_common_options.c 
b/lib/eal/common/eal_common_options.c
index 5435399b85..661b2db211 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -6,9 +6,6 @@
 #include 
 #include 
 #include 
-#ifndef RTE_EXEC_ENV_WINDOWS
-#include 
-#endif
 #include 
 #include 
 #include 
@@ -349,10 +346,6 @@ eal_reset_internal_config(struct internal_config 
*internal_cfg)
}
internal_cfg->base_virtaddr = 0;
 
-#ifdef LOG_DAEMON
-   internal_cfg->syslog_facility = LOG_DAEMON;
-#endif
-
/* if set to NONE, interrupt mode is determined automatically */
internal_cfg->vfio_intr_mode = RTE_INTR_MODE_NONE;
memset(internal_cfg->vfio_vf_token, 0,
@@ -1297,47 +1290,6 @@ eal_parse_lcores(const char *lcores)
return ret;
 }
 
-#ifndef RTE_EXEC_ENV_WINDOWS
-static int
-eal_parse_syslog(const char *facility, struct internal_config *conf)
-{
-   int i;
-   static const struct {
-   const char *name;
-   int value;
-   } map[] = {
-   { "auth", LOG_AUTH },
-   { "cron", LOG_CRON },
-   { "daemon", LOG_DAEMON },
-   { "ftp", LOG_FTP },
-   { "kern", LOG_KERN },
-   { "lpr", LOG_LPR },
-   { "mail", LOG_MAIL },
-   { "news", LOG_NEWS },
-   { "syslog", LOG_SYSLOG },
-   { "user", LOG_USER },
-   { "uucp", LOG_UUCP },
-   { "local0", LOG_LOCAL0 },
-   { "local1", LOG_LOCAL1 },
-   { "local2", LOG_LOCAL2 },
-   { "local3", LOG_LOCAL3 },
-   { "local4", LOG_LOCAL4 },
-   { "local5", LOG_LOCAL5 },
-   { "local6", LOG_LOCAL6 },
-   { "local7", LOG_LOCAL7 },
-   { NULL, 0 }
-   };
-
-   for (i = 0; map[i].name; i++) {
-   if (!strcmp(facility, map[i].name)) {
-   conf->syslog_facility = map[i].value;
-   return 0;
-   }
-   }
-   return -1;
-}
-#endif
-
 static void
 eal_log_usage(void)
 {
@@ -1663,6 +1615,7 @@ eal_log_level_parse(int argc, char * const argv[])
 
switch (opt) {
case OPT_LOG_LEVEL_NUM:
+   case OPT_SYSLOG_NUM:
if (eal_parse_common_option(opt, optarg, internal_conf) 
< 0)
return -1;
break;
@@ -1882,7 +1835,7 @@ eal_parse_common_option(int opt, const char *optarg,
 
 #ifndef RTE_EXEC_ENV_WINDOWS
case OPT_SYSLOG_NUM:
-   if (eal_parse_syslog(optarg, conf) < 0) {
+   if (eal_log_syslog(optarg) < 0) {
EAL_LOG(ERR, "invalid parameters for --"
OPT_SYSLOG);
return -1;
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 17b56f38aa..6552f9c138 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -392,8 +391,8 @@ eal_parse_args(int argc, char **argv)
goto out;
}
 
-   /* eal_log_level_parse() already handled this option */
-   if (opt == OPT_LOG_LEVEL_NUM)
+   /* eal_log_level_parse() already handled these */
+   if (opt == OPT_LOG_LEVEL_NUM || opt == OPT_LOG_SYSLOG_NUM)
continue;
 
ret = eal_parse_common_option(opt, optarg, internal_conf);
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 23dc26b124..3d0c34063e 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -610,8 +610,8 @@ eal_parse_args(int argc, char **argv)
goto out;
}
 
-   /* eal_log_level_parse() already handled this option */
-   if (opt == OPT_LOG_LEVEL_NUM)
+   /* eal_log_level_parse() already handled these options */
+   if (opt == OPT_LOG_LEVEL_NUM || opt == OPT_SYSLOG_NUM)
continue;
 
ret = eal_parse_common_option(opt, optarg, internal_conf);
@@ -1106,8 +1106,7 @@ rte_eal_init(int argc, ch

[PATCH v13 08/11] log: drop syslog support, and make code common

2024-03-25 Thread Stephen Hemminger

This patch makes the log setup code common across all platforms.

Drops syslog support for now, will come back in later patch.

Signed-off-by: Stephen Hemminger 
---
 app/test/test_eal_flags.c   |  11 ++-
 lib/eal/common/eal_common_options.c |   3 -
 lib/log/log.c   |  29 +---
 lib/log/log_internal.h  |   6 --
 lib/log/log_linux.c | 102 
 lib/log/log_windows.c   |  22 --
 lib/log/meson.build |   5 +-
 lib/log/version.map |   1 -
 8 files changed, 26 insertions(+), 153 deletions(-)
 delete mode 100644 lib/log/log_linux.c
 delete mode 100644 lib/log/log_windows.c

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 6cb4b06757..36e3185a10 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -984,11 +984,10 @@ test_misc_flags(void)
const char *argv1[] = {prgname, prefix, mp_flag, "--no-pci"};
/* With -v */
const char *argv2[] = {prgname, prefix, mp_flag, "-v"};
+   /* With empty --syslog */
+   const char *argv3[] = {prgname, prefix, mp_flag, "--syslog"};
/* With valid --syslog */
-   const char *argv3[] = {prgname, prefix, mp_flag,
-   "--syslog", "syslog"};
-   /* With empty --syslog (should fail) */
-   const char *argv4[] = {prgname, prefix, mp_flag, "--syslog"};
+   const char *argv4[] = {prgname, prefix, mp_flag, "--syslog", "always"};
/* With invalid --syslog */
const char *argv5[] = {prgname, prefix, mp_flag, "--syslog", "error"};
/* With no-sh-conf, also use no-huge to ensure this test runs on BSD */
@@ -1083,8 +1082,8 @@ test_misc_flags(void)
printf("Error - process did not run ok with --syslog flag\n");
goto fail;
}
-   if (launch_proc(argv4) == 0) {
-   printf("Error - process run ok with empty --syslog flag\n");
+   if (launch_proc(argv4) != 0) {
+   printf("Error - process did not with --syslog always flag\n");
goto fail;
}
if (launch_proc(argv5) == 0) {
diff --git a/lib/eal/common/eal_common_options.c 
b/lib/eal/common/eal_common_options.c
index 661b2db211..9ab512e8a1 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -2212,9 +2212,6 @@ eal_common_usage(void)
   "  (can be used multiple times)\n"
   "  --"OPT_VMWARE_TSC_MAP"Use VMware TSC map instead of 
native RDTSC\n"
   "  --"OPT_PROC_TYPE" Type of this process 
(primary|secondary|auto)\n"
-#ifndef RTE_EXEC_ENV_WINDOWS
-  "  --"OPT_SYSLOG"Set syslog facility\n"
-#endif
   "  --"OPT_LOG_LEVEL"= Set global log level\n"
   "  --"OPT_LOG_LEVEL"=:\n"
   "  Set specific log level\n"
diff --git a/lib/log/log.c b/lib/log/log.c
index 255f757d94..f597da2e39 100644
--- a/lib/log/log.c
+++ b/lib/log/log.c
@@ -70,12 +70,13 @@ struct log_cur_msg {
  /* per core log */
 static RTE_DEFINE_PER_LCORE(struct log_cur_msg, log_cur_msg);
 
-/* default logs */
-
 /* Change the stream that will be used by logging system */
 int
 rte_openlog_stream(FILE *f)
 {
+   if (rte_logs.file != NULL)
+   fclose(rte_logs.file);
+
rte_logs.file = f;
return 0;
 }
@@ -505,13 +506,20 @@ rte_log(uint32_t level, uint32_t logtype, const char 
*format, ...)
return ret;
 }
 
+/* Placeholder */
+int
+eal_log_syslog(const char *mode __rte_unused)
+{
+   return -1;
+}
+
 /*
- * Called by environment-specific initialization functions.
+ * Called by rte_eal_init
  */
 void
-eal_log_set_default(FILE *default_log)
+eal_log_init(const char *id __rte_unused)
 {
-   default_log_stream = default_log;
+   default_log_stream = stderr;
 
 #if RTE_LOG_DP_LEVEL >= RTE_LOG_DEBUG
RTE_LOG(NOTICE, EAL,
@@ -525,8 +533,11 @@ eal_log_set_default(FILE *default_log)
 void
 rte_eal_log_cleanup(void)
 {
-   if (default_log_stream) {
-   fclose(default_log_stream);
-   default_log_stream = NULL;
-   }
+   FILE *log_stream = rte_log_get_stream();
+
+   /* don't close stderr on the application */
+   if (log_stream != stderr)
+   fclose(log_stream);
+
+   rte_logs.file = NULL;
 }
diff --git a/lib/log/log_internal.h b/lib/log/log_internal.h
index d5fabd7ef7..3c46328e7b 100644
--- a/lib/log/log_internal.h
+++ b/lib/log/log_internal.h
@@ -16,12 +16,6 @@
 __rte_internal
 void eal_log_init(const char *id);
 
-/*
- * Determine where log data is written when no call to rte_openlog_stream.
- */
-__rte_internal
-void eal_log_set_default(FILE *default_log);
-
 /*
  * Save a log option for later.
  */
diff --git a/lib/log/log_linux.c b/lib/log/log_linux.c
deleted file mode 100644
index 6d7dc8f3ab..00
--- a/lib/log/log_linux.c
+++ /dev/null

[PATCH v13 07/11] eal: initialize log before everything else

2024-03-25 Thread Stephen Hemminger

In order for all log messages (including CPU mismatch) to
come out through the logging library, it must be initialized
as early in rte_eal_init() as possible on all platforms.

Where it was done before was likely historical based on
the support of non-OS isolated CPU's which required a shared
memory buffer; that support was dropped before DPDK was
publicly released.

Signed-off-by: Stephen Hemminger 
---
 lib/eal/freebsd/eal.c  | 12 +---
 lib/eal/linux/eal.c| 19 +--
 lib/eal/windows/eal.c  |  8 ++--
 lib/log/log_freebsd.c  |  3 +--
 lib/log/log_internal.h |  2 +-
 lib/log/log_linux.c| 14 ++
 lib/log/log_windows.c  |  4 +---
 7 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 6552f9c138..55ff27a4da 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -52,6 +52,7 @@
 #include "eal_options.h"
 #include "eal_memcfg.h"
 #include "eal_trace.h"
+#include "log_internal.h"
 
 #define MEMSIZE_IF_NO_HUGE_PAGE (64ULL * 1024ULL * 1024ULL)
 
@@ -546,6 +547,14 @@ rte_eal_init(int argc, char **argv)
bool has_phys_addr;
enum rte_iova_mode iova_mode;
 
+   /* setup log as early as possible */
+   if (eal_log_level_parse(argc, argv) < 0) {
+   rte_eal_init_alert("invalid log arguments.");
+   rte_errno = EINVAL;
+   return -1;
+   }
+   eal_log_init(getprogname());
+
/* checks if the machine is adequate */
if (!rte_cpu_is_supported()) {
rte_eal_init_alert("unsupported cpu type.");
@@ -565,9 +574,6 @@ rte_eal_init(int argc, char **argv)
/* clone argv to report out later in telemetry */
eal_save_args(argc, argv);
 
-   /* set log level as early as possible */
-   eal_log_level_parse(argc, argv);
-
if (rte_eal_cpu_init() < 0) {
rte_eal_init_alert("Cannot detect lcores.");
rte_errno = ENOTSUP;
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 3d0c34063e..b9a0fb1742 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -936,6 +936,15 @@ rte_eal_init(int argc, char **argv)
struct internal_config *internal_conf =
eal_get_internal_configuration();
 
+   /* setup log as early as possible */
+   if (eal_log_level_parse(argc, argv) < 0) {
+   rte_eal_init_alert("invalid log arguments.");
+   rte_errno = EINVAL;
+   return -1;
+   }
+
+   eal_log_init(program_invocation_short_name);
+
/* checks if the machine is adequate */
if (!rte_cpu_is_supported()) {
rte_eal_init_alert("unsupported cpu type.");
@@ -952,9 +961,6 @@ rte_eal_init(int argc, char **argv)
 
eal_reset_internal_config(internal_conf);
 
-   /* set log level as early as possible */
-   eal_log_level_parse(argc, argv);
-
/* clone argv to report out later in telemetry */
eal_save_args(argc, argv);
 
@@ -1106,13 +1112,6 @@ rte_eal_init(int argc, char **argv)
 #endif
}
 
-   if (eal_log_init(program_invocation_short_name) < 0) {
-   rte_eal_init_alert("Cannot init logging.");
-   rte_errno = ENOMEM;
-   rte_atomic_store_explicit(&run_once, 0, 
rte_memory_order_relaxed);
-   return -1;
-   }
-
 #ifdef VFIO_PRESENT
if (rte_eal_vfio_setup() < 0) {
rte_eal_init_alert("Cannot init VFIO");
diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
index 2519a30017..74b3ece30c 100644
--- a/lib/eal/windows/eal.c
+++ b/lib/eal/windows/eal.c
@@ -250,9 +250,13 @@ rte_eal_init(int argc, char **argv)
char cpuset[RTE_CPU_AFFINITY_STR_LEN];
char thread_name[RTE_THREAD_NAME_SIZE];
 
-   eal_log_init(NULL);
+   if (eal_log_level_parse(argc, argv) < 0) {
+   rte_eal_init_alert("invalid log arguments.");
+   rte_errno = EINVAL;
+   return -1;
+   }
 
-   eal_log_level_parse(argc, argv);
+   eal_log_init(NULL);
 
if (eal_create_cpu_map() < 0) {
rte_eal_init_alert("Cannot discover CPU and NUMA.");
diff --git a/lib/log/log_freebsd.c b/lib/log/log_freebsd.c
index 953e371bee..33a0925c43 100644
--- a/lib/log/log_freebsd.c
+++ b/lib/log/log_freebsd.c
@@ -5,8 +5,7 @@
 #include 
 #include "log_internal.h"
 
-int
+void
 eal_log_init(__rte_unused const char *id)
 {
-   return 0;
 }
diff --git a/lib/log/log_internal.h b/lib/log/log_internal.h
index cb15cdff08..d5fabd7ef7 100644
--- a/lib/log/log_internal.h
+++ b/lib/log/log_internal.h
@@ -14,7 +14,7 @@
  * Initialize the default log stream.
  */
 __rte_internal
-int eal_log_init(const char *id);
+void eal_log_init(const char *id);
 
 /*
  * Determine where log data is written when no call to rte_openlog_stream.
diff --git a/lib/log/log_linux.c b/lib/log/log_linux.c
index 47aa074da2..6d7dc8f3ab 100644
--- a/lib/log/log_linux.c
+++ b/li

[PATCH v13 09/11] log: add hook for printing log messages

2024-03-25 Thread Stephen Hemminger

This is useful for when decorating log output for console
or journal. Provide basic version in this patch.

Signed-off-by: Stephen Hemminger 
---
 lib/log/log.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/lib/log/log.c b/lib/log/log.c
index f597da2e39..acd4c320b6 100644
--- a/lib/log/log.c
+++ b/lib/log/log.c
@@ -26,16 +26,21 @@ struct rte_log_dynamic_type {
uint32_t loglevel;
 };
 
+typedef int (*log_print_t)(FILE *f, uint32_t level, const char *fmt, va_list 
ap);
+static int log_print(FILE *f, uint32_t level, const char *format, va_list ap);
+
 /** The rte_log structure. */
 static struct rte_logs {
uint32_t type;  /**< Bitfield with enabled logs. */
uint32_t level; /**< Log level. */
FILE *file; /**< Output file set by rte_openlog_stream, or NULL. */
+   log_print_t print_func;
size_t dynamic_types_len;
struct rte_log_dynamic_type *dynamic_types;
 } rte_logs = {
.type = UINT32_MAX,
.level = RTE_LOG_DEBUG,
+   .print_func = log_print,
 };
 
 struct rte_eal_opt_loglevel {
@@ -78,6 +83,7 @@ rte_openlog_stream(FILE *f)
fclose(rte_logs.file);
 
rte_logs.file = f;
+   rte_logs.print_func = log_print;
return 0;
 }
 
@@ -484,7 +490,7 @@ rte_vlog(uint32_t level, uint32_t logtype, const char 
*format, va_list ap)
RTE_PER_LCORE(log_cur_msg).loglevel = level;
RTE_PER_LCORE(log_cur_msg).logtype = logtype;
 
-   ret = vfprintf(f, format, ap);
+   ret = (*rte_logs.print_func)(f, level, format, ap);
fflush(f);
return ret;
 }
@@ -513,6 +519,15 @@ eal_log_syslog(const char *mode __rte_unused)
return -1;
 }
 
+/* default log print function */
+__rte_format_printf(3, 0)
+static int
+log_print(FILE *f, uint32_t level __rte_unused,
+ const char *format, va_list ap)
+{
+   return vfprintf(f, format, ap);
+}
+
 /*
  * Called by rte_eal_init
  */
-- 
2.43.0

[PATCH v13 10/11] log: add timestamp option

2024-03-25 Thread Stephen Hemminger

When debugging driver or startup issues, it is useful to have
a timestamp on each message printed. The messages in syslog
already have a timestamp, but often syslog is not available
during testing.

There are multiple timestamp formats similar to Linux dmesg.
The default is time relative since startup (when first
step of logging initialization is done by constructor).
Other alternative formats are delta, ctime, reltime and iso formats.

Example:
$ dpdk-testpmd --log-timestamp -- -i
[ 0.008610] EAL: Detected CPU lcores: 8
[ 0.008634] EAL: Detected NUMA nodes: 1
[ 0.008792] EAL: Detected static linkage of DPDK
[ 0.010620] EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
[ 0.012618] EAL: Selected IOVA mode 'VA'
[ 0.016675] testpmd: No probed ethernet devices
Interactive-mode selected

Signed-off-by: Stephen Hemminger 
---
 app/test/test_eal_flags.c   |  26 
 doc/guides/prog_guide/log_lib.rst   |  26 
 lib/eal/common/eal_common_options.c |  14 ++-
 lib/eal/common/eal_options.h|   2 +
 lib/eal/freebsd/eal.c   |   6 +-
 lib/eal/linux/eal.c |   4 +-
 lib/eal/windows/eal.c   |   4 +-
 lib/log/log.c   | 183 +++-
 lib/log/log_internal.h  |   9 ++
 lib/log/version.map |   1 +
 10 files changed, 268 insertions(+), 7 deletions(-)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index 36e3185a10..e54f6e8b7f 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -1054,6 +1054,19 @@ test_misc_flags(void)
const char * const argv22[] = {prgname, prefix, mp_flag,
   "--huge-worker-stack=512"};
 
+   /* Try running with --log-timestamp */
+   const char * const argv23[] = {prgname, prefix, mp_flag,
+  "--log-timestamp" };
+
+   /* Try running with --log-timestamp=iso */
+   const char * const argv24[] = {prgname, prefix, mp_flag,
+  "--log-timestamp=iso" };
+
+   /* Try running with invalid timestamp */
+   const char * const argv25[] = {prgname, prefix, mp_flag,
+  "--log-timestamp=invalid" };
+
+
/* run all tests also applicable to FreeBSD first */
 
if (launch_proc(argv0) == 0) {
@@ -1161,6 +1174,19 @@ test_misc_flags(void)
printf("Error - process did not run ok with 
--huge-worker-stack=size parameter\n");
goto fail;
}
+   if (launch_proc(argv23) != 0) {
+   printf("Error - process did not run ok with --log-timestamp 
parameter\n");
+   goto fail;
+   }
+   if (launch_proc(argv24) != 0) {
+   printf("Error - process did not run ok with --log-timestamp=iso 
parameter\n");
+   goto fail;
+   }
+   if (launch_proc(argv25) == 0) {
+   printf("Error - process did run ok with --log-timestamp=invalid 
parameter\n");
+   goto fail;
+   }
+
 
rmdir(hugepath_dir3);
rmdir(hugepath_dir2);
diff --git a/doc/guides/prog_guide/log_lib.rst 
b/doc/guides/prog_guide/log_lib.rst
index ff9d1b54a2..504eefe1d2 100644
--- a/doc/guides/prog_guide/log_lib.rst
+++ b/doc/guides/prog_guide/log_lib.rst
@@ -59,6 +59,32 @@ For example::
 
 Within an application, the same result can be got using the 
``rte_log_set_level_pattern()`` or ``rte_log_set_level_regex()`` APIs.
 
+Log timestamp
+~
+
+An optional timestamp can be added before each message
+by adding the ``--log-timestamp`` option.
+For example::
+
+   /path/to/app --log-level=lib.*:debug --log-timestamp
+
+Multiple timestamp alternative timestamp formats are available:
+
+.. csv-table:: Log time stamp format
+   :header: "Format", "Description", "Example"
+   :widths: 6, 30, 32
+
+   "ctime", "Unix ctime", "``[Wed Mar 20 07:26:12 2024]``"
+   "delta", "Offset since last", "``[<3.162373>]``"
+   "reltime", "Seconds since last or time if minute changed", "``[  
+3.001791]`` or ``[Mar20 07:26:12]``"
+   "iso", "ISO-8601", "``[2024-03-20T07:26:12−07:00]``"
+
+To prefix all console messages with ISO format time the syntax is::
+
+   /path/to/app --log-timestamp=iso
+
+
+
 Using Logging APIs to Generate Log Messages
 ---
 
diff --git a/lib/eal/common/eal_common_options.c 
b/lib/eal/common/eal_common_options.c
index 9ab512e8a1..5173835c2c 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -74,6 +74,7 @@ eal_long_options[] = {
{OPT_IOVA_MODE, 1, NULL, OPT_IOVA_MODE_NUM},
{OPT_LCORES,1, NULL, OPT_LCORES_NUM   },
{OPT_LOG_LEVEL, 1, NULL, OPT_LOG_LEVEL_NUM},
+   {OPT_LOG_TIMESTAMP, 2, NULL, OPT_LOG_TIMESTAMP_NUM},
{OPT_TRACE, 1, NULL, OPT_TRACE_NUM

[PATCH v13 11/11] log: add optional support of syslog

2024-03-25 Thread Stephen Hemminger

Log to syslog only if option is specified. And if syslog is used
then normally only log to syslog, don't duplicate output.
Also enables syslog support on FreeBSD.

Signed-off-by: Stephen Hemminger 
---
 app/test/test_eal_flags.c |   5 +-
 doc/guides/linux_gsg/linux_eal_parameters.rst |  27 -
 doc/guides/prog_guide/log_lib.rst |  17 +++
 lib/eal/common/eal_common_options.c   |   2 +-
 lib/log/log.c | 101 --
 5 files changed, 114 insertions(+), 38 deletions(-)

diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c
index e54f6e8b7f..08f4866461 100644
--- a/app/test/test_eal_flags.c
+++ b/app/test/test_eal_flags.c
@@ -987,9 +987,10 @@ test_misc_flags(void)
/* With empty --syslog */
const char *argv3[] = {prgname, prefix, mp_flag, "--syslog"};
/* With valid --syslog */
-   const char *argv4[] = {prgname, prefix, mp_flag, "--syslog", "always"};
+   const char *argv4[] = {prgname, prefix, mp_flag, "--syslog=both"};
/* With invalid --syslog */
-   const char *argv5[] = {prgname, prefix, mp_flag, "--syslog", "error"};
+   const char *argv5[] = {prgname, prefix, mp_flag, "--syslog=invalid"};
+
/* With no-sh-conf, also use no-huge to ensure this test runs on BSD */
const char *argv6[] = {prgname, "-m", DEFAULT_MEM_SIZE,
no_shconf, nosh_prefix, no_huge};
diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst 
b/doc/guides/linux_gsg/linux_eal_parameters.rst
index ea8f381391..d86f94d8a8 100644
--- a/doc/guides/linux_gsg/linux_eal_parameters.rst
+++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
@@ -108,30 +108,3 @@ Memory-related options
 *   ``--match-allocations``
 
 Free hugepages back to system exactly as they were originally allocated.
-
-Other options
-~
-
-*   ``--syslog ``
-
-Set syslog facility. Valid syslog facilities are::
-
-auth
-cron
-daemon
-ftp
-kern
-lpr
-mail
-news
-syslog
-user
-uucp
-local0
-local1
-local2
-local3
-local4
-local5
-local6
-local7
diff --git a/doc/guides/prog_guide/log_lib.rst 
b/doc/guides/prog_guide/log_lib.rst
index 504eefe1d2..abaedc7212 100644
--- a/doc/guides/prog_guide/log_lib.rst
+++ b/doc/guides/prog_guide/log_lib.rst
@@ -83,6 +83,23 @@ To prefix all console messages with ISO format time the 
syntax is::
 
/path/to/app --log-timestamp=iso
 
+Log output
+~~
+
+If desired, messages can be redirected to syslog (on Linux and FreeBSD) with 
the ``--syslog``
+option. There are three possible settings for this option:
+
+*always*
+Redirect all log output to syslog.
+
+*auto*
+Use console if it is a terminal, and use syslog if is not.
+
+*both*
+Print to both console and syslog.
+
+If ``--syslog`` option is not specified, then only console (stderr) will be 
used.
+
 
 
 Using Logging APIs to Generate Log Messages
diff --git a/lib/eal/common/eal_common_options.c 
b/lib/eal/common/eal_common_options.c
index 5173835c2c..16884c5aa3 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -91,7 +91,7 @@ eal_long_options[] = {
{OPT_PROC_TYPE, 1, NULL, OPT_PROC_TYPE_NUM},
{OPT_SOCKET_MEM,1, NULL, OPT_SOCKET_MEM_NUM   },
{OPT_SOCKET_LIMIT,  1, NULL, OPT_SOCKET_LIMIT_NUM },
-   {OPT_SYSLOG,1, NULL, OPT_SYSLOG_NUM   },
+   {OPT_SYSLOG,2, NULL, OPT_SYSLOG_NUM   },
{OPT_VDEV,  1, NULL, OPT_VDEV_NUM },
{OPT_VFIO_INTR, 1, NULL, OPT_VFIO_INTR_NUM},
{OPT_VFIO_VF_TOKEN, 1, NULL, OPT_VFIO_VF_TOKEN_NUM},
diff --git a/lib/log/log.c b/lib/log/log.c
index 2dca91306e..d8974c66db 100644
--- a/lib/log/log.c
+++ b/lib/log/log.c
@@ -13,15 +13,17 @@
 #include 
 #include 
 
+#ifdef RTE_EXEC_ENV_WINDOWS
+#include 
+#else
+#include 
+#endif
+
 #include 
 #include 
 
 #include "log_internal.h"
 
-#ifdef RTE_EXEC_ENV_WINDOWS
-#include 
-#endif
-
 struct rte_log_dynamic_type {
const char *name;
uint32_t loglevel;
@@ -36,14 +38,25 @@ enum eal_log_time_format {
EAL_LOG_TIMESTAMP_ISO,
 };
 
+enum eal_log_syslog {
+   EAL_LOG_SYSLOG_NONE = 0,/* do not use syslog */
+   EAL_LOG_SYSLOG_AUTO,/* use syslog only if not a terminal */
+   EAL_LOG_SYSLOG_ALWAYS,  /* always use syslog */
+   EAL_LOG_SYSLOG_BOTH,/* log to both syslog and stderr */
+};
+
 typedef int (*log_print_t)(FILE *f, uint32_t level, const char *fmt, va_list 
ap);
 static int log_print(FILE *f, uint32_t level, const char *format, va_list ap);
 
+
 /** The rte_log structure. */
 static struct rte_logs {
uint32_t type;  /**< Bitfield with enabled logs. */
uint32_t

Re: [PATCH 0/2] introduce PM QoS interface

2024-03-25 Thread lihuisong (C)




在 2024/3/22 20:35, Morten Brørup 写道:

From: lihuisong (C) [mailto:lihuis...@huawei.com]
Sent: Friday, 22 March 2024 09.54

+Tyler, +Alan, +Wei, +Long for asking this similar feature on Windows.

在 2024/3/21 21:30, Morten Brørup 写道:

From: lihuisong (C) [mailto:lihuis...@huawei.com]
Sent: Thursday, 21 March 2024 04.04

Hi Moren,

Thanks for your revew.

在 2024/3/20 22:05, Morten Brørup 写道:

From: Huisong Li [mailto:lihuis...@huawei.com]
Sent: Wednesday, 20 March 2024 11.55

The system-wide CPU latency QoS limit has a positive impact on the idle
state selection in cpuidle governor.

Linux creates a cpu_dma_latency device under '/dev' directory to obtain

the

CPU latency QoS limit on system and send the QoS request for userspace.
Please see the PM QoS framework in the following link:
https://docs.kernel.org/power/pm_qos_interface.html?highlight=qos
This feature is supported by kernel-v2.6.25.

The deeper the idle state, the lower the power consumption, but the

longer

the resume time. Some service are delay sensitive and very except the low
resume time, like interrupt packet receiving mode.

So this series introduce PM QoS interface.

This looks like a 1:1 wrapper for a Linux kernel feature.

right

Does Windows or BSD offer something similar?

How do we know Windows or BSD support this similar feature?

Ask Windows experts or research using Google.

I download freebsd source code, I didn't find this similar feature.
They don't even support cpuidle feature(this QoS feature affects cpuilde.).
I don't find any useful about this on Windows from google.


@Tyler, @Alan, @Wei and @Long

Do you know windows support that userspace read and send CPU latency
which has an impact on deep level of CPU idle?


The DPDK power lib just work on Linux according to the meson.build under
lib/power.
If they support this features, they can open it.

The DPDK power lib currently only works on Linux, yes.
But its API should still be designed to be platform agnostic, so the

functions can be implemented on other platforms in the future.

DPDK is on track to work across multiple platforms, including Windows.
We must always consider other platforms, and not design DPDK APIs as if they

are for Linux/BSD only.
totally understand you.

Furthermore, any high-res timing should use nanoseconds, not microseconds

or

milliseconds.

I realize that the Linux kernel only uses microseconds for these APIs, but

the DPDK API should use nanoseconds.
Nanoseconds is more precise, it's good.
But DPDK API how use nanoseconds as you said the the Linux kernel only
uses microseconds for these APIs.
Kernel interface just know an integer value with microseconds unit.

One solution is to expose nanoseconds in the DPDK API, and in the Linux

specific implementation convert from/to microseconds.
If so, we have to modify the implementation interface on Linux. This
change the input/output unit about the interface.
And DPDK also has to do this based on kernel version. It is not good.
The cpuidle governor select which idle state based on the worst-case
latency of idle state.
These the worst-case latency of Cstate reported by ACPI table is in
microseconds as the section 8.4.1.1. _CST (C States) and 8.4.3.3. _LPI
(Low Power Idle States) in ACPI spec [1].
So it is probably not meaning to change this interface implementation.

OK... Since microsecond resolution is good enough for ACPI and Linux, you have 
me convinced that it's also good enough for DPDK (for this specific topic).

Thank you for the detailed reply!


For the case need PM QoS in DPDK, I think, it is better to set cpu
latency to zero to prevent service thread from the deeper the idle state.

It would defeat the purpose (i.e. not saving sufficient amounts of power) if 
the CPU cannot enter a deeper idle state.

Yes, it is not good for power.
AFAIS, PM QoS is just to decrease the influence for performance.
Anyway, if we set to zero, system can be into Cstates-0 at least.


Personally, I would think a wake-up latency of up to 10 microseconds should be 
fine for must purposes.
Default Linux timerslack is 50 microseconds, so you could also use that value.
How much CPU latency is ok. Maybe, we can give the decision to the 
application.

Linux will collect all these QoS request and use the minimum latency.
what do you think, Morten?



You might also want to add a note to the in-line documentation of the

relevant functions that the Linux implementation only uses microsecond
resolution.
[1]
https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html

Re: [PATCH 0/2] introduce PM QoS interface

2024-03-25 Thread lihuisong (C)


Hi Tyler,

在 2024/3/23 1:55, Tyler Retzlaff 写道:

On Fri, Mar 22, 2024 at 04:54:01PM +0800, lihuisong (C) wrote:

+Tyler, +Alan, +Wei, +Long for asking this similar feature on Windows.

在 2024/3/21 21:30, Morten Brørup 写道:

From: lihuisong (C) [mailto:lihuis...@huawei.com]
Sent: Thursday, 21 March 2024 04.04

Hi Moren,

Thanks for your revew.

在 2024/3/20 22:05, Morten Brørup 写道:

From: Huisong Li [mailto:lihuis...@huawei.com]
Sent: Wednesday, 20 March 2024 11.55

The system-wide CPU latency QoS limit has a positive impact on the idle
state selection in cpuidle governor.

Linux creates a cpu_dma_latency device under '/dev' directory to obtain the
CPU latency QoS limit on system and send the QoS request for userspace.
Please see the PM QoS framework in the following link:
https://docs.kernel.org/power/pm_qos_interface.html?highlight=qos
This feature is supported by kernel-v2.6.25.

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some service are delay sensitive and very except the low
resume time, like interrupt packet receiving mode.

So this series introduce PM QoS interface.

This looks like a 1:1 wrapper for a Linux kernel feature.

right

Does Windows or BSD offer something similar?

How do we know Windows or BSD support this similar feature?

Ask Windows experts or research using Google.

I download freebsd source code, I didn't find this similar feature.
They don't even support cpuidle feature(this QoS feature affects cpuilde.).
I don't find any useful about this on Windows from google.


@Tyler, @Alan, @Wei and @Long

Do you know windows support that userspace read and send CPU latency
which has an impact on deep level of CPU idle?

it is unlikely you'll find an api that let's you manage things in terms
of raw latency values as the linux knobs here do. windows more often employs
policy centric schemes to permit the system to abstract implementation detail.

powercfg is probably the closest thing you can use to tune the same
things on windows. where you select e.g. the 'performance' scheme but it
won't allow you to pick specific latency numbers.

https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/powercfg-command-line-options


Thanks for your feedback. I will take a look at this tool.




The DPDK power lib just work on Linux according to the meson.build under
lib/power.
If they support this features, they can open it.

The DPDK power lib currently only works on Linux, yes.
But its API should still be designed to be platform agnostic, so the functions 
can be implemented on other platforms in the future.

DPDK is on track to work across multiple platforms, including Windows.
We must always consider other platforms, and not design DPDK APIs as if they 
are for Linux/BSD only.

totally understand you.

since lib/power isn't built for windows at this time i don't think it's
appropriate to constrain your innovation. i do appreciate the engagement
though and would just offer general guidance that if you can design your
api with some kind of abstraction in mind that would be great and by all
means if you can figure out how to wrangle powercfg /Qh into satisfying the
api in a policy centric way it might be kind of nice.

Testing this by using powercfg on Windows creates a very challenge for me.
So I don't plan to do this on Windows. If you need, you can add it, ok?


i'll let other windows experts chime in here if they choose.

thanks!


Furthermore, any high-res timing should use nanoseconds, not microseconds or

milliseconds.

I realize that the Linux kernel only uses microseconds for these APIs, but

the DPDK API should use nanoseconds.
Nanoseconds is more precise, it's good.
But DPDK API how use nanoseconds as you said the the Linux kernel only
uses microseconds for these APIs.
Kernel interface just know an integer value with microseconds unit.

One solution is to expose nanoseconds in the DPDK API, and in the Linux 
specific implementation convert from/to microseconds.

If so, we have to modify the implementation interface on Linux. This
change the input/output unit about the interface.
And DPDK also has to do this based on kernel version. It is not good.
The cpuidle governor select which idle state based on the worst-case
latency of idle state.
These the worst-case latency of Cstate reported by ACPI table is in
microseconds as the section 8.4.1.1. _CST (C States) and 8.4.3.3.
_LPI (Low Power Idle States) in ACPI spec [1].
So it is probably not meaning to change this interface implementation.

For the case need PM QoS in DPDK, I think, it is better to set cpu
latency to zero to prevent service thread from the deeper the idle
state.

You might also want to add a note to the in-line documentation of the relevant 
functions that the Linux implementation only uses microsecond resolution.


[1] https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html

.

Re: [PATCH v2 1/6] ethdev: support setting lanes

2024-03-25 Thread Ajit Khaparde

On Mon, Mar 25, 2024 at 6:42 PM lihuisong (C)  wrote:
>
>
> 在 2024/3/25 17:30, Thomas Monjalon 写道:
> > 25/03/2024 07:24, huangdengdui:
> >> On 2024/3/22 21:58, Thomas Monjalon wrote:
> >>> 22/03/2024 08:09, Dengdui Huang:
>  -#define RTE_ETH_LINK_SPEED_10G RTE_BIT32(8)  /**< 10 Gbps */
>  -#define RTE_ETH_LINK_SPEED_20G RTE_BIT32(9)  /**< 20 Gbps */
>  -#define RTE_ETH_LINK_SPEED_25G RTE_BIT32(10) /**< 25 Gbps */
>  -#define RTE_ETH_LINK_SPEED_40G RTE_BIT32(11) /**< 40 Gbps */
>  -#define RTE_ETH_LINK_SPEED_50G RTE_BIT32(12) /**< 50 Gbps */
>  -#define RTE_ETH_LINK_SPEED_56G RTE_BIT32(13) /**< 56 Gbps */
>  -#define RTE_ETH_LINK_SPEED_100GRTE_BIT32(14) /**< 100 Gbps */
>  -#define RTE_ETH_LINK_SPEED_200GRTE_BIT32(15) /**< 200 Gbps */
>  -#define RTE_ETH_LINK_SPEED_400GRTE_BIT32(16) /**< 400 Gbps */
>  +#define RTE_ETH_LINK_SPEED_10GRTE_BIT32(8)  /**< 10 Gbps */
>  +#define RTE_ETH_LINK_SPEED_20GRTE_BIT32(9)  /**< 20 Gbps 
>  2lanes */
>  +#define RTE_ETH_LINK_SPEED_25GRTE_BIT32(10) /**< 25 Gbps */
>  +#define RTE_ETH_LINK_SPEED_40GRTE_BIT32(11) /**< 40 Gbps 
>  4lanes */
>  +#define RTE_ETH_LINK_SPEED_50GRTE_BIT32(12) /**< 50 Gbps */
>  +#define RTE_ETH_LINK_SPEED_56GRTE_BIT32(13) /**< 56 Gbps 
>  4lanes */
>  +#define RTE_ETH_LINK_SPEED_100G   RTE_BIT32(14) /**< 100 Gbps */
>  +#define RTE_ETH_LINK_SPEED_200G   RTE_BIT32(15) /**< 200 Gbps 
>  4lanes */
>  +#define RTE_ETH_LINK_SPEED_400G   RTE_BIT32(16) /**< 400 Gbps 
>  4lanes */
>  +#define RTE_ETH_LINK_SPEED_10G_4LANES RTE_BIT32(17)  /**< 10 Gbps 
>  4lanes */
>  +#define RTE_ETH_LINK_SPEED_50G_2LANES RTE_BIT32(18) /**< 50 Gbps 2 
>  lanes */
>  +#define RTE_ETH_LINK_SPEED_100G_2LANESRTE_BIT32(19) /**< 100 Gbps 2 
>  lanes */
>  +#define RTE_ETH_LINK_SPEED_100G_4LANESRTE_BIT32(20) /**< 100 Gbps 
>  4lanes */
>  +#define RTE_ETH_LINK_SPEED_200G_2LANESRTE_BIT32(21) /**< 200 Gbps 
>  2lanes */
>  +#define RTE_ETH_LINK_SPEED_400G_8LANESRTE_BIT32(22) /**< 400 Gbps 
>  8lanes */
> >>> I don't think it is a good idea to make this more complex.
> >>> It brings nothing as far as I can see, compared to having speed and lanes 
> >>> separated.
> >>> Can we have lanes information a separate value? no need for bitmask.
> >>>
> >> Hi,Thomas, Ajit, roretzla, damodharam
> >>
> >> I also considered the option at the beginning of the design.
> >> But this option is not used due to the following reasons:
> >> 1. For the user, ethtool couples speed and lanes.
> >> The result of querying the NIC capability is as follows:
> >> Supported link modes:
> >>  10baseSR4/Full
> >>  10baseSR2/Full
So if DPDK provides a get lanes API, it should be able to tell the
number of lanes supported.
After that, the user should be able to pick one of the supported lane counts?

> >> The NIC capability is configured as follows:
> >> ethtool -s eth1 speed 10 lanes 4 autoneg off
> >> ethtool -s eth1 speed 10 lanes 2 autoneg off
> >>
> >> Therefore, users are more accustomed to the coupling of speed and lanes.
> >>
> >> 2. For the PHY, When the physical layer capability is configured through 
> >> the MDIO,
> >> the speed and lanes are also coupled.
> >> For example:
> >> Table 45–7—PMA/PMD control 2 register bit definitions[1]
> >> PMA/PMD type selection
> >>  1 0 0 1 0 1 0 = 100GBASE-SR2 PMA/PMD
> >>  0 1 0 1 1 1 1 = 100GBASE-SR4 PMA/PMD
> >>
> >> Therefore, coupling speeds and lanes is easier to understand.
> >> And it is easier for the driver to report the support lanes.
> >>
> >> In addition, the code implementation is compatible with the old version.
> >> When the driver does not support the lanes setting, the code does not need 
> >> to be modified.
> >>
> >> So I think the speed and lanes coupling is better.
> > I don't think so.
> > You are mixing hardware implementation, user tool, and API.
> > Having a separate and simple API is cleaner and not more difficult to handle
> > in some get/set style functions.
> Having a separate and simple API is cleaner. It's good.
> But supported lane capabilities have a lot to do with the specified
> speed. This is determined by releated specification.
> If we add a separate API for speed lanes, it probably is hard to check
> the validity of the configuration for speed and lanes.
> And the setting lane API sepparated from speed is not good for
> uniforming all PMD's behavior in ethdev layer.
>
> The patch[1] is an example for this separate API.
> I think it is not very good. It cannot tell user and PMD the follow points:
> 1) user don't know what lanes should or can be set for a specified speed
> on one NIC.
> 2) how should PMD do for a supported lanes in their HW?
>
> Anyway, if we a

64 matches

Mail list logo