FreeBSD 13.1 changes

2022-05-29 Thread Lewis Donzis
Apparently FreeBSD 13.1 changed the syntax of the CPUSET macros, so DPDK no 
longer compiles. 

For example, here's one definition on FreeBSD 13.0 and prior: 

   CPU_OR(cpuset_t *dst, cpuset_t *src); 

and here it is in FreeBSD 13.1: 

   CPU_OR(cpuset_t *dst, cpuset_t *src1, cpuset_t *src2); 

I've modified lib/eal/freebsd/include/rte_os.h to handle both old and new 
versions of FreeBSD. 

I'd like to provide the updated file, if someone would be willing to review and 
commit, please? 

Thanks, 
lew


Re: FreeBSD 13.1 changes

2022-05-30 Thread Lewis Donzis



- On May 30, 2022, at 3:09 AM, Bruce Richardson bruce.richard...@intel.com 
wrote:

> On Sun, May 29, 2022 at 06:36:21AM -0500, Lewis Donzis wrote:
>> Apparently FreeBSD 13.1 changed the syntax of the CPUSET macros, so DPDK no
>> longer compiles.
>> 
>> For example, here's one definition on FreeBSD 13.0 and prior:
>> 
>>CPU_OR(cpuset_t *dst, cpuset_t *src);
>> 
>> and here it is in FreeBSD 13.1:
>> 
>>CPU_OR(cpuset_t *dst, cpuset_t *src1, cpuset_t *src2);
>> 
>> I've modified lib/eal/freebsd/include/rte_os.h to handle both old and new
>> versions of FreeBSD.
>> 
>> I'd like to provide the updated file, if someone would be willing to review 
>> and
>> commit, please?
>> 
> Can you please retest with the latest DPDK code in git. This should be
> fixed there now. See patch [1]. The fix should make its way into the LTS
> backports over time too.
> 
> /Bruce
> 
> [1]
> http://patches.dpdk.org/project/dpdk/patch/20220520181050.55654-1-bruce.richard...@intel.com/

Yes, it definitely compiles now, thank you!

Something is still not working properly, I think in the ixl driver.  But I 
tried the same test on FreeBSD 13.0 and it also doesn't work, so I don't think 
that's related to this change.  The problem appears to be related to forcing 
"wait" true on FreeBSD in ixgbe_dev_link_update_share().  Should I post that as 
a separate thread?


Hang in ixgbe_dev_link_update_share()

2022-05-30 Thread Lewis Donzis
Using DPDK 21.11.1 on FreeBSD 13.1, calling rte_eth_link_get_nowait() appears 
to hang waiting for the link to come up on an XL710 or 82599 based NIC.

This call eventually makes its way to ixgbe_dev_link_update_share() with 
wait_to_complete set to false.

Inside that function, there is this code:

/* BSD has no interrupt mechanism, so force NIC status synchronization. */ 
#ifdef RTE_EXEC_ENV_FREEBSD 
wait = 1; 
#endif

This then calls ixgbe_check_link() with wait == true, which then calls 
ixgbe_check_mac_link_generic(), but the parameter is now called 
"link_up_wait_to_complete", and it loops forever waiting for the link to be up, 
with a 100ms delay between polls.

Perhaps our understanding is incorrect, but we're using 
rte_eth_link_get_nowait() because we can't tolerate any significant delay in 
the function call, but we certainly don't want to wait for the link to be up.  
We're just trying to find out if it's up or down.  Empirically, removing the 
"wait = 1" restores normal operation.

Thanks,
lew


Re: vmxnet3 no longer functional on DPDK 21.11

2022-06-03 Thread Lewis Donzis
Hi, all.

Resurrecting this thread from six months ago, I apologize for not having more 
time to dig into it, but in light of recent findings, I see numerous other 
drivers and other parts of the code that have comments to the effect that 
"FreeBSD doesn't support interrupts" and they effectively #ifdef out the 
attempt.

Could this be as simple as needing to do the same in vmxnet3?  Empirically, 
ignoring the error from rte_intr_enable() allows the driver to work normally, 
for what that's worth.

Thanks,
lew

- On Dec 6, 2021, at 6:08 AM, Konstantin Ananyev 
konstantin.anan...@intel.com wrote:

>> -Original Message-
>> From: Richardson, Bruce 
>> Sent: Monday, December 6, 2021 9:17 AM
>> To: Lewis Donzis 
>> Cc: dev ; Wang, Yong ; Ananyev, Konstantin
>> 
>> Subject: Re: vmxnet3 no longer functional on DPDK 21.11
>> 
>> On Sun, Dec 05, 2021 at 07:52:33PM -0600, Lewis Donzis wrote:
>> >
>> >
>> > - On Nov 30, 2021, at 7:42 AM, Bruce Richardson 
>> > bruce.richard...@intel.com
>> > wrote:
>> >
>> > > On Mon, Nov 29, 2021 at 02:45:15PM -0600, Lewis Donzis wrote:
>> > >>Hello.
>> > >>We just upgraded from 21.08 to 21.11 and it's rather astounding the
>> > >>number of incompatible changes in three months.  Not a big deal, just
>> > >>kind of a surprise, that's all.
>> > >>Anyway, the problem is that the vmxnet3 driver is no longer 
>> > >> functional
>> > >>on FreeBSD.
>> > >>In drivers/net/vmxnet3/vmxnet3_ethdev.c, vmxnet3_dev_start() gets an
>> > >>error calling rte_intr_enable().  So it logs "interrupt enable 
>> > >> failed"
>> > >>and returns an error.
>> > >>In lib/eal/freebsd/eal_interrupts.c, rte_intr_enable() is returning 
>> > >> an
>> > >>error because rte_intr_dev_fd_get(intr_handle) is returning -1.
>> > >>I don't see how that could ever return anything other than -1 since 
>> > >> it
>> > >>appears that there is no code that ever calls rte_intr_dev_fd_set()
>> > >>with a value other than -1 on FreeBSD.  Also weird to me is that even
>> > >>if it didn't get an error, the switch statement that follows looks 
>> > >> like
>> > >>it will return an error in every case.
>> > >>Nonetheless, it worked in 21.08, and I can't quite see why the
>> > >>difference, so I must be missing something.
>> > >>For the moment, I just commented the "return -EIO" in 
>> > >> vmxnet3_ethdev.c,
>> > >>and it's now working again, but that's obviously not the correct
>> > >>solution.
>> > >>Can someone who's knowledgable about this mechanism perhaps explain a
>> > >>little bit about what's going on?  I'll be happy to help 
>> > >> troubleshoot.
>> > >>It seems like it must be something simple, but I just don't see it 
>> > >> yet.
>> > >
>> > > Hi
>> > >
>> > > if you have the chance, it would be useful if you could use "git bisect" 
>> > > to
>> > > identify the commit in 21.11 that broke this driver. Looking through the
>> > > logs for 21.11 I can't identify any particular likely-looking commit, so
>> > > bisect is likely a good way to start looking into this.
>> > >
>> > > Regards,
>> > > /Bruce
>> >
>> > Hi, Bruce.  git bisect is very time-consuming and very cool!
>> >
>> > I went back to 21.08, about 1100 commits, and worked through the process, 
>> > but
>> > then I realized that I had forgotten to run ninja on one of
>> the steps, so I did it again.
>> >
>> > I also re-checked it after the bisect, just to make sure that
>> > c87d435a4d79739c0cec2ed280b94b41cb908af7 is good, and
>> 7a0935239b9eb817c65c03554a9954ddb8ea5044 is bad.
>> >
>> > Thanks,
>> > lew
>> >
>> 
>> Many thanks for taking the time to do this. Adding Konstantin to thread as
>> author of the commit you identified. Konstantin, any thoughts on this
>> issue?
> 
> Hmm, that's looks really strange to me.
> So to clarify, it fails at:
> static int
> vmxnet3_dev_start(struct rte_eth_dev *dev)
> {
>   ...
> line 1695:
>   if (rte_intr_enable(dev->intr_handle) < 0) {
>PMD_INIT_LOG(ERR, "interrupt enable failed");
>return -EIO;
>}
> 
> Right?
> 
> The strange thing here is that 7a0935239b9e
> doesn't change dev_start or rte_intr code in any way.
> All it does - change rte_eth_rx_burst/rte_eth_tx_burst and other fast-path
> functions.
> Anyway, if git blames that commit, let's try to figure out what is going on.
> Unfortunately, I don't have freebsd with vmxnet3, so will need to rely on your
> help here.
> As the first thing can you try to run testpmd build with last good commit
> (c87d435a4d79)
> and then testpmd build with bad commit applied and collect for both cases:
> - contents of 'struct rte_eth_dev' and ' rte_eth_dev->intr_handle' for
>  your vmxnet3 port
> - debug log output (--log-level=eal,debug --log-level=pmd,debug)
> 
> Konstantin


vmxnet3 no longer functional on DPDK 21.11

2021-11-29 Thread Lewis Donzis
Hello. 

We just upgraded from 21.08 to 21.11 and it's rather astounding the number of 
incompatible changes in three months. Not a big deal, just kind of a surprise, 
that's all. 

Anyway, the problem is that the vmxnet3 driver is no longer functional on 
FreeBSD. 

In drivers/net/vmxnet3/vmxnet3_ethdev.c, vmxnet3_dev_start() gets an error 
calling rte_intr_enable(). So it logs "interrupt enable failed" and returns an 
error. 

In lib/eal/freebsd/eal_interrupts.c, rte_intr_enable() is returning an error 
because rte_intr_dev_fd_get(intr_handle) is returning -1. 

I don't see how that could ever return anything other than -1 since it appears 
that there is no code that ever calls rte_intr_dev_fd_set() with a value other 
than -1 on FreeBSD. Also weird to me is that even if it didn't get an error, 
the switch statement that follows looks like it will return an error in every 
case. 

Nonetheless, it worked in 21.08, and I can't quite see why the difference, so I 
must be missing something. 

For the moment, I just commented the "return -EIO" in vmxnet3_ethdev.c, and 
it's now working again, but that's obviously not the correct solution. 

Can someone who's knowledgable about this mechanism perhaps explain a little 
bit about what's going on? I'll be happy to help troubleshoot. It seems like it 
must be something simple, but I just don't see it yet. 

Thanks, 
lew 


Re: vmxnet3 no longer functional on DPDK 21.11

2021-12-05 Thread Lewis Donzis



- On Nov 30, 2021, at 7:42 AM, Bruce Richardson bruce.richard...@intel.com 
wrote:

> On Mon, Nov 29, 2021 at 02:45:15PM -0600, Lewis Donzis wrote:
>>Hello.
>>We just upgraded from 21.08 to 21.11 and it's rather astounding the
>>number of incompatible changes in three months.  Not a big deal, just
>>kind of a surprise, that's all.
>>Anyway, the problem is that the vmxnet3 driver is no longer functional
>>on FreeBSD.
>>In drivers/net/vmxnet3/vmxnet3_ethdev.c, vmxnet3_dev_start() gets an
>>error calling rte_intr_enable().  So it logs "interrupt enable failed"
>>and returns an error.
>>In lib/eal/freebsd/eal_interrupts.c, rte_intr_enable() is returning an
>>error because rte_intr_dev_fd_get(intr_handle) is returning -1.
>>I don't see how that could ever return anything other than -1 since it
>>appears that there is no code that ever calls rte_intr_dev_fd_set()
>>with a value other than -1 on FreeBSD.  Also weird to me is that even
>>if it didn't get an error, the switch statement that follows looks like
>>it will return an error in every case.
>>Nonetheless, it worked in 21.08, and I can't quite see why the
>>difference, so I must be missing something.
>>For the moment, I just commented the "return -EIO" in vmxnet3_ethdev.c,
>>and it's now working again, but that's obviously not the correct
>>solution.
>>Can someone who's knowledgable about this mechanism perhaps explain a
>>little bit about what's going on?  I'll be happy to help troubleshoot.
>>It seems like it must be something simple, but I just don't see it yet.
> 
> Hi
> 
> if you have the chance, it would be useful if you could use "git bisect" to
> identify the commit in 21.11 that broke this driver. Looking through the
> logs for 21.11 I can't identify any particular likely-looking commit, so
> bisect is likely a good way to start looking into this.
> 
> Regards,
> /Bruce

Hi, Bruce.  git bisect is very time-consuming and very cool!

I went back to 21.08, about 1100 commits, and worked through the process, but 
then I realized that I had forgotten to run ninja on one of the steps, so I did 
it again.

I also re-checked it after the bisect, just to make sure that 
c87d435a4d79739c0cec2ed280b94b41cb908af7 is good, and 
7a0935239b9eb817c65c03554a9954ddb8ea5044 is bad.

Thanks,
lew


Here's the result:

root@fbdev:/usr/local/share/dpdk-git # git bisect start
root@fbdev:/usr/local/share/dpdk-git # git bisect bad
root@fbdev:/usr/local/share/dpdk-git # git bisect good 
74bd4072996e64b0051d24d8d641554d225db196
Bisecting: 556 revisions left to test after this (roughly 9 steps)
[e2a289a788c0a128a15bc0f1099af7c031201ac5] net/ngbe: add mailbox process 
operations
root@fbdev:/usr/local/share/dpdk-git # git bisect bad
Bisecting: 277 revisions left to test after this (roughly 8 steps)
[5906be5af6570db8b70b307c96aace0b096d1a2c] ethdev: fix ID spelling in comments 
and log messages
root@fbdev:/usr/local/share/dpdk-git # git bisect bad
Bisecting: 138 revisions left to test after this (roughly 7 steps)
[a7c236b894a848c7bb9afb773a7e3c13615abaa8] net/cnxk: support meter ops get
root@fbdev:/usr/local/share/dpdk-git # git bisect bad
Bisecting: 69 revisions left to test after this (roughly 6 steps)
[14fc81aed73842d976dd19a93ca47e22d61c1759] ethdev: update modify field flow 
action
root@fbdev:/usr/local/share/dpdk-git # git bisect bad
Bisecting: 34 revisions left to test after this (roughly 5 steps)
[cdea571becb4dabf9962455f671af0c99594e380] common/sfc_efx/base: add flag to use 
Rx prefix user flag
root@fbdev:/usr/local/share/dpdk-git # git bisect good
Bisecting: 17 revisions left to test after this (roughly 4 steps)
[7a0935239b9eb817c65c03554a9954ddb8ea5044] ethdev: make fast-path functions to 
use new flat array
root@fbdev:/usr/local/share/dpdk-git # git bisect bad
Bisecting: 8 revisions left to test after this (roughly 3 steps)
[012bf708c20f4b23d055717e28f8de74887113d8] net/sfc: support group flows in 
tunnel offload
root@fbdev:/usr/local/share/dpdk-git # git bisect good
Bisecting: 4 revisions left to test after this (roughly 2 steps)
[9df2d8f5cc9653d6413cb2240c067ea455ab7c3c] net/sfc: support counters in tunnel 
offload jump rules
root@fbdev:/usr/local/share/dpdk-git # git bisect good
Bisecting: 2 revisions left to test after this (roughly 1 step)
[c024496ae8c8c075b0d0a3b43119475787b24b45] ethdev: allocate max space for 
internal queue array
root@fbdev:/usr/local/share/dpdk-git # git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[c87d435a4d79739c0cec2ed280b94b41cb908af7] ethdev: copy fast-path API into 
separate structure
root@fbdev:/usr/local/share/dpdk-git # git bisec

Re: vmxnet3 no longer functional on DPDK 21.11

2021-12-06 Thread Lewis Donzis


- On Dec 6, 2021, at 6:08 AM, Ananyev, Konstantin 
konstantin.anan...@intel.com wrote:

> So to clarify, it fails at:
> static int
> vmxnet3_dev_start(struct rte_eth_dev *dev)
> {
>   ...
> line 1695:
>   if (rte_intr_enable(dev->intr_handle) < 0) {
>PMD_INIT_LOG(ERR, "interrupt enable failed");
>return -EIO;
>}
> 
> Right?

That's right.  And further, the failure inside of rte_intr_enable() is the test 
on rte_intr_dev_fd_get():

int
rte_intr_enable(const struct rte_intr_handle *intr_handle)
{
...
if (rte_intr_fd_get(intr_handle) < 0 ||
rte_intr_dev_fd_get(intr_handle) < 0) {
rc = -1;
goto out;
}

switch (rte_intr_type_get(intr_handle)) {
/* not used at this moment */
case RTE_INTR_HANDLE_ALARM:
rc = -1;
break;
/* not used at this moment */
case RTE_INTR_HANDLE_DEV_EVENT:
rc = -1;
break;
/* unknown handle type */
default:
RTE_LOG(ERR, EAL, "Unknown handle type of fd %d\n",
rte_intr_fd_get(intr_handle));
rc = -1;
break;
}

out:
rte_eal_trace_intr_enable(intr_handle, rc);
return rc;
}


Two things about this code that confuse me:

1. rte_intr_dev_fd_get(intr_handle) just returns the value of 
intr_handle->dev_fd, which is never set to anything other than -1 in any code I 
can find.

2. Even if it made it past that "if" statement, I don't see how the switch that 
follows ever *doesn't* return an error, i.e., every single case results in an 
error!  Even if it got past rte_intr_dev_fd_get(), the interrupt type is 
neither of the first two cases, so it would presumably execute the default case.


> The strange thing here is that 7a0935239b9e
> doesn't change dev_start or rte_intr code in any way.

I agree, and also don't see any way the diffs between those two commits could 
cause this.


> Anyway, if git blames that commit, let's try to figure out what is going on.
> Unfortunately, I don't have freebsd with vmxnet3, so will need to rely on your
> help here.

Sure no problem.  If it helps, we could arrange for remote access to a FreeBSD 
VM, but I suspect this is going to be something simple, so let's see if we can 
get it figured out.


> As the first thing can you try to run testpmd build with last good commit
> (c87d435a4d79)
> and then testpmd build with bad commit applied and collect for both cases:
> - contents of 'struct rte_eth_dev' and ' rte_eth_dev->intr_handle' for
>  your vmxnet3 port
> - debug log output (--log-level=eal,debug --log-level=pmd,debug)

Ok, we'll check it out.

Thanks,
lew


ixgbe link status semi-broken on FreeBSD

2024-02-14 Thread Lewis Donzis
I keep meaning to mention this because we've been patching the ixgbe driver for 
a while...

In ixgbe_ethdev.c, function ixgbe_dev_link_update_share(), we find the 
following: 


/* BSD has no interrupt mechanism, so force NIC status synchronization. */ 
#ifdef RTE_EXEC_ENV_FREEBSD
wait = 1;
#endif


Which changes the behavior of rte_eth_link_get_nowait() and causes it to wait.  
For a port that does not have link up, this takes approximately 12-14 seconds 
before it returns, which is highly undesirable. 

However, after disabling this one line of code, it still appears to work 
properly. It returns the correct link status on ports that are up or down, but 
does so without any delay. 

So I'm wondering, is this still necessary? No doubt, FreeBSD doesn't support 
interrupts, but why does that require that we wait in this case?  Empirically, 
it works much better without that line of code.


Re: vmxnet3 no longer functional on DPDK 21.11

2024-01-08 Thread Lewis Donzis
Good morning.

I just wanted to mention that this problem still persists in 22.11.3, and we 
still have to patch the vmxnet3 driver every time we upgrade.

As mentioned before, ixgbe_ethdev.c is an example of a driver that ifdef's out 
the attempt to use interrupts on FreeBSD.

Thanks,
lew


- On Jun 3, 2022, at 8:19 AM, Lewis Donzis l...@perftech.com wrote:

> Hi, all.
> 
> Resurrecting this thread from six months ago, I apologize for not having more
> time to dig into it, but in light of recent findings, I see numerous other
> drivers and other parts of the code that have comments to the effect that
> "FreeBSD doesn't support interrupts" and they effectively #ifdef out the
> attempt.
> 
> Could this be as simple as needing to do the same in vmxnet3?  Empirically,
> ignoring the error from rte_intr_enable() allows the driver to work normally,
> for what that's worth.
> 
> Thanks,
> lew
> 
> - On Dec 6, 2021, at 6:08 AM, Konstantin Ananyev
> konstantin.anan...@intel.com wrote:
> 
>>> -Original Message-----
>>> From: Richardson, Bruce 
>>> Sent: Monday, December 6, 2021 9:17 AM
>>> To: Lewis Donzis 
>>> Cc: dev ; Wang, Yong ; Ananyev, 
>>> Konstantin
>>> 
>>> Subject: Re: vmxnet3 no longer functional on DPDK 21.11
>>> 
>>> On Sun, Dec 05, 2021 at 07:52:33PM -0600, Lewis Donzis wrote:
>>> >
>>> >
>>> > - On Nov 30, 2021, at 7:42 AM, Bruce Richardson 
>>> > bruce.richard...@intel.com
>>> > wrote:
>>> >
>>> > > On Mon, Nov 29, 2021 at 02:45:15PM -0600, Lewis Donzis wrote:
>>> > >>Hello.
>>> > >>We just upgraded from 21.08 to 21.11 and it's rather astounding the
>>> > >>number of incompatible changes in three months.  Not a big deal, 
>>> > >> just
>>> > >>kind of a surprise, that's all.
>>> > >>Anyway, the problem is that the vmxnet3 driver is no longer 
>>> > >> functional
>>> > >>on FreeBSD.
>>> > >>In drivers/net/vmxnet3/vmxnet3_ethdev.c, vmxnet3_dev_start() gets an
>>> > >>error calling rte_intr_enable().  So it logs "interrupt enable 
>>> > >> failed"
>>> > >>and returns an error.
>>> > >>In lib/eal/freebsd/eal_interrupts.c, rte_intr_enable() is returning 
>>> > >> an
>>> > >>error because rte_intr_dev_fd_get(intr_handle) is returning -1.
>>> > >>I don't see how that could ever return anything other than -1 since 
>>> > >> it
>>> > >>appears that there is no code that ever calls rte_intr_dev_fd_set()
>>> > >>with a value other than -1 on FreeBSD.  Also weird to me is that 
>>> > >> even
>>> > >>if it didn't get an error, the switch statement that follows looks 
>>> > >> like
>>> > >>it will return an error in every case.
>>> > >>Nonetheless, it worked in 21.08, and I can't quite see why the
>>> > >>difference, so I must be missing something.
>>> > >>For the moment, I just commented the "return -EIO" in 
>>> > >> vmxnet3_ethdev.c,
>>> > >>and it's now working again, but that's obviously not the correct
>>> > >>solution.
>>> > >>Can someone who's knowledgable about this mechanism perhaps explain 
>>> > >> a
>>> > >>little bit about what's going on?  I'll be happy to help 
>>> > >> troubleshoot.
>>> > >>It seems like it must be something simple, but I just don't see it 
>>> > >> yet.
>>> > >
>>> > > Hi
>>> > >
>>> > > if you have the chance, it would be useful if you could use "git 
>>> > > bisect" to
>>> > > identify the commit in 21.11 that broke this driver. Looking through the
>>> > > logs for 21.11 I can't identify any particular likely-looking commit, so
>>> > > bisect is likely a good way to start looking into this.
>>> > >
>>> > > Regards,
>>> > > /Bruce
>>> >
>>> > Hi, Bruce.  git bisect is very time-consuming and very cool!
>>> >
>>> > I went back to 21.08, about 1100 commits, and worked through the process, 
>>>

Re: vmxnet3 no longer functional on DPDK 21.11

2024-01-10 Thread Lewis Donzis
Hi, Bruce.

I'm even less familiar with it, but we do quite a lot of testing using VMs, so 
it's been quite handy.

Your patch seems very reasonable, however it also produces a warning:

../drivers/net/vmxnet3/vmxnet3_ethdev.c:264:1: warning: unused function 
'vmxnet3_enable_all_intrs' [-Wunused-function]

Adding an #ifndef around vmxnet3_enable_all_intrs() eliminates that warning.

Please pardon the uninformed view, but we've been using FreeBSD + DPDK for 
nearly a decade, and I thought the whole point was to avoid using interrupts.  
We have no need or desire for them in our applications, so we just hope the 
sprinkling of interrupt support code throughout the drivers doesn't cause any 
harm.  But I also realize we're probably in the minority on this.

Thanks for the help,
lew



> I'm not at all familiar with the vmxnet3 driver, so apologies for the lack
> of response up till now. Does something like the below simple fix work for
> you? If so, I'm happy enough to submit as a patch for upstream merge and
> then backport.
> 
> /Bruce
> 
> diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c
> b/drivers/net/vmxnet3/vmxnet3_ethdev.c
> index e49191718a..d088b42d35 100644
> --- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
> +++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
> @@ -1129,6 +1129,7 @@ vmxnet3_dev_start(struct rte_eth_dev *dev)
>/* Setting proper Rx Mode and issue Rx Mode Update command */
>vmxnet3_dev_set_rxmode(hw, VMXNET3_RXM_UCAST | VMXNET3_RXM_BCAST, 1);
> 
> +#ifndef RTE_EXEC_ENV_FREEBSD
>/* Setup interrupt callback  */
>rte_intr_callback_register(dev->intr_handle,
>   vmxnet3_interrupt_handler, dev);
> @@ -1140,6 +1141,7 @@ vmxnet3_dev_start(struct rte_eth_dev *dev)
> 
>/* enable all intrs */
>vmxnet3_enable_all_intrs(hw);
> +#endif
> 
> vmxnet3_process_events(dev);


Re: vmxnet3 no longer functional on DPDK 21.11

2024-01-10 Thread Lewis Donzis



- On Jan 9, 2024, at 8:28 AM, Bruce Richardson bruce.richard...@intel.com 
wrote:

> On Tue, Jan 09, 2024 at 07:46:47AM -0600, Lewis Donzis wrote:
>> Hi, Bruce.
>> 
>> I'm even less familiar with it, but we do quite a lot of testing using VMs, 
>> so
>> it's been quite handy.
>> 
>> Your patch seems very reasonable, however it also produces a warning:
>> 
>> ../drivers/net/vmxnet3/vmxnet3_ethdev.c:264:1: warning: unused function
>> 'vmxnet3_enable_all_intrs' [-Wunused-function]
>> 
>> Adding an #ifndef around vmxnet3_enable_all_intrs() eliminates that warning.
> 
> Right, I should have compile-tested on FreeBSD myself, before sending the
> suggestion. Patch has now been submitted. Please test and ack if the fix
> works for your use-cases, thanks.

I compiled it and ran it just now and it appears to work just fine.  Thanks 
very much for submitting.

> In general, yes we try and avoid interrupts on the data-path or fast-path
> and use polling. However, for some use-cases where traffic levels are low,
> interrupts may make sense to save power for fast-path. Even if not,
> interrupts are useful for things like error conditions or for monitoring
> link-status changes (LSC). Unfortunately, we don't have any interrupt
> support on BSD, so fixes like this are necessary.

That makes sense.  Makes me wonder why there's no interrupt support on BSD, 
i.e., maybe it's better to fix that than to have to fix "avoiding it" in the 
drivers?

I kind of feel like we're a bit orphaned in the FreeBSD world.  I don't know 
how many others are using BSD, but it seems like we're in a relatively 
less-supported environment.

Thanks again,
lew


Re: [PATCH] net/vmxnet3: fix use of interrupts on FreeBSD

2024-01-10 Thread Lewis Donzis



- On Jan 9, 2024, at 8:23 AM, Bruce Richardson bruce.richard...@intel.com 
wrote:

> DPDK does not support interrupts on FreeBSD, so the vmxnet3 driver
> returns error when enabling interrupts as it initializes. We can fix
> this by #ifdef'ing out the interrupt calls when building for FreeBSD,
> allowing the driver to initialize correctly.
> 
> Fixes: 046f11619567 ("net/vmxnet3: support MSI-X interrupt")
> 
> Reported-by: Lewis Donzis 
Tested-by: Lewis Donzis 
> Signed-off-by: Bruce Richardson 
> ---
> drivers/net/vmxnet3/vmxnet3_ethdev.c | 4 
> 1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c
> b/drivers/net/vmxnet3/vmxnet3_ethdev.c
> index e49191718a..7032f0e324 100644
> --- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
> +++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
> @@ -257,6 +257,7 @@ vmxnet3_disable_all_intrs(struct vmxnet3_hw *hw)
>   vmxnet3_disable_intr(hw, i);
> }
> 
> +#ifndef RTE_EXEC_ENV_FREEBSD
> /*
>  * Enable all intrs used by the device
>  */
> @@ -280,6 +281,7 @@ vmxnet3_enable_all_intrs(struct vmxnet3_hw *hw)
>   vmxnet3_enable_intr(hw, i);
>   }
> }
> +#endif
> 
> /*
>  * Gets tx data ring descriptor size.
> @@ -1129,6 +1131,7 @@ vmxnet3_dev_start(struct rte_eth_dev *dev)
>   /* Setting proper Rx Mode and issue Rx Mode Update command */
>   vmxnet3_dev_set_rxmode(hw, VMXNET3_RXM_UCAST | VMXNET3_RXM_BCAST, 1);
> 
> +#ifndef RTE_EXEC_ENV_FREEBSD
>   /* Setup interrupt callback  */
>   rte_intr_callback_register(dev->intr_handle,
>  vmxnet3_interrupt_handler, dev);
> @@ -1140,6 +1143,7 @@ vmxnet3_dev_start(struct rte_eth_dev *dev)
> 
>   /* enable all intrs */
>   vmxnet3_enable_all_intrs(hw);
> +#endif
> 
>   vmxnet3_process_events(dev);
> 
> --
> 2.42.0


Re: vmxnet3 no longer functional on DPDK 21.11

2024-01-11 Thread Lewis Donzis



- On Jan 9, 2024, at 5:55 PM, Stephen Hemminger step...@networkplumber.org 
wrote:
> Probably need to go further with this.
> - what about unreigster in vmxnet3_dev_stop
> - vmxnet3_interrupt_handler is then dead code, should it be #ifdef guarded?
> - and vmxnet3_dev_rx_queue_intr_enable/disable
> - and vmxnet3_enable_intr
> - and vmxnet3_configure_msix
>  - and checks for rte_eth_intr_conf bits? in configure

I wondered the same thing, but checking other drivers, there appears to be 
little provision for this.  Just as an example, ixgbe has a FREEBSD ifdef, but 
it doesn't bother with trying to avoid all interrupt code.  And hardly any 
other network drivers have FREEBSD ifdefs at all and just ignore errors from 
registering interrupt callbacks.

In addition, the FreeBSD EAL interrupt code appears to have stubs that return 
"false" for calls that question whether interrupts are enabled, e.g., 
rte_intr_dp_is_en(), or an error for calls that attempt to use interrupts.

It's interesting that it works as well as it does because most drivers appear 
to be enabling interrupts in the NIC hardware.  But if the interrupt remains 
masked in the APIC due to lack of support in freebsd/eal_interrupts.c, then 
perhaps it doesn't matter?

So while your suggestions seem quite well-founded, I don't see any equivalent 
provision in any other driver.  Apparently there is no harm in attempting to 
use interrupts on FreeBSD since they will never trigger anyway?


Re: [PATCH] net/vmxnet3: fix use of interrupts on FreeBSD

2024-01-24 Thread Lewis Donzis
Did this get checked in?

Just curious because I don't see it in 22.11.4 that was released yesterday, so 
wasn't sure when it would show up.

Thanks,
lew

- On Jan 11, 2024, at 6:03 AM, Ferruh Yigit ferruh.yi...@amd.com wrote:

> On 1/9/2024 4:00 PM, Lewis Donzis wrote:
>> 
>> 
>> - On Jan 9, 2024, at 8:23 AM, Bruce Richardson bruce.richard...@intel.com
>> wrote:
>> 
>>> DPDK does not support interrupts on FreeBSD, so the vmxnet3 driver
>>> returns error when enabling interrupts as it initializes. We can fix
>>> this by #ifdef'ing out the interrupt calls when building for FreeBSD,
>>> allowing the driver to initialize correctly.
>>>
>>> Fixes: 046f11619567 ("net/vmxnet3: support MSI-X interrupt")
>>>
>>> Reported-by: Lewis Donzis 
>>> Signed-off-by: Bruce Richardson 
>>> >
>> Tested-by: Lewis Donzis 
>>
> 
> Acked-by: Ferruh Yigit 
> 
> Applied to dpdk-next-net/main, thanks.


Re: [PATCH] net/vmxnet3: fix use of interrupts on FreeBSD

2024-01-24 Thread Lewis Donzis



- On Jan 24, 2024, at 7:58 AM, Ferruh Yigit ferruh.yi...@amd.com wrote:

> On 1/24/2024 12:34 PM, Lewis Donzis wrote:
> 
>> - On Jan 11, 2024, at 6:03 AM, Ferruh Yigit ferruh.yi...@amd.com wrote:
>> 
>>> On 1/9/2024 4:00 PM, Lewis Donzis wrote:
>>>>
>>>>
>>>> - On Jan 9, 2024, at 8:23 AM, Bruce Richardson 
>>>> bruce.richard...@intel.com
>>>> wrote:
>>>>
>>>>> DPDK does not support interrupts on FreeBSD, so the vmxnet3 driver
>>>>> returns error when enabling interrupts as it initializes. We can fix
>>>>> this by #ifdef'ing out the interrupt calls when building for FreeBSD,
>>>>> allowing the driver to initialize correctly.
>>>>>
>>>>> Fixes: 046f11619567 ("net/vmxnet3: support MSI-X interrupt")
>>>>>
>>>>> Reported-by: Lewis Donzis 
>>>>> Signed-off-by: Bruce Richardson 
>>>>>>
>>>> Tested-by: Lewis Donzis 
>>>>
>>>
>>> Acked-by: Ferruh Yigit 
>>>
>>> Applied to dpdk-next-net/main, thanks.
>>
>> Did this get checked in?
>>
>> Just curious because I don't see it in 22.11.4 that was released
> yesterday, so wasn't sure when it would show up.
>>
>>
> 
> Hi Lew,
> 
> It has been merged into the 'next-net' subtree and will be subsequently
> pulled into the 'main' tree. Eventually, it will be included in the
> 24.03 release.
> 
> LTS releases will backport the fix after the release of v24.03.
> 
> If you are interested in 22.11.# LTS, this fix is expected to be
> included in v22.11.5 scheduled for release in April."

Perfect, thanks very much for the explanation.

lew


Including contigmem in core dumps

2024-05-27 Thread Lewis Donzis
I've been wondering why we exclude memory allocated by eal_get_virtual_area() 
from core dumps? (More specifically, it calls eal_mem_set_dump() to call 
madvise() to disable core dumps from the allocated region.) 

On many occasions, when debugging after a crash, it would have been very 
convenient to be able to see the contents of an mbuf or other object allocated 
in contigmem space. And we often avoid using the rte memory allocator just 
because of this. 

Is there any reason for this, or could it perhaps be a compile-time 
configuration option not to call madvise()? 


Re: Including contigmem in core dumps

2024-05-28 Thread Lewis Donzis


- On May 28, 2024, at 1:55 AM, Dmitry Kozlyuk dmitry.kozl...@gmail.com 
wrote:

> Hi Lewis,
> 
> Memory reserved by eal_get_virtual_area() is not yet useful,
> but it is very large, so by excluding it from dumps,
> DPDK prevents dumps from including large zero-filled parts.
> 
> It also makes sense to call eal_mem_set_dump(..., false)
> from eal_memalloc.c:free_seg(), because of --huge-unlink=never:
> in this mode (Linux-only), freed segments are not cleared,
> so if they were included into dump, it would be a lot of garbage data.
> 
> Newly allocated hugepages are not included into dumps
> because this would make dumps very large by default.
> However, this could be an opt-in as a runtime option if need be.

Thanks for the clarification.  I agree that not including freed segments makes 
perfect sense.

But when debugging a core dump, it's sometimes really helpful to be able to see 
what's in the mbuf that was being processed at the time.  Perhaps it would be a 
useful option to be able to tell the allocator not to disable core dumps.

In the mean time, my experiments to get around this have not been fruitful.

I wondered if we could enable core dumps for mbufs by calling 
rte_mempool_mem_iter() on the pool returned by rte_pktmbuf_pool_create(), and 
have the callback function call madvise(memhdr->addr, memhdr->len, MADV_CORE).  
But that didn't help, or at least the size of the core file didn't increase.

I then tried disabling the call to madvise() in the DPDK source code, and that 
didn't make any difference either.

Note that this is on FreeBSD, so I wonder if there's some fundamental reason 
that the contigmem memory doesn't get included in a core dump?


[dpdk-dev] Change ether_addr to avoid conflict

2018-10-22 Thread Lewis Donzis
Please consider changing “struct ether_addr” in rte_ether.h to “struct 
rte_ether_addr”, in order to avoid conflicts with the same structure name 
/usr/include/net/ethernet.h.

This is kind of a pain for us since we would like to include ethernet.h in 
order to get some other definitions, but we can’t because it conflicts with the 
same structure name in rte_ether.h.

Thanks,
lew

[dpdk-dev] Change ether_addr to avoid conflict

2018-10-24 Thread Lewis Donzis
Please consider changing "struct ether_addr" in rte_ether.h to "struct 
rte_ether_addr", in order to avoid conflicts with the same structure name 
/usr/include/net/ethernet.h.

Sometimes, there is need to include /usr/include/net/ethernet.h, but it 
conflicts with the same structure name in rte_ether.h.  Was there a reason this 
structure wasn't given the standard rte_ prefix?

Thanks,
lew

[dpdk-dev] [dpdk-users] rte_zmalloc() returning non-zeroed memory on FreeBSD

2016-11-03 Thread Lewis Donzis
I?m curious about how/whether this got resolved.  The 16.07.1 code doesn?t 
appear to have this fixed.

Is it still forthcoming?

Thanks,
lew


[dpdk-dev] [dpdk-users] rte_zmalloc() returning non-zeroed memory on FreeBSD

2016-11-04 Thread Lewis Donzis

> On Nov 4, 2016, at 11:38 AM, Sergio Gonzalez Monroy  at intel.com> wrote:
> 
> On 03/11/2016 20:04, Lewis Donzis wrote:
>> I?m curious about how/whether this got resolved.  The 16.07.1 code doesn?t 
>> appear to have this fixed.
>> 
>> Is it still forthcoming?
>> 
>> Thanks,
>> lew
> 
> It should have been fixed in 16.07.1.
> http://dpdk.org/browse/dpdk-stable/commit/?id=82f931805506efbb8b5046e9045bec8f04bbabf6
> 
> Do you have an easy test to reproduce? can you reproduce it with any of the 
> app/examples?

Oh, sorry, I was looking at the rte_zmalloc() code, assuming it would be 
changed there, but it looks like the commit was to change contigmem instead, 
which seems perfectly logical and reasonable.

It?s entirely possible that the folks here who tested 16.07.1 forgot to replace 
the contigmem driver, so perhaps that is the problem.

We?ll check it out and report back.

Thanks!
lew


[dpdk-dev] [dpdk-users] rte_zmalloc() returning non-zeroed memory on FreeBSD

2016-11-04 Thread Lewis Donzis

> On Nov 4, 2016, at 11:38 AM, Sergio Gonzalez Monroy  at intel.com> wrote:
> It should have been fixed in 16.07.1.
> http://dpdk.org/browse/dpdk-stable/commit/?id=82f931805506efbb8b5046e9045bec8f04bbabf6

Hi, Sergio.

We confirmed that it works perfectly with 16.07.1 with the new contigmem driver.

Thanks!
lew


[dpdk-dev] CPU hog & memory leak on FreeBSD

2020-08-03 Thread Lewis Donzis
Hello. 

We've posted about this before, but I'm hoping to find someone willing to 
commit a patched version of lib/librte_eal/bsdapp/eal/eal_interrupts.c that 
corrects a memory leak and 100% CPU hog that is immediately noticeable with the 
i40e driver, at a minimum. We have modified this file to correct the problem, 
and would be happy to provide it back to whomever is a committer for this. 

The detailed explanation is: 

Currently, s etting an alarm with rte_eal_alarm_set() registers an alarm 
interrupt by calling rte_intr_callback_register(), which links the callback 
function (eal_alarm_callback) onto a list for that source and sets up a 
one-shot timer via kevent. Setting additional alarms links them on to the 
alarm_list, but also calls rte_eal_alarm_set() again, which links the callback 
function onto the source callback list again. 

When the alarm triggers and eal_alarm_callback() gets called, it goes down the 
list of pending alarms, calling as many callback functions as possible and 
removing each one from the list until it reaches one which has not yet expired. 
Once it's done, if alarm_list is not empty, it calls 
rte_intr_callback_register(), which then links yet another callback onto the 
interrupt source's list, thus creating an infinite loop. 

The problem is that the source callback list grows infinitely under this 
condition (essentially, when more than one alarm is queued). However, the call 
is necessary in order to reset the kevent timer. 

The proposed fix recognizes and leverages the fact that an alarm interrupt in 
FreeBSD should never have more than one callback on its list, so if 
rte_intr_callback_register() is called with an interrupt handle type of 
RTE_INTR_HANDLE_ALARM, and if such an interrupt type already has a non-empty 
list, then a new callback is not created, but the kevent timer is restarted 
properly. 

A much simpler change is possible if we don't mind the overhead of allocating, 
filling-in, linking, de-linking, and freeing a callback unnecessarily. This 
proposed change makes every attempt to avoid that overhead. 



Re: [dpdk-dev] CPU hog & memory leak on FreeBSD

2020-08-03 Thread Lewis Donzis
Sure, that would be great.  This is from 18.11.9.

Also, the entire modified file is attached.

Thanks very much!
lew


84,86c84,86
<   struct rte_intr_callback *callback = NULL;
<   struct rte_intr_source *src = NULL;
<   int ret, add_event;
---
>   struct rte_intr_callback *callback;
>   struct rte_intr_source *src;
>   int ret, add_event = 0;
99,106c99
<   /* allocate a new interrupt callback entity */
<   callback = calloc(1, sizeof(*callback));
<   if (callback == NULL) {
<   RTE_LOG(ERR, EAL, "Can not allocate memory\n");
<   return -ENOMEM;
<   }
<   callback->cb_fn = cb;
<   callback->cb_arg = cb_arg;
---
>   rte_spinlock_lock(&intr_lock);
108,110c101
<   rte_spinlock_lock(&intr_lock);
< 
<   /* check if there is at least one callback registered for the fd */
---
>   /* find the source for this intr_handle */
112,115c103,105
<   if (src->intr_handle.fd == intr_handle->fd) {
<   /* we had no interrupts for this */
<   if (TAILQ_EMPTY(&src->callbacks))
<   add_event = 1;
---
>   if (src->intr_handle.fd == intr_handle->fd)
> break;
> }
117,121c107,121
<   TAILQ_INSERT_TAIL(&(src->callbacks), callback, next);
<   ret = 0;
<   break;
<   }
<   }
---
> /* If this is an alarm interrupt and it already has a callback, then 
> we don't want to create a new callback
>  * because the only thing on the list should be eal_alarm_callback() 
> and we may be called just to reset the timer.
>  */
> if (src != NULL && src->intr_handle.type == RTE_INTR_HANDLE_ALARM && 
> !TAILQ_EMPTY(&src->callbacks)) {
> callback = NULL;
> } else {
>   /* allocate a new interrupt callback entity */
>   callback = calloc(1, sizeof(*callback));
>   if (callback == NULL) {
>   RTE_LOG(ERR, EAL, "Can not allocate memory\n");
>   ret = -ENOMEM;
> goto fail;
>   }
>   callback->cb_fn = cb;
>   callback->cb_arg = cb_arg;
123,134c123,137
<   /* no existing callbacks for this - add new source */
<   if (src == NULL) {
<   src = calloc(1, sizeof(*src));
<   if (src == NULL) {
<   RTE_LOG(ERR, EAL, "Can not allocate memory\n");
<   ret = -ENOMEM;
<   goto fail;
<   } else {
<   src->intr_handle = *intr_handle;
<   TAILQ_INIT(&src->callbacks);
<   TAILQ_INSERT_TAIL(&(src->callbacks), callback, next);
<   TAILQ_INSERT_TAIL(&intr_sources, src, next);
---
> if (src == NULL) {
>   src = calloc(1, sizeof(*src));
>   if (src == NULL) {
>   RTE_LOG(ERR, EAL, "Can not allocate memory\n");
>   ret = -ENOMEM;
>   goto fail;
>   } else {
>   src->intr_handle = *intr_handle;
>   TAILQ_INIT(&src->callbacks);
>   TAILQ_INSERT_TAIL(&intr_sources, src, next);
>   }
> }
> 
>   /* we had no interrupts for this */
>   if (TAILQ_EMPTY(&src->callbacks))
136,137c139,140
<   ret = 0;
<   }
---
> 
>   TAILQ_INSERT_TAIL(&(src->callbacks), callback, next);
177c180
<   return ret;
---
>   return 0;
181c184,185
<   TAILQ_REMOVE(&(src->callbacks), callback, next);
---
> if (callback != NULL)
>   TAILQ_REMOVE(&(src->callbacks), callback, next);



- On Aug 3, 2020, at 6:22 PM, Honnappa Nagarahalli 
honnappa.nagaraha...@arm.com wrote:

> Hello,
>   I can take a look if you can post the patch.
> 
>> -Original Message-
>> From: dev  On Behalf Of Lewis Donzis
>> Sent: Monday, August 3, 2020 2:43 PM
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] CPU hog & memory leak on FreeBSD
>> 
>> Hello.
>> 
>> We've posted about this before, but I'm hoping to find someone willing to
>> commit a patched version of

Re: [PATCH] ixgbe: Removed FreeBSD forcing wait in ixgbe_dev_link_update_share()

2024-10-22 Thread Lewis Donzis



- On Oct 22, 2024, at 10:34 AM, Stephen Hemminger 
step...@networkplumber.org wrote:

> On Tue, 22 Oct 2024 09:42:05 -0500
> l...@perftech.com wrote:
> 
>> From: Lewis Donzis 
>> 
>> Forcing wait true prevents checking link status without delay, because the
>> function will wait more than 10 seconds for link status to be true.
>> 
>> Signed-off-by: Lewis Donzis 
> 
> A little concerned that original patch was trying to address a problem.
> 
> Fixes: 0012111a3d87 ("net/ixgbe: fix link status synchronization on BSD")

Yeah, I was concerned, too, but the fix causes a call to "give me link status 
immediately without waiting" to wait, and worse, it waits a really long time 
for the link to be "up".  So it pretty fundamentally breaks operation on 
FreeBSD.

I was worried it would wouldn't work properly with that change removed, but 
empirically, it appears to work correctly.  It seems like the presumption was 
that, because FreeBSD doesn't support interrupts, it wouldn't be able to get 
link status, but it does properly interrogate the hardware and provide the 
correct link status.

Perhaps Zhihong Peng could comment on why this was done?


Re: Including contigmem in core dumps

2024-10-22 Thread Lewis Donzis



- On Oct 22, 2024, at 10:39 AM, Stephen Hemminger 
step...@networkplumber.org wrote:

> On Tue, 22 Oct 2024 17:47:11 +0300
> Dmitry Kozlyuk  wrote:
> 
>> 2024-10-22 07:41 (UTC-0500), Lewis Donzis:
>> > I've been wondering why we exclude memory allocated by
>> > eal_get_virtual_area() from core dumps? (More specifically, it calls
>> > eal_mem_set_dump() to call madvise() to disable core dumps from the
>> > allocated region.)
>> > 
>> > On many occasions, when debugging after a crash, it would have been very
>> > convenient to be able to see the contents of an mbuf or other object
>> > allocated in contigmem space. And we often avoid using the rte memory
>> > allocator just because of this.
>> > 
>> > Is there any reason for this, or could it perhaps be a compile-time
>> > configuration option not to call madvise()?
>> 
>> The commit that originally added madvise() argued that dumping everything
>> ended up in coredumps with "useless" data [non-mapped or unused pages]:
>> 
>> http://git.dpdk.org/dpdk/commit/?id=d72e4042c5ebda7af81448b387af8218136402d0
>> 
>> Dumping mapped pages sounds reasonable in many cases.
>> Not in all cases admittedly:
>> - legacy memory mode mapping a lot of pages that are not (yet) used;
>> - if packet data is confidential while the app is not.
>> 
>> The option to dump or not can easily be a runtime one.
>> The safe default however seems to be "off".
>> 
>> In dynamic memory node (not FreeSBD, unfortunately)
>> rte_mem_event_callback-register() may be used to call madvise().
>> Maybe DPDK should allow such callbacks in any mode
>> and invoke them during initialization to make the above solution universal.
> 
> It is not unusual to have 2 or 4 Gigabytes of huge pages mapped.
> Many embedded systems do not have 6G of extra storage available for a single
> core
> dump, not to mention multiples. Plus any storage can be really slow on 
> embedded
> systems.
> 
> And the common scenario on Linux is to use systemd to capture and compress
> core dumps.

Totally agree.  Most of the time, we wouldn't want this enabled because the 
core dumps would be huge, but in environments where we do have storage 
available, it can greatly help troubleshooting to be able to examine the 
contents of contigmem in the debugger.  Unfortunately, on FreeBSD, we don't 
have the luxury of systemd's ability to do this, or even to compress the core 
dumps, although we do have ZFS compression on the local filesystem, which does 
a pretty decent job on on core dumps.


Re: FreeBSD problem with ixgbe

2024-10-22 Thread Lewis Donzis
I've reported this several times over the last two years, but there's been no 
reply and no change to the ixgbe driver.

Specifically, calling rte_eth_link_get_nowait() on FreeBSD does, in fact, wait 
for link-up which causes unexpected and long delays.

I suggest removing the line from ixgbe_dev_link_update_share() that forces 
"wait" to be set on FreeBSD.  Would someone be willing to commit this, please?

Thanks,
lew

Here's the "git diff" from a modified version:

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index ab37c37469..008760e315 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -4314,11 +4314,6 @@ ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
if (wait_to_complete == 0 || dev->data->dev_conf.intr_conf.lsc != 0)
wait = 0;
 
-/* BSD has no interrupt mechanism, so force NIC status synchronization. */
-#ifdef RTE_EXEC_ENV_FREEBSD
-   wait = 1;
-#endif
-


Including contigmem in core dumps

2024-10-22 Thread Lewis Donzis
I've been wondering why we exclude memory allocated by eal_get_virtual_area() 
from core dumps? (More specifically, it calls eal_mem_set_dump() to call 
madvise() to disable core dumps from the allocated region.) 

On many occasions, when debugging after a crash, it would have been very 
convenient to be able to see the contents of an mbuf or other object allocated 
in contigmem space. And we often avoid using the rte memory allocator just 
because of this. 

Is there any reason for this, or could it perhaps be a compile-time 
configuration option not to call madvise()? 



Re: FreeBSD problem with ixgbe

2024-10-22 Thread Lewis Donzis
Certainly.  This is my first attempt and I didn't realize some of the rules, 
but hopefully it'll work.

Thanks,
lew

- On Oct 22, 2024, at 8:03 AM, Bruce Richardson bruce.richard...@intel.com 
wrote:

> On Tue, Oct 22, 2024 at 07:32:01AM -0500, Lewis Donzis wrote:
>> I've reported this several times over the last two years, but there's been no
>> reply and no change to the ixgbe driver.
>> 
>> Specifically, calling rte_eth_link_get_nowait() on FreeBSD does, in fact, 
>> wait
>> for link-up which causes unexpected and long delays.
>> 
>> I suggest removing the line from ixgbe_dev_link_update_share() that forces
>> "wait" to be set on FreeBSD.  Would someone be willing to commit this, 
>> please?
>> 
>> Thanks,
>> lew
>> 
> 
> Hi Lewis,
> 
> could you please submit this change as a patch (using git send-email)
> including your signoff on it? We can't take code into DPDK without the
> appropriate signoff.
> 
> Thanks,
> 
> /Bruce
> 
> PS: For reference see:
> https://doc.dpdk.org/guides/contributing/patches.html#commit-messages-body
> 
>> Here's the "git diff" from a modified version:
>> 
>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c 
>> b/drivers/net/ixgbe/ixgbe_ethdev.c
>> index ab37c37469..008760e315 100644
>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
>> @@ -4314,11 +4314,6 @@ ixgbe_dev_link_update_share(struct rte_eth_dev *dev,
>> if (wait_to_complete == 0 || dev->data->dev_conf.intr_conf.lsc != 0)
>> wait = 0;
>>  
>> -/* BSD has no interrupt mechanism, so force NIC status synchronization. */
>> -#ifdef RTE_EXEC_ENV_FREEBSD
>> -   wait = 1;
>> -#endif
> > -


Re: [PATCH] eal: support including mapped memory in core dump

2024-10-24 Thread Lewis Donzis



- On Oct 23, 2024, at 6:18 PM, Dmitry Kozlyuk dmitry.kozl...@gmail.com 
wrote:
> Lewis, testing on FreeBSD would be appreciated.

Well, unfortunately, it's not working very well...  The contigmem memory was 
not included in the core dump.

I added logging just before the madvise() call to prove that it's calling 
madvise() properly, and it looks good.
I then tried forcing all madvise() calls to use EAL_DODUMP.
And finally, I tried removing the call to madvise() altogether.

None of those things changed the size of the core dump from what it was, and my 
test of trying to access an mbuf in the debugger still says the memory is 
inaccessible.

What's even worse is, it appears that I had already tried the above 
work-arounds back on May 28th and mentioned that in an e-mail to you.  
Apparently I totally forgot about those experiments!  At the time, I wondered 
if it was just something fundamental about FreeBSD and contigmem.

It may take some digging through the FreeBSD source code to see how the dumper 
works.

So I apologize for not remembering that this wasn't entirely a DPDK issue.  
That said, it's nice that this is available on Linux and maybe we'll find a way 
to make it work on FreeBSD.

Thanks again for the efforts,
lew


Re: [PATCH v2 1/2] contigmem: support including mapped buffers in core dump

2024-10-28 Thread Lewis Donzis
It seems unlikely that there are other users of contigmem on FreeBSD, 
especially since it's not necessary for other applications.

On FreeBSD, use of large and huge pages is automatic; you just call mmap() with 
a large size and it automatically tries to use the largest physical pages 
possible.  So the only thing contigmem is doing for us on FreeBSD is providing 
the physical address and, of course, making it consistent with Linux.

At least, that's my understanding.


- On Oct 28, 2024, at 8:26 AM, Dmitry Kozlyuk dmitry.kozl...@gmail.com 
wrote:

> 2024-10-26 06:43 (UTC-0500), Lewis Donzis:
>> Is the extra control necessary, i.e., why not just always do this and let
>> the EAL option control whether the pages get dumped?
> 
> I've been evaluating your suggestion and see no downsides,
> except contigmem default behavior change, but does it have non-DPDK users?
> If no one objects, I'll prepare v3 doing the following:
> 1) everything from v2,
> 2) except always mark contigmem buffers as dumpable,
> 3) add --dump-mapped back and make DPDK disable dumping by default.
> As a result, in order to include mapped memory in coredump:
> * FreeBSD will require only "--dump-mapped";
> * Linux will require both "coredump_filter" setup and "--dump-mapped".


Re: [PATCH v2 1/2] contigmem: support including mapped buffers in core dump

2024-10-26 Thread Lewis Donzis
Is the extra control necessary, i.e., why not just always do this and let the 
EAL option control whether the pages get dumped?

- On Oct 25, 2024, at 3:26 PM, Dmitry Kozlyuk dmitry.kozl...@gmail.com 
wrote:

> It was impossible to include mapped contigmem buffers in core dump.
> Add hw.contigmem.coredump_enable tunable to control the inclusion.
> 
> Mappings backed by OBJ_FICTITIOUS are excluded from core dump.
> While contigmem is a device and its OBJT_DEVICE mappings get this flag,
> they are actually backed by memory.
> Clear OBJ_FICTITIOUS mapping bit to include them in core dump.
> Do this only if hw.contigmem.coredump_enable=1.
> 
> Signed-off-by: Dmitry Kozlyuk 
> ---
> doc/guides/freebsd_gsg/build_dpdk.rst | 27 ---
> kernel/freebsd/contigmem/contigmem.c  |  8 
> 2 files changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/doc/guides/freebsd_gsg/build_dpdk.rst
> b/doc/guides/freebsd_gsg/build_dpdk.rst
> index 86e8e5a805..387ec52820 100644
> --- a/doc/guides/freebsd_gsg/build_dpdk.rst
> +++ b/doc/guides/freebsd_gsg/build_dpdk.rst
> @@ -86,6 +86,22 @@ module loading using::
> kenv hw.contigmem.num_buffers=n
> kenv hw.contigmem.buffer_size=m
> 
> +Where n is the number of blocks and m is the size in bytes of each area of
> +contiguous memory.  A default of two buffers of size 1073741824 bytes (1
> Gigabyte)
> +each is set during module load if they are not specified in the environment.
> +
> +Buffers are excluded from core dump by default.
> +Mapped buffers can be included in core dump using the following tunable:
> +
> +hw.contigmem.coredump_enable=1
> +
> +.. note::
> +
> +Including contigmem buffers in core dump file increases its size,
> +which may fill the storage or overload the transport.
> +Buffers typically hold data processed by the application,
> +like network packets, which may contain sensitive information.
> +
> The kernel environment variables can also be specified during boot by placing
> the
> following in ``/boot/loader.conf``:
> 
> @@ -93,15 +109,12 @@ following in ``/boot/loader.conf``:
> 
> hw.contigmem.num_buffers=n
> hw.contigmem.buffer_size=m
> +hw.contigmem.coredump_enable=1
> 
> The variables can be inspected using the following command::
> 
> sysctl -a hw.contigmem
> 
> -Where n is the number of blocks and m is the size in bytes of each area of
> -contiguous memory.  A default of two buffers of size 1073741824 bytes (1
> Gigabyte)
> -each is set during module load if they are not specified in the environment.
> -
> The module can then be loaded using kldload::
> 
> kldload contigmem
> @@ -115,13 +128,13 @@ up time.  This can be achieved by placing lines similar 
> to
> the following into
> 
> hw.contigmem.num_buffers=1
> hw.contigmem.buffer_size=1073741824
> +hw.contigmem.coredump_enable=1
> contigmem_load="YES"
> 
> .. note::
> 
> -The contigmem_load directive should be placed after any definitions of
> -``hw.contigmem.num_buffers`` and ``hw.contigmem.buffer_size`` if the
> default values
> -are not to be used.
> +The ``contigmem_load`` directive should be placed after any definitions
> +of ``hw.contigmem.*`` if the default values are not to be used.
> 
> An error such as::
> 
> diff --git a/kernel/freebsd/contigmem/contigmem.c
> b/kernel/freebsd/contigmem/contigmem.c
> index 7dd87599d9..591e46dded 100644
> --- a/kernel/freebsd/contigmem/contigmem.c
> +++ b/kernel/freebsd/contigmem/contigmem.c
> @@ -51,6 +51,7 @@ static d_close_tcontigmem_close;
> 
> static int  contigmem_num_buffers = 
> RTE_CONTIGMEM_DEFAULT_NUM_BUFS;
> static int64_t  contigmem_buffer_size = 
> RTE_CONTIGMEM_DEFAULT_BUF_SIZE;
> +static bool contigmem_coredump_enable;
> 
> static eventhandler_tag contigmem_eh_tag;
> static struct contigmem_buffer contigmem_buffers[RTE_CONTIGMEM_MAX_NUM_BUFS];
> @@ -59,6 +60,7 @@ static int  contigmem_refcnt;
> 
> TUNABLE_INT("hw.contigmem.num_buffers", &contigmem_num_buffers);
> TUNABLE_QUAD("hw.contigmem.buffer_size", &contigmem_buffer_size);
> +TUNABLE_BOOL("hw.contigmem.coredump_enable", &contigmem_coredump_enable);
> 
> static SYSCTL_NODE(_hw, OID_AUTO, contigmem, CTLFLAG_RD, 0, "contigmem");
> 
> @@ -68,6 +70,8 @@ SYSCTL_QUAD(_hw_contigmem, OID_AUTO, buffer_size, 
> CTLFLAG_RD,
>   &contigmem_buffer_size, 0, "Size of each contiguous buffer");
> SYSCTL_INT(_hw_contigmem, OID_AUTO, num_references, CTLFLAG_RD,
>   &contigmem_refcnt, 0, "Number of references to contigmem");
> +SYSCTL_BOOL(_hw_contigmem, OID_AUTO, coredump_enable, CTLFLAG_RD,
> + &contigmem_coredump_enable, 0, "Include mapped buffers in core dump");
> 
> static SYSCTL_NODE(_hw_contigmem, OID_AUTO, physaddr, CTLFLAG_RD, 0,
>   "physaddr");
> @@ -357,5 +361,9 @@ contigmem_mmap_single(struct cdev *cdev, vm_ooffset_t
> *offset, vm_size_t size,
>   *obj = cdev_pager_allocate(vmh, OBJT_DEVICE, &contigmem_cdev_pager_

FreeBSD problem with ixgbe

2024-09-27 Thread Lewis Donzis
I'm pretty sure this is been reported before, but in ixgbe_ethdev.c, line 4311 
begins:

/* BSD has no interrupt mechanism, so force NIC status synchronization. */
#ifdef RTE_EXEC_ENV_FREEBSD
wait = 1;
#endif

We've had to remove this code ever since it was added because it causes 
improper delays in our code.  When we ask for link status without waiting, we 
need for it not to wait, and I'm not sure why a lack of interrupts makes any 
difference in this case.  Removing this code allows it to behave properly, best 
we can tell.

Thanks,
lew


fbdev

2024-12-03 Thread Lewis Donzis
fbdev is now runnin FreeBSD 14.2. 

The former version (14.1) is now running on fbdev141. 

Thanks, 
lew