Re: Performance test for CUBIC in stable/14

2024-10-27 Thread void
On Fri, 25 Oct 2024, at 13:13, Cheng Cui wrote: > Here is my example. I am using two 6-core/12-threads desktops for my > bhyve servers. > CPU: AMD Ryzen 5 5560U with Radeon Graphics (2295.75-MHz > K8-class CPU) > > You can find test results on VMs from

Re: Performance test for CUBIC in stable/14

2024-10-25 Thread Cheng Cui
Here is my example. I am using two 6-core/12-threads desktops for my bhyve servers. CPU: AMD Ryzen 5 5560U with Radeon Graphics (2295.75-MHz K8-class CPU) You can find test results on VMs from my wiki: https://wiki.freebsd.org/chengcui/testD46046 All the CPU utilization results are low

Re: Performance test for CUBIC in stable/14

2024-10-23 Thread void
example, the sender CPU shows 97.7% utilization. Would there be any way to reduce CPU usage? There are 11 VMs running on the bhyve server. None of them are very busy but the server shows % uptime 9:54p.m. up 8 days, 6:08, 22 users, load averages: 0.82, 1.25, 1.74 The test vm vm4-fbsd14s: % uptime

Re: Performance test for CUBIC in stable/14

2024-10-23 Thread Cheng Cui
18:09:22 BST 2024 > r...@vm4-fbsd14s.home.arpa:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 > Control connection MSS 1460 > Time: Wed, 23 Oct 2024 14:41:11 UTC > Connecting to host 192.168.1.232, port 5201 >Cookie: tvrlkd2axzx24uui7gglzk4ni66ib7qy4kxa >TCP MSS

Re: Performance test for CUBIC in stable/14

2024-10-23 Thread void
efault) [ 5] local 192.168.1.13 port 5201 connected to 192.168.1.232 port 5201 Starting Test: protocol: TCP, 1 streams, 1048576 byte blocks, omitting 0 seconds, 20 second test, tos 0 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-2.01 sec 137 MBytes 572 Mbi

Re: Performance test for CUBIC in stable/14

2024-10-23 Thread Cheng Cui
t from `ping` (latency) between these VMs? > > That test wasn't between VMs. It was from the vm with the patches to a > workstation > on the same switch. > > ping from the vm to the workstation: > > --- 192.168.1.232 ping statistics --- > 10 packets transmitted, 10 packets

Re: Performance test for CUBIC in stable/14

2024-10-22 Thread void
On Tue, Oct 22, 2024 at 03:57:42PM -0400, Cheng Cui wrote: What is the output from `ping` (latency) between these VMs? That test wasn't between VMs. It was from the vm with the patches to a workstation on the same switch. ping from the vm to the workstation: --- 192.168.1.232

Re: Performance test for CUBIC in stable/14

2024-10-22 Thread Cheng Cui
What is the output from `ping` (latency) between these VMs? cc On Tue, Oct 22, 2024 at 11:31 AM void wrote: > On Tue, Oct 22, 2024 at 10:59:28AM -0400, Cheng Cui wrote: > > > Please re-organize your test result in before/after patch order. So that > I > > can unders

Re: Performance test for CUBIC in stable/14

2024-10-22 Thread void
On Tue, Oct 22, 2024 at 10:59:28AM -0400, Cheng Cui wrote: Please re-organize your test result in before/after patch order. So that I can understand and compare them. Sure. Before: [ ID] Interval Transfer Bandwidth [ 1] 0.00-60.02 sec 5.16 GBytes 738 Mbits/sec After: [ ID
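The "Before" summary line quoted above can be sanity-checked by recomputing the bandwidth from the transfer size and the interval. The following is a minimal Python sketch, not part of iperf; it assumes iperf's usual convention that byte counts use 1024-based prefixes while bit rates use 1000-based prefixes, and the helper name recompute_mbits is hypothetical.

```python
import re

# Assumption: iperf prints transfer sizes with 1024-based prefixes
# and bit rates with 1000-based prefixes.
BYTE_UNITS = {"KBytes": 1024, "MBytes": 1024**2, "GBytes": 1024**3}

SUMMARY_RE = re.compile(
    r"(?P<start>[\d.]+)-(?P<end>[\d.]+)\s+sec\s+"
    r"(?P<size>[\d.]+)\s+(?P<unit>[KMG]Bytes)\s+"
    r"(?P<rate>[\d.]+)\s+Mbits/sec"
)

def recompute_mbits(line):
    """Return (reported, recomputed) Mbits/sec for one iperf summary line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("not an iperf summary line: %r" % line)
    seconds = float(m["end"]) - float(m["start"])
    bits = float(m["size"]) * BYTE_UNITS[m["unit"]] * 8
    return float(m["rate"]), bits / seconds / 1e6

# The "Before" line as quoted in the message above.
before = "[ 1] 0.00-60.02 sec 5.16 GBytes 738 Mbits/sec"
reported, recomputed = recompute_mbits(before)
print(f"reported {reported:.0f} Mbits/sec, recomputed {recomputed:.1f} Mbits/sec")
```

Under that unit assumption the recomputed value comes out at roughly 738.5 Mbits/sec, which agrees with the reported 738 Mbits/sec. The same helper could be applied to the "After" line to compare the two runs.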

Re: Performance test for CUBIC in stable/14

2024-10-22 Thread Cheng Cui
On Mon, Oct 21, 2024 at 2:25 PM void wrote: > On Mon, Oct 21, 2024 at 10:42:49AM -0400, Cheng Cui wrote: > >Change the subject to `Performance test for CUBIC in stable/14`, was `Re: > >Performance issues with vnet jails + epair + bridge`. > > > >I actually prepared

Re: Performance test for CUBIC in stable/14

2024-10-21 Thread void
On Mon, Oct 21, 2024 at 10:42:49AM -0400, Cheng Cui wrote: Change the subject to `Performance test for CUBIC in stable/14`, was `Re: Performance issues with vnet jails + epair + bridge`. I actually prepared two patches, one depends on the other: https://reviews.freebsd.org/D47218 <<

Re: Performance test for CUBIC in stable/14

2024-10-21 Thread Cheng Cui
Change the subject to `Performance test for CUBIC in stable/14`, was `Re: Performance issues with vnet jails + epair + bridge`. I actually prepared two patches, one depends on the other: https://reviews.freebsd.org/D47218 << apply this patch first https://reviews.freebsd.org/

Re: CALL FOR TEST axgbe promisc mode

2024-10-12 Thread Zhenlei Huang
r wrote: >>>> Hi, >>>> >>>>> On 1. Oct 2024, at 02:47, Zhenlei Huang wrote: >>>>> >>>>> The test plan is simple, either of the following should suffice: >>>>> >>>>> • Do traffic sniffing on axgbe

Re: CALL FOR TEST axgbe promisc mode

2024-10-11 Thread Franco Fichtner
On 8. Oct 2024, at 14:25, Mark Johnston wrote: Maybe the firmware/hardware happens to have been (wrongly) set to promisc mode already? Maybe, or the driver is missing some initialization step. That would be the likeliest case although I'm not sure why exiting promisc mode doesn't turn it off

Re: CALL FOR TEST axgbe promisc mode

2024-10-08 Thread Mark Johnston
On Mon, Oct 07, 2024 at 10:52:19PM +0800, Zhenlei Huang wrote: > > > > On Oct 2, 2024, at 3:42 PM, Mark Johnston wrote: > > > > On Tue, Oct 01, 2024 at 12:46:07PM +, Franco Fichtner wrote: > >> Hi, > >> > >>> On 1. Oct 2024, at 02:4

Re: CALL FOR TEST axgbe promisc mode

2024-10-07 Thread Zhenlei Huang
> On Oct 2, 2024, at 3:42 PM, Mark Johnston wrote: > > On Tue, Oct 01, 2024 at 12:46:07PM +, Franco Fichtner wrote: >> Hi, >> >>> On 1. Oct 2024, at 02:47, Zhenlei Huang wrote: >>> >>> The test plan is simple, either of the following s

Re: CALL FOR TEST axgbe promisc mode

2024-10-02 Thread Mark Johnston
On Tue, Oct 01, 2024 at 12:46:07PM +, Franco Fichtner wrote: > Hi, > > > On 1. Oct 2024, at 02:47, Zhenlei Huang wrote: > > > > The test plan is simple, either of the following should suffice: > > > > • Do traffic sniffing on axgbe interface. The interf

Re: CALL FOR TEST axgbe promisc mode

2024-10-01 Thread Franco Fichtner
Hi, > On 1. Oct 2024, at 02:47, Zhenlei Huang wrote: > > The test plan is simple, either of the following should suffice: > > • Do traffic sniffing on axgbe interface. The interface will enter promisc > mode and should see packets not for us. I tested this with and without
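One quick check that can accompany the sniffing step in the test plan quoted above is to confirm that the interface flag line reported by ifconfig includes PROMISC while the sniffer runs. This is a minimal Python sketch under stated assumptions: the helper name has_promisc is hypothetical, the sample lines are illustrative rather than captured from axgbe hardware, and the check only covers the software interface flag. Whether the driver actually programs the hardware still has to be verified by seeing packets not addressed to the host.

```python
import re

# Matches the flag list in an ifconfig line such as
#   ax0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
FLAGS_RE = re.compile(r"flags=[0-9a-fA-F]+<([^>]*)>")

def has_promisc(ifconfig_output):
    """Return True if the first flags= list in the text contains PROMISC."""
    m = FLAGS_RE.search(ifconfig_output)
    if m is None:
        raise ValueError("no flags= line found")
    return "PROMISC" in m.group(1).split(",")

# Illustrative sample lines (assumed, not taken from the thread).
while_sniffing = "ax0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500"
after_sniffing = "ax0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500"

print(has_promisc(while_sniffing), has_promisc(after_sniffing))
```

In practice the text would come from running ifconfig on the interface under test, once while the sniffer is active and once after it exits.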

CALL FOR TEST axgbe promisc mode

2024-09-30 Thread Zhenlei Huang
Hi, I have a simple patch [1] to fix setting PROMISC mode for driver axgbe. I do not have that hardware, so can not runtime verify the fix. I'd appreciate if someone could have a test for it. The test plan is simple, either of the following should suffice: • Do traffic sniffing on

A syzkaller regression test triggered a panic

2023-09-09 Thread Peter Holm
Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 01
instruction pointer = 0x20:0x80d21330
stack pointer = 0x28:0xfe01d39b9b20
frame pointer = 0x28:0xfe01d39b9b40
code segment = base 0x0, limit 0xf, type 0x1b

[Bug 257268] Panic via kyua test netpfil/common/tos:ipfw_tos: Bad link elm 0xfffff8003d66ffd8 prev->next != elm in slab_free_item * at /boiler/nfs/src/sys/vm/uma_core.c:4733

2022-08-12 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257268
Gordon Bergling changed:
What|Removed |Added
Resolution|--- |Overcome By Events
S

[Bug 257268] Panic via kyua test netpfil/common/tos:ipfw_tos: Bad link elm 0xfffff8003d66ffd8 prev->next != elm in slab_free_item * at /boiler/nfs/src/sys/vm/uma_core.c:4733

2021-08-20 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257268 --- Comment #1 from Gordon Bergling --- This panic only happens with the mentioned KERNCONF; when running GENERIC I couldn't reproduce it. -- You are receiving this mail because: You are the assignee for the bug.

[Bug 257268] Panic via kyua test netpfil/common/tos:ipfw_tos: Bad link elm 0xfffff8003d66ffd8 prev->next != elm in slab_free_item * at /boiler/nfs/src/sys/vm/uma_core.c:4733

2021-07-20 Thread bugzilla-noreply
|crash, needs-qa
Summary|[Panic] Reproducible panic via kyua test netpfil/common/tos:ipfw_tos |Panic via kyua test netpfil/common/tos:ipfw_tos: Bad link elm

[Bug 257268] [Panic] Reproducible panic via kyua test netpfil/common/tos:ipfw_tos

2021-07-19 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257268
Gordon Bergling changed:
What|Removed |Added
Assignee|b...@freebsd.org|n...@freebsd.org
Keywo

[Bug 253061] sys/net/if_vlan:qinq_deep test triggers "UNR: free_unr(3735929054) out of range" panic

2021-01-29 Thread bugzilla-noreply
Assignee|n...@freebsd.org |melif...@freebsd.org
Status|New |Open
--- Comment #1 from Alexander V. Chernikov ---
Does it always panic? Wasn't able to reproduce by running the actual test ~100 times on

[Bug 253061] sys/net/if_vlan:qinq_deep test triggers "UNR: free_unr(3735929054) out of range" panic

2021-01-28 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253061
Mark Linimon changed:
What|Removed |Added
Assignee|b...@freebsd.org|n...@freebsd.org
-- You are receiv

wireguard integration D26137 connectivity test

2020-09-01 Thread Peter Libassi
I can confirm that the issue reported in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247853 is now resolved. Thanks Peter ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.or

wireguard integration D26137 test on 13-CURRENT r364973

2020-08-30 Thread Peter Libassi
I had a first look at the latest version of the wireguard kernel integration. There were some findings that may need attention. buildkernel stops with kernel option INVARIANTS enabled: --- all_subdir_if_wg --- /usr/src/sys/dev/if_wg/module/if_wg_session.c:1639:22: error: unused variable 'e' [-We

Re: test suite for NIC features...

2020-07-20 Thread Kurt Jaeger
Hi! > Has anyone compiled a script/test suite for testing various NIC > features to make sure they work/function properly? > > That is, being able to run a couple interfaces back to back, and turn > the features off on one, and make sure things like checksum offload >

test suite for NIC features...

2020-07-20 Thread John-Mark Gurney
Has anyone compiled a script/test suite for testing various NIC features to make sure they work/function properly? That is, being able to run a couple interfaces back to back, and turn the features off on one, and make sure things like checksum offload and the like work properly? -- John

[Bug 138266] [panic] kernel panic when udp benchmark test used as regular user

2019-02-01 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=138266
Tom Jones changed:
What|Removed |Added
Resolution|--- |Overcome By Events
Status|

Re: Panic during ci test run

2018-08-15 Thread Kristof Provost
te: The fibs_test:subnet_route_with_multiple_fibs_on_same_subnet test (/usr/tests/sys/netinet/) consistently provokes a panic. Note that this requires: - test_suites.FreeBSD.fibs = '1 2' in /usr/local/etc/kyua/kyua.conf - net.fibs=3 in /boot/loader.conf - sysctl net.add_

Re: Panic during ci test run

2018-08-15 Thread Kristof Provost
That’s odd. With the appropriate flags set the panic is completely reliable for me. Does the test just succeed for you, or does it skip? I don’t have any local changes, and it reproduces on the ci.freebsd.org tests in any case. Regards, Kristof On 14 Aug 2018, at 23:42, Matthew Macy wrote

Re: Panic during ci test run

2018-08-14 Thread Matthew Macy
This isn't reproducing it for me. I'll need more specifics on your configuration. -M On Sat, Aug 11, 2018 at 2:04 AM Kristof Provost wrote: > The fibs_test:subnet_route_with_multiple_fibs_on_same_subnet test > (/usr/tests/sys/netinet/) consistently provokes a panic. > > N

Panic during ci test run

2018-08-11 Thread Kristof Provost
The fibs_test:subnet_route_with_multiple_fibs_on_same_subnet test (/usr/tests/sys/netinet/) consistently provokes a panic. Note that this requires: - test_suites.FreeBSD.fibs = '1 2' in /usr/local/etc/kyua/kyua.conf - net.fibs=3 in /boot/loader.conf - sysctl net.add_addr_allfi

[Bug 138266] [panic] kernel panic when udp benchmark test used as regular user

2018-05-28 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=138266
Eitan Adler changed:
What|Removed |Added
Status|In Progress |Open
--- Comment #2 from Eitan Adler

Re: about that DFBSD performance test

2017-03-11 Thread Sepherosa Ziehau
On Wed, Mar 8, 2017 at 8:25 PM, Kevin Bowling wrote: > Right off the bat, FreeBSD doesn't really understand NUMA in any sufficient > capacity. Unfortunately at companies like the one I work at, we take that > to mean "OK buy a high bin CPU and only populate one socket" which serves > us well and

Re: about that DFBSD performance test

2017-03-09 Thread Anton Yuzhaninov
On 03/08/17 10:03, Mateusz Guzik wrote: First and foremost there is general kernel scalability. Certain counters and most locks are purely managed with atomic operations. An atomic operation grabs the entire cacheline with the particular variable (64 bytes in total) in exclusive mode. Isn't pro

Re: about that DFBSD performance test

2017-03-08 Thread Slawa Olhovchenkov
On Wed, Mar 08, 2017 at 04:03:46PM +0100, Mateusz Guzik wrote: > On Wed, Mar 08, 2017 at 03:57:10PM +0300, Slawa Olhovchenkov wrote: > > On Wed, Mar 08, 2017 at 05:25:57AM -0700, Kevin Bowling wrote: > > > > > Right off the bat, FreeBSD doesn't really understand NUMA in any > > > sufficient > >

Re: about that DFBSD performance test

2017-03-08 Thread Mateusz Guzik
On Wed, Mar 08, 2017 at 03:57:10PM +0300, Slawa Olhovchenkov wrote: > On Wed, Mar 08, 2017 at 05:25:57AM -0700, Kevin Bowling wrote: > > > Right off the bat, FreeBSD doesn't really understand NUMA in any sufficient > > capacity. Unfortunately at companies like the one I work at, we take that > >

Re: about that DFBSD performance test

2017-03-08 Thread Slawa Olhovchenkov
On Wed, Mar 08, 2017 at 05:25:57AM -0700, Kevin Bowling wrote: > Right off the bat, FreeBSD doesn't really understand NUMA in any sufficient > capacity. Unfortunately at companies like the one I work at, we take that > to mean "OK buy a high bin CPU and only populate one socket" which serves NUM

Re: about that DFBSD performance test

2017-03-08 Thread Kevin Bowling
EPORT anyway for software that does not understand librss natively. I have no commercial need for kernel IP forwarding but Matt Macy is doing a lot of driver work for me and we are at least trying to keep it in line with the legacy drivers. Since sephe's test was ixgbe, it'd be interesting

Re: about that DFBSD performance test

2017-03-08 Thread Slawa Olhovchenkov
On Wed, Mar 08, 2017 at 09:00:34AM +0500, Eugene M. Zheganin wrote: > Hi. > > Some have probably seen this already - > http://lists.dragonflybsd.org/pipermail/users/2017-March/313254.html > > So, could anyone explain why FreeBSD was owned that much. Test is split > i

about that DFBSD performance test

2017-03-07 Thread Eugene M. Zheganin
Hi. Some have probably seen this already - http://lists.dragonflybsd.org/pipermail/users/2017-March/313254.html So, could anyone explain why FreeBSD was owned that much. Test is split into two parts, one is nginx part, and the other is the IPv4 forwarding part. I understand that nginx

TSO test

2016-03-19 Thread Jack Vogel
Anyone have a 'simple' test case for TSO, something a bit more distilled than running netperf or iperf? Thanks, Jack

Re: TSO test

2016-03-19 Thread Alan Somers
ednesday, March 16, 2016 11:31 PM > To: FreeBSD Net > Subject: TSO test > > Anyone have a 'simple' test case for TSO, something a bit > more distilled than running netperf or iperf? > > Thanks, > > Jack > ___ > free

Re: TSO test

2016-03-19 Thread Eric van Gyzen
Message- > From: Eric van Gyzen [mailto:vangy...@freebsd.org] > Sent: Thursday, March 17, 2016 8:56 AM > To: Pieper, Jeffrey E ; Alan Somers > > Cc: FreeBSD Net ; Jack Vogel > Subject: Re: TSO test > > Jeff, > > So, you reboot the DUT between each scenario? > >

Re: TSO test

2016-03-19 Thread Eric van Gyzen
arch 17, 2016 8:23 AM > To: Alan Somers ; Pieper, Jeffrey E > > Cc: FreeBSD Net ; Jack Vogel > Subject: Re: TSO test > > Alan, > > That does sound useful. As one complication, "vmstat -i" shows the > interrupt rate since boot. Either the test would need to r

RE: TSO test

2016-03-19 Thread Pieper, Jeffrey E
[mailto:owner-freebsd-...@freebsd.org] On Behalf Of Jack Vogel Sent: Wednesday, March 16, 2016 11:31 PM To: FreeBSD Net Subject: TSO test Anyone have a 'simple' test case for TSO, something a bit more distilled than running netperf or iperf? Tha

Re: TSO test

2016-03-19 Thread Eugene Grosbein
17.03.2016 22:22, Eric van Gyzen wrote: Alan, That does sound useful. As one complication, "vmstat -i" shows the interrupt rate since boot. Either the test would need to reboot between each iteration, or vmstat would need to be improved to show the "recent" rate. The la

RE: TSO test

2016-03-19 Thread Pieper, Jeffrey E
No, we have scripts that parse vmstat -i. Jeff -Original Message- From: Eric van Gyzen [mailto:vangy...@freebsd.org] Sent: Thursday, March 17, 2016 8:56 AM To: Pieper, Jeffrey E ; Alan Somers Cc: FreeBSD Net ; Jack Vogel Subject: Re: TSO test Jeff, So, you reboot the DUT between

RE: TSO test

2016-03-18 Thread Pieper, Jeffrey E
You can also parse vmstat -i, which we do as well. Jeff -Original Message- From: Eric van Gyzen [mailto:vangy...@freebsd.org] Sent: Thursday, March 17, 2016 8:23 AM To: Alan Somers ; Pieper, Jeffrey E Cc: FreeBSD Net ; Jack Vogel Subject: Re: TSO test Alan, That does sound useful

Re: TSO test

2016-03-18 Thread Eric van Gyzen
Alan, That does sound useful. As one complication, "vmstat -i" shows the interrupt rate since boot. Either the test would need to reboot between each iteration, or vmstat would need to be improved to show the "recent" rate. The latter would be a very welcome improvement,
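A third option, in the spirit of the scripts that parse vmstat -i mentioned elsewhere in this thread, is to take two samples and divide the difference in the cumulative totals by the elapsed time, which yields a "recent" rate without a reboot. This is a minimal Python sketch, not the scripts referred to in the thread; it assumes the usual three-column vmstat -i layout (interrupt name, total, rate), and the function names and sample figures are hypothetical.

```python
def parse_vmstat_i(text):
    """Map interrupt source name -> cumulative total from `vmstat -i` text."""
    totals = {}
    for line in text.splitlines():
        # Split from the right: the name column may itself contain spaces.
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # header line, blank line, or unexpected format
        name, total, _rate_since_boot = parts
        totals[name.strip()] = int(total)
    return totals

def recent_rates(before_text, after_text, elapsed_seconds):
    """Interrupts per second between two samples, per interrupt source."""
    before = parse_vmstat_i(before_text)
    after = parse_vmstat_i(after_text)
    return {
        name: (after[name] - before[name]) / elapsed_seconds
        for name in after
        if name in before
    }

# Illustrative samples (made-up numbers), assumed to be taken 10 seconds apart.
SAMPLE_BEFORE = """\
interrupt                          total       rate
irq1: atkbd0                           2          0
cpu0:timer                       1000000        100
Total                            1000002        100
"""
SAMPLE_AFTER = """\
interrupt                          total       rate
irq1: atkbd0                           2          0
cpu0:timer                       1001200        100
Total                            1001202        100
"""

rates = recent_rates(SAMPLE_BEFORE, SAMPLE_AFTER, 10.0)
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}/s")
```

A test harness would capture one sample before each iteration and one after it, so each scenario gets its own rate regardless of uptime.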

IPv6/UDP locking improvement (can you review? test?)

2015-12-22 Thread Bjoern A. Zeeb
Hi, I have had a patch in review https://reviews.freebsd.org/D3721 for a while which improves IPv6/UDP packets per second rates. It’s modelled after the IPv4 version done a few years back. In case anyone wants to or can review or test it, any feedback will be welcome. I plan to commit it

Test. Please ignore

2015-11-13 Thread team
Test message. Please ignore.

em, igb performance test

2015-09-05 Thread Nomad Esst via freebsd-net
packets to arrive at the other side, but tcpdump (on the other side) shows 4, sometimes 8, etc. (not all 10 packets arrive at the other side). We tested this scenario with a Cisco router, and all packets were received at the Cisco side. What causes this packet loss in FreeBSD (maybe in

[Differential] [Closed] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests).

2015-03-28 Thread rrs (Randall Stewart)
rrs closed this revision. REVISION DETAIL https://reviews.freebsd.org/D1711 To: rrs, gnn, rwatson, lstewart, jhb, kostikbel, imp, adrian, hselasky, sbruno Cc: julian, hiren, jhb, kostikbel, emaste, delphij, neel, erj, freebsd-net

[Differential] [Accepted] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests).

2015-03-24 Thread sbruno (Sean Bruno)
sbruno accepted this revision. sbruno added a comment. This revision is now accepted and ready to land. Randall: I think this needs to be manually closed as the svn commit hook didn't fire when r278469 hit the tree. REVISION DETAIL https://reviews.freebsd.org/D1711 To: rrs, gnn, rwatson, lst

[Differential] [Closed] D1872: Add test cases for nvlist_move_*

2015-02-19 Thread rstone (Ryan Stone)
rstone closed this revision. REVISION DETAIL https://reviews.freebsd.org/D1872 To: rstone, jfvogel, pjd Cc: freebsd-net, pjd

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-19 Thread hiren (hiren panchasara)
hiren added a comment. Another panic from an almost *idle* box: Sanitized panic #6 Dump header from device /dev/da0s1b Architecture: amd64 Architecture Version: 2 Dump Length: 6525980672B (6223 MB) Blocksize: 512 Dumptime: Thu Feb 19 06:16:57 2015 Hostname: xx

[Differential] [Accepted] D1872: Add test cases for nvlist_move_*

2015-02-18 Thread pjd (Pawel Jakub Dawidek)
pjd accepted this revision. pjd added a reviewer: pjd. pjd added a comment. Looks good to me. REVISION DETAIL https://reviews.freebsd.org/D1872 To: rstone, jfvogel, pjd Cc: freebsd-net, pjd

[Differential] [Accepted] D1872: Add test cases for nvlist_move_*

2015-02-18 Thread jfvogel (Jack Vogel)
jfvogel accepted this revision. This revision is now accepted and ready to land. REVISION DETAIL https://reviews.freebsd.org/D1872 To: rstone, jfvogel Cc: freebsd-net, pjd

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-18 Thread rrs (Randall Stewart)
rrs added a comment. Ok after much discussion with Hans, we *could* have an issue where the user sends in an invalid CPU. This is *not* what I think is happening with Hiren since the cc_cpu and lock is all sane (it would be an invalid index to cc_cpu which would not have an init'd lock). But I hav

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-18 Thread hselasky (Hans Petter Selasky)
hselasky added a comment. Let me re-phrase if I was unclear: I see nothing preventing the callout_reset() macro from reading (c)->c_cpu lock when it is equal to CPUBLOCK while another CPU is calling callout_cpu_switch() on the same callout. Especially in the case of a migration case done by th

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-18 Thread rrs (Randall Stewart)
rrs added a comment. I have thought long and hard about this. I don't think it's a bug. But to know for sure I will need to add some instrumentation. I suspect what is happening is a tremendous number of callouts all come due at the same time. The three back traces trying to stop or reset a callou

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-18 Thread hselasky (Hans Petter Selasky)
hselasky added a comment. Randall: Shooting again: Thread 1 is executing in "softclock_call_cc()" in the "new_cc = callout_cpu_switch(c, cc, new_cpu)" it has set "c->c_cpu = CPUBLOCK;" Thread 2 is now executing callout_reset(). As you can see in the implementation detail, it is reading "c_cpu"

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread rrs (Randall Stewart)
rrs added a comment. Hans: I think you're wrong here. The caller of callout_cpu_switch() is holding the CC_LOCK(). Now there are only two callers of this function. Either the actual callout code itself (softclock_call_cc()) or the callout_reset_sbt_on(). In the case of callout_reset_sbt_on(). So

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread hselasky (Hans Petter Selasky)
hselasky added a comment. randall: You are right, I confused the two c_cpu values. Let me try to shoot again: static struct callout_cpu * callout_cpu_switch(struct callout *c, struct callout_cpu *cc, int new_cpu) { struct callout_cpu *new_cc; MPASS(c != NULL && cc != NULL);

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread rrs (Randall Stewart)
rrs added a comment. Hans: Let me explain to you how I think you are wrong; you are missing a small subtle thing here. When we do the callout_stop we set cc_migration_cpu() = CPUBLOCK *NOT* c->c_cpu = CPUBLOCK; You are confusing the two things. The CPUBLOCK is used in two different places

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread hiren (hiren panchasara)
hiren added a comment. >>! In D1711#96, @rrs wrote: > Hiren: > > You have the wrong structure type. > > In the printf before panic it is giving you the lock that was spinning.. that > would be in the callout_cpu structure I bet.. I mis-told you in email. > > So if you did > > print *(struct ca

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread hselasky (Hans Petter Selasky)
hselasky added a comment. randall: Let me try to explain a bit slower: Assume that a callout has been cancelled and is now migrating to another CPU. c->c_cpu = CPUBLOCK. Upon calling _callout_stop_safe() we will enter the callout_lock() function which will wait for the condition "c->c_cpu == C

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread rrs (Randall Stewart)
rrs added a comment. Hiren: You have the wrong structure type. In the printf before panic it is giving you the lock that was spinning.. that would be in the callout_cpu structure I bet.. I mis-told you in email. So if you did print *(struct callout_cpu *)0x81364180 It should show you

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread hiren (hiren panchasara)
hiren added a comment. >>! In D1711#92, @rrs wrote: > Hiren: > > There also should have been a printf before the panic string > printf( "spin lock %p (%s) held by %p (tid %d) too long\n", > m, m->lock_object.lo_name, td, td->td_tid); > > Can we see what that lovely printf has displa

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread rrs (Randall Stewart)
rrs added a comment. Wow, but look at the flags here. They are cc_flags == 0. That means it's *not* on the wheel and yet the thing it points to (our victim) *thinks* it's on the wheel. This is not good.. We are stuck in a lock trying to reschedule the timeout (a lock that is not locked by the way

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread hiren (hiren panchasara)
hiren added a comment. >>! In D1711#91, @rrs wrote: > Hiren: > > Thats helpful.. as I said this is strange. The callout you posted shows its > associated with CPU 0, (c_cpu == 0), and yet > the mtx on that (which is what we are spinning on) is free (its owned == 4). > So why would we have crash

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread rrs (Randall Stewart)
rrs added a comment. Hiren: There also should have been a printf before the panic string printf( "spin lock %p (%s) held by %p (tid %d) too long\n", m, m->lock_object.lo_name, td, td->td_tid); Can we see what that lovely printf has displayed? In theory the lo_name should be "callou

[Differential] [Request, 82 lines] D1872: Add test cases for nvlist_move_*

2015-02-17 Thread rstone (Ryan Stone)
rstone created this revision. rstone added a reviewer: jfvogel. rstone added subscribers: pjd, freebsd-net. REVISION DETAIL https://reviews.freebsd.org/D1872 AFFECTED FILES lib/libnv/tests/nv_tests.cc To: rstone, jfvogel Cc: freebsd-net, pjd

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread rrs (Randall Stewart)
rrs added a comment. Hiren: That's helpful.. as I said, this is strange. The callout you posted shows it's associated with CPU 0, (c_cpu == 0), and yet the mtx on that (which is what we are spinning on) is free (its owned == 4). So why would we have crashed holding the spin lock too long? Unless j

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread hiren (hiren panchasara)
hiren added a comment. >>! In D1711#86, @hselasky wrote: > Hi, > > rrs + hiren: > > I think the problem is this: > > In "_callout_stop_safe()" we sometimes exit having "cc_migration_cpu(cc, > direct) = CPUBLOCK;". Now if a second call to "_callout_stop_safe()" happens > before the pending cal

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread hiren (hiren panchasara)
hiren added a comment. >>! In D1711#88, @rrs wrote: > Hans: > > I don't get your call sequence, I sent you an email on it.. > > Hiren: > > Can you go up the call chain and dump the callout structure > c in > 0x80760064 in callout_lock (c=0xf8000d81dc98) at > /usr/src/sys/kern/kern_

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread rrs (Randall Stewart)
rrs added a comment. Hans: I don't get your call sequence, I sent you an email on it.. Hiren: Can you go up the call chain and dump the callout structure c in 0x80760064 in callout_lock (c=0xf8000d81dc98) at /usr/src/sys/kern/kern_timeout.c:530 There is something funny here, becau

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread hselasky (Hans Petter Selasky)
hselasky added a comment. If you change how "cc_migration_cpu(cc, direct)" works, the "cc_cce_migrating()" checks become invalid. I think you need to introduce yet another callout flag REVISION DETAIL https://reviews.freebsd.org/D1711 To: rrs, gnn, rwatson, lstewart, jhb, kostikbel, sbr

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate the callout code (and potentially for use by other tests)

2015-02-17 Thread hselasky (Hans Petter Selasky)
hselasky added a comment. Hi, rrs + hiren: I think the problem is this: In "_callout_stop_safe()" we sometimes exit having "cc_migration_cpu(cc, direct) = CPUBLOCK;". Now if a second call to "_callout_stop_safe()" happens before the pending callback has returned, which is using a mutex, we ar

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-16 Thread hiren (hiren panchasara)
hiren added a comment. @hps: cc_cpu[MAXCPU] info as you requested on IRC. Let me know if you need more info.
(kgdb) backtrace
#0 doadump (textdump=1) at pcpu.h:219
#1 0x80749c17 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:452
#2 0x80749ff4 in pa

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-16 Thread hiren (hiren panchasara)
hiren added a comment. @rrs: One more.
Sanitized panic #5
Dump header from device /dev/da0s1b
Architecture: amd64
Architecture Version: 2
Dump Length: 1694281728B (1615 MB)
Blocksize: 512
Dumptime: Sun Feb 15 18:03:14 2015
Hostname: x
Magic: FreeBSD

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-16 Thread hiren (hiren panchasara)
hiren added a comment. @rrs: Looks like we've come full circle back to the very first crash reported. We are on stable10 with all relevant fixes.
Sanitized panic #4
Dump header from device /dev/da0s1b
Architecture: amd64
Architecture Version: 2
Dump Length: 6764437504B (6451

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-06 Thread hselasky (Hans Petter Selasky)
hselasky added a comment. Don't forget to add the "MFC after" tag.

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-06 Thread hiren (hiren panchasara)
hiren added a comment. Update from llnw world: Things have been pretty stable here without any panics for 24+ hours with Stable10+D1711+D1777. Thanks a lot, Randall!

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. I have created D1777 to address the nd6/arp crash separately. I am currently in the midst of testing these.

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. Imp: OK, I have spent a bit of time puzzling this out. First, I was mistaken: the callouts being run are either arptimer or nd6timer (function name not right). These are not using Giant but the passed-in lle structure's rw_lock. We need to adjust these so that they check they:

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread imp (Warner Losh)
imp added a comment. So why doesn't the lle* code set mpsafe to 1? After re-reading the man page several times, I'm thinking that's the solution here. It already uses other locks and reference counts to synchronize things, so why get Giant involved at all?

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread imp (Warner Losh)
imp added a comment. For the lle* code, it looks like the reference count for the data structure improperly doesn't cover the implicit use of the mutex by the callout system. That seems to be the real bug here, no? Protecting a mutex with a reference count without holding a reference to that mu

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
nal and the callout). It would not free here but when the callout runs it would. Now that all being said, I have put that in some netflix code and will test it.. but there is something strange about this whole lle* path that I don't yet grok. The arp code and nd6 code carefully do r

[Differential] [Updated, 241 lines] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other

2015-02-04 Thread rrs (Randall Stewart)
rrs updated this revision to Diff 3631. rrs added a comment. This revision now requires review to proceed. This fixes the comment as imp suggested and the indent... CHANGES SINCE LAST UPDATE https://reviews.freebsd.org/D1711?vs=3603&id=3631

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread hselasky (Hans Petter Selasky)
hselasky added a comment. julian: What do you mean by "wait a bit". Spinning or sleeping?

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread julian (JulianElischer)
julian added a comment. let me see if this is correct.. I'm assuming that the reference count can not change if you hold the lock? if so then before releasing the lock, we can check if the count is 1.. if it is then we are the only person who even knows where this thing is so there can't be any

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. Julian: The point is *exactly* that, the callout *has* a reference.. and now that the table is being flushed if the callout_stop returns 1 it thinks the callout has been stopped, which it has, which means it will not run and release its reference. Thus the lowering of the
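
The reference-count bookkeeping rrs describes can be sketched in userspace C. This is a minimal illustration, not the kernel code: struct entry, timer_stop(), entry_flush() and the other names are hypothetical stand-ins for an lle entry, callout_stop() and the table flush path. It assumes the convention stated in the comment above, that a return value of 1 means the pending callback will not run, so the caller must drop the reference the callback would otherwise have released. It does not model the lock contention or CPU migration discussed elsewhere in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted entry (e.g. an lle). */
struct entry {
    int  refcnt;       /* one reference for the table, one per armed timer */
    bool timer_armed;  /* a timer is scheduled and has not run yet */
    bool freed;        /* set when the last reference goes away */
};

static void entry_init(struct entry *e)
{
    e->refcnt = 1;          /* the table's reference */
    e->timer_armed = false;
    e->freed = false;
}

/* Drop one reference; "free" the entry when the count reaches zero. */
static void entry_release(struct entry *e)
{
    assert(e->refcnt > 0);
    if (--e->refcnt == 0)
        e->freed = true;
}

/* Arming the timer takes a reference on behalf of the callback. */
static void entry_arm_timer(struct entry *e)
{
    assert(!e->timer_armed);
    e->timer_armed = true;
    e->refcnt++;
}

/* Returns 1 if a pending timer was cancelled, 0 if there was nothing to stop. */
static int timer_stop(struct entry *e)
{
    if (!e->timer_armed)
        return 0;
    e->timer_armed = false;
    return 1;
}

/* The callback: runs only if still armed, then drops its own reference. */
static void timer_fire(struct entry *e)
{
    if (!e->timer_armed)
        return;
    e->timer_armed = false;
    entry_release(e);
}

/* Flush path: a cancelled callback never runs, so drop its reference here. */
static void entry_flush(struct entry *e)
{
    if (timer_stop(e) == 1)
        entry_release(e);   /* the callback's reference */
    entry_release(e);       /* the table's reference */
}
```

In this model the count balances only if "returned 1" and "the callback will never touch the entry again" really mean the same thing; the thread is about the case where they do not.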

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread hselasky (Hans Petter Selasky)
hselasky added a comment. julian: Because a lock is used, the callback won't be called when callout_stop() returns 1. Only the mutex will still be used. Maybe a callout_reset() with a timeout of 1 tick will work instead of callout_stop(). If the callout_reset() returns 0 we add a ref instead of g

[Differential] [Changed Subscribers] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other

2015-02-04 Thread julian (JulianElischer)
julian added a subscriber: julian. julian added a comment. >>! In D1711#66, @rrs wrote: > 3) The callout_stop() on CPU 2 does what it is supposed to and sets the > cc_cancel bit to true and > return 1. This causes callout_stop() to lower the reference count which > means when llentry_free(

[Differential] [Commented On] D1711: Changes to the callout code to restore active semantics and also add a test-framework and test to validate thecallout code (and potentially for use by other tests)

2015-02-04 Thread rrs (Randall Stewart)
rrs added a comment. OK guys, I have puzzled out what the crash posted by Hiren *may* be. The same issue exists in the timeout code rewrite that Hans has up on the board as well, though callout_drain_async() may solve it if the user called that instead. Here is what is happening.
