Ugh, I did not see the other patch that was in the FreeBSD tracker! https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220611
It appears that pfSense used a similar patch:
https://github.com/pfsense/FreeBSD-src/commit/4204c9f01d2ab439f6e0b9454ab22d4ffcca8cc4

But it is not included in 2.4!
https://github.com/pfsense/FreeBSD-src/blob/RELENG_2_4/sys/netpfil/pf/pf_norm.c#L791

Was this removed intentionally? If not, it seems like a good reason to open a bug.

On 3 December 2017 at 02:27, Liwei <xieli...@gmail.com> wrote:
> tl;dr: How do I figure out what the state of ip6_forward (or other
> associated functions) is after a crash? It is really a pain trying to
> figure this out based on traffic and trying to replicate the correct
> set of pfSense configurations.
>
> Just an update on my adventures. It's quite complicated so I shall put
> it as a list of points...
>
> Part 1
> 1. Spent a weekend copying the pfSense VM to a physical machine and
> running it in place of the VM.
>
> 2. Eventually the crashes do occur, so I'm entirely convinced this is
> not a VM issue
>
> 3. Over the last week, I've been disabling various interfaces and
> services to narrow down the cause; it has finally been narrowed down
> to our VPN bridge interface <LAN, OVPN1, OVPN2>
>
> 4. If I bring any of the VPN interfaces down/take any of the VPN
> interfaces out of the bridge, or bring the bridge interface down, the
> crashes stop
>
>
> Part 2
> 1. I managed to dig up an old (closed) bug with the exact backtrace I
> was getting https://redmine.pfsense.org/issues/5428
>
> 2. There are two more bugs on the FreeBSD bugtracker with similar
> crashes, the latest was in July this year
>
> 3. However, they've all been fixed with the same patch. Checked
> RELENG_2_4/sys/netpfil/pf/pf.c, and the patch should be in the 2.4.X
> release
>
>     a. The network is running with jumbo packets (9k), is it possible
> the patch does not cover such a case?
>
> 4. This similarity led me to believe that I could be facing a similar
> issue, apparently with IPv6 multicast traffic
>
> 5. Set up port remote tcpdump so I could capture traffic right before the
> crash
>
> 6. Isolated the traffic cause! Two conditions happening together cause
> the crash:
>
>     a. There is at least one VPN client connected
>     b. There is a macbook running Sierra/High Sierra on the main network
>
> 7. Each time the macbook joins the network/sleeps/wakes, the V6
> traffic, specifically a certain MDNS query, causes the crash
>
> 8. Now the somewhat random but consistent timing makes sense! We have
> someone using a macbook come in at around 8pm every day
>
> 9. Isolated 2 packet specimens that cause the crash, and 2 of the
> same type that do not
>
>     a. They do contain names of our users' computers (which on a mac
> contain real names), so I'm not inclined to share them on the list.
> Furthermore, I don't have steps to reproduce the crash with the
> packets from a vanilla install, so they're of limited use
>     b. If anyone is interested to take a look at what the differences
> between these two sets of packets are, I can email them to you
> directly
>
> Part 3
> 1. Since I could cause the crash at will, I tried creating
> reproducible steps so I can properly report this as a bug
>
> 2. Set up a new pfSense install, replicated the interfaces, set up
> oVPN, made a single client connection
>
> 3. Unable to reproduce crash with clean install
>
> 4. Tried reproducing crash on the actual pfSense install... crashes
> now not happening??!!?
>
> 5. It was very late into the night, no devices except my dummy oVPN
> connection and test machine were online, maybe the captured MDNS
> packets were not the direct cause, but the response from one of the
> devices is?
>
>
> So I'm at a loss right now. I have things narrowed down really tight
> on the traffic end, but still have no way to reproduce it from a
> vanilla install, nor do I know where to even begin looking for the
> cause in the kernel code.
>
> I'll try again tomorrow to see if there is a response from some device
> that is the actual cause of the crash. But some suggestions are
> welcome!
>
> Liwei
>
> On 23 November 2017 at 01:17, WebDawg <webd...@gmail.com> wrote:
>> The bridging may need to be tested and filed as a bug.
>>
>> On Wed, Nov 22, 2017 at 11:15 AM, Liwei <xieli...@gmail.com> wrote:
>>> On Thu, 23 Nov 2017 at 00:38 WebDawg <webd...@gmail.com> wrote:
>>>
>>>> I am glad that you seemed to have resolved it, does the serial port
>>>> get the standard kernel messages...
>>>>
>>>
>>> It isn't really solved though as I have to take our bridged VPNs offline.
>>>
>>> Yes it does, but nothing relevant gets spewed out of the serial port before
>>> the panic comes up. The first sign I can see on the serial port of things
>>> going wrong is the kernel panic itself.
>>>
>>>
>>>>
>>>> usually you log in and tail some log files
>>>>
>>>
>>> Got it
>>>
>>>
>>>>
>>>> (bridging our oVPN tap interfaces to the main and private LANs)
>>>>
>>>> This bridging was done in pfSense, right?
>>>>
>>>
>>> That's right.
>>>
>>>
>>>>
>>>> On Wed, Nov 22, 2017 at 8:07 AM, Liwei <xieli...@gmail.com> wrote:
>>>> > On Tue, 21 Nov 2017 at 01:08 WebDawg <webd...@gmail.com> wrote:
>>>> >
>>>> >> It should work though. A great many people virtualize pfSense:
>>>> >>
>>>> >> https://doc.pfsense.org/index.php/PfSense_on_VMware_vSphere_/_ESXi
>>>> >>
>>>> >> Here is some more information:
>>>> >>
>>>> >> https://doc.pfsense.org/index.php/VirtIO_Driver_Support
>>>> >> https://doc.pfsense.org/index.php/Lost_Traffic_/_Packets_Disappear
>>>> >> https://doc.pfsense.org/index.php/Virtualizing_pfSense_on_Proxmox
>>>> >>
>>>> >> I know what it is like to ask for support and see people stop helping
>>>> >> because something is virtualized. I have seen bad code fail in
>>>> >> virtualization situations only to hear 'do not virtualize'.
>>>> >>
>>>> >> From what I know, BSD has trouble with NIC interfaces and such. Do
>>>> >> you have any limiters or QOS installed? I would take a look at the
>>>> >> NIC interfaces first. Can you actively monitor the log to look for
>>>> >> errors once the VM is booted?
>>>> >>
>>>> >> I virtualized pfSense on proxmox about a year ago and BSD hated the
>>>> >> cpu timers and such. I would get so many issues from it until I
>>>> >> figured it out but everything was plain as day in the kernel messages
>>>> >> that were outputted.
>>>> >>
>>>> >> There is an ova file available via the gold subscription:
>>>> >>
>>>> >> https://doc.pfsense.org/index.php/VMware_Appliance
>>>> >>
>>>> >> You need to get more information for me to help further. It would be
>>>> >> great to get a copy of some logs.
>>>> >>
>>>> >> Here is a XenServer thread:
>>>> >> https://forum.pfsense.org/index.php?topic=88467
>>>> >>
>>>> >> Last time I virtualized the big deal was hvm nic vs pvhvm NIC. You
>>>> >> could do limiters on one (I think hvm) but the NICs become CPU bound
>>>> >> because of how HVM works. I could only push like 10-30 mbits out of
>>>> >> an i3 processor.
>>>> >>
>>>> >> I do not know if this has been solved, or if it is solvable. pfSense
>>>> >> follows FreeBSD so most of the fixes for this come from FreeBSD,
>>>> >> though pfSense had/has some of its own kernel hacks.
>>>> >>
>>>> >>
>>>> >>
>>>> > Hi Vick, thanks for the assistance, nonetheless!
>>>> >
>>>> > Hi WebDawg,
>>>> > Yeah, I guessed as much that the problem should be on my side, because
>>>> > something this fatal should already be widely reported.
>>>> >
>>>> > I don't have any limiters or QoS set. I've set up logging of the serial
>>>> > port so at least I know what the events leading up to the crash are.
>>>> > Nothing interesting though, it just... happens. How do I set up log
>>>> > monitoring? My guess is I'll probably have to turn on remote syslog and log
>>>> > over. Will set up when I get the chance.
>>>> >
>>>> > The odd thing is this is a 7+ years old setup (but we did do a fresh
>>>> > install of 2.3 when we upgraded hardware 1+ years ago), and we never had
>>>> > any serious issues. In fact it was purring along nicely on 2.3 since it was
>>>> > first installed, until we upgraded to 2.4.
>>>> >
>>>> > I'm pretty confident of the hardware since it is only a year old, the
>>>> > other VMs are not having any issues, and reverting to 2.3 works fine. Thus
>>>> > based on a hunch I decided to remove a couple of bridge interfaces
>>>> > (bridging our oVPN tap interfaces to the main and private LANs) when I sent
>>>> > my first email to the list.
>>>> >
>>>> > The crashes haven't occurred since then for 2 days. I'm not sure if it
>>>> > is a coincidence or not, but it does seem like my configuration may be
>>>> > triggering some bug. Or I may have mis-configured something.
>>>> >
>>>> > I'll continue to iterate things around to narrow down the problem, but
>>>> > given that I have to wait a few days after each change to be sure on
>>>> > whether it crashes or not, any suggestion is very welcome!
>>>> >
>>>> > Warm regards,
>>>> > Liwei