Ugh, I did not see the other patch that was in the FreeBSD tracker!

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=220611

It appears that pfSense used a similar patch:
https://github.com/pfsense/FreeBSD-src/commit/4204c9f01d2ab439f6e0b9454ab22d4ffcca8cc4

But it is not included in 2.4!

https://github.com/pfsense/FreeBSD-src/blob/RELENG_2_4/sys/netpfil/pf/pf_norm.c#L791

Was this removed intentionally? If not, this seems like a good reason to open a bug.

On 3 December 2017 at 02:27, Liwei <xieli...@gmail.com> wrote:
> tl;dr: How do I figure out what the state of ip6_forward (or other
> associated functions) is after a crash? It is really a pain trying to
> figure this out based on traffic and trying to replicate the correct
> set of pfSense configurations.
>
>
> Just an update on my adventures. It's quite complicated so I shall put
> it as a list of points...
>
> Part 1
> 1. Spent a weekend copying the pfSense VM to a physical machine and
> running it in place of the VM.
>
> 2. Eventually the crashes do occur, so I'm entirely convinced this is
> not a VM issue
>
> 3. Over the last week, I've been disabling various interfaces and
> services to narrow down the cause. It has finally been narrowed down
> to our VPN bridge interface <LAN, OVPN1, OVPN2>
>
> 4. If I bring any of the VPN interfaces down/take any of the VPN
> interfaces out of the bridge, or bring the bridge interface down, the
> crashes stop
>
>
> Part 2
> 1. I managed to dig up an old (closed) bug with the exact backtrace I
> was getting https://redmine.pfsense.org/issues/5428
>
> 2. There are two more bugs on the FreeBSD bugtracker with similar
> crashes, the latest was in July this year
>
> 3. However, they've all been fixed with the same patch. I checked
> RELENG_2_4/sys/netpfil/pf/pf.c, and the patch should be in the 2.4.X
> release
>
>     a. The network is running with jumbo packets (9k). Is it possible
> the patch does not cover such a case?
>
> 4. This similarity led me to believe that I could be facing a similar
> issue, apparently with IPv6 multicast traffic
>
> 5. Set up port remote tcpdump so I could capture traffic right before the 
> crash
>
> 6. Isolated the traffic cause! Two conditions occurring together cause
> the crash:
>
>     a. There is at least one VPN client connected
>     b. There is a MacBook running Sierra/High Sierra on the main network
>
> 7. Each time the MacBook joins the network/sleeps/wakes, the IPv6
> traffic, specifically a certain mDNS query, causes the crash
>
> 8. Now the somewhat random but consistent timing makes sense! We have
> someone using a MacBook come in at around 8pm every day
>
> 9. Isolated 2 packet specimens that cause the crash, and 2 of the
> same type that do not
>
>     a. They do contain the names of our users' computers (which on a
> Mac contain real names), so I'm not inclined to share them on the
> list. Furthermore, I don't have steps to reproduce the crash with the
> packets from a vanilla install, so they're of limited use
>     b. If anyone is interested in taking a look at what the differences
> between these two sets of packets are, I can email them to you
> directly
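
For comparing the two sets of packets from point 9, something along
these lines might be a starting point. It is only a rough sketch: it
assumes the raw packet bytes have already been exported from the
capture, and both the function name diff_offsets and the sample
payloads are made up for illustration (they are not from the real
captures).

```python
def diff_offsets(a: bytes, b: bytes):
    """Return (offset, byte_a, byte_b) for every position where a and b differ.

    None marks a position that lies past the end of the shorter payload.
    """
    diffs = []
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else None
        y = b[i] if i < len(b) else None
        if x != y:
            diffs.append((i, x, y))
    return diffs


# Hypothetical payloads for illustration only.
crashing = bytes.fromhex("000000000001000000000000")
harmless = bytes.fromhex("000000000002000000000000")

for offset, x, y in diff_offsets(crashing, harmless):
    print(f"offset {offset}: {x!r} vs {y!r}")
```

Running the same comparison across each crashing/non-crashing pair
and looking for offsets that differ in every pair could help narrow
down which field matters, without having to post the packets publicly.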
>
> Part 3
> 1. Since I could cause the crash at will, I tried creating
> reproducible steps so I can properly report this as a bug
>
> 2. Set up a new pfSense install, replicated the interfaces, set up
> oVPN, made a single client connection
>
> 3. Unable to reproduce crash with clean install
>
> 4. Tried reproducing crash on the actual pfSense install... crashes
> now not happening??!!?
>
> 5. It was very late into the night, and no devices except my dummy
> oVPN connection and test machine were online. Maybe the captured mDNS
> packets were not the direct cause, but the response from one of the
> devices is?
>
>
> So I'm at a loss right now. I have things narrowed down really tight
> on the traffic end, but still have no way to reproduce it from a
> vanilla install, nor do I know where to even begin looking for the
> cause in the kernel code.
>
> I'll try again tomorrow to see if there is a response from some device
> that is the actual cause of the crash. But some suggestions are
> welcome!
>
>
>
> Liwei
>
>
>
> On 23 November 2017 at 01:17, WebDawg <webd...@gmail.com> wrote:
>> The bridging may need to be tested and filed as a bug.
>>
>> On Wed, Nov 22, 2017 at 11:15 AM, Liwei <xieli...@gmail.com> wrote:
>>> On Thu, 23 Nov 2017 at 00:38 WebDawg <webd...@gmail.com> wrote:
>>>
>>>> I am glad that you seemed to have resolved it, does the serial port
>>>> get the standard kernel messages...
>>>>
>>>
>>> It isn't really solved though as I have to take our bridged VPNs offline.
>>>
>>> Yes it does, but nothing relevant gets spewed out of the serial port before
>>> the panic comes up. The first sign I can see on the serial port of things
>>> going wrong is the kernel panic itself.
>>>
>>>
>>>>
>>>> usually you log in and tail some log files
>>>>
>>>
>>> Got it
>>>
>>>
>>>>
>>>> (bridging our oVPN tap interfaces to the main and private LANs)
>>>>
>>>> Was this bridging done in pfSense?
>>>>
>>>
>>> That's right.
>>>
>>>
>>>>
>>>> On Wed, Nov 22, 2017 at 8:07 AM, Liwei <xieli...@gmail.com> wrote:
>>>> > On Tue, 21 Nov 2017 at 01:08 WebDawg <webd...@gmail.com> wrote:
>>>> >
>>>> >> It should work though.  A great many people virtualize pfSense:
>>>> >>
>>>> >> https://doc.pfsense.org/index.php/PfSense_on_VMware_vSphere_/_ESXi
>>>> >>
>>>> >> Here is some more information:
>>>> >>
>>>> >> https://doc.pfsense.org/index.php/VirtIO_Driver_Support
>>>> >> https://doc.pfsense.org/index.php/Lost_Traffic_/_Packets_Disappear
>>>> >> https://doc.pfsense.org/index.php/Virtualizing_pfSense_on_Proxmox
>>>> >>
>>>> >> I know what it is like to ask for support and see people stop helping
>>>> >> because something is virtualized.  I have seen bad code fail in
>>>> >> virtualization situations only to hear 'do not virtualize'.
>>>> >>
>>>> >> From what I know, BSD has trouble with NIC interfaces and such.  Do
>>>> >> you have any limiters or QOS installed?  I would take a look at the
>>>> >> nic interfaces first.  Can you actively monitor the log to look for
>>>> >> errors once the VM is booted?
>>>> >>
>>>> >> I virtualized pfSense on proxmox about a year ago and BSD hated the
>>>> >> cpu timers and such.  I would get so many issues from it until I
>>>> >> figured it out but everything was plain as day in the kernel messages
>>>> >> that were outputted.
>>>> >>
>>>> >> There is an ova file available via the gold subscription:
>>>> >>
>>>> >> https://doc.pfsense.org/index.php/VMware_Appliance
>>>> >>
>>>> >> You need to get more information for me to help further.  It would be
>>>> >> great to get a copy of some logs.
>>>> >>
>>>> >> Here is a XenServer thread:
>>>> >> https://forum.pfsense.org/index.php?topic=88467
>>>> >>
>>>> >> Last time I virtualized the big deal was hvm nic vs pvhvm NIC.  You
>>>> >> could do limiters on one (I think hvm) but the NICs become CPU bound
>>>> >> because of how HVM works.  I could only push like 10-30 mbits out of
>>>> >> an i3 processor.
>>>> >>
>>>> >> I do not know if this has been solved, or if it is solvable.  pfSense
>>>> >> follows FreeBSD so most of the fixes for this come from FreeBSD,
>>>> >> though pfSense had/has some of its own kernel hacks.
>>>> >>
>>>> >>
>>>> >>
>>>> > Hi Vick, thanks for the assistance, nonetheless!
>>>> >
>>>> > Hi WebDawg,
>>>> >     Yeah, I guessed as much that the problem should be on my side,
>>>> > because something this fatal should already be widely reported.
>>>> >
>>>> >     I don't have any limiters or QoS set. I've set up logging of the
>>>> > serial port so at least I know what are the events leading up to the
>>>> > crash. Nothing interesting though, it just... happens. How do I set up
>>>> > log monitoring? My guess is I'll probably have to turn on remote syslog
>>>> > and log over. Will set up when I get the chance.
>>>> >
>>>> >     The odd thing is this is a 7+ years old setup (but we did do a fresh
>>>> > install of 2.3 when we upgraded hardware 1+ years ago), and we never had
>>>> > any serious issues. In fact it was purring along nicely on 2.3 since it
>>>> > was first installed, until we upgraded to 2.4.
>>>> >
>>>> >     I'm pretty confident of the hardware since it is only a year old, the
>>>> > other VMs are not having any issues, and reverting to 2.3 works fine.
>>>> > Thus based on a hunch I decided to remove a couple of bridge interfaces
>>>> > (bridging our oVPN tap interfaces to the main and private LANs) when I
>>>> > sent my first email to the list.
>>>> >
>>>> >     The crashes haven't occurred since then for 2 days. I'm not sure if
>>>> > it is a coincidence or not, but it does seem like my configuration may be
>>>> > triggering some bug. Or I may have mis-configured something.
>>>> >
>>>> >     I'll continue to iterate things around to narrow down the problem,
>>>> > but given that I have to wait a few days after each change to be sure on
>>>> > whether it crashes or not, any suggestion is very welcome!
>>>> >
>>>> > Warm regards,
>>>> > Liwei
