> Only a quick look ...
>
> There is no guarantee that the ports of the UDP packets are not modified by
> libalias (NAT is designed to do exactly this kind of modification). So some
> of the matches seem to be a bit optimistic.
>
> > - This system has net.inet.ip.fw.one_pass=0
>
>
> man ipfw
> To let the packet continue after being (de)aliased, set the sysctl
> variable net.inet.ip.fw.one_pass to 0. For more information about
> aliasing modes, refer to libalias(3).
>
> Hence the NAT is applied multiple times if the path through the rules is a
> bit unlucky.
>
Thank you for your response.
Thanks for bringing up this point about ports. I had not thought about it.
However, I'm not sure exactly what you mean here. redirect_port should not
change the destination port of incoming packets, and if I am not mistaken, rule
452 should allow all relevant incoming packets through (after they have been
processed by NAT). Unless I have made a foolish error, rules 450-452 specify
destination ports.
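For reference, the rules in question have roughly the following shape (this is
only an illustrative sketch from memory, using in-kernel nat; $extif and
$inthost are placeholders, not my exact configuration):

    ipfw nat 1 config if $extif \
        redirect_port udp $inthost:500 500 \
        redirect_port udp $inthost:4500 4500
    ipfw add 450 nat 1 udp from any to any dst-port 500 in recv $extif
    ipfw add 451 nat 1 udp from any to any dst-port 4500 in recv $extif
    ipfw add 452 allow udp from any to any dst-port 500,4500 in recv $extif

With one_pass=0, a packet de-aliased at 450 or 451 continues down the ruleset
and should then be accepted at 452.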
On the other hand, since we are forwarding, it's true that incoming and
outgoing packets are evaluated by the firewall twice in this case: once at the
external interface and once at bridge0 (or on the epair, I'm not sure which; I
think I've seen both). I don't see how that could be causing an issue, since
even when the packet is at the bridge, it should still match "via $extif",
because "recv $extif" is still true. So it would still match 450-452.
Though, I can't rule out that I have a major misunderstanding about how IPFW
works; it has happened before. In fact, as I do some further experimenting,
I'm starting to doubt whether what I said above is correct.
>
> The traces show that the problematic cases are those where the packets are
> not (de)aliased. This can be the case when libalias has no more free ports
> available for aliasing. In such a case, the packet is returned unmodified
> (unaliased) with an error code. I'm not sure if this will cause a packet
> drop or not, especially in the one_pass=0 case.
>
> It is possible that duplicate packets (or quickly repeated ones) trigger an
> unintended aliasing of the source port. This would create a flow in the NAT
> table which is handled before the port redirection, and it might then miss
> the rules with explicit port numbers.
>
> But this is probably the wrong idea.
I am intrigued by this idea of unintended creation of NAT flows. It's not
something I am an expert in by any means. However, I do not think source ports
are changing here under any circumstances, because I have never witnessed a
packet trace with any ports aside from 500 and 4500.
But what you have said about NAT flows being inadvertently created is still
interesting, and I had not thought about it. It sounds like it could be a
factor. I will experiment further. Is there a good way to examine the contents
of this table?
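In the meantime, the closest thing I can do is watch the external interface
for any unexpected source ports, which I assume would be the visible symptom
of such a stray flow, e.g. something like:

    tcpdump -npi $extif 'udp and (dst port 500 or dst port 4500) and not (src port 500 or src port 4500)'

($extif is again a placeholder.) If libalias ever rewrites a source port on an
outgoing packet, it should show up here.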
I will also mention that while this overall setup was working properly prior
to my upgrade to 12.3, I did not then have rules 450-452 specified explicitly
as I do here; I added them early on in an attempt to fix the issue. Prior to
the upgrade, all NAT was handled in rules 500-540.
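Roughly speaking, that block was just a catch-all NAT rule of this shape
(again only a sketch, not the literal 500-540 block):

    ipfw add 500 nat 1 ip from any to any via $extif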
More information / report on today's observations:
I'm not sure if any of this information is useful, but here it is in case it
provides any clues.
This issue has actually been happening more frequently now that I've started
to experiment with it more, and also after moving some traffic off this host.
It happened again today, and I was actually able to start natd (previously I
had an error, but I've now invoked it in the foreground using -v). Adding a
divert rule for this natd instance at rule 445 fixed the issue, but only for
about 20 minutes. As I was experimenting to see whether my rules were wrong,
it started to work again, apparently not due to any of my experimental
changes, since after eliminating those changes it still continued to work as
expected.
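For the record, the natd invocation and the divert rule were along these lines
(the divert port is natd's default 8668, and $extif is a placeholder for the
external interface):

    natd -v -interface $extif -port 8668
    ipfw add 445 divert 8668 udp from any to any dst-port 500,4500 via $extif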
I also witnessed something quite extraordinary and, to me, inexplicable. So
far, I've been talking about a specific host that has been having this
problem, whose IP address I've referred to as 1.1.1.1. In my packet traces
I've been referring to an external host whose packets often have this issue on
1.1.1.1, calling that external host 2.2.2.2; it is the host whose packets
appear in the traces I posted.
Just now, as I was doing some experimentation on 1.1.1.1 as mentioned above,
the same issue appeared on 2.2.2.2 (with other hosts on my network), even
though 2.2.2.2 has not experienced this issue for 6 months or more.
Incidentally, 2.2.2.2 sends and receives far, far more UDP on ports 500 and
4500 than 1.1.1.1 does. The only notable thing that I did on 2.2.2.2 before
the issue occurred was to repeatedly try to initiate an outgoing IKE
connection to 1.1.1.1, as part of my experimenting on 1.1.1.1. But I can't
imagine that this is the first time I've done that.