pf route-to issues

David Gwynne Sun, 18 Oct 2020 22:37:04 -0700

every few years i try and use route-to in pf, and every time it
goes badly. i tried it again last week in a slightly different
setting, and actually tried to understand the sharp edges i hit
this time instead of giving up. it turns out there are 2 or 3
different things together that have caused me trouble, which is why
the diff below is so big.


the first and i would argue most fundamental problem is a semantic
problem. if you ask a random person who has some clue about networks
and routing what they would expect the "argument" to route-to or
reply-to to be, they would say "a nexthop address" or "a gateway
address". eg, say i want to force packets to a specific backend
server without using NAT, i would write a rule like this:

  n_servers="192.0.2.128/27"
  pass out on $if_internal to $n_servers route-to 192.168.0.1

pfctl will happily parse this, shove it into the kernel, let you read
the rules back out again with pfctl -sr, and it all looks plausible, but
it turns out that it's using the argument to route-to as an interface
name. because rulesets can refer to interfaces that don't exist yet, pf
just passes the IP address around as a string, hoping i'll plug in an
interface with a driver name that looks like an ip address. i spent
literally a day trying to figure out why a rule like this wasn't
working.
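
as a rough sketch of why a rule like that parses at all: the route_host
grammar rule removed by the diff below only sends a bare string through
host parsing if it contains an '@'. anything else is stored as an
interface name. the enum and function names here are made up for
illustration and are not pfctl identifiers.

```c
#include <assert.h>
#include <string.h>

/* hypothetical labels for this sketch; not pfctl identifiers */
enum route_arg_kind {
	ROUTE_ARG_HOST_AT_IF,	/* "addr@if": parsed as a host spec */
	ROUTE_ARG_IFNAME	/* anything else: kept as an interface name */
};

/*
 * mirrors the decision the old route_host rule makes for a bare
 * STRING token: only strings containing '@' go through host parsing.
 */
enum route_arg_kind
classify_route_arg(const char *arg)
{
	assert(arg != NULL);

	if (strrchr(arg, '@') != NULL)
		return (ROUTE_ARG_HOST_AT_IF);

	/* an ip address without '@' ends up here, as an "interface" */
	return (ROUTE_ARG_IFNAME);
}
```

under these assumptions, classify_route_arg("192.168.0.1") lands in the
interface-name branch, which matches the behaviour described above.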

i happened to be talking to pascoe@ at the time, and his vague memory
was that the idea was to try and switch the interface a packet was going
to travel over, but to try and reuse the arp lookup from the parent one.
neither of us could figure out why that would be a good idea though.

the best i can say about this is that it only really makes some
kind of sense if you're moving a packet into a tunnel. tunnels don't
really care about nexthops and will happily route anything you give
them. if you were trying to add a route to the routing table to do this,
you'd be specifying the peer address on a tunnel interface as the
gateway. pf has an if0:peer syntax that makes this convenient to write.

so i want to change route-to in pfctl so it takes a nexthop instead
of an interface. you could argue that pf already lets you do this,
because there's some bs nexthop@interface syntax. my counter argument
is that the interface the nexthop is reachable over is redundant, and it
makes fixing some of the other problems harder if we keep it.

the second and third problems i hit are when route-to is used on
a pair of boxes that have pfsync and pfsync defer set up. when defer
is enabled, pfsync takes the packet away from the forwarding path,
and when it has some confidence that the peer is aware of the state,
then it tries to push the packet back out.

to understand the following, be aware that route-to, reply-to, and
dup-to are implemented in pf in a pair of functions called pf_route
and pf_route6. if i say pf_route, just assume i'm talking about
both of these functions.

the second problem is that the pf_route calls from pfsync don't
have all the information they are supposed to have. more specifically,
an ifp pointer isn't set, which leads to a segfault. the ifp pointer
isn't set because pfsync doesn't track which interface a packet is
going out on; it assumes the ip layer will get it right again later, or
that a rule provided something usable.

the third problem is that pf_route relies on information from rules to
work correctly. this is a problem in a pfsync environment because you
cannot have the same ruleset on both firewalls 100% of the time, which
means you cannot have route-to/reply-to behave consistently on a pair of
firewalls 100% of the time.

my solution to both these problems is to reduce the amount of information
pf_route needs to work with, to make sure that the info it does need
is in the pf state structure, and that pfsync handles it properly.

if we limit the information needed for pf_route to a nexthop address
and the direction in which the address is used, this is doable. both the
pf_state and pfsync_state structs already contain an address to store a
nexthop in, so i just had to move the route-to direction from the rule into
the state. this is easy with pf_state, but i used a spare pad field in
pfsync_state for this.
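
a toy model of that idea, assuming made-up toy_* structs standing in for
pf_state and pfsync_state (the real ones have many more fields): the rt
value and the nexthop both live in the state, and export/import copy
both, so the peer does not have to consult its own ruleset to route the
same way.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* hypothetical constants for this sketch; not the kernel's values */
enum { TOY_NOPFROUTE, TOY_ROUTETO, TOY_REPLYTO, TOY_DUPTO };

struct toy_addr {
	uint8_t		bytes[16];	/* room for a v4 or v6 nexthop */
};

struct toy_state {			/* plays the role of pf_state */
	struct toy_addr	rt_addr;
	uint8_t		rt;
};

struct toy_wire_state {			/* plays the role of pfsync_state */
	struct toy_addr	rt_addr;
	uint8_t		rt;		/* the byte that used to be padding */
};

/* copy the routing info from the local state to the wire format */
void
toy_export(struct toy_wire_state *sp, const struct toy_state *st)
{
	assert(sp != NULL && st != NULL);
	sp->rt = st->rt;
	sp->rt_addr = st->rt_addr;
}

/* copy the routing info from the wire format into the peer's state */
void
toy_import(struct toy_state *st, const struct toy_wire_state *sp)
{
	assert(sp != NULL && st != NULL);
	st->rt = sp->rt;
	st->rt_addr = sp->rt_addr;
}

/* returns 1 if both states carry the same route type and nexthop */
int
toy_same_route(const struct toy_state *a, const struct toy_state *b)
{
	return (a->rt == b->rt &&
	    memcmp(&a->rt_addr, &b->rt_addr, sizeof(a->rt_addr)) == 0);
}
```

nothing in this sketch refers to a rule or an interface, which is the
point: everything needed travels with the state.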

the field recording which interface the route is using has been
removed from the pf_state struct. there's no simple way to sync
interface information between pfsync peers on the wire, and the need
for it is marginal at best. things are much simpler if we can get away
with not having this info.

a bonus problem i hit is that there's code in pf_find_state that
appears to try and short circuit some processing of states when
route-to/reply-to is in effect. this has two consequences. first,
if you're using route-to with tcp states, half the tcp state machine
is skipped. when you look at these states with pfctl -vvss,
one half of the TCP state never moves forward. secondly, because
the processing is short circuited, it never falls through to the
end of pf_test where the actual call to pf_route is done. so the
first packet is properly handled by pf_route, but none of the packets
after that.
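
the condition in question can be modelled in isolation. this is only a
sketch of the check that the diff below removes from pf_find_state,
with hypothetical sc_* names and plain ints standing in for the kif
pointers (0 meaning "not set"):

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical constants for this sketch; not the kernel's values */
enum { SC_NOPFROUTE, SC_ROUTETO, SC_REPLYTO };
enum { SC_IN = 1, SC_OUT = 2 };
enum { SC_PASS, SC_MATCH };

struct sc_lookup {
	int	pkt_dir;	/* direction of the packet being looked up */
	int	pkt_if;		/* interface the packet is on */
	int	rule_rt;	/* rt type of the rule that made the state */
	int	rule_dir;	/* direction of that rule */
	int	rt_if;		/* interface stored for the route, 0 if none */
};

/*
 * models the old early return: a state found on a different interface
 * than the stored route interface reports SC_PASS instead of SC_MATCH,
 * so the caller skips the rest of the state processing.
 */
int
sc_find_state_old(const struct sc_lookup *l)
{
	assert(l != NULL);

	if (l->pkt_dir == SC_OUT && l->rt_if != 0 && l->rt_if != l->pkt_if &&
	    ((l->rule_rt == SC_ROUTETO && l->rule_dir == SC_OUT) ||
	    (l->rule_rt == SC_REPLYTO && l->rule_dir == SC_IN)))
		return (SC_PASS);

	return (SC_MATCH);
}
```

with the check gone, the lookup always reports a match, and later
packets go through the same state handling as the first one.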

all of this together makes things work pretty obviously and smoothly.
in my opinion anyway. route-to now works more like rdr-to, it just
feels like it changes the address used for the route lookup rather
than changing the actual IP address in the packet. it also works
predictably in a pfsync pair, which is great from the point of view of
high availability.

the main caveat is that it's not backward compatible. if you're already
using route-to, you will need to tweak your rules to have them parse.
however, i doubt anyone is using this stuff because it feels very broken
to me.

the overarching use case for me with this stuff is i want frontend
router/load balancers to have "sticky" routes for connections to a pool
of backend servers. eg, i have webservers that have a bunch of IP
addresses bound to lo1 interfaces. all the webservers have the same set
of IPs on their lo1 interface. with this i can write rules like this:

  pass out on $if_web to 203.0.113.0/27 route-to 192.168.0.6/31

these webservers cannot take over from each other because they actually
terminate tcp connections for these IP addresses. the normal
statelessness of routing, where any router can handle a packet as well
as any other, doesn't apply in this situation. once the frontend has
picked a backend route target, i want the route to stick to the backend.
pf is the mechanism i'm using to create and apply that stickiness.

i'm also looking at having frontends load balance traffic over backends
that are accessible over gre tunnels. in that case i just want to tie
the routes to the interface that was already picked by the routing
table. that would look like:

  n_anycast="203.0.113.128/31"
  pass out on gre0 to $n_anycast route-to gre0:peer
  pass out on gre1 to $n_anycast route-to gre1:peer

without this diff i need to write something like this:

  pass out on gre0 to $n_anycast route-to gre0:peer@gre0
  pass out on gre1 to $n_anycast route-to gre1:peer@gre1

and then it panics anyway cos pf_route via pfsync blows up. and i can't
have the frontend fail over reliably to a peer.

thoughts?

there's some further cleanup that could be done in pfctl if this is
allowed to proceed.

Index: sbin/pfctl/parse.y
===================================================================
RCS file: /cvs/src/sbin/pfctl/parse.y,v
retrieving revision 1.704
diff -u -p -r1.704 parse.y
--- sbin/pfctl/parse.y  1 Oct 2020 14:02:08 -0000       1.704
+++ sbin/pfctl/parse.y  19 Oct 2020 01:56:50 -0000
@@ -276,6 +276,7 @@ struct filter_opts {
        struct redirspec         nat;
        struct redirspec         rdr;
        struct redirspec         rroute;
+       u_int8_t                 rt;
 
        /* scrub opts */
        int                      nodf;
@@ -284,15 +285,6 @@ struct filter_opts {
        int                      randomid;
        int                      max_mss;
 
-       /* route opts */
-       struct {
-               struct node_host        *host;
-               u_int8_t                 rt;
-               u_int8_t                 pool_opts;
-               sa_family_t              af;
-               struct pf_poolhashkey   *key;
-       }                        route;
-
        struct {
                u_int32_t       limit;
                u_int32_t       seconds;
@@ -517,7 +509,6 @@ int parseport(char *, struct range *r, i
 %type  <v.host>                ipspec xhost host dynaddr host_list
 %type  <v.host>                table_host_list tablespec
 %type  <v.host>                redir_host_list redirspec
-%type  <v.host>                route_host route_host_list routespec
 %type  <v.os>                  os xos os_list
 %type  <v.port>                portspec port_list port_item
 %type  <v.uid>                 uids uid_list uid_item
@@ -974,7 +965,7 @@ anchorrule  : ANCHOR anchorname dir quick
                                YYERROR;
                        }
 
-                       if ($9.route.rt) {
+                       if ($9.rt) {
                                yyerror("cannot specify route handling "
                                    "on anchors");
                                YYERROR;
@@ -1842,37 +1833,18 @@ pfrule          : action dir logquick interface 
                        decide_address_family($7.src.host, &r.af);
                        decide_address_family($7.dst.host, &r.af);
 
-                       if ($8.route.rt) {
+                       if ($8.rt) {
+                               if ($8.rt != PF_DUPTO && !r.direction) {
+                                       yyerror("direction must be explicit"
+                                           " with rules that specify routing");
+                                       YYERROR;
+                               }
                                if (!r.direction) {
                                        yyerror("direction must be explicit "
                                            "with rules that specify routing");
                                        YYERROR;
                                }
-                               r.rt = $8.route.rt;
-                               r.route.opts = $8.route.pool_opts;
-                               if ($8.route.key != NULL)
-                                       memcpy(&r.route.key, $8.route.key,
-                                           sizeof(struct pf_poolhashkey));
-                       }
-                       if (r.rt) {
-                               decide_address_family($8.route.host, &r.af);
-                               if ((r.route.opts & PF_POOL_TYPEMASK) ==
-                                   PF_POOL_NONE && ($8.route.host->next != NULL ||
-                                   $8.route.host->addr.type == PF_ADDR_TABLE ||
-                                   DYNIF_MULTIADDR($8.route.host->addr)))
-                                       r.route.opts |= PF_POOL_ROUNDROBIN;
-                               if ($8.route.host->next != NULL) {
-                                       if (!PF_POOL_DYNTYPE(r.route.opts)) {
-                                               yyerror("address pool option "
-                                                   "not supported by type");
-                                               YYERROR;
-                                       }
-                               }
-                               /* fake redirspec */
-                               if (($8.rroute.rdr = calloc(1,
-                                   sizeof(*$8.rroute.rdr))) == NULL)
-                                       err(1, "$8.rroute.rdr");
-                               $8.rroute.rdr->host = $8.route.host;
+                               r.rt = $8.rt;
                        }
 
                        if (expand_divertspec(&r, &$8.divert))
@@ -2136,30 +2108,14 @@ filter_opt      : USER uids {
                            sizeof(filter_opts.nat.pool_opts));
                        filter_opts.nat.pool_opts.staticport = 1;
                }
-               | ROUTETO routespec pool_opts {
-                       filter_opts.route.host = $2;
-                       filter_opts.route.rt = PF_ROUTETO;
-                       filter_opts.route.pool_opts = $3.type | $3.opts;
-                       memcpy(&filter_opts.rroute.pool_opts, &$3,
-                           sizeof(filter_opts.rroute.pool_opts));
-                       if ($3.key != NULL)
-                               filter_opts.route.key = $3.key;
+               | ROUTETO routespec {
+                       filter_opts.rt = PF_ROUTETO;
                }
-               | REPLYTO routespec pool_opts {
-                       filter_opts.route.host = $2;
-                       filter_opts.route.rt = PF_REPLYTO;
-                       filter_opts.route.pool_opts = $3.type | $3.opts;
-                       if ($3.key != NULL)
-                               filter_opts.route.key = $3.key;
-               }
-               | DUPTO routespec pool_opts {
-                       filter_opts.route.host = $2;
-                       filter_opts.route.rt = PF_DUPTO;
-                       filter_opts.route.pool_opts = $3.type | $3.opts;
-                       memcpy(&filter_opts.rroute.pool_opts, &$3,
-                           sizeof(filter_opts.rroute.pool_opts));
-                       if ($3.key != NULL)
-                               filter_opts.route.key = $3.key;
+               | REPLYTO routespec {
+                       filter_opts.rt = PF_REPLYTO;
+               }
+               | DUPTO routespec {
+                       filter_opts.rt = PF_DUPTO;
                }
                | not RECEIVEDON if_item {
                        if (filter_opts.rcv) {
@@ -3733,122 +3689,16 @@ pool_opt       : BITMASK       {
                }
                ;
 
-route_host     : STRING                        {
-                       /* try to find @if0 address specs */
-                       if (strrchr($1, '@') != NULL) {
-                               if (($$ = host($1, pf->opts)) == NULL)  {
-                                       yyerror("invalid host for route spec");
-                                       YYERROR;
-                               }
-                               free($1);
-                       } else {
-                               $$ = calloc(1, sizeof(struct node_host));
-                               if ($$ == NULL)
-                                       err(1, "route_host: calloc");
-                               $$->ifname = $1;
-                               $$->addr.type = PF_ADDR_NONE;
-                               set_ipmask($$, 128);
-                               $$->next = NULL;
-                               $$->tail = $$;
-                       }
-               }
-               | STRING '/' STRING             {
-                       char    *buf;
-
-                       if (asprintf(&buf, "%s/%s", $1, $3) == -1)
-                               err(1, "host: asprintf");
-                       free($1);
-                       if (($$ = host(buf, pf->opts)) == NULL) {
-                               /* error. "any" is handled elsewhere */
-                               free(buf);
-                               yyerror("could not parse host specification");
-                               YYERROR;
-                       }
-                       free(buf);
-               }
-               | '<' STRING '>'        {
-                       if (strlen($2) >= PF_TABLE_NAME_SIZE) {
-                               yyerror("table name '%s' too long", $2);
-                               free($2);
-                               YYERROR;
-                       }
-                       $$ = calloc(1, sizeof(struct node_host));
-                       if ($$ == NULL)
-                               err(1, "host: calloc");
-                       $$->addr.type = PF_ADDR_TABLE;
-                       if (strlcpy($$->addr.v.tblname, $2,
-                           sizeof($$->addr.v.tblname)) >=
-                           sizeof($$->addr.v.tblname))
-                               errx(1, "host: strlcpy");
-                       free($2);
-                       $$->next = NULL;
-                       $$->tail = $$;
-               }
-               | dynaddr '/' NUMBER            {
-                       struct node_host        *n;
-
-                       if ($3 < 0 || $3 > 128) {
-                               yyerror("bit number too big");
-                               YYERROR;
-                       }
-                       $$ = $1;
-                       for (n = $1; n != NULL; n = n->next)
-                               set_ipmask(n, $3);
-               }
-               | '(' STRING host ')'           {
-                       struct node_host        *n;
-
-                       $$ = $3;
-                       /* XXX check masks, only full mask should be allowed */
-                       for (n = $3; n != NULL; n = n->next) {
-                               if ($$->ifname) {
-                                       yyerror("cannot specify interface twice "
-                                           "in route spec");
-                                       YYERROR;
-                               }
-                               if (($$->ifname = strdup($2)) == NULL)
-                                       errx(1, "host: strdup");
-                       }
-                       free($2);
-               }
-               ;
-
-route_host_list        : route_host optweight optnl            { 
-                       if ($2 > 0) {
-                               struct node_host        *n;
-                               for (n = $1; n != NULL; n = n->next)
-                                       n->weight = $2;
-                       }
-                       $$ = $1;
-               }
-               | route_host_list comma route_host optweight optnl {
-                       if ($1->af == 0)
-                               $1->af = $3->af;
-                       if ($1->af != $3->af) {
-                               yyerror("all pool addresses must be in the "
-                                   "same address family");
+routespec      : redirpool pool_opts {
+                       if (filter_opts.rt != PF_NOPFROUTE) {
+                               yyerror("cannot respecify "
+                                   "route-to/reply-to/dup-to");
                                YYERROR;
                        }
-                       $1->tail->next = $3;
-                       $1->tail = $3->tail;
-                       if ($4 > 0) {
-                               struct node_host        *n;
-                               for (n = $3; n != NULL; n = n->next)
-                                       n->weight = $4;
-                       }
-                       $$ = $1;
-               }
-               ;
-
-routespec      : route_host optweight                  {
-                       if ($2 > 0) {
-                               struct node_host        *n;
-                               for (n = $1; n != NULL; n = n->next)
-                                       n->weight = $2;
-                       }
-                       $$ = $1;
+                       filter_opts.rroute.rdr = $1;
+                       memcpy(&filter_opts.rroute.pool_opts, &$2,
+                           sizeof(filter_opts.rroute.pool_opts));
                }
-               | '{' optnl route_host_list '}' { $$ = $3; }
                ;
 
 timeout_spec   : STRING NUMBER
@@ -4709,7 +4559,7 @@ expand_rule(struct pf_rule *r, int keepr
 
                error += collapse_redirspec(&r->rdr, r, rdr, 0);
                error += collapse_redirspec(&r->nat, r, nat, 0);
-               error += collapse_redirspec(&r->route, r, rroute, 1);
+               error += collapse_redirspec(&r->route, r, rroute, 0);
 
                /* disallow @if in from or to for the time being */
                if ((src_host->addr.type == PF_ADDR_ADDRMASK &&
@@ -5955,7 +5805,7 @@ filteropts_to_rule(struct pf_rule *r, st
                yyerror("af-to can only be used with direction in");
                return (1);
        }
-       if ((opts->marker & FOM_AFTO) && opts->route.rt) {
+       if ((opts->marker & FOM_AFTO) && opts->rt) {
                yyerror("af-to cannot be used together with "
                    "route-to, reply-to, dup-to");
                return (1);
Index: sys/net/if_pfsync.c
===================================================================
RCS file: /cvs/src/sys/net/if_pfsync.c,v
retrieving revision 1.278
diff -u -p -r1.278 if_pfsync.c
--- sys/net/if_pfsync.c 24 Aug 2020 15:30:58 -0000      1.278
+++ sys/net/if_pfsync.c 19 Oct 2020 01:56:57 -0000
@@ -612,7 +612,8 @@ pfsync_state_import(struct pfsync_state 
        st->rtableid[PF_SK_STACK] = ntohl(sp->rtableid[PF_SK_STACK]);
 
        /* copy to state */
-       bcopy(&sp->rt_addr, &st->rt_addr, sizeof(st->rt_addr));
+       st->rt_addr = sp->rt_addr;
+       st->rt = sp->rt;
        st->creation = getuptime() - ntohl(sp->creation);
        st->expire = getuptime();
        if (ntohl(sp->expire)) {
@@ -643,7 +644,6 @@ pfsync_state_import(struct pfsync_state 
 
        st->rule.ptr = r;
        st->anchor.ptr = NULL;
-       st->rt_kif = NULL;
 
        st->pfsync_time = getuptime();
        st->sync_state = PFSYNC_S_NONE;
@@ -1843,6 +1843,7 @@ pfsync_undefer(struct pfsync_deferral *p
 {
        struct pfsync_softc *sc = pfsyncif;
        struct pf_pdesc pdesc;
+       struct pf_state *s = pd->pd_st;
 
        NET_ASSERT_LOCKED();
 
@@ -1852,35 +1853,33 @@ pfsync_undefer(struct pfsync_deferral *p
        TAILQ_REMOVE(&sc->sc_deferrals, pd, pd_entry);
        sc->sc_deferred--;
 
-       CLR(pd->pd_st->state_flags, PFSTATE_ACK);
+       CLR(s->state_flags, PFSTATE_ACK);
        if (drop)
                m_freem(pd->pd_m);
        else {
-               if (pd->pd_st->rule.ptr->rt == PF_ROUTETO) {
+               if (s->rt == PF_ROUTETO) {
                        if (pf_setup_pdesc(&pdesc,
-                           pd->pd_st->key[PF_SK_WIRE]->af,
-                           pd->pd_st->direction, pd->pd_st->rt_kif,
+                           s->key[PF_SK_WIRE]->af,
+                           s->direction, s->kif,
                            pd->pd_m, NULL) != PF_PASS) {
                                m_freem(pd->pd_m);
                                goto out;
                        }
-                       switch (pd->pd_st->key[PF_SK_WIRE]->af) {
+                       switch (s->key[PF_SK_WIRE]->af) {
                        case AF_INET:
-                               pf_route(&pdesc,
-                                   pd->pd_st->rule.ptr, pd->pd_st);
+                               pf_route(&pdesc, s);
                                break;
 #ifdef INET6
                        case AF_INET6:
-                               pf_route6(&pdesc,
-                                   pd->pd_st->rule.ptr, pd->pd_st);
+                               pf_route6(&pdesc, s);
                                break;
 #endif /* INET6 */
                        default:
-                               unhandled_af(pd->pd_st->key[PF_SK_WIRE]->af);
+                               unhandled_af(s->key[PF_SK_WIRE]->af);
                        }
                        pd->pd_m = pdesc.m;
                } else {
-                       switch (pd->pd_st->key[PF_SK_WIRE]->af) {
+                       switch (s->key[PF_SK_WIRE]->af) {
                        case AF_INET:
                                ip_output(pd->pd_m, NULL, NULL, 0, NULL, NULL,
                                    0);
@@ -1892,12 +1891,12 @@ pfsync_undefer(struct pfsync_deferral *p
                                break;
 #endif /* INET6 */
                        default:
-                               unhandled_af(pd->pd_st->key[PF_SK_WIRE]->af);
+                               unhandled_af(s->key[PF_SK_WIRE]->af);
                        }
                }
        }
  out:
-       pf_state_unref(pd->pd_st);
+       pf_state_unref(s);
        pool_put(&sc->sc_pool, pd);
 }
 
Index: sys/net/pf.c
===================================================================
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.1094
diff -u -p -r1.1094 pf.c
--- sys/net/pf.c        24 Jul 2020 18:17:15 -0000      1.1094
+++ sys/net/pf.c        19 Oct 2020 01:56:57 -0000
@@ -1122,12 +1122,6 @@ pf_find_state(struct pf_pdesc *pd, struc
        }
 
        *state = s;
-       if (pd->dir == PF_OUT && s->rt_kif != NULL && s->rt_kif != pd->kif &&
-           ((s->rule.ptr->rt == PF_ROUTETO &&
-           s->rule.ptr->direction == PF_OUT) ||
-           (s->rule.ptr->rt == PF_REPLYTO &&
-           s->rule.ptr->direction == PF_IN)))
-               return (PF_PASS);
 
        return (PF_MATCH);
 }
@@ -1186,7 +1180,8 @@ pf_state_export(struct pfsync_state *sp,
 
        /* copy from state */
        strlcpy(sp->ifname, st->kif->pfik_name, sizeof(sp->ifname));
-       memcpy(&sp->rt_addr, &st->rt_addr, sizeof(sp->rt_addr));
+       sp->rt = st->rt;
+       sp->rt_addr = st->rt_addr;
        sp->creation = htonl(getuptime() - st->creation);
        expire = pf_state_expires(st);
        if (expire <= getuptime())
@@ -3433,29 +3428,13 @@ pf_set_rt_ifp(struct pf_state *s, struct
        struct pf_rule *r = s->rule.ptr;
        int     rv;
 
-       s->rt_kif = NULL;
-       if (!r->rt)
+       if (r->rt == PF_NOPFROUTE)
                return (0);
 
-       switch (af) {
-       case AF_INET:
-               rv = pf_map_addr(AF_INET, r, saddr, &s->rt_addr, NULL, sns,
-                   &r->route, PF_SN_ROUTE);
-               break;
-#ifdef INET6
-       case AF_INET6:
-               rv = pf_map_addr(AF_INET6, r, saddr, &s->rt_addr, NULL, sns,
-                   &r->route, PF_SN_ROUTE);
-               break;
-#endif /* INET6 */
-       default:
-               rv = 1;
-       }
-
-       if (rv == 0) {
-               s->rt_kif = r->route.kif;
-               s->natrule.ptr = r;
-       }
+       rv = pf_map_addr(af, r, saddr, &s->rt_addr, NULL, sns, 
+           &r->route, PF_SN_ROUTE);
+       if (rv == 0)
+               s->rt = r->rt;
 
        return (rv);
 }
@@ -5986,15 +5965,13 @@ pf_rtlabel_match(struct pf_addr *addr, s
 
 /* pf_route() may change pd->m, adjust local copies after calling */
 void
-pf_route(struct pf_pdesc *pd, struct pf_rule *r, struct pf_state *s)
+pf_route(struct pf_pdesc *pd, struct pf_state *s)
 {
        struct mbuf             *m0, *m1;
        struct sockaddr_in      *dst, sin;
        struct rtentry          *rt = NULL;
        struct ip               *ip;
        struct ifnet            *ifp = NULL;
-       struct pf_addr           naddr;
-       struct pf_src_node      *sns[PF_SN_MAX];
        int                      error = 0;
        unsigned int             rtableid;
 
@@ -6004,11 +5981,11 @@ pf_route(struct pf_pdesc *pd, struct pf_
                return;
        }
 
-       if (r->rt == PF_DUPTO) {
+       if (s->rt == PF_DUPTO) {
                if ((m0 = m_dup_pkt(pd->m, max_linkhdr, M_NOWAIT)) == NULL)
                        return;
        } else {
-               if ((r->rt == PF_REPLYTO) == (r->direction == pd->dir))
+               if ((s->rt == PF_REPLYTO) == (s->direction == pd->dir))
                        return;
                m0 = pd->m;
        }
@@ -6021,44 +5998,31 @@ pf_route(struct pf_pdesc *pd, struct pf_
 
        ip = mtod(m0, struct ip *);
 
-       memset(&sin, 0, sizeof(sin));
-       dst = &sin;
-       dst->sin_family = AF_INET;
-       dst->sin_len = sizeof(*dst);
-       dst->sin_addr = ip->ip_dst;
-       rtableid = m0->m_pkthdr.ph_rtableid;
-
        if (pd->dir == PF_IN) {
                if (ip->ip_ttl <= IPTTLDEC) {
-                       if (r->rt != PF_DUPTO)
+                       if (s->rt != PF_DUPTO)
                                pf_send_icmp(m0, ICMP_TIMXCEED,
                                    ICMP_TIMXCEED_INTRANS, 0,
-                                   pd->af, r, pd->rdomain);
+                                   pd->af, s->rule.ptr, pd->rdomain);
                        goto bad;
                }
                ip->ip_ttl -= IPTTLDEC;
        }
 
-       if (s == NULL) {
-               memset(sns, 0, sizeof(sns));
-               if (pf_map_addr(AF_INET, r,
-                   (struct pf_addr *)&ip->ip_src,
-                   &naddr, NULL, sns, &r->route, PF_SN_ROUTE)) {
-                       DPFPRINTF(LOG_ERR,
-                           "%s: pf_map_addr() failed", __func__);
-                       goto bad;
-               }
+       memset(&sin, 0, sizeof(sin));
+       dst = &sin;
+       dst->sin_family = AF_INET;
+       dst->sin_len = sizeof(*dst);
+       dst->sin_addr.s_addr = s->rt_addr.v4.s_addr;
+       rtableid = m0->m_pkthdr.ph_rtableid;
 
-               if (!PF_AZERO(&naddr, AF_INET))
-                       dst->sin_addr.s_addr = naddr.v4.s_addr;
-               ifp = r->route.kif ?
-                   r->route.kif->pfik_ifp : NULL;
-       } else {
-               if (!PF_AZERO(&s->rt_addr, AF_INET))
-                       dst->sin_addr.s_addr =
-                           s->rt_addr.v4.s_addr;
-               ifp = s->rt_kif ? s->rt_kif->pfik_ifp : NULL;
+       rt = rtalloc(sintosa(dst), RT_RESOLVE, rtableid);
+       if (!rtisvalid(rt)) {
+               ipstat_inc(ips_noroute);
+               goto bad;
        }
+
+       ifp = if_get(rt->rt_ifidx);
        if (ifp == NULL)
                goto bad;
 
@@ -6074,12 +6038,6 @@ pf_route(struct pf_pdesc *pd, struct pf_
                }
                ip = mtod(m0, struct ip *);
        }
-
-       rt = rtalloc(sintosa(dst), RT_RESOLVE, rtableid);
-       if (!rtisvalid(rt)) {
-               ipstat_inc(ips_noroute);
-               goto bad;
-       }
        /* A locally generated packet may have invalid source address. */
        if ((ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET &&
            (ifp->if_flags & IFF_LOOPBACK) == 0)
@@ -6105,9 +6063,9 @@ pf_route(struct pf_pdesc *pd, struct pf_
         */
        if (ip->ip_off & htons(IP_DF)) {
                ipstat_inc(ips_cantfrag);
-               if (r->rt != PF_DUPTO)
+               if (s->rt != PF_DUPTO)
                        pf_send_icmp(m0, ICMP_UNREACH, ICMP_UNREACH_NEEDFRAG,
-                           ifp->if_mtu, pd->af, r, pd->rdomain);
+                           ifp->if_mtu, pd->af, s->rule.ptr, pd->rdomain);
                goto bad;
        }
 
@@ -6131,8 +6089,9 @@ pf_route(struct pf_pdesc *pd, struct pf_
                ipstat_inc(ips_fragmented);
 
 done:
-       if (r->rt != PF_DUPTO)
+       if (s->rt != PF_DUPTO)
                pd->m = NULL;
+       if_put(ifp);
        rtfree(rt);
        return;
 
@@ -6144,15 +6103,13 @@ bad:
 #ifdef INET6
 /* pf_route6() may change pd->m, adjust local copies after calling */
 void
-pf_route6(struct pf_pdesc *pd, struct pf_rule *r, struct pf_state *s)
+pf_route6(struct pf_pdesc *pd, struct pf_state *s)
 {
        struct mbuf             *m0;
        struct sockaddr_in6     *dst, sin6;
        struct rtentry          *rt = NULL;
        struct ip6_hdr          *ip6;
        struct ifnet            *ifp = NULL;
-       struct pf_addr           naddr;
-       struct pf_src_node      *sns[PF_SN_MAX];
        struct m_tag            *mtag;
        unsigned int             rtableid;
 
@@ -6162,11 +6119,11 @@ pf_route6(struct pf_pdesc *pd, struct pf
                return;
        }
 
-       if (r->rt == PF_DUPTO) {
+       if (s->rt == PF_DUPTO) {
                if ((m0 = m_dup_pkt(pd->m, max_linkhdr, M_NOWAIT)) == NULL)
                        return;
        } else {
-               if ((r->rt == PF_REPLYTO) == (r->direction == pd->dir))
+               if ((s->rt == PF_REPLYTO) == (s->direction == pd->dir))
                        return;
                m0 = pd->m;
        }
@@ -6178,42 +6135,31 @@ pf_route6(struct pf_pdesc *pd, struct pf
        }
        ip6 = mtod(m0, struct ip6_hdr *);
 
-       memset(&sin6, 0, sizeof(sin6));
-       dst = &sin6;
-       dst->sin6_family = AF_INET6;
-       dst->sin6_len = sizeof(*dst);
-       dst->sin6_addr = ip6->ip6_dst;
-       rtableid = m0->m_pkthdr.ph_rtableid;
-
        if (pd->dir == PF_IN) {
                if (ip6->ip6_hlim <= IPV6_HLIMDEC) {
-                       if (r->rt != PF_DUPTO)
+                       if (s->rt != PF_DUPTO)
                                pf_send_icmp(m0, ICMP6_TIME_EXCEEDED,
                                    ICMP6_TIME_EXCEED_TRANSIT, 0,
-                                   pd->af, r, pd->rdomain);
+                                   pd->af, s->rule.ptr, pd->rdomain);
                        goto bad;
                }
                ip6->ip6_hlim -= IPV6_HLIMDEC;
        }
 
-       if (s == NULL) {
-               memset(sns, 0, sizeof(sns));
-               if (pf_map_addr(AF_INET6, r, (struct pf_addr *)&ip6->ip6_src,
-                   &naddr, NULL, sns, &r->route, PF_SN_ROUTE)) {
-                       DPFPRINTF(LOG_ERR,
-                           "%s: pf_map_addr() failed", __func__);
-                       goto bad;
-               }
-               if (!PF_AZERO(&naddr, AF_INET6))
-                       pf_addrcpy((struct pf_addr *)&dst->sin6_addr,
-                           &naddr, AF_INET6);
-               ifp = r->route.kif ? r->route.kif->pfik_ifp : NULL;
-       } else {
-               if (!PF_AZERO(&s->rt_addr, AF_INET6))
-                       pf_addrcpy((struct pf_addr *)&dst->sin6_addr,
-                           &s->rt_addr, AF_INET6);
-               ifp = s->rt_kif ? s->rt_kif->pfik_ifp : NULL;
+       memset(&sin6, 0, sizeof(sin6));
+       dst = &sin6;
+       dst->sin6_family = AF_INET6;
+       dst->sin6_len = sizeof(*dst);
+       pf_addrcpy((struct pf_addr *)&dst->sin6_addr, &s->rt_addr, AF_INET6);
+       rtableid = m0->m_pkthdr.ph_rtableid;
+
+       rt = rtalloc(sin6tosa(dst), RT_RESOLVE, rtableid);
+       if (!rtisvalid(rt)) {
+               ip6stat_inc(ip6s_noroute);
+               goto bad;
        }
+
+       ifp = if_get(rt->rt_ifidx);
        if (ifp == NULL)
                goto bad;
 
@@ -6231,11 +6177,7 @@ pf_route6(struct pf_pdesc *pd, struct pf
 
        if (IN6_IS_SCOPE_EMBED(&dst->sin6_addr))
                dst->sin6_addr.s6_addr16[1] = htons(ifp->if_index);
-       rt = rtalloc(sin6tosa(dst), RT_RESOLVE, rtableid);
-       if (!rtisvalid(rt)) {
-               ip6stat_inc(ip6s_noroute);
-               goto bad;
-       }
+
        /* A locally generated packet may have invalid source address. */
        if (IN6_IS_ADDR_LOOPBACK(&ip6->ip6_src) &&
            (ifp->if_flags & IFF_LOOPBACK) == 0)
@@ -6253,15 +6195,16 @@ pf_route6(struct pf_pdesc *pd, struct pf
                ifp->if_output(ifp, m0, sin6tosa(dst), rt);
        } else {
                ip6stat_inc(ip6s_cantfrag);
-               if (r->rt != PF_DUPTO)
+               if (s->rt != PF_DUPTO)
                        pf_send_icmp(m0, ICMP6_PACKET_TOO_BIG, 0,
-                           ifp->if_mtu, pd->af, r, pd->rdomain);
+                           ifp->if_mtu, pd->af, s->rule.ptr, pd->rdomain);
                goto bad;
        }
 
 done:
-       if (r->rt != PF_DUPTO)
+       if (s->rt != PF_DUPTO)
                pd->m = NULL;
+       if_put(ifp);
        rtfree(rt);
        return;
 
@@ -6271,7 +6214,6 @@ bad:
 }
 #endif /* INET6 */
 
-
 /*
  * check TCP checksum and set mbuf flag
  *   off is the offset where the protocol header starts
@@ -7287,14 +7229,14 @@ done:
                pd.m = NULL;
                break;
        default:
-               if (r->rt) {
+               if (s && s->rt) {
                        switch (pd.af) {
                        case AF_INET:
-                               pf_route(&pd, r, s);
+                               pf_route(&pd, s);
                                break;
 #ifdef INET6
                        case AF_INET6:
-                               pf_route6(&pd, r, s);
+                               pf_route6(&pd, s);
                                break;
 #endif /* INET6 */
                        }
Index: sys/net/pfvar.h
===================================================================
RCS file: /cvs/src/sys/net/pfvar.h,v
retrieving revision 1.496
diff -u -p -r1.496 pfvar.h
--- sys/net/pfvar.h     24 Aug 2020 15:30:58 -0000      1.496
+++ sys/net/pfvar.h     19 Oct 2020 01:56:57 -0000
@@ -133,7 +133,7 @@ enum        { PFTM_TCP_FIRST_PACKET, PFTM_TCP_O
  */
 #define PF_FRAG_ENTRY_LIMIT            64
 
-enum   { PF_NOPFROUTE, PF_ROUTETO, PF_DUPTO, PF_REPLYTO };
+enum   { PF_NOPFROUTE = 0 , PF_ROUTETO, PF_DUPTO, PF_REPLYTO };
 enum   { PF_LIMIT_STATES, PF_LIMIT_SRC_NODES, PF_LIMIT_FRAGS,
          PF_LIMIT_TABLES, PF_LIMIT_TABLE_ENTRIES, PF_LIMIT_PKTDELAY_PKTS,
          PF_LIMIT_MAX };
@@ -762,7 +762,6 @@ struct pf_state {
        struct pf_sn_head        src_nodes;
        struct pf_state_key     *key[2];        /* addresses stack and wire  */
        struct pfi_kif          *kif;
-       struct pfi_kif          *rt_kif;
        u_int64_t                packets[2];
        u_int64_t                bytes[2];
        int32_t                  creation;
@@ -797,6 +796,7 @@ struct pf_state {
        u_int16_t                if_index_out;
        pf_refcnt_t              refcnt;
        u_int16_t                delay;
+       u_int8_t                 rt;
 };
 
 /*
@@ -852,7 +852,7 @@ struct pfsync_state {
        u_int8_t         proto;
        u_int8_t         direction;
        u_int8_t         log;
-       u_int8_t         pad0;
+       u_int8_t         rt;
        u_int8_t         timeout;
        u_int8_t         sync_flags;
        u_int8_t         updates;
@@ -1798,8 +1798,8 @@ int       pf_state_key_attach(struct pf_state_
 int    pf_translate(struct pf_pdesc *, struct pf_addr *, u_int16_t,
            struct pf_addr *, u_int16_t, u_int16_t, int);
 int    pf_translate_af(struct pf_pdesc *);
-void   pf_route(struct pf_pdesc *, struct pf_rule *, struct pf_state *);
-void   pf_route6(struct pf_pdesc *, struct pf_rule *, struct pf_state *);
+void   pf_route(struct pf_pdesc *, struct pf_state *);
+void   pf_route6(struct pf_pdesc *, struct pf_state *);
 void   pf_init_threshold(struct pf_threshold *, u_int32_t, u_int32_t);
 int    pf_delay_pkt(struct mbuf *, u_int);
