every few years i try and use route-to in pf, and every time it goes badly. i tried it again last week in a slightly different setting, and actually tried to understand the sharp edges i hit this time instead of giving up. it turns out there are 2 or 3 different things together that have cause me trouble, which is why the diff below is so big.
the first and i would argue most fundamental problem is a semantic problem. if you ask a random person who has some clue about networks and routing what they would expect the "argument" to route-to or reply-to to be, they would say "a nexthop address" or "a gateway address". eg, say i want to force packets to a specific backend server without using NAT, i would write a rule like this: n_servers="192.0.2.128/27" pass out on $if_internal to $n_servers route-to 192.168.0.1 pfctl will happily parse this, shove it into the kernel, let you read the rules back out again with pfctl -sr, and it all looks plausible, but it turns out that it's using the argument to route-to as an interface name. because rulesets can refer to interfaces that don't exist yet, pf just passes the IP address around as a string, hoping i'll plug in an interface with a driver name that looks like an ip address. i spent literally a day trying to figure out why a rule like this wasn't working. i happened to be talking to pascoe@ at the time, and his vague memory was that the idea was to try and switch the interface a packet was going to travel over, but to try and reuse the arp lookup from the parent one. neither of us could figure out why that would be a good idea though. the best i can say about this is that it only really makes some kind of sense if you're moving a packet into a tunnel. tunnels don't really care about nexthops and will happily route anything you give them. if you were trying to add a route to the routing table to do this, you'd be specifying the peer address on a tunnel interface as the gateway. pf has a if0:peer syntax that makes this convenient to write. so i want to change route-to in pfctl so it takes a nexthop instead of an interface. you could argue that pf already lets you do this, because there's some bs nexthop@interface syntax. my counter argument is that the interface the nexthop is reachable over is redundant, and it makes fixing some of the other problems harder if we keep it. the second and third problems i hit are when route-to is used on a pair of boxes that have pfsync and pfsync defer set up. when defer is enabled, pfsync takes the packet away from the forwarding path, and when it has some confidence that the peer is aware of the state, then it tries to push the packet back out. to understand the following, be aware that route-to, reply-to, and dup-to are implemented in pf in a pair of functions called pf_route and pf_route6. if i say pf_route, just assume i'm talking about both of these functions. the second problem is that the pf_route calls from pfsync don't have all the information it is supposed to have. more specifically, an ifp pointer isn't set which leads to a segfault. the ifp pointer isn't set because pfsync doesnt track which interface a packet is going out, it assumes the ip layer will get it right again later, or a rule provided something usable. the third problem is that pf_route relies on information from rules to work correctly. this is a problem in a pfsync environment because you cannot have the same ruleset on both firewalls 100% of the time, which means you cannot have route-to/reply-to behave consistently on a pair of firwalls 100% of the time. my solution to both these problems is reduce the amount of information pf_route needs to work with, to make sure that the info it does need is in the pf state structure, and that pfsync handles it properly. if we limit the information needed for pf_route to a nexthop address, and which direction the address is used, this is doable. both the pf_state and pfsync_state structs already contain an address to store a nexthop in, i just had to move the route-to direction from the rule into the state. this is easy with pf_state, but i used a spare pad field in pfsync_state for this. the pf_state struct has had which interface the route is using removed. there's no simple way to sync interface information between pfsync peers on the wire, and the need for them is marginal at best. things are much simpler if we can get away with not having this info. a bonus problem i hit is that there's code in pf_match that appears to try and short circuit some processing of states when route-to/reply-to is in effect. this has two consequences. first, if you're using route-to with tcp states, half the tcp state machine is is skipped. when you look at these states with pfctl -vvss, one half of the TCP state never moves forward. secondly, because the processing is short circuited, it never falls through to the end of pf_test where the actual call to pf_route is done. so the first packet is properly handled by pf_route, but none of the packets after that. all of this together makes things work pretty obviously and smoothly. in my opinion anyway. route-to now works more like rdr-to, it just feels like it changes the address used for the route lookup rather than changing the actual IP address in the packet. it also works predictably in a pfsync pair, which is great from the point of view of high availability. the main caveat is that it's not backward compatible. if you're already using route-to, you will need to tweak your rules to have them parse. however, i doubt anyone is using this stuff because it feels very broken to me. the overarching use case for me with this stuff is i want a frontend router/load balancers to have "sticky" routes for connections to a pool of backend servers. eg, i have webservers that have a bunch of IP addresses bound to lo1 interfaces. all the webservers have the same set of IPs on lo1 interface. with this i can write rules like this: pass out on $if_web to 203.0.113.0/27 route-to 192.168.0.6/31 these webservers cannot take over from each other because they actually terminate tcp connections for these IP addresses. the normal statelessness of routes where a different router can handle them like any other router doesnt apply in this situation. once the frontend has picked a backend route target, i want the route to stick to the backend. pf is the mechanism im using to create and apply that stickiness. im also looking at having frontends load balance traffic over backends that are accessible over gre tunnels. in that case i just want to tie the routes to the interface that was already picked by the routing table. that would look like: n_anycast="203.0.113.128/31" pass out on gre0 to $n_anycast route-to gre0:peer pass out on gre1 to $n_anycast route-to gre1:peer without this diff i need to write something like this: pass out on gre0 to $n_anycast route-to gre0:peer@gre0 pass out on gre1 to $n_anycast route-to gre1:peer@gre1 and then it panics anyway cos pf_route via pfsync blows up. and i can't have the frontend failover reliably to a peer. thoughts? there's some further cleanup that could be done in pfctl if this is allowed to proceed. Index: sbin/pfctl/parse.y =================================================================== RCS file: /cvs/src/sbin/pfctl/parse.y,v retrieving revision 1.704 diff -u -p -r1.704 parse.y --- sbin/pfctl/parse.y 1 Oct 2020 14:02:08 -0000 1.704 +++ sbin/pfctl/parse.y 19 Oct 2020 01:56:50 -0000 @@ -276,6 +276,7 @@ struct filter_opts { struct redirspec nat; struct redirspec rdr; struct redirspec rroute; + u_int8_t rt; /* scrub opts */ int nodf; @@ -284,15 +285,6 @@ struct filter_opts { int randomid; int max_mss; - /* route opts */ - struct { - struct node_host *host; - u_int8_t rt; - u_int8_t pool_opts; - sa_family_t af; - struct pf_poolhashkey *key; - } route; - struct { u_int32_t limit; u_int32_t seconds; @@ -517,7 +509,6 @@ int parseport(char *, struct range *r, i %type <v.host> ipspec xhost host dynaddr host_list %type <v.host> table_host_list tablespec %type <v.host> redir_host_list redirspec -%type <v.host> route_host route_host_list routespec %type <v.os> os xos os_list %type <v.port> portspec port_list port_item %type <v.uid> uids uid_list uid_item @@ -974,7 +965,7 @@ anchorrule : ANCHOR anchorname dir quick YYERROR; } - if ($9.route.rt) { + if ($9.rt) { yyerror("cannot specify route handling " "on anchors"); YYERROR; @@ -1842,37 +1833,18 @@ pfrule : action dir logquick interface decide_address_family($7.src.host, &r.af); decide_address_family($7.dst.host, &r.af); - if ($8.route.rt) { + if ($8.rt) { + if ($8.rt != PF_DUPTO && !r.direction) { + yyerror("direction must be explicit" + " with rules that specify routing"); + YYERROR; + } if (!r.direction) { yyerror("direction must be explicit " "with rules that specify routing"); YYERROR; } - r.rt = $8.route.rt; - r.route.opts = $8.route.pool_opts; - if ($8.route.key != NULL) - memcpy(&r.route.key, $8.route.key, - sizeof(struct pf_poolhashkey)); - } - if (r.rt) { - decide_address_family($8.route.host, &r.af); - if ((r.route.opts & PF_POOL_TYPEMASK) == - PF_POOL_NONE && ($8.route.host->next != NULL || - $8.route.host->addr.type == PF_ADDR_TABLE || - DYNIF_MULTIADDR($8.route.host->addr))) - r.route.opts |= PF_POOL_ROUNDROBIN; - if ($8.route.host->next != NULL) { - if (!PF_POOL_DYNTYPE(r.route.opts)) { - yyerror("address pool option " - "not supported by type"); - YYERROR; - } - } - /* fake redirspec */ - if (($8.rroute.rdr = calloc(1, - sizeof(*$8.rroute.rdr))) == NULL) - err(1, "$8.rroute.rdr"); - $8.rroute.rdr->host = $8.route.host; + r.rt = $8.rt; } if (expand_divertspec(&r, &$8.divert)) @@ -2136,30 +2108,14 @@ filter_opt : USER uids { sizeof(filter_opts.nat.pool_opts)); filter_opts.nat.pool_opts.staticport = 1; } - | ROUTETO routespec pool_opts { - filter_opts.route.host = $2; - filter_opts.route.rt = PF_ROUTETO; - filter_opts.route.pool_opts = $3.type | $3.opts; - memcpy(&filter_opts.rroute.pool_opts, &$3, - sizeof(filter_opts.rroute.pool_opts)); - if ($3.key != NULL) - filter_opts.route.key = $3.key; + | ROUTETO routespec { + filter_opts.rt = PF_ROUTETO; } - | REPLYTO routespec pool_opts { - filter_opts.route.host = $2; - filter_opts.route.rt = PF_REPLYTO; - filter_opts.route.pool_opts = $3.type | $3.opts; - if ($3.key != NULL) - filter_opts.route.key = $3.key; - } - | DUPTO routespec pool_opts { - filter_opts.route.host = $2; - filter_opts.route.rt = PF_DUPTO; - filter_opts.route.pool_opts = $3.type | $3.opts; - memcpy(&filter_opts.rroute.pool_opts, &$3, - sizeof(filter_opts.rroute.pool_opts)); - if ($3.key != NULL) - filter_opts.route.key = $3.key; + | REPLYTO routespec { + filter_opts.rt = PF_REPLYTO; + } + | DUPTO routespec { + filter_opts.rt = PF_DUPTO; } | not RECEIVEDON if_item { if (filter_opts.rcv) { @@ -3733,122 +3689,16 @@ pool_opt : BITMASK { } ; -route_host : STRING { - /* try to find @if0 address specs */ - if (strrchr($1, '@') != NULL) { - if (($$ = host($1, pf->opts)) == NULL) { - yyerror("invalid host for route spec"); - YYERROR; - } - free($1); - } else { - $$ = calloc(1, sizeof(struct node_host)); - if ($$ == NULL) - err(1, "route_host: calloc"); - $$->ifname = $1; - $$->addr.type = PF_ADDR_NONE; - set_ipmask($$, 128); - $$->next = NULL; - $$->tail = $$; - } - } - | STRING '/' STRING { - char *buf; - - if (asprintf(&buf, "%s/%s", $1, $3) == -1) - err(1, "host: asprintf"); - free($1); - if (($$ = host(buf, pf->opts)) == NULL) { - /* error. "any" is handled elsewhere */ - free(buf); - yyerror("could not parse host specification"); - YYERROR; - } - free(buf); - } - | '<' STRING '>' { - if (strlen($2) >= PF_TABLE_NAME_SIZE) { - yyerror("table name '%s' too long", $2); - free($2); - YYERROR; - } - $$ = calloc(1, sizeof(struct node_host)); - if ($$ == NULL) - err(1, "host: calloc"); - $$->addr.type = PF_ADDR_TABLE; - if (strlcpy($$->addr.v.tblname, $2, - sizeof($$->addr.v.tblname)) >= - sizeof($$->addr.v.tblname)) - errx(1, "host: strlcpy"); - free($2); - $$->next = NULL; - $$->tail = $$; - } - | dynaddr '/' NUMBER { - struct node_host *n; - - if ($3 < 0 || $3 > 128) { - yyerror("bit number too big"); - YYERROR; - } - $$ = $1; - for (n = $1; n != NULL; n = n->next) - set_ipmask(n, $3); - } - | '(' STRING host ')' { - struct node_host *n; - - $$ = $3; - /* XXX check masks, only full mask should be allowed */ - for (n = $3; n != NULL; n = n->next) { - if ($$->ifname) { - yyerror("cannot specify interface twice " - "in route spec"); - YYERROR; - } - if (($$->ifname = strdup($2)) == NULL) - errx(1, "host: strdup"); - } - free($2); - } - ; - -route_host_list : route_host optweight optnl { - if ($2 > 0) { - struct node_host *n; - for (n = $1; n != NULL; n = n->next) - n->weight = $2; - } - $$ = $1; - } - | route_host_list comma route_host optweight optnl { - if ($1->af == 0) - $1->af = $3->af; - if ($1->af != $3->af) { - yyerror("all pool addresses must be in the " - "same address family"); +routespec : redirpool pool_opts { + if (filter_opts.rt != PF_NOPFROUTE) { + yyerror("cannot respecify " + "route-to/reply-to/dup-to"); YYERROR; } - $1->tail->next = $3; - $1->tail = $3->tail; - if ($4 > 0) { - struct node_host *n; - for (n = $3; n != NULL; n = n->next) - n->weight = $4; - } - $$ = $1; - } - ; - -routespec : route_host optweight { - if ($2 > 0) { - struct node_host *n; - for (n = $1; n != NULL; n = n->next) - n->weight = $2; - } - $$ = $1; + filter_opts.rroute.rdr = $1; + memcpy(&filter_opts.rroute.pool_opts, &$2, + sizeof(filter_opts.rroute.pool_opts)); } - | '{' optnl route_host_list '}' { $$ = $3; } ; timeout_spec : STRING NUMBER @@ -4709,7 +4559,7 @@ expand_rule(struct pf_rule *r, int keepr error += collapse_redirspec(&r->rdr, r, rdr, 0); error += collapse_redirspec(&r->nat, r, nat, 0); - error += collapse_redirspec(&r->route, r, rroute, 1); + error += collapse_redirspec(&r->route, r, rroute, 0); /* disallow @if in from or to for the time being */ if ((src_host->addr.type == PF_ADDR_ADDRMASK && @@ -5955,7 +5805,7 @@ filteropts_to_rule(struct pf_rule *r, st yyerror("af-to can only be used with direction in"); return (1); } - if ((opts->marker & FOM_AFTO) && opts->route.rt) { + if ((opts->marker & FOM_AFTO) && opts->rt) { yyerror("af-to cannot be used together with " "route-to, reply-to, dup-to"); return (1); Index: sys/net/if_pfsync.c =================================================================== RCS file: /cvs/src/sys/net/if_pfsync.c,v retrieving revision 1.278 diff -u -p -r1.278 if_pfsync.c --- sys/net/if_pfsync.c 24 Aug 2020 15:30:58 -0000 1.278 +++ sys/net/if_pfsync.c 19 Oct 2020 01:56:57 -0000 @@ -612,7 +612,8 @@ pfsync_state_import(struct pfsync_state st->rtableid[PF_SK_STACK] = ntohl(sp->rtableid[PF_SK_STACK]); /* copy to state */ - bcopy(&sp->rt_addr, &st->rt_addr, sizeof(st->rt_addr)); + st->rt_addr = sp->rt_addr; + st->rt = sp->rt; st->creation = getuptime() - ntohl(sp->creation); st->expire = getuptime(); if (ntohl(sp->expire)) { @@ -643,7 +644,6 @@ pfsync_state_import(struct pfsync_state st->rule.ptr = r; st->anchor.ptr = NULL; - st->rt_kif = NULL; st->pfsync_time = getuptime(); st->sync_state = PFSYNC_S_NONE; @@ -1843,6 +1843,7 @@ pfsync_undefer(struct pfsync_deferral *p { struct pfsync_softc *sc = pfsyncif; struct pf_pdesc pdesc; + struct pf_state *s = pd->pd_st; NET_ASSERT_LOCKED(); @@ -1852,35 +1853,33 @@ pfsync_undefer(struct pfsync_deferral *p TAILQ_REMOVE(&sc->sc_deferrals, pd, pd_entry); sc->sc_deferred--; - CLR(pd->pd_st->state_flags, PFSTATE_ACK); + CLR(s->state_flags, PFSTATE_ACK); if (drop) m_freem(pd->pd_m); else { - if (pd->pd_st->rule.ptr->rt == PF_ROUTETO) { + if (s->rt == PF_ROUTETO) { if (pf_setup_pdesc(&pdesc, - pd->pd_st->key[PF_SK_WIRE]->af, - pd->pd_st->direction, pd->pd_st->rt_kif, + s->key[PF_SK_WIRE]->af, + s->direction, s->kif, pd->pd_m, NULL) != PF_PASS) { m_freem(pd->pd_m); goto out; } - switch (pd->pd_st->key[PF_SK_WIRE]->af) { + switch (s->key[PF_SK_WIRE]->af) { case AF_INET: - pf_route(&pdesc, - pd->pd_st->rule.ptr, pd->pd_st); + pf_route(&pdesc, s); break; #ifdef INET6 case AF_INET6: - pf_route6(&pdesc, - pd->pd_st->rule.ptr, pd->pd_st); + pf_route6(&pdesc, s); break; #endif /* INET6 */ default: - unhandled_af(pd->pd_st->key[PF_SK_WIRE]->af); + unhandled_af(s->key[PF_SK_WIRE]->af); } pd->pd_m = pdesc.m; } else { - switch (pd->pd_st->key[PF_SK_WIRE]->af) { + switch (s->key[PF_SK_WIRE]->af) { case AF_INET: ip_output(pd->pd_m, NULL, NULL, 0, NULL, NULL, 0); @@ -1892,12 +1891,12 @@ pfsync_undefer(struct pfsync_deferral *p break; #endif /* INET6 */ default: - unhandled_af(pd->pd_st->key[PF_SK_WIRE]->af); + unhandled_af(s->key[PF_SK_WIRE]->af); } } } out: - pf_state_unref(pd->pd_st); + pf_state_unref(s); pool_put(&sc->sc_pool, pd); } Index: sys/net/pf.c =================================================================== RCS file: /cvs/src/sys/net/pf.c,v retrieving revision 1.1094 diff -u -p -r1.1094 pf.c --- sys/net/pf.c 24 Jul 2020 18:17:15 -0000 1.1094 +++ sys/net/pf.c 19 Oct 2020 01:56:57 -0000 @@ -1122,12 +1122,6 @@ pf_find_state(struct pf_pdesc *pd, struc } *state = s; - if (pd->dir == PF_OUT && s->rt_kif != NULL && s->rt_kif != pd->kif && - ((s->rule.ptr->rt == PF_ROUTETO && - s->rule.ptr->direction == PF_OUT) || - (s->rule.ptr->rt == PF_REPLYTO && - s->rule.ptr->direction == PF_IN))) - return (PF_PASS); return (PF_MATCH); } @@ -1186,7 +1180,8 @@ pf_state_export(struct pfsync_state *sp, /* copy from state */ strlcpy(sp->ifname, st->kif->pfik_name, sizeof(sp->ifname)); - memcpy(&sp->rt_addr, &st->rt_addr, sizeof(sp->rt_addr)); + sp->rt = st->rt; + sp->rt_addr = st->rt_addr; sp->creation = htonl(getuptime() - st->creation); expire = pf_state_expires(st); if (expire <= getuptime()) @@ -3433,29 +3428,13 @@ pf_set_rt_ifp(struct pf_state *s, struct struct pf_rule *r = s->rule.ptr; int rv; - s->rt_kif = NULL; - if (!r->rt) + if (r->rt == PF_NOPFROUTE) return (0); - switch (af) { - case AF_INET: - rv = pf_map_addr(AF_INET, r, saddr, &s->rt_addr, NULL, sns, - &r->route, PF_SN_ROUTE); - break; -#ifdef INET6 - case AF_INET6: - rv = pf_map_addr(AF_INET6, r, saddr, &s->rt_addr, NULL, sns, - &r->route, PF_SN_ROUTE); - break; -#endif /* INET6 */ - default: - rv = 1; - } - - if (rv == 0) { - s->rt_kif = r->route.kif; - s->natrule.ptr = r; - } + rv = pf_map_addr(af, r, saddr, &s->rt_addr, NULL, sns, + &r->route, PF_SN_ROUTE); + if (rv == 0) + s->rt = r->rt; return (rv); } @@ -5986,15 +5965,13 @@ pf_rtlabel_match(struct pf_addr *addr, s /* pf_route() may change pd->m, adjust local copies after calling */ void -pf_route(struct pf_pdesc *pd, struct pf_rule *r, struct pf_state *s) +pf_route(struct pf_pdesc *pd, struct pf_state *s) { struct mbuf *m0, *m1; struct sockaddr_in *dst, sin; struct rtentry *rt = NULL; struct ip *ip; struct ifnet *ifp = NULL; - struct pf_addr naddr; - struct pf_src_node *sns[PF_SN_MAX]; int error = 0; unsigned int rtableid; @@ -6004,11 +5981,11 @@ pf_route(struct pf_pdesc *pd, struct pf_ return; } - if (r->rt == PF_DUPTO) { + if (s->rt == PF_DUPTO) { if ((m0 = m_dup_pkt(pd->m, max_linkhdr, M_NOWAIT)) == NULL) return; } else { - if ((r->rt == PF_REPLYTO) == (r->direction == pd->dir)) + if ((s->rt == PF_REPLYTO) == (s->direction == pd->dir)) return; m0 = pd->m; } @@ -6021,44 +5998,31 @@ pf_route(struct pf_pdesc *pd, struct pf_ ip = mtod(m0, struct ip *); - memset(&sin, 0, sizeof(sin)); - dst = &sin; - dst->sin_family = AF_INET; - dst->sin_len = sizeof(*dst); - dst->sin_addr = ip->ip_dst; - rtableid = m0->m_pkthdr.ph_rtableid; - if (pd->dir == PF_IN) { if (ip->ip_ttl <= IPTTLDEC) { - if (r->rt != PF_DUPTO) + if (s->rt != PF_DUPTO) pf_send_icmp(m0, ICMP_TIMXCEED, ICMP_TIMXCEED_INTRANS, 0, - pd->af, r, pd->rdomain); + pd->af, s->rule.ptr, pd->rdomain); goto bad; } ip->ip_ttl -= IPTTLDEC; } - if (s == NULL) { - memset(sns, 0, sizeof(sns)); - if (pf_map_addr(AF_INET, r, - (struct pf_addr *)&ip->ip_src, - &naddr, NULL, sns, &r->route, PF_SN_ROUTE)) { - DPFPRINTF(LOG_ERR, - "%s: pf_map_addr() failed", __func__); - goto bad; - } + memset(&sin, 0, sizeof(sin)); + dst = &sin; + dst->sin_family = AF_INET; + dst->sin_len = sizeof(*dst); + dst->sin_addr.s_addr = s->rt_addr.v4.s_addr; + rtableid = m0->m_pkthdr.ph_rtableid; - if (!PF_AZERO(&naddr, AF_INET)) - dst->sin_addr.s_addr = naddr.v4.s_addr; - ifp = r->route.kif ? - r->route.kif->pfik_ifp : NULL; - } else { - if (!PF_AZERO(&s->rt_addr, AF_INET)) - dst->sin_addr.s_addr = - s->rt_addr.v4.s_addr; - ifp = s->rt_kif ? s->rt_kif->pfik_ifp : NULL; + rt = rtalloc(sintosa(dst), RT_RESOLVE, rtableid); + if (!rtisvalid(rt)) { + ipstat_inc(ips_noroute); + goto bad; } + + ifp = if_get(rt->rt_ifidx); if (ifp == NULL) goto bad; @@ -6074,12 +6038,6 @@ pf_route(struct pf_pdesc *pd, struct pf_ } ip = mtod(m0, struct ip *); } - - rt = rtalloc(sintosa(dst), RT_RESOLVE, rtableid); - if (!rtisvalid(rt)) { - ipstat_inc(ips_noroute); - goto bad; - } /* A locally generated packet may have invalid source address. */ if ((ntohl(ip->ip_src.s_addr) >> IN_CLASSA_NSHIFT) == IN_LOOPBACKNET && (ifp->if_flags & IFF_LOOPBACK) == 0) @@ -6105,9 +6063,9 @@ pf_route(struct pf_pdesc *pd, struct pf_ */ if (ip->ip_off & htons(IP_DF)) { ipstat_inc(ips_cantfrag); - if (r->rt != PF_DUPTO) + if (s->rt != PF_DUPTO) pf_send_icmp(m0, ICMP_UNREACH, ICMP_UNREACH_NEEDFRAG, - ifp->if_mtu, pd->af, r, pd->rdomain); + ifp->if_mtu, pd->af, s->rule.ptr, pd->rdomain); goto bad; } @@ -6131,8 +6089,9 @@ pf_route(struct pf_pdesc *pd, struct pf_ ipstat_inc(ips_fragmented); done: - if (r->rt != PF_DUPTO) + if (s->rt != PF_DUPTO) pd->m = NULL; + if_put(ifp); rtfree(rt); return; @@ -6144,15 +6103,13 @@ bad: #ifdef INET6 /* pf_route6() may change pd->m, adjust local copies after calling */ void -pf_route6(struct pf_pdesc *pd, struct pf_rule *r, struct pf_state *s) +pf_route6(struct pf_pdesc *pd, struct pf_state *s) { struct mbuf *m0; struct sockaddr_in6 *dst, sin6; struct rtentry *rt = NULL; struct ip6_hdr *ip6; struct ifnet *ifp = NULL; - struct pf_addr naddr; - struct pf_src_node *sns[PF_SN_MAX]; struct m_tag *mtag; unsigned int rtableid; @@ -6162,11 +6119,11 @@ pf_route6(struct pf_pdesc *pd, struct pf return; } - if (r->rt == PF_DUPTO) { + if (s->rt == PF_DUPTO) { if ((m0 = m_dup_pkt(pd->m, max_linkhdr, M_NOWAIT)) == NULL) return; } else { - if ((r->rt == PF_REPLYTO) == (r->direction == pd->dir)) + if ((s->rt == PF_REPLYTO) == (s->direction == pd->dir)) return; m0 = pd->m; } @@ -6178,42 +6135,31 @@ pf_route6(struct pf_pdesc *pd, struct pf } ip6 = mtod(m0, struct ip6_hdr *); - memset(&sin6, 0, sizeof(sin6)); - dst = &sin6; - dst->sin6_family = AF_INET6; - dst->sin6_len = sizeof(*dst); - dst->sin6_addr = ip6->ip6_dst; - rtableid = m0->m_pkthdr.ph_rtableid; - if (pd->dir == PF_IN) { if (ip6->ip6_hlim <= IPV6_HLIMDEC) { - if (r->rt != PF_DUPTO) + if (s->rt != PF_DUPTO) pf_send_icmp(m0, ICMP6_TIME_EXCEEDED, ICMP6_TIME_EXCEED_TRANSIT, 0, - pd->af, r, pd->rdomain); + pd->af, s->rule.ptr, pd->rdomain); goto bad; } ip6->ip6_hlim -= IPV6_HLIMDEC; } - if (s == NULL) { - memset(sns, 0, sizeof(sns)); - if (pf_map_addr(AF_INET6, r, (struct pf_addr *)&ip6->ip6_src, - &naddr, NULL, sns, &r->route, PF_SN_ROUTE)) { - DPFPRINTF(LOG_ERR, - "%s: pf_map_addr() failed", __func__); - goto bad; - } - if (!PF_AZERO(&naddr, AF_INET6)) - pf_addrcpy((struct pf_addr *)&dst->sin6_addr, - &naddr, AF_INET6); - ifp = r->route.kif ? r->route.kif->pfik_ifp : NULL; - } else { - if (!PF_AZERO(&s->rt_addr, AF_INET6)) - pf_addrcpy((struct pf_addr *)&dst->sin6_addr, - &s->rt_addr, AF_INET6); - ifp = s->rt_kif ? s->rt_kif->pfik_ifp : NULL; + memset(&sin6, 0, sizeof(sin6)); + dst = &sin6; + dst->sin6_family = AF_INET6; + dst->sin6_len = sizeof(*dst); + pf_addrcpy((struct pf_addr *)&dst->sin6_addr, &s->rt_addr, AF_INET6); + rtableid = m0->m_pkthdr.ph_rtableid; + + rt = rtalloc(sin6tosa(dst), RT_RESOLVE, rtableid); + if (!rtisvalid(rt)) { + ip6stat_inc(ip6s_noroute); + goto bad; } + + ifp = if_get(rt->rt_ifidx); if (ifp == NULL) goto bad; @@ -6231,11 +6177,7 @@ pf_route6(struct pf_pdesc *pd, struct pf if (IN6_IS_SCOPE_EMBED(&dst->sin6_addr)) dst->sin6_addr.s6_addr16[1] = htons(ifp->if_index); - rt = rtalloc(sin6tosa(dst), RT_RESOLVE, rtableid); - if (!rtisvalid(rt)) { - ip6stat_inc(ip6s_noroute); - goto bad; - } + /* A locally generated packet may have invalid source address. */ if (IN6_IS_ADDR_LOOPBACK(&ip6->ip6_src) && (ifp->if_flags & IFF_LOOPBACK) == 0) @@ -6253,15 +6195,16 @@ pf_route6(struct pf_pdesc *pd, struct pf ifp->if_output(ifp, m0, sin6tosa(dst), rt); } else { ip6stat_inc(ip6s_cantfrag); - if (r->rt != PF_DUPTO) + if (s->rt != PF_DUPTO) pf_send_icmp(m0, ICMP6_PACKET_TOO_BIG, 0, - ifp->if_mtu, pd->af, r, pd->rdomain); + ifp->if_mtu, pd->af, s->rule.ptr, pd->rdomain); goto bad; } done: - if (r->rt != PF_DUPTO) + if (s->rt != PF_DUPTO) pd->m = NULL; + if_put(ifp); rtfree(rt); return; @@ -6271,7 +6214,6 @@ bad: } #endif /* INET6 */ - /* * check TCP checksum and set mbuf flag * off is the offset where the protocol header starts @@ -7287,14 +7229,14 @@ done: pd.m = NULL; break; default: - if (r->rt) { + if (s && s->rt) { switch (pd.af) { case AF_INET: - pf_route(&pd, r, s); + pf_route(&pd, s); break; #ifdef INET6 case AF_INET6: - pf_route6(&pd, r, s); + pf_route6(&pd, s); break; #endif /* INET6 */ } Index: sys/net/pfvar.h =================================================================== RCS file: /cvs/src/sys/net/pfvar.h,v retrieving revision 1.496 diff -u -p -r1.496 pfvar.h --- sys/net/pfvar.h 24 Aug 2020 15:30:58 -0000 1.496 +++ sys/net/pfvar.h 19 Oct 2020 01:56:57 -0000 @@ -133,7 +133,7 @@ enum { PFTM_TCP_FIRST_PACKET, PFTM_TCP_O */ #define PF_FRAG_ENTRY_LIMIT 64 -enum { PF_NOPFROUTE, PF_ROUTETO, PF_DUPTO, PF_REPLYTO }; +enum { PF_NOPFROUTE = 0 , PF_ROUTETO, PF_DUPTO, PF_REPLYTO }; enum { PF_LIMIT_STATES, PF_LIMIT_SRC_NODES, PF_LIMIT_FRAGS, PF_LIMIT_TABLES, PF_LIMIT_TABLE_ENTRIES, PF_LIMIT_PKTDELAY_PKTS, PF_LIMIT_MAX }; @@ -762,7 +762,6 @@ struct pf_state { struct pf_sn_head src_nodes; struct pf_state_key *key[2]; /* addresses stack and wire */ struct pfi_kif *kif; - struct pfi_kif *rt_kif; u_int64_t packets[2]; u_int64_t bytes[2]; int32_t creation; @@ -797,6 +796,7 @@ struct pf_state { u_int16_t if_index_out; pf_refcnt_t refcnt; u_int16_t delay; + u_int8_t rt; }; /* @@ -852,7 +852,7 @@ struct pfsync_state { u_int8_t proto; u_int8_t direction; u_int8_t log; - u_int8_t pad0; + u_int8_t rt; u_int8_t timeout; u_int8_t sync_flags; u_int8_t updates; @@ -1798,8 +1798,8 @@ int pf_state_key_attach(struct pf_state_ int pf_translate(struct pf_pdesc *, struct pf_addr *, u_int16_t, struct pf_addr *, u_int16_t, u_int16_t, int); int pf_translate_af(struct pf_pdesc *); -void pf_route(struct pf_pdesc *, struct pf_rule *, struct pf_state *); -void pf_route6(struct pf_pdesc *, struct pf_rule *, struct pf_state *); +void pf_route(struct pf_pdesc *, struct pf_state *); +void pf_route6(struct pf_pdesc *, struct pf_state *); void pf_init_threshold(struct pf_threshold *, u_int32_t, u_int32_t); int pf_delay_pkt(struct mbuf *, u_int);