On Monday, June 9, 2025 5:19:53 PM UTC Mark Johnston wrote: > On Mon, Jun 09, 2025 at 04:42:55AM +0000, Paul Vixie wrote: > > ... > I've made a number of inline comments.
tyvm! this is precisely the kind of review i'd hoped for. see inline. > > - so->so_fibnum = head->so_fibnum; > > + if ((so->so_fibnum = head->so_fibnum) == 0) > > + so->so_fibnum = fibnum; an accept()'ed socket [which this is] inherits a fibnum from its listen() socket which inherits from its creating process. the former can be overridden with setsockopt(SO_FIB) and the latter by setfib(). here, we are making this inherited or crafted [whichever it was] fibnum a fallback, so that in the case of a TCP SYN we can preserve the interface's fibnum [in the mbuf header] as the accept() socket's fibnum. i anticipate further complexity in the future, such that there may someday be more than than one fallback priority. so: > The double assignment looks odd to me. It'd be neater to write: > > so->so_fibnum == head->so_fibnum != 0 ? head->so_fibnum : fibnum; it would not be future proof. as written, the extra memory write will be coalesced in the cache's write buffer, so any performance loss would only occur on processors made before ~Y2K. if we're to neaten it, i'd say: if (head->so_fibnum != 0) so->so_fibnum = head->so_fibnum; else so->so_fibnum = fibnum; ...so as to support future more-levels fallbacking. i dislike the read before write at least as much as you dislike the write after write, because the "if" will emit some kind of conditional branch in the machine code [as would your proposed trinary conditional), which then subjects us to branch prediction and occasional pipeline stalls. write coalescence is simply faster on processors made after ~Y2K. but i'll adapt as shown, optimizing for clarity and future- proofing. i'll also make the corresponding change to tcp_input.c. > > -ifa_ifwithaddr(const struct sockaddr *addr) > > +ifa_ifwithaddr_getfib(const struct sockaddr *addr, uint16_t *fibnum) > > @@ -1834,17 +1834,23 @@ ifa_ifwithaddr(const struct sockaddr *addr) > > done: > > + if (fibnum != NULL && ifp != NULL) > > + *fibnum = ifp->if_fib; > > Rather than complicating this function to add an additional return > value, why not let the caller get the fib number directly? They can > just get ifa->ifa_ifp->if_fib. i wasn't willing to retrieve it outside the NET_EPOCH lock, so i changed the call chain to pull it inside a NET_EPOCH_ASSERT() block. > Similarly, I think adding this extra helper is unnecessary. Callers > which want the FIB number can use ifa_ifwithaddr() and extract it > directly. That's how in6_pcbbind_avail() handles fetching ifaddr flags, > for instance. i did not do a general audit to see if other potential race conditions already exist. if you're sure this is a norm we should be living with, i'll make the change you propose. > > --- a/sys/netinet/in_pcb.c > > +++ b/sys/netinet/in_pcb.c > > @@ -646,7 +646,8 @@ in_pcballoc(struct socket *so, struct inpcbinfo > > *pcbinfo)> > > inp->inp_pcbinfo = pcbinfo; > > inp->inp_socket = so; > > inp->inp_cred = crhold(so->so_cred); > > > > - inp->inp_inc.inc_fibnum = so->so_fibnum; > > + /* The FIB number is in both inp_inc and inp_socket; synchronize. */ > > + inp->inp_inc.inc_fibnum = inp->inp_socket->so_fibnum; > > so == inp->inp_socket, so this part of the change is redundant. nice. i'm loathe to reference "so" after its value has been copied. i realize that C doesn't have a borrow checker but i've been treating pointer copies as if they were destructive, and in many cases adding something like "so = NULL" after the copy to ensure that they are. i'll remove the comment but the code is correct. i'd like to see the so_cred dereference also made from inp->inp_socket if that wouldn't break anything. > > @@ -741,6 +743,8 @@ in_pcbbind(struct inpcb *inp, struct sockaddr_in *sin, > > int flags,> > > } > > if (anonport) > > > > inp->inp_flags |= INP_ANONPORT; > > > > + /* The FIB number is in both inp_inc and inp_socket; synchronize. */ > > + sosetfib(inp->inp_socket, inp->inp_inc.inc_fibnum); > > Why is there no such call in the IPv6 case? an oversight -- thanks. > > static int > > in_pcbbind_avail(struct inpcb *inp, const struct in_addr laddr, > > - ifa_ifwithaddr_check((const struct sockaddr *)&sin) == 0) > > + ifa_ifwithaddr_check_getfib((const struct sockaddr *)&sin, > > + fibnum) == 0) > > This excludes the IN_MULTICAST() case--is that intentional? i don't know. the multicast source address isn't checked against the interface table. if that's an oversight i'll fix that and subsequently retrieve the interface FIB. in the spirit of minimalism i only altered the case where an interface is known to possess the local address. > > + lookupflags, fibnum, cred); > > Indentation on continuing lines should be by four spaces. will fix. thanks for your investment of time in this project. -- Paul Vixie