On Monday, June 9, 2025 5:19:53 PM UTC Mark Johnston wrote:
> On Mon, Jun 09, 2025 at 04:42:55AM +0000, Paul Vixie wrote:
> > ...
> I've made a number of inline comments.

tyvm! this is precisely the kind of review i'd hoped for. see inline.

> > -   so->so_fibnum = head->so_fibnum;
> > +   if ((so->so_fibnum = head->so_fibnum) == 0)
> > +           so->so_fibnum = fibnum;

an accept()'ed socket [which this is] inherits a fibnum from its listen() 
socket which inherits from its creating process. the former can be overridden 
with setsockopt(SO_FIB) and the latter by setfib(). here, we are making this 
inherited or crafted [whichever it was] fibnum a fallback, so that in the case 
of a TCP SYN we can preserve the interface's fibnum [in the mbuf header] as 
the accept() socket's fibnum. i anticipate further complexity in the future, 
such that there may someday be more than than one fallback priority. so:

> The double assignment looks odd to me.  It'd be neater to write:
> 
>     so->so_fibnum == head->so_fibnum != 0 ? head->so_fibnum : fibnum;

it would not be future proof. as written, the extra memory write will be 
coalesced in the cache's write buffer, so any performance loss would only 
occur on processors made before ~Y2K. if we're to neaten it, i'd say:

if (head->so_fibnum != 0)
        so->so_fibnum = head->so_fibnum;
else
        so->so_fibnum = fibnum;

...so as to support future more-levels fallbacking. i dislike the read before 
write at least as much as you dislike the write after write, because the "if" 
will emit some kind of conditional branch in the machine code [as would your 
proposed trinary conditional), which then subjects us to branch prediction and 
occasional pipeline stalls. write coalescence is simply faster on processors 
made after ~Y2K. but i'll adapt as shown, optimizing for clarity and future-
proofing. i'll also make the corresponding change to tcp_input.c.

> > -ifa_ifwithaddr(const struct sockaddr *addr)
> > +ifa_ifwithaddr_getfib(const struct sockaddr *addr, uint16_t *fibnum)
> > @@ -1834,17 +1834,23 @@ ifa_ifwithaddr(const struct sockaddr *addr)
> >  done:
> > +   if (fibnum != NULL && ifp != NULL)
> > +           *fibnum = ifp->if_fib;
> 
> Rather than complicating this function to add an additional return
> value, why not let the caller get the fib number directly?  They can
> just get ifa->ifa_ifp->if_fib.

i wasn't willing to retrieve it outside the NET_EPOCH lock, so i changed the 
call chain to pull it inside a NET_EPOCH_ASSERT() block.

> Similarly, I think adding this extra helper is unnecessary.  Callers
> which want the FIB number can use ifa_ifwithaddr() and extract it
> directly.  That's how in6_pcbbind_avail() handles fetching ifaddr flags,
> for instance.

i did not do a general audit to see if other potential race conditions already 
exist. if you're sure this is a norm we should be living with, i'll make the 
change you propose.

> > --- a/sys/netinet/in_pcb.c
> > +++ b/sys/netinet/in_pcb.c
> > @@ -646,7 +646,8 @@ in_pcballoc(struct socket *so, struct inpcbinfo
> > *pcbinfo)> 
> >     inp->inp_pcbinfo = pcbinfo;
> >     inp->inp_socket = so;
> >     inp->inp_cred = crhold(so->so_cred);
> > 
> > -   inp->inp_inc.inc_fibnum = so->so_fibnum;
> > +   /* The FIB number is in both inp_inc and inp_socket; synchronize. */
> > +   inp->inp_inc.inc_fibnum = inp->inp_socket->so_fibnum;
> 
> so == inp->inp_socket, so this part of the change is redundant.

nice. i'm loathe to reference "so" after its value has been copied. i realize 
that C doesn't have a borrow checker but i've been treating pointer copies as 
if they were destructive, and in many cases adding something like "so = NULL" 
after the copy to ensure that they are. i'll remove the comment but the code 
is correct.

i'd like to see the so_cred dereference also made from inp->inp_socket if that 
wouldn't break anything.

> > @@ -741,6 +743,8 @@ in_pcbbind(struct inpcb *inp, struct sockaddr_in *sin,
> > int flags,> 
> >     }
> >     if (anonport)
> >     
> >             inp->inp_flags |= INP_ANONPORT;
> > 
> > +   /* The FIB number is in both inp_inc and inp_socket; synchronize. */
> > +   sosetfib(inp->inp_socket, inp->inp_inc.inc_fibnum);
> 
> Why is there no such call in the IPv6 case?

an oversight -- thanks.

> >  static int
> >  in_pcbbind_avail(struct inpcb *inp, const struct in_addr laddr,
> > -               ifa_ifwithaddr_check((const struct sockaddr *)&sin) == 
0)
> > +               ifa_ifwithaddr_check_getfib((const struct sockaddr 
*)&sin,
> > +                                           fibnum) == 0)
> 
> This excludes the IN_MULTICAST() case--is that intentional?

i don't know. the multicast source address isn't checked against the interface 
table. if that's an oversight i'll fix that and subsequently retrieve the 
interface FIB. in the spirit of minimalism i only altered the case where an 
interface is known to possess the local address.

> > +                                    lookupflags, fibnum, cred);
> 
> Indentation on continuing lines should be by four spaces.

will fix.

thanks for your investment of time in this project.

-- 
Paul Vixie



Reply via email to