On Fri, Jun 30, 2023 at 11:57:06AM +0200, Alexandr Nedvedicky wrote:
> Hello,
> 
> I'm not familiar enough with relayd, so perhaps other folks
> here might provide better way to troubleshoot the issue.
> 
> On Fri, Jun 30, 2023 at 11:10:44AM +0300, Kapetanakis Giannis wrote:
> > Hello,
> >
> > This happened to me twice.
> > OpenBSD 7.3 with syspatches.
> >
> > I have a pair of carp/pfsync/pf/relayd firewall-load balancers with many 
> > redirects (only) on them.
> >
> > I wanted to do maintenance on some hosts below the load balancers.
> > After a while relayd crashed on Master firewall only.
> 
>     when you say crash: do you mean relayd was terminated by the
>     system because of a memory/stack/program violation?
>     if that is the case, is there any chance to collect a core file?
> 
>     or was it rather a voluntary exit, where relayd called its fatal() function?
> 
>     the 'No such file or directory' error, which is returned by the
>     DIOCRGETTSTATS ioctl(), comes from line 1746 in sys/net/pf_table.c:
> 
> 1731 int
> 1732 pfr_get_tstats(struct pfr_table *filter, struct pfr_tstats *tbl, int *size,
> 1733         int flags)
> 1734 {
> 1735         struct pfr_ktable       *p;
> 1736         struct pfr_ktableworkq   workq;
> 1737         int                      n, nn;
> 1738         time_t                   tzero = gettime();
> 1739
> 1740         /* XXX PFR_FLAG_CLSTATS disabled */
> 1741         ACCEPT_FLAGS(flags, PFR_FLAG_ALLRSETS);
> 1742         if (pfr_fix_anchor(filter->pfrt_anchor))
> 1743                 return (EINVAL);
> 1744         n = nn = pfr_table_count(filter, flags);
> 1745         if (n < 0)
> 1746                 return (ENOENT);
> 
> 
>     the pfr_table_count() function fails if and only if the desired
>     ruleset (anchor) does not exist.
> 
> 2177 int
> 2178 pfr_table_count(struct pfr_table *filter, int flags)
> 2179 {
> 2180         struct pf_ruleset *rs;
> 2181
> 2182         if (flags & PFR_FLAG_ALLRSETS)
> 2183                 return (pfr_ktable_cnt);
> 2184         if (filter->pfrt_anchor[0]) {
> 2185                 rs = pf_find_ruleset(filter->pfrt_anchor);
> 2186                 return ((rs != NULL) ? rs->tables : -1);
> 2187         }
> 2188         return (pf_main_ruleset.tables);
> 2189 }
> 
>     I wonder if it would help to adjust the fatal() call in relayd
>     so it also reports the table name and the anchor it is trying to find.
>     a diff which adjusts the call to fatal() is below.
> 
>     if you don't want to build the whole tree and prefer an in-place
>     build, you will need to adjust CFLAGS and LDFLAGS. Something
>     like this will be needed:
> 
>       cd /path/to/your/src/usr.sbin/relayd
>       export CFLAGS='-I/path/to/your/src/sys -I/path/to/your/src/lib/libutil'
>       export LDFLAGS='-L /path/to/your/src/lib/libutil'
>       make
> 
> 
> </snip>
> 
> >
> > same logs on Backup firewall so far, but after a minute or so:
> >
> > Jun 30 01:47:46 ll1 relayd[61766]: pfe: check_table: cannot get table stats: No such file or directory
>     this is where I'd like to see which table relayd is trying
>     to look up. Process 61766 then exits with `exit(1)`,
>     called from fatal().
> 
> > Jun 30 01:47:46 ll1 relayd[94434]: ca exiting, pid 94434
> > Jun 30 01:47:46 ll1 relayd[83189]: ca exiting, pid 83189
> > Jun 30 01:47:46 ll1 relayd[9023]: ca exiting, pid 9023
> > Jun 30 01:47:46 ll1 relayd[89820]: ca exiting, pid 89820
> > Jun 30 01:47:46 ll1 relayd[94676]: ca exiting, pid 94676
> > Jun 30 01:47:46 ll1 relayd[1820]: hce exiting, pid 1820
> > Jun 30 01:47:46 ll1 relayd[52103]: lost child: pid 61766 exited abnormally
>     the parent relayd process noticed that the child called exit(1)
>     because it could not find the table.
> 
>     once you are able to run the patched relayd, can you try to
>     reproduce the issue?
> 
>     it would also help if you collected some additional data:
> 
>       pfctl -vsA > anchors-before
>       # reproduce the issue, wait for relayd to exit/crash
>       pfctl -vsA > anchors-after
> 
>     that data, together with the output from the adjusted call
>     to fatal(), should help us better understand
>     what's going on.
> 
> thanks for your help
> regards
> sashan
> 
> --------8<---------------8<---------------8<------------------8<--------
> diff --git a/usr.sbin/relayd/pfe_filter.c b/usr.sbin/relayd/pfe_filter.c
> index 347048ece56..e1ae050b768 100644
> --- a/usr.sbin/relayd/pfe_filter.c
> +++ b/usr.sbin/relayd/pfe_filter.c
> @@ -632,7 +632,8 @@ check_table(struct relayd *env, struct rdr *rdr, struct table *table)
>               goto toolong;
>  
>       if (ioctl(env->sc_pf->dev, DIOCRGETTSTATS, &io) == -1)
> -             fatal("%s: cannot get table stats", __func__);
> +             fatal("%s: cannot get table stats for %s@%s", __func__,
> +                 io.pfrio_table.pfrt_name, io.pfrio_table.pfrt_anchor);
>  
>       return (tstats.pfrts_match);
>  

I agree printing this info is useful.
OK claudio@ to improve the error message.

-- 
:wq Claudio
