Hello,

I took a closer look at the relayd logs you obtained and tried to locate
the origin of those messages in the source code. The exact mechanism is
still a mystery to me.

I'll start with a brief summary:

> Jun 30 01:47:46 ll1 relayd[61766]: pfe: check_table: cannot get table stats: 
> No such file or directory
    The line above is what the pfe process reports right before it exits:


        if (ioctl(env->sc_pf->dev, DIOCRGETTSTATS, &io) == -1)
                fatal("%s: cannot get table stats for %s@%s", __func__,
                    io.pfrio_table.pfrt_name, io.pfrio_table.pfrt_anchor);


    The snippet above comes from the pfe_filter.c:check_table() function,
    which is called from pfe.c:pfe_statistics(). The pfe_statistics()
    function is called periodically from a timer. I've noticed
    there is a function pfe_disable_events() which should disable
    the timer (stop the calls to pfe_statistics()). However,
    pfe_disable_events() is unused; we never call it.

> Jun 30 01:45:59 ll1 relayd[52103]: incremented the demote state of group 
> '0relay'
    The line above comes from carp_demote_ioctl():

        else
                log_info("%s the demote state of group '%s'",
                    (demote > 0) ? "incremented" : "decremented", group);

    carp_demote_ioctl() is called by carp_demote_shutdown(), which
    itself is called from parent_shutdown():

void
parent_shutdown(struct relayd *env)
{
        config_purge(env, CONFIG_ALL);

        proc_kill(env->sc_ps);
        control_cleanup(&env->sc_ps->ps_csock);
        carp_demote_shutdown();

        free(env->sc_ps);
        free(env);

        log_info("parent terminating, pid %d", getpid());

        exit(0);
}

    The relayd parent process is going to exit peacefully anyway. The parent
    exit is confirmed by these lines in the log:
> Jun 30 01:47:46 ll1 relayd[52103]: decremented the demote state of group 
> '0relay'
> Jun 30 01:47:46 ll1 relayd[52103]: parent terminating, pid 52103

    The only way we could arrive at the parent_shutdown() function
    is after receiving IMSG_CTL_SHUTDOWN, which is sent on behalf
    of the command 'relayctl stop'. I have no idea where that command
    was called from.

    Anyway, I suspect we must disable the periodic event in the `pfe`
    process to avoid an unexpected exit via a call to fatal().

Can you give the diff below a try?

thanks and
regards
sashan

--------8<---------------8<---------------8<------------------8<--------
diff --git a/usr.sbin/relayd/pfe.c b/usr.sbin/relayd/pfe.c
index 3a97b749c4b..ad9c9cdc0cc 100644
--- a/usr.sbin/relayd/pfe.c
+++ b/usr.sbin/relayd/pfe.c
@@ -93,6 +93,7 @@ pfe_init(struct privsep *ps, struct privsep_proc *p, void *arg)
 void
 pfe_shutdown(void)
 {
+       pfe_disable_events();
        flush_rulesets(env);
        config_purge(env, CONFIG_ALL);
 }
