Hello,

I will try your diff, but since I have to turn off the mail service completely, it might take a while.
Meanwhile, here is a wild guess from my side, although I'm not a developer: it seems to me that a table is being removed, specifically the table that holds the hosts for the redirect. It looks as if, once some active sessions expire (after a 1-2 minute delay), the table is removed, as though it were not persistent. Why was the table removed in the first place? Maybe because there was no active host left in it (the table was empty). Then the periodic statistics code queries that table, and relayd exits because the table is no longer there. If that's the case, then it should indeed not query statistics for the disabled table.

regards,
Giannis

ps. Without the new diff, I cannot reproduce the problem if the load balancer has no active sessions on the redirect when I disable it. It also does not happen on the backup load balancer.

On 03/07/2023 19:18, Alexandr Nedvedicky wrote:
> Hello,
>
> I took a closer look at the relayd logs you obtained and tried to locate
> the origin of those messages in the source code. The exact mechanism is
> still a mystery to me.
>
> I'll start with a brief summary:
>
>> Jun 30 01:47:46 ll1 relayd[61766]: pfe: check_table: cannot get table stats: No such file or directory
> The line above is what the pfe process reports right before it exits:
>
> 633
> 634		if (ioctl(env->sc_pf->dev, DIOCRGETTSTATS, &io) == -1)
> 635			fatal("%s: cannot get table stats for %s@%s", __func__,
> 636			    io.pfrio_table.pfrt_name, io.pfrio_table.pfrt_anchor);
> 637
>
> The snippet above comes from the pfe_filter.c:check_table() function,
> which is called from pfe.c:pfe_statistics(). The pfe_statistics()
> function is called periodically from a timer. I've noticed there is
> a function pfe_disable_events() which should disable the timer (stop
> calling pfe_statistics()). However, pfe_disable_events() is unused;
> we never call it.
>
>> Jun 30 01:45:59 ll1 relayd[52103]: incremented the demote state of group '0relay'
> The line above comes from carp_demote_ioctl() here:
>
> 214		else
> 215			log_info("%s the demote state of group '%s'",
> 216			    (demote > 0) ?
>			    "incremented" : "decremented", group);
>
> carp_demote_ioctl() is called from carp_demote_shutdown(), which
> itself is called from parent_shutdown():
>
> 373	void
> 374	parent_shutdown(struct relayd *env)
> 375	{
> 376		config_purge(env, CONFIG_ALL);
> 377
> 378		proc_kill(env->sc_ps);
> 379		control_cleanup(&env->sc_ps->ps_csock);
> 380		carp_demote_shutdown();
> 381
> 382		free(env->sc_ps);
> 383		free(env);
> 384
> 385		log_info("parent terminating, pid %d", getpid());
> 386
> 387		exit(0);
> 388	}
>
> The relayd parent process is going to exit peacefully anyway. The parent
> exit is confirmed by these lines in the log:
>> Jun 30 01:47:46 ll1 relayd[52103]: decremented the demote state of group '0relay'
>> Jun 30 01:47:46 ll1 relayd[52103]: parent terminating, pid 52103
> The only way we could arrive at the parent_shutdown() function is
> after receiving IMSG_CTL_SHUTDOWN, which is sent on behalf of the
> command 'relayctl stop'; I have no idea where that got called from.
>
> Anyway, I suspect we must disable the periodic event in the `pfe`
> process to avoid an unexpected exit via the call to fatal().
>
> Can you give the diff below a try?
>
> thanks and
> regards
> sashan
>
> --------8<---------------8<---------------8<------------------8<--------
> diff --git a/usr.sbin/relayd/pfe.c b/usr.sbin/relayd/pfe.c
> index 3a97b749c4b..ad9c9cdc0cc 100644
> --- a/usr.sbin/relayd/pfe.c
> +++ b/usr.sbin/relayd/pfe.c
> @@ -93,6 +93,7 @@ pfe_init(struct privsep *ps, struct privsep_proc *p, void *arg)
>  void
>  pfe_shutdown(void)
>  {
> +	pfe_disable_events();
>  	flush_rulesets(env);
>  	config_purge(env, CONFIG_ALL);
>  }