Hello,
On Wed, Jul 05, 2023 at 03:33:36PM +0300, Kapetanakis Giannis wrote: > In any case we shouldn't get table statistics for tables that are down. > > log_debug are just added by for my debugging, but the > if (!rdr->table->up) continue; > should probably go in. > > G > > Index: pfe.c > =================================================================== > RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v > retrieving revision 1.90 > diff -u -p -r1.90 pfe.c > --- pfe.c 14 Sep 2020 11:30:25 -0000 1.90 > +++ pfe.c 5 Jul 2023 12:27:37 -0000 > @@ -790,6 +790,12 @@ pfe_statistics(int fd, short events, voi > getmonotime(&tv_now); > > TAILQ_FOREACH(rdr, env->sc_rdrs, entry) { > + if (!rdr->table->up) { > + log_debug("%s: table: %s is down. continuing", > __func__, rdr->conf.name); > + continue; > + } > + //bilias > + log_debug("%s: table: %s, up: %d id: %d", __func__, > rdr->conf.name, rdr->table->up, rdr->conf.table_id); > cnt = check_table(env, rdr, rdr->table); > if (rdr->conf.backup_id != EMPTY_TABLE) > cnt += check_table(env, rdr, rdr->backup); > > this change might be valid fix. however I'm afraid it will uncover new issues. in relayd. It just seems like a tip of iceberg. I think I've found explanation how table can disappear when host gets disabled. I hope someone who is more familiar with relayd can fill some details I'm still missing here. snippet here comes from disable_host() function in pfe.c: 602 if (host->up == HOST_UP) { 603 if ((table = table_find(env, host->conf.tableid)) == NULL) 604 fatalx("%s: invalid table id", __func__); 605 table->up--; 606 table->conf.flags |= F_CHANGED; 607 } 608 609 host->up = HOST_UNKNOWN; ... 624 if (!host->conf.parentid) { 625 /* Disable all children */ 626 SLIST_FOREACH(h, &host->children, child) 627 disable_host(c, id, h); 628 pfe_sync(); 629 } at line 605 table->up counter may get to 0 if host is the only/last host in table. if there is no parent, we are going disable children too calling to pfe_sync() at line 628. pfe_sync() function updates rdr rules in pf. The logic there is still kind of blurry to me. 687 TAILQ_FOREACH(rdr, env->sc_rdrs, entry) { 688 rdr->conf.flags &= ~(F_BACKUP); 689 rdr->conf.flags &= ~(F_DOWN); 690 691 if (rdr->conf.flags & F_DISABLE || 692 (rdr->table->up == 0 && rdr->backup->up == 0)) { 693 rdr->conf.flags |= F_DOWN; 694 active = NULL; ... 714 if (rdr->conf.flags & F_DOWN) { 715 if (rdr->conf.flags & F_ACTIVE_RULESET) { 716 flush_table(env, rdr); 717 log_debug("%s: disabling ruleset", __func__); 718 rdr->conf.flags &= ~(F_ACTIVE_RULESET); 719 id.id = rdr->conf.id; 720 imsg.hdr.type = IMSG_CTL_PULL_RULESET; 721 imsg.hdr.len = sizeof(id) + IMSG_HEADER_SIZE; 722 imsg.data = &id; 723 sync_ruleset(env, rdr, 0); 724 control_imsg_forward(env->sc_ps, &imsg); 725 } here we walk through the list of redirect rules. at line 693 we mark redirect to be down if there is no table and no back up. if redirect is down and has active rules we are going to flush those rules at line 723 by calling sync_ruleset(env, rdr, 0); the flush operation done by sync_ruleset() reads as follows: 332 void 333 sync_ruleset(struct relayd *env, struct rdr *rdr, int enable) ... 352 if (transaction_init(env, anchor) == -1) { 353 log_warn("%s: transaction init failed", __func__); 354 return; 355 } 356 357 if (!enable) { 358 if (transaction_commit(env) == -1) 359 log_warn("%s: remove rules transaction failed", 360 __func__); 361 else 362 log_debug("%s: rules removed", __func__); 363 return; 364 } 365 as you can see the rules are removed by committing empty ryleset. committing empty ruleset makes anchor empty, because commit operation removes also tables which are not persistent. One can emulate that by using pfctl on command line. we start by loading simple rdr rule which reference table bar: pf# cat bar.conf pass in on vio0 from 192.168.3.0/24 to 10.164.0.3 rdr-to <bar> pf# pfctl -a relayd/bar -f bar.conf the next step is to check if table bar got created. pf# pfctl -a relayd/bar -sT -vg ----r-- bar relayd/bar the 'r' flag means table is referenced only. Such table is not reported unless '-sT' option comes with '-vg'. Now let's add address to bar table and see what happens: pf# pfctl -a relayd/bar -t bar -T add 192.168.1.10 1/1 addresses added. pf# pfctl -a relayd/bar -sT -vg --a-r-- bar relayd/bar there is 'a' flag which indicates table became active. The active table is visible for '-sT' (show tables): pf# pfctl -a relayd/bar -sT bar now let's see what happens if we clear rules from relayd/bar anchor by committing empty ruleset: pf# echo '' | pfctl -a relayd/bar -f - pf# pfctl -a relayd/bar -sT -vg pfctl: Anchor does not exist both table and relayd/bar are gone. I thing this might be the mechanism which creates surprise for relayd on the other hand if I create the bar table first before the rule which references the table is loaded, then table is created with persistent flag. So the order in which load rules and tables does matter: pf# pfctl -a relayd/bar -sT -vg pfctl: Anchor does not exist pf# pfctl -a relayd/bar -t bar -T add 192.168.1.10 1 table created. 1/1 addresses added. pf# pfctl -a relayd/bar -sT -vg -pa---- bar relayd/bar pf# pfctl -a relayd/bar -f bar.conf pf# pfctl -a relayd/bar -sT -vg -pa-r-- bar relayd/bar pf# echo '' | pfctl -a relayd/bar -f - pf# pfctl -a relayd/bar -sT -vg -pa---- bar relayd/bar looks like order matters. If we load/create table before rules, then table gets its 'persistent' flag. If table is loaded after rules, then it loses its persistent flag and may get lost when rules are flushed. can you check table flags on your relay box? pfctl -a relayd/... -sT -vg just to see if my speculation is good enough or completely off. thanks and regards sashan