On 06/07/2023 12:59, Alexandr Nedvedicky wrote: > Hello, > > > On Wed, Jul 05, 2023 at 03:33:36PM +0300, Kapetanakis Giannis wrote: >> In any case we shouldn't get table statistics for tables that are down. >> >> log_debug are just added by for my debugging, but the >> if (!rdr->table->up) continue; >> should probably go in. >> >> G >> >> Index: pfe.c >> =================================================================== >> RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v >> retrieving revision 1.90 >> diff -u -p -r1.90 pfe.c >> --- pfe.c 14 Sep 2020 11:30:25 -0000 1.90 >> +++ pfe.c 5 Jul 2023 12:27:37 -0000 >> @@ -790,6 +790,12 @@ pfe_statistics(int fd, short events, voi >> getmonotime(&tv_now); >> >> TAILQ_FOREACH(rdr, env->sc_rdrs, entry) { >> + if (!rdr->table->up) { >> + log_debug("%s: table: %s is down. continuing", >> __func__, rdr->conf.name); >> + continue; >> + } >> + //bilias >> + log_debug("%s: table: %s, up: %d id: %d", __func__, >> rdr->conf.name, rdr->table->up, rdr->conf.table_id); >> cnt = check_table(env, rdr, rdr->table); >> if (rdr->conf.backup_id != EMPTY_TABLE) >> cnt += check_table(env, rdr, rdr->backup); >> >> > this change might be valid fix. however I'm afraid it will uncover new issues. > in relayd. It just seems like a tip of iceberg. > > I think I've found explanation how table can disappear when host gets > disabled. I hope someone who is more familiar with relayd can fill > some details I'm still missing here. > > snippet here comes from disable_host() function in pfe.c: > > 602 if (host->up == HOST_UP) { > 603 if ((table = table_find(env, host->conf.tableid)) == NULL) > 604 fatalx("%s: invalid table id", __func__); > 605 table->up--; > 606 table->conf.flags |= F_CHANGED; > 607 } > 608 > 609 host->up = HOST_UNKNOWN; > ... > 624 if (!host->conf.parentid) { > 625 /* Disable all children */ > 626 SLIST_FOREACH(h, &host->children, child) > 627 disable_host(c, id, h); > 628 pfe_sync(); > 629 } > > at line 605 table->up counter may get to 0 if host is the only/last > host in table. > > if there is no parent, we are going disable children too calling > to pfe_sync() at line 628. > > pfe_sync() function updates rdr rules in pf. The logic there is still > kind of blurry to me. > > 687 TAILQ_FOREACH(rdr, env->sc_rdrs, entry) { > 688 rdr->conf.flags &= ~(F_BACKUP); > 689 rdr->conf.flags &= ~(F_DOWN); > 690 > 691 if (rdr->conf.flags & F_DISABLE || > 692 (rdr->table->up == 0 && rdr->backup->up == 0)) { > 693 rdr->conf.flags |= F_DOWN; > 694 active = NULL; > ... > 714 if (rdr->conf.flags & F_DOWN) { > 715 if (rdr->conf.flags & F_ACTIVE_RULESET) { > 716 flush_table(env, rdr); > 717 log_debug("%s: disabling ruleset", > __func__); > 718 rdr->conf.flags &= ~(F_ACTIVE_RULESET); > 719 id.id = rdr->conf.id; > 720 imsg.hdr.type = IMSG_CTL_PULL_RULESET; > 721 imsg.hdr.len = sizeof(id) + > IMSG_HEADER_SIZE; > 722 imsg.data = &id; > 723 sync_ruleset(env, rdr, 0); > 724 control_imsg_forward(env->sc_ps, &imsg); > 725 } > > here we walk through the list of redirect rules. at line 693 we mark > redirect to be down if there is no table and no back up. > > if redirect is down and has active rules we are going to flush > those rules at line 723 by calling sync_ruleset(env, rdr, 0); > the flush operation done by sync_ruleset() reads as follows: > > 332 void > 333 sync_ruleset(struct relayd *env, struct rdr *rdr, int enable) > ... > 352 if (transaction_init(env, anchor) == -1) { > 353 log_warn("%s: transaction init failed", __func__); > 354 return; > 355 } > 356 > 357 if (!enable) { > 358 if (transaction_commit(env) == -1) > 359 log_warn("%s: remove rules transaction failed", > 360 __func__); > 361 else > 362 log_debug("%s: rules removed", __func__); > 363 return; > 364 } > 365 > > as you can see the rules are removed by committing empty ryleset. > committing empty ruleset makes anchor empty, because commit operation > removes also tables which are not persistent. One can emulate that by using > pfctl on command line. > we start by loading simple rdr rule which reference table bar: > > pf# cat bar.conf > pass in on vio0 from 192.168.3.0/24 to 10.164.0.3 rdr-to <bar> > pf# pfctl -a relayd/bar -f bar.conf > > the next step is to check if table bar got created. > > pf# pfctl -a relayd/bar -sT -vg > ----r-- bar relayd/bar > > the 'r' flag means table is referenced only. Such table is > not reported unless '-sT' option comes with '-vg'. Now let's > add address to bar table and see what happens: > > pf# pfctl -a relayd/bar -t bar -T add 192.168.1.10 > 1/1 addresses added. > pf# pfctl -a relayd/bar -sT -vg > --a-r-- bar relayd/bar > > there is 'a' flag which indicates table became active. The active > table is visible for '-sT' (show tables): > > pf# pfctl -a relayd/bar -sT > bar > > now let's see what happens if we clear rules from relayd/bar anchor > by committing empty ruleset: > > pf# echo '' | pfctl -a relayd/bar -f - > pf# pfctl -a relayd/bar -sT -vg > pfctl: Anchor does not exist > > both table and relayd/bar are gone. I thing this might be the mechanism > which creates surprise for relayd > > on the other hand if I create the bar table first before the > rule which references the table is loaded, then table is created > with persistent flag. So the order in which load rules and > tables does matter: > > pf# pfctl -a relayd/bar -sT -vg > pfctl: Anchor does not exist > > pf# pfctl -a relayd/bar -t bar -T add 192.168.1.10 > 1 table created. > 1/1 addresses added. > > pf# pfctl -a relayd/bar -sT -vg > -pa---- bar relayd/bar > > pf# pfctl -a relayd/bar -f bar.conf > pf# pfctl -a relayd/bar -sT -vg > -pa-r-- bar relayd/bar > > pf# echo '' | pfctl -a relayd/bar -f - > pf# pfctl -a relayd/bar -sT -vg > -pa---- bar relayd/bar > > looks like order matters. If we load/create table before rules, > then table gets its 'persistent' flag. If table is loaded after > rules, then it loses its persistent flag and may get lost when > rules are flushed. > > can you check table flags on your relay box? > pfctl -a relayd/... -sT -vg > just to see if my speculation is good enough or completely off. > > thanks and > regards > sashan
Hi sashan, Your logic seems fine to me. Indeed it acts like they are persistent. They don't have the p flag. # pfctl -a 'relayd/dir-imap' -sT -vg --a-r-- dir-imap relayd/dir-imap # pfctl -a 'relayd/dir-sieve' -sT -vg --a-r-- dir-sieve relayd/dir-sieve # pfctl -a 'relayd/dir-lmtp' -sT -vg --a-r-- dir-lmtp relayd/dir-lmtp note that in pfe_filter.c in init_tables() is see 73: tables[i].pfrt_flags |= PFR_TFLAG_PERSIST; Giannis ps. I've send a diff about statistics in tech@ It does more than just checking table->up status, but your approach is much better.