Re: relayd crashing some times

Kapetanakis Giannis Wed, 05 Jul 2023 05:34:52 -0700

Updates:

1) I've managed to make relayd exit on the backup firewall.

2) Adding -d -vvv didn't print any more log_debug() output related to our
fatal() on table stats.

It did print other log_debug() messages, but none from the pfe_shutdown() path.

It exited again, printing only:

pfe: check_table: cannot get table stats for dir2-lmtp@relayd/dir2-lmtp: No such file or directory

hce_notify_done: db1 (script ok)
hce_notify_done: db2 (script ok)
hce_notify_done: db3 (script ok)
pfe_statistics: table: ldap, up: 2 id: 1
pfe_statistics: table: mail, up: 2 id: 2
pfe_statistics: table: mx-smtps, up: 2 id: 3
pfe_statistics: table: mx-subm, up: 2 id: 4
pfe_statistics: table: dir-imap, up: 2 id: 5
pfe_statistics: table: dir-pop, up: 2 id: 6
pfe_statistics: table: dir-lmtp, up: 2 id: 7
pfe_statistics: table: dir-sieve, up: 2 id: 8
pfe_statistics: table: imap-smtp, up: 2 id: 9
pfe_statistics: table: sql, up: 3 id: 10
pfe_statistics: table: radius, up: 2 id: 11
pfe_statistics: table: radacct, up: 2 id: 12
pfe_statistics: table: dir2-imap, up: 0 id: 13
pfe_statistics: table: dir2-pop, up: 0 id: 14
pfe_statistics: table: dir2-lmtp, up: 0 id: 15
pfe: check_table: cannot get table stats for dir2-lmtp@relayd/dir2-lmtp: No such file or directory
hce exiting, pid 4773

In these logs, my debug line with the table name/up/id was emitted before the check for whether the table is up.

I made the change below. I believe it's needed, although it doesn't explain why the
table is missing.

In any case, we shouldn't get table statistics for tables that are down.

The log_debug() calls were added only for my debugging, but the
if (!rdr->table->up) continue;
check should probably go in.
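A minimal, self-contained sketch of what the guard changes, outside relayd. The struct table/struct rdr layouts, the in_pf flag, check_table_stub() and demo() are simplified, hypothetical stand-ins rather than relayd's real definitions; the stub fails with ENOENT for a table pf no longer has, mimicking the error in the log above. With the guard, a down table that is missing from pf is never looked up. An up table that went missing would still fail, which is why the guard doesn't explain the missing table.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical, simplified stand-ins for relayd's structures. */
struct table {
	bool	up;	/* at least one host in the table is up */
	bool	in_pf;	/* whether pf still has this table (sketch only) */
};

struct rdr {
	const char		*name;
	struct table		*table;
	TAILQ_ENTRY(rdr)	 entry;
};

TAILQ_HEAD(rdrlist, rdr);

/* Stub for check_table(): fails with ENOENT if the pf table is gone,
 * which is the point where relayd logs the error and calls fatal(). */
static int
check_table_stub(const struct rdr *rdr)
{
	if (!rdr->table->in_pf) {
		errno = ENOENT;
		return -1;
	}
	return 0;
}

/* Walk the redirections like pfe_statistics() does. Returns the number
 * of failed lookups; in relayd the first failure would be fatal(). */
static int
collect_stats(struct rdrlist *rdrs, bool skip_down)
{
	struct rdr	*rdr;
	int		 failures = 0;

	assert(rdrs != NULL);
	TAILQ_FOREACH(rdr, rdrs, entry) {
		if (skip_down && !rdr->table->up)
			continue;	/* the proposed guard */
		if (check_table_stub(rdr) == -1) {
			fprintf(stderr, "cannot get table stats for %s: %s\n",
			    rdr->name, strerror(errno));
			failures++;
		}
	}
	return failures;
}

/* Scenario from the log: "mail" is up, "dir2-lmtp" is down (up: 0)
 * and its pf table is missing. */
static int
demo(bool skip_down)
{
	struct table	 up_tbl = { .up = true, .in_pf = true };
	struct table	 down_tbl = { .up = false, .in_pf = false };
	struct rdr	 mail = { .name = "mail", .table = &up_tbl };
	struct rdr	 dir2 = { .name = "dir2-lmtp", .table = &down_tbl };
	struct rdrlist	 rdrs;

	TAILQ_INIT(&rdrs);
	TAILQ_INSERT_TAIL(&rdrs, &mail, entry);
	TAILQ_INSERT_TAIL(&rdrs, &dir2, entry);
	return collect_stats(&rdrs, skip_down);
}
```

Without the guard, demo(false) reports one failure for dir2-lmtp; with it, demo(true) skips that table and reports none.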

G

Index: pfe.c
===================================================================
RCS file: /cvs/src/usr.sbin/relayd/pfe.c,v
retrieving revision 1.90
diff -u -p -r1.90 pfe.c
--- pfe.c       14 Sep 2020 11:30:25 -0000      1.90
+++ pfe.c       5 Jul 2023 12:27:37 -0000
@@ -790,6 +790,12 @@ pfe_statistics(int fd, short events, voi
        getmonotime(&tv_now);
 
        TAILQ_FOREACH(rdr, env->sc_rdrs, entry) {
+               if (!rdr->table->up) {
+                       log_debug("%s: table: %s is down. continuing", __func__, rdr->conf.name);
+                       continue;
+               }
+               //bilias
+               log_debug("%s: table: %s, up: %d id: %d", __func__, rdr->conf.name, rdr->table->up, rdr->conf.table_id);
                cnt = check_table(env, rdr, rdr->table);
                if (rdr->conf.backup_id != EMPTY_TABLE)
                        cnt += check_table(env, rdr, rdr->backup);


On 05/07/2023 13:42, Alexandr Nedvedicky wrote:
> Hello,
>
> On Wed, Jul 05, 2023 at 11:36:26AM +0300, Kapetanakis Giannis wrote:
>> I tried to replicate the issue today by running relayd in debug mode in
>> order to print more details.
>>
>> /usr/sbin/relayd -d -v
>     I poked at the sources. Try to increase verbosity by using more 'v's:
>
>       /usr/sbin/relayd -d -vvv
>
>     A single '-v' does not seem to be enough to make log_debug() print
>     anything; at least '-vv' is required.
>
>     Please retry with at least '-vv'.
>
>> When relayd exited, it only printed:
>> pfe: check_table: cannot get table stats for dir-sieve@relayd/dir-sieve: No such file or directory
>>
>> Nothing from:
>> kill_tables():
>> log_debug("%s: deleted %d tables", __func__, cnt);
>>
>> or
>> flush_rulesets():
>> log_debug("%s: flushed rules", __func__);
>>
>> Are you sure the table deletion/removal is coming from there?
>     I used 'grep DIOC' on the relayd sources to see which pf ioctls
>     are used there. The only place where relayd calls
>     DIOCRCLRTABLES is the kill_tables() function. The only way to
>     get there is via
>       pfe_shutdown()
>           flush_rulesets()
>               kill_tables()
>     This is the only call stack I can think of when looking at the source code.
>
>     Also keep in mind that the log message is displayed only after all tables
>     are removed. So in theory, if the pfe_statistics() timer fires while the
>     tables are being flushed, it may find that a table just got deleted and
>     exit via fatal(). On the other hand, this sounds unlikely given that the
>     stats collection timer runs only once a minute.
>> In any case, it shouldn't try to get stats for empty tables.
>> Maybe a check should be added in pfe_statistics()?
>>
>> G
>>
> thanks and
> regards
> sashan
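For reference, the '-vv' threshold mentioned in the quoted reply can be sketched as a verbosity gate. This assumes relayd's log_debug() follows the log.c pattern shared by several OpenBSD daemons (print only when verbose > 1, with each -v incrementing verbose); log_debug_sketch(), count_v() and debug_prints_with() are hypothetical names, not relayd functions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Assumed: incremented once per 'v' on the command line. */
static int verbose;

/* Gate modeled on the log_debug() in several OpenBSD daemons' log.c:
 * debug messages need verbose > 1. Returns 1 if the message printed. */
static int
log_debug_sketch(const char *fmt, ...)
{
	va_list	 ap;

	if (verbose <= 1)
		return 0;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	return 1;
}

/* Count the 'v's in a flag such as "-vvv" (hypothetical helper). */
static int
count_v(const char *flag)
{
	int	 n = 0;

	assert(flag != NULL);
	for (const char *p = flag; *p != '\0'; p++)
		if (*p == 'v')
			n++;
	return n;
}

/* Returns 1 if a debug message like flush_rulesets()'s would print
 * when relayd is started with the given verbosity flag. */
static int
debug_prints_with(const char *flag)
{
	verbose = count_v(flag);
	return log_debug_sketch("%s: flushed rules", "flush_rulesets");
}
```

Under this assumption, debug_prints_with("-v") returns 0, while "-vv" and "-vvv" return 1 and print the message.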
