Re: Resource temporarily

2021-12-23 Thread natan
On 22.12.2021 at 21:01, Phil Stracchino wrote:
> On 12/22/21 12:55, Wietse Venema wrote:
>> In this case Postfix is (also) overloading the MySQL server.
>>
>> - Get a more powerful system (or VM) for the MySQL server.
>>
>> - Reduce the workload per MySQL server (spread the load across
>>    multiple servers).
>
>
>
>
> Perhaps first of all, make sure that mysqld is properly tuned.  90% of
> small MySQL/MariaDB deployment performance problems can be resolved
> simply by properly tuning it for the available resources.
>
> But if you're overloading a single MySQL instance, consider using a
> Galera cluster (either MySQL or MariaDB) behind ProxySQL or HAproxy.
> Read performance on a Galera cluster scales approximately linearly
> with the number of nodes, and nodes can be more-or-less transparently
> added and dropped on demand.
>
> (Also, this gives you transparent DB redundancy in the case that a
> node crashes or needs to be taken offline for maintenance.)
>
>
I had a Galera cluster with 3 nodes and HAProxy.

--



Re: Resource temporarily

2021-12-23 Thread natan
On 23.12.2021 at 01:53, raf wrote:
> On Wed, Dec 22, 2021 at 11:25:10AM +0100, natan  wrote:
>
>> On 21.12.2021 at 18:15, Wietse Venema wrote:
>> 10.x.x.10 is a Galera cluster with 3 nodes (and max_con set to 1500
>> on each node)
>>
>> When I get this error, I check the number of connections:
>>
>> smtpd : 125
>>
>> smtp  inet  n   -   -   -   1   postscreen
>> smtpd pass  -   -   -   -   -   smtpd
>>   -o receive_override_options=no_address_mappings
>>
>> and in total (amavis + lmtp-dovecot + smtpd -o
>> receive_override_options=no_address_mappings): 335
>> from: ps -e|grep smtpd |wc -l
>>
 but:
 for local lmtp, port 10025 - 5 connections
 for incoming from amavis, port 10027 - 132 connections
 smtpd - 60 connections
 ps -e|grep smtpd - 196 connections
>>> 1) You show two smtpd process counts. What we need are the
>>> internet-related smtpd processes counts.
>>>
>>> 2) Network traffic is not constant. What we need are process counts
>>> at the time that postscreen logs the warnings.
>>>
> 2) Your kernel cannot support the default_process_limit of 1200.
> In that case a higher default_process_limit would not help. Instead,
> kernel configuration or more memory (or both) would help.
 5486 ?Ss 6:05 /usr/lib/postfix/sbin/master
 cat /proc/5486/limits
>>> Those are PER-PROCESS resource limits. I just verified that postscreen
>>> does not run into the "Max open files" limit of 4096 as it tries
>>> to hand off a connection, because that would result in an EMFILE
>>> (Too many open files) kernel error code.
>>>
>>> Additionally there are SYSTEM-WIDE limits for how much the KERNEL
>>> can handle. These are worth looking at when you're trying to handle
>>> big traffic on a small (virtual) machine. 
>>>
>>> Wietse
>> How do I check?
> Googling "linux system wide resource limits" shows a
> lot of things including
> https://www.tecmint.com/increase-set-open-file-limits-in-linux/
> which mentions sysctl, /etc/sysctl.conf, ulimit, and
> /etc/security/limits.conf.
>
> Then I realised that the problem is with process limits,
> not open file limits, but the same methods apply.
>
> On my VM, the hard and soft process limits are 3681:
>
>   # ulimit -Hu
>   3681
>   # ulimit -Su
>   3681
>
> Perhaps yours is less than that.
>
> To change it permanently, add something like the
> following to /etc/security/limits.conf (or to a file in
> /etc/security/limits.d/):
>
>   * hard nproc 4096
>   * soft nproc 4096
>
> Note that this is assuming Linux, and assuming that your
> server will be OK with increasing the process limit. That
> might not be the case if it's a tiny VM being asked to
> do too much. Good luck.
>
> cheers,
> raf
>
Raf, I have:
# ulimit -Hu
257577
# ulimit -Su
257577

7343 ?    Rs    24:22 /usr/lib/postfix/sbin/master

# cat /proc/7343/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             257577               257577               processes
Max open files            4096                 4096                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       257577               257577               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

These are the real limits for /usr/lib/postfix/sbin/master.
--
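Wietse asked for the number of internet-facing smtpd processes at the moment postscreen logs its warnings, but `ps -e|grep smtpd|wc -l` lumps every smtpd service together (the postscreen pass-through service and the listener on port 10027 alike). A minimal sketch that splits the count by master.cf service name. It assumes that master(8) starts each child with a `-n <service>` argument that shows up in `ps -eo args=` output; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Count running smtpd processes per master.cf service name.
# Assumption: each child's command line carries "-n <service>", e.g.
# "smtpd -n smtp -t inet -u", as started by master(8).
# Reads `ps -eo args=` style lines on stdin; prints "<service> <count>".
count_smtpd_by_service() {
  awk '
    $1 == "smtpd" || $1 ~ /\/smtpd$/ {
      svc = "?"                                  # no -n argument found
      for (i = 2; i < NF; i++)
        if ($i == "-n") { svc = $(i + 1); break }
      n[svc]++
    }
    END { for (s in n) print s, n[s] }
  ' | sort
}
```

Usage, e.g. from a loop that also prints a timestamp whenever the postscreen warning appears: `ps -eo args= | count_smtpd_by_service`.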



Re: Resource temporarily

2021-12-23 Thread raf
On Thu, Dec 23, 2021 at 09:52:05AM +0100, natan  wrote:

> On 23.12.2021 at 01:53, raf wrote:
> > On Wed, Dec 22, 2021 at 11:25:10AM +0100, natan  wrote:
> >
> >> On 21.12.2021 at 18:15, Wietse Venema wrote:
> >> 10.x.x.10 is a Galera cluster with 3 nodes (and max_con set to 1500
> >> on each node)
> >>
> >> When I get this error, I check the number of connections:
> >>
> >> smtpd : 125
> >>
> >> smtp  inet  n   -   -   -   1   postscreen
> >> smtpd pass  -   -   -   -   -   smtpd
> >>   -o receive_override_options=no_address_mappings
> >>
> >> and in total (amavis + lmtp-dovecot + smtpd -o
> >> receive_override_options=no_address_mappings): 335
> >> from: ps -e|grep smtpd |wc -l
> >>
>  but:
>  for local lmtp, port 10025 - 5 connections
>  for incoming from amavis, port 10027 - 132 connections
>  smtpd - 60 connections
>  ps -e|grep smtpd - 196 connections
> >>> 1) You show two smtpd process counts. What we need are the
> >>> internet-related smtpd processes counts.
> >>>
> >>> 2) Network traffic is not constant. What we need are process counts
> >>> at the time that postscreen logs the warnings.
> >>>
> > 2) Your kernel cannot support the default_process_limit of 1200.
> > In that case a higher default_process_limit would not help. Instead,
> > kernel configuration or more memory (or both) would help.
>  5486 ?Ss 6:05 /usr/lib/postfix/sbin/master
>  cat /proc/5486/limits
> >>> Those are PER-PROCESS resource limits. I just verified that postscreen
> >>> does not run into the "Max open files" limit of 4096 as it tries
> >>> to hand off a connection, because that would result in an EMFILE
> >>> (Too many open files) kernel error code.
> >>>
> >>> Additionally there are SYSTEM-WIDE limits for how much the KERNEL
> >>> can handle. These are worth looking at when you're trying to handle
> >>> big traffic on a small (virtual) machine. 
> >>>
> >>>   Wietse
> >> How do I check?
> > Googling "linux system wide resource limits" shows a
> > lot of things including
> > https://www.tecmint.com/increase-set-open-file-limits-in-linux/
> > which mentions sysctl, /etc/sysctl.conf, ulimit, and
> > /etc/security/limits.conf.
> >
> > Then I realised that the problem is with process limits,
> > not open file limits, but the same methods apply.
> >
> > On my VM, the hard and soft process limits are 3681:
> >
> >   # ulimit -Hu
> >   3681
> >   # ulimit -Su
> >   3681
> >
> > Perhaps yours is less than that.
> >
> > To change it permanently, add something like the
> > following to /etc/security/limits.conf (or to a file in
> > /etc/security/limits.d/):
> >
> >   * hard nproc 4096
> >   * soft nproc 4096
> >
> > Note that this is assuming Linux, and assuming that your
> > server will be OK with increasing the process limit. That
> > might not be the case if it's a tiny VM being asked to
> > do too much. Good luck.
> >
> > cheers,
> > raf
> >
> Raf, I have:
> # ulimit -Hu
> 257577
> # ulimit -Su
> 257577
> 
> 7343 ?    Rs    24:22 /usr/lib/postfix/sbin/master
> 
> # cat /proc/7343/limits
> Limit                     Soft Limit           Hard Limit           Units
> Max cpu time              unlimited            unlimited            seconds
> Max file size             unlimited            unlimited            bytes
> Max data size             unlimited            unlimited            bytes
> Max stack size            8388608              unlimited            bytes
> Max core file size        0                    unlimited            bytes
> Max resident set          unlimited            unlimited            bytes
> Max processes             257577               257577               processes
> Max open files            4096                 4096                 files
> Max locked memory         65536                65536                bytes
> Max address space         unlimited            unlimited            bytes
> Max file locks            unlimited            unlimited            locks
> Max pending signals       257577               257577               signals
> Max msgqueue size         819200               819200               bytes
> Max nice priority         0                    0
> Max realtime priority     0                    0
> Max realtime timeout      unlimited            unlimited            us
> 
> These are the real limits for /usr/lib/postfix/sbin/master.
> --

That looks like it should be plenty of processes,
as long as the server can really support that many.

You could test it with something like this:

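#!/usr/bin/env perl
use warnings;
use strict;
my $max_nprocs = 8000;  # number of concurrent children to attempt
my $i = 0;
while ($i < $max_nprocs)
{
$i++;
my $pid = fork();
die "fork #$i failed: $!\n" unless defined $pid;
sleep(10), exit(0) if $pid == 0;  # child: hold a process slot for 10s, then exit
}
print "$i forks succeeded\n";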
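Wietse's SYSTEM-WIDE limits are separate from the per-process numbers that ulimit and /proc/PID/limits show. On Linux the kernel-wide ceilings live under /proc/sys. A read-only sketch, assuming Linux procfs: it prints pid_max, threads-max, file-max and file-nr (allocated, unused and maximum file handles), counts live tasks, and pulls the soft/hard "Max processes" pair out of a limits file. The function names and the optional ROOT argument are hypothetical; ROOT exists only so the code can be tried against a fake directory tree.

```shell
#!/usr/bin/env bash
# Read-only look at Linux kernel-wide limits (assumes Linux procfs).
# Optional ROOT argument is prefixed to /proc; leave it empty for the
# live system.

# Print whichever system-wide ceilings exist on this kernel.
show_kernel_limits() {
  local root=${1:-} f
  for f in kernel/pid_max kernel/threads-max fs/file-max fs/file-nr; do
    if [[ -r $root/proc/sys/$f ]]; then
      # file-nr holds three tab-separated numbers; squeeze them to spaces.
      printf '%-20s %s\n' "$f" "$(tr -s ' \t' ' ' < "$root/proc/sys/$f")"
    fi
  done
}

# Count tasks (threads) currently alive; compare with kernel/threads-max.
count_tasks() {
  local root=${1:-} n=0 d
  for d in "$root"/proc/[0-9]*/task/[0-9]*; do
    if [[ -e $d ]]; then n=$((n + 1)); fi
  done
  echo "$n"
}

# Soft and hard "Max processes" from a file in /proc/<pid>/limits format.
max_processes() {
  awk '/^Max processes/ { print $3, $4 }' "$1"
}
```

For example, `show_kernel_limits; count_tasks; max_processes /proc/7343/limits`, where 7343 is the master PID from natan's ps output.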

Re: Resource temporarily

2021-12-23 Thread natan
On 23.12.2021 at 12:12, raf wrote:
> On Thu, Dec 23, 2021 at 09:52:05AM +0100, natan  wrote:
>
>> On 23.12.2021 at 01:53, raf wrote:
>>> On Wed, Dec 22, 2021 at 11:25:10AM +0100, natan  wrote:
>>>
 On 21.12.2021 at 18:15, Wietse Venema wrote:
 10.x.x.10 is a Galera cluster with 3 nodes (and max_con set to 1500
 on each node)

 When I get this error, I check the number of connections:

 smtpd : 125

 smtp  inet  n   -   -   -   1   postscreen
 smtpd pass  -   -   -   -   -   smtpd
   -o receive_override_options=no_address_mappings

 and in total (amavis + lmtp-dovecot + smtpd -o
 receive_override_options=no_address_mappings): 335
 from: ps -e|grep smtpd |wc -l

>> but:
>> for local lmtp, port 10025 - 5 connections
>> for incoming from amavis, port 10027 - 132 connections
>> smtpd - 60 connections
>> ps -e|grep smtpd - 196 connections
> 1) You show two smtpd process counts. What we need are the
> internet-related smtpd processes counts.
>
> 2) Network traffic is not constant. What we need are process counts
> at the time that postscreen logs the warnings.
>
>>> 2) Your kernel cannot support the default_process_limit of 1200.
>>> In that case a higher default_process_limit would not help. Instead,
>>> kernel configuration or more memory (or both) would help.
>> 5486 ?Ss 6:05 /usr/lib/postfix/sbin/master
>> cat /proc/5486/limits
> Those are PER-PROCESS resource limits. I just verified that postscreen
> does not run into the "Max open files" limit of 4096 as it tries
> to hand off a connection, because that would result in an EMFILE
> (Too many open files) kernel error code.
>
> Additionally there are SYSTEM-WIDE limits for how much the KERNEL
> can handle. These are worth looking at when you're trying to handle
> big traffic on a small (virtual) machine. 
>
>   Wietse
 How do I check?
>>> Googling "linux system wide resource limits" shows a
>>> lot of things including
>>> https://www.tecmint.com/increase-set-open-file-limits-in-linux/
>>> which mentions sysctl, /etc/sysctl.conf, ulimit, and
>>> /etc/security/limits.conf.
>>>
>>> Then I realised that the problem is with process limits,
>>> not open file limits, but the same methods apply.
>>>
>>> On my VM, the hard and soft process limits are 3681:
>>>
>>>   # ulimit -Hu
>>>   3681
>>>   # ulimit -Su
>>>   3681
>>>
>>> Perhaps yours is less than that.
>>>
>>> To change it permanently, add something like the
>>> following to /etc/security/limits.conf (or to a file in
>>> /etc/security/limits.d/):
>>>
>>>   * hard nproc 4096
>>>   * soft nproc 4096
>>>
>>> Note that this is assuming Linux, and assuming that your
>>> server will be OK with increasing the process limit. That
>>> might not be the case if it's a tiny VM being asked to
>>> do too much. Good luck.
>>>
>>> cheers,
>>> raf
>>>
>> Raf, I have:
>> # ulimit -Hu
>> 257577
>> # ulimit -Su
>> 257577
>>
>> 7343 ?    Rs    24:22 /usr/lib/postfix/sbin/master
>>
>> # cat /proc/7343/limits
>> Limit                     Soft Limit           Hard Limit           Units
>> Max cpu time              unlimited            unlimited            seconds
>> Max file size             unlimited            unlimited            bytes
>> Max data size             unlimited            unlimited            bytes
>> Max stack size            8388608              unlimited            bytes
>> Max core file size        0                    unlimited            bytes
>> Max resident set          unlimited            unlimited            bytes
>> Max processes             257577               257577               processes
>> Max open files            4096                 4096                 files
>> Max locked memory         65536                65536                bytes
>> Max address space         unlimited            unlimited            bytes
>> Max file locks            unlimited            unlimited            locks
>> Max pending signals       257577               257577               signals
>> Max msgqueue size         819200               819200               bytes
>> Max nice priority         0                    0
>> Max realtime priority     0                    0
>> Max realtime timeout      unlimited            unlimited            us
>>
>> These are the real limits for /usr/lib/postfix/sbin/master.
>> --
> That looks like it should be plenty of processes,
> as long as the server can really support that many.
>
> You could test it with something like this:
>
>   #!/usr/bin/env perl
>   use warnings;
>   use strict;
>   my $max_nprocs = 8000;
>   my $i = 0;
>   while ($i < $max_nprocs)
>   {
>   $i++;
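>   my $pid = fork();
>   die "fork #$i failed: $!\n" unless defined $pid;
>   sleep(10), exit(0) if $pid == 0;
>   }
>   print "$i forks succeeded\n";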

Re: After network outage postfix found not running

2021-12-23 Thread Wietse Venema
Bob Proulx:
> Wietse Venema wrote:
> > Bob Proulx:
> > > Any ideas on why postfix would not be running after such an event on
> > > two of the systems but okay on the others?
> > 
> > LOGS. Postfix logs a sh*load, including processes that fail to
> > start. If the systems were unable to record this in LOGS, then you
> > will never know.
> 
> I guess we will never know then.  Because I showed the relevant logs.
> I would have showed more but the large message was rejected due to
> size.  But there wasn't anything more clueful than the logs I showed.

Postfix was only the messenger of bad news. It does not
spontaneously self-destruct.

Wietse
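Wietse's point is that master(8) normally leaves a trace in the log when a Postfix process dies, for example a warning that a child was killed by a signal, or a fatal/panic line from the process itself. A minimal sketch for pulling those lines out of a syslog-style file or a journalctl pipe, assuming the usual `postfix/<program>[pid]:` tag format (multi-instance setups may use tags like `postfix-out/master`, which also match). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print Postfix master(8) lines plus fatal/panic lines from any Postfix
# program. Reads the named syslog-style file, or stdin if none is given.
# Assumes tags like "postfix/master[811]:"; "postfix-out/..." also matches.
postfix_trouble() {
  grep -E 'postfix[^ ]*/master\[|postfix[^ ]*/[^ ]+\[[0-9]+\]: (fatal|panic):' \
    "${1:--}"
}
```

For example `postfix_trouble /var/log/mail.log`, or `journalctl --since yesterday | postfix_trouble` on a journald-only system.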


Re: [PATCH 2/3] Fix parallel build dependencies

2021-12-23 Thread Christian Göttsche
On Wed, 22 Dec 2021 at 22:21, Wietse Venema  wrote:
>
> Christian Göttsche:
> > Plugin shared util objects require the global util object to be built.
> >
> What was the make command?

/usr/bin/make -j2 LD_LIBRARY_PATH=$(pwd)/lib:${LD_LIBRARY_PATH}

see https://salsa.debian.org/cgzones/postfix-dev/-/jobs/2304623/raw
for a failed build log


Re: After network outage postfix found not running

2021-12-23 Thread Matus UHLAR - fantomas

Bob Proulx:
> Any ideas on why postfix would not be running after such an event on
> two of the systems but okay on the others?



Wietse Venema wrote:

LOGS. Postfix logs a sh*load, including processes that fail to
start. If the systems were unable to record this in LOGS, then you
will never know.


On 22.12.21 21:41, Bob Proulx wrote:

I guess we will never know then.  Because I showed the relevant logs.
I would have showed more but the large message was rejected due to
size.  But there wasn't anything more clueful than the logs I showed.

It's not terribly important.  It was just an oddity.  Because Postfix
is so very reliable that it was unusual to see on two systems it had
stopped.  But again it is very unusual to have the root file system
blocking for so long.


it's still possible that:
- postfix was killed by e.g. OOM killer, in which case it could not log
that.
- the logs were lost because of systemd's log limits

there are multiple lines of postfix/master.

it also could be systemd restarting postfix and giving up after some time

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
BSE = Mad Cow Disease ... BSA = Mad Software Producents Disease
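Both of Matus's possibilities leave traces outside Postfix's own log: the kernel logs an "Out of memory: Killed process" line when the OOM killer strikes (older kernels say "Kill process"), and systemd-journald logs a "Suppressed N messages from ..." line when its rate limiting drops messages. A sketch that looks for either, assuming those message texts (they may vary between versions); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Look for two causes of a silent death that Postfix itself cannot log:
# kernel OOM-killer kills ("Killed process" on newer kernels, "Kill process"
# on older ones) and journald rate limiting ("Suppressed N messages").
# Reads the named file, or stdin if none is given.
silent_death_clues() {
  grep -E 'Out of memory: Kill(ed)? process|Suppressed [0-9]+ messages from' \
    "${1:--}"
}
```

For example `journalctl -b | silent_death_clues`. If suppression lines show up, journald's rate limit is set by RateLimitIntervalSec= and RateLimitBurst= in journald.conf(5).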


Re: After network outage postfix found not running

2021-12-23 Thread Wietse Venema
Demi Marie Obenour:
> My intuition is that either some timeout somewhere got hit, or that
> some I/O failed (rather than being queued forever) and caused an error
> paging in some code.  That would cause Postfix to die with SIGBUS.

If the file system was unavailable, then yes, failure to page in 
some code would be fatal.

> Do you have Postfix set to automatically be restarted if it crashes?

I expect that the restart would fail for the same reason as you
describe above.

Wietse
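If Demi's theory holds, whatever observed master(8) dying would have seen it killed by SIGBUS rather than exiting normally. In a shell, a child killed by signal N is reported with exit status 128+N, so a wrapper can tell the two cases apart. A small sketch (the function name is hypothetical) that decodes such a status with the shell's own `kill -l`. It is a heuristic: a program may also exit with a status above 128 on purpose.

```shell
#!/usr/bin/env bash
# Describe a shell exit status. Values above 128 conventionally mean
# "killed by signal (status - 128)"; e.g. 135 = 128 + 7, and 7 is SIGBUS
# on x86 and ARM Linux (other architectures number it differently).
describe_status() {
  local st=$1
  if (( st > 128 )); then
    echo "killed by SIG$(kill -l $((st - 128)))"
  else
    echo "exited with status $st"
  fi
}
```

Usage: `some_command; describe_status $?`.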


Re: [PATCH 2/3] Fix parallel build dependencies

2021-12-23 Thread Wietse Venema
Christian Göttsche:
> On Wed, 22 Dec 2021 at 22:21, Wietse Venema  wrote:
> >
> > Christian Göttsche:
> > > Plugin shared util objects require the global util object to be built.
> > >
> > What was the make command?
> 
> /usr/bin/make -j2 LD_LIBRARY_PATH=$(pwd)/lib:${LD_LIBRARY_PATH}
> 
> see https://salsa.debian.org/cgzones/postfix-dev/-/jobs/2304623/raw
> for a failed build log

The bug is that you're linking Postfix database plugins with
libpostfix-util or libpostfix-global. That is not supported.

You have:

AUXLIBS_CDB="-lcdb -L../../lib -L. -lpostfix-util" \
AUXLIBS_LDAP="-lldap -llber -L../../lib -L. -lpostfix-util 
-lpostfix-global" \
AUXLIBS_LMDB="-llmdb -L../../lib -L. -lpostfix-util" \
AUXLIBS_MYSQL="-lmysqlclient -L../../lib -L. -lpostfix-util 
-lpostfix-global" \
AUXLIBS_PCRE="-lpcre -L../../lib -L. -lpostfix-util" \
AUXLIBS_PGSQL="-lpq -L../../lib -L. -lpostfix-util -lpostfix-global" \
AUXLIBS_SQLITE="-lsqlite3 -L../../lib -L. -lpostfix-util -lpostfix-global 
-lpthread" \

You should have:

AUXLIBS_CDB="-lcdb"
AUXLIBS_LDAP="-lldap -llber"
AUXLIBS_LMDB="-llmdb"
AUXLIBS_MYSQL="-lmysqlclient"
AUXLIBS_PCRE="-lpcre"
AUXLIBS_PGSQL="-lpq"
AUXLIBS_SQLITE="-lsqlite3"

Also the following is unnecessary:

make -j2 LD_LIBRARY_PATH=$(pwd)/lib:${LD_LIBRARY_PATH}

Instead, remove the LD_LIBRARY_PATH stuff and do this:

make -j2

I'll add a check to makedefs to fail the build with an UNSUPPORTED
error if it sees that database plugins are linked with libpostfix-*.

I'll also fix the makedefs check to reject LD_LIBRARY_PATH settings.

Wietse
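A sketch of the kind of pre-build guard Wietse describes, written as a shell function that rejects any AUXLIBS_* value mentioning -lpostfix-. This is a hypothetical illustration, not the actual makedefs check; the function name is made up, and variable names are passed in and read through bash indirect expansion.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not the real makedefs code: fail if any of the named
# AUXLIBS_* variables would link a database plugin with libpostfix-*.
check_auxlibs() {
  local var bad=0
  for var in "$@"; do
    case ${!var-} in          # indirect expansion: value of the named variable
      *-lpostfix-*)
        echo "UNSUPPORTED: $var links a database plugin with libpostfix-*" >&2
        bad=1 ;;
    esac
  done
  return "$bad"
}
```

For example, with the short AUXLIBS_* values above set in the shell: `check_auxlibs AUXLIBS_CDB AUXLIBS_LDAP AUXLIBS_MYSQL && make -j2`.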


Re: [PATCH 2/3] Fix parallel build dependencies

2021-12-23 Thread Christian Göttsche
On Thu, 23 Dec 2021 at 20:49, Wietse Venema  wrote:
>
> Christian Göttsche:
> > On Wed, 22 Dec 2021 at 22:21, Wietse Venema  wrote:
> > >
> > > Christian Göttsche:
> > > > Plugin shared util objects require the global util object to be built.
> > > >
> > > What was the make command?
> >
> > /usr/bin/make -j2 LD_LIBRARY_PATH=$(pwd)/lib:${LD_LIBRARY_PATH}
> >
> > see https://salsa.debian.org/cgzones/postfix-dev/-/jobs/2304623/raw
> > for a failed build log
>
> The bug is that you're linking Postfix database plugins with
> libpostfix-util or libpostfix-global. That is not supported.
>
> You have:
>
> AUXLIBS_CDB="-lcdb -L../../lib -L. -lpostfix-util" \
> AUXLIBS_LDAP="-lldap -llber -L../../lib -L. -lpostfix-util 
> -lpostfix-global" \
> AUXLIBS_LMDB="-llmdb -L../../lib -L. -lpostfix-util" \
> AUXLIBS_MYSQL="-lmysqlclient -L../../lib -L. -lpostfix-util 
> -lpostfix-global" \
> AUXLIBS_PCRE="-lpcre -L../../lib -L. -lpostfix-util" \
> AUXLIBS_PGSQL="-lpq -L../../lib -L. -lpostfix-util -lpostfix-global" \
> AUXLIBS_SQLITE="-lsqlite3 -L../../lib -L. -lpostfix-util -lpostfix-global 
> -lpthread" \
>
> You should have:
>
> AUXLIBS_CDB="-lcdb"
> AUXLIBS_LDAP="-lldap -llber"
> AUXLIBS_LMDB="-llmdb"
> AUXLIBS_MYSQL="-lmysqlclient"
> AUXLIBS_PCRE="-lpcre"
> AUXLIBS_PGSQL="-lpq"
> AUXLIBS_SQLITE="-lsqlite3"
>

Thanks, this works.

> Also the following is unnecessary:
>
> make -j2 LD_LIBRARY_PATH=$(pwd)/lib:${LD_LIBRARY_PATH}
>
> Instead, remove the LD_LIBRARY_PATH stuff and do this:
>
> make -j2
>

True, it seems not to be necessary.

> I'll add a check to makedefs to fail the build with an UNSUPPORTED
> error if it sees that database plugins are linked with libpostfix-*.
>
> I'll also fix the makedefs check to reject LD_LIBRARY_PATH settings.
>
> Wietse

Thanks, please disregard those two sent patches.


Re: message_size_limit documentation

2021-12-23 Thread Wietse Venema
Scott Kitterman:
> Currently, postconf.5 has this to say about message_size_limit:
>
> message_size_limit (default: 10240000)
>
> The maximal size in bytes of a message, including envelope information.
>
> Note: be careful when making changes. Excessively small values will result
> in the loss of non-delivery notifications, when a bounce message size exceeds
> the local or remote MTA's message size limit.
>
>
> It documents the default, but not the maximum.

The maximum is determined by (kernel) resource limits, file system sizes, 
and...

>  Apparently there is one (and
> who would care, one of Debian's users, apparently [1]).  I'm not particularly
> confused about why there would be a maximum, but it might be reasonable to
> document what it is.  Perhaps add something like "Maximum value is
> 2147483647." at the end of the note so that users don't have to find out the
> hard way:
>
> fatal: bad numerical configuration: message_size_limit = 2147483648

That is the LONG_MAX value for 32-bit machines. It's much bigger
for 64-bit systems.

I guess we could put that in the manpage. I have an old wishlist
item to migrate file sizes from long to off_t (which is 64 bits on
most systems).

But that is a lot of effort, and I was kind-of hoping that 32-bit
systems will go away.

Wietse

> Scott K
>
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=960272
>
>
>
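Because the ceiling is LONG_MAX, it depends on whether long is 32 or 64 bits wide, which `getconf LONG_BIT` reports on Linux and most Unix systems. A sketch for checking a value before setting it with `postconf -e`, assuming only those two widths. It compares the numbers as decimal strings so the 64-bit bound never overflows shell arithmetic. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Pre-check a message_size_limit (or mailbox_size_limit) value against
# LONG_MAX. Assumes long is either 32 or 64 bits wide.

# LONG_MAX for a given width of long in bits.
long_max() {
  case $1 in
    32) echo 2147483647 ;;
    64) echo 9223372036854775807 ;;
    *)  return 1 ;;
  esac
}

# Succeed if decimal $1 <= decimal $2. Both must be non-negative integers
# without leading zeros; comparing as strings avoids arithmetic overflow.
dec_le() {
  if (( ${#1} != ${#2} )); then
    (( ${#1} < ${#2} ))
    return
  fi
  [[ $1 == "$2" || $1 < "$2" ]]
}

# Succeed if VALUE fits in a long of BITS bits (default: this platform).
size_limit_ok() {
  local value=$1 bits=${2:-$(getconf LONG_BIT)} max
  [[ $value =~ ^(0|[1-9][0-9]*)$ ]] || return 2
  max=$(long_max "$bits") || return 2
  dec_le "$value" "$max"
}
```

Usage: `size_limit_ok 2147483648 || echo 'too large for this platform'`.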


Re: [PATCH 2/3] Fix parallel build dependencies

2021-12-23 Thread Wietse Venema
Christian Göttsche:
> > I'll add a check to makedefs to fail the build with an UNSUPPORTED
> > error if it sees that database plugins are linked with libpostfix-*.
> >
> > I'll also fix the makedefs check to reject LD_LIBRARY_PATH settings.
> 
> Thanks, please disregard those two sent patches.

No problem.  

I did take your \' fixes.  This text was written when my primary
platform was nroff based. I have verified that the backslash wasn't
needed there, too.

So we have taught each other that less can be better.

Wietse


Re: message_size_limit documentation

2021-12-23 Thread Scott Kitterman
On Thursday, December 23, 2021 3:51:57 PM EST Wietse Venema wrote:
> Scott Kitterman:
> > Currently, postconf.5 has this to say about message_size_limit:
> > 
> > message_size_limit (default: 10240000)
> > 
> > The maximal size in bytes of a message, including envelope
> > information.
> > 
> > Note: be careful when making changes. Excessively small values will
> > result in the loss of non-delivery notifications, when a bounce message
> > size exceeds the local or remote MTA's message size limit.
> > 
> > 
> > It documents the default, but not the maximum.
> 
> The maximum is determined by (kernel) resource limits, file system sizes,
> and...
> 
> > Apparently there is one (and
> > who would care, one of Debian's users, apparently [1]).  I'm not
> > particularly confused about why there would be a maximum, but it might be
> > reasonable to document what it is.  Perhaps add something like "Maximum
> > value is 2147483647." at the end of the note so that users don't have to
> > find out the hard way:
> > 
> > fatal: bad numerical configuration: message_size_limit = 2147483648
> 
> That is the LONG_MAX value for 32-bit machines. It's much bigger
> for 64-bit systems.
> 
> I guess we could put that in the manpage. I have an old wishlist
> item to migrate file sizes from long to off_t (which is 64 bits on
> most systems).
> 
> But that is a lot of effort, and I was kind-of hoping that 32-bit
> systems will go away.

Thanks.  I don't think it's worth a lot of effort.  I'd imagine it's a pretty 
niche use case to send multi-gigabyte files via SMTP.  People do do it though 
(clearly or there wouldn't be a bug).

I wrestled with a few options for a simple explanation, but didn't come up 
with anything I particularly liked.  I think it's correct that there's a hole 
in the documentation, but I don't have a good recommendation on how to fill it.

Scott K




Re: After network outage postfix found not running

2021-12-23 Thread Viktor Dukhovni
Could a watchdog timer have killed master(8) if it were suspended
long enough?

> On 23 Dec 2021, at 1:57 pm, Wietse Venema  wrote:
> 
>> My intuition is that either some timeout somewhere got hit, or that
>> some I/O failed (rather than being queued forever) and caused an error
>> paging in some code.  That would cause Postfix to die with SIGBUS.
> 
> If the file system was unavailable, then yes, failure to page in 
> some code would be fatal.

-- 
Viktor.



Re: message_size_limit documentation

2021-12-23 Thread Wietse Venema
Scott Kitterman:
> Thanks.  I don't think it's worth a lot of effort.  I'd imagine it's a pretty 
> niche use case to send multi-gigabyte files via SMTP.  People do do it though 
> (clearly or there wouldn't be a bug).
> 
> I wrestled with a few options for a simple explanation, but didn't come up 
> with anything I particularly liked.  I think it's correct that there's a hole 
> in the documentation, but I don't have a good recommendation on how to fill 
> it.

In Postfix 3.7 I have updated the text for message_size_limit.

message_size_limit (default: 10240000)
   The maximal size in bytes of a message, including envelope information.
   The value cannot exceed LONG_MAX (typically, a 32-bit or 64-bit signed
   integer).

Ditto for mailbox_size_limit.

Wietse


Re: After network outage postfix found not running

2021-12-23 Thread Bob Proulx
Matus UHLAR - fantomas wrote:
> it's still possible that:
> - postfix was killed by e.g. OOM killer, in which case it could not log that.

I disable the OOM with vm.overcommit_memory = 2 so that particular
thing won't be it.

> - the logs were lost because of systemd's log limits

That is possible.  The two failing systems were ones running systemd.
I am not a fan.  I am looking at rsyslog logging.

> there are multiple lines of postfix/master.
> 
> it also could be systemd restarting postfix and giving up after some time

I don't believe systemd will try to restart postfix.

Good ideas though.  Thank you for brainstorming along with me.

Bob
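Both of Bob's points can be checked rather than assumed. A sketch with hypothetical function names: one reads vm.overcommit_memory from procfs (mode 2 makes most oversized allocations fail up front instead of leaving it to the OOM killer), the other extracts the Restart= setting from `systemctl show -p Restart <unit>` output.

```shell
#!/usr/bin/env bash
# Report vm.overcommit_memory; optional ROOT prefix allows a fake procfs.
overcommit_mode() {
  local mode
  mode=$(cat "${1:-}/proc/sys/vm/overcommit_memory") || return 1
  case $mode in
    0) echo "0 (heuristic overcommit)" ;;
    1) echo "1 (always overcommit)" ;;
    2) echo "2 (strict accounting, no overcommit)" ;;
    *) echo "$mode (unknown)" ;;
  esac
}

# Print the Restart= value from `systemctl show -p Restart UNIT` on stdin.
restart_policy() {
  sed -n 's/^Restart=//p'
}
```

Usage: `overcommit_mode; systemctl show -p Restart postfix.service | restart_policy`. On Debian the unit that actually runs master(8) may be the instance postfix@-.service, so that one is worth checking too.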


Re: After network outage postfix found not running

2021-12-23 Thread Bob Proulx
Wietse Venema wrote:
> Postfix was only the messenger of bad news. It does not
> spontaneously self-destruct.

I have always found Postfix to be extremely reliable and robust.
Which was why this happening on two different systems was such an oddity.

Bob


Re: After network outage postfix found not running

2021-12-23 Thread Bob Proulx
Viktor Dukhovni wrote:
> Could a watchdog timer have killed master(8) if it were suspended
> long enough?

Seems plausible.  I could see something in the code timing out since
things would be blocked waiting for I/O for so long.

> Demi Marie Obenour:
> > My intuition is that either some timeout somewhere got hit, or that
> > some I/O failed (rather than being queued forever) and caused an error
> > paging in some code.  That would cause Postfix to die with SIGBUS.
> 
> If the file system was unavailable, then yes, failure to page in
> some code would be fatal.

This is a good brainstorm.  I wasn't thinking about the swap side of
memory.  It seems very plausible to me that a paged out block might
have been needed.  And that might have timed out and been reported as
an I/O failure.  Which would have killed the process.  Or possibly
the reverse.  The system may have tried to page out a block and the
writing of that block may have timed out as well.

> > Do you have Postfix set to automatically be restarted if it crashes?

No.  Postfix is very reliable and robust.  It has never been needed.

And I think I will resist the urge to add automated restarting of
postfix now too.  Because this was a very unusual situation.  I know
we always fight the last war.  I doubt this will be a repeating
problem.  But it would add a snag that another admin might
not be expecting.

Plus I have now learned that if the network is offline for any
significant time then all affected systems should be rebooted as a
precaution.  And a reboot is always okay.  Systems reboot just
fine.

Instead I think I will add a watchdog of some sort that would
automatically detect this type of network attached storage outage and
then automatically reboot the system if it detects that it is
recovering from such a state.  That's harder to do.  But it solves the
problem for the entire system globally.

> I expect that the restart would fail for the same reason as you
> describe above.

I would expect that it would block waiting for I/O and simply wait to
start.  It would stack up as another process that increases the load
average.  And then eventually when the disk request was serviced then
it would continue and start then.

Thank you everyone for brainstorming along with me.  It's a good
learning experience.  And I think I know I need a way to detect that
the network attached block storage has been offline too long and that
the system when recovered from that needs to be rebooted.

Bob
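Bob's storage watchdog could start from something like the sketch below. It is hypothetical throughout and untested against a real outage: it assumes that creating a file on the network-attached storage hangs while the storage is away, probes with a background `touch` so the watchdog itself does not block, and reboots only once the storage returns from an outage longer than a threshold. A probe stuck in uninterruptible I/O cannot be killed, so hung probes accumulate until the storage comes back.

```shell
#!/usr/bin/env bash
# Hypothetical storage watchdog sketch.

# Succeed if creating a file under DIR finishes within about SECS seconds.
# The touch runs in the background so a hung filesystem cannot block us;
# a probe stuck in uninterruptible I/O is simply left behind.
fs_responsive() {
  local dir=$1 secs=${2:-10} pid i
  touch "$dir/.fs-probe" &
  pid=$!
  for (( i = 0; ; i++ )); do
    if ! kill -0 "$pid" 2>/dev/null; then
      wait "$pid"            # probe finished: return its exit status
      return
    fi
    (( i >= secs )) && return 1   # still blocked after SECS seconds
    sleep 1
  done
}

# Reboot once the storage comes back after an outage of at least
# MAX_OUTAGE seconds. Shorter blips only reset the timer.
watch_storage() {
  local dir=$1 max_outage=${2:-300} down_since=
  while :; do
    if fs_responsive "$dir" 10; then
      if [[ -n $down_since ]] && (( SECONDS - down_since >= max_outage )); then
        logger -t storage-watchdog \
          "storage back after $(( SECONDS - down_since ))s outage; rebooting"
        systemctl reboot
      fi
      down_since=
    else
      : "${down_since:=$SECONDS}"   # remember when the outage started
    fi
    sleep 10
  done
}
```

For example `watch_storage /some/dir/on/the/nas 300` run as its own service; the directory and the 300-second threshold are placeholders.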


Re: After network outage postfix found not running

2021-12-23 Thread jdebert
On Thu, 23 Dec 2021 17:16:10 -0700
Bob Proulx  wrote:

> Wietse Venema wrote:
> > Postfix was only the messenger of bad news. It does not
> > spontaneously self-destruct.  
> 
> I have always found Postfix to be extremely reliable and robust.
> Which was why this happening on two different systems was such an
> oddity.
> 
> Bob

From my own observations on debian:

systemd's default config does not wait for the network before starting
postfix and will not retry. If it is actually set up to wait, then
systemd is ignoring that bit.

--
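Where jdebert's observation applies, a systemd drop-in can order Postfix after network-online.target. A sketch that writes one; the function name and the drop-in file name are hypothetical, and the target only makes anything wait if a wait-online service (for example systemd-networkd-wait-online or NetworkManager-wait-online) is enabled. Run `systemctl daemon-reload` afterwards.

```shell
#!/usr/bin/env bash
# Write a systemd drop-in that orders UNIT after network-online.target.
# BASE defaults to the standard /etc/systemd/system location.
write_network_dropin() {
  local unit=$1 base=${2:-/etc/systemd/system}
  local dir="$base/$unit.d"
  mkdir -p "$dir" || return 1
  cat > "$dir/wait-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}
```

For example `write_network_dropin postfix@-.service` on Debian, or `write_network_dropin postfix.service` where that is the unit that starts master(8).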


Re: Resource temporarily

2021-12-23 Thread raf
On Thu, Dec 23, 2021 at 12:34:20PM +0100, natan  wrote:

> On 23.12.2021 at 12:12, raf wrote:
> > That looks like it should be plenty of processes,
> > as long as the server can really support that many.
> >
> > You could test it with something like this:
> >
> > #!/usr/bin/env perl
> > use warnings;
> > use strict;
> > my $max_nprocs = 8000;
> > my $i = 0;
> > while ($i < $max_nprocs)
> > {
> > $i++;
> > my $pid = fork();
> > die "fork #$i failed: $!\n" unless defined $pid;
> > sleep(10), exit(0) if $pid == 0;
> > }
> > print "$i forks succeeded\n";
> >
> > For example, a VM here reports 7752 for ulimit -Su,
> > but the above script failed on the 3470th fork.
> >
> > cheers,
> > raf
> >
> On the machine with Postfix:
> 
> time ./1.py
> 12000 forks succeeded
> 
> real    0m1,365s
> user    0m0,088s
> sys    0m1,276s

That looks like it should be enough.
Sorry, I'm out of ideas.

cheers,
raf