Nikolaos Milas:
> On 8/6/2022 5:44 ?.?., Wietse Venema wrote:
> > Possible causes (there may be more):
> >
> > - There is a problem with the network connection between mailgw1
> > and mailgw1 that causes some connections to have excessive retries.
> > This could be a data-dependent problem. About 20 years ago, someone
> > fixed a Postfix networking problem by replacing bad network hardware
> > (a different port on a switch).
> >
> > - There is a problem in the file system that causes delays in the
> > fsync() system call. Postfix will not reply that the message has
> > been "received" before that system call completes successfully. If
> > you are using a networked file system, see my previous point. fsync()
> > performance also depends on how a disk drive manages its cache.
> >
> > - vmail2 is using header_checks, body_checks, or smtpd_milters that
> > take an insane amount of time. These are by their nature data-dependent.
> >
> > Sorry, no time to examine your Postfix configuration.
>
> Thank you Wietse for this analysis.
>
> My big question is why this happens *ONLY* to particular messages, esp.
> those originating from wetransfer.com and sharepointonline.com, and it
> happens *consistently* to those.

Did I say data-dependent? I thought so.

> These messages (with delays) are a *very* small percentage of the total
> mail we receive.
>
> Moreover, we don't do any header / body etc. checks. All such
> checks

I could swear that vmail has header or body checks, along with
smtpd_milters.

> deliver mail to the final recipient.
>
> As I wrote to Victor, the only milter is the DKIM signing one, used for
> outgoing messages.

I agree that DKIM is unlikely to use up 255 seconds even if you had
turned on signature checks by mistake.

> How could we further investigate the issue, e.g. by more thoroughly
> (i.e. in high detail) logging Postfix activity on mails from those domains?

Better network monitoring. Something is taking 255 seconds to time out.

	Wietse
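
One low-cost way to see where the 255 seconds go, before reaching for packet captures, is to read the delays=a/b/c/d field that Postfix already logs on delivery lines (a = time before the queue manager, b = time in the queue manager, c = connection setup, d = message transmission). The following is only a rough sketch, not a tested tool: the function name, the 200-second threshold, and the sample log line are hypothetical, and the regular expression assumes the usual "to=<...>, ..., delay=..., delays=..." layout.

```python
import re

# Matches the fields Postfix logs on delivery lines, for example:
#   QUEUEID: to=<rcpt>, relay=..., delay=256, delays=0.2/0.01/0.1/255.7, ...
DELIVERY_RE = re.compile(
    r"(?P<qid>[0-9A-Za-z]+): to=<(?P<rcpt>[^>]*)>.*?"
    r"delay=(?P<delay>[\d.]+), delays=(?P<delays>[\d./]+)"
)

# Meaning of the four delays= components, in order.
STAGES = ("before_qmgr", "in_qmgr", "conn_setup", "transmission")


def slow_deliveries(lines, threshold=200.0):
    """Return (queue_id, recipient, total_delay, slowest_stage) for each
    delivery whose total delay is at least `threshold` seconds."""
    found = []
    for line in lines:
        m = DELIVERY_RE.search(line)
        if m is None:
            continue  # not a delivery line
        parts = [float(x) for x in m.group("delays").split("/")]
        if len(parts) != 4:
            continue  # unexpected format, skip rather than guess
        total = float(m.group("delay"))
        if total >= threshold:
            slowest = STAGES[parts.index(max(parts))]
            found.append((m.group("qid"), m.group("rcpt"), total, slowest))
    return found


# Hypothetical sample line, made up for illustration only.
SAMPLE = [
    "Jun  8 17:40:01 mailgw1 postfix/smtp[1234]: 4LJ8Xk1abcz: "
    "to=<user@example.org>, relay=vmail2.example.org[192.0.2.10]:25, "
    "delay=256, delays=0.2/0.01/0.1/255.7, dsn=2.0.0, status=sent",
]

if __name__ == "__main__":
    for row in slow_deliveries(SAMPLE):
        print(row)
```

If the slow component on mailgw1 turns out to be d (transmission), the sending side is waiting on the receiving side or on the network in between, which fits the suggestion to look at network monitoring first.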