> Da: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] Per conto di Victor Duchovni > Inviato: giovedì 25 settembre 2008 22.22 > A: postfix-users@postfix.org > Oggetto: Re: R: read timeout on cleanup socket on two > different machines > > > On Thu, Sep 25, 2008 at 04:16:25PM -0400, Wietse Venema wrote: > > > I would not be surprised if there was some network rate limiting > > feature that you inadvertantly turned on, and that limits the data > > rate to 1 byte/second under some conditions. > > This said, it is just one possibility, the truth will emerge from > tcpdump... I just wanted to offer an alternative scenario to kernel > bugs in the socket I/O subsystem and thus encourage the effort to > generate the tcpdumps so we can find out the real cause.
Sorry for replying so late. Obviusly since I was looking forward for the problem to happen again, it took soo long to show up again. At the end of the present message you can find the original problem description, since time passed by. In short I found at least that, as was suggested, the problem is due to something dramaticly slowing down the network connection, but only from-to the remote server involved in the problem. I'm writing this reply mainly for reference, since the problem turns out to be network related. I sniffed the traffic from/to the server involved, and found that when the problem happens, the connection is heavily slowed down. After the DATA command is issued, I see that almost exactly one packet is issued from the remote server every minute, each carrying exactly 100 bytes (including frame). Since the problem is happening on two distinct postfix machines, connected with two different providers; and it happens only when receiving a message from the same mailing list, only from time to time, only during night hours, I would suppose that there's something in the network path that heavily shapes the traffic. At least, I don't know anything on my two postfix servers that can cause this behaviour. Here is a partial capture of the tcp session, somewhere in the middle of the message body: 02:11:15.006532 IP (tos 0x0, ttl 46, id 47066, offset 0, flags [DF], proto: TCP (6), length: 86) squid-cache.org.54737 > 192.168.0.100.smtp: . 2334:2380(46) ack 134 win 65535 0x0000: 4500 0056 b7da 4000 2e06 a212 0ca0 2509 [EMAIL PROTECTED] 0x0010: c0a8 0064 d5d1 0019 32f0 330d b1de 244a ...d....2.3...$J 0x0020: 5010 ffff 468c 0000 6f72 2061 7420 7468 P...F...or.at.th 0x0030: 6520 7665 7279 206c 6561 7374 2c20 746f e.very.least,.to 0x0040: 2076 6572 6966 7920 6974 7320 7072 6573 .verify.its.pres 0x0050: 656e en 02:11:15.006582 IP (tos 0x0, ttl 64, id 43044, offset 0, flags [DF], proto: TCP (6), length: 52) 192.168.0.100.smtp > squid-cache.org.54737: ., cksum 0x1d47 (correct), 134:134(0) ack 2380 win 46 <nop,nop,timestamp 1599101060 4005670333> 0x0000: 4500 0034 a824 4000 4006 9fea c0a8 0064 [EMAIL PROTECTED]@......d 0x0010: 0ca0 2509 0019 d5d1 b1de 244a 32f0 333b ..%.......$J2.3; 0x0020: 8010 002e 1d47 0000 0101 080a 5f50 5884 .....G......_PX. 0x0030: eec1 adbd .... 02:12:17.735768 IP (tos 0x0, ttl 46, id 55626, offset 0, flags [DF], proto: TCP (6), length: 86) squid-cache.org.54737 > 192.168.0.100.smtp: . 2380:2426(46) ack 134 win 65535 0x0000: 4500 0056 d94a 4000 2e06 80a2 0ca0 2509 [EMAIL PROTECTED] 0x0010: c0a8 0064 d5d1 0019 32f0 333b b1de 244a ...d....2.3;..$J 0x0020: 5010 ffff 4328 0000 6e20 7468 6520 6375 P...C(..n.the.cu 0x0030: 7272 656e 7420 332e 302e 5354 4142 4c45 rrent.3.0.STABLE 0x0040: 3130 0d0a 7061 636b 6167 652e 0d0a 6874 10..package...ht 0x0050: 7470 tp 02:12:17.735831 IP (tos 0x0, ttl 64, id 43045, offset 0, flags [DF], proto: TCP (6), length: 52) 192.168.0.100.smtp > squid-cache.org.54737: ., cksum 0x2808 (correct), 134:134(0) ack 2426 win 46 <nop,nop,timestamp 1599163796 4005670333> 0x0000: 4500 0034 a825 4000 4006 9fe9 c0a8 0064 [EMAIL PROTECTED]@......d 0x0010: 0ca0 2509 0019 d5d1 b1de 244a 32f0 3369 ..%.......$J2.3i 0x0020: 8010 002e 2808 0000 0101 080a 5f51 4d94 ....(......._QM. 0x0030: eec1 adbd .... 02:13:18.694435 IP (tos 0x0, ttl 46, id 63615, offset 0, flags [DF], proto: TCP (6), length: 86) squid-cache.org.54737 > 192.168.0.100.smtp: . 2426:2472(46) ack 134 win 65535 0x0000: 4500 0056 f87f 4000 2e06 616d 0ca0 2509 [EMAIL PROTECTED] 0x0010: c0a8 0064 d5d1 0019 32f0 3369 b1de 244a ...d....2.3i..$J 0x0020: 5010 ffff 9feb 0000 7777 2e73 7175 6964 P.......ww.squid 0x0030: 2d63 6163 6865 2e6f 7267 2f56 6572 7369 -cache.org/Versi 0x0040: 6f6e 732f 7633 2f33 2e30 2f0d 0a0d 0a49 ons/v3/3.0/....I 0x0050: 6620 f. 02:13:18.694521 IP (tos 0x0, ttl 64, id 43046, offset 0, flags [DF], proto: TCP (6), length: 52) 192.168.0.100.smtp > squid-cache.org.54737: ., cksum 0x39b3 (correct), 134:134(0) ack 2472 win 46 <nop,nop,timestamp 1599224762 4005670333> 0x0000: 4500 0034 a826 4000 4006 9fe8 c0a8 0064 E..4.&@[EMAIL PROTECTED] 0x0010: 0ca0 2509 0019 d5d1 b1de 244a 32f0 3397 ..%.......$J2.3. 0x0020: 8010 002e 39b3 0000 0101 080a 5f52 3bba ....9......._R;. 0x0030: eec1 adbd .... 02:14:19.059935 IP (tos 0x0, ttl 46, id 3332, offset 0, flags [DF], proto: TCP (6), length: 86) squid-cache.org.54737 > 192.168.0.100.smtp: . 2472:2518(46) ack 134 win 65535 0x0000: 4500 0056 0d04 4000 2e06 4ce9 0ca0 2509 [EMAIL PROTECTED] 0x0010: c0a8 0064 d5d1 0019 32f0 3397 b1de 244a ...d....2.3...$J 0x0020: 5010 ffff 240f 0000 7469 6c6c 206f 6363 P...$...till.occ 0x0030: 7572 7320 636f 756c 6420 796f 7520 6164 urs.could.you.ad 0x0040: 6420 6173 206d 7563 6820 696e 666f 2061 d.as.much.info.a 0x0050: 626f bo 02:14:19.059993 IP (tos 0x0, ttl 64, id 43047, offset 0, flags [DF], proto: TCP (6), length: 52) 192.168.0.100.smtp > squid-cache.org.54737: ., cksum 0x4db0 (correct), 134:134(0) ack 2518 win 46 <nop,nop,timestamp 1599285134 4005670333> 0x0000: 4500 0034 a827 4000 4006 9fe7 c0a8 0064 E..4.'@[EMAIL PROTECTED] 0x0010: 0ca0 2509 0019 d5d1 b1de 244a 32f0 33c5 ..%.......$J2.3. 0x0020: 8010 002e 4db0 0000 0101 080a 5f53 278e ....M......._S'. 0x0030: eec1 adbd .... The original problem description: > I operate two very different postfix machines. One is heavy > loaded and with > a decent hardware, the other is my home machine. Both have > CentOS5 with > postfix-2.3.3, amavis, spamassassin and clamav. On both > machines there is a > mail account signed on the same mailing list (in particular, > the popular > Squid web proxy daemon mailing list). > >From time to time, one or both of these accounts exhibit the > same problem > while receiving a message from the mentioned mailing list. > A message is received saying (I paste a transcript of the > error I receive > from my home machine, but the problem on the other is the same): > > Return-Path: <[EMAIL PROTECTED]> > From: [EMAIL PROTECTED] (Mail Delivery System) > To: [EMAIL PROTECTED] (Postmaster) > Subject: Postfix SMTP server: errors from squid-cache.org[12.160.37.9] > > Transcript of session follows. > > Out: 220 barattolo.rinnanet.it ESMTP Postfix > In: HELO squid-cache.org > Out: 250 barattolo.rinnanet.it > In: MAIL > FROM:<[EMAIL PROTECTED]> > Out: 250 2.1.0 Ok > In: RCPT TO:<[EMAIL PROTECTED]> > Out: 250 2.1.5 Ok > In: DATA > Out: 354 End data with <CR><LF>.<CR><LF> > Out: 451 4.3.0 Error: queue file write error > > Session aborted, reason: lost connection > > > Having a look at the logs, I find: > Sep 24 06:51:13 barattolo postfix/smtpd[5832]: connect from > squid-cache.org[12.160.37.9] > ... > Sep 24 06:52:08 barattolo postfix/smtpd[5832]: NOQUEUE: > filter: RCPT from > squid-cache.org[12.160.37.9]: <squid-cache.org[12.160.37.9]>: > Client host > triggers FILTER smtp-amavis:[127.0.0.1]:10024; > from=<[EMAIL PROTECTED]> > to=<[EMAIL PROTECTED]> proto=SMTP helo=<squid-cache.org> > Sep 24 06:52:08 barattolo postfix/smtpd[5832]: 2928F10000E: > client=squid-cache.org[12.160.37.9] > ... > Sep 24 07:52:07 barattolo postfix/cleanup[5848]: warning: > 2928F10000E: read > timeout on cleanup socket > ... > Sep 24 08:01:48 barattolo postfix/smtpd[5832]: disconnect from > squid-cache.org[12.160.37.9] > > I'm tempted to think that this is a mailing list's manager > problem, and to > forget about it, but I would like to be sure that the fault > is not partly or > totally mine. > Any suggestions? Thanks Luigi