Hi,

I've been able to get the problem down to a minimum (I think). It has been quite arduous work... And I assure you that the machines that were "hanging" the connections were doing VERY STRANGE THINGS :S

The client seemed to be cutting the connection just after the DATA command (as the logs had revealed), without sending any lines (because the spooling-to-disk message was not appearing). Sending DATA and cutting the connection did not reproduce the hung connection. After a couple of hours trying to reproduce it, an idea struck me: maybe I should not send the complete DATA. So here is the script to hang the connection:

#!/usr/bin/perl
use Net::SMTP::TLS;

my $to = $ARGV[0] or die "Usage: $0 email";

# Connect and negotiate STARTTLS (NoTLS => 0 keeps TLS enabled)
my $mailer = Net::SMTP::TLS->new(
       'HOST_TO_HANG',
       Hello => 'localhost',
       NoTLS => 0
       );
$mailer->mail('');
$mailer->to('[EMAIL PROTECTED]');
# Write the bare DATA command, with no terminator, then drop the connection
$mailer->{sock}->write('DATA');
$mailer->{sock}->close();
print "do a top on your SMTP server\n";

Note that I write DATA without sending the terminator. (It's a shame Test::SMTP does not support STARTTLS... I'm working on it...).

After that I decided to see which plugin was doing the harm... If you set NoTLS => 1 in the test script, the process does not hang. So it seems to be tls... But is it only tls? I deactivated all the other plugins and ran the test: it doesn't hang. So it must be an interaction between plugins...

Finally the set of plugins got reduced to tls + custom_plugin. custom_plugin is a plugin that we use to do POP-before-SMTP authentication. It connects to a MySQL db to see if the connecting IP is in the relay list... I've shaved the plugin down to the bare minimum needed to get the process to hang... It just does a DBI->connect.

The config used is:
tls /.../xxx.pem /.../xxx.pem /.../rootca.crt
dbi_connector
relay_all
rcpt_ok

relay_all just does
sub hook_rcpt {
  return (OK);
}

rcpt_ok is the standard one

dbi_connector is:
#!perl -w

sub hook_rcpt {
  require DBI or die "Can't load DBI";
  my ($self, $transaction, $recipient, %param) = @_;

  my $dbh = DBI->connect('DBI:mysql:database=xxx;host=localhost;port=3306', 'xxx', 'xxx')
    or $self->log(LOGDEBUG, 'Could not connect ' . DBI->errstr());

  return (DECLINED);
}

It doesn't matter whether DBI connects successfully or not. Just run the test script and... you get a qp child doing lots of writes per second to a broken pipe...
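To show what a "write to a broken pipe" that doesn't kill the process looks like at the Perl level, here is a minimal sketch. It assumes that SIGPIPE ends up ignored in the child, which is only a guess on my part and not something I have confirmed in qp, the tls plugin or the MySQL driver. write_to_broken_pipe is a hypothetical helper for illustration, not part of qpsmtpd. With SIGPIPE ignored, the write fails with EPIPE and the process lives on, so any loop that keeps retrying the write would spin forever.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Errno qw(EPIPE);

# Write to a pipe whose read end is already closed, with SIGPIPE ignored.
# Returns the errno of the failed write (0 if the write unexpectedly succeeded).
sub write_to_broken_pipe {
    # ASSUMPTION: something in the child has set SIGPIPE to be ignored
    local $SIG{PIPE} = 'IGNORE';

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    close $reader;    # the peer goes away, like the client closing its socket

    my $written = syswrite($writer, "data\n");
    my $errno   = defined $written ? 0 : $! + 0;

    close $writer;
    return $errno;
}

my $errno = write_to_broken_pipe();
print $errno == EPIPE
    ? "write failed with EPIPE, process still alive\n"
    : "unexpected result: errno $errno\n";
```

Without the `local $SIG{PIPE} = 'IGNORE'` line, the default SIGPIPE action would terminate the process on the first write instead.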

I've tried with sqlite3, not connecting to a db, and it doesn't happen... It looks like it could have to do with DBD::mysql and how it cleans up?

We are using QP version 0.40 on Debian Etch. Perl modules are standard Etch ones.

I've been trying to profile the code, to see the call path... but I'm having problems... Any pointers on how to debug or to profile the code on a spawned qp child?

Can somebody try to reproduce this? Are there any pointers on using DBI in your plugins (maybe the first one is: DON'T :p)?

Isn't it strange that it's only reproducible with a non-terminated DATA?

Any thoughts? Any ideas?

Thanks in advance,

Jose Luis Martinez
CAPSiDE
[EMAIL PROTECTED]

