Mark <[EMAIL PROTECTED]> wrote:
>
> $pid = open2 (*READER, *WRITER, "/usr/local/sa/bin/spamc -f -d 127.0.0.1 -u test");
> print WRITER $text;
> close (WRITER);
> $body .= $_ while (<READER>);
> close (READER);
> 
> This works flawlessly on not all that large files; but when I tried it
> on a file over 1M, the whole process hangs at "print WRITER $text;". 

Yes, that is unfortunate behavior.

The problem you are seeing is that pipes have a fixed, limited size in
Unix.  That size varies from one Unix to another, but it is always
small, usually somewhere between 4K and 64K.

As long as the program reading from the other end of the pipe continues
to read, your side (writing) will never block for very long.  The data
will continue to "flow" through the pipe.

However, if the reader blocks for some reason, then after you've filled
the pipe up (with 4K or 8K or whatever is your pipe's data limit), you
will also block.
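
If you are curious how big the pipe is on your own system, here is a
rough sketch that measures it.  It fills a pipe that nobody reads,
using non-blocking writes so that a full pipe returns an error instead
of hanging.  The name pipe_capacity and the 512-byte chunk size are my
own choices for illustration, nothing standard.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);

# Return how many bytes fit into an unread pipe before a write would block.
# (pipe_capacity is a hypothetical helper name used only for this sketch.)
sub pipe_capacity {
    pipe(my $reader, my $writer) or die "pipe failed: $!";

    # Make the write end non-blocking, so a full pipe makes syswrite
    # return undef instead of hanging like the blocking print does.
    my $flags = fcntl($writer, F_GETFL, 0) or die "F_GETFL failed: $!";
    fcntl($writer, F_SETFL, $flags | O_NONBLOCK) or die "F_SETFL failed: $!";

    my $chunk = 'x' x 512;    # small chunks, so each write is all-or-nothing
    my $total = 0;
    while (1) {
        my $n = syswrite($writer, $chunk);
        last unless defined $n;    # the pipe is full
        $total += $n;
    }

    close $writer;
    close $reader;
    return $total;
}

printf "a pipe on this system holds %d bytes\n", pipe_capacity();
```

Whatever number it prints is the most your "print WRITER" can push
through before somebody has to start reading from the other end.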

The reason that this works fine for files up to 256K in size is that
spamc is very well-behaved for such input.  Spamc needs to see the
entire message before it can render a decision, so it constantly reads
and reads, and produces no output.  After reading all of that, it then
generates plenty of output, and because you have switched to reading
from your other pipe at that point, you are able to receive all this
data without trouble.

However, spamc will not process messages larger than 256K unless you
specify a flag (-s size).  The reasons, of course, are that scanning
such huge messages consumes a lot of resources and slows the scanning
process down, and that spam messages are generally not that large.

At any rate, once spamc realizes that the message you are feeding it is
too large, it begins feeding it back to you before it has read the
entire message (because it is not going to bother to feed it to spamd).
At that point spamc is both reading from its incoming pipe and writing
to its outgoing pipe.  But you are not reading from your end of that
outgoing pipe, so eventually it fills and spamc blocks on its write.
Once spamc is blocked it stops reading, so the pipe you are writing to
fills as well, and your print blocks.  Since you are not going to begin
reading until you finish writing, neither side can make progress, and a
classic deadlock results.

I suggest that you either don't send such large messages to spamc (which
would save a lot of time since the results will be nil anyway), or
switch to a different read/write programming model, where you use
"select" to read and write both of your pipes in tandem (which is a lot
of programming work for probably little gain).
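
For what it's worth, here is roughly what the "select" approach could
look like.  Treat it as a sketch rather than tested production code:
the name filter_through and the 64K chunk size are my own, and the
error handling is minimal.  It uses IO::Select from the standard
distribution and makes the write side non-blocking, so that neither
pipe can fill up while you are stuck on the other one.

```perl
use strict;
use warnings;
use IPC::Open2;
use IO::Handle;
use IO::Select;

# Feed $input to a command's stdin while draining its stdout at the same
# time.  (filter_through is a hypothetical name used for this sketch.)
sub filter_through {
    my ($input, @cmd) = @_;

    # If the child exits early, get an error from syswrite instead of
    # being killed by SIGPIPE.  Note the child inherits this setting.
    local $SIG{PIPE} = 'IGNORE';

    my $pid = open2(my $from_child, my $to_child, @cmd);
    $to_child->blocking(0);    # never hang inside syswrite

    my $readers = IO::Select->new($from_child);
    my $writers = IO::Select->new($to_child);
    my ($output, $offset) = ('', 0);

    if (length($input) == 0) {    # nothing to send: signal EOF right away
        close $to_child;
        $writers = undef;
    }

    while ($readers->count) {
        my ($can_read, $can_write) =
            IO::Select->select($readers, $writers, undef);
        unless ($can_read) {
            next if $!{EINTR};    # interrupted by a signal, just retry
            die "select failed: $!";
        }

        for my $fh (@$can_write) {
            my $n = syswrite($fh, $input, 65536, $offset);
            if (defined $n) {
                $offset += $n;
            }
            elsif (!$!{EAGAIN} && !$!{EWOULDBLOCK}) {
                $offset = length $input;    # child stopped reading; give up
            }
            if ($offset >= length $input) {
                close $fh;                  # all sent: child sees EOF
                $writers = undef;
            }
        }

        for my $fh (@$can_read) {
            my $n = sysread($fh, $output, 65536, length $output);
            die "read failed: $!" unless defined $n;
            if ($n == 0) {                  # EOF from the child
                $readers->remove($fh);
                close $fh;
            }
        }
    }

    close $to_child if $writers;    # child closed its output early
    waitpid($pid, 0);
    return $output;
}
```

With that in place, the original five lines would collapse to
something like $body = filter_through($text, "/usr/local/sa/bin/spamc",
"-f", "-d", "127.0.0.1", "-u", "test"), and it should no longer matter
when spamc decides to start writing.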

-- 
   [EMAIL PROTECTED] (Fuzzy Fox)     || "Good judgment comes from experience.
sometimes known as David DeSimone  ||  Experience comes from bad judgment."


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk