Hi,
Currently I have MIMEDefang set up to call SpamAssassin for all incoming 
messages. I am trying to set up Bayes for our Mailman lists, so I have a 
script, 'mmlearn' (attached), that turns pickled Mailman messages into an 
mbox for sa-learn.

The problem is that sa-learn crashes on certain messages. I have attached
a tar file of two examples (tarred so they don't get marked as spam :)

[midget 14:31] ~ >sudo su -m mailman -c 'env HOME=/usr/local/mailman sa-learn -u mailman -D --showdots --mbox --spam' <crashmsg1
debug: SpamAssassin version 3.0.4
debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/home/darius/bin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/usr/games', keeping.
debug: PATH included '/usr/local/sbin', keeping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/usr/X11R6/bin', keeping.
debug: PATH included '/home/darius/bin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/sbin', keeping.
debug: Final PATH set to: /home/darius/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/usr/X11R6/bin:/home/darius/bin:/usr/sbin:/sbin:/usr/sbin:/sbin
debug: using "/usr/local/etc/mail/spamassassin/init.pre" for site rules init.pre
debug: config: read file /usr/local/etc/mail/spamassassin/init.pre
debug: using "/usr/local/share/spamassassin" for default rules dir
debug: config: read file /usr/local/share/spamassassin/10_misc.cf
debug: config: read file /usr/local/share/spamassassin/20_anti_ratware.cf
debug: config: read file /usr/local/share/spamassassin/20_body_tests.cf
debug: config: read file /usr/local/share/spamassassin/20_compensate.cf
debug: config: read file /usr/local/share/spamassassin/20_dnsbl_tests.cf
debug: config: read file /usr/local/share/spamassassin/20_drugs.cf
debug: config: read file /usr/local/share/spamassassin/20_fake_helo_tests.cf
debug: config: read file /usr/local/share/spamassassin/20_head_tests.cf
debug: config: read file /usr/local/share/spamassassin/20_html_tests.cf
debug: config: read file /usr/local/share/spamassassin/20_meta_tests.cf
debug: config: read file /usr/local/share/spamassassin/20_phrases.cf
debug: config: read file /usr/local/share/spamassassin/20_porn.cf
debug: config: read file /usr/local/share/spamassassin/20_ratware.cf
debug: config: read file /usr/local/share/spamassassin/20_uri_tests.cf
debug: config: read file /usr/local/share/spamassassin/23_bayes.cf
debug: config: read file /usr/local/share/spamassassin/25_body_tests_es.cf
debug: config: read file /usr/local/share/spamassassin/25_hashcash.cf
debug: config: read file /usr/local/share/spamassassin/25_spf.cf
debug: config: read file /usr/local/share/spamassassin/25_uribl.cf
debug: config: read file /usr/local/share/spamassassin/30_text_de.cf
debug: config: read file /usr/local/share/spamassassin/30_text_fr.cf
debug: config: read file /usr/local/share/spamassassin/30_text_nl.cf
debug: config: read file /usr/local/share/spamassassin/30_text_pl.cf
debug: config: read file /usr/local/share/spamassassin/50_scores.cf
debug: config: read file /usr/local/share/spamassassin/60_whitelist.cf
debug: using "/usr/local/etc/mail/spamassassin" for site rules dir
debug: using "/usr/local/mailman/.spamassassin/user_prefs" for user prefs file
debug: config: read file /usr/local/mailman/.spamassassin/user_prefs
debug: plugin: loading Mail::SpamAssassin::Plugin::URIDNSBL from @INC
debug: plugin: registered Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x8a0bd90)
debug: plugin: loading Mail::SpamAssassin::Plugin::Hashcash from @INC
debug: plugin: registered Mail::SpamAssassin::Plugin::Hashcash=HASH(0x8a1b794)
debug: plugin: loading Mail::SpamAssassin::Plugin::SPF from @INC
debug: plugin: registered Mail::SpamAssassin::Plugin::SPF=HASH(0x8a33edc)
debug: plugin: Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x8a0bd90) implements 'parse_config'
debug: plugin: Mail::SpamAssassin::Plugin::Hashcash=HASH(0x8a1b794) implements 'parse_config'
debug: bayes: 98028 tie-ing to DB file R/O /usr/local/mailman/.spamassassin/bayes_toks
debug: bayes: 98028 tie-ing to DB file R/O /usr/local/mailman/.spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: Score set 2 chosen.
debug: Initialising learner
debug: Syncing Bayes and expiring old tokens...
debug: lock: 98028 created /usr/local/mailman/.spamassassin/bayes.lock.midget.dons.net.au.98028
debug: lock: 98028 trying to get lock on /usr/local/mailman/.spamassassin/bayes with 0 retries
debug: lock: 98028 link to /usr/local/mailman/.spamassassin/bayes.lock: link ok
debug: bayes: 98028 tie-ing to DB file R/W /usr/local/mailman/.spamassassin/bayes_toks
debug: bayes: 98028 tie-ing to DB file R/W /usr/local/mailman/.spamassassin/bayes_seen
debug: bayes: found bayes db version 3
debug: refresh: 98028 refresh /usr/local/mailman/.spamassassin/bayes.lock
debug: Syncing complete.
debug: Learning Spam
debug: received-header: parsed as [ ip=80.68.88.245 rdns=server1.aladan.net helo=server1.aladan.net by=midget.dons.net.au ident= [EMAIL PROTECTED] intl=0 id=j610d5Id081088 auth= ]
debug: received-header: parsed as [ ip=195.47.42.5 rdns=195.47.42.5 helo=195.47.42.5 by=server1.aladan.net ident= envfrom= intl=0 id=j610ctAg015021 auth= ]
debug: is DNS available? 0
debug: received-header: parsed as [ ip=92.132.203.224 rdns= helo= by=195.47.42.5 ident= envfrom= intl=0 id=2142659851detailing23659 auth= ]
debug: received-header: cannot use DNS, do not trust any hosts from here on
debug: received-header: relay 80.68.88.245 trusted? no internal? no
debug: received-header: relay 195.47.42.5 trusted? no internal? no
debug: received-header: relay 92.132.203.224 trusted? no internal? no
debug: metadata: X-Spam-Relays-Trusted:
debug: metadata: X-Spam-Relays-Untrusted: [ ip=80.68.88.245 rdns=server1.aladan.net helo=server1.aladan.net by=midget.dons.net.au ident= [EMAIL PROTECTED] intl=0 id=j610d5Id081088 auth= ] [ ip=195.47.42.5 rdns=195.47.42.5 helo=195.47.42.5 by=server1.aladan.net ident= envfrom= intl=0 id=j610ctAg015021 auth= ] [ ip=92.132.203.224 rdns= helo= by=195.47.42.5 ident= envfrom= intl=0 id=2142659851detailing23659 auth= ]
debug: ---- MIME PARSER START ----
debug: main message type: text/plain
debug: parsing normal part
debug: added part, type: text/plain
debug: ---- MIME PARSER END ----
debug: decoding: other encoding type (7bit), ignoring
debug: uri found: http://uhdzu.azwpd9alp2az7ts.zorromf.info
debug: refresh: 98028 refresh /usr/local/mailman/.spamassassin/bayes.lock
debug: tokenize: header tokens for Mime-Version = " 1.0 (Apple Message framework v728)"
debug: tokenize: header tokens for Content-Transfer-Encoding = " 7bit"
debug: tokenize: header tokens for *m = "  1681980078 575569277 195 47 42 5 "
debug: tokenize: header tokens for *c = " /plain; charset=US-ASCII; format=flowed"
debug: tokenize: header tokens for To = "U*all D*fucs.org.au D*org.au D*au"
debug: tokenize: header tokens for *F = "U*entertainers D*artdirectors.com D*com"
debug: tokenize: header tokens for *x = " Apple Mail (2.728)"
debug: tokenize: header tokens for *RT = " "
debug: tokenize: header tokens for *RU = " [ ip=80.68.88.245 rdns=server1.aladan.net helo=server1.aladan.net by=midget.dons.net.au ident= [EMAIL PROTECTED] intl=0 id=j610d5Id081088 auth= ] [ ip=195.47.42.5 rdns=195.47.42.5 helo=195.47.42.5 by=server1.aladan.net ident= envfrom= intl=0 id=j610ctAg015021 auth= ] [ ip=92.132.203.224 rdns= helo= by=195.47.42.5 ident= envfrom= intl=0 id=2142659851detailing23659 auth= ]"
debug: tokenize: header tokens for *r = "   [92.132.203 ip*92.132.203.224 ] (port=4461 helo=[homesteaders]) by 195.47.42 ip*195.47.42.5    esmtp id 2142659851detailing23659   [EMAIL PROTECTED]; "
debug: tokenize: header tokens for *r = "   [92.132.203 ip*92.132.203.224 ] (port=4461 helo=[homesteaders]) by 195.47.42 ip*195.47.42.5    esmtp id 2142659851detailing23659   [EMAIL PROTECTED];     195.47.42 ip*195.47.42.5  ([195.47.42 ip*195.47.42.5 ]) by server1.aladan.net (8.13.1/8.13.1)         <[EMAIL PROTECTED]>; "
Segmentation fault
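To find which messages in a larger mmlearn batch trigger the crash, one option is to feed them to sa-learn one at a time and look at how each run exits. Below is a minimal Python 3 sketch, assuming a classic mbox where every message starts with a "From " line and body lines were already escaped as ">From ". split_mbox and find_crashing_messages are hypothetical helper names, not part of SpamAssassin or Mailman.

```python
import signal
import subprocess


def split_mbox(data):
    """Split raw mbox bytes into one chunk per message.

    Assumes classic mbox quoting: a new message starts at every line that
    begins with b"From ", and body lines starting with "From " were already
    escaped as ">From " when the mbox was written.
    """
    messages, current = [], []
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From ") and current:
            messages.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        messages.append(b"".join(current))
    return messages


def find_crashing_messages(mbox_data, cmd):
    """Run `cmd` once per message, passing that message (still in mbox form) on stdin.

    Returns (index, signal_name) for every run that was killed by a signal;
    subprocess reports that as a negative returncode (-11 is SIGSEGV).
    """
    crashes = []
    for index, message in enumerate(split_mbox(mbox_data)):
        result = subprocess.run(cmd, input=message,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode < 0:
            crashes.append((index, signal.Signals(-result.returncode).name))
    return crashes
```

For example, passing the bytes of an mbox file and ["sa-learn", "-u", "mailman", "--mbox", "--spam"] (the flags from the command above) should list each message whose run segfaults. Every run that succeeds really does learn its message, so it is probably safer to try this against a scratch copy of the Bayes files.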

If I move the bayes_toks file out of the way, sa-learn doesn't crash. I could
accept that the file is corrupt, but I get the same result on two separate
systems, so perhaps something is causing the toks file to become bogus.
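Because moving bayes_toks aside is what makes the crash go away, it may be worth doing that reversibly so the suspect database survives for a bug report. Below is a minimal Python 3 sketch, assuming the Bayes files all sit in one directory (/usr/local/mailman/.spamassassin, per the debug output) and that nothing else writes to them meanwhile. bayes_set_aside is a hypothetical helper.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path


@contextlib.contextmanager
def bayes_set_aside(db_dir, names=("bayes_toks",)):
    """Temporarily move Bayes DB files out of `db_dir`.

    The originals go into a fresh backup directory inside `db_dir` and are
    moved back on exit, replacing anything (such as a new bayes_toks that
    sa-learn created) written in the meantime.
    """
    db_dir = Path(db_dir)
    backup = Path(tempfile.mkdtemp(prefix="bayes-aside-", dir=str(db_dir)))
    moved = []
    try:
        for name in names:
            src = db_dir / name
            if src.exists():
                shutil.move(str(src), str(backup / name))
                moved.append(name)
        yield backup
    finally:
        for name in moved:
            # shutil.move uses os.rename here, which replaces an existing file on POSIX.
            shutil.move(str(backup / name), str(db_dir / name))
        backup.rmdir()
```

Running the failing sa-learn command inside a `with bayes_set_aside("/usr/local/mailman/.spamassassin"):` block repeats the experiment, and the original bayes_toks is put back afterwards.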

I am running SA 3.0.4 on both systems. One system is FreeBSD 4.11 with
Perl 5.6.2, and the other is FreeBSD 5.4 with Perl 5.8.7 (both built from
ports).

Any help greatly appreciated!
Thanks.

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C

Attachment: mmlearn
Description: application/shellscript

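#!/usr/bin/env PYTHONPATH=/usr/local/mailman/pythonlib:/usr/local/mailman python
# Convert Mailman queue pickles (*.pck) and/or raw message files into a
# single mbox that can be fed to "sa-learn --mbox".
import sys
import pickle
import email

if len(sys.argv) < 3:
    sys.stderr.write('Usage: %s mbox|- file [file ...]\n' % sys.argv[0])
    sys.exit(1)

# '-' means write the mbox to standard output.
if sys.argv[1] == '-':
    mbox = sys.stdout
else:
    mbox = open(sys.argv[1], 'w')

for filename in sys.argv[2:]:
    if filename.endswith('.pck'):
        # The first object in a Mailman pickle is the Message;
        # as_string(unixfrom=True) emits the "From " separator line.
        fp = open(filename, 'rb')
        msg = pickle.load(fp).as_string(unixfrom=True)
        fp.close()
    else:
        # Raw files are copied verbatim, so they should already
        # start with a "From " line.
        fp = open(filename, 'r')
        msg = fp.read()
        fp.close()
    mbox.write(msg)
    # Terminate the last line (also safe for an empty file), then
    # leave a blank line before the next message.
    if not msg.endswith('\n'):
        mbox.write('\n')
    mbox.write('\n')

if mbox is not sys.stdout:
    mbox.close()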
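mmlearn copies non-pickle files into the mbox verbatim, so a raw message whose body has a line starting with "From " would be split in two by sa-learn --mbox. Below is a Python 3 sketch of that branch using the standard library's mailbox module, which writes the separator line and escapes such body lines as ">From ". write_mbox is a hypothetical name. The .pck case is left out because unpickling needs Mailman's own (Python 2) modules.

```python
import email
import mailbox


def write_mbox(out_path, message_files):
    """Append raw RFC 2822 message files to the mbox at `out_path`.

    mailbox.mbox.add() writes a "From " separator for each message and,
    because mbox mangles From lines, rewrites body lines starting with
    "From " as ">From " so they cannot be taken for message boundaries.
    """
    box = mailbox.mbox(out_path)
    box.lock()
    try:
        for path in message_files:
            with open(path, "rb") as fh:
                box.add(email.message_from_binary_file(fh))
    finally:
        box.close()  # flushes pending writes and releases the lock
```

For instance, write_mbox("learn.mbox", ["msg1.eml", "msg2.eml"]) followed by `sa-learn --mbox --spam learn.mbox` (file names hypothetical).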

Attachment: crash-salearn.tbz
Description: BZip2 compressed data

Attachment: pgphNvIZBv4Il.pgp
Description: PGP signature
