Re: procmail error or mine?
From: "Gene Heskett" <[EMAIL PROTECTED]>

===8<---
PROCMAILMATCH="X-Procmail: Matched on"
PROCMAILHEADER="X-Procmail: "

:0 fw
* ^List-Id: .*(spamassassin\.apache.\org)
| formail -A "$PROCMAILHEADER an SA list. Mail not processed."
|
:0 fw
* ^TO_:.*([EMAIL PROTECTED]|users\.spamassassin\.apache\ .org)
| formail -A "$PROCMAILMATCH SpamAssassin Users list" -i "Reply-to: users@spamassassin.apache.org"
===8<---

It's a direct case of plagiarism, Joanne, since it's your script :)

These two lines are important. I may have left them out of some of the snips I sent you.

PROCMAILMATCH="X-Procmail: Matched on"
PROCMAILHEADER="X-Procmail: "

{^_^} (The fellow I got them from left them out when he sent me the procmail rules. The problem's contagious, I suspect.)
Re: URIBL_BLACK + OB_SURBL double-listed nonspam domain
From: "Matt Kettler" <[EMAIL PROTECTED]> I'm thinking of something like: score URIBL_SURBL 2.0 score URIBL_AB_SURBL 1.812 score URIBL_JP_SURBL 2.087 score URIBL_OB_SURBL 1.008 score URIBL_PH_SURBL 0.800 score URIBL_SC_SURBL 2.498 score URIBL_WS_SURBL 0.140 Whereas I am thinking of increasing some of the scores over all. {o.o} Each individual list maintains it's score total. But the additive effects are more limited. You could still go over 5 by hitting the highest scoring ones, but double- WS+OB would total 3.1 instead of 5.1. You'd still get a 6.585 for a double SC+JP, and >5 for many other 2-list combinations, but you wouldn't be so far over the line that bayes couldn't fix the occasional FP. Right now JP+SC scores 8.585, which even BAYES_00 can't bring back down under the 5.0 line. I trust the URIBLs a lot, I think they're great. But I don't trust them so much that two of them should be able to over-ride BAYES_00 without any other spam rules firing. I'm content with the scores as is BECAUSE they override Bayes_00. The really short virtually URI only spams get rather low Bayes scores as a general rule. So having the BLs toss them over the top is a good thing as I see it here. So your proposed solution has its own problems. At the very least you should explore the solution you propose to see if you can make it work. (By the way, is it users complaining or the folks at some of the vendor sites? I've been presuming it is user complaints that are Plaguing you.) It's user complaints.. not to mention my own mail ending up in the spam bin.. Both posted examples came from *MY* personal email, not my users. I'm sick of having to fetch conference announcements, product announcements, etc out of my spam bin. I shouldn't have to add whitelist rules for every vendor I do business with because some of the URIBLs are getting over-zealous. I can tell that. And I'm not sure of any given solution. Although I hope I remember to vamp on a vague idea I have below. 
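The arithmetic behind the proposed scores can be checked with a short sketch. It assumes (this is inferred from the totals quoted above, not stated outright) that URIBL_SURBL is a base rule that fires once whenever any SURBL list hits, and that each current per-list score is the proposed score plus 2.0. The function names are made up for illustration.

```python
# Proposed per-list scores, copied from the message above.
PROPOSED = {
    "URIBL_AB_SURBL": 1.812,
    "URIBL_JP_SURBL": 2.087,
    "URIBL_OB_SURBL": 1.008,
    "URIBL_PH_SURBL": 0.800,
    "URIBL_SC_SURBL": 2.498,
    "URIBL_WS_SURBL": 0.140,
}

# "score URIBL_SURBL 2.0": assumed to be added once if any list hits.
BASE_RULE_SCORE = 2.0

# Assumption: current score = proposed score + 2.0, inferred from the
# quoted totals (JP+SC 8.585 now versus 6.585 under the proposal).
CURRENT = {name: round(score + BASE_RULE_SCORE, 3) for name, score in PROPOSED.items()}


def current_total(hits):
    """Sum of today's scores: every list that hits adds its full score."""
    return round(sum(CURRENT[name] for name in hits), 3)


def proposed_total(hits):
    """Base rule counted once, plus the reduced per-list scores."""
    if not hits:
        return 0.0
    return round(BASE_RULE_SCORE + sum(PROPOSED[name] for name in hits), 3)


if __name__ == "__main__":
    for combo in (["URIBL_OB_SURBL"],
                  ["URIBL_WS_SURBL", "URIBL_OB_SURBL"],
                  ["URIBL_JP_SURBL", "URIBL_SC_SURBL"]):
        print(combo, current_total(combo), proposed_total(combo))
```

Under those assumptions a single hit scores the same either way, while each extra list adds 2.0 less than it does today, which is where the 3.1 versus 5.1 and 6.585 versus 8.585 figures come from.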
It'd be a new concept, "whitelist_from_rcvd_score". That would be whitelist_from_rcvd with a custom score per whitelist entry. Couple this with a "don't complain to me, load it into this mailbox and it'll be fixed automagically" script and mailbox to get something that might work OK. The idea would need a lot of development and some level of per user rules to make it work. For an ISP I can see a lot of problems with the concept of "fully automatic." But if someone vets the account info maybe it'd be OK. I also note that with the "winterizewithscotts" site the company made a very logical and fatal mistake.s. Fair enough, I can see HOW the domain got listed by mistake. However, is a system which is prone to mistakes worth 6.008 points (default score for OB in SA 3.1.0 + uribl.com's suggested 3.0 score for URIBL_BLACK). Yes, given the fact that the mistakes seem to be very hard to make and one presumes some actual checking on the complaints. Sure I can customize. But I'm creating this public discussion on the list not to bash the URIBLs, but to get people thinking about better ways to score them to avoid score-inflation when FPs happen, while still keeping the spam scores reasonable. I was hoping some SOLUTIONS would come out. So far most of what I've gotten is a bunch of defensive garbage denying it ever happens, or trying to explain away the problem without giving it any serious consideration. Most of us do not see any problem, it would appear. I am SURE that any comparison between your site and mine is utterly spurious. Yours is a rather large ISP setting, I believe. Mine's two lonely little users tucked away at the end of some DSL wire out in South Eastern San Bernardino County using Earthlink as our ISP. (Too bad "ISP For Two" doesn't scan with "Tea For Two." There's a temptation to filk. {^_-}) Now lemme see on that whitelist_from_rcvd_scored vamping. I'll TRY to look at it from an ISP view with some per user rules capability. 
(Without the per user rules capability some means of tweaking the offending BL rules to flatten out the maximum score in a dynamic basis is needed.) Suppose each user can forward email through your authenticated smtp server to a "this ain't spam" analysis tool. It takes apart the message and looks at the scores. This may require the spam as attachment feature for SA so that the "original" makes it through as well as all the scoring. The scoring is dissected and the either the BLs that hit are downscored very slightly or the whitelist_from_rcvd_scored entry is added with a modest score "barely" sufficient to keep the item from being scored as spam. The user gets an entry that is guaranteed to prevent the message being scored as spam. AND a global entry is added that subtracts about 1/5th the range needed to make the message not spam is created. Each time a similar complaint comes from another user, not the same one, the whitelist level is cranked up by the initial delta amount until it no longer bugs p
Re: URIBL_BLACK + OB_SURBL double-listed nonspam domain
>List Mail User wrote: >>> winterizewithscotts.com >>> >>> Scott's lawncare registered user updates. >>> >>> >> Matt, >> >> winterizewithscotts.com looks like a case of "affiliate" spamming or >> misuse of "sweepstakes" entries. >> See: >> http://forums.gottadeal.com/archive/index.php/t-14640.html >> http://forums.gottadeal.com/archive/index.php/t-13473.html >> http://www.acohardware.com/673.html >> > >One problem Paul.. There's no ID associated with any of the links in >those articles. There's no way for those articles to be useful spam, as >there's no way to track back who the affiliate was, so there's no >benefit to the affiliate. (unless of course they happen to own a >hardware store that sells fertilizers.) > > >I sincerely doubt your theory, at least with the evidence presented. If >you can find a link to winterizewithscotts that's got some kind of >affiliate tracker, I might believe you. > >Also, I'll point out a google-groups search for this domain name returns >0 hits. Therefore, no NANAS or NANAE reports. > >http://groups.google.com/groups?q=winterizewithscotts&hl=en&lr=&c2coff=1&safe=off&filter=0&sa=N&tab=wg > >If this was being spammed by an affiliate abuser, there'd be more >evidence of it than a couple of posts about it on forums and hardware >store websites with no affiliate-tracker. > > In each case, normal HTML gives a "referrer" page, so no affiliate ID is needed. Also, while some of those links will *still* take you to a "sweepstakes" page, the site winterizewithscotts.com has a simple page which note that the "promotion" is over (!) and a link to the main "Scotts" page. Clearly, this page had better not be the method for communicating with registered users, unless Scotts has just dropped all "support". 
I suggest you look at the three pages: http://www.winterizewithscotts.com/ http://www.winterizewithscotts.com/index.tbapp?page=intro and read http://www.acohardware.com/673.html The last of which is indeed a hardware store chain that *does* sell Scotts' fertilizers - exactly as you suggest is possible! Then check the link: http://groups.google.com/groups?q=acohardware&hl=en&lr=&c2coff=1&safe=off&filter=0&sa=N&tab=wg 2 NANAS Which doesn't show acohardware.com spamming itself, but does show it being forged by zombie Cialis spammers (looks like Yambo or maybe Mankani), which could easily lead someone to visit their page and sign up for the now ended Scotts' promotion as I had posited. Paul Shupak [EMAIL PROTECTED]
Re: URIBL_BLACK + OB_SURBL double-listed nonspam domain
From: "List Mail User" <[EMAIL PROTECTED]> >List Mail User wrote: winterizewithscotts.com Scott's lawncare registered user updates. Matt, winterizewithscotts.com looks like a case of "affiliate" spamming or misuse of "sweepstakes" entries. See: http://forums.gottadeal.com/archive/index.php/t-14640.html http://forums.gottadeal.com/archive/index.php/t-13473.html http://www.acohardware.com/673.html One problem Paul.. There's no ID associated with any of the links in those articles. There's no way for those articles to be useful spam, as there's no way to track back who the affiliate was, so there's no benefit to the affiliate. (unless of course they happen to own a hardware store that sells fertilizers.) I sincerely doubt your theory, at least with the evidence presented. If you can find a link to winterizewithscotts that's got some kind of affiliate tracker, I might believe you. Also, I'll point out a google-groups search for this domain name returns 0 hits. Therefore, no NANAS or NANAE reports. http://groups.google.com/groups?q=winterizewithscotts&hl=en&lr=&c2coff=1&safe=off&filter=0&sa=N&tab=wg If this was being spammed by an affiliate abuser, there'd be more evidence of it than a couple of posts about it on forums and hardware store websites with no affiliate-tracker. In each case, normal HTML gives a "referrer" page, so no affiliate ID is needed. Also, while some of those links will *still* take you to a "sweepstakes" page, the site winterizewithscotts.com has a simple page which note that the "promotion" is over (!) and a link to the main "Scotts" page. Clearly, this page had better not be the method for communicating with registered users, unless Scotts has just dropped all "support". I suggest you look at the three pages: http://www.winterizewithscotts.com/ http://www.winterizewithscotts.com/index.tbapp?page=intro OK, the second one of those links to the mail-in coupon. 
It clearly says if you give an email address you will receive specific information directly related to the contest via email. Otherwise it will be by snail mail. It does NOT say you will receive other information as well. So subsequent mailings should constitute a spam message if they are not directly related to winning the contest. It is easy to see how they could have become listed and why they should have been. {^_^}
Re: URIBL_BLACK + OB_SURBL double-listed nonspam domain
List Mail User wrote: >> > In each case, normal HTML gives a "referrer" page, so no affiliate > ID is needed. Paul.. None of those pages contain a link. The user would have to copy-paste or hand-type the url. That would defeat any referrer mechanism. (more extensive commentary directed off-list)
Re: Over-scoring of SURBL lists...
On Sun, Feb 19, 2006 at 02:20:05AM -0500, Matt Kettler wrote: > >> How can we keep the spam tagged, and try to mitigate the FPs by keeping > >> additive scores for multiple URIBLs more moderate? +20 worth of URIBL > >> hits is fine on spam, but astronomically high scores don't really help > >> SA when the tagging threshold is +5. However, they do hurt SA when > >> overlapping mistakes happen. > > Yes.. which is exactly who I was primarily trying to reach by posting > here on the spamassassin, before this turned into a large > misunderstanding between the URIBL operators and myself. I have two things related to this: 1- if the lists are indeed separate (ie: different sources, etc,) then having multiple rules makes sense. 2- the end result when generating scores is only as good as the input provided. so if your specific flow of mail has a lot of FPs for various rules, you ought to get involved in at least the score generation mass-check runs, and preferably the nightly runs (so we'd be able to deal with FPs and such during development instead of just by adjusting the score). this philosophy hasn't changed much in the time I've been working with SA. during score generation, rules that commonly hit the same mails (ie: high overlap) tend to have lower individual scores. this is especially true if the ham hit rate is non-zero on the rules. however, if you look at the STATISTICS* files, the SURBL rules all have a fairly low ham rate which led the perceptron to give the rules higher-than-average scores. so my guess is that the perceptron didn't see the issue that has been discussed, or didn't see it enough to have a large impact on scores (attempts are made to lower the FP rate, but a >0 rate is still likely.) related to this, I mentioned earlier in the thread about a bug I found in the reuse section of mass-check while generating some statistics. we used the reuse code to generate the 3.1 scores. however, due to the bug, rule hits were lost. 
so it's hard to say exactly what occurred because of it, but the scores generated for network tests (those that enabled reuse anyway) are almost definitely miscalculated, and potentially very miscalculated (see the same previous post about the "way different" SURBL WS rule hits that I found).

We're trying to get updates going for 3.1, and I'm hoping to get scores generated more frequently after that's set up. Perhaps the next set of scores will address your issue more directly? Is the problem more that in the past there weren't a large number of FPs and now there are?

--
Randomly Generated Tagline: "My job is like an airplane pilot's -- when I'm doing it well, you might not even notice me, but my mistakes are often quite spectacular." - Unknown
Re: URIBL_BLACK + OB_SURBL double-listed nonspam domain
>... > Matt, >> In each case, normal HTML gives a "referrer" page, so no affiliate >> ID is needed. > >Paul.. None of those pages contain a link. The user would have to >copy-paste or hand-type the url. That would defeat any referrer mechanism. Also, whether cut&paste generates a referral all depends on your browser and the setting used in some (e.g. Opera). > >Short of hiring the psychic friends network scotts does not have a way >of tracking these to back to an affiliate. Period. Nope; And if I can find them 6 months later in one chain's pages, would you link to bet they were also referenced by Orchard Supply, Home Depot, Lowes and dozens of other large chains and smaller businesses also. *And* some of these chains may well have used links which resulted in a referral from any browser. Beside, the referral is not needed to profit from the reference - As you pointed out yourself, a store selling the products gets its profit directly from the sale (they are still "affiliates", check the definition is the UCC); So who cares (including Scotts) if they have a method to track back to the referral (BTW. The receipt needed for any rebate would give Scotts that data - no psychic needed). > >This is NOT an affiliate spam. Stop making things up that are clearly >impossible. > >1) There is no tracker ID's, so it can't be tracked that way. >2) the examples posted aren't even anchor-tagged so a user would have to >re-enter it defeating any referrer mechansims. See above. >3) scotts itself sent out emails containing the link, so it's not an >"affiliate exclusive" domain. Rarely are "affiliate" domains exclusive - check some of your pill or warez spam in your corpus. Leo Kuvayev spams his domains, so do other people (usually using a "subdirectory" nowadays, not the old '?' method). > >Did you even LOOK at the pages you mentioned??? 
> > >> Also, while some of those links will *still* take you to a >> "sweepstakes" page, the site winterizewithscotts.com has a simple page >> which note that the "promotion" is over (!) and a link to the main "Scotts" >> page. Clearly, this page had better not be the method for communicating >> with registered users, unless Scotts has just dropped all "support". >> >It's not the primary page, it's an outdated promotion that scotts >included in their regular lawn-care update mail. A mail I subscribe to. > >They don't mention it anymore.. but how'd it get there in the first >place paul? > >> I suggest you look at the three pages: >> >> http://www.winterizewithscotts.com/ >> http://www.winterizewithscotts.com/index.tbapp?page=intro >> >> and read >> >> http://www.acohardware.com/673.html >> >> The last of which is indeed a hardware store chain that *does* sell >> Scotts' fertilizers - exactly as you suggest is possible! >> > >You're right.. Nobody has ever mentioned any product they sell on their >web without being an affiliate spammer. The "spamming" is unintentional, but if they link you directly to either of the sweepstakes or rebate mail-in forms without first offering a change to read the privacy policy, it will/did result in spam. > >> Then check the link: >> >> http://groups.google.com/groups?q=acohardware&hl=en&lr=&c2coff=1&safe=off&filter=0&sa=N&tab=wg >> >> 2 NANAS >> >> Which doesn't show acohardware.com spamming itself, but does show >> it being forged by zombie Cialis spammers (looks like Yambo or maybe >> Mankani), >> which could easily lead someone to visit their page and sign up for the now >> ended Scotts' promotion as I had posited. >> >Yeah, as a FROM ADDRESS!!! Not a URL! You know what that is - So do I; 99% of Internet users haven't any clue. Paste "acohardware.com" into a browser and you go to their web site. 
Also aren't you willing to consider any "normal" customer of a hardware chain that claims 69 stores might have gone to the site *on purpose*? > >That's not going to lead people to a specific obscure subpage on >acohardware.com's website. Weekly and monthy specials are on the first page - look at the timing of the complaints and the special offers by Scotts and the forgery spam - all in the same month - I don't believe it likely took more than one or two clicks at that time. Right now, today, it takes two clicks to get to the Scotts specific page at that site *and* there are no promotions or specials. Also, notice that the monthly specials will print "web coupons" with a single click over the item - It seems would have been easy to print the Scotts' rebate or sweepstakes coupon without ever having any chance to see a copy of the "privacy policy" contained on Scotts' own site. > >Paul. What's with the wild leaps of logic here?? Have you fallen >completely off your rocker? > >What kind of crack-smoking half-assed theory is this shit? > Come on now; Both pages at: http://forums.gottadeal.com/archive/index.php/t-13473.html http://fo
Exiscan + subject rewrite not working
I looked this up and can't see where I'm doing anything wrong, but the subject is not being rewritten. Some relevant data:

/etc/mail/spamassassin/local.cf

# How many hits before a message is considered spam.
required_score 3.0
# Change the subject of suspected spam
rewrite_header Subject *SPAM*

Changing the required score value shows up in the mail headers.

/etc/sysconfig/spamassassin

SPAMDOPTIONS="-d -m5 -H/var/spool/spamd "

I eliminated the -c option because of the following errors.

Feb 18 20:24:59 WBServer spamd[18388]: info: setuid to nobody succeeded
Feb 18 20:24:59 WBServer spamd[18388]: Creating default_prefs [//.spamassassin/user_prefs]
Feb 18 20:24:59 WBServer spamd[18388]: Cannot write to //.spamassassin/user_prefs: No such file or directory
Feb 18 20:24:59 WBServer spamd[18388]: Couldn't create readable default_prefs for [//.spamassassin/user_prefs]

I'm not sure whether this has anything to do with the problem or not. I don't see where the setuid to nobody comes from. I added the /var/spool/spamd after the -H option and made spamd & the .spamassassin subdirectory world writeable to see if this would eliminate the error but it didn't.

From exiscan.conf

acl_check_content:
# Reject virus infested messages.
# deny message = This message contains malware ($malware_name)
#malware = *
# Messages larger than 200k are accepted without spam scanning to reduce spamd load
accept condition = ${if >{$message_size}{150k}{true}}
# Always add X-Spam-Score and X-Spam-Report headers, using SA system-wide settings
# (user "nobody"), no matter if over threshold or not.
warn message = X-Spam-Score: $spam_score ($spam_bar)
     spam = sa:true
warn message = X-Spam-Report: $spam_report
     spam = sa:true
# Add X-Spam-Flag if spam is over system-wide threshold
warn message = X-Spam-Flag: YES
     spam = sa
# Reject spam messages with score over 10, using an extra condition.
deny message = This message scored $spam_score points. Congratulations!
     spam = sa:true
     condition = ${if >{$spam_score_int}{100}{1}{0}}
##
# Rewrite subject if email scored between 5 and 10.
##
# For the subject tag, we prepare a new subject header in the
# ACL, then swap it with the original Subject in the system filter.
warn message =X-Exiscan-SA-New-Subject: ***SPAM***$h_subject
     condition = ${if >{$spam_score_int}{50}{1}{0}}
# finally accept all the rest
accept

SpamAssassin version is 3.0.1, perl is 5.8.5, Exim is 4.4.3, exiscan patch is patch revision 28.
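To make the thresholds in the ACL above easier to reason about, here is a rough sketch of the same decision flow. It is only a model of the quoted rules, not Exim itself. It assumes that $spam_score_int is the score multiplied by ten (my reading of the exiscan documentation), that "150k" means 150 * 1024 bytes, and that "spam = sa" matches when the score reaches required_score. The function name is invented.

```python
def exiscan_acl_outcome(message_size, spam_score, required_score):
    """Model the quoted acl_check_content rules.

    Returns (verdict, headers): verdict is "accept" or "deny", headers is
    the list of header names the warn statements would add.
    """
    # accept condition = ${if >{$message_size}{150k}{true}}
    # (the comment in the config says 200k, the condition says 150k)
    if message_size > 150 * 1024:
        return "accept", []

    # Assumption: $spam_score_int is the score times ten.
    score_int = round(spam_score * 10)

    # warn ... spam = sa:true  -> always added
    headers = ["X-Spam-Score", "X-Spam-Report"]

    # warn ... spam = sa  -> only at or over SpamAssassin's own threshold
    if spam_score >= required_score:
        headers.append("X-Spam-Flag")

    # deny ... condition = ${if >{$spam_score_int}{100}{1}{0}}
    if score_int > 100:
        return "deny", headers

    # warn ... condition = ${if >{$spam_score_int}{50}{1}{0}}
    if score_int > 50:
        headers.append("X-Exiscan-SA-New-Subject")

    return "accept", headers


if __name__ == "__main__":
    # With required_score 3.0, a 4.0 message is flagged but gets no new subject.
    print(exiscan_acl_outcome(20_000, 4.0, 3.0))
    print(exiscan_acl_outcome(20_000, 6.2, 3.0))
```

If this model is right, two things follow: a message scoring between 3.0 and 5.0 is flagged as spam but never gets the X-Exiscan-SA-New-Subject header, and even above 5.0 the ACL only prepares that header. The swap into the real Subject happens in the Exim system filter, which is not shown in the post.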
Re: Over-scoring of SURBL lists...
Hi!

Yes, but the frequency of overlap in nonspam that I'm seeing at my site is disturbing. I've posted examples of this, and they keep getting ignored.

You posted overlap in URIBL and SURBL, that's the same as posting overlap inside Spamcop and Spamhaus...

This IS a real problem. I am not speculating. I've posted two real domains on this list that have had the problem for me in the past 7 days.

ultraedit-updates.com: OB + uribl black (delisted from both at my request)
winterizewithscotts.com: OB + uribl black (I have intentionally NOT submitted a delist request for this domain)

It's two different lists, URIBL is not even part of the SA distribution as far as I know. The scoring is done based on several rules, and URIBL wasn't one of them. It's rather silly to say look, I added another URIBL and now scoring is weird and twisted.

What level of FP problem is going to make you guys wake up on this? Do I have to come up with a list of 1000 domains before you'll accept reality? Nonspam overlap in the URBLS is a real problem. No really, it's real. I'm not making this up.

I see OB and URIBL, I don't see any OB+JP or any other sublist inside SURBL. It might sound childish but it's not. We see the reports coming in daily, and we really don't share your feelings. Based on statistics I also say it's nonexistent.

If you guys want to continue to pretend the problem doesn't exist, so be it. I'll adjust my own scores. But don't say I didn't warn you when you get bit in the ass by this later.

You made your point. That some other people don't share your feelings doesn't mean you are not heard. It most likely means people have different opinions about this subject.

But why do they so commonly overlap in NONSPAM too? And why does nobody care? Why does everyone insist the problem doesn't exist in spite of examples to the contrary.

Again, those examples are NOT showing me there is FP overlap inside SURBL. SURBL and URIBL are two different things.

At my site, the URBLs overlap. Period. They have overlapping spam hits, and they have overlapping FPs. The first half isn't a problem, but the second half IS.

At my site RBLs overlap also. I think your main problem, as far as I can tell, is your score for both OB and URIBL. OB has a lot of FPs, and I tend to say it increased over the last months, when we look at the number of reports we get in on the whitelist alias. URIBL FP rate has always been high. It's up to you how you score it. It's not part of SA (yet) and scoring is based on your own personal feeling, right?

I've seen nonspam overlaps between URIBL, WS, OB and JP in all sorts of random mixtures. Most commonly URIBL_BLACK + OB or URIBL_BLACK+WS, but I've seen some WS+OB and OB+JP before too.

It's really been some time now since we got in a whitelist request with JP involved. I don't say we never get them, but it's rather rare. With around 250.000 domains added it's not really surprising there's overlap. And if you like, submit FPs to [EMAIL PROTECTED] so we can sort them.

Thanks!
Raymond.
Updated Pump and Dump rules. 2006-02-18
I just committed version 01.00.06 of this ruleset to: http://rulesemporium.com/rules/70_sare_stocks.cf It should appear within the hour. Enjoy. -Doc (SA/SARE/URIBL/SURBL -- Ninja)
Re: procmail error or mine?
On Sunday 19 February 2006 03:45, jdow wrote:
>From: "Gene Heskett" <[EMAIL PROTECTED]>
>
>>>===8<---
>>>PROCMAILMATCH="X-Procmail: Matched on"
>>>PROCMAILHEADER="X-Procmail: "
>>>
>>>:0 fw
>>>* ^List-Id: .*(spamassassin\.apache.\org)
>>>| formail -A "$PROCMAILHEADER an SA list. Mail not processed."
>>>|
>>>:0 fw
>>>* ^TO_:.*([EMAIL PROTECTED]|users\.spamassassin\.apache\ .org)
>>>| formail -A "$PROCMAILMATCH SpamAssassin Users list" -i "Reply-to: users@spamassassin.apache.org"
>>>===8<---
>>
>> It's a direct case of plagiarism, Joanne, since it's your script :)
>
>These two lines are important. I may have left them out of some of
>the snips I sent you.
>
>PROCMAILMATCH="X-Procmail: Matched on"
>PROCMAILHEADER="X-Procmail: "
>
>{^_^} (The fellow I got them from left them out when he sent me the
>procmail rules. The problem's contagious, I suspect.)

See my original posting Joanne, they are indeed there, at the top. :)

--
Cheers, Gene
People having trouble with vz bouncing email to me should add the word 'online' between the 'verizon', and the dot which bypasses vz's stupid bounce rules. I do use spamassassin too. :-)
Yahoo.com and AOL/TW attorneys please note, additions to the above message by Gene Heskett are: Copyright 2006 by Maurice Eugene Heskett, all rights reserved.
Re: A Spam Message That Got Through!
Evan Platt wrote:
> At 10:48 PM 2/17/2006, you wrote:
>
>> Today I got a spam message which seems, at least for a newbie like me,
>> to have succeeded in passing SA for some reason!
>>
>> I'm calling SA through amavisd-new and have my Rules Du Jour updated
>> (manual updates so far)
>>
>> I would like to block such messages therefore, I'm seeking your kind
>> assistance in determining how it passed the "tests" and what am I
>> supposed to do in order to prevent these messages?
>
> If you'd like to *BLOCK* such messages, use the postfix header_checks to
> block any mail that is both from AND to you. (I can't help you with the
> syntax there, but I'm sure someone else can.)

he can't. postfix header_checks look at headers one at a time. he can use procmail/maildrop/... etc. whether this is safe is another issue.
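Since the test needs two headers at once, it has to run somewhere that sees the whole header block, for example a script called from procmail or maildrop. A minimal sketch of the comparison itself, using only Python's standard email module; the function name and the example.com addresses are made up, and whether acting on the result is safe is a separate question, as noted above.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def sender_is_recipient(raw_message):
    """True if the From: address also appears among the To: addresses."""
    msg = message_from_string(raw_message)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    recipients = {
        addr.lower() for _name, addr in getaddresses(msg.get_all("To", []))
    }
    # An empty or unparseable From: never counts as a match.
    return bool(sender) and sender in recipients


if __name__ == "__main__":
    forged = (
        "From: Me <user@example.com>\n"
        "To: user@example.com\n"
        "Subject: hello\n"
        "\n"
        "body\n"
    )
    print(sender_is_recipient(forged))
```

Legitimate mail can also have matching From and To (notes to self, some mailing-list setups), so a match is probably better used as one signal among several than as a hard block.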
Re: A Spam Message That Got Through!
Yousef Raffah wrote:
> Received: from emailmarketingmasters.com (i538754C0.versanet.de
> [83.135.84.192]) by kansai.savoladns.com (Postfix) with SMTP id
> 7B42810073 for <[EMAIL PROTECTED]>; Fri, 17 Feb 2006 18:43:21 +0300 (AST)

you could
- use njabl's dynablock to block the client (83.135.84.192). (Don't use sorbs).
- reject helo when it's "emailmarketingmasters.com"
- greylist clients that match /\d{5}/ (don't block as this is unsafe).
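The last two suggestions can be sketched as a small decision function. This is only an illustration of the logic, not a Postfix policy service: the function name and return values are invented, and the DNSBL lookup from the first suggestion is left out because it needs network access.

```python
import re

# /\d{5}/ from the list above: five consecutive digits anywhere in the name,
# which is typical of dynamically assigned client hostnames.
DIGIT_RUN = re.compile(r"\d{5}")

# The HELO name seen in the quoted Received: header.
BAD_HELO = "emailmarketingmasters.com"


def smtp_decision(client_hostname, helo_name):
    """Return "reject", "greylist" or "accept" for one connection."""
    if helo_name.lower() == BAD_HELO:
        return "reject"
    if DIGIT_RUN.search(client_hostname):
        # Greylist rather than block: real mail servers can match too.
        return "greylist"
    return "accept"


if __name__ == "__main__":
    print(smtp_decision("i538754C0.versanet.de", "emailmarketingmasters.com"))
    print(smtp_decision("i538754C0.versanet.de", "mail.example.org"))
```

The client in the quoted header, i538754C0.versanet.de, contains the digit run 538754, so it would be greylisted even if it changed its HELO name.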
Re: Exiscan + subject rewrite not working
On Sun, 19 Feb 2006, Terry Miller wrote: > I looked this up and can't see where I'm doing anything wrong, but the > subject is not being rewritten. You should probably ask this question on the exim-users list. I suspect (but I am not certain) that exiscan doesn't support the message rewrite parts of SpamAssassin, and instead relies on Exim's own facilities for this.. Tony. -- f.a.n.finch <[EMAIL PROTECTED]> http://dotat.at/ BISCAY: WEST 5 OR 6 BECOMING VARIABLE 3 OR 4. SHOWERS AT FIRST. MODERATE OR GOOD.
Re: getmail?
On Thu, 09 Feb 2006, Gene Heskett spake: >>From re-reading a 'man fetchmail' I don't see the fileing ability. It > only presents it to localhost:25 and apparently sendmail takes it from > there. The comm thru port 25 is apparently bilateral as it can be told ?! Definitely not. > to summarily delete unwanted mail from the server, while sendmail at That's the *POP3* server. > the some time is deleting its copy. sendmail is extremely careful to never `delete a copy' of mail (I'd call that `losing mail'). It simply takes it and relays or delivers it. >>- run an MDA. so you could run procmail or maildrop or a (correct) >>script. In short, fetchmail runs a command (it pipes the message). > > eg sendmail?, which is running here. It only does that in extremis if a local SMTP server isn't running. > But, here is the headache: At no place in the various files sitting > in /etc/mail that serve to configure sendmail, is there an example of > how to configure sendmail to make use of these feature facilities. That's why there's a vast book covering it in agonizing detail, and docs in the source tree --- and probably in /usr/share/doc or wherever your distro keeps them --- explaining how to use the .mc M4 macro-expander to generate a configuration file. You probably need *none* of this, though: on virtually all Linux distros, procmail is invoked by sendmail to do delivery, so you can just put everything in .procmailrc. > Spamassassin 3.10 contains only very scant references to using it with > sendmail, apparently sanctioning only the procmail interface, which in > turn then is set to call spamc or spamassassin, adding needless time > wasting cpu cycles to what should be a pretty simple job. I fail to procmail is so lightweight that I was running it on a 386/25 a decade ago and not noticing the CPU hit. Just ignore it. Worrying about the CPU hit of the local delivery agent on a box that must run a vast CPU and memory hog like spamd is... quixotic, to say the least. 
> understand why (although it will take smarter people than me what with > sendmails configuration complexity) there is no readily published > recipe for incorporating spamc into the sendmail processing chain, > either by pipeing, or when the libmilter feature is there? Well, there is spamass-milter, which is fairly frequently mentioned on this list. -- `... follow the bouncing internment camps.' --- Peter da Silva
Re: Question on long scan times - 4500 seconds with a blip at 300.
Kevin Gagel:
>> I'm finding that scans are taking as long as 798 seconds

Daryl O'Shea:
> Scan times of 798 seconds are probably a result of a bayes expiry. If
> auto expiry is enabled (default) I'd disable it and run a manual
> expiry as a cron job.

This is a tally of SA scan times at my site from January 14th, rounded down to the nearest 100 seconds, ignoring those below 1000 seconds:

  seconds  count
  1000     30
  1100     41
  1200     14
  1300      5
  3000      3
  3100     35
  3200     11
  4400      1
  4500      3

Those are combined figures from two load-balanced spamd hosts. Both hosts exhibited similar behaviour, so the problem is caused by something they share: a flood of incoming email, MySQL Bayes, or DNS lookups. I doubt it's Bayes because I have per-user stats, which makes the expiry runs rare and quick. And the RBL timeout is 10 seconds.

More interestingly, here are the times and counts for all the scan times under 1000 seconds on that same day:

  seconds  count
  100-140   10
  300-310  128
  950-980    5

So what is it about the five-minute mark that is so attractive to SA?

I'm going to start putting "gosh this is taking a long time" debugging into spamd, but I would love to hear from anyone who has any ideas about it.

Thanks in advance!

--
_ Andrew Donkin Waikato University, Hamilton, New Zealand
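For anyone wanting to build the same kind of tally from their own logs, the bucketing is just integer division. A minimal sketch, assuming the scan times have already been extracted from the spamd log as a list of seconds; the function name and the sample values are made up and are not the figures above.

```python
from collections import Counter


def tally_scan_times(scan_times, bucket=100, minimum=1000):
    """Count scan times per bucket, rounding down to a multiple of `bucket`.

    Times below `minimum` seconds are ignored, as in the tally above.
    """
    return Counter(
        (int(seconds) // bucket) * bucket
        for seconds in scan_times
        if seconds >= minimum
    )


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = [1043.2, 1099.0, 3120.5, 4500.0, 798.0, 305.1]
    for bucket_start, count in sorted(tally_scan_times(sample).items()):
        print(f"{bucket_start:>6} {count:>4}")
```

Calling it with minimum=0 and a smaller bucket (say 10) would show whether the cluster at 300 to 310 seconds sits on one exact value, which would point at a fixed timeout rather than general load.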
Re: Over-scoring of SURBL lists...
Hi!

And again, it's not the over-lap in-and-of-itself that's a problem. It's when the overlap matches nonspam that problems occur. I don't have any nonspam samples onhand with surbl overlap. Only surbl/uribl overlap.

We get reports almost daily, most of them are only listed in one single list. So I don't share the feeling that it's a bad idea to overscore... I can safely say it's rather rare we get a FP notification with, let's say, 3-4 separate lists involved...

I don't think it's bad to overscore spam. I've never said that. I think it's REALLY bad to overscore rules which have potential to "group hit" nonspam.

Potentially *ALL* hits and rules can hit nonspam. The question is, do they, or is it working just fine as it is now and are we having a discussion just for the fuss?

Bye,
Raymond.
ApacheCon EU 2006 (fwd)
I will, of course, be there ;) --j. --- Forwarded Message Date:Fri, 17 Feb 2006 15:29:10 -0500 From:Rich Bowen <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Subject: ApacheCon EU 2006 The ApacheCon Planners are pleased to announce that ApacheCon Europe 2006 will be held in Dublin, Ireland, at the Burlington Hotel (http://www.jurysdoyle.com/ireland/doyle_burlington.htm), June 26-30. Further details to follow as they are available. CFP to follow shortly. Please feel free to spread this information far and wide. -- Rich Bowen [EMAIL PROTECTED] --- End of Forwarded Message
Re: procmail error or mine?
From: "Gene Heskett" <[EMAIL PROTECTED]> On Sunday 19 February 2006 03:45, jdow wrote: From: "Gene Heskett" <[EMAIL PROTECTED]> ===8<--- PROCMAILMATCH="X-Procmail: Matched on" PROCMAILHEADER="X-Procmail: " :0 fw * ^List-Id: .*(spamassassin\.apache.\org) | formail -A "$PROCMAILHEADER an SA list. Mail not processed." | :0 fw * ^TO_:.*([EMAIL PROTECTED]|users\.spamassassin\.apache\.org) | formail -A "$PROCMAILMATCH SpamAssassin Users list" -i "Reply-to: users@spamassassin.apache.org" ===8<--- It's a direct case of plagiarism, Joanne, since it's your script :) These two lines are important. I may have left them out of some of the snips I sent you. PROCMAILMATCH="X-Procmail: Matched on" PROCMAILHEADER="X-Procmail: " {^_^} (The fellow I got them from left them out when he sent me the procmail rules. The problem's contagious, I suspect.) See my original posting, Joanne; they are indeed there, at the top. :) Then I don't know - ask the procmail folks. {^_^}
Re: Question on long scan times - 4500 seconds with a blip at 300.
From: "Andrew Donkin" <[EMAIL PROTECTED]> Kevin Gagel: I'm finding that scans are taking as long as 798 seconds Daryl O'Shea: Scan times of 798 seconds are probably a result of a bayes expiry. If auto expiry is enabled (default) I'd disable it and run a manually expiry as a cron job. This is a tally of SA scan times at my site from January 14th, rounded down to the nearest 100 seconds, ignoring those below 1000 seconds: 1000 30 1100 41 1200 14 1300 5 3000 3 3100 35 3200 11 4400 1 4500 3 Those are combined figures from two load-balanced spamd hosts. Both hosts exhibited similar behaviour, so the problem is caused by something they share: a flood of incoming email, MySQL Bayes, or DNS lookups. I doubt it's Bayes because I have per-user stats, which makes the expiry runs rare and quick. And the RBL timeout is 10 seconds. More interestingly, here are the times and counts for all the scan times under 1000 seconds on that same day: 100-140 10 300-310 128 950-980 5 So what is about the five minute mark that is so attractive to SA? None under 100? I'm going to start putting "gosh this is taking a long time" debugging into spamd, but I would love to hear from anyone who has any ideas about it. That length of time is suggestive of DNS timeouts. Are you actually able to reach all the BL DNS servers you are trying to use? {^_^}
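One way to test the DNS-timeout theory is to time a lookup against each blocklist zone from the spamd hosts themselves. The following is a rough sketch that assumes nothing about the site's resolver setup; `time_lookup` and the stub resolvers are hypothetical names, and the demo uses stubs so that it does not touch the network.

```python
import socket
import time


def time_lookup(name, resolver=socket.gethostbyname):
    """Return (resolved_ok, elapsed_seconds) for one hostname lookup."""
    start = time.monotonic()
    try:
        resolver(name)
        ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        ok = False
    return ok, time.monotonic() - start


def fake_ok(name):
    """Stub resolver that always answers (illustration only)."""
    return "127.0.0.2"


def fake_fail(name):
    """Stub resolver that always fails (illustration only)."""
    raise socket.gaierror("simulated lookup failure")


for label, stub in (("ok", fake_ok), ("fail", fake_fail)):
    resolved, elapsed = time_lookup("zone.example", resolver=stub)
    print(label, resolved, f"{elapsed:.3f}s")
```

To probe real servers, call `time_lookup` with the default resolver and a name under each zone the site actually queries; lookups that take many seconds, or that fail outright, would point towards DNS rather than Bayes.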
Re: URIBL_BLACK + OB_SURBL double-listed nonspam domain
List Mail User wrote:
>> Paul.. None of those pages contain a link. The user would have to
>> copy-paste or hand-type the url. That would defeat any referrer mechanism.
>
> Also, whether cut&paste generates a referral all depends on your
> browser and the setting used in some (e.g. Opera).

Yeah, provided they re-enter it in the same window, and their browser is configured to generate a referrer for that, it would work. But that's hardly very reliable, so most affiliate systems use trackers. It makes life easier on the company paying out marketing fees.

>> Short of hiring the psychic friends network scotts does not have a way
>> of tracking these back to an affiliate. Period.
>
> Nope; And if I can find them 6 months later in one chain's pages,
> would you like to bet they were also referenced by Orchard Supply, Home
> Depot, Lowes and dozens of other large chains and smaller businesses also.

Oh my god.. and next, these stores might put up the products they sell on their website!!! Oh my god! Affiliate marketing at its finest! I bet if you go to lowes.com you can find a link to buy a bag of Scotts fertilizer on! http://www.lowes.com/lowes/lkn?action=productDetail&productId=0-446-36615&lpage=none See! It exists! Lowes is an affiliate spammer!

So What? I'd assume every hardware store in the NATION had a link or mention of winterizewithscotts on their website, and on standees on their store floor next to the bags of fertilizer.

> At this point, I no longer believe that the winterizewithscotts
> case is a FP,

Wait.. Let me get this straight.

Scenario 1) A hardware store hires a spammer to forge his store's domain name as the From: address of the spam advertising viagra. He's hoping this directs more consumers to his website by them copy-pasting the domain from the From: address of a spam into a web browser. When they get there, he hopes they'll find the promotion for the Scotts sweepstakes. Based on that, he hopes they'll order the fertilizer from him, in order to enter the sweepstakes. Thereby making a profit. And this is more likely than a regular store owner who has mention of a sweepstakes on his site that happened to get randomly Joe-Jobbed by a spammer. Or is it merely the fact that the store owner mentions a sweepstakes on his website that makes him an affiliate spammer, without regard for any email ever being sent?

Scenario 2) Someone posts a URL to a sweepstakes on a web forum. He has no links through which anyone can buy the product, so no chance of profit by sale. His URL contains no IDs and is not linked. At best, a copy-paste of the URL might create a referrer back to the forum, not any site he operates. Based on that, the Scotts company is going to track the referrer back to the forum, and understand what person posted the URL and pay them for it? And this is more likely than just some excited average joe posting on the web because he found a way to register without buying any fertilizer?

And based on those two scenarios, you've solidly concluded it's not a FP? I'm sorry, this discussion is over and I will take no further part. You're 100% convinced that this is a grandiose affiliate marketing scheme. I'm sorry, those two scenarios are WAY too far fetched for me. You may as well have said that Elvis is alive and helping them along with Jimmy Hoffa, the aliens, and that the CIA is covering the whole thing up. It is equally plausible to me. Perhaps you should go work for the tabloids. If "proof" like that is convincing to you they should be able to find you a great job in "investigative reporting".
Re: Over-scoring of SURBL lists...
Theo Van Dinter wrote: > On Sun, Feb 19, 2006 at 02:20:05AM -0500, Matt Kettler wrote: > How can we keep the spam tagged, and try to mitigate the FPs by keeping additive scores for multiple URIBLs more moderate? +20 worth of URIBL hits is fine on spam, but astronomically high scores don't really help SA when the tagging threshold is +5. However, they do hurt SA when overlapping mistakes happen. >> Yes.. which is exactly who I was primarily trying to reach by posting >> here on the spamassassin, before this turned into a large >> misunderstanding between the URIBL operators and myself. >> > > I have two things related to this: > > 1- if the lists are indeed separate (ie: different sources, etc,) >then having multiple rules makes sense. > They're about 95% separate.. They're all separately maintained, and have a lot of different approaches to making sure a listing is valid. I don't think there are any direct cross-feeds where one spamtrap operator feeds their trap data to multiple lists. However, there's some potential for duplicate input because of the end-user-reporting. Take Joe user, who gets a message he considers spam. He runs spamassassin -r on it, reporting the message to spamcop, and Razor (e8 is uri based, so relevant here. Pyzor, and DCC will also be reported, but less relevant). The Spamcop report would require multiple reports, but if it happens that feeds into SC and AB, which then re-check the URIs. He then pulls out a few URIs, and manually reports them to URIBL. He then goes to rulesemporium.com and reports it to WS. If he's got an outblaze account, he could also report to OB. All of the above have differing degrees of checking to make sure the link isn't a false report. So you need to have multiple failures occur in order for FPs to happen. But I found two examples in a search of 100 nonspam emails at work and 218 at home.
Admittedly these were examples on separate sites, and with two lists which are generally high-fp for me, but it shows that failures can cascade. While this is an "extreme" case, and most of these lists have user reports as a small percent of their total input, it does show how the same message can have some That's why I'm suggesting we consider a base+offset approach to surbl. It allows each list to be scored independently, but also allows the perceptron to allocate scores that reflect the overlap. >related to this, I mentioned earlier in the thread about a bug I found >in the reuse section of mass-check while generating some statistics. >we used the reuse code to generate the 3.1 scores. however, due >to the bug, rule hits were lost. so it's hard to say exactly what >occurred because of it, but the scores generated for network tests >(those that enabled reuse anyway) are almost definitely miscalculated, >and potentially very miscalculated (see the same previous post about >the "way different" SURBL WS rule hits that I found). > Yeah, that's bad.. What surprises me is the actual magnitude of the results. My own experience is that WS and OB both have FP problems, but they're on about the same level. URIBL_BLACK has at least 10x more FPs than all the surbl hosted lists combined, including WS... But you guys see less. > > We're trying to get updates going for 3.1, and I'm hoping to get scores > generated more frequently after that's set up. Perhaps the next set of > scores will address your issue more directly? Possibly. > Is the problem more that in the past there weren't a large number of FPs and > now there are? > In the past FPs were rare and always confined to one list. In the past 6 months I've seen a dramatic increase in FPs from WS, OB and BLACK.
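The effect of a base+offset scheme can be checked with simple arithmetic. This is a minimal sketch using the scores floated earlier on the list (a shared URIBL_SURBL base of 2.0 plus one offset per list); the function name is hypothetical, and this is not how SpamAssassin itself combines rule scores.

```python
# Scores as proposed earlier on the list: one shared base that fires once
# when any SURBL list hits, plus a per-list offset.
BASE = 2.0
OFFSETS = {
    "AB": 1.812,
    "JP": 2.087,
    "OB": 1.008,
    "PH": 0.800,
    "SC": 2.498,
    "WS": 0.140,
}


def surbl_total(lists_hit):
    """Total SURBL score for a message listed on the given lists."""
    if not lists_hit:
        return 0.0
    return BASE + sum(OFFSETS[name] for name in lists_hit)


print(round(surbl_total(["WS", "OB"]), 3))
print(round(surbl_total(["SC", "JP"]), 3))
```

The base is counted once however many lists hit, which is what keeps a two-list overlap on the weaker lists well under the 5.0 threshold while a double hit on the stronger lists still clears it.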
Re: URIBL_BLACK + OB_SURBL double-listed nonspam domain
From: "Matt Kettler" <[EMAIL PROTECTED]> List Mail User wrote: Paul.. None of those pages contain a link. The user would have to >copy-paste or hand-type the url. That would defeat any referrer mechanism. Also, whether cut&paste generates a referral all depends on your browser and the setting used in some (e.g. Opera). Yeah, provided they re-enter it in the same window, and their browser is configured to generate a referrer for that, it would work. But that's hardly very reliable, so most affiliate systems use trackers. It makes life easier on the company paying out marketing fees. Short of hiring the psychic friends network scotts does not have a way >of tracking these to back to an affiliate. Period. Nope; And if I can find them 6 months later in one chain's pages, would you link to bet they were also referenced by Orchard Supply, Home Depot, Lowes and dozens of other large chains and smaller businesses also. Oh my god.. and next, these stores might put up the products they sell on their website!!! Oh my god! Affiliate marketing at it's finest! I bet if you go to lowes.com you can find a link to buy a bag of Scotts fertilizer on! http://www.lowes.com/lowes/lkn?action=productDetail&productId=0-446-36615&lpage=none See! It exists! Lowes is an affiliate spammer! So What? I'd assume every hardware store in the NATION had a link or mention of winterizewithscotts on their website, and on standees on their store floor next to the bags of fertilizer. At this point, I no longer believe that the winterizewithscotts case is a FP, Wait.. Let me get this straight. Scenario 1:) A hardware store hires a spammer to forge his stores domain name as the From: address of the spam advertising viagra. He's hoping this directs more consumers to his website by them copy-pasting the domain from the From: address of a spam into a web browser. When they get there, he hopes they'll find the promotion for the scott's sweepstakes. 
Based on that, he hopes they'll order the fertilizer from him, in order to enter the sweepstakes. Thereby making a profit. And this is more likely than a regular store owner who has mention of a sweepstakes on his site that happened to get randomly Joe-Jobbed by a spammer. Or is it merely the fact that the store owner mentions a sweepstakes on his website that makes him an affiliate spammer, without regard for any email ever being sent? Secnario 2) Someone posts a URL to a sweepstakes on a web forum. He has no links through which anyone can buy the product, so no chance of profit by sale. His URL contains no IDs and is not linked. At best, a copy-paste of the URL might create a referrer back to the forum, not any site he operates. Based on that, scotts company is going to track the referrer back to the fourum, and understand what person posted the URL and pay them for it? And this is more likely than just some excited average joe posting on the web because he found a way to register without buying any fertilizer? And based on those two scenarios, you've solidly concluded it's not a FP? I'm sorry, this discussion is over and I will take no further part. You're 100% convinced that this is a grandiose affiliate marketing scheme. I'm sorry, Those two scenarios are WAY too far fetched for me. You may as well have said that Elvis is alive and helping them along with Jimmy Hoffa, the aliens, and that the CIA is covering the whole thing up. It is equally plausible to me. Perhaps you should go work for the tabloids. If "proof" like that is convincing to you they should be able to find you a great job in "investigative reporting". What it boils down to, Matt, is that it's easy enough to see that a fairly large number of people considered a "within policy" emailing from Scotts from that winterize address considered it to be spam and complained. The upshot of that is "tough". That's the way the world works; and, it's not likely to change. You proposed some alternate scores. 
Have you run them? Do they improve your overall results or do they make them worse? {^_^}
Re: Over-scoring of SURBL lists...
From: "Matt Kettler" <[EMAIL PROTECTED]> Theo Van Dinter wrote: On Sun, Feb 19, 2006 at 02:20:05AM -0500, Matt Kettler wrote: How can we keep the spam tagged, and try to mitigate the FPs by keeping additive scores for multiple URIBLs more moderate? +20 worth of URIBL hits is fine on spam, but astronomically high scores don't really help SA when the tagging threshold is +5. However, they do hurt SA when overlapping mistakes happen. Yes.. which is exactly who I was primarily trying to reach by posting here on the spamassassin, before this turned into a large misunderstanding between the URIBL operators and myself. I have two things related to this: 1- if the lists are indeed separate (ie: different sources, etc,) then having multiple rules makes sense. They're about 95% separate.. They're all separately maintained, and have a lot of different approaches to making sure a listing is valid. I don't think there's any direct cross-feeds where one spamtrap operator feeds their trap data multiple lists. However, there's some potential for duplicate input because of the end-user-reporting. This is a potential if a list will add a site on the basis of ONE spam report. When it takes ten or twenty or more spam reports then sites will get listed. Your Scotts example is an example of how a large number of people would be likely to consider it to be spam and complain. Upon receiving the complaints even a whois lookup to confirm it was Scotts would not absolve the company for their spam run. Their contest site did not ANYWHERE obvious say that you'd be receiving promotional emailings from Scotts as well as contest data. Thus Scotts DID spam. They got listed. Find a better example. Take Joe user, who gets a message he considers spam. He runs spamassassin -r on it, reporting the message to spamcop, and Razor (e8 is uri based, so relevant here. Pyzor, and DCC will also be reported, but less relevant). 
The Spamcop report would require multiple reports, but if it happens that feeds into SC and AB, which then re-check theURIs. He then pulls out a few URIs, and manualy reports them to URIBL. He then goes to rulesemporium.com and reports it to WS. If he's got an outblaze account, he could also report to OB. Average user is one of your customers. Do THEY run spamassassin -r? ... That's why I'm suggesting we consider a base+offset approach to surbl. It allows each list to be scored independently, but also allows the perceptron to allocate scores that reflect the overlap. You are suggesting something that may well be valid. What are your testing results from the suggestion? YOU control the scores on your site, in the final analysis. An /etc/mail/spamassassin/ZZZ_local.cf will get parsed last and can override the BL scores. Feed it your score suggestions and report the results. "I think" is interesting. "I tested it and got..." is vastly more compelling and interesting. "I'm from Missouri, show me." And I have found that people do not have to be from Missouri to feel "show me" rather than "it stands to reason" or "it should work better" or "I think." The only time the latter seems to win is when politics are involved. related to this, I mentioned earlier in the thread about a bug I found in the reuse section of mass-check while generating some statistics. we used the reuse code to generate the 3.1 scores. however, due to the bug, rule hits were lost. so it's hard to say exactly what occured because of it, but the scores generated for network tests (those that enabled reuse anyway) are almost definitely miscalculated, and potentially very miscalculated (see the same previous post about the "way different" SURBL WS rule hits that I found). Yeah, that's bad.. What surprises me is the actual magnitude of the results. My own experience is that WS and OB both have FP problems, but they're on about the same level. 
URIBL_BLACK has at least 10x more FPs than all the surbl hosted lists combined, including WS... But you guys see less. 12URIBL_BLACKB 8988 1.66 10.77 30.770.03 I begin to suspect that this indicates a vast dichotomy between a large ISP experience, or at least your specific environment's experience, and that of smaller machines. It MAY be that the type of folks you deal with are sharing quite different interests from the folks who go for "roll your own" or boutique ISPs. This is worth investigating. It may be that different scoring regimes are required for the different customer bases. I would not in the least be surprised if that is the case. I'd actually be surprised if it is not the case. We're trying to get updates going for 3.1, and I'm hoping to get scores generated more frequently after that's setup. Perhaps the next set of scores will address your issue more directly? Possibly. Is the problem more that in the past there weren't a large number of FPs and now there are? In the past FPs were rare and always confined to
Re: Over-scoring of SURBL lists...
jdow wrote: > > This is a potential if a list will add a site on the basis of ONE > spam report. When it takes ten or twenty or more spam reports then > sites will get listed. Your Scotts example is an example of how a > large number of people would be likely to consider it to be spam > and complain. Upon receiving the complaints even a whois lookup to > confirm it was Scotts would not absolve the company for their spam > run. Their contest site did not ANYWHERE obvious say that you'd be > receiving promotional emailings from Scotts as well as contest data. So question. Can anyone actually produce a promotional mailing containing a winterizewithscotts.com URL in it? Paul has theorized it may have happened. Did it? I've not seen a sample-spam yet. I personally never entered this contest. I got a link to it through their lawn-care-update email service. Something I very much did opt-in for. I've never gotten any other promotional materials from them, other than the newsletter I subscribe to. > Thus Scotts DID spam. They got listed. Find a better example. Did they? Are you sure? Jumping from "Reading with a skeptics mind I can see how their privacy policy could be construed to allow marketing material" is quite a bit different from "Scotts actually did send promotional email to unwitting customers". Quite frankly, the way *I* read the scotts privacy policy, they CANNOT send you promotional materials merely for entering the contest. I'd very much like to see a sample of one, if it really did happen. Anyone? > >> Take Joe user, who gets a message he considers spam. He runs >> spamassassin -r on it, reporting the message to spamcop, and Razor (e8 >> is uri based, so relevant here. Pyzor, and DCC will also be reported, >> but less relevant). The Spamcop report would require multiple reports, >> but if it happens that feeds into SC and AB, which then re-check >> theURIs. He then pulls out a few URIs, and manualy reports them to >> URIBL. 
He then goes to rulesemporium.com and reports it to WS. If he's >> got an outblaze account, he could also report to OB. > > Average user is one of your customers. Do THEY run spamassassin -r? > I did say it was an extreme example. I'm not talking about the common, I'm talking about what's possible in the worst-case. > ... > >> That's why I'm suggesting we consider a base+offset approach to surbl. >> It allows each list to be scored independently, but also allows the >> perceptron to allocate scores that reflect the overlap. > > You are suggesting something that may well be valid. What are your > testing results from the suggestion? YOU control the scores on your > site, in the final analysis. An /etc/mail/spamassassin/ZZZ_local.cf > will get parsed last and can override the BL scores. Feed it your > score suggestions and report the results. > I fully intend to do so when I'm next at my site. > > Vast increase From one in 100,000 to one in 1,000? That would be > dramatic and it would lead to a multiple list hit overlap issue, as well. > The overlap might be down in the one in 10,000 level. But with a million > mails a day to handle that's 100 complaints, more than any sane ISP would > enjoy handling. I'm processing about 4k messages per day, with about half of that being spam (I'm partially greylisting to reduce my spam totals before SA). I'm seeing enough to find a double-list between uribl and surbl once a week. > > At the moment you are focused on something you see as a sure cure. No, I see it as worthy of consideration. And testing, which I'm already planing on. Right now I'm already testing all those MULTI meta rules with negative scores in-hand (-1.5 for URIBL_OVERLAP, -0.5 for SURBL_MULTI1 and 2, -0.2 for the rest.) I can only test one approach at a time. > You might > be right. Only you are in the position to TEST your proposal. I don't see > anyone here rushing in to take the risk of it being wrong so that you can > point a finger when the idea backfires. 
(Hey, after 40 years in industry > a person learns about this trick and gets, perhaps, a little overly > cynical from repeated experience. {^_-}) > > You MIGHT also think out of the box. Are there other things that can be > done to mitigate the problem? I suspect there are. They'd require some > tool > construction. If there is somebody on the list wanting some > suggestions for > some perl hacking I can dredge my emails to Matt for some interesting > tool > ideas. Some might directly help Jeff more than Matt while others would > benefit someone in Matt's shoes more than most other folks. That would be interesting too.
Re: Over-scoring of SURBL lists...
On Thursday, February 16, 2006, 9:07:48 PM, Theo Dinter wrote:
> I was going to tell you that the stats were real-time, but that's only
> true for the SURBL rules. URIBL hits aren't being reused during the
> weekly runs since 3.1 doesn't have those rules.
> So I did a small tweak and generated my own live results for mails from
> the past 2 weeks:
>
>  MSECS    SPAM%     HAM%    S/O   RANK  SCORE  NAME
>      0    25247     4117  0.860   0.00   0.00  (all messages)
>    0.0  85.9794  14.0206  0.860   0.00   0.00  (all messages as %)
> 35.418  41.1930       0.  1.000   0.90   0.00  URIBL_JP_SURBL
> 34.665  40.3177       0.  1.000   0.88   0.00  URIBL_SC_SURBL
> 26.069  30.3204       0.  1.000   0.80   0.00  URIBL_AB_SURBL
> 28.024  32.5464   0.2915  0.991   0.61   0.00  URIBL_OB_SURBL
> 48.113  55.7492   1.2873  0.977   0.55   0.00  URIBL_BLACK
>  0.293   0.3406       0.  1.000   0.47   0.00  URIBL_PH_SURBL
>  0.000       0.       0.  0.500   0.42   0.00  URIBL_RED
>  0.000       0.       0.  0.500   0.42   0.01  T_URIBL_XS_SURBL
> 37.539  42.4763   7.2626  0.854   0.38   0.00  URIBL_WS_SURBL
>  0.548   0.3446   1.7974  0.161   0.03   0.00  URIBL_GREY

Should the ham hit rate of WS really be 7%? That seems rather high. May we ask you to please double check that result?

Cheers, Jeff C.
--
Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
Re: Question on long scan times - 4500 seconds with a blip at 300.
From: "Andrew Donkin" <[EMAIL PROTECTED]> This is a tally of SA scan times at my site from January 14th, rounded down to the nearest 100 seconds, ignoring those below 1000 seconds: 1000 30 1100 41 1200 14 1300 5 3000 3 3100 35 3200 11 4400 1 4500 3 Those are combined figures from two load-balanced spamd hosts. Both hosts exhibited similar behaviour, so the problem is caused by something they share: a flood of incoming email, MySQL Bayes, or DNS lookups. I doubt it's Bayes because I have per-user stats, which makes the expiry runs rare and quick. And the RBL timeout is 10 seconds. More interestingly, here are the times and counts for all the scan times under 1000 seconds on that same day: 100-140 10 300-310 128 950-980 5 Are you using large deprecated rule sets like blacklist.cf (sa-blacklist.cf) and/or bigevil.cf? If so, they should not be used. Gary V
Re: Over-scoring of SURBL lists...
From: "Matt Kettler" <[EMAIL PROTECTED]> jdow wrote: This is a potential if a list will add a site on the basis of ONE spam report. When it takes ten or twenty or more spam reports then sites will get listed. Your Scotts example is an example of how a large number of people would be likely to consider it to be spam and complain. Upon receiving the complaints even a whois lookup to confirm it was Scotts would not absolve the company for their spam run. Their contest site did not ANYWHERE obvious say that you'd be receiving promotional emailings from Scotts as well as contest data. So question. Can anyone actually produce a promotional mailing containing a winterizewithscotts.com URL in it? Are you are proposing multiple BLs listed that site on a whim? Which BLs WILL list based on one complaint? My read is that the complaints obviously happened or the listing would not have happened. I can see why it might have happened. It's annoying that it happens. But I lay it down to overzealous marketdroids rather than overzealous BL folks. Also remember that CAN SPAM not withstanding (and in AOL speak) "NO SANE PERSON EVER RESPONDS TO A TAKE ME OFF THIS LIST ADDRESS FOR A MESSAGE THEY CONSIDER TO BE SPAM." I'll admit to moments of what might be insanity in sending emails to a certain Acura dealer in Houston and to Toyota corporate abuse address after receiving relatively large amounts of "warranty related" spam from that fscking dealer in Houston, a city I have visited maybe twice in my life. I never visited the dealer let alone bought anything there, and I don't even own a Toyota. Paul has theorized it may have happened. Did it? I've not seen a sample-spam yet. If the BL managers kept the offending emails that were relayed to them I expect you're about to be overwhelmed with examples. But you may be in luck. I'm not sure I'd keep the complaints around once I had checked it out. 
However, after one persistent "problem" you can bet I would and that I'd send a packet of the data to the person complaining about the listing being spurious. I personally never entered this contest. I got a link to it through their lawn-care-update email service. Something I very much did opt-in for. I've never gotten any other promotional materials from them, other than the newsletter I subscribe to. Personally, I kinda know more about spam filters than is healthy for an individual so I almost never opt in to marketing mailing lists. If I do I specifically whitelist them. (If they keep changing sending addresses, as one does, I soon ignore them and let them feed my spam bucket. I take that behavior to indicate they aren't worth reading.) I found that it is generally better to put a bookmark in my browser and when I get bored go visit it. I'm actually more sure to see that than the newsletters. But that's me, a goofy old bi . {^_-} Thus Scotts DID spam. They got listed. Find a better example. Did they? Are you sure? Jumping from "Reading with a skeptics mind I can see how their privacy policy could be construed to allow marketing material" is quite a bit different from "Scotts actually did send promotional email to unwitting customers". Well, wait a moment and view it from the standpoint of someone who received a promotional emailing after signing up for the contest. THEY might view the emailing as spam even if you might not due to using a more rigorous definition. Now, if that is the case how are the BL reviewers supposed to figure all this out? Quite frankly, the way *I* read the scotts privacy policy, they CANNOT send you promotional materials merely for entering the contest. I'd very much like to see a sample of one, if it really did happen. I kinda hope you get an avalanche. But am not sure you'd get it unless Jeff is REALLY annoyed by now. {^_-} Take Joe user, who gets a message he considers spam. 
He runs spamassassin -r on it, reporting the message to spamcop, and Razor (e8 is uri based, so relevant here. Pyzor, and DCC will also be reported, but less relevant). The Spamcop report would require multiple reports, but if it happens that feeds into SC and AB, which then re-check theURIs. He then pulls out a few URIs, and manualy reports them to URIBL. He then goes to rulesemporium.com and reports it to WS. If he's got an outblaze account, he could also report to OB. Average user is one of your customers. Do THEY run spamassassin -r? I did say it was an extreme example. I'm not talking about the common, I'm talking about what's possible in the worst-case. It would require someone interested enough in lawn care or racing to sign up for the offer or otherwise get suckered into giving Scotts an email address who is also motivated to complain to the BLs, and in fact multiple BLs before a single complaint is issued. I am not sure the set intersection for BL reporters and lawn care enthusiasts is all that large. And it would have to be large enough to trigger the mark as spam thresholds at the BL listers. If
RE: Exiscan + subject rewrite not working
You're right. You have to set a system filter; documentation is a little sparse on that. This link was helpful: http://ws.edu.isoc.org/workshops/2004/CEDIA/presentaciones/bc/correo/exim/EximPrac.html It works now, thanks for the reply. -----Original Message----- From: Tony Finch [mailto:[EMAIL PROTECTED] Sent: Sunday, February 19, 2006 3:40 PM To: Terry Miller Cc: users@spamassassin.apache.org Subject: Re: Exiscan + subject rewrite not working On Sun, 19 Feb 2006, Terry Miller wrote: > I looked this up and can't see where I'm doing anything wrong, but the > subject is not being rewritten. You should probably ask this question on the exim-users list. I suspect (but I am not certain) that exiscan doesn't support the message rewrite parts of SpamAssassin, and instead relies on Exim's own facilities for this. Tony. -- f.a.n.finch <[EMAIL PROTECTED]> http://dotat.at/ BISCAY: WEST 5 OR 6 BECOMING VARIABLE 3 OR 4. SHOWERS AT FIRST. MODERATE OR GOOD.
Re: Over-scoring of SURBL lists...
From: "Jeff Chan" <[EMAIL PROTECTED]>

On Thursday, February 16, 2006, 9:07:48 PM, Theo Dinter wrote:

I was going to tell you that the stats were real-time, but that's only true for the SURBL rules. URIBL hits aren't being reused during the weekly runs since 3.1 doesn't have those rules. So I did a small tweak and generated my own live results for mails from the past 2 weeks:

  MSECS    SPAM%     HAM%    S/O   RANK  SCORE  NAME
      0    25247     4117  0.860   0.00   0.00  (all messages)
    0.0  85.9794  14.0206  0.860   0.00   0.00  (all messages as %)
 35.418  41.1930       0.  1.000   0.90   0.00  URIBL_JP_SURBL
 34.665  40.3177       0.  1.000   0.88   0.00  URIBL_SC_SURBL
 26.069  30.3204       0.  1.000   0.80   0.00  URIBL_AB_SURBL
 28.024  32.5464   0.2915  0.991   0.61   0.00  URIBL_OB_SURBL
 48.113  55.7492   1.2873  0.977   0.55   0.00  URIBL_BLACK
  0.293   0.3406       0.  1.000   0.47   0.00  URIBL_PH_SURBL
  0.000       0.       0.  0.500   0.42   0.00  URIBL_RED
  0.000       0.       0.  0.500   0.42   0.01  T_URIBL_XS_SURBL
 37.539  42.4763   7.2626  0.854   0.38   0.00  URIBL_WS_SURBL
  0.548   0.3446   1.7974  0.161   0.03   0.00  URIBL_GREY

Should the ham hit rate of WS really be 7%? That seems rather high. May we ask you to please double check that result?

Indeed! I get this here out of 90,000 mail messages:

  7  URIBL_WS_SURBL  12628  2.16  13.90  39.91  0.03

That's hitting 40% of everything marked as spam and only 0.03% of everything marked as ham here. {o.o}
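For anyone wanting to sanity-check the S/O column, here is a minimal Python sketch. It assumes S/O is simply SPAM% / (SPAM% + HAM%); that formula is a reading of the numbers, not something stated in this thread, although it does reproduce the rows quoted here. The function name so_ratio is made up for illustration, and returning 0.5 for a rule with no hits is likewise an assumption based on the 0.500 rows.

```python
def so_ratio(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hits that fall on spam.

    Assumes S/O = SPAM% / (SPAM% + HAM%); this is an assumption,
    not a formula taken from the thread.
    """
    total = spam_pct + ham_pct
    if total == 0:
        # No hits at all: treat the rule as uninformative (assumed).
        return 0.5
    return spam_pct / total


# (SPAM%, HAM%) pairs copied from the results quoted in this thread.
rows = {
    "URIBL_OB_SURBL": (32.5464, 0.2915),
    "URIBL_BLACK": (55.7492, 1.2873),
    "URIBL_WS_SURBL": (42.4763, 7.2626),
    "URIBL_GREY": (0.3446, 1.7974),
}

for name, (spam_pct, ham_pct) in rows.items():
    print(f"{name:16s} S/O = {so_ratio(spam_pct, ham_pct):.3f}")
```

Under that assumption a 7% ham hit rate is what drags WS down to 0.854, while OB and BLACK stay above 0.97.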
Re: Over-scoring of SURBL lists...
On Sun, Feb 19, 2006 at 07:19:28PM -0800, Jeff Chan wrote:
> > 37.539 42.4763 7.2626 0.854 0.38 0.00 URIBL_WS_SURBL
>
> Should the ham hit rate of WS really be 7%? That seems rather
> high. May we ask you to please double check that result?

I was waiting for someone to ask me about that. Here's my net results from Saturday's run:

 37.737 42.2554 7.2824 0.853 0.56 0.00 URIBL_WS_SURBL

That is a valid percentage. 921 hits in 12647 total results = 7.2824%. I verified that all of the 921 hits are in fact listed in the header.

-- 
Randomly Generated Tagline: "No one cares if you backup - only if you can restore." - W. Curtis Preston, Unix Backup & Recovery from O'Reilly
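The percentage is easy to re-derive from the two counts given in this message. A small Python sketch follows; the helper name hit_rate is purely illustrative, and reading 12647 as the ham total is an assumption based on the figure sitting in the HAM% column.

```python
def hit_rate(hits: int, total: int) -> float:
    """Return hits as a percentage of total messages."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * hits / total


ham_hits = 921     # results that hit URIBL_WS_SURBL, as quoted
ham_total = 12647  # total results in the run (assumed to be the ham count)

print(f"{hit_rate(ham_hits, ham_total):.4f}%")
```

Rounded to four places this agrees with the 7.2824% quoted, so the arithmetic is not the problem; the question is which ham messages are hitting.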
Re: URIBL_BLACK + OB_SURBL double-listed nonspam domain
>... >List Mail User wrote: > >> Paul.. None of those pages contain a link. The user would have to >> >copy-paste or hand-type the url. That would defeat any referrer mechanism. >> >> >> Also, whether cut&paste generates a referral all depends on your >> browser and the setting used in some (e.g. Opera). >> >> >Yeah, provided they re-enter it in the same window, and their browser is >configured to generate a referrer for that, it would work. >But that's hardly very reliable, so most affiliate systems use trackers. >It makes life easier on the company paying out marketing fees. > >> Short of hiring the psychic friends network scotts does not have a way >> >of tracking these back to an affiliate. Period. >> >> >> Nope; And if I can find them 6 months later in one chain's pages, >> would you like to bet they were also referenced by Orchard Supply, Home >> Depot, Lowes and dozens of other large chains and smaller businesses also. >Oh my god.. and next, these stores might put up the products they sell >on their website!!! Oh my god! Affiliate marketing at its finest! I bet >if you go to lowes.com you can find a link to buy a bag of Scotts >fertilizer on! > >http://www.lowes.com/lowes/lkn?action=productDetail&productId=0-446-36615&lpage=none > >See! It exists! Lowes is an affiliate spammer! Only if they give your email address to someone else (like Scotts). Or if they mailed you a newsletter that contained links to other sites, *OR* transferred any of your personal data to other companies/sites for marketing purposes and received some kind of compensation for it. *** Oh my god! They do - See the fourth paragraph at: http://www.loews.com/loews.nsf/Legal.htm The one labeled "Links to Third Party Sites". Looks like it meets federal legal requirements, but smells like too many lawyers were involved in writing that page; It sure reads like they try to claim that they will not be responsible for spam which is expected by using their website's links. 
They even specifically state that they *will and do* transfer your personal information to at least some of those sites *and* that they make no attempt to check on what those sites will do with the data which Loews gives them. It looks like anyone should avoid following any links from the Loews' site to any third party website based on that statement! OK - you win this round: Loews does indeed appear to be an affiliate spammer. (You just didn't find the evidence - it was elsewhere.) > >So What? I'd assume every hardware store in the NATION had a link or >mention of winterizewithscotts on their website, and on standees on >their store floor next to the bags of fertilizer. > >> At this point, I no longer believe that the winterizewithscotts >> case is a FP, > >Wait.. Let me get this straight. > > >Scenario 1:) >A hardware store hires a spammer to forge his store's domain name as the >From: address of the spam advertising viagra. >He's hoping this directs more consumers to his website by them >copy-pasting the domain from the From: address of a spam into a web browser. >When they get there, he hopes they'll find the promotion for the scott's >sweepstakes. >Based on that, he hopes they'll order the fertilizer from him, in order >to enter the sweepstakes. Thereby making a profit. Huh? (Lookup "strawman" in a dictionary, please.) > >And this is more likely than a regular store owner who has mention of a >sweepstakes on his site that happened to get randomly Joe-Jobbed by a >spammer. Again, huh? > >Or is it merely the fact that the store owner mentions a sweepstakes on >his website that makes him an affiliate spammer, without regard for any >email ever being sent? I guess you don't get email from Home Depot - just sign up and it will come - including "specials" and links to suppliers' web pages. If you didn't willingly agree, then they would be an "affiliate" spammer; They might be - I always write "NO EMAIL" on the forms. 
Probably similar to the Loews policy above - Consumer beware. > >Scenario 2) >Someone posts a URL to a sweepstakes on a web forum. He has no links >through which anyone can buy the product, so no chance of profit by >sale. His URL contains no IDs and is not linked. At best, a copy-paste >of the URL might create a referrer back to the forum, not any site he >operates. Based on that, scotts company is going to track the referrer >back to the forum, and understand what person posted the URL and pay >them for it? Just because no one gets paid, doesn't mean consent was granted. And are you *sure* the "discount forum" doesn't get paid? The various "coupon" sites all over the web do; Try "discount coupon" in Yahoo! or Google - many of those sites have forums also. > >And this is more likely than just some excited average joe posting on >the web because he found a way to register without buying any fertilizer? Qui
Re: procmail error or mine?
On Sunday 19 February 2006 19:03, jdow wrote: >From: "Gene Heskett" <[EMAIL PROTECTED]> > >> On Sunday 19 February 2006 03:45, jdow wrote: >>>From: "Gene Heskett" <[EMAIL PROTECTED]> >>> >===8<--- >PROCMAILMATCH="X-Procmail: Matched on" >PROCMAILHEADER="X-Procmail: " > >:0 fw > >* ^List-Id: .*(spamassassin\.apache.\org) > >| formail -A "$PROCMAILHEADER an SA list. Mail not processed." >| >:0 fw > >* > ^TO_:.*([EMAIL PROTECTED]|users\.spamassassin\.apa >ch e\ .org) > >| formail -A "$PROCMAILMATCH SpamAssassin Users list" -i >| "Reply-to: > >users@spamassassin.apache.org" >===8<--- Its a direct case of plagerism Joanne since its your script :) >>> >>>These two lines are important. I may have left them out of some of >>>the snips I sent you. >>> >>>PROCMAILMATCH="X-Procmail: Matched on" >>>PROCMAILHEADER="X-Procmail: " >>> >>>{^_^} (The fellow I got them from left them out when he sent me >>> the procmail rules. The problem's contagious, I suspect.) >> >> See my original posting Joanne, they are indeed there, at the top. >> :) > >Then I don't know - ask the procmail folks. >{^_^}

It's not as if I am being drowned in spam; it's working pretty well, I think. I refeed it the stuff between 5.0 and 9.9 that's spam, and do the same for FPs and ham with borderline scores, and it seems to be getting better. The biggest problem is that I can't keep up with the proliferation of lists that dummies CC: on the Debian servers. I finally got upset and had anything that isn't caught by the previous Debian filters tossed in the debian-user box if it came from debian.org at all.

-- 
Cheers, Gene
People having trouble with vz bouncing email to me should add the word 'online' between the 'verizon', and the dot which bypasses vz's stupid bounce rules. I do use spamassassin too. :-)
Yahoo.com and AOL/TW attorneys please note, additions to the above message by Gene Heskett are: Copyright 2006 by Maurice Eugene Heskett, all rights reserved.
Re: URIBL_BLACK + OB_SURBL double-listed nonspam domain
List Mail User wrote:
> Huh? (Lookup "strawman" in a dictionary, please.)

That's my understanding of what you were claiming happened. Yes, it looks like an absurdly weak argument. However, it's the argument you presented, as best I can make sense of your posts.

Or are you admitting that you made those arguments intentionally as a straw man to confuse the issues?

> Scenario 3:)
> A hardware store posts a *direct* link to a rebate form (or had a
> pad of them on or near a shelf); Customer prints (or takes a copy of) the
> rebate form (with no opportunity to ever see the Scotts' web page telling
> him that any email he provides will be used for marketing); Customer buys
> a Scotts' product at the store and mails in the printed form, Customer then
> receives spam. QED
>
> Please read the rebate form, then read the Scotts' privacy policy.
>

As I read the form, and Scotts' privacy policy, Scotts cannot send you additional marketing information by email just because you entered this contest. As I read it they can only send you requested mail, but it depends on how you interpret the sentence structure.

Regardless, even if we accept your theory that their privacy policy allows it, it doesn't prove, or even suggest, that they did.

I find your willingness to accept a twisted reading of a privacy policy as satisfactory proof of spamming activity rather disturbing.
Yup, I really believe this remove link will work!
If you wish to stop future mailings, or if you feel you have been wrongfully placed in our membership, send a blank e mail with No Thanks in the sub ject to [EMAIL PROTECTED] -- You can kinda tell the parts of the spam the spammers don't consider important... Loren
Re: Over-scoring of SURBL lists...
On Sunday, February 19, 2006, 8:07:30 PM, Theo Dinter wrote:
> On Sun, Feb 19, 2006 at 07:19:28PM -0800, Jeff Chan wrote:
>> > 37.539 42.4763 7.2626 0.854 0.38 0.00 URIBL_WS_SURBL
>>
>> Should the ham hit rate of WS really be 7%? That seems rather
>> high. May we ask you to please double check that result?
> I was waiting for someone to ask me about that. Here's my net results
> from Saturday's run:
> 37.737 42.2554 7.2824 0.853 0.56 0.00 URIBL_WS_SURBL
> That is a valid percentage. 921 hits in 12647 total results = 7.2824%.
> I verified that all of the 921 hits are in fact listed in the header.

That seems an unusual result. Can you see if there's maybe one FP that's hitting a lot of newsletters or ads?

Jeff C.
-- 
Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
Re: URIBL_BLACK + OB_SURBL double-listed nonspam domain
>... >List Mail User wrote: >> Huh? (Lookup "strawman" in a dictionary, please.) >That's my understanding of what you were claiming happened. Yes, it >looks like an absurdly weak argument. However, it's the argument you >presented, as best I can make sense of your posts. > >Or are you admitting that you made those arguments intentionally as a >straw man to confuse the issues? > >> Scenario 3:) >> A hardware store posts a *direct* link to a rebate form (or had a >> pad of them on or near a shelf); Customer prints (or takes a copy of) the >> rebate form (with no opportunity to ever see the Scotts' web page telling >> him that any email he provides will be used for marketing); Customer buys >> a Scotts' product at the store and mails in the printed form, Customer then >> receives spam. QED >> >> Please read the rebate form, then read the Scotts' privacy policy. >> > >As I read the form, and Scotts privacy policy, Scotts cannot send you >additional marketing information by email just because you entered this >contest. As I read it they can only send you requested mail, but it >depends on how you interpret the sentence structure. > >Regardless, even if we accept your theory that their privacy policy >allows it, it doesn't prove, or even suggest, that they did. > >I find your willingness to accept a twisted reading of a privacy policy >as satisfactory proof of spamming activity rather disturbing. > Hold on here - there are two different pages/forms involved. One is for a contest - a copy still exists at: http://www.winterizewithscotts.com/index.tbapp?page=intro The other is a rebate form, with a copy at: http://www.winterizewithscotts.com/index.tbapp?page=rebate_page To get to the rebate form by navigating on Scotts' own pages you must go through a page which contains a link to a privacy policy which states that you do consent to receive promotional materials. But the rebate form itself does not say any such thing and if a third party (Lowes, Home Depot, etc.) 
just provided a link to the rebate form, or a consumer used a "rebate site", they would never see the policy. In all likelihood the "promotional material" is simply the mailings which you already get by choice, so you've never seen anything "extra". If you look, you'll see as I did that the contest page *does* have a link to the "privacy policy", and as such I consider anyone who signed up to have made "informed consent" (even if they are Joe Average, and didn't know that was what they were doing). So once again, I'll say: Based on the REBATE form, I believe it is more likely than not that at least some consumers did receive unsolicited commercial email from Scotts - i.e. spam. Now let's leave all the arguing about specific cases behind, because I believe you have a point regarding the interactions of the various RBLs used in SA, not just for URIs but also for DNS_* and RCVD_* rules. From my reading of the FAQ and the code used in the perceptron and to create meta-rules, there is a strong bias against creating meta-rules after the fact (i.e. after the perceptron run) and an extremely strong bias against negative scoring meta-rules (a separate issue, but related to the exponent used in weighting the construction of such rules). I also believe that a better result would occur with more URI RBLs in use (currently the SBL is the only IP based RBL used for URIs, and hence for NS lookups). A naive construction of all possible meta-rules will quickly expand exponentially into a huge number of cases with simple "and" clauses (and counting-clause meta-rules and other types of constructions would add even more cases). Also, there exist several RHS and IP URI lists which have a much higher FP rate than either any of the SURBLs (ignoring the recent 7% [ws] report) or URIBL, but which have proven very useful to me. 
Some of these include using the AHBL for URIs (it outperforms the RCVD rule for me, with both a better spam hit rate and a lower FP rate), the RFCI lists as URI rules, (armored clothing on) the SORBS spamtrap list, and the completewhois list; the last two have the advantage of being IP based, so they catch nameservers on IPs just as the SBL does. The evaluation of meta-rules is *much* less expensive than that of the primary rules (simple expressions only to evaluate, no additional DNS lookups or server queries required); If we consider the 5 SURBLs (not counting XS), 2 URIBL lists (black and grey, ignoring red), 5 RFCI lists, completewhois, SORBS' spamtrap list, the SBL and the RAZOR "CF"/e8 test(s), we have at least 16 rules which take URIs into account; All of the possible "and"'d meta-rule cases would give us millions of meta-rules, far too many. But by guessing which ones are reasonably related we could simply construct the 31 rules created by the set of RFCI lists (5 choose 1 plus 5 choose 2...) the 323 rules co