Responding to a lot of questions here. The lists contain both host names
and IP addresses. IP addresses everyone understands. So I'll talk about
host names. Wells Fargo Bank, for example (wellsfargo.com), is in the
white list, as are all of Wells Fargo's hosts. This bank sends nothing but
100% good email. But to avoid spoofing of pointer records you have to
use Forward Confirmed RDNS (FcRDNS).
1.2.3.4 PTR --> mail.example.com
mail.example.com A --> 1.2.3.4
This is nearly impossible to spoof.
The same is true for yellow lists. If the FcRDNS resolves to hotmail.com,
yahoo.com, or gmail.com, then you can skip all other IP testing because the
IP address tells you nothing about whether it is or isn't spam.
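
The forward-confirmed reverse DNS check described above can be sketched in Python roughly as follows. This is only an illustration, not the code any of these lists actually run: the function name fcrdns_hostname and the two injectable lookup parameters are assumptions made for the example, and the default lookups simply use the standard socket module.

```python
import socket


def _system_ptr(ip):
    # PTR lookup: IP address -> host name, via the system resolver.
    return socket.gethostbyaddr(ip)[0]


def _system_a(host):
    # Forward lookup: host name -> set of addresses it resolves to.
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def fcrdns_hostname(ip, ptr_lookup=None, a_lookup=None):
    """Return the forward-confirmed host name for ip, or None.

    ptr_lookup and a_lookup are hypothetical hooks so the logic can be
    exercised without touching real DNS.
    """
    ptr_lookup = ptr_lookup or _system_ptr
    a_lookup = a_lookup or _system_a
    try:
        host = ptr_lookup(ip)        # first DNS call: PTR
        addresses = a_lookup(host)   # second DNS call: A (or AAAA)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return None
    # Only trust the name if it points back at the connecting IP.
    return host.rstrip(".").lower() if ip in addresses else None
```

A spammer who controls the PTR record for 5.6.7.8 can make it claim to be mail.example.com, but cannot make mail.example.com resolve back to 5.6.7.8, so the function returns None in that case.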
Warren Togami wrote:
On 09/28/2009 10:07 PM, Marc Perkel wrote:
I'd like to keep the name HOSTKARMA as standard.
If that's so, then we probably want that in the spamassassin rule
name. Your wiki page suggests JMF is the name. A number of people
probably already configured their spamassassin using your suggested
JMF rule names and they would need to be educated to remove it.
How about these for rule names, so the rule names are not too long?
RCVD_HOSTKARMA_BL Black
RCVD_HOSTKARMA_WL White
RCVD_HOSTKARMA_YL Yellow
RCVD_HOSTKARMA_BR Brown
I'm willing to go with whatever name works better for the community. I
will change my wiki to be consistent.
Hi Marc,
I appreciate your desire for everyone to wholly benefit from your
work, but please let us implement this for spamassassin in stages
starting from the lowest hanging fruit.
First please confirm that you approve of the above new rule names, if
you don't want it to be known as JMF.
Yes - or whatever works best. I can change my wiki to reflect consensus.
Hi Warren,
No one has actually implemented the rules for my blacklists correctly.
My lists support both IP and hostname lookups. The hostname assumes that
you have forward confirmed the RDNS so that you eliminate those who
might spoof.
Please explain in greater detail. Can this be determined wholly from
the headers and message body after the MTA has passed the mail to the
MDA?
Yes - it does require 2 DNS calls to do this for FcRDNS. You need a PTR
call to get the RDNS and an A record call to confirm it.
Yellow means that the IP or hostname contains no useful information as
to spam or no spam. On my system once I determine a host is yellow I
skip all blacklist and whitelist tests. Yellow is for Yahoo, Hotmail,
Gmail, etc., where the IP has no information and all host tests are
meaningless.
My NoBL list is similar to yellow, except that you can skip the black list
lookups but the host might still be white listed somewhere.
Please help me better understand, what are examples of a sequence of
events that would land an IP address on the NoBL?
NoBL is determined a number of ways. NoBL is what most RBLs call white
listing in that it means don't include it in any black list. To me white
list means a spam free source. People who remove their IP manually using
my form will be on the NoBL list. Or it might be that I have determined
that there is some good email coming from the IP and it may be a
candidate for white listing, but I have yet to decide that. Yellow
listing is where I know they should not be black listed but I also know
they should not be white listed. (yahoo, gmail, hotmail). NoBL is where
I know they should not be black listed but might be white listed.
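
To make the difference between yellow and NoBL concrete, here is a small Python sketch of which follow-up tests remain after each listing. The names ListPolicy, POLICY and remaining_tests are invented for the illustration, and the lowercase keys "yellow" and "nobl" are just labels chosen here, not return values of any real list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListPolicy:
    skip_blacklists: bool
    skip_whitelists: bool


# Yellow: never black list, never white list.
# NoBL: never black list, but a white list may still apply.
POLICY = {
    "yellow": ListPolicy(skip_blacklists=True, skip_whitelists=True),
    "nobl": ListPolicy(skip_blacklists=True, skip_whitelists=False),
}
DEFAULT = ListPolicy(skip_blacklists=False, skip_whitelists=False)


def remaining_tests(listing):
    """Return the list tests still worth running for a given listing."""
    policy = POLICY.get(listing, DEFAULT)
    tests = []
    if not policy.skip_blacklists:
        tests.append("blacklists")
    if not policy.skip_whitelists:
        tests.append("whitelists")
    return tests
```

With this table, a yellow host needs no further list lookups at all, a NoBL host only needs the white list lookups, and an unlisted host gets both.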
An important point to understand here is that I don't use my own lists
in SpamAssassin. I do most of my filtering with Exim rules. I use my
lists to keep mail away from SA and so reduce system load. SA sees mostly
yellow listed hosts.
If you just want to score points then Black, White, and Brown can be
assigned points. Yellow should be zero points regardless of how it
tests.
I am aware that Yellow isn't useful for scores. It is however useful
for statistical analysis in masschecks, and it doesn't cost
spamassassin any more to print if it hits. In particular I'm looking
to see if there are any reliable trends of overlap between Yellow and
other spamassassin rules.
Fair enough. I just didn't want you assigning points to a yellow listing
because the results would be false.
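
The scoring point in this exchange (yellow is reported but contributes nothing) could look like the following Python sketch. The point values are placeholders made up for the example, not recommended scores, and treating brown as a mild spam indicator is an assumption; only the zero for yellow comes from the discussion.

```python
# Hypothetical point values; only the 0.0 for yellow reflects the thread.
SCORES = {
    "black": 3.0,    # assumed spam indicator
    "brown": 1.0,    # assumed mild spam indicator
    "white": -2.0,   # assumed ham indicator
    "yellow": 0.0,   # carries no spam/ham information
}


def score_hits(hits):
    """Return (total_score, reported_hits) for a collection of list hits.

    Yellow hits are still reported so they can be counted in statistics,
    but they add nothing to the total.
    """
    total = sum(SCORES[hit] for hit in hits)
    return total, sorted(hits)
```

For a message that hits both brown and yellow, the total is just the brown value, while the report still shows the yellow hit for later analysis.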
I think the real power of my lists is in the host name lookups. It would
be worthwhile to implement that.
Please describe how this is more effective than IP lookups?
I don't have a list of IP addresses that Yahoo uses. However, if the
FcRDNS resolves to yahoo then I can skip all other RBL testing because I
know it's a yahoo source. Same is true of white and black listed host
names. On my system if a host name lookup returns yellow, then I add the
sending IP to my yellow lists for those using IP lookups. Same with the
other colors.
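
The host name approach described above, including copying the result onto the sending IP, might be sketched like this in Python. The dictionaries host_lists and ip_lists and the functions host_color and classify are stand-ins invented for the example; a real deployment would query the lists over DNS and would pass in a host name that has already been forward confirmed.

```python
def host_color(hostname, host_lists):
    """Match a confirmed host name against listed domains by suffix.

    mta1.mail.yahoo.com is tried as mta1.mail.yahoo.com, mail.yahoo.com
    and yahoo.com; the bare top-level label is never tried.
    """
    labels = hostname.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in host_lists:
            return host_lists[candidate]
    return None


def classify(ip, confirmed_host, host_lists, ip_lists):
    """Prefer the host name result and remember it for IP-only lookups."""
    color = host_color(confirmed_host, host_lists) if confirmed_host else None
    if color is not None:
        ip_lists[ip] = color      # propagate host result to the IP list
        return color
    return ip_lists.get(ip)       # fall back to whatever the IP list says
```

Matching on whole labels means notyahoo.com does not match a yahoo.com entry, and once an IP has been seen with a confirmed yahoo.com name, later IP-only lookups for that address return the same color.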
I think my white listing is very accurate at this point. The thing about
white servers is that they aren't evasive like spammers. There should be
some short circuiting options to reduce system load on SA for white
lookups.
Generally spamassassin does not short-circuit by default for any
reason. There is an option to do so, but I think it is only to stop
testing rules if the score goes beyond a certain point. Please file a
separate bug for this if it is important to you.
I'm just making a suggestion. SA is a high load program. If you are
processing a lot of email then you will need a lot of servers if you use
SA on everything. However if you can prescreen the email blocking what
you are sure is spam and passing what you are sure is good then you can
process a lot more email with far fewer servers.
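
As a rough picture of the prescreening idea, the Python sketch below routes each message by its list color and counts how many messages would still reach the content scanner. Mapping black to "sure spam" and white to "sure good" is an assumption made for the example, and route and load_summary are hypothetical names; this is not the Exim configuration mentioned earlier.

```python
from collections import Counter


def route(color):
    """Decide what to do with a message before any content scanning."""
    if color == "black":
        return "reject"   # assumed: confident enough to block outright
    if color == "white":
        return "accept"   # assumed: confident enough to deliver directly
    return "scan"         # yellow, brown, NoBL or unlisted: needs the scanner


def load_summary(colors):
    """Count how many messages end up in each route."""
    return Counter(route(color) for color in colors)
```

Only the messages counted under "scan" cost any scanner time, which is where the savings in servers would come from.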