Greetings, I'm trying to find out whether SpamAssassin can be used in a non-spam application that we're building. I've read a lot of the material on the website but I'm still not sure, so I thought I would ask you, the experts.
The application needs to determine whether a certain domain name is "similar" to another domain name. We have a list of known domain names, and occasionally want to compare a "target" domain name to see if it is similar to any of the known domain names. The target might contain replacement characters ("1" instead of "I" or "L", zero instead of "O", gratuitous dots or hyphens, etc.) in much the same way that spammers try to get past spam filters. That's why I thought SpamAssassin might be appropriate.

To give an example, we want to automatically detect that "my-d0m.a1n_name.com" is very close to "mydomainname.com".

But from what I've read, I think SpamAssassin may not be appropriate, for several reasons:

1) We would probably have much more ham (known domain names) than spam (close to a known domain name, but not legal).
2) We wouldn't have large amounts of ham or spam to feed through SpamAssassin to enable it to learn and improve.
3) The "target" domain name would in most cases be a single token as far as SpamAssassin is concerned, unlike an email, which likely contains hundreds of tokens from which to decide whether it is spam.

What do you think? Would it take a lot of work to adapt SpamAssassin for this application? Does it seem like an appropriate tool to use?

Thanks in advance,
-Rick
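
P.S. To make "similar" concrete, here is a rough sketch of the kind of comparison I have in mind. It does not use SpamAssassin at all; it is plain Python with the standard difflib module standing in for whatever matching the real tool would do. The look-alike table, the function names (skeleton, similarity, closest) and the 0.85 threshold are all made up for illustration, and folding "1", "i" and "l" into a single character is just one simple way to handle the "1" for "I" or "L" ambiguity.

```python
import difflib

# Hypothetical look-alike table: fold characters that are easily confused
# into one representative. "1", "i" and "l" all become "l"; "0" becomes "o".
CONFUSABLES = str.maketrans({"0": "o", "1": "l", "i": "l"})

# Characters treated as gratuitous padding (an assumption for this sketch).
SEPARATORS = ".-_"


def skeleton(domain: str) -> str:
    """Reduce a domain name to a canonical form for comparison."""
    lowered = domain.lower()
    stripped = "".join(ch for ch in lowered if ch not in SEPARATORS)
    return stripped.translate(CONFUSABLES)


def similarity(target: str, known: str) -> float:
    """Return a score in [0, 1]; 1.0 means the skeletons are identical."""
    matcher = difflib.SequenceMatcher(None, skeleton(target), skeleton(known))
    return matcher.ratio()


def closest(target: str, known_domains: list[str], threshold: float = 0.85):
    """List (score, domain) pairs at or above the threshold, best first."""
    scored = [(similarity(target, known), known) for known in known_domains]
    return sorted([pair for pair in scored if pair[0] >= threshold], reverse=True)


if __name__ == "__main__":
    known = ["mydomainname.com", "example.org"]
    print(closest("my-d0m.a1n_name.com", known))
```

Because both names are reduced to the same skeleton before comparing, the example target above matches "mydomainname.com" exactly, and the threshold only matters for targets that also add, drop or swap characters.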