Suggested Approach

2023-04-26 Thread Fortney, James T - CSCCS
Here is a possible new challenge for those who like what I believe is a "different" challenge. How would we have SA properly learn to identify such messages? In an eight-minute period this morning, someone dumped 9 messages into a Google mail server using one of my addresses as the FROM and

Re: Fine-tuning SA URI extraction

2023-04-26 Thread Henrik K
On Thu, Apr 27, 2023 at 01:45:58AM +0200, Matija Nalis wrote: > > - complex but emulating browser behaviour better: > Add full handling of relative URIs. i.e. have push_uri() detect all > relative URIs and convert them to absolute URIs before adding them > to the list of URIs. If you would
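A minimal illustration of the "complex" option quoted above: resolve each relative URI against a base URI before it is recorded, the way a browser would. SpamAssassin is written in Perl and its push_uri() is not reproduced here. The Python below uses a hypothetical helper, to_absolute(), built on the standard library's urllib.parse.urljoin. It assumes a base URI is known, for example from an HTML <base href> tag, which many messages do not have.

```python
from urllib.parse import urljoin, urlsplit


def to_absolute(uris, base):
    """Resolve extracted URIs against `base`, browser-style.

    Hypothetical helper for illustration only; it is not SpamAssassin's
    push_uri(). `base` is assumed to come from somewhere like <base href>.
    """
    out = []
    for uri in uris:
        if urlsplit(uri).scheme:
            # Already absolute (has a scheme): keep it unchanged.
            out.append(uri)
        else:
            # Relative forms such as "images/x", "/path" or "//host/path"
            # are resolved against the base per RFC 3986 rules.
            out.append(urljoin(base, uri))
    return out


uris = ["images/bg.jpg", "/about", "//cdn.example.net/a.js", "https://example.org/x"]
print(to_absolute(uris, "https://www.example.com/news/index.html"))
```

The open design question in the thread is what to use as the base when the message supplies none; this sketch simply requires the caller to provide one.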

Re: Fine-tuning SA URI extraction

2023-04-26 Thread Matija Nalis
On Wed, Apr 26, 2023 at 03:21:50PM -0400, Kris Deugau wrote: > http://deepnet.cx/~kdeugau/spamtools/cornell-birds.eml Thanks. Adding some dbg() in HTML.pm of my SA 3.4.6, it seems it is triggered by this part of the email: "background" is deprecated (but still supported) HTML attribute: https://
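A minimal illustration, assuming an attribute-driven extractor, of why a deprecated attribute such as "background" still yields a URI. The Python below uses the standard html.parser module; the URICollector class and the URI_ATTRS set are hypothetical and do not mirror the actual attribute handling in SpamAssassin's HTML.pm.

```python
from html.parser import HTMLParser

# Attributes treated as URI-bearing in this sketch. An illustrative
# assumption, not SpamAssassin's real list.
URI_ATTRS = {"href", "src", "background", "action"}


class URICollector(HTMLParser):
    """Collect (tag, attribute, value) for every URI-bearing attribute."""

    def __init__(self):
        super().__init__()
        self.uris = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases names; valueless attributes give None.
        for name, value in attrs:
            if name in URI_ATTRS and value:
                self.uris.append((tag, name, value))


html = ('<table background="https://img.example.com/bg.png"><tr><td>'
        '<a href="https://example.com/">x</a></td></tr></table>')
p = URICollector()
p.feed(html)
p.close()
print(p.uris)
```

Because "background" sits in the same set as "href", its value is extracted (and would be looked up) just like a link, even though it is only an image reference.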

Re: Fine-tuning SA URI extraction

2023-04-26 Thread Kris Deugau
Bill Cole wrote: On 2023-04-26 at 11:06:56 UTC-0400 (Wed, 26 Apr 2023 11:06:56 -0400) Kris Deugau is rumored to have said: Am I missing some configuration option that can do this, or am I left with doing one of: - just suppressing lookups of the canonicalized URI - removing the canonicalize

Re: Fine-tuning SA URI extraction

2023-04-26 Thread Bill Cole
On 2023-04-26 at 11:06:56 UTC-0400 (Wed, 26 Apr 2023 11:06:56 -0400) Kris Deugau is rumored to have said: Am I missing some configuration option that can do this, or am I left with doing one of: - just suppressing lookups of the canonicalized URI - removing the canonicalized URI from the DNS

Re: Fine-tuning SA URI extraction

2023-04-26 Thread Benny Pedersen
Kris Deugau wrote on 2023-04-26 17:06: ... Am I missing some configuration option that can do this, or am I left with doing one of: - just suppressing lookups of the canonicalized URI - removing the canonicalized URI from the DNSBL, even if the listing might be justified where the *NON*-canon

Fine-tuning SA URI extraction

2023-04-26 Thread Kris Deugau
SA has long gone to great lengths to extract URIs from things which are not strictly URIs, on the basis that mail clients do the same and SA needs to inspect such things for DNSBL lookups. I'm fine with this. However, once in a while I come across a case where something is clearly being extra