To address the issue of matching anchor text containing Unicode characters,
I've implemented a new rule option called unicode_text. This option
ensures that the anchor text is converted to Unicode before being compared
against the rule's regular expression. As a result, the following rule now
cor
Thanks for the detailed analysis of the uri_detail plugin bug. I
appreciate you taking the time to investigate this so thoroughly.
I'll open a bug report with the SpamAssassin project, including the details
from your analysis and a sample spam email that demonstrates the problem.
Thanks again fo
On Sun, 2 Feb 2025, Jimmy wrote:
dbg: uri: Not match:
text:\x{E0}\x{B8}\x{95}\x{E0}\x{B9}\x{88}\x{E0}\x{B8}\x{AD}\x{E0}\x{B8}\x{AD}\x{E0}\x{B8}\x{B2}\x{E0}\x{B8}\x{A2}\x{E0}\x{B8}\x{B8}\x{E0}\x{B8}\x{97}\x{E0}\x{B8}\x{B1}\x{E0}\x{B8}\x{99}\x{E0}\x{B8}\x{97}\x{E0}\x{B8}\x{B5}
not matches the
patt
When adding debug to source like this:
  if (exists $rule->{text}) {
    next unless $info->{anchor_text};
    my($op,$patt,$neg) = @{$rule->{text}};
    my $match;
    for my $text (@{ $info->{anchor_text} }) {
      if ( ($op eq '=~' && $text =~ $patt)
        || ($op
On Sun, 2 Feb 2025, Jimmy wrote:
Hello,
I am experiencing difficulties creating a rule to match UTF-8 anchor text
using the plugin, and I suspect there might be a bug related to UTF-8
matching.
For example, I attempted to use the following rule:
uri_detail UNICODE_LINK_TEXT text =~
/\\x{E0}\\x{B8}\\x{97}\\x{E0}\\x{B8}\\x{B1}\\x