On Thu, 25 Feb 2021 12:13:59 -0500 Alan wrote:
> Bitcoin addresses start with either 1 or 3.

Most do, but around 13% of those reported to the bitcoin abuse database
are in the format starting with "bc".

> It's less general specifically to avoid FPs. Personally I'm weighting
> this pretty high so I don't want to trigger on non-obfuscated BTC
> addresses.

Now I come to think of it I think we've been here before, and allowing
arbitrary spaces led to a reported FP on ordinary text. If you still
meta with A4A_PORNSCAM_WORD you can afford to take some risks with the
address match though.

Before __BITCOIN_ID was in the core rules I had my own version for the
^[13] format that checked for mixed case and an additional digit. If
those conditions are not met it's most likely an FP.

It's also possible to tighten the range down to {32,33} or even {33}
without losing many matches:

$ for n in `jot 12 25` ; do printf "$n" ; < bitcoinlist egrep "^[13].{${n}}$" | wc -l ; done
25       0
26       0
27       0
28       0
29       3
30       1
31       4
32    1659
33   50290
34       8
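
For illustration, the mixed-case and additional-digit check described above, combined with the tightened {32,33} range, might look roughly like the sketch below. This is not the rule I actually used, and it is not how __BITCOIN_ID works. The function name and parameters are made up for the example, and restricting the tail to the base58 alphabet (no 0, O, I or l) is an extra assumption on top of the plain "." used in the egrep above.

```python
import re

# Base58 alphabet used by legacy addresses: no 0, O, I or l.
# Assumption: the egrep test above used "." rather than this class.
BASE58_CHAR = r"[1-9A-HJ-NP-Za-km-z]"


def looks_like_legacy_btc(token: str, min_tail: int = 32, max_tail: int = 33) -> bool:
    """Heuristic check for a 1.../3... address candidate (hypothetical helper).

    min_tail and max_tail count the characters after the leading 1 or 3,
    matching the {32,33} range in the egrep test.
    """
    pattern = rf"[13]{BASE58_CHAR}{{{min_tail},{max_tail}}}"
    if not re.fullmatch(pattern, token):
        return False

    # Require mixed case and at least one digit beyond the leading 1 or 3;
    # a candidate that fails these is most likely ordinary text.
    tail = token[1:]
    has_upper = any(c.isupper() for c in tail)
    has_lower = any(c.islower() for c in tail)
    has_digit = any(c.isdigit() for c in tail)
    return has_upper and has_lower and has_digit
```

Passing min_tail=33, max_tail=33 gives the stricter {33} variant. Addresses in the "bc" format are deliberately not covered here.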