On Sat, 15 Jul 2017 13:13:31 -0500 (CDT) David B Funk wrote:
>
> On Sat, 15 Jul 2017, Antony Stone wrote:
> One observation; that list has over 10,000 entries which means that
> you're going to be adding thousands of additional rules to SA on an
> automated basis.
>
> Some time in the past other people had worked up automated mechanisms
> to add large numbers of rules derived from example spam messages (Hi
> Chris;) and there were performance issues (significant increase in SA
> load time, memory usage, etc).

I'm not an expert on perl internals, so I may be wide of the mark, but
I would have thought that the most efficient way to do this using uri
rule(s) would be to generate a single regex recursively, so that
scanning would be O(log(n)) in the number of entries rather than O(n).

You start by stripping the http:// and then make a list of all the
first characters; then for each character you recurse. You end up with
something like

^http://(a(...)|b(...)...|z(...))

where each of the (...) contains a similar list of alternations to the
top level. You can take this a bit further and detect when all the
strings in the current list start with a common sub-string - you can
then generate the equivalent of a patricia trie in regex form (a rough
sketch is appended at the end of this message).

> Be aware, you may run into that situation. Using a URI-dnsbl avoids
> that risk.

The list contains full URLs; I presume there's a reason for that. For
example:

http://invoiceholderqq.com/85.exe
http://invoiceholderqq.com/87.exe
http://invoiceholderqq.com/93.exe
http://inzt.net/08yhrf3
http://inzt.net/0ftce4
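
To make the recursive-regex idea concrete, here's a rough sketch in
Perl built from the five example URLs above; the function names are my
own placeholders, it's untested against the real 10,000-entry list,
and it only factors common prefixes rather than doing the full
patricia-trie compression:

#!/usr/bin/perl
use strict;
use warnings;

# Build a hash-of-hashes trie from the listed strings.
sub build_trie {
    my @strings = @_;
    my %trie;
    for my $s (@strings) {
        my $node = \%trie;
        for my $ch (split //, $s) {
            $node->{$ch} //= {};
            $node = $node->{$ch};
        }
        $node->{''} = 1;            # mark the end of a complete entry
    }
    return \%trie;
}

# Emit the trie as one regex; shared prefixes appear only once, so the
# engine walks down one branch per character instead of retrying every
# listed URL from the start.
sub trie_to_regex {
    my ($node) = @_;
    my $end = exists $node->{''};
    my @alts;
    for my $ch (sort grep { $_ ne '' } keys %$node) {
        push @alts, quotemeta($ch) . trie_to_regex($node->{$ch});
    }
    return '' unless @alts;         # leaf: nothing further to match
    my $group = @alts == 1 ? $alts[0] : '(?:' . join('|', @alts) . ')';
    # an entry that ends here may still be a prefix of a longer one
    return $end ? "(?:$group)?" : $group;
}

# The example URLs from above, with the scheme already stripped.
my @urls = qw(
    invoiceholderqq.com/85.exe
    invoiceholderqq.com/87.exe
    invoiceholderqq.com/93.exe
    inzt.net/08yhrf3
    inzt.net/0ftce4
);

my $re = 'http://' . trie_to_regex(build_trie(@urls));
print "$re\n";

The printed pattern comes out as something like
http://in(?:voiceholderqq\.com\/(?:8(?:5\.exe|7\.exe)|93\.exe)|zt\.net\/0(?:8yhrf3|ftce4))
which could then be pasted into a single uri rule rather than 10,000
separate ones. If you'd rather not roll your own, the Regexp::Assemble
module on CPAN does this kind of factoring for you.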