On Tue, 10 Aug 2010, Henrik K wrote:
Ok, I did some more testing since this is an interesting experiment.
I dumped 15000 mail bodies into a file as SA sees them and fed it
to a simple Perl script.
Runtime for different methods (memory used including Perl itself):
- Single 70000 name regex, 20s (8MB)
- 7 regexes of 10000 names each, 141s (9MB)
- "Martin style", lookups from Perl hash, 8s (12MB)
Yeah, looking at this, I would go for the generic regex and test all
matches against names stored in a Perl hash. The average count of "names"
to check per message was around 100, so using SQL directly would be
inefficient, though possible.
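
A rough sketch of the "generic regex plus hash lookup" approach, written in Python rather than the Perl used for the timings above. The candidate pattern (any capitalized word), the KNOWN_NAMES contents and the find_names helper are all assumptions made for illustration, not what the test script actually used.

```python
import re

# Hypothetical name list; the test above used roughly 70000 entries.
KNOWN_NAMES = {"smith", "johnson", "hardin"}

# Assumed generic candidate pattern: a capital letter followed by lowercase letters.
CANDIDATE_RE = re.compile(r"\b[A-Z][a-z]+\b")


def find_names(body, names=KNOWN_NAMES):
    """Return the known names found in body, in order of first appearance."""
    hits = []
    seen = set()
    for match in CANDIDATE_RE.finditer(body):
        key = match.group(0).lower()
        # A set membership test is a hash lookup, the analogue of `exists $hash{$key}`.
        if key in names and key not in seen:
            seen.add(key)
            hits.append(key)
    return hits


if __name__ == "__main__":
    print(find_names("Dear Mr Smith, your results were sent to Dr Johnson."))
```

The regex only has to be loose enough to catch every plausible candidate. The hash decides what counts as a name, so the cost per message scales with the number of candidates (around 100 here) rather than with the size of the name list.
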
This smells like a custom plugin, building a hash from database queries of
names added since plugin-local last-updated-datetime. Big initialization
hit unless you build persistence into the plugin, but minimal database
traffic primarily consisting of an "IF EXISTS()" query, and a few rows
queried every time a new patient is added to the system.
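
To make the incremental refresh idea concrete, here is a minimal sketch in Python with an in-memory SQLite database standing in for the real one. The patients table, its name and added_at columns, and the NameCache class are all hypothetical. A real SpamAssassin plugin would be Perl, and SQLite spells the check as SELECT EXISTS(...) rather than IF EXISTS().

```python
import sqlite3


class NameCache:
    """In-memory name set, refreshed with rows added since the last refresh."""

    def __init__(self, conn):
        self.conn = conn
        self.names = set()
        # ISO-8601 timestamps compare chronologically as plain strings.
        self.last_updated = ""

    def refresh(self):
        """Load names newer than last_updated; return how many rows were read."""
        # Cheap check first: most of the time nothing new has been added.
        cur = self.conn.execute(
            "SELECT EXISTS(SELECT 1 FROM patients WHERE added_at > ?)",
            (self.last_updated,),
        )
        if not cur.fetchone()[0]:
            return 0
        rows = self.conn.execute(
            "SELECT name, added_at FROM patients "
            "WHERE added_at > ? ORDER BY added_at",
            (self.last_updated,),
        ).fetchall()
        for name, _added_at in rows:
            self.names.add(name.lower())
        self.last_updated = rows[-1][1]
        return len(rows)


def make_demo_db():
    """Build a throwaway database with two hypothetical patients."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (name TEXT, added_at TEXT)")
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?)",
        [("Smith", "2010-08-01T09:00:00"), ("Johnson", "2010-08-02T10:30:00")],
    )
    return conn


if __name__ == "__main__":
    cache = NameCache(make_demo_db())
    print(cache.refresh(), sorted(cache.names))
```

The first refresh() is the big initialization hit. Persisting names and last_updated between runs would avoid it. Note that this sketch would miss a row inserted later with a timestamp equal to last_updated; a monotonically increasing row id would be a safer high-water mark.
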
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Windows Vista: Windows ME for the XP generation.
-----------------------------------------------------------------------
5 days until the 65th anniversary of the end of World War II