On Tue, 10 Aug 2010, Henrik K wrote:

> Ok, I did some more testing, since this is an interesting experiment.
>
> I dumped 15000 mail bodies into a file, as SA (SpamAssassin) sees them, and fed it to a simple Perl script.
>
> Runtime for the different methods (memory used, including Perl itself):
>
> - Single 70000-name regex: 20s (8MB)
> - 7 regexes of 10000 names each: 141s (9MB)
> - "Martin style", lookups from a Perl hash: 8s (12MB)
>
> Yeah, looking at this, I would go for a generic regex and test every match against names stored in a Perl hash. The average count of "names" to check per message was around 100, so using SQL directly would be inefficient, though possible.
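A minimal sketch of the "generic regex plus hash" idea, written in Python rather than Perl (a set plays the role of the Perl hash). The candidate pattern, `KNOWN_NAMES`, and both function names are hypothetical; the thread does not show Henrik's actual script. For contrast, `build_alternation` shows the single-big-alternation approach that the benchmark above found slower.

```python
import re

# Hypothetical stand-in for the ~70000 names kept in a Perl hash;
# a Python set gives the same constant-time membership test.
KNOWN_NAMES = {"alice smith", "bob jones", "carol white"}

# Assumed "generic" pattern: two adjacent capitalized words. The real
# pattern is not given in the thread. The lookahead makes matches overlap,
# so "Patient Alice Smith" yields both "Patient Alice" and "Alice Smith".
CANDIDATE_RE = re.compile(r"(?=\b([A-Z][a-z]+ [A-Z][a-z]+)\b)")


def find_known_names(body, known):
    """Find name-shaped candidates with one cheap regex, then confirm each via set lookup."""
    hits = []
    for m in CANDIDATE_RE.finditer(body):
        candidate = m.group(1).lower()  # normalize case before the lookup
        if candidate in known:
            hits.append(candidate)
    return hits


def build_alternation(known):
    """The approach being compared against: one regex alternating over every known name."""
    alternatives = "|".join(re.escape(n) for n in sorted(known))
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
```

The point of the design is that the generic regex only has to recognize the shape of a name, so its cost does not grow with the size of the name list; the per-message work becomes roughly one hash lookup per candidate (about 100 per message in Henrik's test).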

This smells like a custom plugin that builds a hash from database queries for names added since a plugin-local last-updated datetime. There's a big initialization hit unless you build persistence into the plugin, but database traffic is minimal: primarily an "IF EXISTS()" query, plus a few rows queried every time a new patient is added to the system.
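A rough sketch of that plugin-side cache, again in Python with an in-memory SQLite table for illustration. The table `patients`, its columns `name` and `added_at`, and the `NameCache` class are all hypothetical; a real SpamAssassin plugin would be Perl against the site's own schema. The `EXISTS` probe stands in for the "IF EXISTS()" query, and only rows newer than the stored last-updated value are fetched.

```python
import sqlite3


class NameCache:
    """Hypothetical plugin-local cache of known names, refreshed incrementally.

    Assumes a table patients(name TEXT, added_at TEXT) where added_at is an
    ISO-8601 string that strictly increases with each insert; a row sharing
    a timestamp with the last one already seen would be skipped.
    """

    def __init__(self, conn):
        self.conn = conn
        self.names = set()
        self.last_updated = ""  # empty string sorts before any ISO-8601 value

    def has_new_rows(self):
        # Cheap existence probe, the analogue of the "IF EXISTS()" query.
        (flag,) = self.conn.execute(
            "SELECT EXISTS(SELECT 1 FROM patients WHERE added_at > ?)",
            (self.last_updated,),
        ).fetchone()
        return bool(flag)

    def refresh(self):
        """Load only names added since the last refresh; return the number of rows read."""
        if not self.has_new_rows():
            return 0
        rows = self.conn.execute(
            "SELECT name, added_at FROM patients WHERE added_at > ? ORDER BY added_at",
            (self.last_updated,),
        ).fetchall()
        for name, added_at in rows:
            self.names.add(name.lower())
            self.last_updated = added_at  # advance the watermark as rows are consumed
        return len(rows)

    def __contains__(self, name):
        return name.lower() in self.names
```

The first `refresh()` loads everything (the "big initialization hit"); later calls usually cost one `EXISTS` query. If timestamps can collide, an autoincrement id is a safer watermark than a datetime.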

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Windows Vista: Windows ME for the XP generation.
-----------------------------------------------------------------------
 5 days until the 65th anniversary of the end of World War II
