On Tue, 10 Aug 2010, Henrik K wrote:
On Tue, Aug 10, 2010 at 07:37:32AM -0700, John Hardin wrote:
On Tue, 10 Aug 2010, Henrik K wrote:
OK, I did some more testing, since this is an interesting experiment.
I dumped 15000 mail bodies into a file the way SA sees them and
fed it to a simple Perl script.
Runtime for different methods (memory used including Perl itself):
- Single 70000 name regex, 20s (8MB)
- 7 regexes of 10000 names each, 141s (9MB)
- "Martin style", lookups from Perl hash, 8s (12MB)
Yeah, looking at this, I would go for the generic regex and test all
matches against names stored in a Perl hash. The average count of "names"
to check per message was around 100, so using SQL directly would
be inefficient, though possible.
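
A rough sketch of that "generic regex, then hash lookup" approach follows. The tests in this thread were done in Perl; this is a Python illustration only, and the candidate pattern (two adjacent capitalized words), the function name and the sample names are all assumptions made for the example, not the script that was actually timed.

```python
import re

# Hypothetical generic pattern: any two adjacent capitalized words.
# The lookahead lets candidates overlap, so "Dear Jane Doe" yields both
# "Dear Jane" and "Jane Doe". Hyphens and apostrophes are not handled.
CANDIDATE_RE = re.compile(r"(?=\b([A-Z][a-z]+ [A-Z][a-z]+)\b)")


def find_known_names(body, known_names):
    """Return candidates found in body that are present in known_names."""
    hits = []
    for match in CANDIDATE_RE.finditer(body):
        candidate = match.group(1).lower()
        if candidate in known_names:  # hash lookup, O(1) on average
            hits.append(candidate)
    return hits


# Made-up data; in practice the set would hold the full name list.
known = {"jane doe", "john public"}
found = find_known_names("Dear Jane Doe, your results are attached.", known)
```

The cost per message then depends on the number of candidates (around 100 in the test above) rather than on the size of the name list.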
This smells like a custom plugin, building a hash from database
queries of names added since plugin-local last-updated-datetime. Big
initialization hit unless you build persistence into the plugin, but
minimal database traffic primarily consisting of an "IF EXISTS()"
query, and a few rows queried every time a new patient is added to
the system.
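
For illustration, the incremental refresh described above might look roughly like this. It is a Python sketch rather than an SA plugin, it uses an in-memory SQLite database, and the `patients` table, its `name` and `added_at` columns and the `NameCache` class are hypothetical. It also skips the separate existence check and simply selects the new rows.

```python
import sqlite3


class NameCache:
    """In-memory name set, refreshed incrementally from a database."""

    def __init__(self, conn):
        self.conn = conn
        self.names = set()
        # ISO-8601 timestamps compare correctly as plain strings.
        self.last_updated = ""

    def refresh(self):
        """Load only rows added since the last refresh; return the count."""
        rows = self.conn.execute(
            "SELECT name, added_at FROM patients "
            "WHERE added_at > ? ORDER BY added_at",
            (self.last_updated,),
        ).fetchall()
        for name, added_at in rows:
            self.names.add(name.lower())
            self.last_updated = added_at
        return len(rows)


# Made-up demonstration data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (name TEXT, added_at TEXT)")
conn.execute("INSERT INTO patients VALUES ('Jane Doe', '2010-08-09T12:00:00')")

cache = NameCache(conn)
first = cache.refresh()   # initial load: the big initialization hit
conn.execute("INSERT INTO patients VALUES ('John Public', '2010-08-10T09:30:00')")
second = cache.refresh()  # later refresh: only the new row is fetched
```

A row inserted later with exactly the same timestamp as `last_updated` would be missed by the strict `>` comparison; a real implementation would need a monotonic key or some overlap handling.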
That just sounds like too much work for little gain. Perhaps it is better in
some other scenario. It was already stated that the patient names are dumped
somewhere, and usually SA is only reloaded once a day or so. Reading
in a file is the simplest and most efficient way to go. Not to mention the
insecurities that might arise from "querying a patient database".
Ah; I missed (or forgot) the non-real-time nature of the scenario. Never
mind, then; batch-generated rules or a plugin with a batch-generated
static hashtable would suffice.
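
A minimal sketch of the batch variant: build the static lookup table once, at load time, from the dumped name list. This is Python for illustration only; the one-name-per-line format, the "#" comment convention and the `load_names` helper are assumptions, since the thread does not say what the dump looks like.

```python
import io


def load_names(lines):
    """Build a lookup set from an iterable of lines, one name per line."""
    names = set()
    for line in lines:
        # Collapse runs of whitespace and lowercase for consistent lookups.
        name = " ".join(line.split()).lower()
        if name and not name.startswith("#"):  # skip blanks and comments
            names.add(name)
    return names


# In real use this would be an open file handle on the dumped list;
# an in-memory stand-in with made-up names is used here.
dump = io.StringIO("Jane Doe\n  John   Public \n\n# comment line\n")
NAMES = load_names(dump)
```

The table is rebuilt only when the process reloads, which fits the once-a-day reload mentioned above.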
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79