On Tue, Aug 10, 2010 at 07:37:32AM -0700, John Hardin wrote:
> On Tue, 10 Aug 2010, Henrik K wrote:
>
> >Ok I did some more testing since this is an interesting experiment..
> >
> >I dumped 15000 mail bodies into a file like SA sees them and
> >fed it to a simple Perl script.
> >
> >Runtime for different methods (memory used including Perl itself):
> >
> >- Single 70000-name regex: 20s (8MB)
> >- 7 regexes of 10000 names each: 141s (9MB)
> >- "Martin style", lookups from a Perl hash: 8s (12MB)
> >
> >Yeah, looking at this I would go for the generic regex and test all
> >matches against names stored in a Perl hash. The average count of
> >"names" to check per message was around 100, so using SQL directly
> >would be inefficient, though possible.
>
> This smells like a custom plugin, building a hash from database
> queries of names added since the plugin-local last-updated datetime.
> Big initialization hit unless you build persistence into the plugin,
> but minimal database traffic, primarily consisting of an "IF EXISTS()"
> query and a few rows queried every time a new patient is added to
> the system.
That just sounds like too much work for little gain. Perhaps it would be better in some other scenario. It was already stated that the patient names are dumped somewhere, and SA is usually only reloaded once a day or so. Reading in a file is the simplest and most efficient way to go, not to mention it avoids the security concerns that might arise from "querying a patient database".
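For what it's worth, the file-plus-hash approach is easy to sketch. A minimal illustration in Python (the real thing would be a Perl SA plugin; the dump-file format, the "First Last" name convention, and the generic candidate regex here are all assumptions for illustration):

```python
import re

def load_names(path):
    """Assumption: the dump file has one 'First Last' name per line."""
    with open(path) as fh:
        return {line.strip().lower() for line in fh if line.strip()}

# A generic regex that over-matches name-like candidates.  One linear
# scan of the body is cheap compared to a 70000-branch alternation;
# each candidate is then checked against the hash in O(1).
CANDIDATE = re.compile(r'\b([A-Z][a-z]+ [A-Z][a-z]+)\b')

def find_patient_names(body, names):
    """Return candidates from the body that are actual known names."""
    return [m for m in CANDIDATE.findall(body) if m.lower() in names]
```

This is exactly the split the benchmark above favors: the generic regex does one pass, and with roughly 100 candidates per message the per-candidate cost is a hash lookup rather than a database round trip.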