At 11:36 PM 3/29/02 -0500, Jim Witte wrote:
> I'm contemplating writing some software to scan through a large volume
> of email (over 95 MB) to identify threads and remove quoted material.
> Does anyone have any good references on algorithms to do text processing
> like this for such a massive amount of data?

Is this something you're planning on doing once, or many times?  95MB is 
nothing; right now I'm scanning through several hundred gigabytes of 
text.  Do you need sub-second response on this?  If not, I don't see the 
need for advanced algorithms.
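
As a rough illustration of how little machinery a one-off job like this may need, here is a minimal sketch of a single linear pass, written in Python. Everything in it is an assumption made for illustration and not something from the original question: the function names (strip_quoted, thread_key, group_by_thread) are hypothetical, quoted material is assumed to be any line starting with ">", and threads are assumed to be identifiable by subject once "Re:"/"Fwd:" prefixes are removed. Real mail may need the In-Reply-To and References headers instead.

```python
import re
from collections import defaultdict

# Assumption: reply/forward markers look like "Re:", "RE:", "Fw:", "Fwd:".
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def strip_quoted(body: str) -> str:
    """Return the body without lines that begin with '>' (assumed quote marker)."""
    kept = [line for line in body.splitlines()
            if not line.lstrip().startswith(">")]
    return "\n".join(kept)


def thread_key(subject: str) -> str:
    """Normalize a subject so that replies share a key with the original post."""
    return _PREFIX.sub("", subject).strip().lower()


def group_by_thread(messages):
    """Group (subject, body) pairs by normalized subject in one pass.

    Each message is visited once, so the work grows linearly with the
    amount of mail; no index or specialized data structure is involved.
    """
    threads = defaultdict(list)
    for subject, body in messages:
        threads[thread_key(subject)].append(strip_quoted(body))
    return dict(threads)


if __name__ == "__main__":
    sample = [
        ("Text processing", "How do I do this?"),
        ("Re: Text processing", "> How do I do this?\nOne pass is enough."),
    ]
    for key, bodies in group_by_thread(sample).items():
        print(key, bodies)
```

Subject matching is the crudest possible threading rule and will merge unrelated messages that happen to share a subject, but for a batch job with no response-time requirement it is a reasonable place to start before reaching for anything more elaborate.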


--
Peter Scott
Pacific Systems Design Technologies
http://www.perldebugged.com

