On Tue, Nov 2, 2010 at 11:38 AM, José Mejuto <joshy...@gmail.com> wrote:
> Hello FPC-Pascal,
>
> Tuesday, November 2, 2010, 11:02:18 AM, you wrote:
>
> TH> If I understand it correctly, this assumes reading the whole file into
> TH> memory at once. Depending on the size of that file and other conditions,
> TH> this may or may not be advisable...
>
> Yes, and a pdf2text conversion will reduce the PDF file to about 1% of its
> original size, so unless you are handling 10-gigabyte PDFs there should be
> no problem loading the whole file into memory.
>
> I doubt there will be memory problems, as running pdf2text will surely
> consume more memory than the size of the resulting file.
>
> Of course, if you end up with 300-megabyte text files, then a different
> approach would be needed, using a buffer with a window sized to the
> text being searched for.
>
> Also, the logic will differ depending on whether you want to match one
> word, several words, long sentences, sequences of characters, etc.
I need to search for several words, so I can't call the Pos function once
per word. To be fast, my algorithm needs to read each word (token) only
once. I'll define the separators that delimit a token, such as <space>,
comma, "/", "\", <enter>, etc. For each token found, I'll look for a
match in my lists of words. If I find a match, I need to know on which
page the token was found...

Marcos Douglas
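For illustration, here is a minimal Free Pascal sketch of that approach.
It assumes the input is plain text produced by pdftotext, which by default
separates pages with a form-feed character (#12); the program name, the
word list, and the separator set are illustrative choices, not a fixed
design. It reads the file in fixed-size chunks, so memory stays bounded
regardless of the file size:

  program TokenSearch;

  {$mode objfpc}{$H+}

  uses
    SysUtils, Classes;

  const
    FormFeed = #12;  // pdftotext's default page separator (assumption)
    Separators = [' ', ',', '.', ';', '/', '\', #9, #10, #13, FormFeed];

  var
    Words: TStringList;            // the words we are searching for
    Buf: array[0..4095] of Char;   // fixed-size window keeps memory bounded
    Token: string;
    Page, N, I, Idx: Integer;
    Src: TFileStream;
    C: Char;

  begin
    Words := TStringList.Create;
    Src := TFileStream.Create(ParamStr(1), fmOpenRead);
    try
      Words.Sorted := True;           // enables binary search via Find
      Words.CaseSensitive := False;   // match regardless of case
      Words.Add('pascal');            // illustrative word list
      Words.Add('compiler');

      Page := 1;
      Token := '';
      repeat
        N := Src.Read(Buf, SizeOf(Buf));
        for I := 0 to N - 1 do
        begin
          C := Buf[I];
          if C in Separators then
          begin
            // A separator ends the current token: look it up once.
            if (Token <> '') and Words.Find(Token, Idx) then
              WriteLn('Found "', Token, '" on page ', Page);
            Token := '';
            if C = FormFeed then
              Inc(Page);              // a new page starts after the form feed
          end
          else
            Token := Token + C;
        end;
      until N = 0;
      // Flush the last token if the file does not end in a separator.
      if (Token <> '') and Words.Find(Token, Idx) then
        WriteLn('Found "', Token, '" on page ', Page);
    finally
      Src.Free;
      Words.Free;
    end;
  end.

Since the list is sorted, each lookup is a binary search, so the file is
scanned exactly once and each token is checked in O(log n). For a very
large word list, a hash table (e.g. TFPHashList from the contnrs unit)
would make each lookup roughly constant time.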