Re: Looking for lots of words in lots of files

2008-06-19 Thread Bruno Desthuilliers
brad wrote: Just wondering if anyone has ever solved this efficiently... not looking for specific solutions tho... just ideas. I have one thousand words and one thousand files. I need to read the files to see if some of the words are in the files. I can stop reading a file once I find 10 o

Re: Looking for lots of words in lots of files

2008-06-18 Thread Cong
On Jun 18, 11:01 pm, Kris Kennaway <[EMAIL PROTECTED]> wrote: > Calvin Spealman wrote: > > Upload, wait, and google them. > > > Seriously tho, aside from using a real indexer, I would build a set of > > the words I'm looking for, and then loop over each file, looping over > > the words and doing quick ch

Re: Looking for lots of words in lots of files

2008-06-18 Thread Jeff McNeil
On Jun 18, 10:29 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote: > brad wrote: > > Just wondering if anyone has ever solved this efficiently... not looking > > for specific solutions tho... just ideas. > > > I have one thousand words and one thousand files. I need to read the > > files to see if

Re: Looking for lots of words in lots of files

2008-06-18 Thread Martin P. Hellwig
Kris Kennaway wrote: If you can't use an indexer, and performance matters, evaluate using grep and a shell script. Seriously. grep is a couple of orders of magnitude faster at pattern matching strings in files (and especially regexps) than python is. Even if you are invoking grep multipl
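
One way the grep suggestion might be driven from Python is sketched below. It assumes a grep that supports `-l`, `-F`, `-w` and `-f` (GNU and BSD grep do), and a hypothetical `words.txt` holding one word per line. The function names are made up for illustration. Note that `-l` only reports which files contain at least one of the words; it does not count how many distinct words were found.

```python
import subprocess


def build_grep_command(words_file, paths):
    """Build a grep call that lists the files containing any word from words_file.

    -l  print only the names of matching files
    -F  treat patterns as fixed strings, not regexps
    -w  match whole words only
    -f  read one pattern per line from words_file
    """
    return ["grep", "-l", "-F", "-w", "-f", words_file, "--", *paths]


def files_with_any_word(words_file, paths):
    """Run grep and return the matching file names (hypothetical helper)."""
    result = subprocess.run(
        build_grep_command(words_file, paths),
        capture_output=True,
        text=True,
        check=False,
    )
    # grep exits with 0 on a match, 1 on no match, and >1 on a real error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

With a thousand files the path list may need to be passed in batches to stay under the operating system's argument-length limit.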

Re: Looking for lots of words in lots of files

2008-06-18 Thread Robert Bossy
I forgot to mention another way: put one thousand monkeys to work on it. ;) RB Robert Bossy wrote: brad wrote: Just wondering if anyone has ever solved this efficiently... not looking for specific solutions tho... just ideas. I have one thousand words and one thousand files. I need to read t

Re: Looking for lots of words in lots of files

2008-06-18 Thread Robert Bossy
brad wrote: Just wondering if anyone has ever solved this efficiently... not looking for specific solutions tho... just ideas. I have one thousand words and one thousand files. I need to read the files to see if some of the words are in the files. I can stop reading a file once I find 10 of t

Re: Looking for lots of words in lots of files

2008-06-18 Thread Francis Girard
Hi, Use a suffix tree. First make yourself a suffix tree of your thousand files and then use it. This is a classical problem for that kind of structure. Just search "suffix tree" or "suffix tree python" on google to find a definition and an implementation. (Also Jon Bentley's "Programming Pearls"
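
A full suffix tree is too long to show here, so the sketch below swaps in a suffix array, a simpler related structure that answers the same "does this text contain this string" question by binary search. It is not the suffix tree itself. The construction is deliberately naive and only suitable for small texts, the function names are hypothetical, and it matches substrings rather than whole words.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order.

    Naive construction: it sorts full suffix copies, so it is slow and
    memory-hungry on large inputs. Good enough to show the idea.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def contains(text, suffixes, word):
    """Binary-search the suffix array for a suffix that starts with word."""
    lo, hi = 0, len(suffixes)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffixes[mid]
        # Compare only the first len(word) characters of the suffix.
        if text[start:start + len(word)] < word:
            lo = mid + 1
        else:
            hi = mid
    if lo == len(suffixes):
        return False
    start = suffixes[lo]
    return text[start:start + len(word)] == word


text = "banana"
suffixes = build_suffix_array(text)
print(contains(text, suffixes, "nan"))  # True
print(contains(text, suffixes, "nab"))  # False
```

The index is built once per file and then each of the thousand words costs one binary search, which is the trade-off the suffix-tree suggestion is after.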

Re: Looking for lots of words in lots of files

2008-06-18 Thread Kris Kennaway
Calvin Spealman wrote: Upload, wait, and google them. Seriously tho, aside from using a real indexer, I would build a set of the words I'm looking for, and then loop over each file, looping over the words and doing quick checks for containment in the set. If so, add to a dict of file names to

Re: Looking for lots of words in lots of files

2008-06-18 Thread Calvin Spealman
Upload, wait, and google them. Seriously tho, aside from using a real indexer, I would build a set of the words I'm looking for, and then loop over each file, looping over the words and doing quick checks for containment in the set. If so, add to a dict of file names to list of words found
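
The set-based idea could look roughly like the sketch below. It is only one reading of the suggestion: the function names, the `\w+` tokenizer, the lowercasing, and the UTF-8 file encoding are all assumptions. The limit of 10 comes from the original question, which says a file can be abandoned once 10 of the words have been found.

```python
import re

# Assumption: a "word" is a run of letters, digits, or underscores.
WORD_RE = re.compile(r"\w+")


def find_words(lines, wanted, limit=10):
    """Return the wanted words seen in lines, stopping after limit distinct hits.

    wanted is expected to be a set of lowercase words.
    """
    found = set()
    for line in lines:
        for token in WORD_RE.findall(line.lower()):
            if token in wanted:  # set lookup, constant time on average
                found.add(token)
                if len(found) >= limit:
                    return found  # early exit: enough words found
    return found


def scan_files(paths, wanted, limit=10):
    """Map each file name to the sorted list of wanted words found in it."""
    hits = {}
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            found = find_words(handle, wanted, limit)
        if found:
            hits[path] = sorted(found)
    return hits
```

Because each file is read line by line, the early exit also avoids reading the rest of a large file once the limit is reached.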

Re: Looking for lots of words in lots of files

2008-06-18 Thread Diez B. Roggisch
brad wrote: > Just wondering if anyone has ever solved this efficiently... not looking > for specific solutions tho... just ideas. > > I have one thousand words and one thousand files. I need to read the > files to see if some of the words are in the files. I can stop reading a > file once I find

Looking for lots of words in lots of files

2008-06-18 Thread brad
Just wondering if anyone has ever solved this efficiently... not looking for specific solutions tho... just ideas. I have one thousand words and one thousand files. I need to read the files to see if some of the words are in the files. I can stop reading a file once I find 10 of the words in i