Re: termpositions at index time...

2006-10-19 Thread Erick Erickson
Thanks. That's very similar to what we're doing, and I'd love to see some technical details too...
Erick
On 10/19/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> On Oct 18, 2006, at 4:50 PM, Erick Erickson wrote:
>> We're indexing books. I need to
>> a> return books ordered by relevancy
>> b> for an

Re: termpositions at index time...

2006-10-19 Thread Erik Hatcher
On Oct 18, 2006, at 4:50 PM, Erick Erickson wrote:
> We're indexing books. I need to
> a> return books ordered by relevancy
> b> for any single book, return the number of hits in each chapter (which, of course, may be many pages).
I think your application deserves a good look at XTF:

Re: termpositions at index time...

2006-10-18 Thread Erick Erickson
I tried the notion of a temporary RAMDirectory already, and the documents parse unacceptably slowly: 8-10 seconds. Great minds think alike. Believe it or not, I have to deal with a 7,500 page book that details Civil War records of Michigan volunteers. The XML form is 24M, probably 16M of text exc

Re: termpositions at index time...

2006-10-18 Thread Michael D. Curtin
Erick Erickson wrote:
> Arbitrary restrictions by IT on the space the indexes can take up. Actually, I won't categorically say I *can't* make this happen, but in order to use this option, I need to be able to present a convincing case. And I can't do that until I've exhausted my options/creativity.

Re: termpositions at index time...

2006-10-18 Thread Erick Erickson
Arbitrary restrictions by IT on the space the indexes can take up. Actually, I won't categorically say I *can't* make this happen, but in order to use this option, I need to be able to present a convincing case. And I can't do that until I've exhausted my options/creativity. And this way it keeps fo

Re: termpositions at index time...

2006-10-18 Thread Michael D. Curtin
Erick Erickson wrote:
> Here's my problem: We're indexing books. I need to
> a> return books ordered by relevancy
> b> for any single book, return the number of hits in each chapter (which, of course, may be many pages).
> 1> If I index each page as a document, creating the relevance on a book basis

termpositions at index time...

2006-10-18 Thread Erick Erickson
Here's my problem: We're indexing books. I need to
a> return books ordered by relevancy
b> for any single book, return the number of hits in each chapter (which, of course, may be many pages).
1> If I index each page as a document, creating the relevance on a book basis is interesting, but collec
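To make requirement b> concrete, here is a rough sketch in plain Python (not Lucene code). It assumes the whole book is indexed as one document and that the token position where each chapter starts was recorded at index time; that is an assumption suggested by the subject line, not something stated in the messages. The function name hits_per_chapter and both of its parameters are hypothetical.

```python
from bisect import bisect_right

def hits_per_chapter(chapter_starts, hit_positions):
    """Count how many hits fall inside each chapter.

    chapter_starts: sorted token positions at which each chapter begins
                    (hypothetical data saved while indexing the book).
    hit_positions:  token positions of the matching term in that book.
    Returns a list with one count per chapter.
    """
    counts = [0] * len(chapter_starts)
    for pos in hit_positions:
        # The chapter containing pos is the last one starting at or before it.
        idx = bisect_right(chapter_starts, pos) - 1
        if idx >= 0:  # skip hits before the first chapter (front matter)
            counts[idx] += 1
    return counts
```

Each lookup is a binary search over the chapter starts, so the cost grows with the number of hits rather than the number of pages. Whether this fits depends on how cheaply the term positions for a single book can be read back.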