Re: Hierarchical classified documents

2006-11-24 Thread karl wettin
On 25 Nov 2006, at 04:19, Chris Hostetter wrote: : This is the simplest thing I could think of: : : * store full namespace path as one term: "1/2/3" : * store each namespace identity as one term: "1", "2", "3" I use the second approach for finding all Docs at a given node or below in the tr

Re: Hierarchical classified documents

2006-11-24 Thread Chris Hostetter
: This is the simplest thing I could think of: : : * store full namespace path as one term: "1/2/3" : * store each namespace identity as one term: "1", "2", "3" I use the second approach for finding all Docs at a given node or below in the tree ... instead of the first approach, have a field w
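
To make the two term layouts quoted above concrete, here is a minimal sketch in plain Python rather than Lucene. The field names `path` and `node`, the `TinyIndex` class, and the sample document ids are all made up for illustration; the sketch also assumes every node identity is unique across the whole tree, as the original post says of its business objects.

```python
from collections import defaultdict


def path_terms(path):
    """Split a namespace path such as "1/2/3" into the two kinds of terms.

    "path" holds the full path as a single term; "node" holds one term
    per namespace identity along the path.
    """
    return {"path": [path], "node": path.split("/")}


class TinyIndex:
    """Toy inverted index: maps (field, term) to the set of doc ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, path):
        for field, terms in path_terms(path).items():
            for term in terms:
                self.postings[(field, term)].add(doc_id)

    def docs_at(self, path):
        # Exact match on the full-path term: docs filed at this node only.
        return sorted(self.postings.get(("path", path), ()))

    def docs_at_or_below(self, node_id):
        # Match on a single identity term: docs at this node or any descendant.
        return sorted(self.postings.get(("node", node_id), ()))


index = TinyIndex()
index.add("a", "1")
index.add("b", "1/2")
index.add("c", "1/2/3")
index.add("d", "1/4")

print(index.docs_at("1/2"))
print(index.docs_at_or_below("2"))
```

In this toy version, the identity terms answer "this node or below" with a single term lookup, while the full-path term answers "exactly this node".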

Hierarchical classified documents

2006-11-24 Thread karl wettin
This is an excerpt of my business object class diagram (go fixed font size): [MyObject] | 0..* | V parent [Namespace]<>--+ |0..1 | | 0..*| +-+ child All business object instances have a unique identifier. A namespace path

Re: Hit.getDocument performance

2006-11-24 Thread Mark Miller
Hits will use TopDocs to return the first 100 doc ids and put them in a cache (normalizing their scores first, if I remember correctly). Then, when you retrieve a doc, it will put that in a cache as well. If you ask for a doc beyond the first 100, it will execute a TopDocs search again to fill the cache up to

Re: Newbie Search Question

2006-11-24 Thread Erick Erickson
If we're still dealing with StringReader(text) throwing an error: it really shouldn't unless the document has no field named "contents". Here's what I'd do... Get a copy of Luke (google luke lucene) to examine your index. Figure out what the document ID is that you're blowing up on and look at

Re: Hit.getDocument performance

2006-11-24 Thread mark harwood
Look in the latest SVN version - there is some new code for "Lazy field loading" i.e. not incurring the hit for retrieving *all* fields if you only want to retrieve a subset from a document. Not used it myself yet but it may be applicable. If you *really* want all matching docs too then I would

Re: Hit.getDocument performance

2006-11-24 Thread Luis Rodrigo Aguado
I have just read in the API doc that going through the Hits returned is not really advisable. However, I am not developing the final application, but a middleware that accesses Lucene, so I would not want to make the decision to cut the number of docs returned, but let the application do that.

Hit.getDocument performance

2006-11-24 Thread Luis Rodrigo Aguado
Hi all, I have hit a performance bottleneck that is driving me crazy. Maybe someone here has a clue about its source... I am working with an index of 2400 PDF files. For each of them, I index the contents, and I store the filename and the creation date. Nothing else. The resulting ind

Re: does anyone know of a 'smart' categorizing text pattern finder?

2006-11-24 Thread Erik Hatcher
On Nov 24, 2006, at 3:22 AM, Jin Yiqing wrote: Does this book really exist? I googled and didn't find any introduction about it :) No, I'm sure Bob meant to say "Lucene in Action" in which he contributed a wonderful case study on bits of LingPipe. Erik 2006/11/22, Erik Hatch

Re: Fwd: Hibernate Lucene trademark issues

2006-11-24 Thread Jin Yiqing
Thanks. I'm looking forward to your reply. :) 2006/11/23, Emmanuel Bernard <[EMAIL PROTECTED]>: Hi Jin, I'll answer your email on the Hibernate dev list. See you there :-) Jin Yiqing wrote: > > Hi, Emmanuel > I think you did a very great job! Since I am now working on a system > that > usin

Re: does anyone know of a 'smart' categorizing text pattern finder?

2006-11-24 Thread Jin Yiqing
Does this book really exist? I googled and didn't find any introduction about it :) 2006/11/22, Erik Hatcher <[EMAIL PROTECTED]>: On Nov 21, 2006, at 5:46 PM, Bob Carpenter wrote: > LingPipe in Action. Now that's a book I'd love to own!