Re: How to calculate centroid from HITS?

2007-04-19 Thread Lokeya
I figured out that Java APIs for this are already written and available. LucQE has this: QueryExpansion.expandQuery(java.lang.String queryStr, org.apache.lucene.search.Hits hits, java.util.Properties prop), which returns the expanded query (which will have the centroid). Grant Ingersoll-6 wrote:

IndexReader method semantics

2007-04-19 Thread Daniel Noll
Hi all. I'm considering making a kind of IndexReader where each time terms() is called it might return a different sequence even though the reader hasn't been reopened. Would that kind of thing conceivably cause problems anywhere else in the framework? Which is to ask, do other parts of the

Re: Document Boost

2007-04-19 Thread Erick Erickson
I hate to ask this (actually, I don't hate it, but...) "what behavior of the scoring are you actually finding doesn't fit your needs?". The reason I ask is that I've been asked to change the scoring, that is, set boosts, based on some vague notion of "how things should work" that is often just th

Re: Document Boost

2007-04-19 Thread Les Fletcher
Ooh, I like the BAR_significant field idea. It seems that you'd have to have one of those for every different level of boosting in your document. But that is significantly easier than reforming a query for 30-odd fields. The next question would be: should you omit the boosted field word

Re: Document Boost

2007-04-19 Thread Chris Hostetter
The full post Erick alluded to may be helpful... http://mail-archives.apache.org/mod_mbox/lucene-java-user/200609.mbox/[EMAIL PROTECTED] In general, if your goal is that words in the "metadata" of a document should be worth more than words in the "body" then you should have two fields: "metada
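The two-field idea in the snippet above can be illustrated with a toy sketch in plain Java. This is not Lucene's scoring formula and uses no Lucene classes; the class name, the method names, and the weights 2.0 and 1.0 are assumptions chosen only for illustration. It shows just the intuition that a match in a "metadata" field can be made to count for more than a match in a "body" field by giving each field its own weight.

```java
public class TwoFieldBoostSketch {

    // Counts case-insensitive occurrences of `term` in a whitespace-tokenized field.
    static int termCount(String fieldText, String term) {
        int count = 0;
        String wanted = term.toLowerCase();
        for (String token : fieldText.toLowerCase().split("\\s+")) {
            if (token.equals(wanted)) {
                count++;
            }
        }
        return count;
    }

    // Toy score: a weighted sum of per-field term counts (hypothetical, not Lucene's formula).
    static double score(String metadata, String body, String term,
                        double metadataBoost, double bodyBoost) {
        return metadataBoost * termCount(metadata, term)
             + bodyBoost * termCount(body, term);
    }

    public static void main(String[] args) {
        // The term appears only in the metadata of the first document
        // and only in the body of the second.
        double first = score("lucene search", "an article about indexing", "lucene", 2.0, 1.0);
        double second = score("misc notes", "lucene is mentioned here", "lucene", 2.0, 1.0);
        System.out.println("first.htm score:  " + first);
        System.out.println("second.htm score: " + second);
    }
}
```

With the metadata weight above the body weight, the document whose metadata contains the term ranks first, which is the ordering the original poster asked for.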

Re: adding a field at index-time

2007-04-19 Thread Mike Klaas
On 4/18/07, William Mee <[EMAIL PROTECTED]> wrote: I'd like to add metadata which I get *after* indexing a document's contents to the index. To be more specific: I'm implementing shingling (detection of near-duplicate documents) and want to add the document fingerprint (which is based on the s

Re: Document Boost

2007-04-19 Thread Les Fletcher
I am also relatively new to Lucene and was wondering about this. The way it seems to work is that if you boost a field then you have to actually specify that field in your query to benefit from that field boost. Otherwise you'll search the default field and the boost will be ignored. I hac

Re: Document Boost

2007-04-19 Thread HG1212
I am setting the boost at index time. Thanks

Re: question about field equality in query

2007-04-19 Thread Chris Hostetter
: > Thank you for your reply, your solution doesn't work in my case because I was thinking of indexing more than one document in a single index and each document representing a table in database. so if I put more than one document some fields for a single document will be empty. somethi

Re: Document Boost

2007-04-19 Thread Erick Erickson
Were you setting the boosts at index or search time? From an old e-mail from Chris H.: "index time field boosts are a way to express things like "this document's title is worth twice as much as the title of most documents"; query time boosts are a way to express "I care about matches on this clause

Document Boost

2007-04-19 Thread HG1212
Hi there, I am new to Lucene and would appreciate any help on this. Thank you in advance. I want the order of the search results based on the keywords mentioned in the meta information of the document. For example, if I have two very similar documents first.htm and second.htm, first.htm has keyw

Re: Merge performance

2007-04-19 Thread Michael D. Curtin
david m wrote: A couple of reasons that lead to the merge approach: - Source documents are written to archive media and retrieval is relatively slow. Add to that our processing pipeline (including text extraction)... Retrieving and merging minis is faster than re-processing and re-indexing f

Re: Index performance

2007-04-19 Thread Doron Cohen
"Tony Qian" wrote on 19/04/2007: > I found the problem which slowed down indexing. It is our NFS file system. If only the index is maintained on NFS (say input is on local disk or DB or such) it may help to index to local disk and only occasionally (once a day?) copy/update to an index maintained

Problem with MultiFieldQueryParser

2007-04-19 Thread Mark Woon
Hi all, Can someone clear something up for me regarding MultiFieldQueryParser? Using the same inputs to MultiFieldQueryParser.parse(String[] queries, String[] fields, Analyzer analyzer) and MultiFieldQueryParser.parse(String query) I seem to be getting the exact same query back (according

Re: Newbie needs help "addField"

2007-04-19 Thread jim shirreffs
Thanks to Karl and Donna, I followed your suggestions and was able to get a test driver (modified demo code) working, thanks again. jim s

Re: Index performance

2007-04-19 Thread Michael McCandless
"Tony Qian" <[EMAIL PROTECTED]> wrote: > I found the problem which slowed down indexing. It is our NFS file > system. NFS performance is generally slower than local filesystem, though there may be ways to tune it (I'm not sure). I have heard but not personally verified that mounting NFS read-onl

Re: Index performance

2007-04-19 Thread Tony Qian
Doron and Erick, I found the problem which slowed down indexing. It is our NFS file system. Thanks for help. Tony From: "Tony Qian" <[EMAIL PROTECTED]> Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: Index performance Date: Mon, 16 Apr 2007 14:37:46 +
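
One suggestion in this thread is to build the index on local disk and only occasionally copy it to the index kept on NFS. A minimal sketch of that copy step in plain Java follows. The class name, method name, and directory layout are hypothetical, and it assumes that nothing is writing to the local index while the copy runs and that the index directory is flat (regular files only). It says nothing about how searchers on the shared copy should be reopened afterwards.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LocalIndexPublisher {

    // Copies every regular file from the local index directory to the shared
    // (for example NFS-mounted) directory, replacing files of the same name.
    // Returns the number of files copied.
    static int publish(Path localIndexDir, Path sharedIndexDir) throws IOException {
        Files.createDirectories(sharedIndexDir);
        List<Path> files;
        try (Stream<Path> entries = Files.list(localIndexDir)) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            Files.copy(file, sharedIndexDir.resolve(file.getFileName()),
                       StandardCopyOption.REPLACE_EXISTING);
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: LocalIndexPublisher <localIndexDir> <sharedIndexDir>");
            return;
        }
        int copied = publish(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied " + copied + " index files");
    }
}
```

Run from a scheduler (once a day, as suggested in the thread), this keeps the slow NFS writes out of the indexing loop itself.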