RE: Prioritize new documents

2008-01-03 Thread Seneviratne_Yasoja
IMHO it would be nice if Lucene's Similarity formula took the indexed-date of the document into account. Ideally as an optional setting, where the user can provide a date field as well. Some of the other search engines do - for example Fast's Instream. It makes sense that as documents age over t
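One way to picture the idea of letting document age influence ranking is a decay factor multiplied into the base relevance score. The sketch below is only an illustration under stated assumptions: the class `RecencyBoost`, the exponential half-life decay, and the parameter names are all hypothetical, and none of this is part of Lucene's Similarity API.

```java
// Hypothetical sketch: an age-based decay factor applied to a relevance score.
// Nothing here is a Lucene class or method; the half-life model is an assumption.
final class RecencyBoost {

    /** Returns a factor in (0, 1]: 1.0 for a brand-new document, halving every halfLifeDays. */
    static double boost(double ageDays, double halfLifeDays) {
        if (halfLifeDays <= 0) {
            throw new IllegalArgumentException("halfLifeDays must be positive");
        }
        double age = Math.max(0.0, ageDays); // treat future-dated documents as new
        return Math.pow(0.5, age / halfLifeDays);
    }

    /** Scales a base relevance score by the recency factor. */
    static double adjustedScore(double baseScore, double ageDays, double halfLifeDays) {
        return baseScore * boost(ageDays, halfLifeDays);
    }

    public static void main(String[] args) {
        // Same base score, different ages, 30-day half-life (an arbitrary choice).
        System.out.println(adjustedScore(2.0, 0, 30));
        System.out.println(adjustedScore(2.0, 30, 30));
        System.out.println(adjustedScore(2.0, 365, 30));
    }
}
```

Making the half-life a parameter matches the "optional setting" wish in the message: a very large half-life effectively turns the adjustment off.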

RE: does the MultiSearcher class calculate IDF properly?

2007-12-10 Thread Seneviratne_Yasoja
Thank you for the response. I logged a bug: https://issues.apache.org/jira/browse/LUCENE-1087 -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Friday, December 07, 2007 10:30 PM To: java-user@lucene.apache.org Subject: Re: does the MultiSearcher class calculate IDF

RE: Indexing XML documents (Urdu)

2007-12-06 Thread Seneviratne_Yasoja
Hi Liaqat, I'd rather keep the email thread on the Lucene user list. The code I used is below; the key is to be careful when reading UTF-8 text so you don't garble it. import org.xml.sax.*; import org.xml.sax.helpers.DefaultHandler; import org.apache.lucene.document.*; import org.apache.luc
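The listing in this message is cut off in the archive view. As a separate, minimal sketch (not the author's code), the point about reading UTF-8 carefully can be illustrated with the JDK's SAX parser alone: wrap the byte stream in a reader with an explicit charset rather than relying on the platform default. The class name `Utf8XmlText` and the sample element names are assumptions, and the Lucene indexing step is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects all character data from an XML stream decoded as UTF-8.
final class Utf8XmlText {

    static String extractText(InputStream in) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver text in several chunks, so append rather than assign.
                text.append(ch, start, length);
            }
        };
        // Decode explicitly as UTF-8 instead of using the platform default charset.
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(reader), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // "\u0627\u0631\u062F\u0648" is the word "Urdu" written in Urdu script.
        String xml = "<doc><title>\u0627\u0631\u062F\u0648</title></doc>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println(extractText(new ByteArrayInputStream(bytes)));
    }
}
```

The extracted string could then be passed to whatever indexing code is in use; the same explicit-charset rule applies when reading plain text files.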

does the MultiSearcher class calculate IDF properly?

2007-12-05 Thread Seneviratne_Yasoja
I tried the following: I created two different indexes, searched each individually and printed the score details, then compared that with searching both indexes with MultiSearcher and printing the score details. The "docFreq" values printed don't seem right - is this just a problem with using Explain together with the M
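To see why the docFreq value matters in this comparison, the sketch below evaluates the classic DefaultSimilarity formula, idf = 1 + ln(numDocs / (docFreq + 1)), once per index and once with the statistics pooled. The index sizes and term counts are made-up numbers, `IdfSketch` is a hypothetical class, and the code makes no claim about what MultiSearcher actually does.

```java
// Hypothetical sketch: per-index IDF versus IDF computed from pooled statistics.
final class IdfSketch {

    /** Classic Lucene DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Assumed statistics: the term appears in 9 documents in each index.
        int docFreqA = 9, numDocsA = 1000; // large index, term is rare
        int docFreqB = 9, numDocsB = 10;   // small index, term is common

        System.out.println("index A alone: " + idf(docFreqA, numDocsA));
        System.out.println("index B alone: " + idf(docFreqB, numDocsB));
        // Pooled statistics, as if both indexes were a single index.
        System.out.println("pooled:        " + idf(docFreqA + docFreqB, numDocsA + numDocsB));
    }
}
```

If scores from the two indexes are to be comparable, every hit should be scored with the same IDF, which is why the docFreq shown in the explanation output is worth checking against the pooled value.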

RE: Indexing XML document

2007-12-05 Thread Seneviratne_Yasoja
The example from Grant's earlier reply uses UTF-8: http://wiki.apache.org/lucene-java/IndexingOtherLanguages I tried out the Urdu text from your email: I first converted it to UTF-8, and then Lucene seemed to index and search it fine; SAX parsed it without problems as well. -Original Message- From: Liaqat Ali [ma

indexing a large number of fields (and nesting them)

2007-11-12 Thread Seneviratne_Yasoja
Hi, I need to index a large number of metadata fields along with every document (around 80 fields: mostly strings of 32 characters, plus some integers, booleans and dates; a couple of strings are longer, at 64 or 120 characters). It would also be nice if there was a way to represent nested fields, and query for t
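One common workaround for nesting in a flat field model is to encode the path into the field name, for example `author.name`. The sketch below flattens a nested map into such dotted names using only the JDK. The class `FieldFlattener`, the dot separator, and the sample field names are assumptions for illustration; this is not a Lucene feature, and the resulting name/value pairs would still have to be added to a document by the indexing code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns nested metadata into flat "parent.child" field names.
final class FieldFlattener {

    static Map<String, String> flatten(Map<String, ?> nested) {
        Map<String, String> flat = new LinkedHashMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Map<?, ?> node, Map<String, String> out) {
        for (Map.Entry<?, ?> entry : node.entrySet()) {
            String key = String.valueOf(entry.getKey());
            String name = prefix.isEmpty() ? key : prefix + "." + key;
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Recurse into nested structures, extending the dotted path.
                flattenInto(name, (Map<?, ?>) value, out);
            } else {
                // Leaf values are stored as strings; numbers and dates would need
                // a sortable encoding in a real index.
                out.put(name, String.valueOf(value));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> author = new LinkedHashMap<>();
        author.put("name", "Ali");
        author.put("id", 42);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "Report");
        doc.put("author", author);
        System.out.println(flatten(doc));
    }
}
```

A query for a nested value then targets the flattened name, such as a term query on the field `author.name`.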