Re: Custom sorting - memory leaks

2007-07-06 Thread Xiaocheng Luan
Intentionally copied the subject line of this thread (from last August), and an email from the thread is attached at the end of this email - I ran into similar problems in custom sorting (memory leak due to caching) - the subject has been well discussed in the thread but just want to add a voic

Re: Emulating Pages Search

2007-04-01 Thread Xiaocheng Luan
Just to add to the thoughtful responses from the others, it isn't really that bad to do a new search each time. First, the later searches may likely be "warm" searches and thus won't take as long as the first search; second, it's the searcher.doc(docId) part that will likely hurt the most, but h
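
The point about searcher.doc(docId) being the expensive part can be illustrated with a minimal sketch, assuming the search is rerun for every page and stored documents are loaded only for the hits on the requested page. The message is cut off before any remedy is stated, so this is an assumption rather than the author's method. It is plain Java, not Lucene code: PageFetcher, rankedDocIds, and loader are hypothetical names, with loader standing in for a call such as searcher.doc(docId).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper, not part of Lucene: loads stored documents for one page only.
final class PageFetcher {
    private PageFetcher() {}

    // rankedDocIds: doc ids in rank order from a fresh search (assumed input).
    // loader: stand-in for an expensive per-document load such as searcher.doc(docId).
    static <T> List<T> fetchPage(int[] rankedDocIds, int page, int pageSize,
                                 IntFunction<T> loader) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        List<T> docs = new ArrayList<>();
        long start = (long) page * pageSize; // long avoids int overflow on large pages
        if (start >= rankedDocIds.length) {
            return docs; // requested page is past the last hit
        }
        int end = (int) Math.min(rankedDocIds.length, start + pageSize);
        for (int i = (int) start; i < end; i++) {
            docs.add(loader.apply(rankedDocIds[i])); // only this page's docs are loaded
        }
        return docs;
    }
}
```

With five ranked hits and a page size of 2, asking for page 1 (zero-based) invokes the loader for the third and fourth hits only; the other hits are never loaded.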

Re: setBoost on Field

2007-03-31 Thread Xiaocheng Luan
What fields did you search? The headlines field only? DECAFFMEYER MATHIEU <[EMAIL PROTECTED]> wrote: setBoost on Field Hi, I am parsing this file called Logistics.htm I have a field named "headlines" that contains word "clients" among others. When I don't put a boost on this field,

Re: Virtually merge two indexes?

2007-03-26 Thread Xiaocheng Luan
How will the indexes be searched? Do you need to search fields in both indexes? If the ParallelReader is not an attractive solution for you, finding a general solution may be difficult. Would it be possible to explore solutions that may work for your specific case? Just a thought. Xiaocheng C

Re: Search Design Question

2007-03-24 Thread Xiaocheng Luan
Hi Michael, if I understand your questions correctly (I feel like I must have missed something), here is what you can do to achieve what you want: index these fields: to, from, content, subject, and all (which includes the text from the other 4 fields), and use "all" as your default search field. Then when you

Re: how to get approximate total matching

2007-03-14 Thread Xiaocheng Luan
If I remember correctly, I once searched over 40G of indexes using multi-searcher with a 512M max heap size. How much memory did you give the JVM? Thanks, Xiaocheng senthil kumaran <[EMAIL PROTECTED]> wrote: Hi. I have more index directories (>6) all in GB, and searching my query with single Ind

Re: Urgent : How much actually the disk space needed to optimize the index?

2007-03-13 Thread Xiaocheng Luan
; wrote: Ya...I think i will store it in the database so that later it could be used in scoring/ranking for retrieval...:) Another thing i would like to see is whether the precision or recall will be much affected by this... Regards, Maureen Xiaocheng Luan wrote: One side-effect of

Re: Complete field search

2007-03-13 Thread Xiaocheng Luan
Or, you may index the fields for which you want "exact matches" as non-tokenized. Thanks, Xiaocheng Bhavin Pandya <[EMAIL PROTECTED]> wrote: Hi kainth, >So for example if I have a field with this text: "world cup" and I do a >search for "cup" I want it to return false but for another field that >conta
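
Why a non-tokenized field gives exact matches can be shown with a small sketch. This is plain Java rather than Lucene code, and ExactMatchSketch with its methods is a hypothetical illustration: a tokenized field contributes one term per word, while a non-tokenized field contributes the whole value as a single term, so a query for "cup" only matches the former.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Plain-Java illustration, not Lucene code: how tokenizing changes which terms match.
final class ExactMatchSketch {
    private ExactMatchSketch() {}

    // Tokenized indexing: the field value is split into one term per word.
    static Set<String> tokenizedTerms(String fieldValue) {
        return new HashSet<>(Arrays.asList(fieldValue.trim().split("\\s+")));
    }

    // Non-tokenized indexing: the whole field value is kept as a single term.
    static Set<String> untokenizedTerms(String fieldValue) {
        Set<String> terms = new HashSet<>();
        terms.add(fieldValue);
        return terms;
    }

    // A term query matches only if that exact term is among the indexed terms.
    static boolean matches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }
}
```

For the field value "world cup", the query term "cup" matches the tokenized terms but not the non-tokenized ones; only the full value "world cup" matches the non-tokenized field.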

Re: Urgent : How much actually the disk space needed to optimize the index?

2007-03-13 Thread Xiaocheng Luan
One side-effect of turning off the norms may be that the scoring/ranking will be different? Do you need to search by each of these many fields? If not, you probably don't have to index these fields (but store them for retrieval?). Just a thought. Xiaocheng Michael McCandless <[EMAIL PROTECTED]>

Re: How to set query time scoring

2006-12-07 Thread Xiaocheng Luan
Try playing with the similarity class/subclasses; it might help. For example, you may adjust the coord to increase the chance (not necessarily a guarantee?) that ORed results will come after the ANDed results; adjust the sloppy factor to favor phrases, etc. Xiaocheng Sajid Khan <[EMAIL PROTECTED]> wro

Re: Creating a new index from an existing index

2006-08-30 Thread Xiaocheng Luan
urces, you store the raw data locally in case you > need to do this again in the future. I know that's not much help, but > > Or, figure out how to make Lucene update-in-place, write the code, test it > and submit a patch. I'm sure Erik, Otis et.al. would offer you profuse >

Re: Creating a new index from an existing index

2006-08-29 Thread Xiaocheng Luan
ct the document from the index without potentially losing information. Hope this helps Erick On 8/29/06, Xiaocheng Luan wrote: > > Hi, > Got a question. Here is what I want to achieve: > > Create a new index from an existing index, to change the boosting factor > for some of the

Creating a new index from an existing index

2006-08-29 Thread Xiaocheng Luan
Hi, Got a question. Here is what I want to achieve: Create a new index from an existing index, to change the boosting factor for some of the documents (and potentially some other tweaks), without reindexing it from the source. Are there any tools or ways to do this? Thanks! Xiaocheng Luan

(Lucene) tools/algorithms for co-occurrence terms computation

2006-05-10 Thread Xiaocheng Luan
"pandemic", depending on the underlying data set. It may be precomputed or dynamically computed on a small data set. Any help will be highly appreciated. Thanks! Xiaocheng Luan

Spellchecker bug (or feature?)

2006-03-31 Thread Xiaocheng Luan
Not sure if this is the right place to report this issue: The accuracy value, which can be set via setAccuracy(), is being modified in SpellChecker.java when a word is checked. As a result, the "min" may be pushed very high and will not suggest anything for later requests. One workar
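
The failure mode described here, a configured threshold being overwritten during a lookup, can be reproduced in a small sketch. This is not the actual SpellChecker.java code: SuggesterSketch, minScore, and both methods are hypothetical. The message is cut off before the workaround, so the fixed variant (working on a per-call copy) is just one way to avoid the pattern, not necessarily what the author proposed.

```java
// Hypothetical suggester, not the real SpellChecker: shows why mutating a
// configured field during a lookup breaks later lookups.
final class SuggesterSketch {
    private float minScore; // configured threshold, analogous to an accuracy setting

    SuggesterSketch(float minScore) {
        this.minScore = minScore;
    }

    // Buggy pattern: the configured threshold itself is raised while scanning.
    String bestBuggy(String[] candidates, float[] scores) {
        String best = null;
        for (int i = 0; i < candidates.length; i++) {
            if (scores[i] > minScore) {
                best = candidates[i];
                minScore = scores[i]; // bug: overwrites the configured value
            }
        }
        return best;
    }

    // Fixed pattern: the threshold is copied into a local for this call only.
    String bestFixed(String[] candidates, float[] scores) {
        float threshold = minScore; // per-call copy; the field stays untouched
        String best = null;
        for (int i = 0; i < candidates.length; i++) {
            if (scores[i] > threshold) {
                best = candidates[i];
                threshold = scores[i];
            }
        }
        return best;
    }
}
```

With a configured threshold of 0.5, a first lookup whose best candidate scores 0.9 leaves the buggy variant with a threshold of 0.9, so a later candidate scoring 0.7 is rejected; the fixed variant still accepts it.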

Does Lucene support on-disk search?

2006-03-08 Thread Xiaocheng Luan
Hi, I heard that Lucene loads the index into memory to do a search, which does not sound quite right to me. I will not be surprised if Lucene is smart enough to load the index into memory when it is feasible, but I'd be surprised if it ALWAYS loads the index into memory to do the search, which I