RE: Memory Usage

2005-11-15 Thread Vanlerberghe, Luc
Since an IndexReader can't know what indexInterval was used, and since each segment could have a different indexInterval, wouldn't it be better to have a parameter that sets an average indexInterval that should be used? The fraction you talk about could then be calculated by the IndexReader per segm

RE: References to deleted file handles in long-running server application

2005-11-18 Thread Vanlerberghe, Luc
Good rule of thumb: don't ever count on the garbage collector cleaning up for you (even if you call System.gc() to give it a hint). You should close your IndexSearchers, but with a multithreaded application it's difficult to know when (you have to keep them open until no thread uses them any more)
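
One way to track "no thread uses it any more" is explicit reference counting instead of relying on the garbage collector. The following is a minimal sketch in plain Java, not the code from this thread: RefCountedResource and its methods are hypothetical names, and any java.io.Closeable stands in for the IndexSearcher.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: closes the wrapped resource once the owner and all users have released it. */
final class RefCountedResource<T extends Closeable> {
    private final T resource;
    // Starts at 1: the owner's own reference, dropped when a newer instance replaces this one.
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCountedResource(T resource) {
        this.resource = resource;
    }

    /** Returns the resource for use by one thread, or null if it has already been closed. */
    T acquire() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                return null; // already closed, caller must fetch the current instance
            }
            if (refs.compareAndSet(n, n + 1)) {
                return resource;
            }
        }
    }

    /** Drops one reference; the last release closes the resource. */
    void release() {
        if (refs.decrementAndGet() == 0) {
            try {
                resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        RefCountedResource<Closeable> holder =
                new RefCountedResource<>(() -> System.out.println("closed"));
        holder.acquire();  // a search thread starts using the resource
        holder.release();  // the owner retires this instance; nothing is closed yet
        holder.release();  // the search thread finishes; the resource is closed now
    }
}
```

Each search would pair acquire() with release() in a finally block, and the owner calls release() once when it swaps in a newer instance.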

RE: Determine the index of a hit after using MultiSearcher

2005-11-29 Thread Vanlerberghe, Luc
There's a public method "int subSearcher(int n)" in MultiSearcher. If you pass it a document id (not the hit sequence number!), it returns the number of the searcher that contains that document id (in the array you passed to the constructor of MultiSearcher). Luc -Original Message- From
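
The mapping from a global document id to a sub-searcher can be pictured with cumulative offsets. This is a sketch of the idea only, not MultiSearcher's source: the class name SubSearcherSketch and the maxDocs constructor argument are assumptions for illustration.

```java
/** Hypothetical illustration of mapping a global document id to a sub-searcher index. */
final class SubSearcherSketch {
    // starts[i] is the first global doc id owned by searcher i; the last entry is the total doc count.
    private final int[] starts;

    SubSearcherSketch(int[] maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    /** Index of the searcher that contains global doc id n. */
    int subSearcher(int n) {
        if (n >= 0) {
            // Linear scan for clarity; a binary search over starts would also work.
            for (int i = 0; i < starts.length - 1; i++) {
                if (n < starts[i + 1]) {
                    return i;
                }
            }
        }
        throw new IllegalArgumentException("no such document id: " + n);
    }

    /** Doc id relative to the searcher that contains global doc id n. */
    int subDoc(int n) {
        return n - starts[subSearcher(n)];
    }

    public static void main(String[] args) {
        SubSearcherSketch sketch = new SubSearcherSketch(new int[] {10, 5, 20});
        System.out.println(sketch.subSearcher(12) + " / " + sketch.subDoc(12));
    }
}
```

With Hits, the value to pass is the document id of the hit (for example from Hits.id(i)), not the position i in the result list.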

RE: Lucene performance bottlenecks

2005-12-07 Thread Vanlerberghe, Luc
Since 'byte' is signed in Java, can't the first test be simply written as if (b>0) return b; Doing an 'and' of two bytes and checking if the result is 0 probably requires masking operations on >8 bit processors... Also perhaps change to int b=readByte() so that all operators use ints... Luc --
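
A minimal sketch of the suggestion, assuming the "first test" is the high-bit check in a variable-length integer reader (7 data bits per byte, low-order group first, high bit set when more bytes follow). VIntSketch and its array-based signature are hypothetical; the code under discussion is not shown in the snippet. The sketch tests b >= 0 rather than b > 0 so that a zero byte also takes the fast path, which makes it equivalent to checking (b & 0x80) == 0.

```java
/** Hypothetical stand-alone decoder; reads from a byte array instead of an index input. */
final class VIntSketch {
    /** Decodes one variable-length int starting at buf[pos]. */
    static int readVInt(byte[] buf, int pos) {
        int b = buf[pos++];           // widened to int once, sign-extended
        if (b >= 0) {
            return b;                 // high bit clear: the value fits in this single byte
        }
        int value = b & 0x7F;         // keep the low 7 bits of the first byte
        for (int shift = 7; ; shift += 7) {
            b = buf[pos++];
            value |= (b & 0x7F) << shift;
            if (b >= 0) {
                return value;         // high bit clear: this was the last byte
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(readVInt(new byte[] {(byte) 0xAC, 0x02}, 0));
    }
}
```

Because a Java byte is signed, any byte with its high bit set is negative after widening, so the sign test replaces the mask-and-compare.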

RE: Basic lucene usage

2005-12-23 Thread Vanlerberghe, Luc
IndexWriter will try to delete obsolete files when it is closed. If the files cannot be deleted immediately for whatever reason (usually because some IndexReader/IndexSearcher has them open) it will list the files to be deleted in the file "deletable". On the next update of the index, the other IndexR

RE: searching and indexing simultaneously...

2006-01-05 Thread Vanlerberghe, Luc
One reader/searcher per server. My configuration uses - one Lucene index in a shared location, - one server that uses either a single IndexReader or a single IndexWriter to delete or add documents - several servers that read/search the index. The 'search' servers each have a single IndexReader op

RE: online incremental indexing

2006-01-09 Thread Vanlerberghe, Luc
You can open a new instance *before* closing the previous one. If you have queries that occur often, you can 'warm up' the new instance before starting to use it. Just make sure you don't close an IndexSearcher instance that is in use by Hits instances... Luc -Original Message- From: zzz

RE: index merging

2006-02-05 Thread Vanlerberghe, Luc
Sorry to contradict you Yonik, but I'm pretty sure the commit lock is *not* locked during a merge, only while the "segments" file is being updated. The merge process takes a set of 'old' segment files, writes new segment files and 'registers' them in the "segments" file when they are ready to be

RE: two applications accessing same index

2006-02-05 Thread Vanlerberghe, Luc
Sure, the only danger is you have to make sure that both processes store their lock files in the same directory (by default they are in your home directory, I believe) unless you use a different locking mechanism. There are supposed to be problems when accessing indices over network shares, but I use

RE: Iterating hits

2006-02-16 Thread Vanlerberghe, Luc
My guess is you are using the same reader both for searching and deleting. The Hits class buffers the first 100 hits, and when you go beyond that, it reruns the query to get more hits. If you use the same reader, the searcher probably doesn't return the same results the second time. If different

RE: segments.new

2006-03-15 Thread Vanlerberghe, Luc
Are you using Lucene 1.4.3? There's a bug report in JIRA (LUCENE-481) with a patch that solves this. On Windows, files cannot be deleted while they are open, and before the patch, calling getCurrent or isCurrent in one process could block another one from updating the segments file. The patch in

RE: segments.new

2006-03-15 Thread Vanlerberghe, Luc
Yes, I use Lucene 1.4.3. Could you send me the link for this patch? Thanks in advance. -Original Message- From: Vanlerberghe, Luc [mailto:[EMAIL PROTECTED] Sent: Wednesday 15 March 2006 13:38 To: java-user@lucene.apache.org Subject: RE: segments.new Are you

RE: Re-creating IndexSearcher after update

2006-03-21 Thread Vanlerberghe, Luc
Yep, I created DelayCloseIndexSearcher just for this scenario and it has been running in production for about half a year now... There's a usage example in the javadoc, but it can be optimised even more (without touching the code that does the searches, handles the hits, etc...). In my production envi

RE: Errors when searching index and writing to index simultaneously

2006-03-22 Thread Vanlerberghe, Luc
Make sure both the indexing process and the searcher process use the same directory to store the lock files (by default your home directory, I believe). Luc -Original Message- From: Satuluri, Venu_Madhav [mailto:[EMAIL PROTECTED] Sent: Wednesday 22 March 2006 14:14 To: java-user@lucene.apache

RE: Passing XML objects to the analyzer ?

2005-04-20 Thread Vanlerberghe, Luc
The problem with this approach is that the Analyzer you will use for indexing will be *very* different from the one used for searching. The way I see it, the Document objects passed to Lucene should contain fields that are as much text-based as possible, comparable to what a user would type whi

RE: WhiteSpace Tokenizer question

2005-08-23 Thread Vanlerberghe, Luc
The query string is first parsed by QueryParser and what it believes to be single terms are then passed on to your analyzer. QueryParser only considers space, tab, \n and \r to be white space (see QueryParser.jj). QueryParser itself is not aware that '-' should be treated as white space so in your

RE: QueryParser not thread-safe

2005-08-24 Thread Vanlerberghe, Luc
Thanks for pointing that out! I checked the source and QueryParser is indeed not thread-safe (the presence of instance variables like jj_lastpos that are used *during* the parsing makes this obvious). Perhaps it should be explicitly mentioned in the javadoc. The solution I'll probably go for is using

RE: Document visible by Term, but not search

2005-08-25 Thread Vanlerberghe, Luc
Is your Analyzer aware that that particular field does not need to be tokenized? During indexing, if a field is passed with tokenize=false, the analyzer won't be called, so the string will be stored as-is. During searching, the QueryParser doesn't know which fields should be tokeniz

RE: Can't return Hits!

2005-09-01 Thread Vanlerberghe, Luc
Keep the IndexSearcher object you used to get the Hits open until you have finished with them... Luc -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Thursday 1 September 2005 10:14 To: java-user@lucene.apache.org Subject: Can't return Hits! Hi, I want to r

RE: Small problem in searching

2005-09-16 Thread Vanlerberghe, Luc
You could also add a field with all the terms reversed during indexing. So documents containing "tirupathireddy" or "venkatreddy" would have "ydderihtapurit" and "yddertaknev" in the reversed field. If you detect that the user entered a suffix query like "*reddy", transform it into a prefix
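
The string handling behind this trick can be sketched in plain Java. SuffixQuerySketch and its method names are hypothetical, and the sketch leaves out how the reversed field is populated and queried in Lucene.

```java
/** Hypothetical helper for the reversed-field approach to suffix queries. */
final class SuffixQuerySketch {
    /** Reverses a term, as would be done for each term stored in the reversed field. */
    static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    /**
     * Turns a pure suffix query such as "*reddy" into the prefix pattern "ydder*"
     * to run against the reversed field. Returns null for anything else.
     */
    static String toReversedPrefix(String userQuery) {
        if (userQuery.length() < 2 || userQuery.charAt(0) != '*') {
            return null; // does not start with a wildcard, or has nothing after it
        }
        String suffix = userQuery.substring(1);
        if (suffix.indexOf('*') >= 0 || suffix.indexOf('?') >= 0) {
            return null; // further wildcards: not a plain suffix query
        }
        return reverse(suffix) + "*";
    }

    public static void main(String[] args) {
        System.out.println(reverse("venkatreddy"));
        System.out.println(toReversedPrefix("*reddy"));
    }
}
```

A null return signals that the query should go through the normal path against the regular field.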

RE: live update of index used by Tomcat

2005-09-20 Thread Vanlerberghe, Luc
You should keep your IndexReader open until the merge has finished, but also until there are no more Hits objects that depend on it (tricky in multithreaded environments like Tomcat). The fact that the files cannot be deleted immediately after the merge is no problem. The filenames will be stored

RE: "Best-practice" in a web application AND live update of index used by Tomcat

2005-09-21 Thread Vanlerberghe, Luc
Are you sure that both processes use the same directory to store the Lock files? If both processes are on the same machine, they will both default to the same local directory and you won't see the problem. If they are on separate machines, you should set the lock directory to some shared locati
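
How a process ends up with a particular lock directory can be sketched as a system-property lookup with a fallback. This is an assumption to verify against your Lucene version, not something stated in the message: the sketch assumes the lock directory is taken from the org.apache.lucene.lockdir system property and otherwise falls back to java.io.tmpdir. LockDirSketch is a hypothetical name.

```java
import java.io.File;

/** Hypothetical illustration of resolving a lock directory from system properties. */
final class LockDirSketch {
    /** Assumed lookup order: explicit lockdir property first, then the JVM temp directory. */
    static File resolveLockDir() {
        String fallback = System.getProperty("java.io.tmpdir");
        return new File(System.getProperty("org.apache.lucene.lockdir", fallback));
    }

    public static void main(String[] args) {
        // Run with -Dorg.apache.lucene.lockdir=<shared directory> on every machine to compare.
        System.out.println(resolveLockDir());
    }
}
```

Under that assumption, two processes on different machines only share locks if both are started with the property pointing at the same shared directory.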

RE: live update of index used by Tomcat

2005-10-04 Thread Vanlerberghe, Luc
> I can post the code and testcases if you're interested. Luc, that would be great, as I have the very same problem. Regards, Carsten - T

RE: Renewing IndexSearcher on index change.

2005-10-04 Thread Vanlerberghe, Luc
I've just posted the solution I use as a jira attachment. See http://issues.apache.org/jira/browse/LUCENE-445 It was designed to be used in a multithreaded environment (Tomcat). It contains javadoc to explain the usage. It extends IndexSearcher since that is the object that searches are executed ag

RE: IndexSearcher in servlet containers

2005-10-05 Thread Vanlerberghe, Luc
Take a look at the DelayCloseIndexSearcher I contributed yesterday. http://issues.apache.org/jira/browse/LUCENE-445 You should set up a SearcherFactory in an object that implements ServletContextListener that receives webapp startup/shutdown events and your servlets should get an IndexSearcher fr

RE: Query to return all documents in the index

2005-10-06 Thread Vanlerberghe, Luc
There is a MatchAllDocsQuery available (in the current development trunk I believe) in org.apache.lucene.search. I simply took the source and compiled it along with my project to use it... Luc -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chris Hostett