Need some Advice on Searching

2006-05-19 Thread David Ahlschläger
Hi all. Firstly, I am new to using Lucene and all its APIs. I am trying to evaluate whether Lucene can solve the following problem for me. 1. I need to temporarily index sets of documents on the fly, say 100 at a time. This seems simple enough - I create an index either on the file system or in Me

does anybody have the experience to do some pooling upon lucene?

2006-05-19 Thread Zhenjian YU
I'm using the Apache Commons Pool library to pool IndexSearcher instances so that my system can provide high performance. I wonder whether it is reasonable to pool Lucene objects. If so, are there any other objects I can also pool? Thank you!

Re: does anybody have the experience to do some pooling upon lucene?

2006-05-19 Thread Erik Hatcher
A single IndexSearcher is all a system needs to use (in the basic sense). Pooling multiple instances pointing to the same index won't benefit your performance. Things get trickier when you are updating the index and want to see the updates. Erik On May 19, 2006, at 5:13 AM, Zhen
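
A minimal sketch of the "one shared searcher" idea, assuming the searcher is safe to use from many threads (as IndexSearcher is). The `SharedSearcher` class and its `search`/`swap` methods are hypothetical names used only for illustration; the type parameter `S` stands in for the searcher so the sketch compiles without Lucene on the classpath. The `swap` method hints at the "trickier" part: after an index update, a newly opened searcher is published and the old one is handed back to be closed.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/** Hypothetical holder: one shared searcher-like object for the whole application. */
final class SharedSearcher<S> {
    private final AtomicReference<S> current;

    SharedSearcher(S initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Every thread runs its query against the same instance; no pool is involved. */
    <R> R search(Function<S, R> query) {
        return query.apply(current.get());
    }

    /**
     * Publishes a freshly opened searcher after the index has changed.
     * Returns the previous instance so the caller can close it once
     * in-flight searches against it have finished.
     */
    S swap(S reopened) {
        return current.getAndSet(reopened);
    }
}
```

With Lucene on the classpath, `S` would be `IndexSearcher`, and the value returned by `swap` is the instance to close after outstanding searches complete.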

Can lucene do this?

2006-05-19 Thread David Ahlschläger
Hi all. Firstly, I am new to using Lucene and all its APIs. I am trying to evaluate whether Lucene can solve the following problem for me. 1. I need to temporarily index sets of documents on the fly, say 100 at a time. This seems simple enough - I create an index either on the file system or in Me

OutOfMemory and IOException Access Denied errors

2006-05-19 Thread Rahil
Hi, I am new to Lucene, so I am perhaps missing something obvious. I have included Lucene 1.9.1 in my classpath and am trying to integrate it with MySQL. I have a table which has nearly a million records in it. According to the Lucene documentation I have read so far, my understanding is that I ne

Analyzer question

2006-05-19 Thread AsifTheManRahman
I need to know how the following analyzers work: Whitespace and Keyword. I am looking for an analyzer that will result in a hit if the queried string appears in the document being searched. For example, if I am looking for "A_B_C", then I want the analyzer to detect all of the following patte

Re: OutOfMemory and IOException Access Denied errors

2006-05-19 Thread Otis Gospodnetic
It's impossible to tell from the code you provided, but you are most likely just leaking memory/resources somewhere. For example, ResultSets and other DB operations should typically be placed in a try/catch/FINALLY block, where the finally block ensures all DB resources are closed/released. O
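
The try/catch/finally pattern mentioned above can be sketched as follows. This is only an illustration: `TrackedResource` is a hypothetical stand-in for a JDBC `ResultSet` or `Statement` so the example runs without a database, and the failure is simulated. The point is that the `finally` block runs and releases both resources even when indexing a row throws.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseInFinally {
    /** Hypothetical stand-in for a JDBC ResultSet or Statement. */
    static final class TrackedResource implements AutoCloseable {
        private final String name;
        private final List<String> closedLog;

        TrackedResource(String name, List<String> closedLog) {
            this.name = name;
            this.closedLog = closedLog;
        }

        @Override
        public void close() {
            closedLog.add(name); // record that this resource was released
        }
    }

    /** Simulates a failure mid-indexing and returns the names of the resources that were closed. */
    static List<String> indexWithFailure() {
        List<String> closed = new ArrayList<>();
        TrackedResource statement = new TrackedResource("statement", closed);
        TrackedResource resultSet = new TrackedResource("resultSet", closed);
        try {
            throw new IllegalStateException("simulated failure while indexing a row");
        } catch (IllegalStateException e) {
            // A real application would log or rethrow here.
        } finally {
            // Runs on success and on failure; close in reverse order of creation.
            resultSet.close();
            statement.close();
        }
        return closed;
    }
}
```

On Java 7 and later, a try-with-resources statement gives the same guarantee with less code.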

Re: OutOfMemory and IOException Access Denied errors

2006-05-19 Thread Paul . Illingworth
I guess you are executing your SQL and getting the whole result set. There are options on the JDBC Statement class that can be used for controlling the fetch size - by using these you should be able to limit the amount of data returned from the database so you don't get OOM. I haven't used the

Re: Analyzer question

2006-05-19 Thread Jeff Rodenburg
The Keyword analyzer does no stemming or input modification of any sort: think of it as WYSIWYG for index population. The Whitespace analyzer simply splits your input on whitespace (still no stemming), so the tokens are the individual words. I don't have the code in front of me, so I'm not sure
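
To make the difference concrete, here is a rough plain-Java approximation of the two tokenization behaviours. It does not use Lucene's analyzer classes; `TokenizationSketch`, `keywordTokens`, and `whitespaceTokens` are hypothetical names, and the sketch only mimics the behaviour described above (whole input as one token versus splitting on whitespace, with no case folding or stemming in either).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TokenizationSketch {
    /** Keyword-style: the entire input becomes a single, unmodified token. */
    static List<String> keywordTokens(String input) {
        return Collections.singletonList(input);
    }

    /** Whitespace-style: split on runs of whitespace; tokens are otherwise unchanged. */
    static List<String> whitespaceTokens(String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : input.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }
}
```

Under this approximation, a field containing `foo A_B_C bar` yields the token `A_B_C` with the whitespace-style split, so an exact term query for `A_B_C` could match; with the keyword-style behaviour the only token is the whole string.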

Re: OutOfMemory and IOException Access Denied errors

2006-05-19 Thread Rahil
Thanks, Paul and Otis. I basically applied the same mechanism used in creating indexes in MySQL to Lucene, so I didn't use any fetchSize. But I'll implement it now and see how it performs. I will also look into DBSight. However when executing the query by limiting the result set to 10 the quer

Re: How are results merged from a multisearcher?

2006-05-19 Thread Tom Emerson
Daniel, Thanks for the clarification. What then is the difference between a MultiSearcher and using an IndexSearcher on a MultiReader? On 5/18/06, Daniel Naber <[EMAIL PROTECTED]> wrote: On Thursday 18 May 2006 23:26, Tom Emerson wrote: > OK, but what does "merged correctly" mean? I assume

Re: OutOfMemory and IOException Access Denied errors

2006-05-19 Thread Dennis Watson
Hi Rahil, Your out-of-memory error is likely due to a MySQL bug outlined here: http://bugs.mysql.com/bug.php?id=7698 There is a workaround presented in the article. I have been able to select large datasets from MySQL while indexing by using the SQL_BIG_RESULT hint in MySQL and pumping up

Re: OutOfMemory and IOException Access Denied errors

2006-05-19 Thread Rahil
Hi Dennis, Dennis Watson wrote: Hi Rahil, Your out-of-memory error is likely due to a MySQL bug outlined here: http://bugs.mysql.com/bug.php?id=7698 There is a workaround presented in the article. I have been able to select large datasets from MySQL while indexing by using the SQL_BIG_RE

Re: Need some Advice on Searching

2006-05-19 Thread Chris Hostetter
i assume when you say this... : 1. I need to temporarily index sets of documents on the fly, say 100 at a : time. you mean that you'll have lots of temporary indexes of a few hundred documents and then you'll do a bunch of queries and throw the index away. Even if i'm wrong most of the rest of m

Scoring purely on term frequencies

2006-05-19 Thread W.H. van Atteveldt
Dear list, I am interested in using Lucene for analyzing documents based on occurrence of certain keywords. As such, I am not interested in the 'top' or 'best' documents, but I do want to know exactly how many words in the query matched. Thus, instead of the complicated formula used by default, I

Matching at least N terms of subqueries

2006-05-19 Thread Michael Chan
Hi, Is there any way to make sure that at least N terms (e.g. 2) of each subquery are contained in the results? For example, with the query "OR(t1,t2,t3) AND OR(t4,t5,t6)", the docs returned must contain 2 or more of (t1,t2,t3) and 2 or more of (t4,t5,t6). I've read about Similarity, but it s

Re: Matching at least N terms of subqueries

2006-05-19 Thread Chris Hostetter
take a look at BooleanQuery.setMinimumNumberShouldMatch(int) : Date: Sat, 20 May 2006 14:27:00 +0800 : From: Michael Chan <[EMAIL PROTECTED]> : Reply-To: java-user@lucene.apache.org : To: java-user@lucene.apache.org : Subject: Matching at least N terms of subqueries : : Hi, : : Is there any way t
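
For readers unfamiliar with the "minimum number should match" idea, here is a plain-Java sketch of the matching rule only. It does not call Lucene; `MinShouldMatchSketch`, `atLeast`, and `matches` are hypothetical names, and a document is modelled simply as a set of terms. In Lucene itself, each OR group would presumably be a BooleanQuery of optional clauses with the minimum set to 2, and the two groups would be required clauses of an outer query.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

final class MinShouldMatchSketch {
    /** True if the document contains at least `minimum` of the optional terms. */
    static boolean atLeast(Set<String> docTerms, List<String> optionalTerms, int minimum) {
        int matched = 0;
        for (String term : optionalTerms) {
            if (docTerms.contains(term)) {
                matched++;
            }
        }
        return matched >= minimum;
    }

    /** OR(t1,t2,t3) AND OR(t4,t5,t6), with at least 2 terms required from each group. */
    static boolean matches(Set<String> docTerms) {
        return atLeast(docTerms, Arrays.asList("t1", "t2", "t3"), 2)
            && atLeast(docTerms, Arrays.asList("t4", "t5", "t6"), 2);
    }
}
```

A document containing t1, t2, t4 and t6 satisfies both groups, while one containing only t1, t4 and t5 fails the first group and is rejected.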