Hi All.
Firstly, I am new to using Lucene and all its APIs.
I am trying to evaluate if Lucene can solve the following problem for me.
1. I need to temporarily index sets of documents on the fly, say 100 at a
time.
This seems simple enough - I create an index either on the file system or
in memory
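For reference, a minimal sketch of that kind of short-lived, throwaway index using a RAMDirectory and the Lucene 1.9-era API (the "body" field name and the sample documents are just placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.RAMDirectory;

public class TempIndexDemo {
    public static void main(String[] args) throws Exception {
        String[] batch = { "first document", "second document", "third document" };

        // Build a throwaway in-memory index for a small batch of documents.
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        for (int i = 0; i < batch.length; i++) {
            Document doc = new Document();
            doc.add(new Field("body", batch[i], Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        }
        writer.close();

        // Query it, then let the whole index be garbage collected.
        IndexSearcher searcher = new IndexSearcher(dir);
        Query query = new QueryParser("body", new StandardAnalyzer()).parse("second");
        Hits hits = searcher.search(query);
        System.out.println(hits.length() + " hits");
        searcher.close();
    }
}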
I'm using the Apache Commons Pool library to pool the IndexSearcher, so that
my system can provide high performance.
I wonder if it is reasonable to pool Lucene objects this way?
If yes, are there any other objects I can also pool?
Thank you!
A single IndexSearcher is all a system needs to use (in the basic
sense). Pooling multiple instances pointing to the same index won't
benefit your performance. Things get trickier when you are updating
the index and want to see the updates.
Erik
On May 19, 2006, at 5:13 AM, Zhen
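A sketch of the single-shared-searcher pattern Erik describes (the class and method names here are made up, and this assumes the Lucene 1.9-era IndexSearcher(String) constructor):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;

// One process-wide searcher shared by all request threads; IndexSearcher is
// thread-safe for searching, so no pool is needed.
public class SearcherHolder {
    private static IndexSearcher searcher;   // guarded by the class lock

    public static synchronized IndexSearcher get(String indexDir) throws IOException {
        if (searcher == null) {
            searcher = new IndexSearcher(indexDir);
        }
        return searcher;
    }

    // Call this after the index has been updated so new searches see the changes.
    public static synchronized void reopen(String indexDir) throws IOException {
        if (searcher != null) {
            searcher.close();
        }
        searcher = new IndexSearcher(indexDir);
    }
}

In a real system you would also want to let in-flight searches finish before closing the old searcher, which is the "trickier" part mentioned above.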
Hi
I am new to Lucene so am perhaps missing something obvious. I have
included Lucene 1.9.1 in my classpath and am trying to integrate it with
MySQL.
I have a table which has nearly a million records in it. According to the
documentation on Lucene I have read so far, my understanding is that
I need to know how the following analyzers work:
Whitespace
Keyword
I am looking for an analyzer that will result in a hit if the string that is
queried appears in the document being searched. For example, if I am looking
for "A_B_C", then I want the analyzer to detect all of the following
patterns
It's impossible to tell from the code you provided, but you are most likely
just leaking memory/resources somewhere. For example, ResultSets and other DB
operations should typically be placed in a try/catch/FINALLY block, where the
finally block ensures all DB resources are closed/released.
O
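A minimal sketch of that pattern (the SQL statement and table name are invented):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class JdbcIndexing {
    // Release JDBC resources in a finally block so they are freed even if
    // handling a row throws.
    static void indexTable(Connection conn) throws SQLException {
        Statement stmt = null;
        ResultSet rs = null;
        try {
            stmt = conn.createStatement();
            rs = stmt.executeQuery("SELECT id, body FROM documents");
            while (rs.next()) {
                // hand rs.getString("body") etc. to the IndexWriter here
            }
        } finally {
            if (rs != null) { try { rs.close(); } catch (SQLException ignored) {} }
            if (stmt != null) { try { stmt.close(); } catch (SQLException ignored) {} }
        }
    }
}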
I guess you are executing your SQL and getting the whole result set. There
are options on the JDBC Statement class that can be used for controlling
the fetch size - by using these you should be able to limit the amount of
data returned from the database so you don't get OOM. I haven't used the
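For what it's worth, a sketch of the fetch-size approach; with the MySQL Connector/J driver of that era, streaming results required a forward-only, read-only statement and a fetch size of Integer.MIN_VALUE (the query and column names are invented):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingSelect {
    static void indexTable(Connection conn) throws SQLException {
        // A forward-only, read-only statement with fetch size Integer.MIN_VALUE
        // makes Connector/J stream rows instead of buffering the whole result set.
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                              ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE);
        ResultSet rs = stmt.executeQuery("SELECT id, body FROM documents");
        try {
            while (rs.next()) {
                // feed each row to the IndexWriter here
            }
        } finally {
            rs.close();
            stmt.close();
        }
    }
}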
The Keyword analyzer does no stemming or input modification of any sort:
think of it as WYSIWYG for index population. The Whitespace analyzer simply
splits your input on whitespace (still no stemming), so the tokens are the
individual words. I don't have the code in front of me, so I'm not sure
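A small sketch that shows the difference, using the old TokenStream.next() API from Lucene 1.9 (field name and sample text are placeholders):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;

public class AnalyzerDemo {
    public static void main(String[] args) throws Exception {
        String text = "A_B_C some other words";
        dump("keyword", new KeywordAnalyzer(), text);       // one token: the whole string
        dump("whitespace", new WhitespaceAnalyzer(), text); // A_B_C / some / other / words
    }

    static void dump(String label, Analyzer analyzer, String text) throws Exception {
        TokenStream stream = analyzer.tokenStream("field", new StringReader(text));
        for (Token t = stream.next(); t != null; t = stream.next()) {
            System.out.println(label + ": " + t.termText());
        }
    }
}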
Thanks Paul and Otis
I basically applied the same mechanism used in creating indexes in MySQL
to Lucene. So I didn't use any fetchSize. But I'll implement it now and
see how it performs. Will also look into DBSight.
However, when executing the query by limiting the result set to 10,
the query
Daniel,
Thanks for the clarification. What then is the difference between a
MultiSearcher and using an IndexSearcher on a MultiReader?
On 5/18/06, Daniel Naber <[EMAIL PROTECTED]> wrote:
On Thursday 18 May 2006 23:26, Tom Emerson wrote:
> OK, but what does "merged correctly" mean?
I assume
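For context, a sketch of the two constructions being compared (the index paths are placeholders; this assumes the Lucene 1.9-era constructors):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;

public class TwoWaysToSearchTwoIndexes {
    public static void main(String[] args) throws Exception {
        // Option 1: a single IndexSearcher over a MultiReader that merges the indexes.
        IndexReader r1 = IndexReader.open("/path/to/index1");
        IndexReader r2 = IndexReader.open("/path/to/index2");
        IndexSearcher overMultiReader =
            new IndexSearcher(new MultiReader(new IndexReader[] { r1, r2 }));

        // Option 2: a MultiSearcher over two independent IndexSearchers.
        MultiSearcher multi = new MultiSearcher(new Searchable[] {
            new IndexSearcher("/path/to/index1"),
            new IndexSearcher("/path/to/index2")
        });
    }
}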
Hi Rahil,
Your out of memory error is likely due to a mysql bug outlined here:
http://bugs.mysql.com/bug.php?id=7698
There is a workaround presented in the article. I have been able to select
large datasets from mysql while indexing by using the SQL_BIG_RESULT hint in
mysql and pumping up
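i.e. something along these lines (the query and column names are invented):

// Invented query showing the SQL_BIG_RESULT select hint in a JDBC call.
ResultSet rs = stmt.executeQuery("SELECT SQL_BIG_RESULT id, body FROM documents");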
Hi Dennis
Dennis Watson wrote:
> Hi Rahil,
> Your out of memory error is likely due to a mysql bug outlined here:
> http://bugs.mysql.com/bug.php?id=7698
> There is a workaround presented in the article. I have been able to select
> large datasets from mysql while indexing by using the SQL_BIG_RESULT
i assume when you say this...
: 1. I need to temporarily index sets of documents on the fly, say 100 at a
: time.
you mean that you'll have lots of temporary indexes of a few hundred
documents and then you'll do a bunch of queries and throw the index away.
Even if i'm wrong most of the rest of m
Dear list,
I am interested in using Lucene for analyzing documents based on the
occurrence of certain keywords. As such, I am not interested in the
'top' or 'best' documents, but I do want to know exactly how many words
in the query matched.
Thus, instead of the complicated formula used by default, I
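One way to get at that (a sketch only, not tested): subclass DefaultSimilarity so every scoring factor is flattened to 1, in which case a BooleanQuery of term clauses scores each document with roughly the number of query terms it contains. The lengthNorm part only takes effect if the index is (re)built with this Similarity set on the IndexWriter as well.

import org.apache.lucene.search.DefaultSimilarity;

// Every factor is 1, so a BooleanQuery of term clauses scores each document
// with (roughly) the number of distinct query terms it contains.
public class MatchCountSimilarity extends DefaultSimilarity {
    public float tf(float freq) { return freq > 0 ? 1.0f : 0.0f; }
    public float idf(int docFreq, int numDocs) { return 1.0f; }
    public float lengthNorm(String fieldName, int numTerms) { return 1.0f; }
    public float queryNorm(float sumOfSquaredWeights) { return 1.0f; }
    public float coord(int overlap, int maxOverlap) { return 1.0f; }
    public float sloppyFreq(int distance) { return 1.0f; }
}

Install it with searcher.setSimilarity(new MatchCountSimilarity()) and, if you also index with it, writer.setSimilarity(...) so the stored norms are 1 too.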
Hi,
Is there any way to make sure that at least a certain number of terms (e.g. 2)
of a subquery are contained in the results? For example, with the query
"OR(t1,t2,t3) AND OR(t4,t5,t6)", the docs returned must contain 2 or more
of (t1,t2,t3) and 2 or more of (t4,t5,t6). I've read
about Similarity, but it s
take a look at BooleanQuery.setMinimumNumberShouldMatch(int)
: Date: Sat, 20 May 2006 14:27:00 +0800
: From: Michael Chan <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Matching at least N terms of subqueries
:
: Hi,
:
: Is there any way t
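For the example in the question, that would look something like this (the field name is a placeholder, and setMinimumNumberShouldMatch may require a newer Lucene release than 1.9.1):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class MinShouldMatchExample {
    public static void main(String[] args) {
        // (t1 OR t2 OR t3) AND (t4 OR t5 OR t6), at least two terms from each group.
        BooleanQuery group1 = new BooleanQuery();
        group1.add(new TermQuery(new Term("body", "t1")), BooleanClause.Occur.SHOULD);
        group1.add(new TermQuery(new Term("body", "t2")), BooleanClause.Occur.SHOULD);
        group1.add(new TermQuery(new Term("body", "t3")), BooleanClause.Occur.SHOULD);
        group1.setMinimumNumberShouldMatch(2);

        BooleanQuery group2 = new BooleanQuery();
        group2.add(new TermQuery(new Term("body", "t4")), BooleanClause.Occur.SHOULD);
        group2.add(new TermQuery(new Term("body", "t5")), BooleanClause.Occur.SHOULD);
        group2.add(new TermQuery(new Term("body", "t6")), BooleanClause.Occur.SHOULD);
        group2.setMinimumNumberShouldMatch(2);

        BooleanQuery query = new BooleanQuery();
        query.add(group1, BooleanClause.Occur.MUST);
        query.add(group2, BooleanClause.Occur.MUST);
        System.out.println(query);
    }
}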