Generally you are right, except you don't exactly copy the old index: you first
index the new content into a new directory, and then merge the old index with
the new one. There is a quicker merge method, IndexWriter.addIndexesNoOptimize(),
but in general merging indexes is slow, although still quicker than re-indexing.
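For illustration, a rough sketch of that merge approach (untested, assuming
Lucene 2.x and made-up directory names INDEX_DIR_OLD / INDEX_DIR_NEW):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeNewIndex {
    public static void main(String[] args) throws Exception {
        // Step 1: index the new content into INDEX_DIR_NEW (not shown here).
        // Step 2: open the existing index (create == false) and merge the new one in.
        IndexWriter writer =
            new IndexWriter("INDEX_DIR_OLD", new StandardAnalyzer(), false);
        Directory newIndex = FSDirectory.getDirectory("INDEX_DIR_NEW");
        writer.addIndexesNoOptimize(new Directory[] { newIndex });
        writer.close();
    }
}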
--
So I am assuming it is not just a matter of "indexing" to the same
directory as you "indexed" to before.
So, based on what you are saying, you would have to reload the
previous index (e.g., INDEX_DIR_OLD) and then index the new content.
By "index", I mean actually invoking Lucene
Hi all,
I found that instead of storing a term ID for a term in the index, Lucene
stores the actual term string value. I am wondering whether there is such a
"term ID" for each distinct term indexed in Lucene, similar to the "doc ID"
for each distinct document indexed in Lucene.
In other words
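As far as I can tell, the public API only hands terms back as (field, text)
pairs, with no numeric ID. A quick sketch (Lucene 2.x, made-up index path)
that walks the term dictionary and shows what is actually exposed:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermEnum;

public class DumpTerms {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("INDEX_DIR");
        TermEnum terms = reader.terms(); // enumerates every distinct term
        while (terms.next()) {
            // Each entry is a (field, text) pair; no numeric term ID is exposed.
            System.out.println(terms.term().field() + ":" + terms.term().text());
        }
        terms.close();
        reader.close();
    }
}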
I think you can simply change your SQL to select only the recently updated
messages, and add those to your existing index. Although adding to an existing
large index also takes a long time, it should be quicker than rebuilding
the whole index.
If your index continues to grow, you may need to have a dedi
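Something along these lines (untested sketch; the table and column names are
made up for illustration, and it assumes Lucene 2.x):

import java.sql.*;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class IncrementalIndexer {
    public static void addRecent(Connection conn, Timestamp lastRun) throws Exception {
        // create == false: append to the existing index instead of rebuilding it
        IndexWriter writer =
            new IndexWriter("INDEX_DIR", new StandardAnalyzer(), false);
        PreparedStatement ps = conn.prepareStatement(
            "SELECT id, body FROM messages WHERE updated_at > ?");
        ps.setTimestamp(1, lastRun);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            Document doc = new Document();
            doc.add(new Field("id", rs.getString("id"),
                              Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("content", rs.getString("body"),
                              Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        }
        rs.close();
        ps.close();
        writer.close();
    }
}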
I have been trying to index my database (a discussion forum) with Lucene. I am
taking the simplest approach: I have a discussion forum whose posts are just
text messages; I take those out of the database and then index the content.
I am having trouble because I have hundreds of thousands of messages and I
Which analyzer are you using in your query parser?
Can you share the one line of code in which you construct your QueryParser object?
As you might be parsing a query string made up of different fields, I
suggest you use a
PerFieldAnalyzerWrapper, which lets you do unique analysis for different fields.
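For example, something like this (a sketch assuming Lucene 2.x; "name" and
"id" are made-up field names):

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;

public class BuildParser {
    public static QueryParser create() {
        // StandardAnalyzer is the default; the "id" field gets KeywordAnalyzer
        // so its values are treated as single, untokenized terms.
        PerFieldAnalyzerWrapper analyzer =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer());
        analyzer.addAnalyzer("id", new KeywordAnalyzer());
        return new QueryParser("name", analyzer);
    }
}

The important point is to use the same wrapper at both index and query time,
so the two sides tokenize identically.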
You should test your Analyzer to confirm what tokens are being produced. You
can do this by using a helper class; to save time, there is one written for
the Lucene in Action book called AnalyzerUtils. You should be able to get it
out of the book's source code download here:
http://www.lucene
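If you don't have the book handy, a few lines get you the same effect with the
plain Lucene 2.x token stream API (sketch; substitute the analyzer you
actually use and your own sample text):

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public class ShowTokens {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new SimpleAnalyzer(); // the analyzer under test
        TokenStream stream =
            analyzer.tokenStream("field", new StringReader("ACME-42 Widgets"));
        Token token;
        while ((token = stream.next()) != null) {
            System.out.println("[" + token.termText() + "]"); // one token per line
        }
        stream.close();
    }
}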
Have you tried giving the name field a boost? E.g. name:(John Smith)^10
alias:(John Smith)
I'm also guessing you'd be much happier with a sloppy phrase query than
with the boolean queries you are currently using:
name:"John Smith"~3^10 alias:"John Smith"~3
-Hoss
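The same query built programmatically, if that is easier to experiment with
(untested sketch, Lucene 2.x; terms are lowercased by hand here to match what
a typical analyzer would produce):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;

public class NameQuery {
    public static BooleanQuery build() {
        PhraseQuery name = new PhraseQuery();
        name.add(new Term("name", "john"));
        name.add(new Term("name", "smith"));
        name.setSlop(3);    // ~3: allow up to 3 positions between the words
        name.setBoost(10f); // ^10: weight name matches over alias matches

        PhraseQuery alias = new PhraseQuery();
        alias.add(new Term("alias", "john"));
        alias.add(new Term("alias", "smith"));
        alias.setSlop(3);

        BooleanQuery query = new BooleanQuery();
        query.add(name, BooleanClause.Occur.SHOULD);
        query.add(alias, BooleanClause.Occur.SHOULD);
        return query;
    }
}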
Thanks for the reply - I have tried boosting, but not like you stated. I have
tried to boost the alias field so that it would score as high as a match on
the name field, but it didn't increase the score enough. Like:
name:(John Smith) alias:(John Smith)^10
I think it has something to do with the fact that
Hi Everyone,
I am facing some strange behaviour with Analyzers. I am using SimpleAnalyzer
for some fields in my Compass entity, but I also wrote a custom Analyzer
that is slightly different from SimpleAnalyzer, as I wanted to allow both
letters and digits in the company name column.
So custom analy
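For reference, such an analyzer can be sketched like this, assuming Lucene
2.x: it behaves like SimpleAnalyzer except that the tokenizer keeps digits as
well as letters:

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;

public class AlphanumericAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        // Like LowerCaseTokenizer (which SimpleAnalyzer uses), but tokens may
        // also contain digits, so "abc123" survives as a single token.
        return new LowerCaseFilter(new CharTokenizer(reader) {
            protected boolean isTokenChar(char c) {
                return Character.isLetterOrDigit(c);
            }
        });
    }
}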
Kalvir,
Have you tried giving the name field a boost? E.g. name:(John Smith)^10
alias:(John Smith)
-M
On 8/31/07, Kalvir Sandhu <[EMAIL PROTECTED]> wrote:
>
> Hi all.
>
> I am working on building a Lucene index to search names of people. I want
> to be able to score things differently. Here is an example of the behaviour
> I need.
Hi all.
I am working on building a Lucene index to search names of people. I want to
be able to score things differently. Here is an example of the behaviour I
need.
Doc 1 with aliases
name: Bob Jones
alias: John Smith Andrew Jones
Doc 2 without aliases
name: John Andrew Smith
alias: none
When
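For concreteness, those two documents might be built like this (sketch; the
Lucene 2.x field options are assumptions on my part):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class ExampleDocs {
    public static Document[] build() {
        Document doc1 = new Document();
        doc1.add(new Field("name", "Bob Jones",
                           Field.Store.YES, Field.Index.TOKENIZED));
        doc1.add(new Field("alias", "John Smith Andrew Jones",
                           Field.Store.YES, Field.Index.TOKENIZED));

        Document doc2 = new Document();
        doc2.add(new Field("name", "John Andrew Smith",
                           Field.Store.YES, Field.Index.TOKENIZED));
        // doc2 has no alias, so the alias field is simply omitted

        return new Document[] { doc1, doc2 };
    }
}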
Hi Madhu,
Madhu wrote:
> I am indexing PDF documents using PDFBox 7.4; it is working fine for some
> PDF files, but for Japanese PDF files it gives the exception below:
>
> caught a class java.io.IOException
> with message: Unknown encoding for 'UniJIS-UCS2-H'
>
> Can anyone help me with how to set the encoding?
I'm creating a tokenized "content" Field from a plain text file
using an InputStreamReader and new Field("content", in);
The text file is large, 20 MB, and contains zillions of lines,
each with the same 100-character token.
That causes an OutOfMemoryError.
Given that all tokens are the *same*,
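For reference, here is a sketch of the setup described (Lucene 2.x, made-up
index path). A Reader-valued field is streamed during tokenization rather than
loaded as one string, and IndexWriter.setMaxFieldLength() caps how many tokens
per field get indexed at all (the default is 10,000):

import java.io.FileInputStream;
import java.io.InputStreamReader;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class IndexBigFile {
    public static void main(String[] args) throws Exception {
        IndexWriter writer =
            new IndexWriter("INDEX_DIR", new StandardAnalyzer(), true);
        writer.setMaxFieldLength(10000); // stop consuming tokens after this many
        InputStreamReader in =
            new InputStreamReader(new FileInputStream(args[0]));
        Document doc = new Document();
        doc.add(new Field("content", in)); // Reader field: tokenized, not stored
        writer.addDocument(doc);
        writer.close();
    }
}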
Great!
Thanks !
--
Antoine Baudoux
Development Manager
[EMAIL PROTECTED]
Tél.: +32 2 333 58 44
GSM: +32 499 534 538
Fax.: +32 2 648 16 53
On 31 Aug 2007, at 09:45, Michael Busch wrote:
Antoine Baudoux wrote:
From what I have seen in the patch, it re-opens the segments that have changed.
Antoine Baudoux wrote:
> From what I have seen in the patch, it re-opens the segments that
> have changed.
>
> So imagine I always change the biggest segment (because that's where
> most docs are and I need to update them frequently). Will there still
> be a benefit from IndexReader.reopen()?
From what I have seen in the patch, it re-opens the segments that
have changed.
So imagine I always change the biggest segment (because that's where
most docs are and I need to update them frequently). Will there
still be a benefit from IndexReader.reopen()?
--
Antoine Baudoux
Development Manager
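For anyone following along, the usage pattern for reopen() in the patch
(LUCENE-743) looks roughly like this; reopen() returns a new reader only when
the index has changed, re-opening just the modified segments and sharing the
rest with the old reader:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;

public class ReopenExample {
    public static IndexReader refresh(IndexReader reader) throws IOException {
        IndexReader newReader = reader.reopen();
        if (newReader != reader) {
            // The index changed: close the old reader. Unchanged segments are
            // shared; only new or modified segments were actually loaded.
            reader.close();
        }
        return newReader;
    }
}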