> From: Erick Erickson [mailto:[EMAIL PROTECTED]
> Sent: Monday, 14 August 2006 16:52
> To: java-user@lucene.apache.org
> Subject: Re: Index not recreated
>
> You have all my sympathy. Let me see if I can restate your
> problem.
>
> "Hey Ron. The indexing process doesn't work. We c
I actually suspect that your process isn't hung, it's just taking forever
because it's swapping a lot. Like a really, really, really lot. Like more
than you ever want to deal with.
I think you're pretty much forced, as Lin said, to use a filter. I was
pleasantly surprised at how quickly filters
To avoid "TooManyClauses", you can try Filter instead of Query. But that
will be slower.
Form what I see is that there are so many keys that match your query, it
will be tough for Lucene.
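A minimal sketch of such a filter against the Lucene 2.0 API (the class name, field, and prefix here are made up for illustration, not taken from the poster's code): it marks every document containing a term with a given prefix without building any BooleanQuery clauses, so TooManyClauses cannot occur, and it can be passed straight to Searcher.search(query, filter).

import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.Filter;

// Sketch: set a bit for every document that has a term starting with
// "prefix" in "field". No query expansion, no clause limit.
public class PrefixBitsFilter extends Filter {
    private final String field;
    private final String prefix;

    public PrefixBitsFilter(String field, String prefix) {
        this.field = field;
        this.prefix = prefix;
    }

    public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = new BitSet(reader.maxDoc());
        TermEnum terms = reader.terms(new Term(field, prefix));
        TermDocs docs = reader.termDocs();
        try {
            do {
                Term term = terms.term();
                if (term == null || !term.field().equals(field)
                        || !term.text().startsWith(prefix)) {
                    break;
                }
                docs.seek(terms);
                while (docs.next()) {
                    bits.set(docs.doc());
                }
            } while (terms.next());
        } finally {
            docs.close();
            terms.close();
        }
        return bits;
    }
}

Wrapping it in a CachingWrapperFilter keeps the BitSet around between searches, which is where filters usually win over repeatedly expanding a huge wildcard query.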
On 8/14/06, Van Nguyen <[EMAIL PROTECTED]> wrote:
It was how I was implementing the search.
I am using a boolean query. Prior to the 7GB index, I was searching
over a 150MB index that consists of a very small part of the bigger
index. I was able to call
BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE) and that worked fine.
B
PS...
The "intermittent" nature of your problem points to a concurrency issue.
Does the production environment have a greater number of users? If so, this
likely translates to a greater number of threads acting upon the index. I'd
be looking for possible conflicts between different threads acce
My advice would be the "back-to-basics" approach. Create a test case which
creates a simple index with a few documents, verify the index is as you
expect, then re-create the index and verify again. Run this test case on
your production environment (if you are able). This will determine once and
for all whether the basic create/re-create cycle behaves as expected.
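As a starting point, a self-contained check along these lines (a sketch using the Lucene 2.0 API with a RAMDirectory; the field name and counts are made up) exercises exactly that create/verify/re-create/verify cycle:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class RecreateIndexCheck {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();

        // Build the index twice; "true" means create from scratch,
        // wiping whatever was there before.
        for (int pass = 0; pass < 2; pass++) {
            IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
            for (int i = 0; i < 3; i++) {
                Document doc = new Document();
                doc.add(new Field("id", String.valueOf(i),
                        Field.Store.YES, Field.Index.UN_TOKENIZED));
                writer.addDocument(doc);
            }
            writer.optimize();
            writer.close();

            IndexReader reader = IndexReader.open(dir);
            // If re-creation works, this prints 3 on both passes;
            // 6 on the second pass means documents are being appended.
            System.out.println("pass " + pass + ": " + reader.numDocs() + " docs");
            reader.close();
        }
    }
}

If the second pass reports double the document count, the writer is appending rather than re-creating.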
Hi,
as some of you may have noticed, Lucene prefers shorter documents over
longer ones, i.e. shorter documents get a higher ranking, even if the
ratio "matched terms / total terms in document" is the same.
For example, take these two artificial documents:
doc1: x 2 3 4 5 6 7 8 9 10
doc2: x x 3
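For what it's worth, if flatter scoring across document lengths is wanted, the usual workaround (a sketch against the Lucene 2.0 Similarity API, not something Daniel is necessarily proposing) is to override lengthNorm and rebuild the index:

import org.apache.lucene.search.DefaultSimilarity;

// Sketch: neutralize document-length normalization so short documents no
// longer get an automatic boost over long ones. Set it on the IndexWriter
// (writer.setSimilarity(...)) before rebuilding and on the Searcher
// (searcher.setSimilarity(...)) at query time.
public class FlatLengthSimilarity extends DefaultSimilarity {
    public float lengthNorm(String fieldName, int numTerms) {
        return 1.0f;  // DefaultSimilarity returns 1/sqrt(numTerms)
    }
}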
On Monday 14 August 2006 17:50, Suba Suresh wrote:
> I have some stored emails in folders in my local
> disk and huge list of email archives in another system.
Lucene can only index plain text, so if you can convert these mails to text
you can index them without any problem.
Regards
Daniel
--
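If the locally stored messages are individual RFC-822 (.eml) files, the conversion can be as small as this sketch using JavaMail (the field names are made up; multipart and HTML bodies would need extra extraction):

import java.io.File;
import java.io.FileInputStream;
import java.util.Properties;
import javax.mail.Session;
import javax.mail.internet.MimeMessage;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class MailToDocument {
    // Turn one .eml file into a Lucene Document; only plain-text bodies
    // are handled here.
    public static Document fromEmlFile(File eml) throws Exception {
        Session session = Session.getDefaultInstance(new Properties());
        FileInputStream in = new FileInputStream(eml);
        try {
            MimeMessage msg = new MimeMessage(session, in);
            Document doc = new Document();
            doc.add(new Field("subject", String.valueOf(msg.getSubject()),
                    Field.Store.YES, Field.Index.TOKENIZED));
            Object content = msg.getContent();
            if (content instanceof String) {
                doc.add(new Field("body", (String) content,
                        Field.Store.NO, Field.Index.TOKENIZED));
            }
            return doc;
        } finally {
            in.close();
        }
    }
}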
Hi!
Can someone help me?
suba suresh
Suba Suresh wrote:
I was looking at "http://www.tropo.com/techno/java/lucene/imap.html" and
my understanding is it is used to retrieve and index the emails that are
on the email server. I have some stored emails in folders in my local
disk and hug
There's a lot of information in your email, and a lot of questions that
relate to similar topics and address different ways of accomplishing
similar but different things ... too much for me to digest
all at once, so lemme start by seeing if I can summarize your goal, and
then give you my suggestions
The 2GB limitation only exists when you want to load the whole index into
memory on a 32-bit box.
Our index is larger than 13 gigabytes, and it works fine.
I think there must be some error in your design. You can use Luke to see
what is happening in your index.
On 8/14/06, Van Nguyen <[EMAIL PROTECTED]> wrote:
Hi,
I have a 7GB index (about 45 fields per document X roughly
5.5 million docs) running on a Windows 2003 32-bit machine (dual proc, 2GB
memory). The index is optimized. Performing a search on this index
(a wildcard query with a sort) will just “hang”. At fi
I have a couple of fields like this (e.g., a given case can have 1:many
case numbers and 1:many defendant aliases). So there's no problem with
adding the same field n times to a given document? If so, that's
perfect and I'll add it to the FAQ. I was concatenating before and
getting false matches
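For reference, "adding the same field n times" is literally just repeated doc.add calls; a sketch with made-up values (the field names follow the poster's description):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class MultiValuedFieldExample {
    // Sketch: one case document with several case numbers and aliases,
    // each added as a separate field instance instead of one concatenated string.
    public static Document buildCaseDoc() {
        Document doc = new Document();
        doc.add(new Field("caseNumber", "2006-CV-0001",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("caseNumber", "2006-CV-0002",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("defendantAlias", "John Q Smith",
                Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("defendantAlias", "Johnny Smith",
                Field.Store.YES, Field.Index.TOKENIZED));
        return doc;
    }
}

If phrase queries should not be allowed to match across two different values of the same field, the analyzer's getPositionIncrementGap can be overridden to return a large gap for those fields.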
The important detail here is what you mean by "single server"?
A high-end server will work just fine - you want 4GB+ of RAM and the fastest
disk/IO you can get; CPU speed is far less important; A nice Linux software
RAID and 5+ 15K SCSI disks will get you superb performance, at a reasonable
price.
I was looking at "http://www.tropo.com/techno/java/lucene/imap.html" and
my understanding is it is used to retrieve and index the emails that are
on the email server. I have some stored emails in folders in my local
disk and huge list of email archives in another system. Is there a way I
could
You have all my sympathy. Let me see if I can restate your problem.
"Hey Ron. The indexing process doesn't work. We can't/won't let you look at
the process or the results. We can't/won't let you look at the finished
product. We can't/won't let you on the machine where it fails. Now fix it"
..
I've been puzzling this one for a while now, and can't figure it out.
The idea is to allow stemmed searches and exact matches (tokenized, but
unstemmed phrase searches) on the same field. The subject of this email
had "same" in quotes, because it's from the search-client perspective
that the sa
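One common way to get both behaviours, sketched below with made-up field names (this is only one approach, not necessarily what the original poster has in mind): index the same text under two field names, one run through a stemming analyzer and one left tokenized but unstemmed, using PerFieldAnalyzerWrapper, and have the search client pick the exact field for quoted phrases.

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseTokenizer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class StemmedAndExactAnalyzer {
    // Tokenizes and lower-cases, then applies the Porter stemmer.
    static class StemmingAnalyzer extends Analyzer {
        public TokenStream tokenStream(String fieldName, Reader reader) {
            return new PorterStemFilter(new LowerCaseTokenizer(reader));
        }
    }

    // "contents" gets stemmed, everything else (e.g. "contents_exact") uses
    // the unstemmed default; the same text is added to the Document under
    // both field names at indexing time.
    public static Analyzer build() {
        PerFieldAnalyzerWrapper wrapper =
                new PerFieldAnalyzerWrapper(new StandardAnalyzer());
        wrapper.addAnalyzer("contents", new StemmingAnalyzer());
        return wrapper;
    }
}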
Thanks for your response, comments are below. I'm using Lucene 1.9.1.
> From: Erick Erickson [mailto:[EMAIL PROTECTED]
> Sent: Monday, 14 August 2006 16:20
> Subject: Re: Index not recreated
>
> My first suspicion is that you have duplicate documents on
> the *input* side, or are some
My first suspicion is that you have duplicate documents on the *input* side,
or are somehow adding documents more than once. I use code similar to yours
and it works just fine for me.
How big is the index before and after you re-create it? If it's twice the size,
you're appending; if it's not twice the size, then..
Hi,
I'm experiencing the problem that my index does not seem to be
recreated, despite using the correct flags. The result is that documents
representing the same database rows occur multiple times in the index. I
recreate my entire index each night.
My IndexDirectory/IndexWriter construction code
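The original construction code is cut off above; for comparison, a minimal sketch of the two usual patterns in Lucene 2.0 (class, path, and field names are made up): re-creating from scratch with create=true, or deleting a row's old document by primary key before re-adding it.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class NightlyRebuild {
    // Full re-create: the third argument "true" wipes the existing index,
    // so duplicates from previous runs cannot survive.
    public static IndexWriter openForRebuild(String path) throws Exception {
        return new IndexWriter(path, new StandardAnalyzer(), true);
    }

    // Incremental alternative: delete the old copy of a row by its primary
    // key (stored untokenized in an "id" field) before adding the new one.
    public static void deleteOldCopy(String path, String primaryKey) throws Exception {
        IndexReader reader = IndexReader.open(path);
        reader.deleteDocuments(new Term("id", primaryKey));
        reader.close();
    }
}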
Thanks! I did not notice that the code was lower-casing the query string!
Regards,
Nina
> I am refactoring our search code that was written prior to 1.4.3. I am
> using Lucene 2.0 now. The search string entered by users was actually
> parsed by our custom code to generate the query.
You will run into problems with sorting if you can't hold the FieldCache
for long intervals. I'm working on a system containing 300 million docs, and I
ran into sorting issues after only 5 million docs, but then again I can't hold
my IndexSearcher open for such long intervals since I'm dealing
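The usual mitigation is to keep one long-lived IndexSearcher so the FieldCache built for sorting is paid for once; a sketch (class, path, and field names are made up):

import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class SharedSearcher {
    // One searcher for the whole application: the first sorted search pays
    // the cost of populating the FieldCache; later searches reuse it for as
    // long as this searcher (and its underlying IndexReader) stays open.
    private static IndexSearcher searcher;

    public static synchronized IndexSearcher get(String indexPath) throws Exception {
        if (searcher == null) {
            searcher = new IndexSearcher(indexPath);
        }
        return searcher;
    }

    public static Hits searchSorted(String indexPath, Query query) throws Exception {
        Sort byName = new Sort(new SortField("name", SortField.STRING));
        return get(indexPath).search(query, byName);
    }
}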
Hello,
I am refactoring our search code that was written prior to 1.4.3. I am
using Lucene 2.0 now. The search string entered by users was actually
parsed by our custom code to generate the query. This code was getting
fairly big and messy and I'm changing the code to use Lucene's query
par
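For what it's worth, the switch to Lucene's parser usually boils down to something like this sketch (Lucene 2.0 API; the field name is made up). QueryParser lower-cases expanded wildcard/prefix/range terms by default, which may be related to the lower-casing noted earlier in the thread:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class UserQueryParsing {
    // Parse the raw user input with the same analyzer used at indexing time.
    public static Query parse(String userInput) throws Exception {
        QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
        // Wildcard/prefix/range terms are lower-cased before rewriting by
        // default; disable this if the index contains mixed-case terms.
        parser.setLowercaseExpandedTerms(true);
        return parser.parse(userInput);
    }
}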
Thanks for the replies on my question.
In the end I've taken the StandardAnalyzer grammar, modified it and
generated a new analyser with JavaCC. Seems to be working a treat!
Adrian
On 11 Aug 2006, at 14:32, Erik Hatcher wrote:
On Aug 11, 2006, at 1:23 AM, Martin Braun wrote:
Hello Adrian