The only problems I've had with 1.5 JVM crashes and Lucene were related
to stack overflow... try increasing the stack size and see if anything
different happens.
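For example, the thread stack size can be raised with the JVM's -Xss flag (the 2m value is just an illustration to tune for your workload; the jar name and main class below are placeholders):

```shell
# Launch the app with a larger per-thread stack
java -Xss2m -cp lucene-1.4.3.jar:. MyLuceneApp
```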
My crashes happened while trying to use Luke to open a 4GB index with
thousands of indexed fields.
-Yonik
-
: I'm relatively new to Lucene. When I run my app, I get a JVM error.
: This gets called a lot, but only fails every once in awhile (maybe 1 in
: 100 calls?)
I'm not that familiar with TermFreqVectors, and I have no idea what
indexManager is, but I'm surprised this works at all ... I thought calli
Hi--
I'm relatively new to Lucene. When I run my app, I get a JVM error.
This gets called a lot, but only fails every once in awhile (maybe 1 in
100 calls?)
I filed a report with Sun, but I don't expect to hear anything from them.
So, I was wondering if any Lucene experts have run across th
There IS a difference between something being marked as deleted and
something actually being deleted, as documents marked as deleted can be
undeleted.
The document is marked as deleted even before the reader is closed.
There is an example in "Lucene in Action". /dan
-Original Message-
From: Dan Q
I'm confused by what you mean - there is no difference between something
being marked as deleted and actually deleted (since it's not removed from
the index until optimization).
I've found that unless I close(), the document isn't even marked for deletion.
And if I recall, I think I also had to close
On Dec 8, 2005, at 10:51 AM, Sonja Löhr wrote:
Thank you both, I found it
(I really asked a bit too early, sorry)
The highlighter works correctly if I use my custom Analyzer during
indexing
(and for QueryParser), BUT
when preparing the TokenStream to feed the highlighter, I must NOT
use it.
The document is marked as "deleted" when reader.delete(i) is called. It
is actually deleted from the index when reader.close() is called.
The deleted documents seem to be put in a separate file with extension ".del"
in the index folder.
When optimization happens after deletion, the ".del" file is gone, and
Document
Mordo, Aviran (EXP N-NANNATEK) wrote:
Optimization also purges the deleted documents, thus reducing the size
(in bytes) of the index. Until you optimize, documents stay in the index
only marked as deleted.
Deleted documents' space is reclaimed during optimization, 'tis true.
But it can also be
Optimization also purges the deleted documents, thus reducing the size
(in bytes) of the index. Until you optimize, documents stay in the index
only marked as deleted.
-Original Message-
From: Dan Liu [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 08, 2005 2:00 PM
To: java-user@lucene.
The document is indexed first. This is required by the application.
Based on "Lucene in Action", optimization "is to merge multiple index
files together in order to reduce their number and thus minimize the
time it takes to read at search time".
Approach1 does deletion on an optimized index. S
J.J. Larrea wrote:
So... I notice that both IndexWriter.addIndexes(...) merge methods start
and end with calls to optimize() on the target index. I'm not sure
whether that is causing the unpacking and repacking I observe, but it
does make me wonder whether they truly need to be there:
I don't recall
Well the best way in my opinion is to:
1) open the IndexReader and delete some documents from the same index
2) close the IndexReader
3) open IndexWriter and index documents
4) optimize the indexWriter and close the indexWriter
For best performance you want the optimization to be
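Under the Lucene 1.4-era API, the four steps above look roughly like this (a sketch, not a definitive implementation: the index path, field names, and document contents are placeholders, and the Lucene jar is required on the classpath):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DeleteThenIndex {
    public static void main(String[] args) throws Exception {
        String path = "/path/to/index";        // assumed existing index

        // 1) open the IndexReader and delete some documents
        IndexReader reader = IndexReader.open(path);
        reader.delete(new Term("id", "42"));   // only *marks* them deleted
        // 2) close the reader so the deletions take effect on disk
        reader.close();

        // 3) open an IndexWriter (create=false) and add new documents
        IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), false);
        Document doc = new Document();
        doc.add(Field.Keyword("id", "43"));
        doc.add(Field.Text("body", "some content"));
        writer.addDocument(doc);

        // 4) optimize once at the end, then close
        writer.optimize();
        writer.close();
    }
}
```

Doing the single optimize() at the very end, as step 4 suggests, avoids repeatedly merging segments while documents are still being added.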
Thanks for the advice.
It is hard to say whether the usability folks want
to distinguish between "/usr/include" as opposed to "usr include".
Actually, I am sure that they would, but whether they would
accept "usr include" is the right question to ask :-)
I'll have to sort it out with them :-(
Tha
Hi,
What is the difference between following approaches?
Approach1
1) open IndexWriter and index documents
2) optimize the indexWriter and close the indexWriter
3) open the IndexReader and delete some documents from the same
index
4) close the IndexReader
Approach2
On Dec 8, 2005, at 10:15 AM, Beady Geraghty wrote:
Since someone suggested hyphen, the next request
is underscore. I can see more and more of these requests.
Also, people might like to search for "/usr/include/wchar.h" (hence,
the slash) and apostrophe etc. There really isn't a set of
re
Thank you both, I found it
(I really asked a bit too early, sorry)
The highlighter works correctly if I use my custom Analyzer during indexing
(and for QueryParser), BUT
when preparing the TokenStream to feed the highlighter, I must NOT use it.
TokenStream tStream = new GermanAnalyzer().tokenSt
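For context, a typical way to feed the sandbox highlighter looks like this (a sketch assuming the 2005-era sandbox Highlighter API; the field name "contents" and the choice of StandardAnalyzer are placeholders, and the Lucene and highlighter jars are required):

```java
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class HighlightSketch {
    // Returns up to 3 best fragments of `text` for `query`, joined by "..."
    public static String highlight(Query query, String text) throws Exception {
        Highlighter highlighter = new Highlighter(new QueryScorer(query));
        // The TokenStream fed to the highlighter re-tokenizes the stored
        // text, and its token offsets must line up with that text - which
        // is why the analyzer chosen here may need to differ from the one
        // used at index time, as observed above.
        TokenStream tokens = new StandardAnalyzer()
                .tokenStream("contents", new StringReader(text));
        return highlighter.getBestFragments(tokens, text, 3, "...");
    }
}
```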
Thank you for your answer.
I would like to not give you a "general" question so that I can
understand more.
But, I have random requests from people. For example,
this request for hyphen originated from a colleague who is French,
and she believes that hyphen is important, though, I don't
know w
Andrzej, I think you did a great job elucidating my thoughts as well. I
heartily concur with everything you said.
Andrzej Bialecki Wrote:
> Hmm... Please define what "adequate" means. :-) IMHO,
> "adequate" is when for any query the response time is well
> below 1 second. Otherwise the serv
I had to do the same thing and I used Log4J...that will do the trick for you.
-Original Message-
From: Cheolgoo Kang <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thu, 8 Dec 2005 17:51:23 +0900
Subject: Re: Top n Searches
Hi,
You first save those search keywords entered by
Wow, those were some great details. But, as I hope you've seen with
some other recent issues, things become so much clearer when you can
isolate the issues. This is one reason that test-driven development
with unit tests is so amazingly helpful. If you could isolate a
single PDF going th
Hello all,
Whilst merging one index into another using
IndexWriter.addIndexes(IndexReader[]) I got the following error.
(index _file_path)\_5z.fnm (The system cannot find the file specified)
It would appear that this occurred during the adding of the indexes. The
indexes I was merging to an
> if it comes from PdfBox, the wrong text is
> highlighted.
Wrong in what sense?
A couple of things to consider from looking at your
code.
* It is preferable to pass a rewritten query to the
highlighter (pass the same rewritten query to the searcher
if you want to avoid paying the query rewriting cost twice).
Hi, Eric and the other experts!
I'll try to collect some code fragments.
Many things are configurable and I wrote a Crawler for indexing, but the
rest is very close to the examples in "Lucene in Action". I hope I chose the
appropriate snippets.
The analyzer I use is created once and stored in a
Sonja,
Do you have an example, or at least some relevant code, that would
help the community in helping resolve this?
Erik
On Dec 8, 2005, at 4:24 AM, Sonja Löhr wrote:
Hi, all!
I have a question concerning analysis and highlighting. I'm indexing
multiple document formats (up to
Hi, all!
I have a question concerning analysis and highlighting. I'm indexing
multiple document formats (up to now, only HTML and PDF have occurred), and
use the highlighter from the Lucene sandbox.
The documents' text is extracted via JTidy and PDFBox, respectively, then in
both indexing and search anal
(Moving the discussion to nutch-dev, please drop the cc: when responding)
Doug Cutting wrote:
Andrzej Bialecki wrote:
It's nice to have these couple of percent... however, it doesn't solve
the main problem; I need 50 or more percent increase... :-) and I
suspect this can be achieved only by som
Hi,
You first save those search keywords entered by users into some kind
of storage like a database system or even into a dedicated Lucene
index. So it's a database and web issue, not a Lucene one.
And, as you know, Lucene does not provide this functionality out of the box.
Good luck!
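The record-then-count step described above can be sketched in plain Java (names are illustrative, and persisting the log to a database or dedicated index is left out):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TopSearches {
    // Count occurrences of each recorded query string and return the n
    // most common ones, most frequent first.
    public static List<String> topN(List<String> recorded, int n) {
        Map<String, Long> counts = recorded.stream()
                .collect(Collectors.groupingBy(q -> q, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "lucene", "java", "lucene", "jvm", "lucene", "java");
        System.out.println(topN(log, 2)); // most frequent first
    }
}
```

Running the counts daily over the day's recorded queries gives the "most common searched for phrases on a daily basis" asked about below.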
On 12/8/0
Hi,
I've been asked whether we can do a Top n Searches functionality where
we record the most common searched for phrases on a daily basis. I'm
not sure where to start with this or even if this is feasible with
Lucene.
Anyone done anything similar?
Cheers.
Paul Williams.
On Wednesday 07 Dec 2005 22:23, Chris Hostetter wrote:
> -- the real issue is that your query should match a certain set of
> documents, if there is a document you've added to the index that you
> expect to see in that result but isn't there, then use Luke or
> something like it to verify:
> 1)