Re: JVM Crash in Lucene

2005-12-08 Thread Yonik Seeley
The only problems I've had with 1.5 JVM crashes and Lucene were related to stack overflow... try increasing the stack size and see if anything different happens. My crashes happened while trying to use Luke to open a 4GB index with thousands of indexed fields. -Yonik -

Re: JVM Crash in Lucene

2005-12-08 Thread Chris Hostetter
: I'm relatively new to Lucene. When I run my app, I get a JVM error. : This gets called a lot, but only fails every once in a while (maybe 1 in : 100 calls?) I'm not that familiar with TermFreqVectors, and I have no idea what indexManager is, but I'm surprised this works at all ... I thought calli

JVM Crash in Lucene

2005-12-08 Thread Dan Gould
Hi-- I'm relatively new to Lucene. When I run my app, I get a JVM error. This gets called a lot, but only fails every once in a while (maybe 1 in 100 calls?) I filed a report with Sun, but I don't expect to hear anything from them. So, I was wondering if any Lucene experts have run across th

RE: delete and optimize

2005-12-08 Thread Dan Liu
There IS a difference between something being marked as deleted and something actually being deleted, since documents marked as deleted can be undeleted. The document is marked as deleted even before the reader is closed. There is an example in "Lucene in Action". /dan -Original Message- From: Dan Q

RE: delete and optimize

2005-12-08 Thread Dan Quaroni
I'm confused by what you mean - there is no difference between something being marked as deleted and deleted. (Since it's not removed from the index until optimization) I've found that unless I close(), the document isn't even marked for deletion. And if I recall, I think I also had to close

Re: pdf and highlighting

2005-12-08 Thread Erik Hatcher
On Dec 8, 2005, at 10:51 AM, Sonja Löhr wrote: Thank you both, I found it (I really asked a bit too early, sorry) The highlighter works correctly if I use my custom Analyzer during indexing (and for QueryParser), BUT when preparing the TokenStream to feed the highlighter, I must NOT use it.

RE: delete and optimize

2005-12-08 Thread Dan Liu
The document is marked as "deleted" when reader.delete(i) is called. It is actually deleted from the index on reader.close(). The deleted documents seem to be put in a separate file with the extension ".del" in the index folder. When optimization happens after deletion, the ".del" file is gone, and Document

Re: delete and optimize

2005-12-08 Thread Michael D. Curtin
Mordo, Aviran (EXP N-NANNATEK) wrote: Optimization also purges the deleted documents, thus reducing the size (in bytes) of the index. Until you optimize, documents stay in the index, only marked as deleted. Deleted documents' space is reclaimed during optimization, 'tis true. But it can also be

RE: delete and optimize

2005-12-08 Thread Mordo, Aviran (EXP N-NANNATEK)
Optimization also purges the deleted documents, thus reducing the size (in bytes) of the index. Until you optimize, documents stay in the index, only marked as deleted. -Original Message- From: Dan Liu [mailto:[EMAIL PROTECTED] Sent: Thursday, December 08, 2005 2:00 PM To: java-user@lucene.
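The distinction discussed in this thread (a delete only marks a document, and the space is reclaimed later by optimization) can be sketched with a toy model. This is not Lucene code and does not use Lucene's API; the class ToyIndex and all of its method names are hypothetical and exist only to illustrate the idea, assuming deletion marks are kept in a separate bit set until the index is rewritten.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model only: NOT Lucene's implementation or API. All names are hypothetical. */
public class ToyIndex {
    private List<String> docs = new ArrayList<>();
    private BitSet deleted = new BitSet();

    /** Appends a document to storage. */
    public void add(String doc) {
        docs.add(doc);
    }

    /** Marks a document as deleted; its data stays in storage. */
    public void delete(int docNum) {
        if (docNum < 0 || docNum >= docs.size()) {
            throw new IndexOutOfBoundsException("no such document: " + docNum);
        }
        deleted.set(docNum);
    }

    /** Clears every deletion mark; only meaningful before optimize(). */
    public void undeleteAll() {
        deleted.clear();
    }

    /** Number of documents that are not marked as deleted. */
    public int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    /** Number of stored slots, including marked-as-deleted ones. */
    public int maxDoc() {
        return docs.size();
    }

    /** Rewrites storage without the marked documents, reclaiming their space. */
    public void optimize() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                kept.add(docs.get(i));
            }
        }
        docs = kept;
        deleted = new BitSet();
    }
}
```

In this model a marked document can still be restored with undeleteAll(), but after optimize() it is gone for good and the stored size shrinks.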

RE: delete and optimize

2005-12-08 Thread Dan Liu
The document is indexed first. This is required by the application. Based on "Lucene in Action", "optimization" is "to merge multiple index files together in order to reduce their number and thus minimize the time it takes to read at search time". Approach1 does deletion on an optimized index. S

Re: Merging with IndexWriter.addIndexes(...)

2005-12-08 Thread Doug Cutting
J.J. Larrea wrote: So... I notice that both IndexWriter.addIndexes(...) merge methods start and end with calls to optimize() on the target index. I'm not sure whether that is causing the unpacking and repacking I observe, but it does make me wonder whether they truly need to be there: I don't recall

RE: delete and optimize

2005-12-08 Thread Mordo, Aviran (EXP N-NANNATEK)
Well the best way in my opinion is to: 1) open the IndexReader and delete some documents from the same index 2) close the IndexReader 3) open IndexWriter and index documents 4) optimize the indexWriter and close the indexWriter For best performance you want the optimization to be

Re: words with more than 1 hyphen ?

2005-12-08 Thread Beady Geraghty
Thanks for the advice. It is hard to say whether the usability folks want to distinguish between "/usr/include" as opposed to "usr include". Actually, I am sure that they would, but whether they would accept "usr include" is the right question to ask :-) I'll have to sort it out with them :-( Tha

delete and optimize

2005-12-08 Thread Dan Liu
Hi, What is the difference between following approaches? Approach1 1) open IndexWriter and index documents 2) optimize the indexWriter and close the indexWriter 3) open the IndexReader and delete some documents from the same index 4) close the IndexReader Approach2

Re: words with more than 1 hyphen ?

2005-12-08 Thread Erik Hatcher
On Dec 8, 2005, at 10:15 AM, Beady Geraghty wrote: Since someone suggested hyphen, the next request is underscore. I can see more and more of these requests. Also, people might like to search for "/usr/include/wchar.h" (hence, the slash) and apostrophe etc. There really isn't a set of re

RE: pdf and highlighting

2005-12-08 Thread Sonja Löhr
Thank you both, I found it (I really asked a bit too early, sorry) The highlighter works correctly if I use my custom Analyzer during indexing (and for QueryParser), BUT when preparing the TokenStream to feed the highlighter, I must NOT use it. TokenStream tStream = new GermanAnalyzer().tokenSt

Re: words with more than 1 hyphen ?

2005-12-08 Thread Beady Geraghty
Thank you for your answer. I would like to not give you a "general" question so that I can understand more. But, I have random requests from people. For example, this request for hyphen originated from a colleague who is French, and she believes that hyphen is important, though, I don't know w

RE: Lucene performance bottlenecks

2005-12-08 Thread Dalton, Jeffery
Andrzej, I think you did a great job elucidating my thoughts as well. I heartily concur with everything you said. Andrzej Bialecki wrote: > Hmm... Please define what "adequate" means. :-) IMHO, > "adequate" is when for any query the response time is well > below 1 second. Otherwise the serv

Re: Top n Searches

2005-12-08 Thread msftblows
I had to do the same thing and I used Log4J...that will do the trick for you. -Original Message- From: Cheolgoo Kang <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Thu, 8 Dec 2005 17:51:23 +0900 Subject: Re: Top n Searches Hi, You first save those search keywords entered by

Re: pdf and highlighting

2005-12-08 Thread Erik Hatcher
Wow, those were some great details. But, as I hope you've seen with some other recent issues, things become so much clearer when you can isolate the issues. This is one reason that test-driven development with unit tests is so amazingly helpful. If you could isolate a single PDF going th

Index merging

2005-12-08 Thread Paul . Illingworth
Hello all, Whilst merging one index into another using IndexWriter.addIndexes(IndexReader[]) I got the following error. (index _file_path)\_5z.fnm (The system cannot find the file specified) It would appear that this occurred during the adding of the indexes. The indexes I was merging to an

RE: pdf and highlighting

2005-12-08 Thread mark harwood
> if it comes from PdfBox, the wrong text is > highlighted. Wrong in what sense? A couple of things to consider from looking at your code. * It is preferable to pass a rewritten query to the highlighter (pass the same rewritten query to searcher if you want to avoid query rewriting costs twice).

RE: pdf and highlighting

2005-12-08 Thread Sonja Löhr
Hi, Erik and the other experts! I'll try to collect some code fragments. Many things are configurable and I wrote a Crawler for indexing, but the rest is very close to the examples in "Lucene in Action". I hope I chose the appropriate snippets. The analyzer I use is created once and stored in a

Re: pdf and highlighting

2005-12-08 Thread Erik Hatcher
Sonja, Do you have an example, or at least some relevant code, that would help the community in helping resolve this? Erik On Dec 8, 2005, at 4:24 AM, Sonja Löhr wrote: Hi, all! I have a question concerning analysis and highlighting. I'm indexing multiple document formats (up to

pdf and highlighting

2005-12-08 Thread Sonja Löhr
Hi, all! I have a question concerning analysis and highlighting. I'm indexing multiple document formats (up to now, only HTML and PDF occurred), and use the highlighter from the Lucene sandbox. The documents' text is extracted via JTidy and PDFBox, respectively, then in both indexing and search anal

Re: Lucene performance bottlenecks

2005-12-08 Thread Andrzej Bialecki
(Moving the discussion to nutch-dev, please drop the cc: when responding) Doug Cutting wrote: Andrzej Bialecki wrote: It's nice to have these couple percent... however, it doesn't solve the main problem; I need 50 or more percent increase... :-) and I suspect this can be achieved only by som

Re: Top n Searches

2005-12-08 Thread Cheolgoo Kang
Hi, You first save those search keywords entered by users into some kind of storage like a database system or even into a dedicated Lucene index. So it's a database and web issue, not a Lucene one. And, as you know, Lucene does not provide this functionality out of the box. Good luck! On 12/8/0
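As a rough illustration of the suggestion above (record each search keyword somewhere, then count), here is a minimal in-memory sketch. It is an assumption-laden example, not part of Lucene: the class TopSearches and its methods are hypothetical, queries are normalized by trimming and lower-casing, and a real application would persist the counts in a database or a log file and keep one counter set per day.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: counts search phrases in memory and reports the most frequent. */
public class TopSearches {
    private final Map<String, Integer> counts = new HashMap<>();

    /** Records one search; blank queries are ignored. */
    public void record(String query) {
        String key = query.trim().toLowerCase(Locale.ROOT);
        if (key.isEmpty()) {
            return;
        }
        counts.merge(key, 1, Integer::sum);
    }

    /** Returns up to n phrases, most frequent first; ties are broken alphabetically. */
    public List<String> topN(int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

Calling record(...) from the code path that handles each user search, and topN(10) from a daily report, would be one way to wire this in.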

Top n Searches

2005-12-08 Thread Paul Williams
Hi, I've been asked whether we can do a Top n Searches functionality where we record the most commonly searched-for phrases on a daily basis. I'm not sure where to start with this or even if this is feasible with Lucene. Anyone done anything similar? Cheers. Paul Williams.

Re: Confused about ... [SOLVED]

2005-12-08 Thread Alan Chandler
On Wednesday 07 Dec 2005 22:23, Chris Hostetter wrote: > -- the real issue is that your query should match a certain set of > documents, if there is a document you've added to the index that you > expect to see in that result but isn't there, then use Luke or > something like it to verify: > 1)