HotSpot Virtual Machine random crash while indexing documents

2006-04-07 Thread pepone pepone
Hi Lucene experts, I have a program that uses Lucene to index the content of my objects (Documents, Comments, etc.). After indexing a lot of documents (5000, 12000; it is not always at the same point) I get this error: # # An unexpected error has been detected by HotSpot Virtual Machine: # # SIGSEGV (0xb) at pc=

Indexing Problems

2006-04-07 Thread trupti mulajkar
Hello, I have modified IndexFiles.java to read the document numbers from within the TREC files, and they are being read correctly. However, the index fails to create the .cfs file, so the search query does not return the correct document number. Any suggestions on how this can be sorted out? chee

Re: Lucene Sandbox - SearchBean

2006-04-07 Thread Erik Hatcher
There used to be such a beast, and to get at it you'll need to resurrect it from the jakarta-lucene-sandbox CVS repository attic. We (well, I, but no one objected) chose not to bring it over as it was not a best-practice recommended way to work with Lucene search results. It had its own

Re: Lucene Sandbox - SearchBean

2006-04-07 Thread Chris Hostetter
The "Lucene Sandbox" is also known as the "Lucene contrib directory", which as of 1.9 is included in the core distribution (with each contrib module in its own jar). However, there does not appear to be anything named "SearchBean" in contrib at the moment. : Date: Fri, 7 Apr 2006 15:38:50 -0500

Lucene Sandbox - SearchBean

2006-04-07 Thread Rajesh Munavalli
Can someone tell me where I can find the source code for SearchBean (Lucene Sandbox)? Thanks, --Rajesh

Re: Getting count of documents matching a query?

2006-04-07 Thread Chris Hostetter
First off: you should double-check the correctness of your customized similarity class. I'm pretty sure it's resulting in a different set of matches than the DefaultSimilarity because your tf function returns 0f regardless of whether there is a match. (when i said "every function returns 0 or 1" i

Re: Getting count of documents matching a query?

2006-04-07 Thread Jason Calabrese
I just wrote some simple code to test this. I ran the test with 3 queries: - A 3-term boolean query - A single-term query with over 5000 hits - A single-term query with 0 hits. For each query I ran 4 tests of 10,000 searches: 1) using hits.length to get the counts and the standard si

Re: lucene indexing

2006-04-07 Thread Grant Ingersoll
Lucene does not provide this out of the box. You will have to write a program to do it and feed the results to Lucene. If I remember right, these files are in XML, so you can probably use SAX or a pull parser. I think a number of TREC participants, in the past, have used Lucene, so you may
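
As a rough illustration of the pre-processing step described above, the sketch below splits the contents of one TREC-style file into individual documents before they are handed to Lucene. It assumes each document is wrapped in <DOC>...</DOC> and carries a <DOCNO> element, and it uses regular expressions instead of SAX or a pull parser, because such files are not guaranteed to be well-formed XML. The names TrecSplitter and split are hypothetical, and the step that adds each document to the index is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrecSplitter {
    // Assumed layout: <DOC> ... <DOCNO> id </DOCNO> ... </DOC>
    private static final Pattern DOC =
            Pattern.compile("<DOC>(.*?)</DOC>", Pattern.DOTALL);
    private static final Pattern DOCNO =
            Pattern.compile("<DOCNO>(.*?)</DOCNO>", Pattern.DOTALL);

    /** Maps each document number to the raw body of its document, in file order. */
    static Map<String, String> split(String trecFileContent) {
        Map<String, String> docs = new LinkedHashMap<>();
        Matcher doc = DOC.matcher(trecFileContent);
        while (doc.find()) {
            String body = doc.group(1);
            Matcher docno = DOCNO.matcher(body);
            // Documents without a DOCNO are skipped in this sketch.
            if (docno.find()) {
                docs.put(docno.group(1).trim(), body.trim());
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        String sample =
                "<DOC>\n<DOCNO> A-1 </DOCNO>\n<TEXT>first</TEXT>\n</DOC>\n"
              + "<DOC>\n<DOCNO> A-2 </DOCNO>\n<TEXT>second</TEXT>\n</DOC>\n";
        for (Map.Entry<String, String> entry : split(sample).entrySet()) {
            // Each entry would become one Lucene document, with the
            // document number stored in a field of its own.
            System.out.println(entry.getKey());
        }
    }
}
```

Keeping the splitting separate from the indexing makes it easy to check that the document numbers are extracted correctly before any index is written.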

Exception in WildCardQuery

2006-04-07 Thread Erick Erickson
So I'm trying to do silly stuff, just to poke a bit at wildcard queries. So sue me... But I ran across this. And yes, I know that creating a wildcard query is dangerous and downright silly when you don't have a wildcard in the term, but this still seems like a case that should, say, default to a sim

lucene indexing

2006-04-07 Thread trupti mulajkar
Hi, can anyone suggest how to split files using Lucene? I am trying to index the TREC collection using lucene-1.4.3. I want Lucene to read the multiple files within a single TREC file and create an index accordingly. Cheers, trupti mulajkar MSc Advanced Computer Science -

I just don't get wildcards at all.

2006-04-07 Thread Erick Erickson
OK, I know I'm asking you to write my code for me (or at least point me to an example), but I'm at my wits' end, so please rescue me. This is a reprise of TooManyClauses. We have a large amount of text, and a requirement to do a wildcard query. Of course, it's way too big to use Wildcard or t

Re: doc.get("contents")

2006-04-07 Thread miki sun
Thanks Chris, I just realized the "contents" in the index is not the "contents" in the original document. Miki Original Message Follows From: Chris Hostetter <[EMAIL PROTECTED]> Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: doc.get("contents") Date:

RE: Optimize completely in memory with a FSDirectory?

2006-04-07 Thread Max Pfingsthorn
Hi all, Sorry for the noise, it was my own fault. After a look at the sources, I saw I misinterpreted the MaxBufferedDocs parameter. IndexWriter.maybeMergeSegments() seems to always merge everything if it is set so high. For my iterative updates of the index, it seems that the standard setting

Re: highlighting - fuzzy search

2006-04-07 Thread Fisheye
Yes, this might be a way, but in my case it would not work. The problem is that I have to return an excerpt (snippet) and the words to be highlighted as two separate strings. So now I use highlighter and getBestFragment to extract the excerpt, then I remove the inserted HTML tags and return the
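
The post-processing described here can be sketched in plain Java. The example assumes the fragment returned by getBestFragment marks each match with <B>...</B>; if the highlighter is configured with a different formatter, the pattern has to change accordingly. The class and method names (ExcerptSplitter, highlightedWords, plainExcerpt) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExcerptSplitter {
    // Assumption: matches are wrapped in <B>...</B> by the formatter in use.
    private static final Pattern MARKED = Pattern.compile("<B>(.*?)</B>");

    /** Collects each distinct highlighted word, in order of first appearance. */
    static List<String> highlightedWords(String fragment) {
        List<String> words = new ArrayList<>();
        Matcher m = MARKED.matcher(fragment);
        while (m.find()) {
            String word = m.group(1);
            if (!words.contains(word)) {
                words.add(word);
            }
        }
        return words;
    }

    /** Returns the fragment with the highlight tags removed. */
    static String plainExcerpt(String fragment) {
        return MARKED.matcher(fragment).replaceAll("$1");
    }

    public static void main(String[] args) {
        String fragment = "a <B>lucene</B> index of <B>lucine</B> documents";
        System.out.println(plainExcerpt(fragment));
        System.out.println(highlightedWords(fragment));
    }
}
```

With a fuzzy query the highlighted words are the terms that actually matched, so reading them back from the tagged fragment yields the matched variants and not just the term that was typed.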