Re: Split an existing index into smaller segments without a re-index?

2006-08-16 Thread Yonik Seeley
On 8/16/06, Stanislav Jordanov <[EMAIL PROTECTED]> wrote: I searched the mail list archives for an answer to that question; The closest (and perhaps the only) thread in this regard that I found is: http://www.gossamer-threads.com/lists/lucene/java-user/9928 So the answer was "No", but this is

Re: is there such an analyzer?

2006-08-16 Thread Erick Erickson
I suspect you'll have to roll your own. I'd use the SynonymAnalyzer from Lucene in Action as a model, starting around page 129. I really doubt that there's much you can expect Lucene to do for you for this specialized kind of tokenizing. Erick On 8/16/06, Van Nguyen <[EMAIL PROTECTED]> wrot

Re: Lucene/Tomcat Memory Leak Issue

2006-08-16 Thread adrena . keating
We are indexing the file server via Lucene. We have a 15 MB index file and have set up a once a day re-index and then switch to a continuous index update. So every time a content item is published it is immediately indexed and deployed. Problem persists with both scenarios. We are using jconsole

is there such an analyzer?

2006-08-16 Thread Van Nguyen
I'm looking for a cross between a WhitespaceAnalyzer and StandardAnalyzer. If I pass in: I-Pity-da-fool who has a 1" ladder said MR.T I want it to index these: i-pity-da-fool pity fool 1" 1 ladder mr.t
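
One possible reading of the request above, sketched in plain Java with no Lucene dependency. This is a guess at the intended rules, not an existing Lucene analyzer: split on whitespace, lowercase, keep each token whole, additionally emit the hyphen-separated parts of a token, and additionally emit a copy with trailing punctuation stripped. The class name, the method name, and especially the stop list are hypothetical; the stop list was chosen only so that the sample input produces the sample output from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class HybridTokenizerSketch {
    // Hypothetical stop list: an assumption made to reproduce the sample output.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("i", "da", "who", "has", "a", "said"));

    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        // Whitespace splitting, as WhitespaceAnalyzer would do.
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            out.add(token); // keep the whole token, punctuation included
            if (token.contains("-")) {
                // Also emit the hyphen-separated parts that are not stop words.
                for (String part : token.split("-")) {
                    if (!part.isEmpty() && !STOP_WORDS.contains(part)) {
                        out.add(part);
                    }
                }
            } else {
                // Also emit a copy without trailing punctuation (1" becomes 1).
                String stripped = token.replaceAll("[^a-z0-9]+$", "");
                if (!stripped.isEmpty() && !stripped.equals(token)) {
                    out.add(stripped);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("I-Pity-da-fool who has a 1\" ladder said MR.T"));
    }
}
```

In a real index the same rules would have to live in a custom Tokenizer or TokenFilter, along the lines of the SynonymAnalyzer suggestion in the reply; the sketch only pins down the token rules.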

Re: Quotes dependent StopWords removal

2006-08-16 Thread Mark Miller
This keeps popping back into my head. A little more info for you. Bear in mind I have not dealt with the QueryParser before. Use the approach I gave last time. Pull out the QueryParser and change either QueryParser.jj or QueryParser.java...you may be able to just change QueryParser.java and av

Re: Highlighter

2006-08-16 Thread Mark Miller
The reason has already been posted in response to my initial inquiry. This problem bugged me last month. I did not know the particulars but I assumed it was a bug. I inquired on the mailing list and someone responded with the following link: Highligter fails to include non-token at end of st

Re: Highlighter

2006-08-16 Thread Bill Taylor
[EMAIL PROTECTED] told me that the highlighter ALWAYS does this under certain conditions. In my case, it is when the string ends with . He knew why but I did not. I just fixed it in my code by putting things back. On Aug 16, 2006, at 3:17 AM, Ramesh Salla wrote: which version of Lucene a

Re: search document for keywords and keyphrases

2006-08-16 Thread Eugeny N Dzhurinsky
On Fri, Aug 11, 2006 at 02:39:19PM +0300, Eugeny N Dzhurinsky wrote: > On Fri, Aug 11, 2006 at 01:22:26PM +0200, Simon Willnauer wrote: > > Sure you can do this. > > You index your document with the keywords assigned to the document and > > search with and Boolean Query to get all document having t

Re: Best Practice: emails and file-attachments

2006-08-16 Thread John Haxby
Oh rats. Thunderbird ate the indenting. The two examples should be: multipart/alternative text/plain multipart/related text/html image/gif image/gif application/msword and multipart/related text/html image/

Re: Best Practice: emails and file-attachments

2006-08-16 Thread John Haxby
lude wrote: You also mentioned indexing each bodypart ("attachment") separately. Why? To my mind, there is no use case where it makes sense to search a particular bodypart I will give you the use case: [snip] 3.) The result list would show this: 1. mail-1 'subject' 'Abstract of the messa

Split an existing index into smaller segments without a re-index?

2006-08-16 Thread Stanislav Jordanov
I searched the mail list archives for an answer to that question; The closest (and perhaps the only) thread in this regard that I found is: http://www.gossamer-threads.com/lists/lucene/java-user/9928 So the answer was "No", but this is way back in the mid 2004 (2 years ago). Is there a solution

Re: Best Practice: emails and file-attachments

2006-08-16 Thread lude
Hi John, thanks again for the many words and explanations! You also mentioned indexing each bodypart ("attachment") separately. Why? To my mind, there is no use case where it makes sense to search a particular bodypart I will give you the use case: 1.) User searches for "abcd" 2.) Luc

counting and updating the index

2006-08-16 Thread IgO
Hi all, here is the scenario: I'm trying to index a database and I'd like to put in the index "the counts of all related table". The first option is to count against the db and then store the data into the documents, but I think it is not a real option because of the huge amount of structured-data doesnt

Memory allocation to Ram Directory

2006-08-16 Thread adrena . keating
Hi, my Lucene index updates via the fileserver are eating up a huge amount of heap memory, and once the index is completed the memory is not being returned. Ram Drive is enabled. Does anyone know if this might be a problem with the amount of memory being allocated to the Ram Directory? Can y

Re: Best Practice: emails and file-attachments

2006-08-16 Thread John Haxby
lude wrote: Hi John, thanks for the detailed answer. You wrote: If you're indexing a multipart/alternative bodypart then index all the MIME headers, but only index the content of the *first* bodypart. Does this mean you index just the first file-attachment? What do you advise, if you have to

Re: Quotes dependent StopWords removal

2006-08-16 Thread duiduder
Hello Sameer, what about this: - during indexing, use the StandardAnalyzer without stopwords - during the search, use 2 different Analyzers - one with and one without stopwords. Thereby, you look first whether the user has typed in quotes inside her query String. # If so, look whether there
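
A minimal sketch of the general idea in this thread (remove stop words only outside quoted phrases), written in plain Java with no Lucene dependency. It is not the procedure from the message above, which is cut off; it only shows the quote check. The class name, the method name, and the stop list are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteAwareStopwords {
    // Hypothetical stop list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "of", "to", "be", "or", "not"));

    // A piece is either a complete "quoted phrase" or a run of non-whitespace.
    static final Pattern PIECE = Pattern.compile("\"[^\"]*\"|\\S+");

    static String filter(String query) {
        StringBuilder out = new StringBuilder();
        Matcher m = PIECE.matcher(query);
        while (m.find()) {
            String piece = m.group();
            boolean quoted = piece.length() >= 2
                    && piece.startsWith("\"") && piece.endsWith("\"");
            // Drop stop words only when they stand outside a quoted phrase.
            if (!quoted && STOP_WORDS.contains(piece.toLowerCase(Locale.ROOT))) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(piece);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("the history of \"to be or not to be\""));
    }
}
```

With Lucene itself, the quoted pieces would presumably be analyzed by the analyzer without stop words and the rest by the one with stop words; how the two results are then combined into one query is left open here.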

SV: addIndexes method without the merge

2006-08-16 Thread Marcus Falck
-Original message- From: Marcus Falck [mailto:[EMAIL PROTECTED] Sent: 16 August 2006 10:47 To: java-user@lucene.apache.org Subject: addIndexes method without the merge Hi, In my search engine (based on top of the lucene 1.4.3 api) I'm using one RAMDir as a live indexin

Re: Best Practice: emails and file-attachments

2006-08-16 Thread lude
Hi Dejan, how do you query for email- and(!) attachment-documents, if you just want to present one hit per email (even if the searchterm matches in the email- and(!) in the corresponding attachment-document)? Thanks lude On 8/15/06, Dejan Nenov <[EMAIL PROTECTED]> wrote: The approach we I fi

Re: Best Practice: emails and file-attachments

2006-08-16 Thread lude
Hi John, thanks for the detailed answer. You wrote: If you're indexing a multipart/alternative bodypart then index all the MIME headers, but only index the content of the *first* bodypart. Does this mean you index just the first file-attachment? What do you advise, if you have to index mulitp

addIndexes method without the merge

2006-08-16 Thread Marcus Falck
Hi, In my search engine (based on top of the lucene 1.4.3 api) I'm using one RAMDir as a live indexing buffer and one FSDir as the main persisted index. When the RAMDir buffer has been filled I'm adding those documents to the FSDir and clearing the RAMDir. At first I was iterating thru

Re: HELP: how to highlight the search key word in lucene's search results?

2006-08-16 Thread Ramesh Salla
Go to the mailing list archive and you will find a lot of info there. I can brief you on the procedure for now. Get the Highlighter jar from the lucene-sandbox and see the examples in the downloaded folder. Get the search results from the Hits and pass this string to the Highlighter class. if you still

Re: Highlighter

2006-08-16 Thread Ramesh Salla
Which version of Lucene and which version of Highlighter do you use? I don't see any such issues. I think I can resolve the issue if you can pass on some info on how you are trying to get the data and highlight things. On Sat, 2006-08-12 at 00:05 +, Ronnie Kolehmainen wrote: There is