Sorting

2006-07-29 Thread neils
Hi, Lucene sorts hits by relevance by default. Because I would like to sort them by a special string field and not by relevance, I was thinking about dropping the default relevance sort and implementing sorting by alphabetic order. The reason is that sorting by alphabetic order takes a lot of time. Ma

Re: Sorting

2006-07-29 Thread karl wettin
On Sat, 2006-07-29 at 01:43 -0700, neils wrote: > is there another "fast" way to sort by alphabetic order ? > > The indexsize is currently about 2 GB and index is split in two parts > and are accessed by a parallelreader. I don't know how much faster it is, but you could try to store the sort orde

Re: About search performance

2006-07-29 Thread karl wettin
On Sat, 2006-07-29 at 09:46 +0800, zhongyi yuan wrote: > Hi,How about implement multi-key search use lucene, for example use > boolean search exceed 1000 clauses,it will affect the performance > greatly. If use filter or custom sorter to select the result, because > the result is extremely large in

Re: EMAIL ADDRESS: Tokenize (i.e. an EmailAnalyzer)

2006-07-29 Thread Michael J. Prichard
Hasan Diwan wrote: Michael: On 7/28/06, Michael J. Prichard <[EMAIL PROTECTED]> wrote: Howdy...not sure if anyone else wants this but here is my first attempt at writing an analyzer for an email address...modifications, updates, fixes welcome. Why reinvent the wheel? See http://java.sun.c

SpellChecker

2006-07-29 Thread neils
Hi, I had seen that a suggestion tool (like Google's "Did you mean xyz?") can be implemented with an addon for Lucene which is called SpellingChecker. Is this correct or is there another (better) solution, and where can this addon be downloaded (because I cannot find a working download link)? Thanks

Re: SpellChecker

2006-07-29 Thread Mark Miller
neils wrote: Hi, I had seen that a suggestion tool (like Google's "Did you mean xyz?") can be implemented with an addon for Lucene which is called SpellingChecker. Is this correct or is there another (better) solution, and where can this addon be downloaded (because I cannot find a working download

Re: Sorting

2006-07-29 Thread Jason Calabrese
One way to make an alphabetic sort very fast is to presort your docs before adding them to the index. If you do this you can then just sort by index order. We are using this for a large index (1 million+ docs) and it works very well, and seems even slightly faster than relevance sorting.
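The presort idea above can be illustrated without Lucene at all. The following is only a minimal sketch under stated assumptions: the class `PresortSketch`, its methods, and the list standing in for an index are hypothetical, and a document's id is simply its position in that list. In Lucene itself the equivalent would presumably be sorting hits with `Sort.INDEXORDER`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an index; not Lucene code.
final class PresortSketch {

    /** Builds the "index": a document's id is its position in the returned list. */
    static List<String> buildIndex(List<String> titles) {
        List<String> index = new ArrayList<>(titles);
        // Presort BEFORE adding, so ascending doc id equals alphabetical order.
        index.sort(String.CASE_INSENSITIVE_ORDER);
        return index;
    }

    /** Returns matching doc ids in ascending (index) order; no sort at query time. */
    static List<Integer> search(List<String> index, String term) {
        String needle = term.toLowerCase();
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < index.size(); docId++) {
            if (index.get(docId).toLowerCase().contains(needle)) {
                hits.add(docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> index = buildIndex(
                Arrays.asList("lucene sorting", "Apache Lucene", "lucene filters", "zebra"));
        for (int docId : search(index, "lucene")) {
            System.out.println(docId + " " + index.get(docId));
        }
    }
}
```

The trade-off is that the ordering only holds while documents are added in sorted order; documents appended later land at the end of the index regardless of their title, so an index that is updated often would need to be rebuilt to keep the property.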

Re: SpellChecker

2006-07-29 Thread neils
Hi Mark, thanks a lot for the link. I already found this but I cannot use the libraries import org.apache.lucene.search.spell.Dictionary; import org.apache.lucene.search.spell.LuceneDictionary; import org.apache.lucene.search.spell.SpellChecker; because it seems like they are not included in Luce

PerFieldAnalyzerWrapper use? Analyzer's not being used as expected....

2006-07-29 Thread Michael J. Prichard
So I have the following code... // let's get our SynonymAnalyzer SynonymAnalyzer synAnalyzer = getSynonymAnalyzer(); // let's get our EmailAnalyzer EmailAnalyzer emailAnalyzer = getEmailAnalyzer(); // set up perfieldanalyzer PerFieldAnalyzerWrapper aWrapper = new PerFieldAnalyzerWrapper(new Sta

Re: PerFieldAnalyzerWrapper use? Analyzer's not being used as expected....

2006-07-29 Thread Michael J. Prichard
Oh my...disregard this question. It works...I was instantiating my IndexWriter before setting up my Analyzers!! Dangit...I feel a little dumb. I just switched the order and put the instantiated indexwriter last...it works. Thanks, Michael P.S. I feel somewhat silly! Michael J. Prichard w

Re: PerFieldAnalyzerWrapper use? Analyzer's not being used as expected....

2006-07-29 Thread Erik Hatcher
I think you should use a new instance of each analyzer for each field, not reuse instances. Other than that, your usage is fine. Erik On Jul 29, 2006, at 3:49 PM, Michael J. Prichard wrote: So I have the following code... // let's get our SynonymAnalyzer SynonymAnalyzer synAnalyze

Re: PerFieldAnalyzerWrapper use? Analyzer's not being used as expected....

2006-07-29 Thread Michael J. Prichard
Hey Erik, Will do. May I ask why? Out of curiosity. Thanks, Michael Erik Hatcher wrote: I think you should use a new instance of each analyzer for each field, not reuse instances. Other than that, your usage is fine. Erik On Jul 29, 2006, at 3:49 PM, Michael J. Prichard wrote:

Re: Sorting

2006-07-29 Thread karl wettin
On Sat, 2006-07-29 at 12:39 -0700, Jason Calabrese wrote: > One way to make an alphabetic sort very fast is to presort your > docs before adding them to the index. If you do this you can then > just sort by index order. We are using this for a large index (1 > million+ docs) and it works ver

Re: Sorting

2006-07-29 Thread neils
Hi, thanks a lot for your helpful answers :-)) I think I will try it like Karl suggests, because I have to update the index every day :-)) Thanks a lot :-)) -- View this message in context: http://www.nabble.com/Sorting-tf2019404.html#a5558212 Sent from the Lucene - Java Users forum at Nabble.com.

Re: Sorting

2006-07-29 Thread Chris Hostetter
: thanks a lot for your helpful answers :-)) I think I will try it like Karl : suggests, because I have to update the index every day :-)) All of the suggestions so far assume that you have some way of mapping each document to a number that indicates its relative position in the total space of order
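The assumption described above, a number per document that encodes its position in the overall ordering, might look like the following. This is a hedged sketch in plain Java, not code from the thread: `SortRankSketch` and `ranks` are hypothetical names, and storing the resulting rank as a numeric field on each document is an assumption about how it would be used.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of Lucene.
final class SortRankSketch {

    /** Maps each distinct value to its 0-based position in alphabetical order. */
    static Map<String, Integer> ranks(String[] values) {
        String[] sorted = values.clone();
        Arrays.sort(sorted, String.CASE_INSENSITIVE_ORDER);
        Map<String, Integer> rank = new HashMap<>();
        for (String value : sorted) {
            // Duplicates keep the rank assigned at their first occurrence.
            rank.putIfAbsent(value, rank.size());
        }
        return rank;
    }

    public static void main(String[] args) {
        Map<String, Integer> rank = ranks(new String[] {"pear", "apple", "fig", "apple"});
        System.out.println(rank.get("apple") + " " + rank.get("fig") + " " + rank.get("pear"));
    }
}
```

Comparing two integers is cheaper than comparing two strings, which is the point of the mapping. The cost is that a new value falling between two existing ones shifts every rank after it, so with frequent updates the ranks would have to be recomputed or assigned with gaps.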

Re: SpellChecker

2006-07-29 Thread Chris Hostetter
: import org.apache.lucene.search.spell.Dictionary; : import org.apache.lucene.search.spell.LuceneDictionary; : import org.apache.lucene.search.spell.SpellChecker; : : because it seems like they are not included in Lucene.Net. Could this be : right ? Are there alternatives ? Those classes are pat

Re: PerFieldAnalyzerWrapper use? Analyzer's not being used as expected....

2006-07-29 Thread Otis Gospodnetic
I think you can reuse them. Fields should be handled/analyzed sequentially. I reuse them for some stuff on Simpy.com. But you may want to clean up that try/catch. Instead of catching the IOException, you may want to use !IndexReader.indexExists(...) in place of that boolean param to IndexWri

Re: EMAIL ADDRESS: Tokenize (i.e. an EmailAnalyzer)

2006-07-29 Thread Otis Gospodnetic
No, you're not missing anything. :) That JavaMail API is good for getting the whole email, but you then need to chop it up with your EmailAnalyzer, so you're doing the right thing. Otis - Original Message From: Michael J. Prichard <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent

Re: Sorting

2006-07-29 Thread neils
Hi Chris, thanks a lot for your reply. Currently I'm using a ParallelReader, because one part of the index is in memory and one part is on disk. It seems like ParallelReader has a problem with sorting. So I have three questions: 1. Is there a known bug in the ParallelReader? 2. Is it true, that onl

Re: Sorting

2006-07-29 Thread Chris Hostetter
: thanks a lot for your reply. Currently I'm using a ParallelReader, because one : part of the index is in memory and one part is on disk. It seems like : ParallelReader has a problem with sorting. So I have three questions: : : 1. Is there a known bug in the ParallelReader? not that i know of ... ca