AW: Problem with sorting on NumericFields

2010-10-26 Thread Uwe Goetzke
should reindex the whole stuff or at least try to optimize the index to get rid of deleted documents and the terms. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Uwe Goetzke [mailto:uw

Problem with sorting on NumericFields

2010-10-25 Thread Uwe Goetzke
I got stuck on a problem using NumericFields with Lucene 2.9.3. I add values to the document with doc.add(new NumericField("minprice").setDoubleValue(net_price)); If I want to search with a sorter for this field, I get this error: java.lang.NumberFormatException: Invalid shift val

AW: How can I merge .cfx and .cfs into a single cfs file?

2010-05-05 Thread Uwe Goetzke
Index all into a directory and determine the size of all files in it. From http://lucene.apache.org/java/3_0_1/fileformats.html: Starting with Lucene 2.3, doc store files (stored field values and term vectors) can be shared in a single set of files for more than one segment. When compound file

AW: Relevancy Practices

2010-05-03 Thread Uwe Goetzke
Regarding Part 3: Data quality. For our search domain (catalog products) we very often face the problem that the search data is full of acronyms and abbreviations like: cable,nym-j,pvc,3x2.5mm² or dvd-/cd-/usb-carradio,4x50W,divx,bl We solved this by a combination of normalization for better data

AW: Reverse stemmer?

2009-10-09 Thread Uwe Goetzke
ing of the search results with regard to the phrase entered by the user. Regards Uwe Goetzke Healy Hudson -Original Message- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: Thursday, 8 October 2009 21:20 To: java-user@lucene.apache.org Subject: Re: Reverse ste

AW: MergePolicy$MergeException because of FileNotFoundException because wrong path to index-file

2009-08-31 Thread Uwe Goetzke
Oops, sorry, 2.4.1. Thanks, Uwe Goetzke -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Monday, 31 August 2009 17:42 To: java-user@lucene.apache.org Subject: RE: MergePolicy$MergeException because of FileNotFoundException because wrong path to index-file

MergePolicy$MergeException because of FileNotFoundException because wrong path to index-file

2009-08-31 Thread Uwe Goetzke
r.doPrivileged( new GetPropertyAction("file.separator"))).charAt(0); Which sounds more than strange to me... Any idea? Regards Uwe Goetzke --- Healy Hudson GmbH - D-55252 Mainz Kastel Managing Director Christian

AW: Most frequently indexed term

2009-06-08 Thread Uwe Goetzke
Hello Ganesh, What about making a separate index for each day, running your analysis on it, and merging that index afterwards? I am not sure, but I think this might work. Use MultiSearcher for the search. Regards Uwe Goetzke -Original Message- From: Ganesh [mailto:emailg...@yahoo.co.in

AW: Transforming german umlaute like ö,ä,ü ,ß into oe, ae, ue, ss

2008-11-18 Thread Uwe Goetzke
output.append("th"); break; case '\u00F9' : // ù case '\u00FA' : // ú case '\u00FB' : // û
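The switch-based folding shown in this snippet can be applied to the mapping named in the thread title (ö, ä, ü, ß into oe, ae, ue, ss). The following is a minimal sketch under that assumption, not the code posted in the thread: the class name GermanUmlautFolder, the method fold, and the handling of the upper-case umlauts are hypothetical choices made for illustration.

```java
// Hypothetical helper, not from the original post: folds German umlauts
// and sharp s into their two-letter ASCII transcriptions.
final class GermanUmlautFolder {
    private GermanUmlautFolder() {}

    static String fold(String input) {
        // Each replaced character grows by one, so reserve a little extra room.
        StringBuilder output = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u00E4': output.append("ae"); break; // a with diaeresis
                case '\u00F6': output.append("oe"); break; // o with diaeresis
                case '\u00FC': output.append("ue"); break; // u with diaeresis
                case '\u00DF': output.append("ss"); break; // sharp s
                case '\u00C4': output.append("Ae"); break; // A with diaeresis (assumed casing)
                case '\u00D6': output.append("Oe"); break; // O with diaeresis (assumed casing)
                case '\u00DC': output.append("Ue"); break; // U with diaeresis (assumed casing)
                default: output.append(c);                 // everything else is copied unchanged
            }
        }
        return output.toString();
    }
}
```

For example, GermanUmlautFolder.fold("Größe") yields "Groesse". To make "oe" and "ö" match at search time, the same folding would have to run at both index time and query time.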

AW: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1

2008-03-26 Thread Uwe Goetzke
size and even min and max n-gram size. >>> >>> Otis >>> -- >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >>> >>> ----- Original Message >>> From: Jay <[EMAIL PROTECTED]> >>> To: java-user@lucene.a

AW: feedback: Indexing speed improvement lucene 2.2->2.3.1

2008-03-25 Thread Uwe Goetzke
am-based index. Is it significantly better than before you used the NGramAnalyzer? -jake On 3/24/08, Uwe Goetzke <[EMAIL PROTECTED]> wrote: > Hi Ivan, > No, we do not use StandardAnalyzer or StandardTokenizer. > > Most data is processed by > fTextTokenStream = result

AW: Implement a relaxed PhraseQuery?

2008-03-24 Thread Uwe Goetzke
Hi Cuong, I have written a TolerantPhraseScorer starting with the code from PhraseScorer, but I think I have modified it too much to be generally useful. We use it with bigram clusters and therefore do not need the slop factor for scoring but have a tolerance factor (depending on the length o

AW: feedback: Indexing speed improvement lucene 2.2->2.3.1

2008-03-24 Thread Uwe Goetzke
index, merges or something else, and this is the reason the total process of indexing is not that much faster. Best Regards, Ivan Uwe Goetzke wrote: > This week I switched the Lucene library version on one customer system. > The indexing time went down from 46m32s to 16m20s for the com

AW: Does Lucene support partition-by-keyword indexing?

2008-03-01 Thread Uwe Goetzke
Hi, I do not yet fully understand what you want to achieve. You want to spread the index, split by keywords, to reduce the time to distribute indexes? And you want to distribute queries to the nodes based on the same split mechanism? You have several nodes with different kinds of documents. Y

feedback: Indexing speed improvement lucene 2.2->2.3.1

2008-03-01 Thread Uwe Goetzke
This week I switched the Lucene library version on one customer system. The indexing time went down from 46m32s to 16m20s for the complete task including optimisation. Great job! We index product catalogs from several suppliers, in this case around 56,000 product groups and 360,000 products inclu

Re: Chinese Segmentation with Phase Query

2007-11-10 Thread Uwe Goetzke
abbreviations) Regards Uwe Goetzke -Original Message- From: Cedric Ho [mailto:[EMAIL PROTECTED] Sent: Saturday, 10 November 2007 02:28 To: java-user@lucene.apache.org Subject: - Re: Chinese Segmentation with Phase Query On Nov 10, 2007 2:08 AM, Steven A Rowe <[EMAIL PROTECTED]>

Scoring algorithm suggestion?

2007-10-18 Thread Uwe Goetzke
ion that terms which follow each other in the indexed doc in the same order get a higher score. In this case we have 5 terms in the correct order, which should give the doc a boost of 4 (relatively speaking). What type of query should I base the development of my scorer on? Regards Uwe
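The arithmetic behind "5 terms in the correct order give a boost of 4" can be illustrated by counting adjacent query-term pairs that also occur adjacent and in the same order in the document. This is only a sketch of that counting rule in plain Java, not a Lucene scorer and not the approach settled on in the thread: the class OrderedPairBoost, the method countOrderedPairs, and the reading of "follow each other" as "directly adjacent" are all assumptions.

```java
import java.util.List;

// Hypothetical illustration, not a Lucene Scorer: counts how many consecutive
// query-term pairs also appear consecutively, in the same order, in the document.
final class OrderedPairBoost {
    private OrderedPairBoost() {}

    static int countOrderedPairs(List<String> queryTerms, List<String> docTerms) {
        int pairs = 0;
        for (int q = 0; q + 1 < queryTerms.size(); q++) {
            String first = queryTerms.get(q);
            String second = queryTerms.get(q + 1);
            // Scan the document for the pair (first, second) at adjacent positions.
            for (int d = 0; d + 1 < docTerms.size(); d++) {
                if (docTerms.get(d).equals(first) && docTerms.get(d + 1).equals(second)) {
                    pairs++;
                    break; // each query pair contributes at most once
                }
            }
        }
        return pairs;
    }
}
```

With query terms a, b, c, d, e and a document containing them in that order, the method returns 4, matching the boost described in the post; with the document terms reversed it returns 0.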