Re: Stemming - limited index expansion

2012-06-12 Thread Jack Krupansky
I forgot about the Solr/Lucene code shuffling. Back in 3.4, WDF was in Solr rather than Lucene. Here's the code: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_4/solr/core/src/java/org/apache/solr/analysis/WordDelimiterFilter.java?revision=1166268&view=markup -- Jack Krupansky

RE: Stemming - limited index expansion

2012-06-12 Thread Paul Hill
Thanks for the reply. > -----Original Message----- > From: Jack Krupansky [mailto:j...@basetechnology.com] > Sent: Tuesday, June 12, 2012 1:14 PM > To: java-user@lucene.apache.org > Subject: Re: Stemming - limited index expansion > > I don't completely follow precisely what you want to do, but th

Re: Stemming - limited index expansion

2012-06-12 Thread Jack Krupansky
I don't completely follow precisely what you want to do, but the WordDelimiterFilter is an example of a token filter that outputs an extra token at the same position, such as with its CATENATE_ALL/WORDS/NUMBERS options. https://builds.apache.org/job/Lucene-trunk/javadoc/analyzers-common/org/ap
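To make the "extra token at the same position" idea concrete, here is a minimal, dependency-free sketch. It does not use Lucene's TokenStream API; the class SamePositionSketch, the Token type, and the toyStem helper are all hypothetical names invented for illustration. The only Lucene concept it borrows is that a position increment of 0 places a token at the same position as the token before it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Dependency-free sketch: models tokens as (term, positionIncrement) pairs.
// This is NOT Lucene's TokenStream API; every name here is hypothetical.
class SamePositionSketch {

    static final class Token {
        final String term;
        final int positionIncrement; // 1 = next position, 0 = same position as the previous token

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    // Toy stand-in for a real stemmer (assumption): strips a trailing "s"
    // from words longer than three letters.
    static String toyStem(String word) {
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Keeps every original token and, when the stem differs from the
    // original, adds the stem as a second token at the same position.
    static List<Token> expand(List<String> words) {
        List<Token> out = new ArrayList<>();
        for (String word : words) {
            out.add(new Token(word, 1));
            String stem = toyStem(word);
            if (!stem.equals(word)) {
                out.add(new Token(stem, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [fast(+1), cars(+1), car(+0), win(+1)]
        System.out.println(expand(Arrays.asList("fast", "cars", "win")));
    }
}
```

Because the original token is always emitted first and the variant only follows with increment 0, phrase and proximity positions are unchanged by the expansion.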

Re: CodeMaps updates for Lucene

2012-06-12 Thread Abhishek Rakshit
So I just looked at the tags added by Paul and quickly tagged a few other related items. You can see them here: http://www.codemaps.org/s/Lucene/t/query-related Do you think this might be helpful to people trying to learn more about queries? (It seems that the site has a bug. I tagged some of the

Stemming - limited index expansion

2012-06-12 Thread Paul Hill
As others have previously proposed on this list, I am interested in inserting a second token at some positions in my index. I'll call this Limited Index Expansion. I want to retain the original token, so that I can score an original word that matches in a text better than just any synonym/stem

Re: problem understanding the documentation for the TieredMergePolicy class

2012-06-12 Thread Jack Krupansky
You start by defining minimum segment size, number of segments per tier, and maximum segment size. From that, the "budget" or maximum number of segments allowed is calculated. Each move up a level (tier) increases the size of the segment allowed at that next higher tier until the largest segment
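The budget calculation described above can be sketched roughly as follows. This is a simplified, dependency-free illustration and not the actual TieredMergePolicy source; the class name, the parameter names, the growthFactor argument, and the cap at maxSegmentBytes are assumptions made for the example.

```java
// Simplified sketch of a tiered "segment budget" calculation.
// NOT the Lucene implementation; all names and the capping rule are assumptions.
class TierBudgetSketch {

    // Returns the maximum number of segments allowed for an index of
    // totalBytes, filling tiers from the smallest segment size upward.
    static long allowedSegments(long totalBytes,
                                long floorSegmentBytes,
                                int segmentsPerTier,
                                int growthFactor,
                                long maxSegmentBytes) {
        long levelSize = floorSegmentBytes; // segment size at the current tier
        long bytesLeft = totalBytes;        // index bytes not yet assigned to a tier
        long allowed = 0;
        while (true) {
            double segmentsAtLevel = bytesLeft / (double) levelSize;
            if (segmentsAtLevel < segmentsPerTier) {
                // The remaining bytes do not fill a whole tier: count what fits and stop.
                allowed += (long) Math.ceil(segmentsAtLevel);
                break;
            }
            // A full tier fits: allow segmentsPerTier segments of this size.
            allowed += segmentsPerTier;
            bytesLeft -= (long) segmentsPerTier * levelSize;
            // Move up one tier; each tier holds larger segments, up to the maximum.
            levelSize = Math.min(maxSegmentBytes, levelSize * growthFactor);
        }
        return allowed;
    }

    public static void main(String[] args) {
        // Sizes in MB for readability: 1000 MB index, 2 MB floor,
        // 10 segments per tier, tiers grow 10x, 5000 MB maximum segment.
        System.out.println(allowedSegments(1000, 2, 10, 10, 5000)); // prints 24
    }
}
```

In the example, the first tier takes 10 segments of 2 MB, the second takes 10 segments of 20 MB, and the remaining 780 MB fits in 4 segments of up to 200 MB, which gives a budget of 24 segments. When the index holds more segments than the budget, merging is needed to bring the count back down.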

threshold calculation in CarmelTopKTermPruningPolicy

2012-06-12 Thread Zeynep P.
Hi, In the CarmelTopKTermPruningPolicy class, the threshold is calculated as follows: *float threshold = docs[k - 1].score - scoreDelta;* Here docs[k - 1].score corresponds to z_t in the original paper (Carmel et al., 2001) and scoreDelta = epsilon * r. Could you please explain to me why it is calculated

Re: Support for NumericRangeQuery in QueryParser

2012-06-12 Thread Jochen Hebbrecht
Hi Uwe, Thanks for your answer. The alternative way was already familiar to me, but thanks anyway ;-)! I didn't know the "contrib" folder :-). Thanks for your reply. Kind regards, Jochen PS: I'm Belgian ;-) ... 2012/6/12 Uwe Schindler > Hi Jochen, > > the flexible query parser in contrib al

RE: Support for NumericRangeQuery in QueryParser

2012-06-12 Thread Uwe Schindler
Hi Jochen, the flexible query parser in contrib allows for numeric fields (you need to configure it to "know" the types of fields, e.g. which fields are long, float, ...). Alternatively use the code from my "Java Magazin" article a few years ago and customize core's QueryParser by overriding the fac

problem understanding the documentation for the TieredMergePolicy class

2012-06-12 Thread thomas
Hello, I've read the documentation about the TieredMergePolicy class, but I can't work out what this sentence is trying to state: [...] For normal merging, this policy first computes a "budget" of how many segments are allowed by be in the index. [...] http://lucene.apache.org/core/old_v