Re: Superset Similarity?

2012-11-16 Thread Robert Muir
On Fri, Nov 16, 2012 at 5:18 PM, Tom Burton-West wrote:
> Hi Otis,
>
> I hope this is not off-topic,
>
> Apparently in Lucene similarity does not have to be set at index time:
Actually in the general case it does. IndexWriter calls the Similarity's computeNorm method at index-time. It's just th

Re: Retrieval of the position of indexed terms

2012-11-16 Thread wgggfiy
Has anyone resolved this? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Retrieval-of-the-position-of-indexed-terms-tp4015079p4020835.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: Which stemmer?

2012-11-16 Thread Lance Norskog
Nope! This slang term only exists in the plural. The kind of prose with this usage may not follow standard grammatical and spelling rules anyway. Historically, text search has been funded mostly by the US intelligence agencies because they want to analyze formal and technical prose. And, it is

Re: Which stemmer?

2012-11-16 Thread Igal @ getRailo.org
but if "dogs" are feet (and I guess I fall into the not-perfect group here)... and "feet" is the plural form of "foot", then shouldn't "dogs" be stemmed to "dog" as a base, singular form? On 11/16/2012 2:32 PM, Tom Burton-West wrote: Hi Mike, Honestly I've never heard of anyone using "dog

Re: Which stemmer?

2012-11-16 Thread Tom Burton-West
Hi Mike,
>>Honestly I've never heard of anyone using "dogs" to mean feet either, but hey nobody's perfect.
This is really off topic but I couldn't resist. This usage of "dogs" to mean feet occurs in old blues lyrics such as Blind Lemon Jefferson's "Hot Dogs" http://www.youtube.com/watch?v=v670qV

Re: Superset Similarity?

2012-11-16 Thread Tom Burton-West
Hi Otis, I hope this is not off-topic. Apparently in Lucene, similarity does not have to be set at index time. See http://lucene.apache.org/core/4_0_0/changes/Changes.html under LUCENE-2959: "All models default to the same index-time norm encoding as DefaultSimilarity, so you can easily try these

Re: Grouping on multiple shards possible in lucene?

2012-11-16 Thread Michael McCandless
Yes, this is possible using Lucene's grouping APIs. It looks like index time grouping won't work, since you get the same parent spread out across time, but you can use the two-pass grouping instead ... run the FirstPassGroupingCollector on each shard, get the top groups from each, merge those and

Re: what is the format of .tim and .tiq in lucene 4.0 ?

2012-11-16 Thread Michael McCandless
The format is unfortunately rather intricate ... FST = finite state transducer (see eg http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html ). We use that to hold the terms index (*.tip), which is loaded into RAM. The blocks are because we encode a block of between 25 -

Re: Lucene Index File Format

2012-11-16 Thread wgggfiy
I'm studying the index format in depth and writing Java utilities to log all of it. So far I have successfully logged .si, .fnm, .fdx, and .fdt, but .tim and .tiq are too complicated... -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-Index-File-Format-tp4011133p4020685.html

Re: Lucene 4.0 Get All Index Terms

2012-11-16 Thread wgggfiy
Me too! Could you explain how you solved it? -- View this message in context: http://lucene.472066.n3.nabble.com/Lucene-4-0-Get-All-Index-Terms-tp3686023p4020683.html