Re: posting list traversal code

2013-06-12 Thread Denis Bazhenov
The document id at the index level is the offset of the document within the index. It can change over time for the same document, for example when several segments are merged. Doc ids are also stored in ascending order in posting lists, which allows fast posting list intersection. Some Lucene APIs explicitly state that the
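Because doc ids within each posting list are sorted, two lists can be intersected in a single linear pass by always advancing the cursor with the smaller doc id. A minimal plain-Java sketch of that merge (independent of Lucene's actual postings API; class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class PostingIntersect {
    // Intersect two ascending posting lists by advancing the lagging cursor.
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {        // doc appears in both lists: emit it
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {  // advance whichever list is behind
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] term1 = {1, 4, 7, 10, 12};
        int[] term2 = {2, 4, 10, 15};
        System.out.println(intersect(term1, term2)); // prints [4, 10]
    }
}
```

Lucene's real implementation additionally uses skip data so the lagging cursor can jump forward in blocks rather than one doc at a time, but the sorted-order invariant is what makes both variants work.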

Re: posting list traversal code

2013-06-12 Thread Sriram Sankar
Thanks Denis. I've been looking at the code in more detail now. I'm interested in how the new SortingAtomicReader works. Suppose I build an index and sort the documents using my own sorting function - as shown in the docs: AtomicReader sortingReader = new SortingAtomicReader(reader, sorter); w

Re: posting list traversal code

2013-06-12 Thread Denis Bazhenov
I'm not quite sure what you really need, but as far as I understand you want to get all document ids for a given term. If so, the following code will work for you: Term term = new Term("fieldName", "fieldValue"); TermDocs termDocs = indexReader.termDocs(term); while (termDocs.next()) {
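The TermDocs API quoted above is from the Lucene 3.x line; in Lucene 4.x (current at the time of this thread) the equivalent walk goes through TermsEnum and DocsEnum. A sketch of that flavor, assuming an open AtomicReader and reusing the hypothetical field/value names from the snippet above (requires the Lucene 4.x core jar, not tested here):

```java
// Lucene 4.x-style postings traversal for a single term (sketch).
Terms terms = reader.terms("fieldName");             // per-field term dictionary
if (terms != null) {
    TermsEnum termsEnum = terms.iterator(null);      // no enum to reuse
    if (termsEnum.seekExact(new BytesRef("fieldValue"))) {
        // null live-docs: include deleted docs; null reuse: allocate a new enum
        DocsEnum docs = termsEnum.docs(null, null);
        int doc;
        while ((doc = docs.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
            System.out.println("doc id: " + doc);    // segment-local doc id
        }
    }
}
```

Note that the doc ids returned here are segment-local offsets, matching the earlier point that an id is only stable until segments are merged.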

posting list traversal code

2013-06-12 Thread Sriram Sankar
Can someone point me to the code that traverses the posting lists? I am trying to understand how it works. Thanks, Sriram

Reply: [SPAM] Re: A Problem in Customizing DefaultSimilarity

2013-06-12 Thread Oliver Xu (Aigine Co)
Lovely. Thank you very much! Oliver -Original Message- From: java-user-return-56101-oliver.xu=aigine@lucene.apache.org [mailto:java-user-return-56101-oliver.xu=aigine@lucene.apache.org] On Behalf Of Koji Sekiguchi Sent: June 12, 2013 22:47 To: java-user@lucene.apache.org Subject: [SPAM] Re: A Problem in Customi

Re: Seemingly very difficult to wrap an Analyzer with CharFilter

2013-06-12 Thread Michael Sokolov
On 6/12/2013 7:02 PM, Steven Schlansker wrote: On Jun 12, 2013, at 3:44 PM, Michael Sokolov wrote: You may not have noticed that CharFilter extends Reader. The expected pattern here is that you chain instances together -- your CharFilter should act as *input* to the Analyzer, I think. Don

Re: Seemingly very difficult to wrap an Analyzer with CharFilter

2013-06-12 Thread Steven Schlansker
On Jun 12, 2013, at 3:44 PM, Michael Sokolov wrote: > You may not have noticed that CharFilter extends Reader. The expected > pattern here is that you chain instances together -- your CharFilter should > act as *input* to the Analyzer, I think. Don't think in terms of extending > these ana

Re: Seemingly very difficult to wrap an Analyzer with CharFilter

2013-06-12 Thread Michael Sokolov
You may not have noticed that CharFilter extends Reader. The expected pattern here is that you chain instances together -- your CharFilter should act as *input* to the Analyzer, I think. Don't think in terms of extending these analysis classes (except the base ones designed for it): compose t

Re: Remove/Filter emails from a TokenStream?

2013-06-12 Thread Gucko Gucko
Hello, I figured out how to solve this. I just added stopTypes.add(""); On Wed, Jun 12, 2013 at 8:39 PM, Gucko Gucko wrote: > Hello all, > > is there a filter I can use to remove emails from a TokenStream? > > so far I'm using this to remove numbers, URLs, and I would like to remove > emails
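The argument to stopTypes.add("") above is empty because the token type string was evidently stripped somewhere along the way (the types contain angle brackets, which HTML rendering eats). Reconstructing the likely fix as a hedged sketch: UAX29URLEmailTokenizer tags tokens with types such as "<URL>", "<EMAIL>", and "<NUM>", and a TypeTokenFilter over those types drops them. Assuming the Lucene 4.3 API from this thread (untested here, requires the analyzers-common jar):

```java
// Sketch: drop number, URL, and email tokens by type (Lucene 4.3-era API).
Tokenizer tokenizer = new UAX29URLEmailTokenizer(Version.LUCENE_43,
        new StringReader(text));
Set<String> stopTypes = new HashSet<String>();
stopTypes.add("<NUM>");    // type emitted for numbers
stopTypes.add("<URL>");    // type emitted for URLs
stopTypes.add("<EMAIL>");  // type emitted for email addresses
// enablePositionIncrements=true keeps gaps where tokens were removed
TokenStream stream = new TypeTokenFilter(true, tokenizer, stopTypes);
```

The exact type strings are defined in the tokenizer's TOKEN_TYPES constants, which is the safer way to reference them than string literals.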

Remove/Filter emails from a TokenStream?

2013-06-12 Thread Gucko Gucko
Hello all, is there a filter I can use to remove emails from a TokenStream? So far I'm using this to remove numbers and URLs, and I would like to remove emails too: Tokenizer tokenizer = new UAX29URLEmailTokenizer(Version.LUCENE_43, new StringReader(text)); Set stopTypes = new HashSet(); st

Re: Exception while creating a Tokenizer

2013-06-12 Thread Gucko Gucko
thank you so much. Yes the problem was that I had a jar that's using Lucene 1.5. Best! On Wed, Jun 12, 2013 at 7:52 PM, Uwe Schindler wrote: > Hi, > > This happens if you have incompatible Lucene versions next to each other > in your classpath. Please clean up your classpath carefully and make

RE: Exception while creating a Tokenizer

2013-06-12 Thread Uwe Schindler
Hi, This happens if you have incompatible Lucene versions next to each other in your classpath. Please clean up your classpath carefully and make sure all JAR files of Lucene have the same version and no duplicate JARs with different versions are in it! Uwe - Uwe Schindler H.-H.-Meier-All

Exception while creating a Tokenizer

2013-06-12 Thread Gucko Gucko
Hello all, I'm trying the following code (playing with Tokenizers in order to create my own Analyzer) but I'm getting an exception: public class TokenizerTest { public static void main(String[] args) throws IOException { String text = "A #revolution http://hi.com in t...@test.com softwa

Re: A Problem in Customizing DefaultSimilarity

2013-06-12 Thread Koji Sekiguchi
Hi Oliver, > My questions are: > > 1. Why are the overridden lengthNorm() (under Lucene410) or > computeNorm() (under Lucene350) methods not called during a searching > process? Regardless of whether you override the method or not, the Lucene framework calls the method during index time only be
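The point Koji is making: lengthNorm() is folded into the per-document, per-field norm that is written at index time, so overriding it changes nothing for documents already in the index; they must be re-indexed with the custom Similarity installed on the writer. A hedged sketch of the index-time wiring (class name is illustrative, Lucene 4.x API, untested here):

```java
// Sketch: custom length normalization with a 4.x-era DefaultSimilarity.
// The value returned by lengthNorm() is baked into the index when each
// document is added, so it is never re-evaluated at search time.
public class MySimilarity extends DefaultSimilarity {
    @Override
    public float lengthNorm(FieldInvertState state) {
        // Example policy: ignore field length entirely (flat norm),
        // keeping only the field boost.
        return state.getBoost();
    }
}

// At index time the Similarity must be set on the writer config:
// IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_43, analyzer);
// conf.setSimilarity(new MySimilarity());
// IndexWriter writer = new IndexWriter(dir, conf);
```

By contrast, tf() and idf() are evaluated at search time, which is why overriding those takes effect immediately without re-indexing.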

A Problem in Customizing DefaultSimilarity

2013-06-12 Thread Oliver Xu
Dear, I built my own scoring class by extending DefaultSimilarity. Three major methods from DefaultSimilarity were overridden, including: 1. public float lengthNorm(FieldInvertState state) 2. public float tf(float freq) 3. public float idf(long docFreq, long numDocs) However, with embe