What about TermVector? "Lucene in Action" says:

Term vectors are a mix between an indexed field and a stored field. They are similar to a stored field because you can quickly retrieve all term vector fields for a given document: term vectors are keyed first by document ID. But then, they are keyed secondarily by term, meaning they store a miniature inverted index for that one document. Unlike a stored field, where the original String content is stored verbatim, term vectors store the actual separate terms that were produced by the Analyzer. This allows you to retrieve all terms, and the frequency of their occurrence within the document, sorted in lexicographic order, for a particular indexed Field of a particular Document.

TermVector.YES: record the unique terms that occurred, and their counts, in each document, but do not store any position or offset information.
TermVector.WITH_POSITIONS: record the unique terms and their counts, and also the positions of each occurrence of every term, but no offsets.
TermVector.WITH_OFFSETS: record the unique terms and their counts, with the offsets (start and end character position) of each occurrence of every term, but no positions.
TermVector.WITH_POSITIONS_OFFSETS: store unique terms and their counts, along with positions and offsets.
TermVector.NO: do not store any term vector information.
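
To make the two keyings in that passage concrete, here is a minimal sketch in plain Java. It is only an in-memory model, not Lucene's API or its on-disk format: the class TermVectorSketch, the TermEntry type and the addDocument method are hypothetical names made up for illustration, tokenization is a simple whitespace split, and the sample sentences are arbitrary. It keeps frequencies, positions and offsets for every term, which roughly corresponds to what WITH_POSITIONS_OFFSETS describes; dropping the positions or offsets lists would model the other options.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TermVectorSketch {

    /** One term's entry inside a single document's term vector (hypothetical type). */
    public static final class TermEntry {
        public final List<Integer> positions = new ArrayList<>(); // 0-based token positions
        public final List<int[]> offsets = new ArrayList<>();     // {startChar, endChar}, end exclusive

        public int freq() {
            return positions.size();
        }
    }

    // Inverted index: keyed by term first, then by document ID.
    public final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Term vectors: keyed by document ID first, then by term (TreeMap keeps terms sorted).
    public final Map<Integer, TreeMap<String, TermEntry>> termVectors = new TreeMap<>();

    /** Splits text on whitespace and records every token in both structures. */
    public void addDocument(int docId, String text) {
        TreeMap<String, TermEntry> vector = new TreeMap<>();
        int position = 0;
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace before the next token.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token itself.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i == start) {
                break; // only trailing whitespace was left
            }
            String term = text.substring(start, i);

            // Index-style entry: term -> doc -> positions.
            postings.computeIfAbsent(term, t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(position);

            // Term-vector-style entry: (this doc) -> term -> positions and offsets.
            TermEntry entry = vector.computeIfAbsent(term, t -> new TermEntry());
            entry.positions.add(position);
            entry.offsets.add(new int[] {start, i});

            position++;
        }
        termVectors.put(docId, vector);
    }

    public static void main(String[] args) {
        TermVectorSketch sketch = new TermVectorSketch();
        sketch.addDocument(1, "it is a good day good night");
        sketch.addDocument(2, "you are a good man");

        // Term-first lookup: which documents contain "good", and where?
        System.out.println("good -> " + sketch.postings.get("good")); // good -> {1=[3, 5], 2=[3]}

        // Document-first lookup: all terms of doc 1, in lexicographic order.
        for (Map.Entry<String, TermEntry> e : sketch.termVectors.get(1).entrySet()) {
            System.out.println("doc1 " + e.getKey()
                    + " freq=" + e.getValue().freq()
                    + " positions=" + e.getValue().positions);
        }
    }
}
```

Both maps hold the same positions; the difference is the lookup direction. The term-first map answers "which documents match this term", which is what a query needs, while the document-first map answers "which terms does this document contain", which is what a highlighter or a "more like this" feature needs once the document is already known.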

I am confused. What is the difference between TermVector and Index? In an index we can save position information, and we can also save it in a term vector. If I want to support phrase queries, I must save positions in the index. And if I want to support the fast highlighter and similar features, I have to save term vectors. How is this information stored?

For example, there are 2 docs, analyzed with WhitespaceAnalyzer:
1. it is a good day good night
2. you are a good man

The index's data structure seems to look like:
good -> doc1 2(tf) 3 5; doc2 1(tf) 3

What about the term vector? What does it look like? "Lucene in Action" says it is indexed first by doc ID, then by term. I can't picture it.

2010/5/31 Andrzej Bialecki <a...@getopt.org>:
> On 2010-05-31 10:54, Uwe Schindler wrote:
>> No.
>
> See also LUCENE-2048 (nice round number ;) ).
>
> --
> Best regards,
> Andrzej Bialecki <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org