Hi! I am a newbie in Lucene, and I have some problems creating a
simple piece of code to query a text file collection.
My code is here (http://pastebin.com/HqrbBPtp), but it does not work.
What is wrong?
Thanks,
Celso.
After you get the query object, you can use IndexSearcher's
docFreq() method, like this:

final Set<Term> terms = new HashSet<Term>();
query = searcher.rewrite(query);
query.extractTerms(terms);
for (Term t : terms) {
    int frequency = searcher.docFreq(t);
}
--
Chris Lu
-
I have an active Lucene implementation that has been in place for a
couple of years and was recently upgraded to the 3.0.2 branch. We are now
occasionally seeing documents returned from searches that should not be
returned. I have reduced the code and indexes to the smallest set
possible where I can st
Can you give more details about what you want? Perhaps with an example?
Do you want the number of documents containing the query term, the number of
occurrences of the query term within a document, or the number of occurrences
of the term in the entire index?
You can use an explain query to get
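The explain call mentioned above can be sketched roughly like this (Lucene 3.x; the helper name and the cutoff of 10 hits are my own choices, not from the original post):

```java
import java.io.IOException;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class ExplainDemo {
    // Print the scoring breakdown for each top hit; Explanation.toString()
    // shows the tf, idf, norm, and boost contributions per query clause.
    static void explainHits(IndexSearcher searcher, Query query) throws IOException {
        TopDocs top = searcher.search(query, 10);
        for (ScoreDoc sd : top.scoreDocs) {
            Explanation exp = searcher.explain(query, sd.doc);
            System.out.println("doc " + sd.doc + ":\n" + exp.toString());
        }
    }
}
```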
You might want to take a look at this tutorial on how Lucene calculates
scoring [1]. If all you are interested in is the term frequency, and you
want to ignore the other calculations, you can override the other factors
and have them return 1.
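A minimal sketch of that idea, assuming the Lucene 3.x DefaultSimilarity API (the class name TfOnlySimilarity is mine):

```java
import org.apache.lucene.search.DefaultSimilarity;

// Keeps tf() as-is but neutralizes the other scoring factors, so the
// resulting score tracks raw term frequency.
public class TfOnlySimilarity extends DefaultSimilarity {
    @Override public float idf(int docFreq, int numDocs) { return 1f; }
    @Override public float coord(int overlap, int maxOverlap) { return 1f; }
    @Override public float queryNorm(float sumOfSquaredWeights) { return 1f; }
    @Override public float lengthNorm(String fieldName, int numTerms) { return 1f; }
}
```

You would then install it with something like `searcher.setSimilarity(new TfOnlySimilarity())` before searching (and use the matching Similarity at index time if norms matter to you).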
Hope this helps!
Seth Rosen
s...@architexa.com
www.architexa.com
The problem with your implementation of reusableTokenStream is that it does
not set a new reader when it reuses. reset() is the wrong method. Attempt b is
also wrong, as it does not reuse the whole analyzer chain. The correct way is
to make some utility class that you use for storing the Token
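The reuse pattern described above can be sketched like this (Lucene 3.0-era API; the whitespace/lowercase chain is just an illustrative choice, not the original poster's analyzer):

```java
import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceTokenizer;

public class ReusingAnalyzer extends Analyzer {
    // Utility holder for the saved chain: both the source Tokenizer and
    // the full filter chain built on top of it.
    private static class SavedStreams {
        Tokenizer source;
        TokenStream result;
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new LowerCaseFilter(new WhitespaceTokenizer(reader));
    }

    @Override
    public TokenStream reusableTokenStream(String fieldName, Reader reader)
            throws IOException {
        SavedStreams streams = (SavedStreams) getPreviousTokenStream();
        if (streams == null) {
            // First call on this thread: build and remember the whole chain.
            streams = new SavedStreams();
            streams.source = new WhitespaceTokenizer(reader);
            streams.result = new LowerCaseFilter(streams.source);
            setPreviousTokenStream(streams);
        } else {
            // Reuse: point the saved Tokenizer at the NEW reader.
            streams.source.reset(reader);
        }
        return streams.result;
    }
}
```

The key point is that on reuse the saved Tokenizer gets the new Reader via reset(Reader), while the whole filter chain built on top of it is returned unchanged.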
Likely what happened is you had a bunch of smaller segments, and then
suddenly they got merged into that one big segment (_aiaz) in your
index.
The representation for norms in particular is not sparse, so this
means the size of the norms file for a given segment will be
number-of-unique-indexed-fi
Thanks, Uwe!
Indeed you're right! Whenever IndexReader.document() is called, a new
Document instance is created. And since the Document class does not
override equals and hashCode, I can't know whether the same doc was
retrieved. And since Document is final, I can only write a wrapper for it.
Is this an oversight or
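Such a wrapper could look roughly like this. It is a sketch of my own, not an existing Lucene class: identity is taken from the doc ID rather than the Document instance, and the payload type is generic here so the idea stands on its own (in practice it would be org.apache.lucene.document.Document). Note the assumption that doc IDs are only stable for the lifetime of a single IndexReader.

```java
// Hypothetical wrapper giving retrieved documents value identity keyed
// on the Lucene doc ID. Two handles with the same ID compare equal even
// though each retrieval produced a distinct Document instance.
public final class DocHandle<T> {
    private final int docId;
    private final T doc;

    public DocHandle(int docId, T doc) {
        this.docId = docId;
        this.doc = doc;
    }

    public T doc() { return doc; }

    @Override
    public boolean equals(Object o) {
        return o instanceof DocHandle && ((DocHandle<?>) o).docId == docId;
    }

    @Override
    public int hashCode() { return docId; }
}
```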
Hello Lucene list,
I have a question about a custom Analyzer we're trying to write. The
intention is that it tokenizes on whitespace and abstracts over
upper/lowercase and accented characters. It is used both when indexing
documents and before creating Lucene queries from search terms.
I have 2
Hi Carmit,
equals and hashCode is not implemented for oal.document.Document, so two
instances always compare not to each other. The same happens if you retrieve
the document two times from same IndexReader.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u..
Hi,
I have a weird result:
If I access the same document through the IndexReader or the IndexSearcher,
the two instances are not equal and have different hash codes:
Document doc1 = indexSearcher.doc(i);
Document doc2 = indexSearcher.getIndexReader().document(i);
S
I need to find the terms that most frequently appear with a query.
HighFreqTerms.java can be used only to obtain the high-frequency terms in
the whole index.
I need to find the high-frequency terms just for the submitted query.
What I do now is:
I search the index with the query and retr
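That search-then-aggregate approach can be sketched like this (Lucene 3.x; the field name "contents", the helper name, and the cutoff of 100 hits are placeholders, and the field must have been indexed with term vectors enabled):

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class QueryTermCounts {
    // Sum per-document term frequencies across the top hits of a query.
    static Map<String, Integer> topHitTermCounts(IndexSearcher searcher, Query query)
            throws IOException {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        TopDocs top = searcher.search(query, 100);
        for (ScoreDoc sd : top.scoreDocs) {
            TermFreqVector tfv =
                searcher.getIndexReader().getTermFreqVector(sd.doc, "contents");
            if (tfv == null) continue;  // no term vector stored for this doc
            String[] terms = tfv.getTerms();
            int[] freqs = tfv.getTermFrequencies();
            for (int i = 0; i < terms.length; i++) {
                Integer prev = counts.get(terms[i]);
                counts.put(terms[i], (prev == null ? 0 : prev) + freqs[i]);
            }
        }
        return counts;  // sort entries by value descending for the top terms
    }
}
```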