How index and search text files in Lucene 3.0.2 ?

2010-11-04 Thread Celso Fontes
Hi! I am a newbie to Lucene, and I am having trouble writing simple code to query a collection of text files. My code is here (http://pastebin.com/HqrbBPtp), but it does not work. What is wrong? Thanks, Celso.

Re: High frequency term for the searched query

2010-11-04 Thread Chris Lu
After you get the query object, you can use IndexSearcher's docFreq() method, like this:
    final Set<Term> terms = new HashSet<Term>();
    query = searcher.rewrite(query);  // rewrite into primitive queries first
    query.extractTerms(terms);
    for (Term t : terms) {
        int frequency = searcher.docFreq(t);  // number of documents containing t
    }

Re: High frequency term for the searched query

2010-11-04 Thread Chris Lu
After you get the query object, you can use IndexSearcher's docFreq() method, like this:
    final Set<Term> terms = new HashSet<Term>();
    query = searcher.rewrite(query);  // rewrite into primitive queries first
    query.extractTerms(terms);
    for (Term t : terms) {
        int frequency = irs.getSearcher().docFreq(t);  // number of documents containing t
    }

Search returning documents matching a NOT range

2010-11-04 Thread David Fertig
I have an active Lucene implementation that has been in place for a couple of years and was recently upgraded to the 3.0.2 branch. We are now occasionally seeing documents returned from searches that should not be returned. I have reduced the code and indexes to the smallest set possible where I can st

RE: High frequency term for the searched query

2010-11-04 Thread Burton-West, Tom
Can you give more details about what you want? Perhaps with an example? Do you want the number of documents containing the query term, the number of occurrences of the query term within a document, or the number of occurrences of the term in the entire index? You can use an explain query to get
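
The three counts distinguished above can be told apart with a toy, in-memory sketch. It does not use the Lucene API; the class name TermCounts, its methods, and the sample documents are all hypothetical and serve only to show how the numbers differ for the same term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TermCounts {
    /** Number of documents that contain the term at least once. */
    static int docFreq(List<String[]> docs, String term) {
        int n = 0;
        for (String[] doc : docs) {
            if (Arrays.asList(doc).contains(term)) {
                n++;
            }
        }
        return n;
    }

    /** Number of occurrences of the term within a single document. */
    static int termFreq(String[] doc, String term) {
        int n = 0;
        for (String token : doc) {
            if (token.equals(term)) {
                n++;
            }
        }
        return n;
    }

    /** Number of occurrences of the term across the whole collection. */
    static int totalTermFreq(List<String[]> docs, String term) {
        int n = 0;
        for (String[] doc : docs) {
            n += termFreq(doc, term);
        }
        return n;
    }

    /** Hypothetical three-document collection, already tokenized. */
    static List<String[]> sample() {
        List<String[]> docs = new ArrayList<String[]>();
        docs.add(new String[] {"lucene", "index", "lucene"});
        docs.add(new String[] {"search", "index"});
        docs.add(new String[] {"lucene", "search", "search"});
        return docs;
    }

    public static void main(String[] args) {
        List<String[]> docs = sample();
        System.out.println("docFreq=" + docFreq(docs, "lucene"));
        System.out.println("termFreq(doc 0)=" + termFreq(docs.get(0), "lucene"));
        System.out.println("totalTermFreq=" + totalTermFreq(docs, "lucene"));
    }
}
```

For the term "lucene" the three values differ (2 documents, 2 occurrences in the first document, 3 occurrences overall), which is why it matters which one is being asked for.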

Re: High frequency term for the searched query

2010-11-04 Thread Seth Rosen
You might want to take a look at this tutorial on how Lucene calculates scoring [1]. If all you are interested in is the term frequency and you want to ignore other calculations, you can override the others and have them return 1. Hope this helps!

RE: Question about custom Analyzer

2010-11-04 Thread Uwe Schindler
The problem with your implementation of reusableTokenStream is that it does not set a new reader when it reuses. reset() is the wrong method. Attempt b is also wrong, as it does not reuse the whole analyzer chain. The correct way is to make some utility class that you use for storing the Token

Re: IndexWriter.close() performance issue

2010-11-04 Thread Michael McCandless
Likely what happened is you had a bunch of smaller segments, and then suddenly they got merged into that one big segment (_aiaz) in your index. The representation for norms in particular is not sparse, so this means the size of the norms file for a given segment will be number-of-unique-indexed-fi
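
A rough arithmetic sketch of why such a merge can inflate the norms, assuming (as in Lucene 3.x) one norm byte per document for every indexed field with norms that occurs in the segment. The class name and all segment, document, and field counts are hypothetical and chosen only to make the effect visible.

```java
class NormsEstimate {
    // Assumption: a non-sparse norms file costs one byte per document
    // for every indexed field (with norms) that occurs in the segment.
    static long normsBytes(int indexedFields, int maxDoc) {
        return (long) indexedFields * maxDoc;
    }

    public static void main(String[] args) {
        // Hypothetical index: 10 small segments with disjoint sets of 5 fields each.
        int segments = 10;
        int docsPerSegment = 1000000;
        int fieldsPerSegment = 5;

        // Before the merge each small segment only pays for its own 5 fields.
        long before = segments * normsBytes(fieldsPerSegment, docsPerSegment);

        // After the merge the single segment holds all 50 fields and pays
        // for each of them across all 10 million documents.
        long after = normsBytes(segments * fieldsPerSegment, segments * docsPerSegment);

        System.out.println("before=" + before + " bytes, after=" + after + " bytes");
    }
}
```

Under these assumptions the merged segment needs ten times the norms space of the small segments combined, even though no documents were added.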

Weird document equals and hash through IndexReader & IndexSearcher

2010-11-04 Thread Carmit Sahar
Thanks, Uwe! Indeed you're right! Whenever IndexReader is called, a new Document instance is created! And since the Document class does not override equals & hashCode, I can't know if the same doc was retrieved. And since Document is final, I can only write a wrapper for it. Is this an oversight or
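
One possible shape for such a wrapper, as a minimal sketch: equality is based on the document number rather than on the instance. DocRef is a hypothetical name, and the payload type is left generic so the sketch compiles without Lucene; in practice T would be the Document class. It also assumes both objects come from the same IndexReader, since document numbers are only meaningful relative to one reader.

```java
final class DocRef<T> {
    private final int docId;   // document number within one IndexReader (assumption)
    private final T document;  // the retrieved document instance

    DocRef(int docId, T document) {
        this.docId = docId;
        this.document = document;
    }

    int docId() {
        return docId;
    }

    T document() {
        return document;
    }

    // Two references are equal when they point at the same document number,
    // regardless of which freshly created instance they carry.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof DocRef)) {
            return false;
        }
        return docId == ((DocRef<?>) other).docId;
    }

    @Override
    public int hashCode() {
        return docId;
    }
}
```

With this, two wrappers around separately loaded copies of document i compare equal and collapse to one entry in a HashSet.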

Question about custom Analyzer

2010-11-04 Thread heikki
Hello Lucene list, I have a question about a custom Analyzer we're trying to write. The intention is that it tokenizes on whitespace, and abstracts over upper/lowercase and accented characters. It is used both when indexing documents, and before creating Lucene queries from search terms. I have 2
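
The intended normalization (split on whitespace, ignore case, ignore accents) can be sketched in plain Java without any Lucene classes. This is not an Analyzer implementation and not the poster's code; FoldingTokenizer is a hypothetical name, and the accent handling relies on java.text.Normalizer, which only covers characters that decompose into a base letter plus combining marks (letters such as "ø" or "ß" are left unchanged).

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class FoldingTokenizer {
    /** Splits on whitespace after removing accents and lowercasing. */
    static List<String> tokenize(String text) {
        // Decompose accented characters into base letter + combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Drop the combining marks, then lowercase independently of the default locale.
        String folded = decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);

        List<String> tokens = new ArrayList<String>();
        for (String token : folded.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Caf\u00e9  CR\u00c8ME"));
    }
}
```

Applying the same function to both the indexed text and the search terms is what makes "Café" and "cafe" match.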

RE: Weird document equals and hash through IndexReader & IndexSearcher

2010-11-04 Thread Uwe Schindler
Hi Carmit, equals and hashCode are not implemented for oal.document.Document, so two instances never compare equal to each other. The same happens if you retrieve the document twice from the same IndexReader.

Weird document equals and hash through IndexReader & IndexSearcher

2010-11-04 Thread Carmit Sahar
Hi, I have a weird result: If I access the same document through the IndexReader or IndexSearcher, they are not equal and have different hash values: Document doc1 = indexSearcher.doc(i); Document doc2 = indexSearcher.getIndexReader().document(i); S

High frequency term for the searched query

2010-11-04 Thread starz10de
I need to find the most frequent terms that appear with a query. HighFreqTerms.java can only be used to obtain the high-frequency terms in the whole index. I need to find just the high-frequency terms for the submitted query. What I do now is: I search the index with the query and retr