date:20101104

How index and search text files in Lucene 3.0.2 ?

2010-11-04 Thread Celso Fontes

Hi! I am a newbie to Lucene, and I am having trouble writing simple code to query a collection of text files. My code is here (http://pastebin.com/HqrbBPtp), but it does not work. What is wrong? Thanks, Celso.

Re: High frequency term for the searched query

2010-11-04 Thread Chris Lu

After you get the query object, you can use IndexSearcher's docFreq() method, like this:

    final Set<Term> terms = new HashSet<Term>();
    query = searcher.rewrite(query);
    query.extractTerms(terms);
    for (Term t : terms) {
        int frequency = searcher.docFreq(t);
    }

-- Chris Lu

Re: High frequency term for the searched query

2010-11-04 Thread Chris Lu

After you get the query object, you can use IndexSearcher's docFreq() method, like this:

    final Set<Term> terms = new HashSet<Term>();
    query = searcher.rewrite(query);
    query.extractTerms(terms);
    for (Term t : terms) {
        int frequency = irs.getSearcher().docFreq(t);
    }

-- Chris Lu

Search returning documents matching a NOT range

2010-11-04 Thread David Fertig

I have an active Lucene implementation that has been in place for a couple of years and was recently upgraded to the 3.0.2 branch. We are now occasionally seeing documents returned from searches that should not be returned. I have reduced the code and indexes to the smallest set possible where I can st

RE: High frequency term for the searched query

2010-11-04 Thread Burton-West, Tom

Can you give more details about what you want? Perhaps with an example? Do you want the number of documents containing the query term, the number of occurrences of the query term within a document, or the number of occurrences of the term in the entire index? You can use an explain query to get
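The three quantities distinguished in this reply can be illustrated with a toy example. This is only a sketch over in-memory strings, not Lucene code: the class `TermCounts` and its method names are hypothetical, and tokenization is a plain whitespace split. Of the three, the first count is the one that IndexSearcher's docFreq() reports for an index.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: counts a term three different ways.
final class TermCounts {

    /** Occurrences of the term within a single document (term frequency). */
    static int termFreq(String doc, String term) {
        int count = 0;
        for (String token : doc.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    /** Number of documents that contain the term at least once (document frequency). */
    static int docFreq(List<String> docs, String term) {
        int count = 0;
        for (String doc : docs) {
            if (termFreq(doc, term) > 0) {
                count++;
            }
        }
        return count;
    }

    /** Occurrences of the term summed over every document in the collection. */
    static int totalTermFreq(List<String> docs, String term) {
        int count = 0;
        for (String doc : docs) {
            count += termFreq(doc, term);
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "lucene index search",
                "search search query",
                "analyzer token stream");
        System.out.println(docFreq(docs, "search"));         // 2
        System.out.println(termFreq(docs.get(1), "search")); // 2
        System.out.println(totalTermFreq(docs, "search"));   // 3
    }
}
```

The three numbers differ as soon as a term repeats inside one document, which is why the question needs to say which of them is wanted.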

Re: High frequency term for the searched query

2010-11-04 Thread Seth Rosen

You might want to take a look at this tutorial on how Lucene calculates scoring [1]. If all you are interested in is the term frequency and you want to ignore the other calculations, you can override the others and have them return 1. Hope this helps! Seth Rosen s...@architexa.com www.architexa.com

RE: Question about custom Analyzer

2010-11-04 Thread Uwe Schindler

The problem with your implementation of reusableTokenStream is that it does not set a new reader when it reuses. reset() is the wrong method. Attempt b is also wrong, as it does not reuse the whole analyzer chain. The correct way is to make some utility class that you use for storing the Token

Re: IndexWriter.close() performance issue

2010-11-04 Thread Michael McCandless

Likely what happened is you had a bunch of smaller segments, and then suddenly they got merged into that one big segment (_aiaz) in your index. The representation for norms in particular is not sparse, so this means the size of the norms file for a given segment will be number-of-unique-indexed-fi

Weird document equals and hash through IndexReader & IndexSearcher

2010-11-04 Thread Carmit Sahar

Thanks, Uwe! Indeed you're right! Whenever IndexReader is called, a new document instance is created! And since the Document class does not override equals & hashCode, I can't know if the same doc was retrieved. And since Document is final, I can only write a wrapper for it. Is this an oversight or

Question about custom Analyzer

2010-11-04 Thread heikki

Hello Lucene list, I have a question about a custom Analyzer we're trying to write. The intention is that it tokenizes on whitespace, and abstracts over upper/lowercase and accented characters. It is used both when indexing documents and before creating Lucene queries from search terms. I have 2
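The normalization described here (split on whitespace, ignore case, ignore accents) can be sketched with the JDK alone. This is not a Lucene Analyzer and makes no claim about how the poster's analyzer is written; the class `FoldingTokenizer` is a hypothetical name, and the accent handling assumes that stripping combining marks after NFD decomposition is acceptable. Characters without a canonical decomposition, such as ø, pass through unchanged.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-alone sketch of the intended token normalization.
final class FoldingTokenizer {

    /** Splits on whitespace, removes combining accent marks, and lowercases each token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // blank input yields a single empty string from split
            }
            // NFD separates base letters from their combining accent marks.
            String decomposed = Normalizer.normalize(raw, Normalizer.Form.NFD);
            // \p{M} matches the combining marks left behind by the decomposition.
            String stripped = decomposed.replaceAll("\\p{M}+", "");
            tokens.add(stripped.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only: "Crème  BRÛLÉE café".
        System.out.println(tokenize("Cr\u00E8me  BR\u00DBL\u00C9E caf\u00E9"));
        // prints [creme, brulee, cafe]
    }
}
```

Applying the same function to indexed text and to search terms is what makes "Café" and "cafe" match each other.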

RE: Weird document equals and hash through IndexReader & IndexSearcher

2010-11-04 Thread Uwe Schindler

Hi Carmit, equals and hashCode are not implemented for oal.document.Document, so two instances never compare equal to each other. The same happens if you retrieve the document twice from the same IndexReader. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u..
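One way to work around identity-based equality is a small wrapper that compares by document id instead of by object. The sketch below uses a hypothetical stand-in class `StoredDoc` rather than Lucene's Document so that it runs with the JDK alone; `DocKey` and `load` are likewise illustrative names. It also assumes both lookups go through the same reader, since Lucene document ids are not stable across index changes such as merges.

```java
import java.util.HashSet;
import java.util.Set;

final class DocKeyDemo {

    /** Hypothetical stand-in for a final class that inherits Object's equals and hashCode. */
    static final class StoredDoc {
        final String title;

        StoredDoc(String title) {
            this.title = title;
        }
    }

    /** Wrapper that defines equality by document id rather than by object identity. */
    static final class DocKey {
        final int docId;
        final StoredDoc doc;

        DocKey(int docId, StoredDoc doc) {
            this.docId = docId;
            this.doc = doc;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof DocKey && ((DocKey) other).docId == docId;
        }

        @Override
        public int hashCode() {
            return docId;
        }
    }

    /** Simulates a reader that builds a fresh object on every retrieval. */
    static StoredDoc load(int docId) {
        return new StoredDoc("doc-" + docId);
    }

    public static void main(String[] args) {
        StoredDoc first = load(7);
        StoredDoc second = load(7);
        System.out.println(first.equals(second)); // false: two distinct instances

        Set<DocKey> seen = new HashSet<DocKey>();
        seen.add(new DocKey(7, first));
        System.out.println(seen.contains(new DocKey(7, second))); // true: same id
    }
}
```

If only "was this document already retrieved" matters, keeping a set of the integer ids themselves is simpler than wrapping the documents.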

Weird document equals and hash through IndexReader & IndexSearcher

2010-11-04 Thread Carmit Sahar

Hi, I have a weird result: If I access the same document through the IndexReader or IndexSearcher, they are not equal and have different hash values: Document doc1 = indexSearcher.doc(i); Document doc2 = indexSearcher.getIndexReader().document(i); S

High frequency term for the searched query

2010-11-04 Thread starz10de

I need to find the most frequent terms that appear with a query. HighFreqTerms.java can be used only to obtain the high frequency terms in the whole index. I need just to find the high frequency terms for the submitted query. What I do now is: I search the index with the query and retr

13 matches
