Re: Search for more than one term

2010-01-27 Thread Phan The Dai
Hello ctorresl, you can use QueryParser, which automatically builds a query from the query syntax (as Erick showed). Or use the BooleanQuery class: BooleanQuery query = new BooleanQuery(); query.add(a_termquery, Occur.SHOULD); query.add(other_termquery, Occur.SHOULD); On Thu, Jan 28, 2010 at 11:15 AM, Erick Erickson w

Re: Average Precision - TREC-3

2010-01-27 Thread Ivan Provalov
Robert, Thank you for this great information. Let me look into these suggestions. Ivan --- On Wed, 1/27/10, Robert Muir wrote: > From: Robert Muir > Subject: Re: Average Precision - TREC-3 > To: java-user@lucene.apache.org > Date: Wednesday, January 27, 2010, 2:52 PM > Hi Ivan, it sounds to

Re: Search for more than one term

2010-01-27 Thread Erick Erickson
Have you looked at the query syntax? See... http://lucene.apache.org/java/3_0_0/queryparsersyntax.html And the book Lucene In Action has many examples HTH Erick On Wed, Jan 27, 2010 at 6:55 PM, ctorresl wrote: > > Hello: > I'm working with Lucene for my thesis, please I need answers to >

Re: Search for more than one term

2010-01-27 Thread Mark Miller
ctorresl wrote: > Hello: > I'm working with Lucene for my thesis, please I need answers to > these questions: > 1. How can I tell Lucene to search for more than one term? (for example: > the query "house garden computer" will return documents in which at least > one of the > terms appears) What cl

Search for more than one term

2010-01-27 Thread ctorresl
Hello: I'm working with Lucene for my thesis, please I need answers to these questions: 1. How can I tell Lucene to search for more than one term? (for example: the query "house garden computer" will return documents in which at least one of the terms appears) What classes do I need to use? 2. Lucen

Re: Average Precision - TREC-3

2010-01-27 Thread Robert Muir
Hi Ivan, it sounds to me like you are going about it the right way. I too have complained about different document/topic formats before, at least with non-TREC test collections that claim to be in TREC format. Here is a description of what I do, for what it's worth. 1. if you use the trunk benchma

RE: Average Precision - TREC-3

2010-01-27 Thread Provalov, Ivan (Gale)
Thank you, Jose. -----Original Message----- From: José Ramón Pérez Agüera [mailto:jose.agu...@gmail.com] Sent: Wednesday, January 27, 2010 1:42 PM To: java-user@lucene.apache.org Subject: Re: Average Precision - TREC-3 Hi Ivan, you might want to use the Lucene BM25 implementation. Results should b

Re: Average Precision - TREC-3

2010-01-27 Thread José Ramón Pérez Agüera
Hi Ivan, you might want to use the Lucene BM25 implementation. Results should improve if you change the ranking function. Another option is the language model implementation for Lucene: http://nlp.uned.es/~jperezi/Lucene-BM25/ http://ilps.science.uva.nl/resources/lm-lucene The main problem with this imple

Re: Average Precision - TREC-3

2010-01-27 Thread Ivan Provalov
Robert, Grant: Thank you for your replies.  Our goal is to fine-tune our existing system to perform better on relevance. I agree with Robert's comment that these collections are not completely compatible.  Yes, it is possible that the results will vary some depending on the collections differ

Re: Analyze java camelcase words ?

2010-01-27 Thread Phan The Dai
Thank you very much. I will study your comments; they are useful. I am new to Lucene 3.0. Hope it works well. On Thu, Jan 28, 2010 at 1:21 AM, Robert Muir wrote: > no, but you can take the tokenfilter itself and simply use it in your > lucene > application. > > it uses the old tokenstream API s

Re: Analyze java camelcase words ?

2010-01-27 Thread Robert Muir
No, but you can take the tokenfilter itself and simply use it in your Lucene application. It uses the old tokenstream API, so if you want to use Lucene 3.0 or 3.1, you will need a version that works with the new tokenstream API. There is a patch available here for that: https://issues.apache.org/ji

Re: Analyze java camelcase words ?

2010-01-27 Thread Erick Erickson
Robert: Is this in Lucene yet? According to what I could find in JIRA, it's still open. And it's not in the Javadocs on a quick scan. Erick On Wed, Jan 27, 2010 at 11:08 AM, Robert Muir wrote: > WordDelimiterFilter has a splitOnCaseChange option that should be useful > for > this: > > http

Re: Average Precision - TREC-3

2010-01-27 Thread Robert Muir
Hello, forgive my ignorance here (I have not worked with these English TREC collections), but is the TREC-3 test collection the same as the test collection used in the 2007 paper you referenced? It looks like that is a different collection, it's not really possible to compare these relevance scores

Re: Analyze java camelcase words ?

2010-01-27 Thread Robert Muir
WordDelimiterFilter has a splitOnCaseChange option that should be useful for this: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory From the example: PowerShot -> Power, Shot On Wed, Jan 27, 2010 at 11:01 AM, Phan The Dai wrote: > Can everyone suggest

Re: Average Precision - TREC-3

2010-01-27 Thread Grant Ingersoll
On Jan 26, 2010, at 8:28 AM, Ivan Provalov wrote: > We are looking into making some improvements to relevance ranking of our > search platform based on Lucene. We started by running the Ad Hoc TREC task > on the TREC-3 data using "out-of-the-box" Lucene. The reason to run this old > TREC-3 (

Analyze java camelcase words ?

2010-01-27 Thread Phan The Dai
Can anyone suggest a solution for tokenizing camelCase words in Java? Examples of camelCase words are: getXmlRule, setTokenizeAnalyzer. They should be tokenized to get, Xml, Rule, set, Tokenize, Analyzer. Thank you very much!
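
For readers who only need the splitting itself, here is a minimal sketch in plain Java, assuming a regular expression on case boundaries is acceptable. It is not the WordDelimiterFilter mentioned in the replies and does not plug into a Lucene analyzer chain; the class name CamelCaseSplitter and the boundary pattern are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; this is not a Lucene or Solr class.
final class CamelCaseSplitter {

    // Zero-width split points:
    //   1. between a lowercase letter or digit and an uppercase letter (getXml -> get | Xml)
    //   2. between an acronym and the next capitalized word (XMLFile -> XML | File)
    private static final String BOUNDARY =
            "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    static List<String> split(String identifier) {
        if (identifier == null || identifier.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(identifier.split(BOUNDARY));
    }

    public static void main(String[] args) {
        System.out.println(split("getXmlRule"));          // prints [get, Xml, Rule]
        System.out.println(split("setTokenizeAnalyzer")); // prints [set, Tokenize, Analyzer]
    }
}
```

The split keeps the original case of each part. Inside a Lucene application the same boundaries would normally be applied in a token filter and followed by lowercasing, which this sketch leaves out.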

Re: Index searching problem

2010-01-27 Thread Simon Willnauer
On Wed, Jan 27, 2010 at 4:53 PM, Asif Nawaz wrote: > > > IndexSearcher is = new IndexSearcher("index"); IndexReader ir = > is.getIndexReader().open("index"); System.out.println("No of documents in > index = "+ir.numDocs()); > The last statement shows no of documents = 167. that means IndexReader i

Problem with „AND“ operator to search Chinese text

2010-01-27 Thread starz10de
Hello, I successfully implemented the Chinese analyzer (CJKAnalyzer) and can search Chinese text. However, I have a problem: when I use the Boolean operator AND, I always get 0 hits. Searching for the 2 Chinese terms without the “AND” operator is no problem. When I want to count only the

RE: Index searching problem

2010-01-27 Thread Asif Nawaz
IndexSearcher is = new IndexSearcher("index"); IndexReader ir = is.getIndexReader().open("index"); System.out.println("No of documents in index = "+ir.numDocs()); The last statement shows number of documents = 167, which means the IndexReader is reading from the index, which is open. I think the problem may

RE: Index searching problem

2010-01-27 Thread Asif Nawaz
I am confused about how to open the index in the demo example for hotel database searching, and where I should fit that code. In the SearchEngine.java file I opened the index this way: IndexSearcher is = new IndexSearcher(IndexReader.open("index")); but it's not working and still returns 0 hits :( > D

Re: file open handles?

2010-01-27 Thread Michael McCandless
On Wed, Jan 27, 2010 at 4:25 AM, Jamie wrote: > We got to the bottom of it. Thanks for bringing closure! > Turned out to be a status page that was opening > the reader to obtain docCount but not closing it. Thanks for your help! If you only need the docCount in the index, it's much faster to us

Re: Index searching problem

2010-01-27 Thread Ian Lea
Lots of other things to check are listed in the FAQ: http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F -- Ian. On Wed, Jan 27, 2010 at 11:47 AM, Simon Willnauer wrote: > Do you open the searcher / reader after you call commit on the writer? > > sim

Re: Index searching problem

2010-01-27 Thread Simon Willnauer
Do you open the searcher / reader after you call commit on the writer? simon On Wed, Jan 27, 2010 at 12:40 PM, Asif Nawaz wrote: > > ok. it works when i add commit and close indexes. when open the index file > with Luke, it shows me the list of documents that were matched. But in my > progr

RE: Index searching problem

2010-01-27 Thread Asif Nawaz
OK. It works when I add commit and close the indexes. When I open the index file with Luke, it shows me the list of documents that were matched. But in my program it returns number of hits = 0. Why? Hits hits = se.performSearch("significance"); System.out.println("hits length = "+ hits.length());

Re: Index searching problem

2010-01-27 Thread Simon Willnauer
Do you close your index writer or commit it before you open your searcher? One more thing: if you search for "Hotel" you might not find anything if the query string is not passed through the StandardAnalyzer you use for indexing (or another analyzer that does lowercasing). BTW, your email is

Index searching problem

2010-01-27 Thread Asif Nawaz
I built an index to store 100 docs, each with the fields author, title and abstract. for (i=0;i<100;i++) {writer = new IndexWriter("index",new StandardAnalyzer(),true,IndexWriter.MaxFieldLength.UNLIMITED); doc.add(new Field("author",cfcDoc.getAu(), Field.Store.YES, Field.Index.TOKENIZED));do

Re: file open handles?

2010-01-27 Thread Jamie
Hi Jake We got to the bottom of it. Turned out to be a status page that was opening the reader to obtain docCount but not closing it. Thanks for your help! Jamie On 2010/01/27 10:48 AM, Jamie wrote: Hi Jake Ok. The number of file handles left open is increasing rapidly. For instance, 4200 f

Re: file open handles?

2010-01-27 Thread Jamie
Hi Jake Ok. The number of file handles left open is increasing rapidly. For instance, 4200 file handles were left open by Lucene 2.9.1 over a period of 16 min. You can see in the attached snapshot a picture from JPicus showing the file handles that are left open. These index files are delet

Re: file open handles?

2010-01-27 Thread Jake Mannix
On Wed, Jan 27, 2010 at 12:17 AM, Jamie wrote: > Hi Jake > > > You were indexing but not searching? So you are never calling getReader() >> in the first place? >> >> > Of course, the call exists, it's just that during testing we did not execute > any searches at all. Oh! Re-reading your initi

Re: file open handles?

2010-01-27 Thread Jamie
Hi Jake You were indexing but not searching? So you are never calling getReader() in the first place? Of course, the call exists, it's just that during testing we did not execute any searches at all. How have you been doing search in a realtime fashion with Lucene before 2.9's introduction