Hello, I created an index over some XML files; each document contains 10 keyword fields and one text field.

I created a TermQuery for searching the text field, like this:

    Dim textQuery As TermQuery = New TermQuery(New Term("content", "school"))

I also created a query over the 10 keyword fields using QueryParser.Parse, like this:

    analyzer = New StandardAnalyzer()
    allkeywordsQuery = QueryParser.Parse("Secondname:Beckwith AND Firstname:Louise", "Firstname", analyzer)

I am trying to use a BooleanQuery so that the user can search the XML data with textQuery and filter the results with allkeywordsQuery, like this:

    myBooleanQuery.Add(allkeywordsQuery, True, False)
    myBooleanQuery.Add(textQuery, True, False)
    Dim hits As Hits = searcher.Search(myBooleanQuery)

The search should return the XML file that contains

    <Secondname>Beckwith</Secondname>
    <Firstname>Louise</Firstname>
    <content>...He used to pick us up from the first school...</content>
    ...

but I always get 0 hits. Please tell me where I went wrong, or suggest a better way to search and filter XML data.

I also tried the query "Secondname:Beckwith AND Firstname:Louise AND content:school" in Luke: with WhitespaceAnalyzer I get hits, but with StandardAnalyzer I get nothing.
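For reference, here is a consolidated, minimal version of what I am running, using the same static QueryParser.Parse and three-argument BooleanQuery.Add overloads as above. The index path and the result loop are illustrative:

    Imports Lucene.Net.Analysis.Standard
    Imports Lucene.Net.Documents
    Imports Lucene.Net.Index
    Imports Lucene.Net.QueryParsers
    Imports Lucene.Net.Search

    Module SearchXmlIndex
        Sub Main()
            ' Open the existing index (the path is illustrative).
            Dim searcher As New IndexSearcher("C:\myindex")
            Dim analyzer As New StandardAnalyzer()

            ' Exact term against the tokenized text field.
            Dim textQuery As New TermQuery(New Term("content", "school"))

            ' Parsed query over the keyword fields. QueryParser runs the
            ' analyzer over the query text, so the analyzer choice changes
            ' the terms actually searched, which is presumably why Luke
            ' behaves differently with WhitespaceAnalyzer and StandardAnalyzer.
            Dim allkeywordsQuery As Query = QueryParser.Parse( _
                "Secondname:Beckwith AND Firstname:Louise", "Firstname", analyzer)

            ' Both clauses required (required=True, prohibited=False).
            Dim myBooleanQuery As New BooleanQuery()
            myBooleanQuery.Add(allkeywordsQuery, True, False)
            myBooleanQuery.Add(textQuery, True, False)

            Dim hits As Hits = searcher.Search(myBooleanQuery)
            Console.WriteLine("Total hits: " & hits.Length())
            For i As Integer = 0 To hits.Length() - 1
                Dim doc As Document = hits.Doc(i)
                Console.WriteLine(doc.Get("Firstname") & " " & doc.Get("Secondname"))
            Next
            searcher.Close()
        End Sub
    End Module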
Thanks,
Yj

-----Original Message-----
From: Michael D. Curtin [mailto:[EMAIL PROTECTED]
Sent: 22 January 2007 13:53
To: java-user@lucene.apache.org
Subject: Re: Long Query Performance

Somnath Banerjee wrote:

> Thanks for the reply. Good guess, I think.
>
> The DB (index) is basically a collection of encyclopedia documents. The
> queries are also a collection of documents, but from various domains. My
> task is to find, for each "query document", the top 100 matching
> encyclopedia articles.
>
> I tried using only the titles (5-8 words) of the query documents instead
> of their full text, but that still takes 0.5-1 sec per query. That means
> it would take nearly six and a half days to run the 0.72M queries (and
> the precision would presumably suffer).

Thank you, the problem is a little clearer now.

This is a big search problem, so your biggest obstacle is running it on a tiny computer (from what you've said, only a fraction of the *queries* fit in your RAM budget, not to mention the database). I'm confident that spending a little $$ on your computing resources, particularly RAM, would be FAR, FAR more cost-effective than programming labor.

But if you can't get a bigger computer, then you can't. In that case, I'm not sure that Lucene is the best tool for this problem, at least not exclusively. You might find that some preprocessing of the encyclopedia and query documents could be used to quickly find the highest-probability candidates, and then use Lucene to score just those.

I'm thinking of a merge-sort kind of algorithm between the query "documents" and the encyclopedia documents. For each query, find the top several hundred candidate documents from the encyclopedia, perhaps by number of matching non-"stop words" (see below). You could also get more sophisticated by looking at word frequencies from your Lucene index, but I doubt it would make a huge difference for queries of hundreds of words that are looking for hundreds of hits. A rough sketch of this candidate-counting step appears at the end of this message.

A couple more suggestions:

- Search the archives for the topic "more documents like this" and variations on that theme. Several people have used Lucene in this way, with varying degrees of success.

- If you haven't already, ditch "stop words" like "a", "the", prepositions, etc. Your index will be smaller and so will your queries, making each search faster.

Good luck!
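To make the pre-filtering idea concrete, here is a rough sketch of the counting step, assuming you have already built a word-to-document map and removed stop words from the query in a preprocessing pass. All of the names here (TopCandidates, wordToDocIds, and so on) are illustrative, and none of this is Lucene API:

    Imports System.Collections.Generic

    Module CandidatePreFilter

        ' Rank encyclopedia documents by how many distinct non-stop query
        ' words they contain, and keep only the best few hundred for Lucene
        ' to score. queryWords is assumed to be de-duplicated already, and
        ' wordToDocIds maps each non-stop word to the ids of the encyclopedia
        ' documents containing it.
        Function TopCandidates(ByVal queryWords As IEnumerable(Of String), _
                               ByVal wordToDocIds As Dictionary(Of String, List(Of Integer)), _
                               ByVal maxCandidates As Integer) As List(Of Integer)

            Dim matchCounts As New Dictionary(Of Integer, Integer)()
            For Each word As String In queryWords
                Dim docIds As List(Of Integer) = Nothing
                If wordToDocIds.TryGetValue(word, docIds) Then
                    For Each docId As Integer In docIds
                        Dim count As Integer = 0
                        matchCounts.TryGetValue(docId, count)
                        matchCounts(docId) = count + 1
                    Next
                End If
            Next

            ' Sort candidate ids by descending shared-word count, then keep
            ' only the top slice.
            Dim ranked As New List(Of Integer)(matchCounts.Keys)
            ranked.Sort(Function(a, b) matchCounts(b).CompareTo(matchCounts(a)))
            If ranked.Count > maxCandidates Then
                ranked.RemoveRange(maxCandidates, ranked.Count - maxCandidates)
            End If
            Return ranked
        End Function

    End Module

The survivors of this pass would then be handed to Lucene for real scoring.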
--MDC

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------