Also, if I run the query below:

Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("Takeaway f...@company.com^100");

I get them in reverse order. Do I need to boost the term, even if it appears more than once in the document?

Regards
Meeraj

On Wed, May 16, 2012 at 9:52 PM, Meeraj Kunnumpurath <meeraj.kunnumpur...@asyska.com> wrote:

> This is the output I get from explaining the plan ..
>
> Found 2 hits.
> 1. XYZ Takeaway f...@company.com
> 0.5148823 = (MATCH) sum of:
>   0.17162743 = (MATCH) weight(searchText:takeaway in 1), product of:
>     0.57735026 = queryWeight(searchText:takeaway), product of:
>       0.5945349 = idf(docFreq=2, maxDocs=2)
>       0.97109574 = queryNorm
>     0.29726744 = (MATCH) fieldWeight(searchText:takeaway in 1), product of:
>       1.0 = tf(termFreq(searchText:takeaway)=1)
>       0.5945349 = idf(docFreq=2, maxDocs=2)
>       0.5 = fieldNorm(field=searchText, doc=1)
>   0.34325486 = (MATCH) sum of:
>     0.17162743 = (MATCH) weight(searchText:fred in 1), product of:
>       0.57735026 = queryWeight(searchText:fred), product of:
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.97109574 = queryNorm
>       0.29726744 = (MATCH) fieldWeight(searchText:fred in 1), product of:
>         1.0 = tf(termFreq(searchText:fred)=1)
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.5 = fieldNorm(field=searchText, doc=1)
>     0.17162743 = (MATCH) weight(searchText:company.com in 1), product of:
>       0.57735026 = queryWeight(searchText:company.com), product of:
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.97109574 = queryNorm
>       0.29726744 = (MATCH) fieldWeight(searchText:company.com in 1), product of:
>         1.0 = tf(termFreq(searchText:company.com)=1)
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.5 = fieldNorm(field=searchText, doc=1)
>
> 2. ABC Takeaway f...@company.com f...@company.com
> 0.49279732 = (MATCH) sum of:
>   0.12872057 = (MATCH) weight(searchText:takeaway in 0), product of:
>     0.57735026 = queryWeight(searchText:takeaway), product of:
>       0.5945349 = idf(docFreq=2, maxDocs=2)
>       0.97109574 = queryNorm
>     0.22295058 = (MATCH) fieldWeight(searchText:takeaway in 0), product of:
>       1.0 = tf(termFreq(searchText:takeaway)=1)
>       0.5945349 = idf(docFreq=2, maxDocs=2)
>       0.375 = fieldNorm(field=searchText, doc=0)
>   0.36407676 = (MATCH) sum of:
>     0.18203838 = (MATCH) weight(searchText:fred in 0), product of:
>       0.57735026 = queryWeight(searchText:fred), product of:
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.97109574 = queryNorm
>       0.31529972 = (MATCH) fieldWeight(searchText:fred in 0), product of:
>         1.4142135 = tf(termFreq(searchText:fred)=2)
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.375 = fieldNorm(field=searchText, doc=0)
>     0.18203838 = (MATCH) weight(searchText:company.com in 0), product of:
>       0.57735026 = queryWeight(searchText:company.com), product of:
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.97109574 = queryNorm
>       0.31529972 = (MATCH) fieldWeight(searchText:company.com in 0), product of:
>         1.4142135 = tf(termFreq(searchText:company.com)=2)
>         0.5945349 = idf(docFreq=2, maxDocs=2)
>         0.375 = fieldNorm(field=searchText, doc=0)
>
> On Wed, May 16, 2012 at 9:50 PM, Meeraj Kunnumpurath <meeraj.kunnumpur...@asyska.com> wrote:
>
>> The actual query is
>>
>> Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("Takeaway f...@company.com");
>>
>> If I use
>>
>> Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("f...@company.com");
>>
>> I get them in the reverse order.
>>
>> Regards
>> Meeraj
>>
>> On Wed, May 16, 2012 at 9:48 PM, Meeraj Kunnumpurath <meeraj.kunnumpur...@asyska.com> wrote:
>>
>>> I have tried the same using Lucene directly with the following code,
>>>
>>> import org.apache.lucene.store.RAMDirectory;
>>> import org.apache.lucene.document.Document;
>>> import org.apache.lucene.document.Field;
>>> import org.apache.lucene.index.IndexWriterConfig;
>>> import org.apache.lucene.util.Version;
>>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>> import org.apache.lucene.index.IndexWriter;
>>> import org.apache.lucene.queryParser.QueryParser;
>>> import org.apache.lucene.index.IndexReader;
>>> import org.apache.lucene.search.IndexSearcher;
>>> import org.apache.lucene.search.Query;
>>> import org.apache.lucene.search.TopScoreDocCollector;
>>> import org.apache.lucene.search.ScoreDoc;
>>>
>>> public class LuceneTest {
>>>
>>>     public static void main(String[] args) throws Exception {
>>>
>>>         StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
>>>         RAMDirectory index = new RAMDirectory();
>>>         IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
>>>         IndexWriter indexWriter = new IndexWriter(index, config);
>>>
>>>         Document doc1 = new Document();
>>>         doc1.add(new Field("searchText", "ABC Takeaway f...@company.com f...@company.com", Field.Store.YES, Field.Index.ANALYZED));
>>>         Document doc2 = new Document();
>>>         doc2.add(new Field("searchText", "XYZ Takeaway f...@company.com", Field.Store.YES, Field.Index.ANALYZED));
>>>
>>>         indexWriter.addDocument(doc1);
>>>         indexWriter.addDocument(doc2);
>>>         indexWriter.close();
>>>
>>>         Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("Takeaway");
>>>
>>>         int hitsPerPage = 10;
>>>         IndexReader reader = IndexReader.open(index);
>>>         IndexSearcher searcher = new IndexSearcher(reader);
>>>         TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
>>>         searcher.search(q, collector);
>>>         ScoreDoc[] hits = collector.topDocs().scoreDocs;
>>>
>>>         System.out.println("Found " + hits.length + " hits.");
>>>         for (int i = 0; i < hits.length; ++i) {
>>>             int docId = hits[i].doc;
>>>             Document d = searcher.doc(docId);
>>>             System.out.println((i + 1) + ". " + d.get("searchText"));
>>>         }
>>>
>>>     }
>>>
>>> }
>>>
>>> The output is ..
>>>
>>> Found 2 hits.
>>> 1. XYZ Takeaway f...@company.com
>>> 2. ABC Takeaway f...@company.com f...@company.com
>>>
>>> On Wed, May 16, 2012 at 9:21 PM, Meeraj Kunnumpurath <meeraj.kunnumpur...@asyska.com> wrote:
>>>
>>>> Thanks Ivan.
>>>>
>>>> I don't use Lucene directly, it is used behind the scene by the Neo4J
>>>> graph database for full-text indexing. According to their documentation for
>>>> full text indexes they use white space tokenizer in the analyser. Yes, I do
>>>> get Listing 2 first now. Though if I exclude the term "Takeaway" from the
>>>> search string, and just put "f...@company.com", I get Listing 1 first.
>>>>
>>>> Regards
>>>> Meeraj
>>>>
>>>> On Wed, May 16, 2012 at 8:49 PM, Ivan Brusic <i...@brusic.com> wrote:
>>>>
>>>>> Use the explain function to understand why the query is producing the
>>>>> results you see.
>>>>>
>>>>> http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Searcher.html#explain(org.apache.lucene.search.Query, int)
>>>>>
>>>>> Does your current query return Listing 2 first? That might be because
>>>>> of term frequencies. Which analyzers are you using?
>>>>>
>>>>> http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e63
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Ivan
>>>>>
>>>>> On Wed, May 16, 2012 at 12:41 PM, Meeraj Kunnumpurath
>>>>> <meeraj.kunnumpur...@asyska.com> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > I am quite new to Lucene. I am trying to use it to index listings of local
>>>>> > businesses.
>>>>> > The index has only one field, that stores the attributes of a
>>>>> > listing as well as email addresses of users who have rated that business.
>>>>> >
>>>>> > For example,
>>>>> >
>>>>> > Listing 1: "XYZ Takeaway London f...@company.com bar...@company.com f...@company.com"
>>>>> > Listing 2: "ABC Takeaway London f...@company.com bar...@company.com"
>>>>> >
>>>>> > Now when the user does a search with "Takeaway f...@company.com", how do I
>>>>> > get listing 1 to always come before listing 2, because it has the term
>>>>> > f...@company.com appear twice where as listing 2 has it only once?
>>>>> >
>>>>> > Regards
>>>>> > Meeraj
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
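
For anyone reading the explain output above, here is a rough sketch that redoes its arithmetic in plain Java, without Lucene on the classpath. It assumes the factors are combined the way the output prints them: tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1)) (which gives the 0.5945349 shown), and queryNorm = 1 / sqrt(sum of squared idf values). The fieldNorm values (0.5 and 0.375) are copied from the output rather than recomputed. The class and method names (ScoreSketch, score, and so on) are made up for this illustration and are not part of Lucene.

```java
public class ScoreSketch {

    // idf as it appears in the explain output: 1 + ln(maxDocs / (docFreq + 1)).
    public static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // tf as it appears in the explain output: sqrt(termFreq).
    public static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    // Score of one document for a query whose terms all have the same idf.
    // termFreqs[i] is how often query term i occurs in the document.
    // fieldNorm is taken from the explain output (assumption: not recomputed here).
    public static double score(int[] termFreqs, double fieldNorm, int docFreq, int maxDocs) {
        double idf = idf(docFreq, maxDocs);
        // One idf^2 per query term goes into the sum of squared weights.
        double queryNorm = 1.0 / Math.sqrt(termFreqs.length * idf * idf);
        double queryWeight = idf * queryNorm;
        double total = 0.0;
        for (int freq : termFreqs) {
            double fieldWeight = tf(freq) * idf * fieldNorm;
            total += queryWeight * fieldWeight;
        }
        return total;
    }

    public static void main(String[] args) {
        // Query "Takeaway f...@company.com" analyzes to: takeaway, fred, company.com
        System.out.println("XYZ (doc=1): " + score(new int[] {1, 1, 1}, 0.5, 2, 2));
        System.out.println("ABC (doc=0): " + score(new int[] {1, 2, 2}, 0.375, 2, 2));

        // Query "f...@company.com" alone analyzes to: fred, company.com
        System.out.println("XYZ, email only: " + score(new int[] {1, 1}, 0.5, 2, 2));
        System.out.println("ABC, email only: " + score(new int[] {2, 2}, 0.375, 2, 2));
    }
}
```

With these inputs the first two results come out at roughly 0.5149 and 0.4928, matching the explain output. The longer ABC document gets a smaller fieldNorm (0.375 against 0.5), and for the three-term query that penalty outweighs the sqrt(2) gain from the repeated address. For the email-only query the same arithmetic puts ABC slightly ahead, which seems consistent with the order flipping when "Takeaway" is dropped.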