If you read the explain output, you can see where the scores differ. One difference with a noticeable effect is:

1.0 = tf(termFreq(searchText:fred)=1)
0.5 = fieldNorm(field=searchText, doc=1)

vs.

1.4142135 = tf(termFreq(searchText:fred)=2)
0.375 = fieldNorm(field=searchText, doc=0)

As predicted, the term frequencies and norms are affecting the scoring.
Try omitting norms on the field and try your query again.

field.setOmitNorms(true)
or
Field.Index.ANALYZED_NO_NORMS

Cheers,

Ivan

On Wed, May 16, 2012 at 1:54 PM, Meeraj Kunnumpurath
<meeraj.kunnumpur...@asyska.com> wrote:
> Also, if I do the below
>
> Query q = new QueryParser(Version.LUCENE_35, "searchText",
> analyzer).parse("Takeaway f...@company.com^100")
>
> I get them in reverse order. Do I need to boost the term, even if it
> appears more than once in the document?
>
> Regards
> Meeraj
>
> On Wed, May 16, 2012 at 9:52 PM, Meeraj Kunnumpurath <
> meeraj.kunnumpur...@asyska.com> wrote:
>
>> This is the output I get from explaining the plan ..
>>
>>
>> Found 2 hits.
>> 1. XYZ Takeaway f...@company.com
>> 0.5148823 = (MATCH) sum of:
>>   0.17162743 = (MATCH) weight(searchText:takeaway in 1), product of:
>>     0.57735026 = queryWeight(searchText:takeaway), product of:
>>       0.5945349 = idf(docFreq=2, maxDocs=2)
>>       0.97109574 = queryNorm
>>     0.29726744 = (MATCH) fieldWeight(searchText:takeaway in 1), product of:
>>       1.0 = tf(termFreq(searchText:takeaway)=1)
>>       0.5945349 = idf(docFreq=2, maxDocs=2)
>>       0.5 = fieldNorm(field=searchText, doc=1)
>>   0.34325486 = (MATCH) sum of:
>>     0.17162743 = (MATCH) weight(searchText:fred in 1), product of:
>>       0.57735026 = queryWeight(searchText:fred), product of:
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.97109574 = queryNorm
>>       0.29726744 = (MATCH) fieldWeight(searchText:fred in 1), product of:
>>         1.0 = tf(termFreq(searchText:fred)=1)
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.5 = fieldNorm(field=searchText, doc=1)
>>     0.17162743 = (MATCH) weight(searchText:company.com in 1), product of:
>>       0.57735026 = queryWeight(searchText:company.com), product of:
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.97109574 = queryNorm
>>       0.29726744 = (MATCH) fieldWeight(searchText:company.com in 1), product of:
>>         1.0 = tf(termFreq(searchText:company.com)=1)
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.5 = fieldNorm(field=searchText, doc=1)
>>
>>
>> 2. ABC Takeaway f...@company.com f...@company.com
>> 0.49279732 = (MATCH) sum of:
>>   0.12872057 = (MATCH) weight(searchText:takeaway in 0), product of:
>>     0.57735026 = queryWeight(searchText:takeaway), product of:
>>       0.5945349 = idf(docFreq=2, maxDocs=2)
>>       0.97109574 = queryNorm
>>     0.22295058 = (MATCH) fieldWeight(searchText:takeaway in 0), product of:
>>       1.0 = tf(termFreq(searchText:takeaway)=1)
>>       0.5945349 = idf(docFreq=2, maxDocs=2)
>>       0.375 = fieldNorm(field=searchText, doc=0)
>>   0.36407676 = (MATCH) sum of:
>>     0.18203838 = (MATCH) weight(searchText:fred in 0), product of:
>>       0.57735026 = queryWeight(searchText:fred), product of:
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.97109574 = queryNorm
>>       0.31529972 = (MATCH) fieldWeight(searchText:fred in 0), product of:
>>         1.4142135 = tf(termFreq(searchText:fred)=2)
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.375 = fieldNorm(field=searchText, doc=0)
>>     0.18203838 = (MATCH) weight(searchText:company.com in 0), product of:
>>       0.57735026 = queryWeight(searchText:company.com), product of:
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.97109574 = queryNorm
>>       0.31529972 = (MATCH) fieldWeight(searchText:company.com in 0), product of:
>>         1.4142135 = tf(termFreq(searchText:company.com)=2)
>>         0.5945349 = idf(docFreq=2, maxDocs=2)
>>         0.375 = fieldNorm(field=searchText, doc=0)
>>
>>
>> On Wed, May 16, 2012 at 9:50 PM, Meeraj Kunnumpurath <
>> meeraj.kunnumpur...@asyska.com> wrote:
>>
>>> The actual query is
>>>
>>> Query q = new QueryParser(Version.LUCENE_35, "searchText",
>>> analyzer).parse("Takeaway f...@company.com");
>>>
>>> If I use
>>>
>>> Query q = new QueryParser(Version.LUCENE_35, "searchText",
>>> analyzer).parse("f...@company.com");
>>>
>>> I get them in the reverse order.
>>>
>>> Regards
>>> Meeraj
>>>
>>>
>>> On Wed, May 16, 2012 at 9:48 PM, Meeraj Kunnumpurath <
>>> meeraj.kunnumpur...@asyska.com> wrote:
>>>
>>>> I have tried the same using Lucene directly with the following code,
>>>>
>>>> import org.apache.lucene.store.RAMDirectory;
>>>> import org.apache.lucene.document.Document;
>>>> import org.apache.lucene.document.Field;
>>>> import org.apache.lucene.index.IndexWriterConfig;
>>>> import org.apache.lucene.util.Version;
>>>> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>>> import org.apache.lucene.index.IndexWriter;
>>>> import org.apache.lucene.queryParser.QueryParser;
>>>> import org.apache.lucene.index.IndexReader;
>>>> import org.apache.lucene.search.IndexSearcher;
>>>> import org.apache.lucene.search.Query;
>>>> import org.apache.lucene.search.TopScoreDocCollector;
>>>> import org.apache.lucene.search.ScoreDoc;
>>>>
>>>> public class LuceneTest {
>>>>
>>>>     public static void main(String[] args) throws Exception {
>>>>
>>>>         StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
>>>>         RAMDirectory index = new RAMDirectory();
>>>>         IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
>>>>         IndexWriter indexWriter = new IndexWriter(index, config);
>>>>
>>>>         Document doc1 = new Document();
>>>>         doc1.add(new Field("searchText", "ABC Takeaway f...@company.com f...@company.com", Field.Store.YES, Field.Index.ANALYZED));
>>>>         Document doc2 = new Document();
>>>>         doc2.add(new Field("searchText", "XYZ Takeaway f...@company.com", Field.Store.YES, Field.Index.ANALYZED));
>>>>
>>>>         indexWriter.addDocument(doc1);
>>>>         indexWriter.addDocument(doc2);
>>>>         indexWriter.close();
>>>>
>>>>         Query q = new QueryParser(Version.LUCENE_35, "searchText", analyzer).parse("Takeaway");
>>>>
>>>>         int hitsPerPage = 10;
>>>>         IndexReader reader = IndexReader.open(index);
>>>>         IndexSearcher searcher = new IndexSearcher(reader);
>>>>         TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
>>>>         searcher.search(q, collector);
>>>>         ScoreDoc[] hits = collector.topDocs().scoreDocs;
>>>>
>>>>         System.out.println("Found " + hits.length + " hits.");
>>>>         for(int i=0;i<hits.length;++i) {
>>>>             int docId = hits[i].doc;
>>>>             Document d = searcher.doc(docId);
>>>>             System.out.println((i + 1) + ". " + d.get("searchText"));
>>>>         }
>>>>
>>>>     }
>>>>
>>>> }
>>>>
>>>> The output is ..
>>>>
>>>> Found 2 hits.
>>>> 1. XYZ Takeaway f...@company.com
>>>> 2. ABC Takeaway f...@company.com f...@company.com
>>>>
>>>>
>>>> On Wed, May 16, 2012 at 9:21 PM, Meeraj Kunnumpurath <
>>>> meeraj.kunnumpur...@asyska.com> wrote:
>>>>
>>>>> Thanks Ivan.
>>>>>
>>>>> I don't use Lucene directly, it is used behind the scene by the Neo4J
>>>>> graph database for full-text indexing. According to their documentation
>>>>> for full text indexes they use white space tokenizer in the analyser.
>>>>> Yes, I do get Listing 2 first now. Though if I exclude the term
>>>>> "Takeaway" from the search string, and just put "f...@company.com", I
>>>>> get Listing 1 first.
>>>>>
>>>>> Regards
>>>>> Meeraj
>>>>>
>>>>>
>>>>> On Wed, May 16, 2012 at 8:49 PM, Ivan Brusic <i...@brusic.com> wrote:
>>>>>
>>>>>> Use the explain function to understand why the query is producing the
>>>>>> results you see.
>>>>>>
>>>>>> http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Searcher.html#explain(org.apache.lucene.search.Query, int)
>>>>>>
>>>>>> Does your current query return Listing 2 first? That might be because
>>>>>> of term frequencies. Which analyzers are you using?
>>>>>>
>>>>>> http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e63
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Ivan
>>>>>>
>>>>>> On Wed, May 16, 2012 at 12:41 PM, Meeraj Kunnumpurath
>>>>>> <meeraj.kunnumpur...@asyska.com> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > I am quite new to Lucene. I am trying to use it to index listings of
>>>>>> > local businesses. The index has only one field, that stores the
>>>>>> > attributes of a listing as well as email addresses of users who have
>>>>>> > rated that business.
>>>>>> >
>>>>>> > For example,
>>>>>> >
>>>>>> > Listing 1: "XYZ Takeaway London f...@company.com bar...@company.com
>>>>>> > f...@company.com"
>>>>>> > Listing 2: "ABC Takeaway London f...@company.com bar...@company.com"
>>>>>> >
>>>>>> > Now when the user does a search with "Takeaway f...@company.com", how
>>>>>> > do I get listing 1 to always come before listing 2, because it has the
>>>>>> > term f...@company.com appear twice where as listing 2 has it only once?
>>>>>> >
>>>>>> > Regards
>>>>>> > Meeraj
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
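
For anyone who wants to check the arithmetic behind the advice to omit norms, here is a rough sketch in plain Java that recomputes the fieldWeight lines from the explain output above (fieldWeight = tf * idf * fieldNorm). The class and method names (ScoreSketch, tf, fieldWeight, sum) are made up for illustration and are not part of Lucene. It assumes tf is the square root of the term frequency, which matches the 1.4142135 reported for termFreq=2, and it assumes an omitted norm behaves like a fieldNorm of 1.0. It leaves out queryWeight, which is the same 0.57735026 for all three terms here, so it should not change the ordering.

```java
public class ScoreSketch {

    // Term frequency factor, assumed to be sqrt(freq) as the explain output suggests.
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // fieldWeight as shown in the explain output: tf * idf * fieldNorm.
    public static double fieldWeight(int freq, double idf, double fieldNorm) {
        return tf(freq) * idf * fieldNorm;
    }

    // Sum of the per-term field weights for one document.
    public static double sum(int[] freqs, double idf, double fieldNorm) {
        double total = 0.0;
        for (int freq : freqs) {
            total += fieldWeight(freq, idf, fieldNorm);
        }
        return total;
    }

    public static void main(String[] args) {
        double idf = 0.5945349;   // idf(docFreq=2, maxDocs=2) from the explain output
        int[] xyz = {1, 1, 1};    // termFreq of takeaway, fred, company.com in doc=1
        int[] abc = {1, 2, 2};    // termFreq of the same terms in doc=0

        // With the norms from the explain output (0.5 and 0.375).
        System.out.println("with norms:    XYZ=" + sum(xyz, idf, 0.5)
                + " ABC=" + sum(abc, idf, 0.375));

        // With norms omitted, assumed here to mean fieldNorm = 1.0 for both documents.
        System.out.println("norms omitted: XYZ=" + sum(xyz, idf, 1.0)
                + " ABC=" + sum(abc, idf, 1.0));
    }
}
```

With the norms in place, the shorter XYZ document comes out ahead even though ABC contains the address twice. With the norm fixed at 1.0, the higher term frequency in ABC decides the order.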