Hi Erick,

a statement like "Adding &debug=all to the query will show you if this is the case" will not help a Lucene user, as that parameter is only available in the Solr server, and Andy uses Lucene directly. In his case he should use IndexSearcher's explain functionality to retrieve a structured breakdown of how each document is scored for this query, which helps with debugging:
http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/IndexSearcher.html#explain(org.apache.lucene.search.Query, int)

But yes, the length norm is encoded with loss of precision in Lucene (it is a float value encoded to only 1 byte). With Lucene 4 there are ways to change that behavior, but that involves changing the similarity implementation and using a different DocValues type for encoding the norms. In most cases this is not needed, because users won't notice.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, January 15, 2014 1:30 PM
> To: java-user
> Subject: Re: Length of the filed does not affect the doc score accurately for
> chinese analyzer(SmartChineseAnalyzer)
>
> The lengths of fields are encoded and lose some precision. So I suspect the
> lengths of the field calculated for the two documents are the same after
> encoding.
>
> Adding &debug=all to the query will show you if this is the case.
>
> Best
> Erick
>
> On Wed, Jan 15, 2014 at 3:39 AM, andy <yhl...@sohu.com> wrote:
> > Hi guys,
> >
> > As the topic says, it seems that the length of the field does not affect
> > the doc score accurately for the Chinese analyzer in my source code.
> >
> > Index source code:
> >
> > private static Directory DIRECTORY;
> >
> > @BeforeClass
> > public static void before() throws IOException {
> >     DIRECTORY = new RAMDirectory();
> >     Analyzer chineseanalyzer = new SmartChineseAnalyzer(Version.LUCENE_40);
> >     IndexWriterConfig indexWriterConfig =
> >         new IndexWriterConfig(Version.LUCENE_40, chineseanalyzer);
> >     FieldType nameType = new FieldType();
> >     nameType.setIndexed(true);
> >     nameType.setStored(true);
> >     nameType.setOmitNorms(false);
> >     try {
> >         IndexWriter indexWriter = new IndexWriter(DIRECTORY, indexWriterConfig);
> >
> >         List<String> nameList = new ArrayList<String>();
> >         nameList.add("咨询公司");
> >         nameList.add("飞鹰咨询管理咨询公司");
> >         nameList.add("北京中标咨询公司");
> >         nameList.add("重庆咨询公司");
> >         nameList.add("商务咨询服务公司");
> >         nameList.add("法律咨询公司");
> >         for (int i = 0; i < nameList.size(); i++) {
> >             Document document = new Document();
> >             document.add(new Field("name", nameList.get(i), nameType));
> >             document.add(new Field("id", String.valueOf(i + 1), nameType));
> >             indexWriter.addDocument(document);
> >         }
> >         indexWriter.commit();
> >     } catch (IOException e) {
> >         // TODO Auto-generated catch block
> >         e.printStackTrace();
> >     }
> > }
> >
> > Search snippet:
> >
> > @Test
> > public void testChinese() throws IOException, ParseException {
> >     String keyword = "咨询公司";
> >     System.out.println("Searching for:" + keyword);
> >     System.out.println();
> >     IndexReader indexReader = DirectoryReader.open(DIRECTORY);
> >     IndexSearcher indexSearcher = new IndexSearcher(indexReader);
> >     Query query = null;
> >     query = new QueryParser(Version.LUCENE_40, "name",
> >         new SmartChineseAnalyzer(Version.LUCENE_40)).parse(keyword);
> >     TopDocs topDocs = indexSearcher.search(query, 15);
> >     System.out.println("Search Result:");
> >     if (null != topDocs && 0 < topDocs.totalHits) {
> >         for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
> >             System.out.println("doc id:" + indexSearcher.doc(scoreDoc.doc).get("id"));
> >             String name = indexSearcher.doc(scoreDoc.doc).get("name");
> >             System.out.println("content of Field:" + name);
> >             dumpCNTokens(name);
> >             System.out.println("score:" + scoreDoc.score);
> >             System.out.println("-------------------------------------------");
> >         }
> >     } else {
> >         System.out.println("no results");
> >     }
> > }
> >
> > And the search result is as follows:
> >
> > Searching for:咨询公司
> >
> > Search Result:
> > doc id:1
> > content of Field:咨询公司
> > Terms:咨询 公司
> > score:0.74763227
> > -------------------------------------------
> > doc id:2
> > content of Field:飞鹰咨询管理咨询公司
> > Terms:飞鹰 咨询 管理 咨询 公司
> > score:0.6317303
> > -------------------------------------------
> > doc id:3
> > content of Field:北京中标咨询公司
> > Terms:北京 中标 咨询 公司
> > score:0.5981058
> > -------------------------------------------
> > doc id:4
> > content of Field:重庆咨询公司
> > Terms:重庆 咨询 公司
> > score:0.5981058
> > -------------------------------------------
> > doc id:5
> > content of Field:商务咨询服务公司
> > Terms:商务 咨询 服务 公司
> > score:0.5981058
> > -------------------------------------------
> > doc id:6
> > content of Field:法律咨询公司
> > Terms:法律 咨询 公司
> > score:0.5981058
> > -------------------------------------------
> >
> > Docs 3, 4, 5 and 6 have the same score, but I think docs 4 and 6
> > should have a higher score than docs 3 and 5, because docs 4 and 6
> > have three terms while docs 3 and 5 have four terms.
> > Am I right? Can anyone give me an explanation? And how do I get the
> > expected result?
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Length-of-the-filed-does-not-affect-the-doc-score-accurately-for-chinese-analyzer-SmartChineseAnalyz-tp4111390.html
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
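
To illustrate the precision loss in the length norm discussed in this thread, here is a minimal, self-contained sketch. It assumes that the default similarity computes the length norm as 1/sqrt(number of terms) and stores it with the one-byte float encoding of Lucene's SmallFloat.floatToByte315. The class name NormPrecisionDemo and the methods encode, decode and storedNorm are made up for this illustration and are not Lucene API. The encode/decode bodies are re-typed here so that the example runs without Lucene on the classpath, so treat them as an approximation of the library code rather than a copy.

```java
final class NormPrecisionDemo {

    // Assumption: mirrors the one-byte float encoding that Lucene 4's
    // default similarity uses for norms (SmallFloat.floatToByte315).
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - 3);              // drop the low mantissa bits
        if (small <= ((63 - 15) << 3)) {
            // zero/negative map to 0, tiny positive values map to 1
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ((63 - 15) << 3) + 0x100) {
            return -1;                             // overflow: largest encodable value
        }
        return (byte) (small - ((63 - 15) << 3));
    }

    // Inverse of encode(): expands the byte back into a float.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Assumption: length norm = 1/sqrt(numTerms) with no index-time boost.
    // Returns the value that survives the encode/decode round trip.
    static float storedNorm(int numTerms) {
        float exact = (float) (1.0 / Math.sqrt(numTerms));
        return decode(encode(exact));
    }

    public static void main(String[] args) {
        // Field lengths that occur in Andy's six documents: 2 to 5 terms.
        for (int n = 2; n <= 5; n++) {
            System.out.println(n + " terms: exact=" + (1.0 / Math.sqrt(n))
                    + " stored=" + storedNorm(n));
        }
    }
}
```

Under these assumptions, fields with 3 and 4 terms end up with the same stored norm, which would explain why docs 3, 4, 5 and 6 score identically. The output of IndexSearcher.explain for the real index remains the authoritative check.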