the lengths of fields are encoded and lose some precision, so I suspect the field lengths calculated for the two documents are the same after encoding.
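A minimal sketch of that precision loss, assuming the default similarity in Lucene 4.0 computes the length norm as 1/sqrt(number of terms) with a boost of 1.0 and stores it in a single byte the way SmallFloat.floatToByte315 does. The class and method names below are hypothetical, and the bit arithmetic is a reconstruction rather than code taken from this thread, so treat it as an approximation:

```java
// Illustrative only: models the one-byte norm encoding assumed to be used by
// Lucene 4.0's default similarity. Class and method names are hypothetical.
class NormEncodingDemo {

    // Length norm before encoding, assuming a field boost of 1.0.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Squeeze a float into one byte by dropping the low 21 bits of its
    // IEEE-754 representation and rebasing the exponent.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        int zero = (63 - 15) << 3;
        if (smallfloat <= zero) {
            // Underflow: zero or negative maps to 0, tiny positives map to 1.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= zero + 0x100) {
            // Overflow: clamp to the largest representable value.
            return -1;
        }
        return (byte) (smallfloat - zero);
    }

    // Expand the stored byte back into a float; the dropped bits come back as zeros.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // The norm as it would be seen at search time for a field with numTerms terms.
    static float decodedNormFor(int numTerms) {
        return decode(encode(lengthNorm(numTerms)));
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 5; n++) {
            System.out.println(n + " terms: exact=" + lengthNorm(n)
                    + " stored=" + decodedNormFor(n));
        }
    }
}
```

Under these assumptions, fields with three and four terms both store 0.5, which would explain docs 3 to 6 tying. Two terms store 0.625, and the reported scores 0.74763227 and 0.5981058 differ by about that same factor of 1.25.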
Adding &debug=all to the query will show you if this is the case.

Best
Erick

On Wed, Jan 15, 2014 at 3:39 AM, andy <yhl...@sohu.com> wrote:
> Hi guys,
>
> As the subject says, it seems that the length of the field does not affect
> the doc score accurately for the Chinese analyzer in my source code.
>
> Index source code:
>
> private static Directory DIRECTORY;
>
> @BeforeClass
> public static void before() throws IOException {
>     DIRECTORY = new RAMDirectory();
>     Analyzer chineseanalyzer = new SmartChineseAnalyzer(Version.LUCENE_40);
>     IndexWriterConfig indexWriterConfig =
>             new IndexWriterConfig(Version.LUCENE_40, chineseanalyzer);
>     // Indexed and stored, with norms kept so field length can affect scoring.
>     FieldType nameType = new FieldType();
>     nameType.setIndexed(true);
>     nameType.setStored(true);
>     nameType.setOmitNorms(false);
>     try {
>         IndexWriter indexWriter = new IndexWriter(DIRECTORY, indexWriterConfig);
>
>         List<String> nameList = new ArrayList<String>();
>         nameList.add("咨询公司");
>         nameList.add("飞鹰咨询管理咨询公司");
>         nameList.add("北京中标咨询公司");
>         nameList.add("重庆咨询公司");
>         nameList.add("商务咨询服务公司");
>         nameList.add("法律咨询公司");
>         for (int i = 0; i < nameList.size(); i++) {
>             Document document = new Document();
>             document.add(new Field("name", nameList.get(i), nameType));
>             document.add(new Field("id", String.valueOf(i + 1), nameType));
>             indexWriter.addDocument(document);
>         }
>         indexWriter.commit();
>     } catch (IOException e) {
>         // TODO Auto-generated catch block
>         e.printStackTrace();
>     }
> }
>
> Search snippet:
>
> @Test
> public void testChinese() throws IOException, ParseException {
>     String keyword = "咨询公司";
>     System.out.println("Searching for:" + keyword);
>     System.out.println();
>     IndexReader indexReader = DirectoryReader.open(DIRECTORY);
>     IndexSearcher indexSearcher = new IndexSearcher(indexReader);
>     Query query = new QueryParser(Version.LUCENE_40, "name",
>             new SmartChineseAnalyzer(Version.LUCENE_40)).parse(keyword);
>     TopDocs topDocs = indexSearcher.search(query, 15);
>     System.out.println("Search Result:");
>     if (null != topDocs && 0 < topDocs.totalHits) {
>         for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
>             System.out.println("doc id:" + indexSearcher.doc(scoreDoc.doc).get("id"));
>             String name = indexSearcher.doc(scoreDoc.doc).get("name");
>             System.out.println("content of Field:" + name);
>             dumpCNTokens(name);
>             System.out.println("score:" + scoreDoc.score);
>             System.out.println("-------------------------------------------");
>         }
>     } else {
>         System.out.println("no results");
>     }
> }
>
> And the search result is as follows:
>
> Searching for:咨询公司
>
> Search Result:
> doc id:1
> content of Field:咨询公司
> Terms:咨询 公司
> score:0.74763227
> -------------------------------------------
> doc id:2
> content of Field:飞鹰咨询管理咨询公司
> Terms:飞鹰 咨询 管理 咨询 公司
> score:0.6317303
> -------------------------------------------
> doc id:3
> content of Field:北京中标咨询公司
> Terms:北京 中标 咨询 公司
> score:0.5981058
> -------------------------------------------
> doc id:4
> content of Field:重庆咨询公司
> Terms:重庆 咨询 公司
> score:0.5981058
> -------------------------------------------
> doc id:5
> content of Field:商务咨询服务公司
> Terms:商务 咨询 服务 公司
> score:0.5981058
> -------------------------------------------
> doc id:6
> content of Field:法律咨询公司
> Terms:法律 咨询 公司
> score:0.5981058
> -------------------------------------------
>
> Docs 3, 4, 5 and 6 have the same score, but I think docs 4 and 6 should
> score higher than docs 3 and 5, because docs 4 and 6 have three terms
> while docs 3 and 5 have four.
> Am I right? Can anyone give me an explanation? And how can I get the
> expected result?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Length-of-the-filed-does-not-affect-the-doc-score-accurately-for-chinese-analyzer-SmartChineseAnalyz-tp4111390.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org