Re: Length of the field does not affect the doc score accurately for Chinese analyzer (SmartChineseAnalyzer)

Erick Erickson Wed, 15 Jan 2014 04:30:49 -0800

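The lengths of fields are encoded when they are stored in the index, and
the encoding loses some precision. So I suspect the field lengths calculated
for your three-term and four-term documents are the same after encoding,
which would give them identical scores.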


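Adding &debug=all to the query (a Solr request parameter) will show you
whether this is the case. In plain Lucene, as in your test,
IndexSearcher.explain(query, docId) returns the same score breakdown,
including the fieldNorm value for each document.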
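
To illustrate the precision loss described above, here is a minimal sketch. It assumes the default similarity in Lucene 4.0 computes the length norm as 1/sqrt(number of terms), with no boosts, and stores it in a single byte. The bit manipulation is modeled on Lucene's SmallFloat.floatToByte315, re-implemented in plain Java so it runs without Lucene on the classpath. The class and method names (NormPrecisionDemo, encode, decode, storedNorm) are hypothetical.

```java
public class NormPrecisionDemo {

    // Lossy float -> byte encoding, modeled on Lucene's SmallFloat.floatToByte315.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;        // keep the exponent and only the top mantissa bits
        int zero = (63 - 15) << 3;     // offset of the smallest representable exponent
        if (small <= zero) {
            // zero or negative -> 0, positive underflow -> smallest positive value
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= zero + 0x100) {
            return (byte) -1;          // overflow -> largest representable value
        }
        return (byte) (small - zero);
    }

    // Inverse of encode(): expands the byte back into a float.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Assumption: norm = 1/sqrt(numTerms), then a round trip through the byte encoding.
    static float storedNorm(int numTerms) {
        float exact = (float) (1.0 / Math.sqrt(numTerms));
        return decode(encode(exact));
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 5; n++) {
            System.out.printf("terms=%d exact=%.4f stored=%.4f%n",
                    n, 1.0 / Math.sqrt(n), storedNorm(n));
        }
    }
}
```

Under these assumptions, three-term and four-term fields both end up with a stored norm of 0.5, while a two-term field stores 0.625. The ratio 0.5 / 0.625 = 0.8 matches the ratio between the quoted scores 0.5981058 and 0.74763227, which is consistent with the norm encoding being the cause.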

Best
Erick

On Wed, Jan 15, 2014 at 3:39 AM, andy <yhl...@sohu.com> wrote:
> Hi guys,
>
> As the subject says, it seems that the length of a field does not affect the
> doc score accurately for the Chinese analyzer in my code.
>
> Indexing code:
>
>     private static Directory DIRECTORY;
>
>     @BeforeClass
>     public static void before() throws IOException {
>         DIRECTORY = new RAMDirectory();
>         Analyzer chineseanalyzer = new SmartChineseAnalyzer(Version.LUCENE_40);
>         IndexWriterConfig indexWriterConfig =
>                 new IndexWriterConfig(Version.LUCENE_40, chineseanalyzer);
>         // Indexed and stored, with norms kept so field length can affect scoring.
>         FieldType nameType = new FieldType();
>         nameType.setIndexed(true);
>         nameType.setStored(true);
>         nameType.setOmitNorms(false);
>         try {
>             IndexWriter indexWriter = new IndexWriter(DIRECTORY, indexWriterConfig);
>
>             List<String> nameList = new ArrayList<String>();
>             nameList.add("咨询公司");
>             nameList.add("飞鹰咨询管理咨询公司");
>             nameList.add("北京中标咨询公司");
>             nameList.add("重庆咨询公司");
>             nameList.add("商务咨询服务公司");
>             nameList.add("法律咨询公司");
>             for (int i = 0; i < nameList.size(); i++) {
>                 Document document = new Document();
>                 document.add(new Field("name", nameList.get(i), nameType));
>                 document.add(new Field("id", String.valueOf(i + 1), nameType));
>                 indexWriter.addDocument(document);
>             }
>             indexWriter.commit();
>         } catch (IOException e) {
>             e.printStackTrace();
>         }
>     }
>
> Search snippet:
>
>     @Test
>     public void testChinese() throws IOException, ParseException {
>         String keyword = "咨询公司";
>         System.out.println("Searching for:" + keyword);
>         System.out.println();
>         IndexReader indexReader = DirectoryReader.open(DIRECTORY);
>         IndexSearcher indexSearcher = new IndexSearcher(indexReader);
>         Query query = new QueryParser(Version.LUCENE_40, "name",
>                 new SmartChineseAnalyzer(Version.LUCENE_40)).parse(keyword);
>         TopDocs topDocs = indexSearcher.search(query, 15);
>         System.out.println("Search Result:");
>         if (null != topDocs && 0 < topDocs.totalHits) {
>             for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
>                 System.out.println("doc id:" + indexSearcher.doc(scoreDoc.doc).get("id"));
>                 String name = indexSearcher.doc(scoreDoc.doc).get("name");
>                 System.out.println("content of Field:" + name);
>                 dumpCNTokens(name);
>                 System.out.println("score:" + scoreDoc.score);
>                 System.out.println("-------------------------------------------");
>             }
>         } else {
>             System.out.println("no results");
>         }
>     }
>
>
> The search results are as follows:
> Searching for:咨询公司
>
> Search Result:
> doc id:1
> content of Field:咨询公司
> Terms:咨询        公司
> score:0.74763227
> -------------------------------------------
> doc id:2
> content of Field:飞鹰咨询管理咨询公司
> Terms:飞鹰        咨询      管理      咨询      公司
> score:0.6317303
> -------------------------------------------
> doc id:3
> content of Field:北京中标咨询公司
> Terms:北京        中标      咨询      公司
> score:0.5981058
> -------------------------------------------
> doc id:4
> content of Field:重庆咨询公司
> Terms:重庆        咨询      公司
> score:0.5981058
> -------------------------------------------
> doc id:5
> content of Field:商务咨询服务公司
> Terms:商务        咨询      服务      公司
> score:0.5981058
> -------------------------------------------
> doc id:6
> content of Field:法律咨询公司
> Terms:法律        咨询      公司
> score:0.5981058
> -------------------------------------------
>
> Docs 3, 4, 5, and 6 have the same score, but I think docs 4 and 6 should
> have a higher score than docs 3 and 5, because docs 4 and 6 have three
> terms while docs 3 and 5 have four terms.
> Am I right? Can anyone give me an explanation? And how can I get the
> expected result?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Length-of-the-filed-does-not-affect-the-doc-score-accurately-for-chinese-analyzer-SmartChineseAnalyz-tp4111390.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
