Search Problem

Amin Mohammed-Coleman Thu, 01 Jan 2009 12:29:36 -0800

Hi

I have created an RTFHandler which takes an RTF file and creates a Lucene Document which is indexed. The RTFHandler looks something like this:


if (bodyText != null) {
    Document document = new Document();
    Field field = new Field(MetaDataEnum.BODY.getDescription(), bodyText.trim(), Field.Store.YES, Field.Index.ANALYZED);
    document.add(field);
}

I am using Java's built-in RTF text extraction. When I run my test to verify that the document contains the text I expect, this works fine. I get the following when I print the document:

Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf document that will be indexed.

Amin Mohammed-Coleman> stored/uncompressed,indexed<path:rtfDocumentToIndex.rtf> stored/uncompressed,indexed<name:rtfDocumentToIndex.rtf> stored/uncompressed,indexed<type:RTF_INDEXER> stored/uncompressed,indexed<summary:This is a >>



The problem is that when I use the following to search, I get no results:

MultiSearcher multiSearcher = new MultiSearcher(new Searchable[]{rtfIndexSearcher});
Term t = new Term("body", "Amin");
TermQuery termQuery = new TermQuery(t);
TopDocs topDocs = multiSearcher.search(termQuery, 1);
System.out.println(topDocs.totalHits);
multiSearcher.close();
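
One thing that may be worth checking, stated as an assumption because the analyzer used at index time is not shown here: a TermQuery compares its term exactly as given and does not run it through an analyzer. If the indexing analyzer lowercases tokens (StandardAnalyzer does), the index would hold "amin" rather than "Amin". This would not explain a miss on an already-lowercase word, so it is at most a partial explanation. The sketch below is plain Java, not Lucene; the class and method names are hypothetical and only illustrate the mismatch.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an index; this is not Lucene code.
class ExactTermSketch {

    // Assumption: the indexing analyzer splits on non-letter characters
    // and lowercases every token before storing it.
    static Set<String> lowercaseTokens(String text) {
        Set<String> terms = new HashSet<String>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Exact, case-sensitive lookup, analogous to how a TermQuery compares terms.
    static boolean hasTerm(Set<String> terms, String term) {
        return terms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> terms = lowercaseTokens(
            "This is a test rtf document that will be indexed.\n\nAmin Mohammed-Coleman");
        System.out.println("Amin -> " + hasTerm(terms, "Amin"));
        System.out.println("amin -> " + hasTerm(terms, "amin"));
    }
}
```

Under that assumption, a term built as new Term("body", "amin") would be the one to try.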

The rtfIndexSearcher is configured with the directory that holds the RTF documents. I have used Luke to look at the document, and what I am finding in the Overview tab is the following for the document:


1       body    test
1       id      1234
1       name    rtfDocumentToIndex.rtf
1       path    rtfDocumentToIndex.rtf
1       summary This is a
1       type    RTF_INDEXER
1       body    rtf


However, on the Document tab I get the following (in the body field):

This is a test rtf document that will be indexed.

Amin Mohammed-Coleman

I would expect to get a hit using "Amin" or even "document". I am not sure whether the line:
TopDocs topDocs = multiSearcher.search(termQuery, 1);

is incorrect, as I am not too sure of the meaning of "Finds the top n hits for query." for search(Query query, int n) according to the Javadocs.
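
For reference, "top n hits" means that n only caps how many of the best-scoring matches come back in TopDocs.scoreDocs, while TopDocs.totalHits counts every matching document. Passing n = 1 should therefore not, on its own, turn a match into zero hits. The following plain-Java sketch, with hypothetical names and no Lucene dependency, illustrates that relationship:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of "top n hits"; this is not Lucene code.
class TopNSketch {

    // Mirrors the shape of a top-n result: a total match count
    // plus at most n of the best scores.
    static class Result {
        final int totalHits;
        final List<Float> topScores;

        Result(int totalHits, List<Float> topScores) {
            this.totalHits = totalHits;
            this.topScores = topScores;
        }
    }

    // matchingScores holds one score per matching document.
    static Result search(List<Float> matchingScores, int n) {
        List<Float> sorted = new ArrayList<Float>(matchingScores);
        Collections.sort(sorted, Collections.reverseOrder()); // best score first
        int kept = Math.min(n, sorted.size());
        return new Result(sorted.size(), new ArrayList<Float>(sorted.subList(0, kept)));
    }

    public static void main(String[] args) {
        Result r = search(Arrays.asList(0.2f, 0.9f, 0.5f), 1);
        System.out.println("totalHits = " + r.totalHits);
        System.out.println("returned  = " + r.topScores.size());
    }
}
```

So a totalHits of 0 points to the query matching no indexed term, rather than to the value of n.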

I would be grateful if someone could advise on what I may be doing wrong. I am using Lucene 2.4.0.



Cheers
Amin
