On 26/07/2012 22:04, Phanindra R wrote:
Thanks for the reply Abdul.

I was exploring the API and I think we can retrieve all those words by
using a brute-force approach.

1) Get all the terms using indexReader.terms()

2) Process the term only if it belongs to the target field.

3) Get all the docs using indexReader.termDocs(term);

4) So, we have the term-doc pairs at this point.

This procedure is implemented in Luke (http://code.google.com/p/luke) in the "Reconstruct & Edit" function. For larger indexes it is indeed a time-consuming procedure.
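Roughly, steps 1-4 look like the following with the pre-4.0 API mentioned above. This is only a minimal sketch: the index path and the field name "contents" are placeholders, and error handling is omitted.

import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.FSDirectory;

public class BruteForceReconstruct {
  public static void main(String[] args) throws Exception {
    String targetField = "contents";                      // placeholder field name
    IndexReader reader = IndexReader.open(
        FSDirectory.open(new File("/path/to/index")));    // placeholder path

    // docId -> terms of the target field seen in that document
    Map<Integer, List<String>> docTerms = new HashMap<Integer, List<String>>();

    TermEnum terms = reader.terms();                      // 1) all terms in the index
    while (terms.next()) {
      Term term = terms.term();
      if (!targetField.equals(term.field())) continue;    // 2) only the target field
      TermDocs termDocs = reader.termDocs(term);          // 3) docs containing this term
      while (termDocs.next()) {                           // 4) collect term-doc pairs
        int docId = termDocs.doc();
        List<String> words = docTerms.get(docId);
        if (words == null) {
          words = new ArrayList<String>();
          docTerms.put(docId, words);
        }
        words.add(term.text());
      }
      termDocs.close();
    }
    terms.close();
    reader.close();

    System.out.println("Collected term lists for " + docTerms.size() + " docs");
  }
}

Note that this only tells you which terms occur in each document, not in what order; to approximate the original word order you would also need positions (reader.termPositions(term)), which is roughly what Luke's reconstruction does, and part of why it is slow on big indexes.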


Is there any better approach than the above forever-taking procedure?

No. Indexing is usually a lossy process: some data is irretrievably lost, and the resulting data structures are not optimized for reassembling the original content. If you need to retrieve the original content, you have to store it, either in stored fields or in an external system.
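For example, indexing a field with Field.Store.YES keeps the original value in the index, so it can be read back verbatim later. A minimal sketch against the 3.x API; the path, field name and sample text are placeholders:

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class StoreOriginalContent {
  public static void main(String[] args) throws Exception {
    FSDirectory dir = FSDirectory.open(new File("/path/to/index"));  // placeholder path
    IndexWriter writer = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36)));

    Document doc = new Document();
    // Analyzed for searching AND stored verbatim; the stored value can later be
    // retrieved with reader.document(docId).get("contents"), no reconstruction needed.
    doc.add(new Field("contents", "the original document text", Field.Store.YES, Field.Index.ANALYZED));
    writer.addDocument(doc);
    writer.close();
  }
}

The stored field increases index size, which is the usual trade-off versus keeping the content in an external store.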


--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
 ___.,___,___,___,_._. __________________<><____________________
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact: info at sigram dot com


