On 26/07/2012 22:04, Phanindra R wrote:
Thanks for the reply Abdul.

I was exploring the API and I think we can retrieve all those words by
using a brute-force approach.

1) Get all the terms using indexReader.terms()

2) Process the term only if it belongs to the target field.

3) Get all the docs using indexReader.termDocs(term);

4) So, we have the term-doc pairs at this point.

This procedure is implemented in Luke (http://code.google.com/p/luke) in the "Reconstruct & Edit" function. For larger indexes it is indeed a time-consuming procedure.
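Roughly, steps 1-4 look like the following with the pre-4.0 API mentioned above. This is only a minimal sketch: the index path and the field name "contents" are placeholders, and error handling is omitted.

import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.FSDirectory;

public class BruteForceReconstruct {
  public static void main(String[] args) throws Exception {
    String targetField = "contents";                      // placeholder field name
    IndexReader reader = IndexReader.open(
        FSDirectory.open(new File("/path/to/index")));    // placeholder path

    // docId -> terms of the target field seen in that document
    Map<Integer, List<String>> docTerms = new HashMap<Integer, List<String>>();

    TermEnum terms = reader.terms();                      // 1) all terms in the index
    while (terms.next()) {
      Term term = terms.term();
      if (!targetField.equals(term.field())) continue;    // 2) only the target field
      TermDocs termDocs = reader.termDocs(term);          // 3) docs containing this term
      while (termDocs.next()) {                           // 4) collect term-doc pairs
        int docId = termDocs.doc();
        List<String> words = docTerms.get(docId);
        if (words == null) {
          words = new ArrayList<String>();
          docTerms.put(docId, words);
        }
        words.add(term.text());
      }
      termDocs.close();
    }
    terms.close();
    reader.close();

    System.out.println("Collected term lists for " + docTerms.size() + " docs");
  }
}

Note that this only tells you which terms occur in each document, not in what order; to approximate the original word order you would also need positions (reader.termPositions(term)), which is roughly what Luke's reconstruction does, and part of why it is slow on big indexes.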


Is there any better approach than the above forever-taking procedure?

No. Indexing is usually a lossy process: some data is irretrievably lost, and the resulting data structures are not optimized for reassembling the original content. If you need to retrieve the original content, you have to store it, either in stored fields or in an external system.
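For example, indexing a field with Field.Store.YES keeps the original value in the index, so it can be read back verbatim later. A minimal sketch against the 3.x API; the path, field name and sample text are placeholders:

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class StoreOriginalContent {
  public static void main(String[] args) throws Exception {
    FSDirectory dir = FSDirectory.open(new File("/path/to/index"));  // placeholder path
    IndexWriter writer = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36)));

    Document doc = new Document();
    // Analyzed for searching AND stored verbatim; the stored value can later be
    // retrieved with reader.document(docId).get("contents"), no reconstruction needed.
    doc.add(new Field("contents", "the original document text", Field.Store.YES, Field.Index.ANALYZED));
    writer.addDocument(doc);
    writer.close();
  }
}

The stored field increases index size, which is the usual trade-off versus keeping the content in an external store.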


--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
 ___.,___,___,___,_._. __________________<><____________________
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact: info at sigram dot com


