Thanks a lot, Aditya and Andrzej. Your responses were really helpful.

On Fri, Jul 27, 2012 at 6:15 AM, Andrzej Bialecki <a...@getopt.org> wrote:
> On 26/07/2012 22:04, Phanindra R wrote:
>
>> Thanks for the reply Abdul.
>>
>> I was exploring the API and I think we can retrieve all those words by
>> using a brute-force approach.
>>
>> 1) Get all the terms using indexReader.terms()
>>
>> 2) Process the term only if it belongs to the target field.
>>
>> 3) Get all the docs using indexReader.termDocs(term);
>>
>> 4) So, we have the term-doc pairs at this point.
>
> This procedure is implemented in Luke (http://code.google.com/p/luke)
> in the "Reconstruct & Edit" function. In case of larger indexes it's indeed
> a time-consuming procedure.
>
>> Is there any better approach other than the above forever-taking
>> procedure?
>
> No. Indexing is usually a lossy process - some data is irretrievably lost
> - and the resulting data structure is not optimized for re-assembling the
> original content. If you need to retrieve the original content you have to
> store it, either using stored fields or in an external system.
>
> --
> Best regards,
> Andrzej Bialecki
> http://www.sigram.com, blog http://www.sigram.com/blog
> Information Retrieval, System Integration
> Contact: info at sigram dot com
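
For anyone finding this thread later, here is a rough sketch of the four steps quoted above. It deliberately does not call Lucene: the index is modelled as a plain in-memory list, and the names UninvertSketch, Posting and uninvert are made up for illustration, not part of any library. In a real index the outer loop would be the indexReader.terms() enumeration and the inner loop the indexReader.termDocs(term) enumeration mentioned in the quoted message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UninvertSketch {

    /** Hypothetical stand-in for one index entry: a term (field + text) and the docs containing it. */
    static final class Posting {
        final String field;
        final String text;
        final int[] docIds;

        Posting(String field, String text, int... docIds) {
            this.field = field;
            this.text = text;
            this.docIds = docIds;
        }
    }

    /** Walks every term, keeps those in targetField, and groups the term texts by document id. */
    static Map<Integer, List<String>> uninvert(List<Posting> index, String targetField) {
        Map<Integer, List<String>> termsByDoc = new TreeMap<>();
        for (Posting posting : index) {                     // step 1: enumerate all terms
            if (!posting.field.equals(targetField)) {       // step 2: skip other fields
                continue;
            }
            for (int docId : posting.docIds) {              // step 3: docs for this term
                termsByDoc.computeIfAbsent(docId, id -> new ArrayList<>())
                          .add(posting.text);               // step 4: record the term-doc pair
            }
        }
        return termsByDoc;
    }

    public static void main(String[] args) {
        // Toy index; terms are listed in sorted order, as a term dictionary would be.
        List<Posting> index = List.of(
            new Posting("body", "brown", 0),
            new Posting("body", "fox", 0, 1),
            new Posting("body", "quick", 0),
            new Posting("body", "sleeps", 1),
            new Posting("title", "fox", 1));

        // prints {0=[brown, fox, quick], 1=[fox, sleeps]}
        System.out.println(uninvert(index, "body"));
    }
}
```

Note that the terms for each document come back in term-dictionary order, not in the order they appeared in the text. If document 0 was originally something like "quick brown fox", that word order is gone unless positions were indexed, which matches Andrzej's point about indexing being lossy. The cost is also proportional to the total number of term-doc pairs, hence the slowness on large indexes.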