Luke has some capabilities for inspecting the index at a low level;
perhaps that could give you some pointers. I think you can pull
the older branch from here:
https://github.com/DmitryKey/luke
or:
https://code.google.com/archive/p/luke/
NOTE: This is not a part of Lucene, but an independent project.
OK. I think you should look at the Java API -- this will give you more
clarity about what is actually stored in the index and how to extract
it. The thing (I think) you're missing is that an
inverted index points in the "other" direction (from a given value to
all documents that contained it). So unle
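
To make the "other direction" point concrete, here is a minimal sketch in plain Java. It is a toy model built on java.util collections, not the Lucene API; the class and method names (UninvertSketch, buildInverted, uninvert) are made up for illustration. It builds a term -> document-IDs map and then shows that a per-document view can only be recovered by walking every term's postings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: plain Java collections, not the Lucene API.
class UninvertSketch {

    // Inverted index: term -> ascending list of IDs of documents containing it.
    static Map<String, List<Integer>> buildInverted(List<String> docs) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : docs.get(docId).toLowerCase().split("\\s+")) {
                if (term.isEmpty()) {
                    continue; // skip the empty token produced by leading whitespace
                }
                List<Integer> ids = postings.computeIfAbsent(term, t -> new ArrayList<>());
                // Record each document once per term, even if the term repeats.
                if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                    ids.add(docId);
                }
            }
        }
        return postings;
    }

    // "Uninverting": visit every term's postings to rebuild doc -> terms.
    static Map<Integer, List<String>> uninvert(Map<String, List<Integer>> postings) {
        Map<Integer, List<String>> perDoc = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            for (int docId : e.getValue()) {
                perDoc.computeIfAbsent(docId, d -> new ArrayList<>()).add(e.getKey());
            }
        }
        return perDoc;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("quick brown fox", "lazy brown dog");
        Map<String, List<Integer>> postings = buildInverted(docs);
        System.out.println(postings);           // term -> documents
        System.out.println(uninvert(postings)); // document -> terms
    }
}
```

Note that the rebuilt per-document lists come back in term order, not in the original word order: in this toy model the word order is simply not stored, so it cannot be recovered from the postings alone.
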
> Only stored fields are kept for each document. If you need to dump
> internal data structures (terms, positions, offsets, payloads, you
> name it) you'll need to dive into the API and traverse all segments,
> then dump the above (and note that document IDs are per-segment and
> will have to be so
Only stored fields are kept for each document. If you need to dump
internal data structures (terms, positions, offsets, payloads, you
name it) you'll need to dive into the API and traverse all segments,
then dump the above (and note that document IDs are per-segment and
will have to be somehow cons
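
The per-segment document ID issue can be sketched roughly as follows. This is again a toy model in plain Java, not Lucene code: Segment, sample, and dumpGlobal are hypothetical names, and the postings are hand-written. The only assumption borrowed from Lucene is the general idea of a per-segment offset (Lucene exposes one as LeafReaderContext.docBase); here it is computed as a running sum of the document counts of the preceding segments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: plain Java, not the Lucene API.
class SegmentMergeSketch {

    // A segment: its document count plus term -> segment-local doc IDs.
    static final class Segment {
        final int maxDoc;
        final Map<String, int[]> postings = new TreeMap<>();

        Segment(int maxDoc) {
            this.maxDoc = maxDoc;
        }

        Segment add(String term, int... localIds) {
            postings.put(term, localIds);
            return this;
        }
    }

    // Two hand-written segments used for illustration.
    static List<Segment> sample() {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment(2).add("brown", 0, 1).add("fox", 0));
        segments.add(new Segment(3).add("brown", 2).add("dog", 0, 2));
        return segments;
    }

    // Walk all segments and rewrite local IDs as index-wide IDs.
    static Map<String, List<Integer>> dumpGlobal(List<Segment> segments) {
        Map<String, List<Integer>> out = new TreeMap<>();
        int docBase = 0; // index-wide ID of the current segment's first document
        for (Segment seg : segments) {
            for (Map.Entry<String, int[]> e : seg.postings.entrySet()) {
                List<Integer> ids = out.computeIfAbsent(e.getKey(), t -> new ArrayList<>());
                for (int localId : e.getValue()) {
                    ids.add(docBase + localId);
                }
            }
            docBase += seg.maxDoc; // the next segment starts after this one
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(dumpGlobal(sample()));
    }
}
```

With the sample data, local ID 2 in the second segment becomes index-wide ID 4, because the first segment holds two documents. Without that offset, document 0 of each segment would be indistinguishable in the dump.
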
> How about the quickest solution: dump the content of both indexes to a
> document-per-line text
That would work (and is the plan), but so far I can only get the stored
fields for each document and no other data on a per-document basis. What
other data can we get on a per-document basis using the Lucene API?
Chet