Re: Comparing two indexes for equality - Finding non stored fieldNames per document

2018-01-02 Erick Erickson
Luke has some capabilities for looking at the index at a low level; perhaps that could give you some pointers. I think you can pull the older branch from here:
https://github.com/DmitryKey/luke
or:
https://code.google.com/archive/p/luke/
NOTE: This is not part of Lucene but an independent project.

2018-01-02 Dawid Weiss
OK. I think you should look at the Java API; this will give you more clarity about what is actually stored in the index and how to extract it. The thing (I think) you're missing is that an inverted index points in the "other" direction: from a given value to all documents that contained it. So unless …
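A toy model may make the "other direction" concrete. The sketch below is plain Java, not Lucene's API: the hypothetical `Uninvert` class assumes postings are given as field -> term -> document IDs (the direction an inverted index is organized in) and flips them into document -> field -> terms by walking every posting list. In real Lucene a similar walk would go, segment by segment, over `Terms`, `TermsEnum` and `PostingsEnum`; here it is reduced to maps so it runs on its own. The keys of each document's inner map are that document's indexed field names, whether or not those fields are stored.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model, not Lucene's API.
// Input layout (assumption): field -> term -> document IDs containing that term.
class Uninvert {

    // Flip the postings so each document maps to field -> terms.
    // Documents with no indexed terms never appear in the result.
    static SortedMap<Integer, SortedMap<String, SortedSet<String>>> invert(
            Map<String, Map<String, int[]>> postings) {
        SortedMap<Integer, SortedMap<String, SortedSet<String>>> perDoc = new TreeMap<>();
        for (Map.Entry<String, Map<String, int[]>> field : postings.entrySet()) {
            for (Map.Entry<String, int[]> term : field.getValue().entrySet()) {
                for (int docId : term.getValue()) {
                    perDoc.computeIfAbsent(docId, d -> new TreeMap<>())
                          .computeIfAbsent(field.getKey(), f -> new TreeSet<>())
                          .add(term.getKey());
                }
            }
        }
        return perDoc;
    }

    public static void main(String[] args) {
        Map<String, Map<String, int[]>> postings = new TreeMap<>();
        postings.put("title", new TreeMap<>(Map.of(
                "lucene", new int[] {0, 2},
                "index", new int[] {1})));
        postings.put("tags", new TreeMap<>(Map.of(
                "search", new int[] {2})));

        // Field names per document are the keys of each inner map.
        invert(postings).forEach((doc, fields) ->
                System.out.println(doc + " -> " + fields));
    }
}
```

Walking every term of every field is the costly part; that is the price of reading an inverted structure "backwards".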

2018-01-02 Chetan Mehrotra
> Only stored fields are kept for each document. If you need to dump
> internal data structures (terms, positions, offsets, payloads, you
> name it) you'll need to dive into the API and traverse all segments,
> then dump the above (and note that document IDs are per-segment and
> will have to be somehow …

2018-01-02 Dawid Weiss
Only stored fields are kept for each document. If you need to dump internal data structures (terms, positions, offsets, payloads, you name it), you'll need to dive into the API and traverse all segments, then dump the above (and note that document IDs are per-segment and will have to be somehow cons…
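The point about per-segment document IDs can be illustrated with a small sketch. It assumes segments are stacked one after another, as Lucene's composite readers do (each `LeafReaderContext` exposes a `docBase`), so a segment's base is the sum of `maxDoc` over the segments before it and a global ID is `docBase + localId`. The `Segments` class is a hypothetical stand-in, not a Lucene class. Keep in mind that these numbers only hold for one point-in-time view of the index, since merges renumber documents.

```java
// Hypothetical stand-in (not a Lucene class): maps per-segment (local) doc IDs
// to index-wide (global) IDs, assuming segments are stacked one after another.
class Segments {
    private final int[] docBase; // docBase[i] = total maxDoc of segments 0..i-1
    private final int totalDocs;

    Segments(int... maxDocPerSegment) {
        docBase = new int[maxDocPerSegment.length];
        int base = 0;
        for (int i = 0; i < maxDocPerSegment.length; i++) {
            docBase[i] = base;
            base += maxDocPerSegment[i];
        }
        totalDocs = base;
    }

    // Global ID of document `localId` inside segment `segment`.
    int toGlobal(int segment, int localId) {
        return docBase[segment] + localId;
    }

    // Segment holding a global ID: the last segment whose base is <= globalId,
    // found by binary search (also correct when empty segments share a base).
    int segmentOf(int globalId) {
        if (globalId < 0 || globalId >= totalDocs) {
            throw new IllegalArgumentException("doc ID out of range: " + globalId);
        }
        int lo = 0, hi = docBase.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (docBase[mid] <= globalId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Local ID within the segment returned by segmentOf.
    int toLocal(int globalId) {
        return globalId - docBase[segmentOf(globalId)];
    }

    public static void main(String[] args) {
        Segments segs = new Segments(3, 5, 2); // three segments, 10 docs in total
        int g = segs.toGlobal(1, 4);           // fifth document of the second segment
        System.out.println(g + " lives in segment " + segs.segmentOf(g)); // prints "7 lives in segment 1"
    }
}
```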

2018-01-02 Chetan Mehrotra
> How about the quickest solution: dump the content of both indexes to a document-per-line text

That would work (and is the plan), but so far I can only get the stored fields per document and no other data on a per-document basis. What other data can we get on a per-document basis using the Lucene API?

Chetan
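The "document-per-line" plan could be sketched as follows, assuming each document has already been reduced (by whatever API traversal is available) to a map of field name -> values, and that documents in the two indexes line up by position. The `DocDump` class and its method names are hypothetical. Sorting fields and values makes each line canonical, so two documents with the same content render identically regardless of insertion order; the comparison then reports only the positions whose lines differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical sketch: one canonical text line per document, then a line-by-line diff.
// Assumes documents were already reduced to field name -> values and line up by position.
class DocDump {

    // Fields sorted by name, values sorted within each field (duplicates kept).
    // Assumes values contain no tabs or commas, which would make lines ambiguous.
    static String toLine(Map<String, List<String>> doc) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> e : new TreeMap<>(doc).entrySet()) {
            List<String> values = new ArrayList<>(e.getValue());
            Collections.sort(values);
            if (sb.length() > 0) {
                sb.append('\t');
            }
            sb.append(e.getKey()).append('=').append(String.join(",", values));
        }
        return sb.toString();
    }

    // Positions whose lines differ; a document present in only one list counts as different.
    static List<Integer> diff(List<Map<String, List<String>>> a,
                              List<Map<String, List<String>>> b) {
        List<Integer> differing = new ArrayList<>();
        for (int i = 0; i < Math.max(a.size(), b.size()); i++) {
            String left = i < a.size() ? toLine(a.get(i)) : null;
            String right = i < b.size() ? toLine(b.get(i)) : null;
            if (!Objects.equals(left, right)) {
                differing.add(i);
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> left = List.of(
                Map.of("title", List.of("lucene", "index"), "tags", List.of("search")),
                Map.of("title", List.of("luke")));
        List<Map<String, List<String>>> right = List.of(
                Map.of("tags", List.of("search"), "title", List.of("index", "lucene")),
                Map.of("title", List.of("solr")));
        left.forEach(d -> System.out.println(toLine(d)));
        System.out.println("differing docs: " + diff(left, right)); // prints "differing docs: [1]"
    }
}
```

Writing the lines to two files and running an ordinary text `diff` over them would serve the same purpose; the in-memory comparison just skips that step.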