Re: Comparing two indexes for equality - Finding non stored fieldNames per document

2018-01-05 Thread Chetan Mehrotra
Based on the suggestion here, I implemented a script to un-invert the index (details at OAK-7122 [1], [2]). Uninverting was done by the following logic: def collectFieldNames(DirectoryReader reader) { println "Proceeding to collect the field names per document" Bits liveDocs = MultiFields.

Re: Comparing two indexes for equality - Finding non stored fieldNames per document

2018-01-03 Thread Chetan Mehrotra
> This isn't an API problem. This is by design -- this is how it works.
Ack. What I was referring to earlier wrt the API was that uninverting the index is not a direct operation and hence is not supported via the API. This would need to be done by using other APIs and would require post processing of index conten

Re: Comparing two indexes for equality - Finding non stored fieldNames per document

2018-01-03 Thread Dawid Weiss
> That helps and explains why there is no support in std api
This isn't an API problem. This is by design -- this is how it works. If you wish to retrieve fields that are indexed and stored with the document, the API provides such an option (indexed and stored field type). Your indexed fields are

Re: Comparing two indexes for equality - Finding non stored fieldNames per document

2018-01-03 Thread Chetan Mehrotra
>> So unless you "store" that value
>> with the document as a stored field, you'll have to "uninvert" the
>> index yourself.
That helps and explains why there is no support in std api
> Luke has some capabilities to look at the index at a low level,
> perhaps that could give you some pointers.
I

Re: Comparing two indexes for equality - Finding non stored fieldNames per document

2018-01-02 Thread Erick Erickson
Luke has some capabilities to look at the index at a low level, perhaps that could give you some pointers. I think you can pull the older branch from here: https://github.com/DmitryKey/luke or: https://code.google.com/archive/p/luke/ NOTE: This is not a part of Lucene, but an independent project

Re: Comparing two indexes for equality - Finding non stored fieldNames per document

2018-01-02 Thread Dawid Weiss
Ok. I think you should look at the Java API -- this will give you more clarity of what is actually stored in the index and how to extract it. The thing (I think) you're missing is that an inverted index points in the "other" direction (from a given value to all documents that contained it). So unle
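
To make the "other direction" concrete, here is a minimal sketch of un-inverting on a toy in-memory model. It does not use the Lucene API: the class name UninvertSketch, the method fieldNamesPerDoc, and the nested-map layout (field name -> term -> document ids) are assumptions made for illustration only. A real index would have to be walked through Lucene's own term and postings enumerations instead.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model only: un-inverts "field -> term -> doc ids" into "doc id -> field names". */
final class UninvertSketch {

    /**
     * @param postings hypothetical inverted index: field name -> term -> ids of
     *                 the documents that contain that term in that field
     * @return document id -> sorted names of the fields indexed for that document
     */
    static Map<Integer, SortedSet<String>> fieldNamesPerDoc(
            Map<String, Map<String, int[]>> postings) {
        Map<Integer, SortedSet<String>> perDoc = new TreeMap<>();
        for (Map.Entry<String, Map<String, int[]>> field : postings.entrySet()) {
            // Every posting list of this field tells us which documents have the field.
            for (int[] docIds : field.getValue().values()) {
                for (int docId : docIds) {
                    perDoc.computeIfAbsent(docId, id -> new TreeSet<>())
                          .add(field.getKey());
                }
            }
        }
        return perDoc;
    }
}
```

Calling UninvertSketch.fieldNamesPerDoc on such a map yields one entry per document, which can then be printed one document per line. The cost is a full pass over every posting list, which is why this is a post-processing step rather than a lookup.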

Re: Comparing two indexes for equality - Finding non stored fieldNames per document

2018-01-02 Thread Chetan Mehrotra
> Only stored fields are kept for each document. If you need to dump
> internal data structures (terms, positions, offsets, payloads, you
> name it) you'll need to dive into the API and traverse all segments,
> then dump the above (and note that document IDs are per-segment and
> will have to be so

Re: Comparing two indexes for equality - Finding non stored fieldNames per document

2018-01-02 Thread Dawid Weiss
Only stored fields are kept for each document. If you need to dump internal data structures (terms, positions, offsets, payloads, you name it) you'll need to dive into the API and traverse all segments, then dump the above (and note that document IDs are per-segment and will have to be somehow cons
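
The remark about per-segment document IDs can be illustrated with a small sketch. It is a toy calculation, not Lucene code: the class DocIdRebase and its methods are hypothetical names, and the segment sizes are passed in as a plain array. In Lucene itself the per-segment offset is exposed as the docBase field of LeafReaderContext.

```java
/** Toy model only: turns segment-local document ids into index-wide ids. */
final class DocIdRebase {

    /** Returns, for each segment, the number of documents in all segments before it. */
    static int[] docBases(int[] maxDocPerSegment) {
        int[] bases = new int[maxDocPerSegment.length];
        int next = 0;
        for (int i = 0; i < maxDocPerSegment.length; i++) {
            bases[i] = next;              // first index-wide id used by segment i
            next += maxDocPerSegment[i];  // the following segment starts after it
        }
        return bases;
    }

    /** Index-wide id of a document given its segment and its id inside that segment. */
    static int globalDocId(int[] docBases, int segment, int localDocId) {
        return docBases[segment] + localDocId;
    }
}
```

Note that even index-wide ids depend on segment layout, so they are not comparable across two separately built indexes; a stable key taken from the document itself is the safer thing to dump when comparing.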

Re: Comparing two indexes for equality - Finding non stored fieldNames per document

2018-01-02 Thread Chetan Mehrotra
> How about the quickest solution: dump the content of both indexes to a document-per-line text
That would work (and is the plan), but so far I can only get the stored fields per document and no other data on a per-document basis. What other data can we get on a per-document basis using the Lucene API? Chet

Re: Comparing two indexes for equality - Finding non stored fieldNames per document

2018-01-01 Thread Dawid Weiss
How about the quickest solution: dump the content of both indexes to a document-per-line text file, sort, diff? Even if your indexes are large, if you have a large spare disk, this will be super fast.
Dawid
On Tue, Jan 2, 2018 at 7:33 AM, Chetan Mehrotra wrote:
> Hi,
>
> We use Lucene for indexin
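
The dump, sort, diff idea can be sketched in a few lines. This is an illustration under assumptions, not the tooling used in the thread: the class DumpDiff and its Result type are hypothetical, and each dump is assumed to be already available as a list of strings, one per document. For large indexes the same steps would normally run on files with external sort and diff tools.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy "dump, sort, diff" comparison of two document-per-line dumps. */
final class DumpDiff {

    /** Lines that appear in only one of the two dumps. */
    static final class Result {
        final List<String> onlyInLeft = new ArrayList<>();
        final List<String> onlyInRight = new ArrayList<>();

        boolean identical() {
            return onlyInLeft.isEmpty() && onlyInRight.isEmpty();
        }
    }

    static Result diff(List<String> leftDump, List<String> rightDump) {
        // Sort copies so that document order inside each index does not matter.
        List<String> left = new ArrayList<>(leftDump);
        List<String> right = new ArrayList<>(rightDump);
        Collections.sort(left);
        Collections.sort(right);

        Result result = new Result();
        int i = 0;
        int j = 0;
        // Merge-style walk over both sorted lists, similar to diffing sorted files.
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp == 0) {
                i++;
                j++;
            } else if (cmp < 0) {
                result.onlyInLeft.add(left.get(i++));
            } else {
                result.onlyInRight.add(right.get(j++));
            }
        }
        while (i < left.size()) {
            result.onlyInLeft.add(left.get(i++));
        }
        while (j < right.size()) {
            result.onlyInRight.add(right.get(j++));
        }
        return result;
    }
}
```

Sorting first is what makes the comparison independent of document order, which matters when two indexing approaches visit the data in different orders. Each line must therefore carry everything that should be compared for its document.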

Comparing two indexes for equality - Finding non stored fieldNames per document

2018-01-01 Thread Chetan Mehrotra
Hi, We use Lucene for indexing in Jackrabbit Oak [2]. Recently we implemented a new indexing approach [1] which traverses the data to be indexed in a different way compared to the traversal approach we have been using so far. The new approach is faster and produces an index with the same number of docume

Re: Comparing Two Indexes

2007-11-12 Thread Erick Erickson
To paraphrase "Why do you want to know"? These kinds of questions are so lacking in context that meaningful help is hard to offer. What problem are you trying to solve? Why do you want to compare indexes? Erick On Nov 9, 2007 12:14 PM, Lucene User <[EMAIL PROTECTED]> wrote: > Hi, > > I wanted tw

Comparing Two Indexes

2007-11-09 Thread Lucene User
Hi, I wanted to compare two indexes. Please recommend an algorithm which takes all the factors into account, such as the versions of the software used by Lucene and the application, which affect the index being created. We can also compare with certain fields and the text. Regards --