Based on the suggestion here, I implemented a script to un-invert the
index (details at OAK-7122 [1], [2]).
Uninverting was done with the following logic:
def collectFieldNames(DirectoryReader reader) {
println "Proceeding to collect the field names per document"
Bits liveDocs = MultiFields.
> This isn't an API problem. This is by design -- this is how it works.
Ack. What I meant earlier regarding the API is that uninverting the
index is not a direct operation and hence is not supported via the API.
This would need to be done using other APIs and would require
post-processing of index conten
> That helps and explains why there is no support in std api
This isn't an API problem. This is by design -- this is how it works.
If you wish to retrieve fields that are indexed and stored with the
document, the API provides such an option (indexed and stored field
type). Your indexed fields are
>> So unless you "store" that value
>> with the document as a stored field, you'll have to "uninvert" the
>> index yourself.
That helps and explains why there is no support in std api
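
To make the "uninvert" step concrete, here is a minimal sketch in plain Java that does not use the Lucene API at all. It assumes the postings have already been read into a term -> docIDs map; the class name UninvertSketch, the method uninvert and the sample terms are hypothetical and only illustrate the direction change from "value to documents" to "document to values".

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration; not part of Lucene or Oak.
public class UninvertSketch {

    /**
     * Turns a term -> docIDs mapping (the direction an inverted index
     * stores) into a docID -> terms mapping (the per-document view).
     */
    static SortedMap<Integer, SortedSet<String>> uninvert(Map<String, List<Integer>> postings) {
        SortedMap<Integer, SortedSet<String>> perDoc = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                // Create the term set for this document on first sight.
                perDoc.computeIfAbsent(docId, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return perDoc;
    }

    public static void main(String[] args) {
        // Made-up postings: each term lists the documents containing it.
        Map<String, List<Integer>> postings = new TreeMap<>();
        postings.put("lucene", List.of(0, 2));
        postings.put("oak", List.of(0, 1));
        postings.put("index", List.of(1, 2));
        // Print one line per document with its sorted terms.
        uninvert(postings).forEach((doc, terms) -> System.out.println(doc + " " + terms));
    }
}
```

With a real index the postings would have to be enumerated per field and per segment first; the sketch only covers the regrouping that follows.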
> Luke has some capabilities to look at the index at a low level,
> perhaps that could give you some pointers. I
Luke has some capabilities to look at the index at a low level,
perhaps that could give you some pointers. I think you can pull
the older branch from here:
https://github.com/DmitryKey/luke
or:
https://code.google.com/archive/p/luke/
NOTE: This is not a part of Lucene, but an independent project
Ok. I think you should look at the Java API -- this will give you more
clarity of what is actually stored in the index
and how to extract it. The thing (I think) you're missing is that an
inverted index points in the "other" direction (from a given value to
all documents that contained it). So unle
> Only stored fields are kept for each document. If you need to dump
> internal data structures (terms, positions, offsets, payloads, you
> name it) you'll need to dive into the API and traverse all segments,
> then dump the above (and note that document IDs are per-segment and
> will have to be so
Only stored fields are kept for each document. If you need to dump
internal data structures (terms, positions, offsets, payloads, you
name it) you'll need to dive into the API and traverse all segments,
then dump the above (and note that document IDs are per-segment and
will have to be somehow cons
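
As a rough illustration of the per-segment document ID point, the plain-Java sketch below maps segment-local IDs to index-wide IDs by adding a per-segment base offset. In Lucene the docBase field of LeafReaderContext plays this role; the class DocBaseSketch, its methods and the segment sizes are hypothetical and stand in for real segments.

```java
import java.util.Arrays;

// Hypothetical illustration; not part of Lucene.
public class DocBaseSketch {

    /** Base offset of each segment: the running sum of the preceding segment sizes. */
    static int[] docBases(int[] segmentSizes) {
        int[] bases = new int[segmentSizes.length];
        int next = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            bases[i] = next;
            next += segmentSizes[i];
        }
        return bases;
    }

    /** Converts a segment-local document ID to an index-wide one. */
    static int toGlobal(int[] bases, int segment, int localDocId) {
        return bases[segment] + localDocId;
    }

    public static void main(String[] args) {
        // Three made-up segments holding 3, 2 and 4 documents.
        int[] bases = docBases(new int[] {3, 2, 4});
        System.out.println(Arrays.toString(bases));
        // Local document 1 of the third segment (index 2).
        System.out.println(toGlobal(bases, 2, 1));
    }
}
```

Note that even index-wide IDs are only meaningful within one index, so they are not a suitable key when comparing two separately built indexes.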
> How about the quickest solution: dump the content of both indexes to a
> document-per-line text
That would work (and is the plan), but so far I can only get the stored
fields per document and no other data on a per-document basis. What other
data can we get on a per-document basis using the Lucene API?
Chet
How about the quickest solution: dump the content of both indexes to a
document-per-line text
file, sort, diff?
Even if your indexes are large, if you have a large spare disk, this
will be super fast.
Dawid
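
The dump, sort, diff idea can be sketched in plain Java as follows, assuming each index has already been dumped to one line per document. The class DumpDiffSketch, the "< " and "> " prefixes and the sample line format (a path followed by field=value) are hypothetical; on large dumps the command-line sort and diff tools would do the same job on files.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; not part of Lucene or Oak.
public class DumpDiffSketch {

    /**
     * Sorts both dumps and returns the lines present in only one of them,
     * prefixed "< " (left only) or "> " (right only).
     */
    static List<String> diff(List<String> leftDump, List<String> rightDump) {
        List<String> left = new ArrayList<>(leftDump);
        List<String> right = new ArrayList<>(rightDump);
        Collections.sort(left);
        Collections.sort(right);
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        // Walk both sorted lists in step, as a merge would.
        while (i < left.size() && j < right.size()) {
            int c = left.get(i).compareTo(right.get(j));
            if (c == 0) {
                i++;
                j++;
            } else if (c < 0) {
                out.add("< " + left.get(i++));
            } else {
                out.add("> " + right.get(j++));
            }
        }
        // Whatever remains exists on one side only.
        while (i < left.size()) out.add("< " + left.get(i++));
        while (j < right.size()) out.add("> " + right.get(j++));
        return out;
    }

    public static void main(String[] args) {
        // Made-up dumps: one line per document.
        List<String> oldIndex = List.of("/b title=x", "/a title=y");
        List<String> newIndex = List.of("/a title=y", "/c title=z");
        diff(oldIndex, newIndex).forEach(System.out::println);
    }
}
```

Sorting first means the order in which each index was built does not matter, which fits the case of two indexes produced by different traversal orders.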
On Tue, Jan 2, 2018 at 7:33 AM, Chetan Mehrotra wrote:
> Hi,
>
> We use Lucene for indexin
Hi,
We use Lucene for indexing in Jackrabbit Oak [2]. Recently we
implemented a new indexing approach [1] which traverses the data to be
indexed in a different way compared to the traversal approach we have
been using so far. The new approach is faster and produces an index with
the same number of docume
To paraphrase "Why do you want to know"? These kinds of questions
are so lacking in context that meaningful help is hard to offer. What
problem are you trying to solve? Why do you want to compare indexes?
Erick
On Nov 9, 2007 12:14 PM, Lucene User <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I wanted tw
Hi,
I wanted to compare two indexes. Please recommend an algorithm
which takes all the factors into account, such as the versions of
software being used by Lucene and the application, which have an
effect on the index being created. We can also
compare on certain fields and the text.
Regards
--