Re: Incremental Indexing when Source Data is not Incremental

Erick Erickson Fri, 19 May 2017 13:21:52 -0700

Yes, you can get all the current IDs from Solr, but it's a bit
cumbersome. Use the /export handler (you have to ensure that all
fields you return are docValues="true"), then write some sort of script
that diffs them against your source file.
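
The diff itself comes down to two set differences. Here is a minimal
sketch in Python, assuming the IDs returned by /export have already been
collected into a set and that the unique identifier is the first
whitespace-delimited column of each source row (as in the example file
quoted below). The names read_source_ids and diff_ids are illustrative
only, not anything Solr provides:

```python
from typing import Iterable, Set, Tuple


def read_source_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the first whitespace-delimited column of every non-blank row.

    Assumption: the unique identifier is the first column of the source file.
    """
    ids = set()
    for line in lines:
        fields = line.split()
        if fields:
            ids.add(fields[0])
    return ids


def diff_ids(solr_ids: Set[str], source_ids: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_delete) so that Solr ends up matching the source."""
    to_add = source_ids - solr_ids      # in the file, not yet in Solr
    to_delete = solr_ids - source_ids   # in Solr, no longer in the file
    return to_add, to_delete


if __name__ == "__main__":
    # solr_ids would come from the /export handler in practice (assumed here).
    solr_ids = {"AAQX", "ABGT"}
    source_rows = [
        "AAQX     This is the first document             213.32",
        "AAZT     This is the second document            243.23",
    ]
    to_add, to_delete = diff_ids(solr_ids, read_source_ids(source_rows))
    print("add:", sorted(to_add))
    print("delete:", sorted(to_delete))
```

Rows whose IDs land in to_add get indexed, and IDs in to_delete get a
delete-by-id. Note that this only catches added and removed rows, not
rows whose content changed.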


There's nothing in Solr that will do the diff for you; you'll have to
"roll your own" here.

What people often do is keep a list of changes and operate on _that_.
In the DB world that's typically a trigger on your table that records
each affected ID along with the operation, so you'd have something like:

op      ID
delete  123
add     456

Then you process those changes in order for your deltas. Perhaps you
could do something similar with whatever changes the file in the first
place, either with a DB or a text file somewhere....
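
Processing such a change list in order could be sketched like this in
Python. The two-column "op ID" text format follows the example above;
the function names, and the choice to let the last operation recorded
for an ID win, are assumptions made for illustration:

```python
from typing import Dict, Iterable, List, Tuple


def parse_change_log(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse 'op ID' rows, skipping blank lines and the 'op ID' header row."""
    changes = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        op, doc_id = parts
        if op not in ("add", "delete"):
            continue  # header row or an operation this sketch does not know
        changes.append((op, doc_id))
    return changes


def net_changes(changes: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Replay the changes in order; the last operation for each ID wins."""
    final: Dict[str, str] = {}
    for op, doc_id in changes:
        final[doc_id] = op
    adds = [doc_id for doc_id, op in final.items() if op == "add"]
    deletes = [doc_id for doc_id, op in final.items() if op == "delete"]
    return adds, deletes


if __name__ == "__main__":
    log = ["op      ID", "delete  123", "add     456"]
    adds, deletes = net_changes(parse_change_log(log))
    print("add:", adds)
    print("delete:", deletes)
```

Collapsing to the net effect per ID means a document that was added and
then deleted within the same hour never has to be sent to Solr at all.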

How do you detect if a particular doc is _changed_? You'll have to
re-index those too....
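
One possible way to spot changed rows, offered purely as an assumption
and not something Solr does for you, is to keep a fingerprint (hash) of
each row from the previous run and compare it against the current file.
A rough Python sketch, with hypothetical function names and the ID
again assumed to be the first column:

```python
import hashlib
from typing import Dict, Iterable, Set


def row_fingerprint(row: str) -> str:
    """Hash the full text of a row so any edit to it changes the fingerprint."""
    return hashlib.sha256(row.encode("utf-8")).hexdigest()


def changed_ids(previous: Dict[str, str], current_rows: Iterable[str]) -> Set[str]:
    """Return IDs present in both runs whose row content differs.

    `previous` maps ID -> fingerprint saved from the last run (assumed to be
    persisted somewhere outside Solr, e.g. a small file or DB table).
    """
    changed = set()
    for row in current_rows:
        fields = row.split()
        if not fields:
            continue
        doc_id = fields[0]
        old = previous.get(doc_id)
        if old is not None and old != row_fingerprint(row):
            changed.add(doc_id)
    return changed


if __name__ == "__main__":
    previous = {"AAQX": row_fingerprint("AAQX first 213.32")}
    print(changed_ids(previous, ["AAQX first 999.99", "AAZT second 243.23"]))
```

IDs that are new in the current file are deliberately left out here,
since the add/delete diff already covers them.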

Best,
Erick

On Fri, May 19, 2017 at 12:51 PM, William Nelis
<[email protected]> wrote:
> Hello.
>
> I am new to Solr and have a question about incremental indexing. We have a
> source text file that contains millions of rows. Each row is saved as a
> document in Solr. There is one field in each row that is a unique
> identifier.
>
> Unfortunately, this source text file can change. We need to check it every
> hour for changes. If rows are removed, we must remove them from Solr. If
> rows are added, we must add them to Solr.
>
> We do not want to drop all records and re-load them. Instead we would like
> to diff for the changes. What is the recommended way of doing this? Can we
> just get all values Solr stores for the unique identifier field and do the
> diff external to Solr? Does Solr provide functionality that will allow us to
> do the incremental changes even though the source file itself is not
> incremental?
>
> An example of the file format (obviously this is not a real file):
>
> AAQX     This is the first document             213.32
> AAZT     This is the second document            243.23
> ABGT     This is the third document             321.43
> ...
>
> The first column is the unique identifier (there are far more columns, but
> this has been simplified).
>
> Thank you for your help.

