Yes, you can get all the current IDs from Solr, but it's a bit cumbersome. Use the /export handler (you have to ensure that all fields you return are docValues="true"), then write some sort of script that diffs them against your source file.
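A rough sketch of what such a script might look like, in Python. The base URL, collection name, and the "id" field name are assumptions about your setup, and the helper names are made up for illustration. It also assumes the ID is the first whitespace-separated column of each row, as in your sample. It compares IDs only and loads the whole /export response into memory, so treat it as a starting point rather than a finished tool.

```python
import json
import urllib.parse
import urllib.request


def fetch_solr_ids(base_url, collection, id_field="id"):
    """Fetch every value of id_field through the /export handler.

    base_url, collection and id_field are assumptions about the setup,
    e.g. base_url="http://localhost:8983/solr". The field must have
    docValues="true" for /export to return it.
    """
    params = urllib.parse.urlencode({
        "q": "*:*",
        "fl": id_field,
        "sort": f"{id_field} asc",  # /export requires an explicit sort
    })
    url = f"{base_url}/{collection}/export?{params}"
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)  # reads the full response into memory
    return {doc[id_field] for doc in body["response"]["docs"]}


def read_source_ids(lines):
    """Take the first whitespace-separated column of each non-blank line."""
    return {line.split(None, 1)[0] for line in lines if line.strip()}


def diff_ids(solr_ids, source_ids):
    """Return (to_add, to_delete) as sorted lists of IDs."""
    to_add = sorted(source_ids - solr_ids)
    to_delete = sorted(solr_ids - source_ids)
    return to_add, to_delete
```

You would call fetch_solr_ids() once per run, pass the open source file to read_source_ids(), and then send the adds and deletes from diff_ids() to Solr with whatever client you already use.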
There's nothing in Solr that will do the diff for you; you'll have to "roll your own" here.

What people often do is keep a list of changes and operate on _that_. In the DB world that's a trigger on your table that records each affected ID along with the operation, so you'd have something like:

op      ID
delete  123
add     456

Then you process those changes in order for your deltas. Perhaps you could do something similar with whatever changes the file in the first place, either with a DB or a text file somewhere....

How do you detect if a particular doc is _changed_? You'll have to re-index then too....

Best,
Erick

On Fri, May 19, 2017 at 12:51 PM, William Nelis <[email protected]> wrote:
> Hello.
>
> I am new to Solr and have a question about incremental indexing. We have a
> source text file that contains millions of rows. Each row is saved as a
> document in Solr. There is one field in each row that is a unique
> identifier.
>
> Unfortunately, this source text file can change. We need to check it every
> hour for changes. If rows are removed, we must remove them from Solr. If
> rows are added, we must add them to Solr.
>
> We do not want to drop all records and re-load them. Instead we would like
> to diff for the changes. What is the recommended way of doing this? Can we
> just get all values Solr stores for the unique identifier field and do the
> diff external to Solr? Does Solr provide functionality that will allow us to
> do the incremental changes even though the source file itself is not
> incremental?
>
> An example of the file format (obviously this is not a real file):
>
> AAQX This is the first document 213.32
> AAZT This is the second document 243.23
> ABGT This is the third document 321.43
> ...
>
> The first column is the unique identifier (there are far more columns, but
> this has been simplified).
>
> Thank you for your help.
