On 6/13/2022 10:14 AM, Christopher Schultz wrote:
> 1. Re: regular re-indexes. I've just built this into my web application so it's literally a one-click administrative background-process kick-off. I've been trying to get automatic schema-provisioning as well (see my recent posts to users@) just in case the index doesn't even exist at first. The idea is to make new application installations / DR a simpler and more automated process.

The best option is to entirely eradicate the existing index before rebuilding it.  One way to do this is to completely delete the index directory and then reload the core or restart Solr.  Another way is to delete all documents and then optimize the index. Lucene will see that none of the segments contain non-deleted documents and will completely delete them all.  It should be effectively equivalent to deleting the index directory and reloading.  This is what my rebuild script for my current Solr install does.  A full reindex only takes about ten minutes, though.
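A minimal command sketch of both wipe options, assuming a standalone Solr at localhost:8983 with a core named "mycore" (the host, core name, and data path are all illustrative):

```shell
# Option 1: remove the index directory on disk, then reload the core.
# (Stop updates first; the path is whatever your core's dataDir points at.)
rm -rf /var/solr/data/mycore/data/index
curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=mycore'

# Option 2: delete every document, then optimize.  Lucene drops the
# now-empty segments entirely, leaving a fresh index.
curl 'http://localhost:8983/solr/mycore/update?commit=true' \
  -H 'Content-Type: text/xml' -d '<delete><query>*:*</query></delete>'
curl 'http://localhost:8983/solr/mycore/update?optimize=true'
```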

> 2. "Index upgrader tool" -- I have no idea what this is. Do I need to care? Or are you saying that if I upgrade from 7.x -> 9.x I won't even be able to write to the same on-disk index artifacts at all, unless I create a new core?

IndexUpgrader is something provided by Lucene.  All it does is a forceMerge down to one segment -- equivalent to "optimize" in Solr.  This upgrades the index to the current Lucene version as fully as is possible, but the version from the old segments is preserved even through the merge.
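IndexUpgrader can be run from the command line; a sketch, assuming the 8.x Lucene jars are at hand (the jar versions and index path are illustrative):

```shell
# Merges the index down to one segment written with the 8.x codec.
# Solr must not have the index open while this runs.
java -cp lucene-core-8.11.2.jar:lucene-backward-codecs-8.11.2.jar \
  org.apache.lucene.index.IndexUpgrader -verbose \
  /var/solr/data/mycore/data/index
```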

That version preservation is why, if you try upgrading from 7.x to 9.x, you won't even be able to READ the index, much less write to it -- even if you take the intermediate step of running the 8.x IndexUpgrader.  Lucene will refuse to open it.

> 4. Re: Complete re-build of infrastructure + cut-over: we abuse Solr a little and use it as an online system and not just a static "product catalog" or whatever. We actually use it to store application user information so we can perform quick user-searches. We have several applications all connecting to the same index and contributing updates and performing queries, so a clean switchover is difficult to do (we aren't using an intermediate proxy). I suppose introducing a proxy wouldn't be the worst possible idea.

The way I managed this was a little involved.

I had two complete online copies of the index, three if you count the dev server.  Each copy was independently updated; I did not use replication.  I used haproxy and pacemaker to float a virtual IP address between some of the servers and to switch automatically to another copy of the index if the main copy went down.
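A minimal haproxy config sketch of that failover arrangement (server names, addresses, and the core name in the health check are all illustrative; pacemaker floats the virtual IP between the haproxy nodes themselves):

```
frontend solr_front
    bind *:8983
    default_backend solr_back

backend solr_back
    # Health-check each copy via Solr's ping handler; send traffic to
    # the backup copy only when the primary fails its checks.
    option httpchk GET /solr/mycore/admin/ping
    server solr1 10.0.0.11:8983 check
    server solr2 10.0.0.12:8983 check backup
```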

Each copy of the index had two cores for each shard -- a live core and a build core.  A full rebuild would build into the build cores (wiping them as mentioned above before beginning any indexing), and then once the rebuild was completely done, swap the live cores with the build cores.
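The cutover itself is a single CoreAdmin SWAP call; a sketch with illustrative core names:

```shell
# Atomically exchange the names of the live and build cores; clients
# querying "mycore_live" immediately see the freshly built index.
curl 'http://localhost:8983/solr/admin/cores?action=SWAP&core=mycore_live&other=mycore_build'
```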

In cloud mode, you cannot follow that paradigm precisely. Instead, you would just create a new collection for a rebuild, and once it was ready, update a collection alias to point to the new collection.
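A sketch of that cloud-mode equivalent using the Collections API (the collection, alias, and configset names are illustrative):

```shell
# Build into a fresh collection...
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll_20220613&numShards=2&replicationFactor=2&collection.configName=mycoll_conf'
# ...then, once indexing is complete, repoint the alias that clients query.
curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=mycoll&collections=mycoll_20220613'
```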

Thanks,
Shawn
