On 2022-04-07 11:51 PM, Shawn Heisey wrote:
...
As I understand it, ES offers reindex capability by storing the entire input document into a field in the index.  Which means that the index will be a lot bigger than it needs to be, which is going to affect performance.  If the field is not indexed, then the performance impact may not be huge, but it will not be zero.  And it wouldn't really improve the speed of a full reindex, it just makes it possible to do a reindex without an external data source.

The same thing can be done with Solr, and it is something I would definitely say needs to be part of any index design where Solr will be a primary data store.  That capability should be available in Solr, but I do not think it should be enabled by default.

What would be the advantage over dumping the documents into a text file (XML, JSON) and doing a full re-import? In principle you could dump everything Solr needs into the file, so the import would only have to check that it's all there; that check plus the protocol overhead would be the only downside. And deleting the existing index will take a little extra time.

The upside is that we can stick the files into git and have versions, they should compress really well, we can clone them to off-site storage, etc.
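
To make the dump-and-reimport idea concrete, here is a rough sketch of what I have in mind, not a finished tool. It assumes the documents are already available as plain dicts; the names dump_docs, load_batches and REQUIRED_FIELDS are made up for illustration, and the field list is only a placeholder for whatever your schema actually needs. One JSON document per line, sorted by id with sorted keys, should keep git diffs small between dumps.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, List

# Hypothetical placeholder: the fields your schema needs for a full rebuild.
REQUIRED_FIELDS = ("id", "title")


def dump_docs(docs: Iterable[dict], path: Path) -> int:
    """Write one JSON document per line, ordered by id, with sorted keys.

    A stable ordering means an unchanged document produces an unchanged
    line, which keeps version-control diffs small.
    """
    count = 0
    with path.open("w", encoding="utf-8") as fh:
        for doc in sorted(docs, key=lambda d: str(d["id"])):
            fh.write(json.dumps(doc, sort_keys=True, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_batches(path: Path, batch_size: int = 1000) -> Iterator[List[dict]]:
    """Yield batches of documents, checking that required fields are present."""
    batch: List[dict] = []
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines in the dump
            doc = json.loads(line)
            missing = [f for f in REQUIRED_FIELDS if f not in doc]
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            batch.append(doc)
            if len(batch) >= batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final partial batch
```

Each batch would then be sent to Solr however you normally index, for example as a JSON array to the collection's update handler; that part is left out here because it depends on your setup.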

Dima
