On 2022-04-07 11:51 PM, Shawn Heisey wrote:
...
As I understand it, ES offers reindex capability by storing the entire
input document into a field in the index. Which means that the index
will be a lot bigger than it needs to be, which is going to affect
performance. If the field is not indexed, then the performance impact
may not be huge, but it will not be zero. And it wouldn't really
improve the speed of a full reindex, it just makes it possible to do a
reindex without an external data source.
The same thing can be done with Solr, and it is something I would
definitely say needs to be part of any index design where Solr will be a
primary data store. That capability should be available in Solr, but I
do not think it should be enabled by default.
What would be the advantage over dumping the documents into a text file
(XML, JSON) and doing a full re-import? In principle you could dump
everything Solr needs into the file and only check that it's all there
during the import; that check plus the protocol overhead would be the only
downside. And deleting the existing index will take a little extra time.
The upside is that we can stick the files into git and have versions, they
should compress really well, we can clone them to off-site storage, etc.
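
To make the dump-and-re-import idea concrete, here is a rough sketch in
Python, not a tested tool. It assumes one JSON document per line, so diffs
in git stay line-oriented. The field names in REQUIRED_FIELDS and the
send_batch callback are made up for illustration; send_batch stands in for
whatever actually posts a batch of documents to Solr.

```python
import json
from typing import Callable, Iterable, Iterator

# Hypothetical schema: fields every dumped document must carry.
REQUIRED_FIELDS = ("id", "title")


def dump_docs(docs: Iterable[dict], path: str) -> int:
    """Write one JSON document per line; return the number written."""
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        for doc in docs:
            # sort_keys keeps the output stable, so git diffs stay small.
            fh.write(json.dumps(doc, sort_keys=True) + "\n")
            count += 1
    return count


def load_docs(path: str) -> Iterator[dict]:
    """Read the dump back, checking that everything needed is there."""
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue
            doc = json.loads(line)
            missing = [f for f in REQUIRED_FIELDS if f not in doc]
            if missing:
                raise ValueError(f"line {lineno}: missing fields {missing}")
            yield doc


def reimport(path: str, send_batch: Callable[[list], None],
             batch_size: int = 500) -> int:
    """Feed the dump to send_batch in batches; return the document count."""
    batch, total = [], 0
    for doc in load_docs(path):
        batch.append(doc)
        if len(batch) >= batch_size:
            send_batch(batch)
            total += len(batch)
            batch = []
    if batch:
        send_batch(batch)
        total += len(batch)
    return total
```

In practice send_batch would POST each batch to the collection's update
handler, and the old index would be deleted before the import starts. Both
steps are left out here because they depend on the setup.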
Dima