I think you are speaking to the point that having all your data rebuildable from an external source isn't a hard requirement, since there are ways to reindex without access to the original source (you still need the full documents stored in Solr, just not indexed). Looking at Solr from that point of view makes it more approachable as a primary data store.
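A minimal sketch of that idea, assuming a hypothetical field named `_src_` declared in the schema as stored but not indexed: each document sent to Solr carries a verbatim JSON copy of its input, and a full reindex rebuilds the inputs from that copy instead of from an external source. The function names are made up for illustration, and the HTTP side (querying the old collection, ideally paging with cursorMark, then posting to the new one) is left out.

```python
import json
from typing import Any, Dict, Iterable, Iterator

# Hypothetical field name; assumed to be declared in the schema with
# stored="true" indexed="false", so it is kept but never searched.
SRC_FIELD = "_src_"


def to_solr_doc(source: Dict[str, Any]) -> Dict[str, Any]:
    """Build the document to send to Solr: the normal fields plus a
    verbatim JSON copy of the original input (source is not mutated)."""
    doc = dict(source)
    doc[SRC_FIELD] = json.dumps(source, sort_keys=True)
    return doc


def rebuild_inputs(stored_docs: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Recover the original inputs from documents read back out of Solr
    (e.g. fetched with fl=_src_), ready to be re-sent to a new index."""
    for doc in stored_docs:
        if SRC_FIELD not in doc:
            raise ValueError(f"document {doc.get('id')!r} has no stored source")
        yield json.loads(doc[SRC_FIELD])
```

One consequence of this design: anything that later updates a document in place also has to rewrite the stored copy, otherwise the source you reindex from drifts away from what is actually in the index.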
On Fri, Apr 8, 2022, 1:53 PM dmitri maziuk <dmitri.maz...@gmail.com> wrote:

> On 2022-04-07 11:51 PM, Shawn Heisey wrote:
> ...
> > As I understand it, ES offers reindex capability by storing the entire
> > input document into a field in the index. Which means that the index
> > will be a lot bigger than it needs to be, which is going to affect
> > performance. If the field is not indexed, then the performance impact
> > may not be huge, but it will not be zero. And it wouldn't really
> > improve the speed of a full reindex, it just makes it possible to do a
> > reindex without an external data source.
> >
> > The same thing can be done with Solr, and it is something I would
> > definitely say needs to be part of any index design where Solr will be a
> > primary data store. That capability should be available in Solr, but I
> > do not think it should be enabled by default.
> >
>
> What would be the advantage over dumping the documents into a text file
> (xml, json) and doing a full re-import? In principle you could dump
> everything Solr needs into the file and only check if it's all there
> during the import; that plus the protocol overhead would be the only
> downside. And deleting the existing index will take a little extra time.
>
> The upside is we can stick the files into git and have versions, it
> should compress really well, we can clone it to off-site storage etc. etc.
>
> Dima
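Dima's dump-and-reimport alternative could look roughly like this sketch. It assumes JSON Lines as the dump format (one document per line) and uses hypothetical helper names; exporting from Solr and posting the documents back are not shown. Sorting by `id` and serializing with sorted keys makes repeated dumps of unchanged data byte-identical, which is what keeps git diffs small, and the loader only checks that each document carries the fields the schema needs.

```python
import json
from typing import Any, Dict, Iterable, List, Sequence, TextIO


def dump_docs(docs: Iterable[Dict[str, Any]], out: TextIO) -> int:
    """Write one JSON document per line, ordered by id with sorted keys,
    so dumps of unchanged data are identical and diff well in git."""
    rows = sorted(docs, key=lambda d: str(d["id"]))
    for doc in rows:
        out.write(json.dumps(doc, sort_keys=True, ensure_ascii=False) + "\n")
    return len(rows)


def load_docs(lines: Iterable[str], required: Sequence[str]) -> List[Dict[str, Any]]:
    """Read a dump back for a full re-import, failing fast if any document
    is missing a field the schema needs."""
    docs = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        doc = json.loads(line)
        missing = [name for name in required if name not in doc]
        if missing:
            raise ValueError(f"line {lineno}: missing fields {missing}")
        docs.append(doc)
    return docs
```

In use, the dump would go to a tracked file, e.g. `with open("solr-dump.jsonl", "w", encoding="utf-8") as f: dump_docs(docs, f)`, and be committed like any other file.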