Hi,

A best practice for performance and resource usage is to store, index,
and/or enable docValues for only the data required by your search features.
However, in order to implement new features or modify existing ones in an
index, you will need to reindex all the data in that index.

I propose two solutions:

   - The first one is to store the full original JSON document in the _src_
   field of the index (see the first sketch below).

   
https://solr.apache.org/guide/8_11/transforming-and-indexing-custom-json.html#setting-json-default


   - The second, and in my opinion the best, solution is to store the JSON
   data in an intermediate, feature-neutral data store: simple files on a
   file system or, better, a MongoDB database. This approach lets you use
   your data in several indexes (one index for search, one index for
   suggesters, ...) without duplicating it in an _src_ field in each index.
   A UUID stored in each index lets you retrieve the full JSON object from
   MongoDB (see the second sketch below).
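
For the first solution, a minimal sketch in Python (assuming a local Solr
8.x at http://localhost:8983, a collection named "my_collection", and a
paramset name "store_src" chosen only for illustration; the _src_ field must
be declared in the schema as a stored, non-indexed string) could look like:

import json
import requests

SOLR = "http://localhost:8983/solr/my_collection"

# 1) Register default parameters for /update/json/docs so that the original,
#    untransformed JSON document is kept in the "_src_" field, the uniqueKey
#    is mapped to its schema field, and the other values are indexed into
#    the default search field.
params_cmd = {
    "set": {
        "store_src": {
            "srcField": "_src_",
            "mapUniqueKeyOnly": True,
            "df": "text"
        }
    }
}
requests.post(f"{SOLR}/config/params",
              headers={"Content-type": "application/json"},
              data=json.dumps(params_cmd)).raise_for_status()

# 2) Index an arbitrary JSON document using that paramset.
doc = {"id": "doc-1", "title": "Solr as a data store", "body": "..."}
requests.post(f"{SOLR}/update/json/docs",
              params={"useParams": "store_src", "commit": "true"},
              headers={"Content-type": "application/json"},
              data=json.dumps(doc)).raise_for_status()

Re-reading the _src_ field of every document later gives back the original
JSON, which can be re-posted to a new or modified index without re-crawling.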

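For the second solution, a minimal sketch (assuming MongoDB on
localhost:27017 and a Solr collection named "search_index"; the database,
collection, and field names are only illustrative) could look like:

import json
import uuid

import requests
from pymongo import MongoClient

SOLR = "http://localhost:8983/solr/search_index"
mongo = MongoClient("mongodb://localhost:27017")
store = mongo["crawler"]["documents"]  # feature-neutral store of the full JSON


def ingest(raw_doc: dict) -> str:
    """Store the full JSON in MongoDB, index only the search fields in Solr."""
    doc_id = str(uuid.uuid4())
    store.insert_one({"_id": doc_id, "json": raw_doc})

    # Only the fields needed by the search features are sent to Solr.
    solr_doc = {"id": doc_id,
                "title": raw_doc.get("title"),
                "body": raw_doc.get("body")}
    requests.post(f"{SOLR}/update",
                  params={"commit": "true"},
                  headers={"Content-type": "application/json"},
                  data=json.dumps([solr_doc])).raise_for_status()
    return doc_id


def fetch_full(doc_id: str) -> dict:
    """Given a UUID from a Solr search hit, retrieve the full JSON object."""
    return store.find_one({"_id": doc_id})["json"]

Reindexing into a new or modified index (for search, suggesters, ...) is
then a plain scan of the MongoDB collection, re-posting whatever fields the
new schema needs, without re-crawling.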

Obviously a key point is the backup strategy for your data store, depending
on the solution you choose: the Solr indexes, the file system, or the
MongoDB database.

Dominique

On Mon, Apr 4, 2022 at 1:53 PM, Srijan <shree...@gmail.com> wrote:

> Hi All,
>
> I am working on designing a Solr based enterprise search solution. One
> requirement I have is to track crawled data from various different data
> sources with metadata like crawled date, indexing status and so on. I am
> looking into using Solr itself as my data store and not adding a separate
> database to my stack. Has anyone used Solr as a dedicated data store? How
> did it compare to an RDBMS? I see Lucidworks Fusion has a notion of Crawl
> DB - can someone here share some insight into how Fusion is using this
> 'DB'? My store will need to track millions of objects and be able to handle
> parallel adds/updates. Do you think Solr is a good tool for this or am I
> better off depending on a database service?
>
> Thanks a bunch.
>