The 'no' response is traditional and a bit dated. If you have proper backups/snapshots happening, it is totally plausible to use Solr (Lucene) as a primary data store. If you need field/config changes, you can import a collection from an existing collection, doing the field transforms on the fly.
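A rough sketch of what such a collection-to-collection import could look like, assuming a Solr instance at localhost:8983, a uniqueKey field named `id`, and a made-up field rename (`crawl_date` to `crawled_at`). It pages through the source with `cursorMark` and posts to the target's JSON update handler. Only stored fields (or docValues returned as stored) come back from `/select`, so this only works if everything you need is retrievable.

```python
import json
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr"  # assumed base URL, adjust for your install


def transform(doc):
    """Hypothetical mapping from the old schema to the new one."""
    # Drop _version_ so the target does not apply optimistic concurrency checks.
    out = {k: v for k, v in doc.items() if k != "_version_"}
    # Example field change (made up): rename crawl_date to crawled_at.
    if "crawl_date" in out:
        out["crawled_at"] = out.pop("crawl_date")
    return out


def fetch_page(collection, cursor, rows=500):
    """Read one page of documents using cursorMark deep paging."""
    params = urllib.parse.urlencode({
        "q": "*:*", "sort": "id asc", "rows": rows,
        "cursorMark": cursor, "wt": "json",
    })
    with urllib.request.urlopen(f"{SOLR}/{collection}/select?{params}") as resp:
        body = json.load(resp)
    return body["response"]["docs"], body["nextCursorMark"]


def send_batch(collection, docs):
    """Post a batch of documents to the target collection as a JSON array."""
    req = urllib.request.Request(
        f"{SOLR}/{collection}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).close()


def copy_collection(source, target, fetch=fetch_page, send=send_batch):
    """Copy every document from source to target, transforming on the fly."""
    cursor, copied = "*", 0
    while True:
        docs, next_cursor = fetch(source, cursor)
        if docs:
            send(target, [transform(d) for d in docs])
            copied += len(docs)
        if next_cursor == cursor:  # Solr returns the same cursor at the end
            return copied
        cursor = next_cursor
```

Calling `copy_collection("old_collection", "new_collection")` (both names are placeholders) would run the copy. Committing on every batch keeps the sketch short; a real run would more likely commit once at the end.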
There are a growing number of products built on Lucene/Elastic that act as a primary datastore. There is no reason Solr can't be used the same way, apart from the core devs' slow response to bugs/documentation, but that's a topic for questioning using Solr at all. Like all software solutions, your system should be designed with redundancy and resiliency.

Good Luck!

On Tue, Apr 5, 2022, 12:44 AM Tim Casey <tca...@gmail.com> wrote:

> Srijan,
>
> Comments off the top of my head, so buyer beware.
>
> Almost always you want to be able to reindex your data from a 'source'.
> This makes things like indexes not good as a data store, or a source of
> truth. The reasons for this vary. Indexes age out data because there is
> frequently a weight towards more recent items, indexes need to be reindexed
> for new info to index/issues during indexing/processing, and the list would
> go on.
>
> I have built an index data POJO store in Lucene a *long* time ago. It is
> doable to hydrate a stored object into a language level object, such as a
> Java object instance. It is fairly straightforward to data model from a
> 'common' type of data model into an index as a data model. But, it is not
> quite the same query expectations and so on. It is not that far, but
> again, this is not what the primary focus of an inverted index is. The
> primary focus is to take unstructured language data and return results in a
> hopefully well ordered list.
>
> So, the first thing you might do is treat the different sources of data as
> different clusters with a different topology. You might stripe the data
> less and have it be more nodes than you might otherwise because you will do
> less indexing with it, than you might a normal index. Once you make a
> decision to separate out the data, then you have to contend with two
> different indexes having references to the same 'documents' with some id to
> tie them together and you would lose the ability to do any form of in-index
> join using document ids.
> If you keep all the data in the same index, then
> you might be in a situation where the common answer is reindex and you
> would not know what to do about the "metadata".
>
> I strongly suspect what you want is to have a way to maintain the
> metadata within the index and use it simply as you would along with the
> documents. As you spider, keep the info about the document with the
> document contents. I cannot think of a reason to keep all of the data in a
> kinda weird separate space. If you want to be more sophisticated, then
> you can build an ETL which takes documents and forms indexable units, then
> store the indexable units for reindexing. This is usually pretty quick and
> separates out the crawling, ETL and indexing/query pieces, for all that
> means. This is more complicated, but would be a bit more standard in how
> people think about it.
>
> tim
>
> On Mon, Apr 4, 2022 at 7:32 PM Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 4/4/2022 5:52 AM, Srijan wrote:
> > > I am working on designing a Solr based enterprise search solution. One
> > > requirement I have is to track crawled data from various different data
> > > sources with metadata like crawled date, indexing status and so on. I am
> > > looking into using Solr itself as my data store and not adding a separate
> > > database to my stack. Has anyone used Solr as a dedicated data store? How
> > > did it compare to an RDBMS?
> >
> > As you've been told, Solr is NOT a database. It is most definitely not
> > equivalent in any way to an RDBMS. If you want the kinds of things an
> > RDBMS is good for, you should use an RDBMS, not Solr.
> >
> > Handling ever-changing search requirements in Solr is typically going to
> > require the kinds of schema changes that need a full reindex. So you
> > probably wouldn't be able to use the same Solr index for your data
> > storage as you do for searching anyway.
> >
> > If you're going to need to set up two Solr installs to handle your
> > needs, you should probably NOT use Solr for the storage role. Use
> > something that has been tested and hardened against data loss. Solr does
> > do its best to never lose data, but guaranteed data durability is not
> > one of its design goals. The changes that would be required to make
> > that guarantee would most likely have an extremely adverse effect on
> > search performance.
> >
> > Solr's core functionality has always been search. Search is what it's
> > good at, and that's what will be optimized in future versions ... not
> > any kind of database functionality.
> >
> > Thanks,
> > Shawn
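
For what it's worth, the "indexable units" idea from Tim's message quoted above can be pictured roughly like this. It is only a sketch: the field names (`source`, `crawled_at`, `index_status`), the id scheme, and the JSON Lines file are all assumptions chosen for illustration. Each unit carries the crawl metadata next to the content, and the stored file is what you replay when a reindex is needed.

```python
import json
from datetime import datetime, timezone


def to_indexable_unit(source, url, text, crawled_at=None):
    """Bundle crawled content with its tracking metadata in one record."""
    crawled_at = crawled_at or datetime.now(timezone.utc)
    return {
        "id": f"{source}:{url}",  # hypothetical id scheme
        "source": source,
        "content": text,
        # ISO 8601 in UTC with a trailing Z
        "crawled_at": crawled_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "index_status": "pending",
    }


def store_units(path, units):
    """Append units to a JSON Lines file, the durable copy kept for reindexing."""
    with open(path, "a", encoding="utf-8") as fh:
        for unit in units:
            fh.write(json.dumps(unit) + "\n")


def load_units(path):
    """Yield stored units back, e.g. to feed a fresh index after a schema change."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)
```

The crawler would call `to_indexable_unit` and `store_units`; the indexer would read from `load_units`, so crawling, ETL and indexing stay separate pieces.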