Re: Solr as a dedicated data store?

James Greene Tue, 05 Apr 2022 06:26:08 -0700

The 'no' response is traditional and a bit dated.  If you have proper
backups/snapshots happening, it is totally plausible to use Solr (Lucene) as
a primary data store. If you need field/config changes, you can build a new
collection from an existing collection, doing the field transforms on the
fly.
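
A minimal sketch of that kind of collection-to-collection copy, assuming
every field you need is stored (or has docValues) so it can be read back
from the source collection. The names here (RENAMES, transform, reindex,
send, and the crawl_date/index_status fields) are hypothetical and only
for illustration. Fetching from the source (e.g. /select with cursorMark)
and posting to the target (/update) are left to the caller.

```python
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, object]

# Hypothetical mapping for illustration: old field name -> new field name.
RENAMES = {"crawl_date": "crawled_at_dt"}


def transform(doc: Doc) -> Doc:
    """Return a copy of a stored source document reshaped for the target schema."""
    out: Doc = {}
    for name, value in doc.items():
        if name == "_version_":
            # Drop Solr's internal version field so the target assigns its own.
            continue
        out[RENAMES.get(name, name)] = value
    # Example of a value transform: normalise the status to lower case.
    status = out.get("index_status")
    if isinstance(status, str):
        out["index_status"] = status.lower()
    return out


def reindex(source_docs: Iterable[Doc],
            send: Callable[[List[Doc]], None],
            batch_size: int = 500) -> int:
    """Transform documents from the source and hand them to `send` in batches."""
    batch: List[Doc] = []
    total = 0
    for doc in source_docs:
        batch.append(transform(doc))
        if len(batch) >= batch_size:
            send(batch)
            total += len(batch)
            batch = []
    if batch:
        # Flush the final partial batch.
        send(batch)
        total += len(batch)
    return total
```

In practice `send` would wrap an HTTP POST to the target collection's
update handler; here it is just a callable so the transform logic can be
tried without a running Solr.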


There are a growing number of products built on Lucene/Elastic that act as
a primary datastore. There is no reason Solr can't be used the same way,
apart from the core devs' slow response to bugs/documentation, but that's a
reason to question using Solr at all.

Like all software solutions your system should be designed with redundancy
and resiliency.

Good Luck!

On Tue, Apr 5, 2022, 12:44 AM Tim Casey <tca...@gmail.com> wrote:

> Srijan,
>
> Comments off the top of my head, so buyer beware.
>
> Almost always you want to be able to reindex your data from a 'source'.
> This makes things like indexes not good as a data store, or a source of
> truth.  The reasons for this vary.  Indexes age out data because there is
> frequently a weight towards more recent items, indexes need to be reindexed
> to pick up new info or to recover from issues during indexing/processing,
> and the list goes on.
>
> I have built an index data POJO store in lucene a *long* time ago.  It is
> doable to hydrate a stored object into a language level object, such as a
> java object instance.  It is fairly straightforward to data model from a
> 'common' type of data model into an index as a data model.  But, it is not
> quite the same query expectations and so on.  It is not that far, but
> again, this is not what the primary focus of an inverted index is.  The
> primary focus is to take unstructured language data and return results in a
> hopefully well ordered list.
>
> So, the first thing you might do is treat the different sources of data as
> different clusters with a different topology.  You might stripe the data
> less and have it be more nodes than you might otherwise because you will do
> less indexing with it than you might with a normal index.  Once you make a
> decision to separate out the data, then you have to contend with two
> different indexes having references to the same 'documents' with some id to
> tie them together and you would lose the ability to do any form of in-index
> join using document ids.  If you keep all the data in the same index, then
> you might be in a situation where the common answer is reindex and you
> would not know what to do about the "metadata".
>
> I strongly suspect what you want is a way to maintain the metadata
> within the index and simply use it along with the documents.  As you
> spider, keep the info about the document with the
> document contents.  I cannot think of a reason to keep all of the data in a
> kinda weird separate space.    If you want to be more sophisticated, then
> you can build an ETL which takes documents, forms indexable units, and stores
> the indexable units for reindexing.  This is usually pretty quick and
> separates out the crawling, ETL and indexing/query pieces, for all that
> means.   This is more complicated, but would be a bit more standard in how
> people think about it.
>
> tim
>
>
>
> On Mon, Apr 4, 2022 at 7:32 PM Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 4/4/2022 5:52 AM, Srijan wrote:
> > > I am working on designing a Solr based enterprise search solution. One
> > > requirement I have is to track crawled data from various different data
> > > sources with metadata like crawled date, indexing status and so on. I am
> > > looking into using Solr itself as my data store and not adding a separate
> > > database to my stack. Has anyone used Solr as a dedicated data store? How
> > > did it compare to an RDBMS?
> >
> > As you've been told, Solr is NOT a database.  It is most definitely not
> > equivalent in any way to an RDBMS.  If you want the kinds of things an
> > RDBMS is good for, you should use an RDBMS, not Solr.
> >
> > Handling ever-changing search requirements in Solr is typically going to
> > require the kinds of schema changes that need a full reindex.  So you
> > probably wouldn't be able to use the same Solr index for your data
> > storage as you do for searching anyway.
> >
> > If you're going to need to set up two Solr installs to handle your
> > needs, you should probably NOT use Solr for the storage role.  Use
> > something that has been tested and hardened against data loss. Solr does
> > do its best to never lose data, but guaranteed data durability is not
> > one of its design goals.  The changes that would be required to make
> > that guarantee would most likely have an extremely adverse effect on
> > search performance.
> >
> > Solr's core functionality has always been search.  Search is what it's
> > good at, and that's what will be optimized in future versions ... not
> > any kind of database functionality.
> >
> > Thanks,
> > Shawn
> >
> >
>
