Re: Regarding indexing data in different cores or same core with different entities.

Thomas Corthals Mon, 11 Apr 2022 02:33:36 -0700

We have a similar setup where entities of different types all go in a
single core. In our case, folding, stemming and managed synonyms have to be
the same for all entity types, so I find it easier to keep only one schema
up to date with business needs. Adding a new entity type to the index can
usually be done entirely in code. The logic for managing synonyms through
the REST API doesn't even have to be aware of these changes.


Our schema has 3 required fields:

   - uid is the uniqueKey (a combination of type + id) and is used for
   atomic updates and delete-by-id
   - type is a string field that is used in filter queries
   - id is the (usually autoincrement) identifier from the database and is
   what most queries retrieve
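
A minimal sketch of how the uid can be built and used in JSON update
payloads. The hyphen separator and the helper names are assumptions for
illustration (all I said above is that uid combines type + id); the
payload shapes follow Solr's JSON update format.

```python
import json


def make_uid(entity_type, db_id):
    # Assumption: "type-id" joined with a hyphen; any unambiguous scheme works.
    return f"{entity_type}-{db_id}"


def atomic_update(entity_type, db_id, changes):
    # Atomic update: address the document by its uniqueKey (uid) and
    # wrap each changed field in a {"set": value} modifier.
    doc = {"uid": make_uid(entity_type, db_id)}
    for field, value in changes.items():
        doc[field] = {"set": value}
    return [doc]


def delete_by_id(entity_type, db_id):
    # Delete-by-id: the "id" key of the delete command takes the
    # uniqueKey value, which is the uid here, not the database id.
    return {"delete": {"id": make_uid(entity_type, db_id)}}


if __name__ == "__main__":
    # "product" and "title_t" are hypothetical names.
    print(json.dumps(atomic_update("product", 42, {"title_t": "New title"})))
    print(json.dumps(delete_by_id("product", 42)))
```

Either payload would be POSTed to the core's /update handler.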

All other fields are dynamic. It's very rare that I have had to add a
dynamic field definition because most of our data is either natural
language text, verbatim strings, dates or integers that are foreign keys in
the database.
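
As a rough illustration of "adding an entity type entirely in code", the
sketch below maps a database row to a Solr document using dynamic field
suffixes. The suffixes (_t, _s, _dt, _i) are the ones from Solr's default
configset and are an assumption here, as are the function and column
names; your own wildcards may differ.

```python
from datetime import datetime

# Assumed dynamic field suffixes (Solr default configset conventions).
SUFFIX_BY_KIND = {"text": "_t", "string": "_s", "date": "_dt", "int": "_i"}


def to_solr_doc(entity_type, row, kinds):
    """Build a Solr document from a DB row.

    kinds maps column name -> one of the keys in SUFFIX_BY_KIND.
    """
    doc = {
        "uid": f"{entity_type}-{row['id']}",  # assumed uid scheme
        "type": entity_type,
        "id": row["id"],
    }
    for column, kind in kinds.items():
        value = row.get(column)
        if value is None:
            continue  # leave empty columns out of the document
        if kind == "date":
            # Solr date fields expect ISO 8601 in UTC; the datetime is
            # assumed to already be in UTC.
            value = value.strftime("%Y-%m-%dT%H:%M:%SZ")
        doc[column + SUFFIX_BY_KIND[kind]] = value
    return doc


if __name__ == "__main__":
    row = {"id": 7, "title": "Hello", "published": datetime(2022, 4, 11, 9, 33, 36), "author_id": 3}
    kinds = {"title": "text", "published": "date", "author_id": "int"}
    print(to_solr_doc("book", row, kinds))
```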

Another benefit of this approach is that you can query "across tables" on a
common field and get facet counts per entity type.
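
A sketch of what such a query could look like, built as request
parameters for the /select handler. The common field name (name_t) and
the entity types are hypothetical; q, df, fl, fq, facet, facet.field and
facet.mincount are standard Solr parameters.

```python
from urllib.parse import urlencode


def cross_type_params(text, common_field, types=None):
    # Search one common field across all entity types and facet on "type"
    # to get a count per entity type.
    params = [
        ("q", text),
        ("df", common_field),
        ("fl", "id,type"),
        ("facet", "true"),
        ("facet.field", "type"),
        ("facet.mincount", "1"),
    ]
    if types:
        # Optional filter query to restrict results to some entity types.
        params.append(("fq", "type:(" + " OR ".join(types) + ")"))
    return params


if __name__ == "__main__":
    print(urlencode(cross_type_params("solr", "name_t")))
    print(urlencode(cross_type_params("solr", "name_t", ["product", "author"])))
```

Without the fq this searches every entity type at once; with it, the same
core answers "per table" queries.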

That's what works for us. You might be better off with a core (collection)
per table if you don't need to facet across tables, if you want to define
each field explicitly so that field names in Solr don't have to change to
match a dynamic field wildcard, if you have specific schema requirements
per table, or if you don't have to bother with managed synonyms.

Thomas

On Mon, 11 Apr 2022 at 02:31, Walter Underwood <wun...@wunderwood.org> wrote:

> I would make four cores (collections). With a single one, the schema is a
> union of all of the tables, so a mess to manage. There will be lots of
> comments about which field belongs to which table.
>
> Make four collections with four schemas that match the four tables. You
> can load them independently and update the schemas independently.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Apr 10, 2022, at 4:56 PM, David Hastings <hastings.recurs...@gmail.com> wrote:
> >
> > The different field types for the same named field I didn’t take into
> > account.  That would throw a wrench into it if one table wanted facets
> > on a field but the other just wanted text searching on the same field
> > name for example.
> >
> > Guess without context the question becomes more difficult to answer, such
> > as is it one client for all the tables, or one each, or…. Why even use a
> > rdbms if there is no r in it in the first place?
> >
> > On Sun, Apr 10, 2022 at 7:51 PM Shawn Heisey <apa...@elyograg.org> wrote:
> >
> >> On 4/10/2022 2:31 PM, Neha Gupta wrote:
> >>> I want to query these tables differently in Solr as they don't have
> >>> any relation between them. Could you please tell me whether I should
> >>> create different cores for each table or index them in one
> >>> core with different entities? If the latter is the case, how can I
> >>> query Solr on the basis of entity?
> >>
> >> I'm going to disagree with Saurabh Sharma here.
> >>
> >> If there truly is no relationship between the data in those tables, then
> >> I would index them as separate cores, or collections if running in cloud
> >> mode.  The configurations will be cleaner, and there is much less chance
> >> of a change in one table causing general problems for a combined core
> >> ... those effects would be limited to the core for the table that is
> >> changing.
> >>
> >> If you can use the same fields for data coming from multiple tables,
> >> there is a certain amount of space savings that can be realized by
> >> having one index instead of four, due to the way that Lucene file
> >> formats work.  For most setups, that space savings will be very small
> >> compared to the problems that you can avoid by not combining the data.
> >>
> >> The only time it makes sense to have the data from multiple database
> >> tables in the same core is when there is a definite relationship between
> >> the tables.  If you use JOIN queries on the DB server on a regular
> >> basis, and that extends to searching as well, performance in Solr will
> >> be better if Solr does not have to do a JOIN to accomplish its work.
> >> The cross-core join capability in Solr is fairly limited and is NOT what
> >> someone familiar with database joins would expect, particularly in the
> >> arena of performance.
> >>
> >> As mentioned by Dave, if you do combine the data, you will want to have
> >> at least one field indexed that can filter results as you need them
> >> filtered.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
>
