Re: Multiply connected data search

Nikola Smolenski Thu, 26 Dec 2024 03:17:57 -0800

I agree, Solr should not be used as the primary data store. However, it
would still be handy to retrieve as much information as possible in a
single query.


I am experimenting with a solution where every Solr document has an
"otherids" multivalued field: books hold the ids of all the authors who
contributed, authors hold the ids of all the books they authored, and
every document includes its own id in the list. Everything can then be
extracted using a single join query.
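
A minimal sketch of such a join query, assuming Solr's join query parser
({!join from=... to=...}) and a unique key field named "id". The helper
names, the sample documents, and the in-memory simulation are hypothetical
and only illustrate the intended behaviour; nothing here was run against a
real Solr instance.

```python
from urllib.parse import urlencode


def build_join_params(user_query, rows=50):
    """Build /select parameters for a join over the "otherids" field.

    Assumes the unique key field is "id" and that every document lists
    its own id in "otherids", as described above.
    """
    return {
        "q": "{!join from=otherids to=id}" + user_query,
        "rows": rows,
    }


def simulate_join(docs, matches):
    """In-memory model of the join: gather "otherids" from the documents
    that match, then return every document whose id is in that set."""
    wanted = set()
    for doc in docs:
        if matches(doc):
            wanted.update(doc["otherids"])
    return [doc for doc in docs if doc["id"] in wanted]


# Hypothetical sample data: two books sharing one author, plus an
# unrelated author.
docs = [
    {"id": "b1", "title": "Book One", "otherids": ["b1", "a1"]},
    {"id": "a1", "name": "Author A", "otherids": ["a1", "b1", "b2"]},
    {"id": "b2", "title": "Book Two", "otherids": ["b2", "a1"]},
    {"id": "a2", "name": "Author B", "otherids": ["a2"]},
]

# Query string that would be appended to a /select URL.
print(urlencode(build_join_params('title:"Book One"')))

# The matching book and its author come back together.
result = simulate_join(docs, lambda d: d.get("title") == "Book One")
print([d["id"] for d in result])  # ['b1', 'a1']
```

In this model the join follows a single hop: the matched book and its
author are returned, but the author's other books are not.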

Does anyone see any drawbacks to this solution?

On Tue, Dec 24, 2024 at 6:22 PM Walter Underwood <wun...@wunderwood.org>
wrote:

> Do not use Solr as your primary data store. Solr is not a database. Put
> your data in a relational database where it is easy to track all those
> relationships and update them correctly.
>
> Extract the needed fields and load them into Solr.
>
> This can be a daily full dump and load job. That is what I did at Chegg
> with millions of books. That is simple and fast, should be under an hour
> for the whole job.
>
> An alternative to the all-in-one _text_ field is to use edismax and give
> different weights to the different fields. Something like this, with higher
> weighting for phrase matches.
>
> <qf>title^4 authors</qf>
> <pf>title^8 authors^2</pf>
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Dec 23, 2024, at 10:07 PM, Nikola Smolenski <smolen...@unilib.rs> wrote:
> >
> > Thank you for the suggestion, but that wouldn't work because there could
> > be multiple authors with the same name, who differ only by ID. If I were
> > to change the name of an author, I wouldn't know which one I should
> > change and which one should stay. Additionally, there could be further
> > author information, such as external identifiers, that needs to be
> > connected to the author.
> >
> > On Mon, Dec 23, 2024 at 11:07 PM Dmitri Maziuk <dmitri.maz...@gmail.com>
> > wrote:
> >
> >> On 12/23/24 15:49, Nikola Smolenski wrote:
> >> ...
> >>> About the only way of doing this I can think of is to perform the
> >>> search, get all the found books and authors, then perform another
> >>> query that fetches all the books and authors referenced by any of the
> >>> books or authors in the first query. Is there a smarter way of doing
> >>> this? What are the best practices?
> >>>
> >>
> >> A book is a "document" that has a title and authors as separate fields.
> >> Documents usually also have a "big search" field, called _text_ in the
> >> default config.
> >>
> >> Copy both author list and title into _text_, search in _text_, facet on
> >> authors and/or titles.
> >>
> >> Dima
> >>
> >>
>
>
