Do not use Solr as your primary data store. Solr is not a database. Put your 
data in a relational database where it is easy to track all those relationships 
and update them correctly. 

Extract the needed fields and load them into Solr.

This can be a daily full dump and load job. That is what I did at Chegg with 
millions of books. That approach is simple and fast; the whole job should take 
under an hour. 
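A minimal sketch of that extract step, with hypothetical field names: pull joined book/author rows from the relational store, flatten each into one Solr document carrying both the author names (for search) and the author IDs (stable database keys, so a hit traces back to the exact author row even when two authors share a name), then POST the batch to the core's /update handler.

```python
import json

def to_solr_doc(row):
    """Flatten one joined book/author row into a Solr document.
    Field names here are illustrative, not from a real schema."""
    return {
        "id": f"book-{row['book_id']}",
        "title": row["title"],
        "authors": [a["name"] for a in row["authors"]],    # searchable names
        "author_ids": [a["id"] for a in row["authors"]],   # stable DB keys
    }

def build_update_payload(rows):
    """JSON body for e.g. POST http://localhost:8983/solr/books/update?commit=true
    (core name assumed; adjust to your install)."""
    return json.dumps([to_solr_doc(r) for r in rows])
```

The update itself can be a single HTTP POST of that payload with Content-Type application/json; rerunning the whole dump nightly sidesteps incremental-update bookkeeping entirely.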

An alternative to the all-in-one _text_ field is to use edismax and give 
different weights to the different fields. Something like this, with higher 
weighting for phrase matches.

<qf>title^4 authors</qf>
<pf>title^8 authors^2</pf>
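In context, those would typically go in the defaults of a request handler in solrconfig.xml; a sketch, assuming the field names above:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^4 authors</str>
    <str name="pf">title^8 authors^2</str>
  </lst>
</requestHandler>
```

The same params can also be passed per-request (defType=edismax&qf=...&pf=...) while you tune the weights.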

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 23, 2024, at 10:07 PM, Nikola Smolenski <smolen...@unilib.rs> wrote:
> 
> Thank you for the suggestion, but that wouldn't work because there could be
> multiple authors with the same name, who differ only by ID. If I were to
> change the name of an author, I wouldn't know which one I should change and
> which one should stay. Additionally, there could be additional author
> information, such as external identifiers, that needs to be connected to
> the author.
> 
> On Mon, Dec 23, 2024 at 11:07 PM Dmitri Maziuk <dmitri.maz...@gmail.com>
> wrote:
> 
>> On 12/23/24 15:49, Nikola Smolenski wrote:
>> ...
>>> About the only way of doing this I can think of is to perform the search,
>>> get all the found books and authors, then perform another query that
>>> fetches all the books and authors referenced by any of books or authors
>> in
>>> the first query. Is there a smarter way of doing this? What are the best
>>> practices?
>>> 
>> 
>> A book is a "document" that has a title and authors as separate fields.
>> Documents usually also have a "big search" field, called _text_ in the
>> default config.
>> 
>> Copy both author list and title into _text_, search in _text_, facet on
>> authors and/or titles.
>> 
>> Dima
>> 
>> 
