Re: How to use MorelikeThis with duplicates

Dave Wed, 12 Apr 2023 08:52:08 -0700

The recent flag is super clever, and you can use it in other
applications/situations as well. I would do that in a heartbeat, assuming you
can reindex your data set quickly.
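A minimal sketch of the index-time recent flag discussed in this thread. It assumes each edition record carries a title shared by all editions and a publication year. The field names `work_title_s`, `pub_year_i` and `is_recent_b` are hypothetical and do not come from the original setup. Before indexing, each record gets `is_recent_b=True` only if it is the newest edition of its title. The MLT request can then add `fq=is_recent_b:true`.

```python
from typing import Any, Dict, List

# Hypothetical field names; adjust to the actual schema.
TITLE_FIELD = "work_title_s"
YEAR_FIELD = "pub_year_i"
FLAG_FIELD = "is_recent_b"

# Filter to add to the MLT request once the flag is indexed.
RECENT_FILTER = f"{FLAG_FIELD}:true"


def mark_recent_editions(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the records with FLAG_FIELD set to True only on the
    newest edition of each title (on a year tie, the first record seen wins)."""
    newest: Dict[str, int] = {}  # title -> index of the newest record seen so far
    for i, rec in enumerate(records):
        title = rec[TITLE_FIELD]
        if title not in newest or rec[YEAR_FIELD] > records[newest[title]][YEAR_FIELD]:
            newest[title] = i
    winners = set(newest.values())
    # dict(rec, **{...}) copies each record, so the input list is left untouched
    return [dict(rec, **{FLAG_FIELD: i in winners}) for i, rec in enumerate(records)]


if __name__ == "__main__":
    books = [
        {"id": "1", TITLE_FIELD: "Dune", YEAR_FIELD: 1965},
        {"id": "2", TITLE_FIELD: "Dune", YEAR_FIELD: 2005},
        {"id": "3", TITLE_FIELD: "Emma", YEAR_FIELD: 1815},
    ]
    for doc in mark_recent_editions(books):
        print(doc["id"], doc[FLAG_FIELD])  # 1 False / 2 True / 3 True
    print("fq =", RECENT_FILTER)
```

When a newer edition arrives, the flag on the older one flips. Every edition of an affected title therefore has to be reindexed, which is why fast reindexing matters for this approach. The newest edition of the seed book's own title can still match the MLT query. Excluding it would need an extra filter on the title.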


> On Apr 12, 2023, at 10:49 AM, Alessandro Benedetti <a.benede...@sease.io> 
> wrote:
> 
> Following up on Mikhail's good insights,
> I would probably recommend using the More Like This Query Parser followed
> by grouping/field collapsing on a field.
> It should solve your problem!
> 
> If your requirements are more advanced feel free to let us know!
> 
> Cheers
> --------------------------
> *Alessandro Benedetti*
> Director @ Sease Ltd.
> *Apache Lucene/Solr Committer*
> *Apache Solr PMC Member*
> 
> e-mail: a.benede...@sease.io
> 
> 
> *Sease* - Information Retrieval Applied
> Consulting | Training | Open Source
> 
> Website: Sease.io <http://sease.io/>
> LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
> <https://twitter.com/seaseltd> | Youtube
> <https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
> <https://github.com/seaseltd>
> 
> 
>> On Wed, 12 Apr 2023 at 13:15, Mikhail Khludnev <m...@apache.org> wrote:
>> 
>> Hello Tom.
>> It's not clear which kind of MLT you are referring to: handler, query parser,
>> or component.
>> Generally, there are two options for deduplication:
>> - query time: result grouping or field collapsing
>> - index time:
>>  - the MLT query might be limited to parent documents with titles, while
>> child documents carry the editions with dates and so on
>>  - or the MLT query can be filtered to the most recent edition of every
>> title; the recent flag would be set during indexing and then used in a
>> filter.
>> 
>>> On Wed, Apr 12, 2023 at 1:22 PM Tom Tailor <aloras2...@gmail.com> wrote:
>>> 
>>> Hi all,
>>>
>>> I want to build a recommender using Solr MoreLikeThis. I work on
>>> bibliographic data, i.e. books. I have multiple records of different
>>> editions of the same book. For a given book, MLT returns all the different
>>> editions of that book, which is not new content from the user's point of
>>> view.
>>> I cannot deduplicate the records because the different editions are
>>> relevant for other applications.
>>>
>>> Is it possible to circumvent this? I could use the book's title, which is
>>> the same across all editions, to filter duplicates from the MLT results.
>>>
>>> Thanks for your help
>>> 
>> 
>> 
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> https://t.me/MUST_SEARCH
>> A caveat: Cyrillic!
>>
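The query-time route suggested in the thread (MLT query parser followed by field collapsing) could be sketched as the Solr request parameters below. This is a sketch under assumptions. `subject_t` (text field used for similarity) and `work_title_s` (a single-valued string field holding the title shared by all editions) are hypothetical names. `mintf`/`mindf` are lowered only for illustration. The optional negative filter drops the other editions of the seed book, which collapsing alone would still return as one group head.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

Params = List[Tuple[str, str]]


def solr_phrase(value: str) -> str:
    """Quote a value as a Solr phrase, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def mlt_dedup_params(seed_id: str, seed_title: Optional[str] = None,
                     qf: str = "subject_t", collapse_field: str = "work_title_s",
                     rows: int = 10) -> Params:
    """Build request params: the MLT query parser finds documents similar to
    seed_id, and the collapse filter keeps one document per collapse_field value.
    Field names are hypothetical placeholders."""
    params: Params = [
        ("q", f"{{!mlt qf={qf} mintf=1 mindf=1}}{seed_id}"),
        ("fq", f"{{!collapse field={collapse_field}}}"),
        ("rows", str(rows)),
    ]
    if seed_title is not None:
        # Optionally drop every edition of the seed's own title as well.
        params.append(("fq", f"-{collapse_field}:{solr_phrase(seed_title)}"))
    return params


if __name__ == "__main__":
    p = mlt_dedup_params("42", seed_title="Dune")
    # Append to e.g. http://localhost:8983/solr/<collection>/select?
    print(urlencode(p))
```

By default, collapsing keeps the highest-scoring document of each group. Result grouping (`group=true&group.field=work_title_s`) is the alternative Mikhail mentions. It returns grouped rather than flat results, which may be less convenient for a recommender list.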
