There are two cases.

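Index has “well being”, query is “wellbeing”. This is solved by using a shingle
filter at index time, with the token separator set to empty so that adjacent
words are joined into a single term. That will make lots of nonsense compounds,
too, but they won’t match real queries. Well, almost never.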
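
As a rough sketch of what that does to the indexed terms (plain Python, not
Solr code; the function name is made up for illustration). In Solr itself this
would roughly correspond to ShingleFilterFactory with an empty tokenSeparator,
but check the behavior against your own version:

```python
def concatenated_bigrams(tokens):
    """Return the original tokens plus each adjacent pair joined with no separator."""
    pairs = [a + b for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + pairs


index_terms = concatenated_bigrams("health and well being".split())
# index_terms now contains "wellbeing", so the query "wellbeing" can match,
# alongside nonsense compounds such as "healthand" and "andwell".
```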

Index has “wellbeing”, query is “well being”. Best approach for this is synonym 
expansion at index time. Yes, you have to maintain that set of synonyms, but 
the list should grow to cover most cases quickly. These should be 
unidirectional mappings, like “wellbeing => well being”, with the synonym 
filter configured to keep the original term.
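
A minimal sketch of the idea, in plain Python rather than Solr configuration.
The mapping table and the function name are hypothetical; the point is only
that the split form is indexed alongside the original compound, at the same
starting position:

```python
# Hypothetical one-way mappings: compound -> split form.
SYNONYMS = {
    "wellbeing": ["well", "being"],
    "playspace": ["play", "space"],
}


def expand(tokens):
    """Return (position, term) pairs, keeping each original term and
    adding the split form of any compound found in SYNONYMS."""
    out = []
    pos = 0
    for tok in tokens:
        out.append((pos, tok))
        parts = SYNONYMS.get(tok.lower())
        if parts:
            # The split words occupy consecutive positions starting where
            # the compound starts, so a phrase query can match them.
            for offset, part in enumerate(parts):
                out.append((pos + offset, part))
            pos += len(parts)
        else:
            pos += 1
    return out


indexed = expand(["employee", "wellbeing", "program"])
```

With this, a document containing “wellbeing” carries both “wellbeing” and the
adjacent pair “well”, “being”, so either form of the query can find it.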

This is what I did at Netflix back when Solr was new (version 1.3). The 
synonyms covered “superman”, “babysitter”, “manhunt”, “fullmetal”, etc. The 
last was for “Full Metal Jacket” and “Fullmetal Alchemist”. There were about 
300 synonyms.

You might also need to consider hyphenated versions, like “Spider-man”.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Aug 15, 2023, at 10:25 PM, Tim Casey <tca...@gmail.com> wrote:
> 
> Index all bigrams.  If you use a dictionary then there is a lot of work to
> maintain it.  Also this does not translate well to other languages.  The
> downside to this is having partial token hits which decrease precision.
> But, usually people who are looking for "well being" or "wellbeing" will
> not expect to look for 'well*' in documents.  You would have to measure the
> results in your data.  An obvious example would be first and last names.
> 
> For every stream of tokens: t1 t2 t3...tn, you would index t1t2
> t2t3...tn-1tn as well as the normal tokens.  Index them into a separate
> non-stored field to allow control at query time.
> 
> On Tue, Aug 15, 2023 at 8:08 PM Ramkumar Krishnamoorthy <
> ramkumar1...@gmail.com> wrote:
> 
>> Hi All,
>> 
>> I am struggling to find the right filter that can make it work for search
>> queries like "well being" and "play space" to be able to match terms like
>> wellbeing and playspace in documents.
>> 
>> Tried to make it work with wordDelimiterGraph. But that only works if the
>> word in the document is "WellBeing". Another option I am considering is
>> using DictionaryCompoundWordTokenFilterFactory but I need to find a
>> dictionary file for English that I can pass to it..
>> 
>> Any suggestions on how this can be handled?
>> 
>> Thanks,
>> Kumar
>> 
