There are two cases.

Index has “well being”, query is “wellbeing”. This is solved by using a shingle filter with an empty token separator, so each pair of adjacent words is also indexed as one joined term. That will make lots of nonsense compounds, too, but they won’t match real queries. Well, almost never.
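
Here is a rough sketch of the shingle idea in plain Python, not Solr configuration. The function names are made up for illustration, and the tokenizer is just lowercase plus whitespace split. It joins each adjacent pair of tokens with no separator and indexes those alongside the normal tokens.

```python
def concatenated_bigrams(tokens):
    """Join each adjacent pair of tokens with no separator (hypothetical helper)."""
    return [first + second for first, second in zip(tokens, tokens[1:])]


def index_terms(text):
    """Return the normal tokens plus the joined bigrams for one field value."""
    tokens = text.lower().split()  # assumption: trivial whitespace tokenizer
    return tokens + concatenated_bigrams(tokens)


terms = index_terms("Tips for well being at work")

# The joined pair makes the one-word query match the two-word document.
print("wellbeing" in terms)

# Nonsense compounds are indexed too, but real queries rarely ask for them.
print("forwell" in terms)
```

Both checks print True: the useful compound and the nonsense one are indexed side by side.
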
Index has “wellbeing”, query is “well being”. Best approach for this is synonym expansion at index time. Yes, you have to maintain that set of synonyms, but the list should grow to cover most cases quickly. These should be unidirectional mappings, like “wellbeing => well being”, with the synonym filter configured to keep the original term.

This is what I did at Netflix back when Solr was new (version 1.3). The synonyms covered “superman”, “babysitter”, “manhunt”, “fullmetal”, etc. The last was for “Full Metal Jacket” and “Fullmetal Alchemist”. There were about 300 synonyms.

You might also need to consider hyphenated versions, like “Spider-man”.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Aug 15, 2023, at 10:25 PM, Tim Casey <tca...@gmail.com> wrote:
>
> Index all bigrams. If you use a dictionary then there is a lot of work to
> maintain it. Also this does not translate well to other languages. The
> downside to this is having partial token hits which decrease precision.
> But, usually people who are looking for "well being" or "wellbeing" will
> not expect to look for 'well*' in documents. You would have to measure the
> results in your data. An obvious example would be first and last names.
>
> For every stream of tokens: t1 t2 t3...tn, you would index t1t2
> t2t3...tn-1tn as well as the normal tokens. Index them into a separate
> non-stored field to allow control at query time.
>
> On Tue, Aug 15, 2023 at 8:08 PM Ramkumar Krishnamoorthy <
> ramkumar1...@gmail.com> wrote:
>
>> Hi All,
>>
>> I am struggling to find the right filter that can make it work for search
>> queries like "well being" and "play space" to be able to match terms like
>> wellbeing and playspace in documents.
>>
>> Tried to make it work with wordDelimiterGraph. But that only works if the
>> word in the document is "WellBeing". Another option I am considering is
>> using DictionaryCompoundWordTokenFilterFactory but I need to find a
>> dictionary file for English that I can pass to it.
>>
>> Any suggestions on how this can be handled?
>>
>> Thanks,
>> Kumar
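
For the second case in the reply at the top (index has “wellbeing”, query is “well being”), index-time synonym expansion can be sketched the same way in plain Python. The mapping table, function names, and sample text are illustrative assumptions, not the original Netflix list or a Solr API. The sketch ignores token positions, so it only shows term matching, not phrase matching.

```python
# Hypothetical one-way mappings: compound on the left, split form on the right.
SYNONYMS = {
    "wellbeing": ["well", "being"],
    "fullmetal": ["full", "metal"],
}


def expand_at_index_time(tokens):
    """Keep every original token and append its split form when one is mapped."""
    expanded = []
    for token in tokens:
        expanded.append(token)                    # keep the original term
        expanded.extend(SYNONYMS.get(token, []))  # add the one-way expansion
    return expanded


def matches(query, indexed_terms):
    """True when every query token is present in the indexed terms."""
    return all(token in indexed_terms for token in query.lower().split())


indexed = expand_at_index_time("employee wellbeing program".split())

print(matches("well being", indexed))  # two-word query finds the compound
print(matches("wellbeing", indexed))   # original term still matches
```

Because the mapping is one-way, a document containing only “well” does not pick up “wellbeing”; that direction is left to the shingle approach.
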