Re: Phonetic search

Christian Havel Wed, 07 Jul 2021 09:22:10 -0700

Hi,
thanks for your reply. Below is my definition. If I understand correctly,
a request that searches a "*_txt_de" field (type "text_de") or a
"text_general" field should then work with the following definitions?

<dynamicField name="*_txt_de" type="text_de" indexed="true" stored="true"/>
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.GermanLightStemFilterFactory"/>
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC" ruleType="APPROX" concat="true" languageSet="auto"/>
  </analyzer>
</fieldType>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC" ruleType="APPROX" concat="true" languageSet="auto"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC" ruleType="APPROX" concat="true" languageSet="auto"/>
  </analyzer>
</fieldType>

Thank you,
Christian

On Tue, 6 Jul 2021 at 18:39, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> Your fields are indexed according to a particular field definition. That
> field definition has an index-time and a query-time analysis pipeline
> (they can be the same). When you search against that field (either by
> default or explicitly), the query text goes through the associated query
> pipeline, and that is where the matching happens.
>
> I don't know if that's directly helpful, but a while ago I did a demo
> of searching Thai text with English queries using phonetic matching.
> It is at:
> https://github.com/arafalov/solr-thai-test/blob/master/collection1/conf/schema.xml#L34-L55
>
> Regards,
>    Alex.
> P.S. Remember that you can index the same text twice (with copyField),
> and the second (indexed, not stored) copy can be processed much more
> strictly, or just differently; then you can search both fields but put
> different weights on the stricter one. That way, a search for "jones"
> ranks "Jones" first and "Johns" second.
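
A minimal sketch of the copyField approach Alex describes, as schema.xml and solrconfig.xml fragments. All field and type names here (name_de, name_phonetic, text_de_strict, text_phonetic) are hypothetical, and both field types would still have to be defined elsewhere in the schema; the ^10 boost is an arbitrary illustration, not a tuned value. It also illustrates why no special keyword is needed in the request: the query text is analyzed separately by each field listed in qf.

```xml
<!-- schema.xml (hypothetical names): the stored copy uses a stricter analyzer -->
<field name="name_de" type="text_de_strict" indexed="true" stored="true"/>
<!-- second copy: indexed but not stored, analyzed phonetically -->
<field name="name_phonetic" type="text_phonetic" indexed="true" stored="false"/>
<!-- copy the raw input of name_de into name_phonetic at index time -->
<copyField source="name_de" dest="name_phonetic"/>

<!-- solrconfig.xml: query both fields, boosting matches on the stricter one -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">name_de^10 name_phonetic</str>
  </lst>
</requestHandler>
```

If solrconfig.xml already declares a /select handler, the two defaults would be merged into it rather than declaring the handler a second time. A plain request such as q=mueller then searches both fields.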
>
> On Tue, 6 Jul 2021 at 10:59, Christian Havel <christian.ha...@gmail.com> wrote:
> >
> > Hi,
> >
> > thanks a lot. And what should my request look like? Is the phonetic
> > search "activated" by a special "keyword" in the request?
> >
> >
> > On Tue, 29 Jun 2021 at 06:04, TK Solr <tksol...@sonic.net> wrote:
> >
> > > According to the javadoc
> > > https://lucene.apache.org/core/8_9_0/analyzers-phonetic/org/apache/lucene/analysis/phonetic/BeiderMorseFilterFactory.html
> > > BeiderMorseFilterFactory is supposed to be used directly after the
> > > StandardTokenizer.
> > >
> > > Most likely, GermanNormalizationFilterFactory and
> > > GermanLightStemFilterFactory shouldn't be used together with
> > > BeiderMorseFilterFactory: once words have been cut down to stems, the
> > > stems' pronunciation can no longer be matched.
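
Following TK's reading of the javadoc, a phonetic field type without the German stemmer might look like this sketch. The type name "text_phonetic" is hypothetical; the BeiderMorseFilterFactory attributes are the ones already used in this thread, and the index has to be rebuilt after switching a field to it.

```xml
<!-- Hypothetical phonetic-only field type: tokenizer followed directly by the
     phonetic filter, with no stemming in between -->
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC" ruleType="APPROX"
            concat="true" languageSet="auto"/>
  </analyzer>
</fieldType>
```

Because a single analyzer is declared, the same chain runs at index and query time, so query terms are encoded the same way as the indexed text.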
> > >
> > > On the other hand, if you just want to match German words spelled
> > > according to different conventions (ß <-> ss, ü <-> ue),
> > > GermanNormalizationFilterFactory alone should be enough; you don't need
> > > BeiderMorseFilterFactory.
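
A minimal sketch of TK's second suggestion, a normalization-only German field type for the spelling variants in the original question. The type name "text_de_norm" is hypothetical.

```xml
<!-- Hypothetical normalization-only German field type (no stemming, no phonetics) -->
<fieldType name="text_de_norm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- lowercase first: the normalizer's rules are written for lowercase letters -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- folds ä/ö/ü to a/o/u, ae/oe to a/o, ue to u (not after a vowel or q),
         and ß to ss -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
  </analyzer>
</fieldType>
```

By those rules both "mueller" and "müller" should come out as "muller"; the Analysis screen of the Solr Admin UI shows the tokens each stage produces, which makes this easy to check.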
> > >
> > > P.S. I'm not a German speaker and I haven't actually tested the above
> > > claim; I'm just speculating.
> > >
> > >
> > > On 6/28/21 7:25 AM, Christian Havel wrote:
> > > > Hi,
> > > >
> > > > I am using Solr 8.8.1 and want to use the phonetic search option, so
> > > > I modified my schema.xml file and rebuilt the index.
> > > >
> > > >   <!-- German -->
> > > >   <dynamicField name="*_txt_de" type="text_de" indexed="true" stored="true"/>
> > > >   <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
> > > >     <analyzer>
> > > >       <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >       <filter class="solr.LowerCaseFilterFactory"/>
> > > >       <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball"/>
> > > >       <filter class="solr.GermanNormalizationFilterFactory"/>
> > > >       <filter class="solr.GermanLightStemFilterFactory"/>
> > > >       <filter class="solr.BeiderMorseFilterFactory" nameType="GENERIC" ruleType="APPROX" concat="true" languageSet="auto"/>
> > > >       <!-- less aggressive: <filter class="solr.GermanMinimalStemFilterFactory"/> -->
> > > >       <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="German2"/> -->
> > > >     </analyzer>
> > > >   </fieldType>
> > > >
> > > > I hoped that searching for "mueller" would also find contacts named
> > > > "müller", but the change seems to have no effect.
> > > > Do you have any idea what could be missing?
> > > >
> > > > Thanks,
> > > > Christian
> > > >
> > >
> > >
>
