You’re not doing anything wrong. StandardTokenizer treats the dot as a word boundary here, so the value is split at the dot both in the index and in the query. If you used a string field instead, the whole value, dot included, should be kept as a single term. That also makes searching much stricter, because only the exact value will match. If you try this, reindex a couple of documents first to make sure you are getting what you want.
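Here is a minimal sketch of the string-field approach. It assumes the stock `_default` schema, which declares a `*_s` dynamic field backed by the `string` type (`solr.StrField`). The `copyField` line and the `metadata_s` field name are illustrative and do not come from your setup:

```xml
<!-- Should already be present in the _default schema (shown for reference):
     a StrField stores the whole value as one unanalyzed term, so
     "ab00001.tif" is indexed exactly as written, dot included. -->
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>

<!-- Hypothetical addition: copy the raw value into a string field for exact
     filename matches, while metadata_txt keeps its tokenized full-text
     behaviour. If metadata_txt can hold several values per document, use a
     multi-valued string field (e.g. the *_ss / strings pair) as the target. -->
<copyField source="metadata_txt" dest="metadata_s"/>
```

After reindexing, a query such as `metadata_s:ab00001.tif` should match only documents whose value is exactly `ab00001.tif`. Because a StrField is not analyzed, the match is also case-sensitive, and a value that contains anything besides the filename will not match. Keeping `metadata_txt` alongside the string field preserves word-level search.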
-Dave

> On May 2, 2023, at 2:22 PM, Bill Tantzen <tantz...@umn.edu.invalid> wrote:
>
> I'm using the solrconfig.xml from the distribution,
> ./server/solr/configsets/_default/conf/solrconfig.xml
>
> But this problem extends to the index as well; using the initial example,
> if I search for <str name="parsedquery">metadata_txt:ab00001</str> (instead
> of ab00001.tif), my result set includes ab00001.tif, ab00001.jpg,
> ab00001.png, etc so the tokens in the index are split on dot as well, not
> just the query.
>
> I'm doing something wrong, or I'm misunderstanding something!!
> ~~Bill
>
>> On Tue, May 2, 2023 at 1:02 PM Mikhail Khludnev <m...@apache.org> wrote:
>>
>> Analyzer is configured in schema.xml. But literally, splitting on dot is
>> what I expect from StandardTokenizer.
>>
>> On Tue, May 2, 2023 at 8:48 PM Bill Tantzen <tantz...@umn.edu.invalid> wrote:
>>
>>> Mikhail,
>>> Thanks for the quick reply. Here is the parser info:
>>>
>>> <str name="QParser">LuceneQParser</str>
>>>
>>> ~~Bill
>>>
>>> On Tue, May 2, 2023 at 12:43 PM Mikhail Khludnev <m...@apache.org> wrote:
>>>
>>>> Hello Bill,
>>>> Which analyzer is configured for metadata_txt? Perhaps you need to tune it
>>>> accordingly.
>>>>
>>>> On Tue, May 2, 2023 at 7:40 PM Bill Tantzen <tantz...@umn.edu.invalid> wrote:
>>>>
>>>>> In my solr 9.2 schema, I am leveraging the dynamicField
>>>>>
>>>>> <dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
>>>>>
>>>>> which tokenizes with solr.StandardTokenizerFactory for index and query.
>>>>>
>>>>> However, when I query with, for example,
>>>>> <str name="q">metadata_txt:XYZ.tif</str>
>>>>>
>>>>> I see many more hits than I expect.
>>>>> When I add debug=true to the query, I see:
>>>>> <str name="rawquerystring">metadata_txt:XYZ.tif</str>
>>>>> <str name="querystring">metadata_txt:XYZ.tif</str>
>>>>> <str name="parsedquery">metadata_txt:XYZ metadata_txt:tif</str>
>>>>>
>>>>> But I expect that dots not followed by whitespace will be kept as part of
>>>>> the token, that is, the parsed query should remain "metadata_txt:XYZ.tif"
>>>>> but solr appears to be splitting into two tokens.
>>>>>
>>>>> Can somebody point out what I am misunderstanding?
>>>>> Thanks,
>>>>> ~~Bill
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>> https://t.me/MUST_SEARCH
>>>> A caveat: Cyrillic!
>>>
>>> --
>>> Human wheels spin round and round
>>> While the clock keeps the pace... -- John Mellencamp
>>> ________________________________________________________________
>>> Bill Tantzen University of Minnesota Libraries
>>> 612-626-9949 (U of M) 612-325-1777 (cell)