standard tokenizer seemingly splitting on dot

2023-05-02 Thread Bill Tantzen
In my solr 9.2 schema, I am leveraging the dynamicField which tokenizes with solr.StandardTokenizerFactory for index and query. However, when I query with, for example, metadata_txt:XYZ.tif I see many more hits than I expect. When I add debug=true to the query, I see: metadata_txt:XYZ.tif met

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Bill Tantzen
> > ~~Bill > > > > > -- > Sincerely yours > Mikhail Khludnev > https://t.me/MUST_SEARCH > A caveat: Cyrillic! > -- Human wheels spin round and round While the clock keeps the pace... -- John Mellencamp Bill Tantzen, University of Minnesota Libraries, 612-626-9949 (U of M), 612-325-1777 (cell)

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Bill Tantzen
on dot is > what I expect from StandardTokenizer. > > On Tue, May 2, 2023 at 8:48 PM Bill Tantzen > wrote: > > > Mikhail, > > Thanks for the quick reply. Here is the parser info: > > > > LuceneQParser > > > > ~~Bill > > > > On Tue, May

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Bill Tantzen
ically would maintain the non characters but also lead to more > strict search constraints. If you tried this you need to re index a couple > documents to > Make sure you are getting what you want. > > -Dave > > > On May 2, 2023, at 2:22 PM, Bill Tantzen > wrote: >

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Bill Tantzen
d in on this! ~~Bill On Tue, May 2, 2023 at 3:56 PM Shawn Heisey wrote: > On 5/2/23 13:16, Bill Tantzen wrote: > > This tokenizer splits the text field into tokens, treating whitespace and > > punctuation as delimiters. > > Delimiter characters are discarded, with the foll

Re: standard tokenizer seemingly splitting on dot

2023-05-03 Thread Bill Tantzen
pected! ~~Bill On Wed, May 3, 2023 at 10:04 AM Shawn Heisey wrote: > On 5/2/23 15:30, Bill Tantzen wrote: > > This works as I expected: > > ab00c.tif -- tokenizes as it should with a value of ab00c.tif > > > > This doesn't work as I expected > > ab00
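
For readers following the thread, the behaviour discussed above can be approximated with a small Python sketch. This is not Lucene's code. It is a simplified model of the Unicode word-break rules (UAX#29) that StandardTokenizer is documented to follow, and it assumes ASCII input only: a dot stays inside a token when it sits between two letters or between two digits, and is otherwise treated as a delimiter. The function name `tokenize` and the input `x9.tif` are made up for illustration and are not taken from the thread.

```python
def _is_letter(c):
    # ASCII letters only; the real rules cover far more of Unicode.
    return c.isascii() and c.isalpha()


def _is_digit(c):
    # ASCII digits only.
    return c.isascii() and c.isdigit()


def tokenize(text):
    """Rough model of how a UAX#29-style word breaker handles '.'.

    Assumption: only ASCII letters, digits, and '.' are modelled;
    every other character is treated as a plain delimiter.
    """
    tokens = []
    current = []
    for i, ch in enumerate(text):
        if _is_letter(ch) or _is_digit(ch):
            current.append(ch)
            continue
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        # A dot is kept only between letter.letter or digit.digit.
        between_letters = _is_letter(prev_ch) and _is_letter(next_ch)
        between_digits = _is_digit(prev_ch) and _is_digit(next_ch)
        if ch == "." and current and (between_letters or between_digits):
            current.append(ch)
            continue
        # Anything else ends the current token and is discarded.
        if current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("ab00c.tif"))  # ['ab00c.tif']  letter . letter: no split
print(tokenize("x9.tif"))     # ['x9', 'tif']  digit . letter: split
print(tokenize("3.14"))       # ['3.14']       digit . digit: no split
```

Under this model, a file name is split only when the character just before the dot is a digit and the one after it is a letter (or the reverse), which matches the difference reported for ab00c.tif above. The Analysis screen in the Solr admin UI remains the authoritative way to check what a given field type actually produces.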

Re: standard tokenizer seemingly splitting on dot

2023-05-04 Thread Bill Tantzen
e/lucene/issues/12264. > > Let's look at what devs say. > > > > On Wed, May 3, 2023 at 6:13 PM Bill Tantzen > > wrote: > > > > > Shawn, > > > No, email addresses are not preserved -- from the docs: > > > > > > > > >