Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Gus Heck
That looks like a bug. Seems to be splitting if the character class before and after differ, but not if they are the same. ST XYZ123 tif SF XYZ123 tif LCF xyz123 tif and ST XYZ 123tif SF XYZ 123tif LCF xyz 123tif But... ST XYZ123.123tif SF XYZ123.123tif LCF xyz123.123tif On Tue, May 2,
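For what it's worth, the pattern described above (a split when the character classes on either side of the dot differ, none when they match) is what the Unicode word-break rules in UAX #29 give for a full stop, and StandardTokenizer follows those rules. Below is a rough Python sketch of just that one rule. It is not Lucene's code: `sketch_tokenize` is a made-up name, and it assumes ASCII input and ignores every other part of the real tokenizer.

```python
def sketch_tokenize(text):
    """Toy model of the dot handling only (hypothetical helper, not Lucene).

    Assumption: a dot is kept inside a token only when it sits between
    two letters or between two digits; any other non-alphanumeric
    character, and any other dot, ends the current token.
    """
    tokens = []
    current = ""
    for i, ch in enumerate(text):
        if ch.isalnum():
            current += ch
            continue
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        keep_dot = ch == "." and (
            (prev_ch.isalpha() and next_ch.isalpha())
            or (prev_ch.isdigit() and next_ch.isdigit())
        )
        if keep_dot:
            current += ch
        else:
            # Delimiter: emit the token collected so far, if any.
            if current:
                tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    return tokens


for name in ["XYZ123.tif", "XYZ.123tif", "XYZ123.123tif", "ab00c.tif", "ab003.tif"]:
    print(name, sketch_tokenize(name))
```

Under these assumptions the sketch splits ab003.tif and XYZ123.tif at the dot but leaves ab00c.tif and XYZ123.123tif whole, which matches the examples reported in this thread.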

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Bill Tantzen
OK, I see what's going on. I should not have used a generic example like XYZ. In my specific case, as you can see, I'm working with filenames. This works as I expected: ab00c.tif -- tokenizes as it should with a value of ab00c.tif This doesn't work as I expected ab003.tif -- tokenizes with a re

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Shawn Heisey
On 5/2/23 13:16, Bill Tantzen wrote: This tokenizer splits the text field into tokens, treating whitespace and punctuation as delimiters. Delimiter characters are discarded, with the following exceptions: Periods (dots) that are not followed by whitespace are kept as part of the token, including

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Gus Heck
I concur that the docs clearly state your expected behavior should be true: Standard Tokenizer This tokenizer splits the text field into tokens, treating whitespace and punctuation as delimiters. Delimiter characters are discarded, with the following exceptions: - Periods (dots) that are n

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Bill Tantzen
Thanks Dave! Using a string field instead would work fine for my purposes I think... I'm just trying to understand why it doesn't work with a field of type text_general which uses the standard tokenizer in both the index and the query analyzer. The docs state: This tokenizer splits the text field

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Dave
You’re not doing anything wrong, a dot is not a character so it splits the field in the index and the query. If you used a string instead it theoretically would maintain the non characters but also lead to more strict search constraints. If you tried this you need to re index a couple documents

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Bill Tantzen
I'm using the solrconfig.xml from the distribution, ./server/solr/configsets/_default/conf/solrconfig.xml But this problem extends to the index as well; using the initial example, if I search for metadata_txt:ab1 (instead of ab1.tif), my result set includes ab1.tif, ab1.jpg, ab

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Mikhail Khludnev
Analyzer is configured in schema.xml. But literally, splitting on dot is what I expect from StandardTokenizer. On Tue, May 2, 2023 at 8:48 PM Bill Tantzen wrote: > Mikhail, > Thanks for the quick reply. Here is the parser info: > > LuceneQParser > > ~~Bill > > On Tue, May 2, 2023 at 12:43 PM Mi

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Bill Tantzen
Mikhail, Thanks for the quick reply. Here is the parser info: LuceneQParser ~~Bill On Tue, May 2, 2023 at 12:43 PM Mikhail Khludnev wrote: > Hello Bill, > Which analyzer is configured for metadata_txt? Perhaps you need to tune it > accordingly. > > On Tue, May 2, 2023 at 7:40 PM Bill Tantzen

Re: standard tokenizer seemingly splitting on dot

2023-05-02 Thread Mikhail Khludnev
Hello Bill, Which analyzer is configured for metadata_txt? Perhaps you need to tune it accordingly. On Tue, May 2, 2023 at 7:40 PM Bill Tantzen wrote: > In my solr 9.2 schema, I am leveraging the dynamicField > > stored="true"/> > > which tokenizes with solr.StandardTokenizerFactory for index

standard tokenizer seemingly splitting on dot

2023-05-02 Thread Bill Tantzen
In my solr 9.2 schema, I am leveraging the dynamicField which tokenizes with solr.StandardTokenizerFactory for index and query. However, when I query with, for example, metadata_txt:XYZ.tif I see many more hits than I expect. When I add debug=true to the query, I see: metadata_txt:XYZ.tif met

Re: Nifi Processor

2023-05-02 Thread Mikhail Khludnev
Hello Doug, The cause is not clear from the single log line. Share some more please. On Tue, May 2, 2023 at 6:29 PM Matthias Krüger < mkrue...@opensourceconnections.com> wrote: > Hi Doug, > > Did you see any errors in the Solr server logs (of any of the 7 nodes) at > the time? What makes you thin

Re: Nifi Processor

2023-05-02 Thread Matthias Krüger
Hi Doug, Did you see any errors in the Solr server logs (of any of the 7 nodes) at the time? What makes you think GET vs POST is causing the problem? From a quick look at Nifi's PutSolrContentStream

Nifi Processor

2023-05-02 Thread Doug Whitfield
Hi folks, Not sure if this is a better fit for the Nifi list, but going to start here since the issue is in Solr. ENV: Apache Solr 7.7 OS Red Hat Enterprise Linux Server release 7.9 (Maipo). PROBLEM: We occasionally get request errors in the Apache Solr service, which consists of 7 nodes. I am sh

Re: crossCollection on multivalue fields

2023-05-02 Thread Mikhail Khludnev
Hi, FWIW I'm working on https://issues.apache.org/jira/browse/SOLR-16717 which allows joining equally sharded collections. The point is to distribute the collocated join operation. On Tue, May 2, 2023 at 11:27 AM Sergio García Maroto wrote: > Ok. I am using a different join to work on sharding. This

Re: crossCollection on multivalue fields

2023-05-02 Thread Sergio García Maroto
Ok. I am using a different join to work on sharding. This one doesn't seem to allow joins on multivalued fields. {!join method=crossCollection from=PersonID to=PersonID fromIndex=document v='type:(pdf)'} Thanks Sergio Maroto On Fri, 28 Apr 2023 at 17:52, Ron Haines wrote: > using Solr 8.11, and 9