All,
I have a multi-valued field of type text_general, and one document
contains a field value with the text "foo:bar". When I search for either
"foo" or "bar", that document does not appear in the search results.
However, when I search for "foo:bar", "foo*" or "*bar", I do get the
document, so it's definitely indexed and that field value is being searched.
Is a colon (:) not a word-breaking character?
I have another field containing email addresses, and if I search for e.g.
"gmail.com" (without quotes), I get everyone whose email address ends
with "gmail.com".
Hmm. I just checked, and if I search for "gmail" (without .com) I don't
find them. Maybe those characters (:, .) do not cause a word split when
there is no whitespace around them?
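To make that hypothesis concrete, here is a toy Python model of it: ':' and
'.' only join a token when letters or digits sit on both sides, and
everything else (whitespace, '@', a ':' followed by a space) splits. The
helper name toy_tokenize is made up, and this is a sketch of the guess
above, not Solr's actual StandardTokenizer.

```python
import re


def toy_tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer modelling the 'no split without whitespace' guess.

    A token is a run of ASCII letters/digits, optionally chained with single
    ':' or '.' characters that have letters/digits on both sides.
    """
    pattern = r"[0-9A-Za-z]+(?:[:.][0-9A-Za-z]+)*"
    return [token.lower() for token in re.findall(pattern, text)]


if __name__ == "__main__":
    for value in ["foo:bar", "foo: bar", "foo bar", "chris@gmail.com"]:
        print(value, "->", toy_tokenize(value))
```

Under this model "foo:bar" stays one token, so a query for "bar" alone finds
nothing, while "foo: bar" yields "foo" and "bar"; likewise
"chris@gmail.com" yields "chris" and "gmail.com", which would explain why
"gmail.com" matches but "gmail" does not.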
I do have full control over how the indexing takes place, and the
foo:bar is actually a compound value, so I could index it as "foo bar",
"foo: bar" or whatever. Users are much more likely to want to search for
just "bar" in this case, but they also might want to search for "foo:bar"
specifically (and not get baz:bar in the results, or at least not have it
ranked as highly).
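One way to get both behaviours would be to index the parts of the compound
value and also keep the original compound as an extra token, similar in
spirit to Solr's WordDelimiterGraphFilter with preserveOriginal enabled.
The Python below is only a rough model of that idea: expand_compound and
match_count are hypothetical helpers, and counting matched tokens is a crude
stand-in for real relevance scoring.

```python
def expand_compound(value: str) -> list[str]:
    """Hypothetical index-time expansion of a 'prefix:value' compound.

    Emits each part, plus the original compound when there is more than one
    part, so 'bar' and 'foo:bar' both match.
    """
    parts = [p.strip() for p in value.lower().split(":") if p.strip()]
    tokens = list(parts)
    if len(parts) > 1:
        tokens.append(":".join(parts))  # preserve the original compound token
    return tokens


def match_count(query_tokens: list[str], doc_tokens: list[str]) -> int:
    """Crude relevance proxy: how many query tokens appear in the document."""
    doc = set(doc_tokens)
    return sum(1 for q in query_tokens if q in doc)


if __name__ == "__main__":
    docs = {"doc1": expand_compound("foo:bar"), "doc2": expand_compound("baz:bar")}
    for query in ["bar", "foo:bar"]:
        q_tokens = expand_compound(query)
        print(query, {name: match_count(q_tokens, toks) for name, toks in docs.items()})
```

With this scheme a query for "bar" matches both documents equally, while
"foo:bar" still matches baz:bar (through "bar") but scores foo:bar higher
because only that document contains the "foo" and "foo:bar" tokens.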
What am I missing about tokenization here?
I haven't configured anything special for tokenization, analysis,
etc.: I'm using a pretty much stock Solr 8.1 install with a core created
from the default configset.
Thanks,
-chris