All,

I have a multi-valued field of type text_general, and one particular document has a value of "foo:bar" in that field. When I search for either "foo" or "bar", that document does not come back in the results.

However, when searching for "foo:bar" or "foo*" or "*bar" I do get the document, so it's definitely there and the field value is being searched.

Is a colon (:) not a word-breaking token?

I have another field containing email addresses, and if I search for e.g. "gmail.com" (without quotes), I'll get everyone whose email address ends with "gmail.com".

Hmm. I just checked, and if I search for "gmail" (without .com) I don't find them. Maybe without surrounding whitespace, those characters (:, .) do not cause a word-split?
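One way I could check that guess is to ask Solr how it analyzes the value. Here is a minimal sketch that builds a request for the field analysis handler (/analysis/field), which I believe is available by default. The base URL and the core name "mycore" are placeholders, not my real setup, and I have not confirmed what the output looks like on my install.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values: adjust the base URL and core name for the install.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "mycore"


def analysis_url(value, field_type="text_general", query=None):
    """Build a URL for Solr's field analysis handler (/analysis/field)."""
    params = {
        "analysis.fieldtype": field_type,   # analyze as this field type
        "analysis.fieldvalue": value,       # text as it would be indexed
        "wt": "json",
    }
    if query is not None:
        # Optional: also run the query-time analyzer on a search term.
        params["analysis.query"] = query
    return f"{SOLR_BASE}/{CORE}/analysis/field?" + urllib.parse.urlencode(params)


def fetch_analysis(value, **kwargs):
    """Fetch the analysis result; needs a running Solr instance."""
    with urllib.request.urlopen(analysis_url(value, **kwargs)) as resp:
        return json.load(resp)


# Example (not executed here, since it needs a live server):
#   result = fetch_analysis("foo:bar", query="bar")
#   print(json.dumps(result, indent=2))
```

If the index-time token list for "foo:bar" shows a single token rather than "foo" and "bar", that would explain what I'm seeing. The Analysis screen in the admin UI should show the same information.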

I do have full control over how the indexing takes place, and the foo:bar is actually a compound value. So I am able to use "foo bar" or "foo: bar" or whatever. Users are much more likely to want to search for just "bar" in this case, but also might want to search for "foo:bar" specifically (and not get baz:bar in the results, or at least not ranked as highly).
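To make the indexing option concrete, here is a rough sketch of what I could do on my side before sending documents to Solr: keep the compound value and also add its parts as extra values in the same multi-valued field. The function name and the ":" separator are just for illustration, and I have not tested how this affects ranking of foo:bar versus baz:bar.

```python
def expand_compound(value, sep=":"):
    """Return the values to index for one compound value.

    The original value is kept first so an exact search can still
    match it; the individual parts are appended so that a search
    for just one part can match as well.
    """
    parts = [p.strip() for p in value.split(sep) if p.strip()]
    if len(parts) < 2:
        # Not a compound value: index it unchanged.
        return [value]
    return [value] + parts


# Hypothetical usage when building the document to index:
field_values = expand_compound("foo:bar")
print(field_values)
```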

What am I missing as far as tokenization, here?

I haven't specified anything special when it comes to tokenization: I'm using a pretty much stock Solr 8.1 install with a core created using the default config set.

Thanks,
-chris
