: We are using Solr as a user index, and users have email addresses.
:
: Our old search behavior used a SQL substring match for any search
: terms entered, and so users are used to being able to search for e.g.
: "chr" and finding my email address ("[email protected]").
:
: By default, Solr doesn't perform substring matches, and it might be
: difficult to re-train users to use *chr* to find email addresses by
: substring.
In the past, were you really doing arbitrary substring matching, or just
prefix matching? i.e., would a search for "sto" match
"[email protected]"?
Personally, if you know you have an email field, I would suggest using a
custom tokenizer that splits on "@" and "." (and maybe other punctuation
characters like "-"), and then taking your raw user input and feeding it to
the prefix parser (instead of requiring your users to add the "*")...
q={!prefix f=email v=$user_input}&user_input=chr
...which would match [email protected], [email protected], [email protected] etc.
(This wouldn't help you, though, if you *really* want arbitrary substring
matching -- as Erick suggested, ngrams are pretty much your best bet for
something like that.)
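
To make the difference between prefix and substring matching concrete, here
is a rough Python sketch of what the tokenize-then-prefix approach would and
would not match. It only simulates the idea outside of Solr: the function
names, the exact split characters, the lowercasing, and the example addresses
are all assumptions for illustration, not Solr APIs or real data.

```python
import re

def tokenize_email(email):
    """Split an address on "@", "." and "-" (assumed delimiters) and lowercase it,
    roughly what a custom index-time tokenizer might emit."""
    return [tok for tok in re.split(r"[@.\-]", email.lower()) if tok]

def prefix_match(user_input, email):
    """True if any token of the address starts with the raw user input.
    The input is lowercased here by hand, on the assumption that the
    client has to normalize it before handing it to a prefix query."""
    needle = user_input.lower()
    return any(tok.startswith(needle) for tok in tokenize_email(email))

# Hypothetical addresses, for illustration only.
emails = ["chris@example.com", "bob@chrome.example.org", "ann@example.net"]

# "chr" is a prefix of a token in the first two addresses.
matches = [e for e in emails if prefix_match("chr", e)]

# "hri" is a substring of "chris" but not a prefix of any token,
# so this approach does not find it; that case needs ngrams.
no_matches = [e for e in emails if prefix_match("hri", e)]
```

If users only ever typed the start of a name or domain in the old system,
this behavior is probably enough; if they relied on mid-word matches like
"hri", it is not.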
Bear in mind, you can combine that "forced prefix" query against
the (tokenized) email field with other queries that
could parse your input in other ways...
user_input=...
q=({!prefix f=email v=$user_input}
OR {!dismax qf="first_name last_name" ..etc.. v=$user_input})
So if your user input is "chris", you'll get term matches on the
first_name or last_name fields, as well as prefix matches on the
email field.
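
On the client side, the combined query above might be assembled along these
lines. This is only a sketch: the helper name `build_params` is made up, the
dismax clause sets nothing beyond `qf`, and it assumes the default (lucene)
query parser handles the outer `q`. The point is that the raw user input
travels in its own `user_input` parameter and is referenced via
`v=$user_input`, so it never has to be spliced into the query string itself.

```python
from urllib.parse import urlencode, parse_qs

# Query template: both clauses read the same raw input through $user_input.
QUERY_TEMPLATE = (
    "({!prefix f=email v=$user_input} "
    'OR {!dismax qf="first_name last_name" v=$user_input})'
)

def build_params(user_input):
    """Return request parameters for the combined prefix + dismax query
    (hypothetical helper, not part of any Solr client library)."""
    return {"q": QUERY_TEMPLATE, "user_input": user_input}

# URL-encode the parameters so they can be appended to a select request.
query_string = urlencode(build_params("chris"))

# Decoding again shows the user input stayed separate from the template.
decoded = parse_qs(query_string)
```

Keeping the template fixed and passing the input separately also means
special characters typed by the user only need URL encoding, not query-syntax
escaping inside `q`.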
-Hoss
http://www.lucidworks.com/