On 2/5/2015 5:24 PM, Erick Erickson wrote:
> Hmmm, driving away from my client, I got to wondering about routing in
> SolrCloud. You'd have to apply the analysis chain _before_ you routed
> on ID, and I have no clue what would happen with things like the !
> operator in the id field.

I didn't even think about SolrCloud. Fun.

> So to handle my "rule of thumb", which is that anything that a human
> could possibly enter should _not_ be case sensitive, the <uniqueKey>
> field needs to be
> 1> normalized as far as case is concerned at index time
> 2> have a query-time transformation done to match <1>. So something
> like this should do it assuming that
> the indexer took care to uppercase the <uniqueKey>:
> <fieldType name="eoe_test" class="solr.TextField">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.UpperCaseFilterFactory" />
>   </analyzer>
> </fieldType>

I realize with what I'm saying below that it is outside "typical user"
land, but it might work. For an advanced user it wouldn't even be all
that messy.

Proceeding into "thinking out loud" territory:

A custom UpdateRequestProcessor could do all the normalization on the
uniqueKey field at index time. If we used that processor in combination
with a fieldType like the one you outlined above, I think it would work.
The simple version of that processor would just be a case-changing filter.

Getting back to what a typical user wants to happen ... an update
processor could be included in Solr that figures out the configured
uniqueKey field and lowercases the input on that field. We could provide
documentation showing how to insert it into the default update chain to
allow case-insensitive unique IDs.

If somebody needs more complicated normalization (perhaps they want to
use the ICU folding class instead of Java's built-in lowercase
capability, or do some really wild stuff that's domain-specific), they
can write their own processor, and maybe even their own analysis
component for the query side.
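
To make the "simple version" concrete, here is a rough sketch of the normalization logic in plain Java, with no Solr dependency so that it runs on its own. The class and method names (UniqueKeyNormalizerDemo, normalizeId, normalizeDocument) and the "id" field are hypothetical, and the Map stands in for a Solr input document. In a real processor this logic would presumably sit in the processAdd method of an UpdateRequestProcessor subclass, with the field name read from the schema. It uppercases rather than lowercases only so that it matches the query-side UpperCaseFilterFactory in the quoted fieldType; either direction should work as long as both sides agree.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: models index-time normalization of the uniqueKey value.
// Not Solr API; a Map stands in for the incoming document.
final class UniqueKeyNormalizerDemo {

    // The single case rule shared by index and query side.
    // Locale.ROOT avoids locale-specific surprises (e.g. Turkish dotless i).
    static String normalizeId(String rawId) {
        return rawId.toUpperCase(Locale.ROOT);
    }

    // Index-time step: return a copy of the document with the uniqueKey
    // field normalized. The caller's map is left unchanged.
    static Map<String, Object> normalizeDocument(Map<String, Object> doc,
                                                 String uniqueKeyField) {
        Map<String, Object> copy = new HashMap<>(doc);
        Object id = copy.get(uniqueKeyField);
        if (id != null) {
            copy.put(uniqueKeyField, normalizeId(id.toString()));
        }
        return copy;
    }

    public static void main(String[] args) {
        // Toy "index" keyed by the normalized uniqueKey.
        Map<String, Map<String, Object>> index = new HashMap<>();

        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "Doc-42a");
        doc.put("title", "example");

        Map<String, Object> stored = normalizeDocument(doc, "id");
        index.put((String) stored.get("id"), stored);

        // Query side: apply the same rule to whatever the human typed.
        String userInput = "doc-42A";
        System.out.println(index.containsKey(normalizeId(userInput)));
    }
}
```

This does nothing about the SolrCloud routing question above. Whether the normalization would run before the document is routed depends on where the processor sits in the update chain, and I have not checked that.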

Thanks,
Shawn
