Perhaps you can lowercase the text prior to passing it to Lucene? Or perhaps you can have a custom Analyzer that treats the whole input as 1 Token (see KeywordAnalyzer -- http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/analysis/KeywordAnalyzer.html ), but also includes LowerCaseFilter that's applied to that 1 Token.
Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch ----- Original Message ---- > From: Andre Rubin <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Wednesday, August 13, 2008 12:15:25 AM > Subject: Re: Searching Tokenized x Un_tokenized > > Thanks Otis, that was exactly what was happening. > > 1) According to here: > http://wiki.apache.org/lucene-java/LuceneFAQ#head-133cf44dd3dff3680c96c1316a663e881eeac35a > wildcard queries are not passed through the Analyzer, but they are > always set to lower case. > > 2) And according to here: > http://wiki.apache.org/lucene-java/LuceneFAQ#head-0f374b0fe1483c90fe7d6f2c44472d10961ba63c > un_tokenized fields are not passed through the Analyze as well. > > So by creating an untokenized field and setting > parser.setLowercaseExpandedTerms(false), I manage to make my use case > work in a case-sensitive manner. That is, 'u*' returns 'usa' and 'U*' > returns USA.... > > The thing is, how to make this case-insensitive? I can make #1 work by > settting it to lowercase: parser.setLowercaseExpandedTerms(true). But > how make #2 work, that is, using a LowerCaseFilter to an Untokenized > field? > > Thanks, > > > Andre > > On Tue, Aug 12, 2008 at 7:57 PM, Otis Gospodnetic > wrote: > > Andre, > > > > Check the Lucene FAQ, there is an entry about wildcards and analysis (which > doesn't take place for wildcard queries). Could that be it? > > > > Otis > > -- > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > > > > > ----- Original Message ---- > >> From: Andre Rubin > >> To: java-user@lucene.apache.org > >> Sent: Tuesday, August 12, 2008 5:30:47 PM > >> Subject: Re: Searching Tokenized x Un_tokenized > >> > >> My searches for my String tokenized field was working properly. I > >> switched the field to un_tokenized, rebuilt the index, and now my > >> searches only return strings that match the query string in lower > >> case. > >> > >> For example, searching for 'us*': > >> > >> The tokenized field version would find 'USA' and 'usa' > >> > >> The untokenized field version only finds 'usa' > >> > >> I'm using the StandardAnalyzer in both cases. > >> > >> Thanks > >> > >> > >> Andre > >> > >> On Thu, Aug 7, 2008 at 8:16 PM, Otis Gospodnetic > >> wrote: > >> > Hi, > >> > > >> > Perhaps you can give some examples. Yes, untokenized means "full > >> > string" - > it > >> requires an "exact match". > >> > > >> > Otis > >> > -- > >> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > >> > > >> > > >> > > >> > ----- Original Message ---- > >> >> From: Andre Rubin > >> >> To: java-user@lucene.apache.org > >> >> Sent: Thursday, August 7, 2008 8:04:04 PM > >> >> Subject: Searching Tokenized x Un_tokenized > >> >> > >> >> Hi all, > >> >> > >> >> When I switched a String field from tokenized to untokenized, some > >> >> searches started not returning some obvious values. Am I missing > >> >> something on querying untokenized fields? Another question is, do I > >> >> need an Analyzer if my search is on an Untokenized field, wouldn't the > >> >> search be based on the full String rather than its tokens? > >> >> > >> >> Thanks, > >> >> > >> >> > >> >> Andre > >> >> > >> >> --------------------------------------------------------------------- > >> >> To unsubscribe, e-mail: [EMAIL PROTECTED] > >> >> For additional commands, e-mail: [EMAIL PROTECTED] > >> > > >> > > >> > --------------------------------------------------------------------- > >> > To unsubscribe, e-mail: [EMAIL PROTECTED] > >> > For additional commands, e-mail: [EMAIL PROTECTED] > >> > > >> > > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: [EMAIL PROTECTED] > >> For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]