Re: Searching Tokenized x Un_tokenized

Andre Rubin Tue, 12 Aug 2008 21:16:14 -0700

Thanks Otis, that was exactly what was happening.

1) According to this FAQ entry:
http://wiki.apache.org/lucene-java/LuceneFAQ#head-133cf44dd3dff3680c96c1316a663e881eeac35a
wildcard queries are not passed through the Analyzer, but QueryParser
lowercases them by default.


2) And according to this FAQ entry:
http://wiki.apache.org/lucene-java/LuceneFAQ#head-0f374b0fe1483c90fe7d6f2c44472d10961ba63c
un_tokenized fields are not passed through the Analyzer either.

So by creating an untokenized field and calling
parser.setLowercaseExpandedTerms(false), I managed to make my use case
work in a case-sensitive manner. That is, 'u*' returns 'usa' and 'U*'
returns 'USA'.

The question is, how do I make this case-insensitive? I can make #1 work
by turning lowercasing back on: parser.setLowercaseExpandedTerms(true).
But how do I make #2 work, that is, how do I apply a LowerCaseFilter to
an untokenized field?

Thanks,


Andre
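
The behaviour described above can be modelled without Lucene. The sketch
below is only an illustration under stated assumptions: the class and
method names (WildcardCaseDemo, indexTerms, prefixSearch) are hypothetical
and are not Lucene APIs. It treats an untokenized field as one term per
document and a wildcard query such as 'u*' as a plain prefix match. The
lowercaseExpandedTerms flag mimics what
parser.setLowercaseExpandedTerms(...) does to the query side only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class WildcardCaseDemo {

    /** Hypothetical stand-in for indexing: one term per document, optionally lowercased first. */
    public static List<String> indexTerms(List<String> values, boolean lowercaseAtIndexTime) {
        List<String> terms = new ArrayList<>();
        for (String value : values) {
            terms.add(lowercaseAtIndexTime ? value.toLowerCase(Locale.ROOT) : value);
        }
        return terms;
    }

    /** Returns the ids (list positions) of documents whose term starts with the query prefix. */
    public static List<Integer> prefixSearch(List<String> terms, String query,
                                             boolean lowercaseExpandedTerms) {
        // Strip the trailing '*' of a query such as "u*".
        String prefix = query.endsWith("*") ? query.substring(0, query.length() - 1) : query;
        // Only the query is lowercased here; the indexed terms are left as they are.
        if (lowercaseExpandedTerms) {
            prefix = prefix.toLowerCase(Locale.ROOT);
        }
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < terms.size(); docId++) {
            if (terms.get(docId).startsWith(prefix)) {
                hits.add(docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("usa", "USA"); // doc 0 and doc 1
        List<String> verbatim = indexTerms(values, false); // terms stored as-is
        List<String> lowered = indexTerms(values, true);   // terms lowercased before indexing

        System.out.println(prefixSearch(verbatim, "u*", false)); // [0]
        System.out.println(prefixSearch(verbatim, "U*", false)); // [1]
        System.out.println(prefixSearch(verbatim, "U*", true));  // [0]
        System.out.println(prefixSearch(lowered, "U*", true));   // [0, 1]
    }
}
```

The last call suggests one possible way to get case-insensitive matching:
lowercase the value yourself before it goes into the untokenized field,
and leave query lowercasing switched on. Whether that fits depends on
whether the original casing must also be searchable.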

On Tue, Aug 12, 2008 at 7:57 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Andre,
>
> Check the Lucene FAQ, there is an entry about wildcards and analysis (which 
> doesn't take place for wildcard queries).  Could that be it?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
>> From: Andre Rubin <[EMAIL PROTECTED]>
>> To: java-user@lucene.apache.org
>> Sent: Tuesday, August 12, 2008 5:30:47 PM
>> Subject: Re: Searching Tokenized x Un_tokenized
>>
>> Searches on my tokenized String field were working properly. I
>> switched the field to un_tokenized, rebuilt the index, and now my
>> searches only return strings that match the query string in lower
>> case.
>>
>> For example, searching for 'us*':
>>
>> The tokenized field version would find 'USA' and 'usa'
>>
>> The untokenized field version only finds 'usa'
>>
>> I'm using the StandardAnalyzer in both cases.
>>
>> Thanks
>>
>>
>> Andre
>>
>> On Thu, Aug 7, 2008 at 8:16 PM, Otis Gospodnetic
>> wrote:
>> > Hi,
>> >
>> > Perhaps you can give some examples.  Yes, untokenized means "full string"
>> > - it requires an "exact match".
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> >
>> >
>> > ----- Original Message ----
>> >> From: Andre Rubin
>> >> To: java-user@lucene.apache.org
>> >> Sent: Thursday, August 7, 2008 8:04:04 PM
>> >> Subject: Searching Tokenized x Un_tokenized
>> >>
>> >> Hi all,
>> >>
>> >> When I switched a String field from tokenized to untokenized, some
>> >> searches stopped returning some obvious values. Am I missing
>> >> something about querying untokenized fields? Another question: do I
>> >> need an Analyzer if my search is on an untokenized field? Wouldn't the
>> >> search be based on the full String rather than its tokens?
>> >>
>> >> Thanks,
>> >>
>> >>
>> >> Andre
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> >> For additional commands, e-mail: [EMAIL PROTECTED]
