Re: case insensitivity

John Byrne Wed, 25 Jun 2008 07:00:44 -0700

What I had in mind was actually very simple: when you create a Term (programmatically) you normally set the text and the field. I would also like to be able to set the case sensitivity to true or false for that specific Term object.

I imagined (and maybe I am oversimplifying it!) that somewhere in the API there must be a string comparison using 'String.equals()' that determines if a document contains the term or not - and that use of 'equals()' has permanently locked Lucene into case-sensitive searching. The values being compared could be first lower-cased (or equalsIgnoreCase could be used) depending on the value of a boolean flag in the Term object.
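
To make the idea concrete, here is a minimal sketch in plain Java of the kind of flag-dependent comparison I mean. It is only an illustration: FlaggedTerm, its caseSensitive field and matches() are hypothetical names, not Lucene's Term class, and it assumes the search really does come down to comparing two strings.

```java
import java.util.Locale;

// Hypothetical stand-in for the idea above; this is NOT Lucene's Term class.
final class FlaggedTerm {
    final String field;
    final String text;
    final boolean caseSensitive; // the proposed per-term flag

    FlaggedTerm(String field, String text, boolean caseSensitive) {
        this.field = field;
        this.text = text;
        this.caseSensitive = caseSensitive;
    }

    // Compare this term's text against a token taken from a document.
    boolean matches(String token) {
        if (caseSensitive) {
            return text.equals(token);
        }
        // Lower-case both sides with a fixed locale so the result does not
        // depend on the JVM's default locale.
        return text.toLowerCase(Locale.ROOT).equals(token.toLowerCase(Locale.ROOT));
    }
}

class FlaggedTermDemo {
    public static void main(String[] args) {
        FlaggedTerm exact = new FlaggedTerm("body", "Dell", true);
        FlaggedTerm loose = new FlaggedTerm("body", "computers", false);
        System.out.println(exact.matches("dell"));      // false
        System.out.println(loose.matches("Computers")); // true
    }
}
```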

If that option were there, there would be no need to ever change the case in the analyzer - you'd be able to control case-sensitivity regardless of the field used.

Of course, I realize that there is currently no way to take advantage of such a feature in the QueryParser. It could only be done programmatically. But I don't think that's a reason not to do it, since the API already has features that aren't implemented in the QueryParser (like SpanQuerys). In a perfect world, the parser would support all the features, but for the time being anyone who wants to take advantage of the newer features has to find an alternative anyway.

The problem that it would solve for me is, as I mentioned, that I could mix case-sensitive Terms with case-insensitive Terms when using SpanQuerys. I currently have no way to do that.


Regards,
-John

Erick Erickson wrote:

Well, it depends on what you mean by "per term". There's already
PerFieldAnalyzerWrapper for each field, but I don't think that's what
you want.

How do you expect a per term analyzer to behave? I'm having a hard
time thinking of a use case that's general. You could always
roll your own analyzer that didn't change case for your particular
list of words.

But the problem is your users. In your example, suppose a user
typed in "dell computers". Would that match "Dell computers"?
Does your analyzer automatically upper-case some words? If it
does, that's the same as lower casing them all. If it doesn't,
how do you explain that to your users?

All in all, I'm having a tough time imagining how this would work.
It's easy enough to say "let's assume", but I suspect that
whatever solution satisfied your example will have its own problems
that are far worse than just lower-casing things.

Best
Erick


On Wed, Jun 25, 2008 at 5:37 AM, John Byrne <[EMAIL PROTECTED]> wrote:

Hi,

I know that case-insensitive searching is normally done by creating an
all-lower-case version of the documents, and turning the search terms into
lower case whenever this field is searched, but this approach has its
disadvantages.
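
For reference, the usual approach can be sketched roughly like this in plain Java. This is a toy inverted index, not Lucene code; LowercaseIndex and its methods are made-up names, and splitting on whitespace stands in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy inverted index mapping term -> token positions.
// Hypothetical illustration only; not a Lucene class.
class LowercaseIndex {
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Index time: split on whitespace and store every token lower-cased.
    void add(String text) {
        String[] tokens = text.trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].isEmpty()) {
                continue; // happens only when the input text is blank
            }
            String key = tokens[pos].toLowerCase(Locale.ROOT);
            postings.computeIfAbsent(key, k -> new ArrayList<>()).add(pos);
        }
    }

    // Query time: lower-case the search term the same way before the lookup.
    List<Integer> positions(String queryTerm) {
        List<Integer> hits = postings.get(queryTerm.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<Integer>emptyList() : hits;
    }
}
```

Because the original case is thrown away when the text is indexed, a lookup in this kind of index cannot tell "Dell" from "dell".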

Let's say, for example, you want to find "Dell" (with a capital "D"), near
"computers" (with or without capitals, i.e. in any case). The problem is that
you would need to use a SpanQuery to find terms near each other; but if the
case-sensitivity required is different for each term, then they will be in
different fields, making the use of SpanQuerys impossible.
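
As a rough sketch of the behaviour I am after, in plain Java rather than Lucene's SpanQuery API: a proximity check over a single token stream where each term carries its own case flag. MixedCaseNear, matches() and near() are hypothetical names, and maxDistance is simply a difference in token positions.

```java
import java.util.Locale;

// Hypothetical sketch of the desired behaviour; not Lucene's SpanQuery API.
class MixedCaseNear {
    // True if token matches term, honouring that term's own case flag.
    static boolean matches(String term, boolean caseSensitive, String token) {
        return caseSensitive
                ? term.equals(token)
                : term.toLowerCase(Locale.ROOT).equals(token.toLowerCase(Locale.ROOT));
    }

    // True if some occurrence of a lies within maxDistance positions
    // of some occurrence of b in the same token stream.
    static boolean near(String[] tokens, String a, boolean aCaseSensitive,
                        String b, boolean bCaseSensitive, int maxDistance) {
        for (int i = 0; i < tokens.length; i++) {
            if (!matches(a, aCaseSensitive, tokens[i])) {
                continue;
            }
            for (int j = 0; j < tokens.length; j++) {
                if (j != i && Math.abs(i - j) <= maxDistance
                        && matches(b, bCaseSensitive, tokens[j])) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With this, near(tokens, "Dell", true, "computers", false, 1) would accept "Dell COMPUTERS" but reject "dell computers", which is the mix I cannot express today.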

There might be ways to work around this, but my question is: will
case-insensitivity ever be added to Lucene as a per-Term option? If not, can
anyone tell me where I should start looking in order to make this change
myself?

Thanks!

-JB



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
