What I had in mind was actually very simple: when you create a Term (programatically) you normally set the text and the field. I would also like to be able to set the case sensitivity to true or false for that specific Term object.

I imangined (and maybe I am over simplifying it!) that somewhere in the API there must be a string comparison using 'String.equals()' that determines if a document contains the term or not - and that use of 'equals()' has permanently locked Lucene into case-sensitive searching. The values being compared could be first lower-cased (or equalsIgnoreCase could be used) depending on the value of a boolean flag in the Term object.

If that option was there, there would be no need to ever change the case in the analyzer - you'd be able to control case-sensitivity regardless of the field used.

Of course, I realize that there is currently no way to take advantage of such a feature in the QueryParser. It could only be done programatically. But I don't think that's a reason not to do it, since the API already has features that aren't implemented in the QueryParser (like SpanQuerys). In a perfect world, the parser would support all the features, but for the time being anyone who wants to take advantage of the newer features has to find an alternative anyway.

The problem that it would solve for me is, as I mentioned, that I could mix case-sensitive Terms with case-insensitive Terms when using SpanQuerys. I currently have no way to do that.

Regards,
-John

Erick Erickson wrote:
Well, it depends on what you mean by "per term". There's already
PerFieldAnalyzerWrapper for each field, but I don't think that's what
you want.

How do you expect a per term analyzer to behave? I'm having a hard
time thinking of a use case that's general. You could always
roll your own analyzer that didn't change case for your particular
list of words.

But the problem is your users. In your example, suppose a user
typed in "dell computers". Would that match "Dell computers"?
Does your analyzer automatically upper-case some words? If it
does, that's the same as lower casing them all. If it doesn't,
how do you explain that to your users?

All in all, I'm having a tough time imagining how this would work.
It's easy enough to say "let's assume", but I suspect that
whatever solution satisfied your example will have its own problems
that are far worse than just lower-casing things.

Best
Erick


On Wed, Jun 25, 2008 at 5:37 AM, John Byrne <[EMAIL PROTECTED]> wrote:

Hi,

I know that case-insensitive searching is normally done by creating an
all-lower-case version of the documents, and turning the search terms into
lower case whenever this field is searched, but this approach has it's
disadvantages.

Let's say, for example, you want to find "Dell" (with a capital "D"), near
"computers" (with or without capitals, ie. in any case). The problem is that
you would need to use a SpanQuery to find terms near each other; but if the
case-sensitivity required is different for each term, then they will be in
different fields, making the use of SpanQuerys inpossible.

There might be ways to work around this, but my question is: will
case-insensitvity ever be added to Lucene as per-Term option? If not, can
anyone tell me where I should start looking in order to make this change
myself?

Thanks!

-JB



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



------------------------------------------------------------------------

No virus found in this incoming message.
Checked by AVG. Version: 7.5.524 / Virus Database: 270.4.1/1517 - Release Date: 24/06/2008 20:41


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to