I suppose something like that might work, but I still think that presenting a user with matches that sometimes work case sensitive and sometimes doesn't would be...er..fraught.
If you can programmatically restrict your query construction and you're *sure* this is what your users expect, you can make it work. Just index each term twice, once lowercased and once in the native case, with 0 term increment between them. Then you can simply construct your terms however you want and fire the result at the search. In fact, you only need to double-index the terms you want to do case-sensitive searches on. This will increase the size of your index less than you think... Best Erick On Wed, Jun 25, 2008 at 9:59 AM, John Byrne <[EMAIL PROTECTED]> wrote: > What I had in mind was actually very simple: when you create a Term > (programatically) you normally set the text and the field. I would also like > to be able to set the case sensitivity to true or false for that specific > Term object. > > I imangined (and maybe I am over simplifying it!) that somewhere in the API > there must be a string comparison using 'String.equals()' that determines if > a document contains the term or not - and that use of 'equals()' has > permanently locked Lucene into case-sensitive searching. The values being > compared could be first lower-cased (or equalsIgnoreCase could be used) > depending on the value of a boolean flag in the Term object. > > If that option was there, there would be no need to ever change the case in > the analyzer - you'd be able to control case-sensitivity regardless of the > field used. > > Of course, I realize that there is currently no way to take advantage of > such a feature in the QueryParser. It could only be done programatically. > But I don't think that's a reason not to do it, since the API already has > features that aren't implemented in the QueryParser (like SpanQuerys). In a > perfect world, the parser would support all the features, but for the time > being anyone who wants to take advantage of the newer features has to find > an alternative anyway. > > The problem that it would solve for me is, as I mentioned, that I could mix > case-sensitive Terms with case-insensitive Terms when using SpanQuerys. I > currently have no way to do that. > > Regards, > -John > > Erick Erickson wrote: > >> Well, it depends on what you mean by "per term". There's already >> PerFieldAnalyzerWrapper for each field, but I don't think that's what >> you want. >> >> How do you expect a per term analyzer to behave? I'm having a hard >> time thinking of a use case that's general. You could always >> roll your own analyzer that didn't change case for your particular >> list of words. >> >> But the problem is your users. In your example, suppose a user >> typed in "dell computers". Would that match "Dell computers"? >> Does your analyzer automatically upper-case some words? If it >> does, that's the same as lower casing them all. If it doesn't, >> how do you explain that to your users? >> >> All in all, I'm having a tough time imagining how this would work. >> It's easy enough to say "let's assume", but I suspect that >> whatever solution satisfied your example will have its own problems >> that are far worse than just lower-casing things. >> >> Best >> Erick >> >> >> On Wed, Jun 25, 2008 at 5:37 AM, John Byrne <[EMAIL PROTECTED]> >> wrote: >> >> >> >>> Hi, >>> >>> I know that case-insensitive searching is normally done by creating an >>> all-lower-case version of the documents, and turning the search terms >>> into >>> lower case whenever this field is searched, but this approach has it's >>> disadvantages. >>> >>> Let's say, for example, you want to find "Dell" (with a capital "D"), >>> near >>> "computers" (with or without capitals, ie. in any case). The problem is >>> that >>> you would need to use a SpanQuery to find terms near each other; but if >>> the >>> case-sensitivity required is different for each term, then they will be >>> in >>> different fields, making the use of SpanQuerys inpossible. >>> >>> There might be ways to work around this, but my question is: will >>> case-insensitvity ever be added to Lucene as per-Term option? If not, can >>> anyone tell me where I should start looking in order to make this change >>> myself? >>> >>> Thanks! >>> >>> -JB >>> >>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: [EMAIL PROTECTED] >>> For additional commands, e-mail: [EMAIL PROTECTED] >>> >>> >>> >>> >> >> ------------------------------------------------------------------------ >> >> No virus found in this incoming message. >> Checked by AVG. Version: 7.5.524 / Virus Database: 270.4.1/1517 - Release >> Date: 24/06/2008 20:41 >> >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > >