Dino, you lost me halfway through your email :( NO_NORMS in the sense you are using it (setOmitNorms(true)) does not mean the field is not tokenized. UN_TOKENIZED does mean the field is not tokenized.
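To make the distinction concrete, here is a rough, untested sketch against the
2.3-era Field API (field names and values are only examples):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document doc = new Document();

    // Goes through the analyzer at index time; norms are kept.
    doc.add(new Field("body", "Some Mixed Case Text",
                      Field.Store.YES, Field.Index.TOKENIZED));

    // Still goes through the analyzer at index time; only norms are dropped.
    Field subject = new Field("subject", "Some Mixed Case Text",
                              Field.Store.YES, Field.Index.TOKENIZED);
    subject.setOmitNorms(true);
    doc.add(subject);

    // Never goes through the analyzer; indexed as one token, exactly as given.
    doc.add(new Field("email", "Dino.Korah@Example.COM",
                      Field.Store.YES, Field.Index.UN_TOKENIZED));

Only the last field skips the analyzer; the setOmitNorms(true) field is
analyzed exactly like any other TOKENIZED field, it just carries no norms.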
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Dino Korah <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Tuesday, August 26, 2008 9:17:49 AM
> Subject: RE: Case Sensitivity
>
> I think I should rephrase my question.
>
> [ Context: Using the out-of-the-box StandardAnalyzer for both indexing and
> searching. ]
>
> Is it right to say that a field that is either UN_TOKENIZED or NO_NORMS-ized
> ( field.setOmitNorms(true) ) does not get analyzed while indexing?
> That would mean the query still goes through the analyzer at search time, so
> we would need to treat those fields differently in the analyzer we use for
> searching?
> Doesn't it also mean that a setOmitNorms(true) field does not get tokenized?
>
> What is the best solution if one were to add one set of fields UN_TOKENIZED
> and another set TOKENIZED, with a few of the latter using setOmitNorms(true)
> (the index writer is plain StandardAnalyzer based)? A per-field analyzer at
> query time?!
>
> Many thanks,
> Dino
>
>
> -----Original Message-----
> From: Dino Korah [mailto:[EMAIL PROTECTED]]
> Sent: 26 August 2008 12:12
> To: 'java-user@lucene.apache.org'
> Subject: RE: Case Sensitivity
>
> A few more case-sensitivity questions.
>
> Based on the discussion on http://markmail.org/message/q7dqr4r7o6t6dgo5 and
> on this thread, is it right to say that a field that is either UN_TOKENIZED
> or NO_NORMS-ized does not get analyzed while indexing? Which means we need
> to case-normalize (downcase) those fields beforehand?
>
> Does it mean that, if I can afford it, I should use norms?
>
> Many thanks,
> Dino
>
>
>
> -----Original Message-----
> From: Steven A Rowe [mailto:[EMAIL PROTECTED]]
> Sent: 19 August 2008 17:43
> To: java-user@lucene.apache.org
> Subject: RE: Case Sensitivity
>
> Hi Dino,
>
> I think you'd benefit from reading some FAQ answers, like:
>
> "Why is it important to use the same analyzer type during indexing and
> search?"
> 4472d10961ba63c>
>
> Also, have a look at the AnalysisParalysis wiki page for some hints:
>
> On 08/19/2008 at 8:57 AM, Dino Korah wrote:
> > From the discussion here, what I could understand was that if I am using
> > StandardAnalyzer on TOKENIZED fields, for both indexing and querying,
> > I shouldn't have any problems with cases.
>
> If by "shouldn't have problems with cases" you mean "can match
> case-insensitively", then this is true.
>
> > But if I have any UN_TOKENIZED fields, there will be problems if I do
> > not case-normalize them myself before adding them as a field to the
> > document.
>
> Again, assuming that by "case-normalize" you mean "downcase", that you
> want case-insensitive matching, and that you use the StandardAnalyzer (or
> some other downcasing analyzer) at query time, then this is true.
>
> > In my case I have a mixed scenario. I am indexing emails, and the email
> > addresses are indexed UN_TOKENIZED. I also have a second set of custom
> > tokenized fields, which keep the tokens in individual fields with the
> > same name.
> [...]
>
> > Does it mean that wherever I use UN_TOKENIZED, they do not go
> > through the StandardAnalyzer before getting indexed, but they do when
> > they are searched on?
>
> This is true.
>
> > If that is the case, do I need to normalise them before adding them to
> > the document?
>
> If you want case-insensitive matching, then yes, you do need to normalize
> them before adding them to the document.
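Concretely, that normalization amounts to something like this rough, untested
sketch (field name and value are only examples):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document doc = new Document();
    // UN_TOKENIZED bypasses the analyzer at index time, so downcase the value
    // yourself if you want case-insensitive matching against analyzed queries.
    String address = "Dino.Korah@Example.COM";   // example value only
    doc.add(new Field("email", address.toLowerCase(),
                      Field.Store.YES, Field.Index.UN_TOKENIZED));

A query parsed with StandardAnalyzer is lowercased on the way in, so the
lowercased keyword value will then match.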
>
> > I also would like to know if it is better to employ an EmailAnalyzer
> > that makes a TokenStream out of the given email address, rather than
> > using a simplistic function that gives me a list of string pieces and
> > adding them one by one. With searches, would both approaches give the
> > same result?
>
> Yes, both approaches give the same result. When you add string pieces
> one by one, you are adding multiple same-named fields. By contrast, the
> EmailAnalyzer approach would add a single field, and would allow you to
> control positions (via Token.setPositionIncrement():
> ml#setPositionIncrement(int)>), e.g. to improve phrase handling. Also, if
> you make up an EmailAnalyzer, you can use it to search against your
> tokenized email field, along with other analyzer(s) on other field(s),
> using the PerFieldAnalyzerWrapper
> AnalyzerWrapper.html>.
>
> Steve
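For reference, wiring such an EmailAnalyzer in next to StandardAnalyzer would
look roughly like this (untested sketch; EmailAnalyzer is assumed to be your
own Analyzer subclass and "emailTokens" is just an example field name):

    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.queryParser.QueryParser;

    // StandardAnalyzer everywhere except the tokenized e-mail field, which
    // gets the (hypothetical) EmailAnalyzer.
    PerFieldAnalyzerWrapper analyzer =
        new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    analyzer.addAnalyzer("emailTokens", new EmailAnalyzer());

    // Use the same wrapper at index time and at query time so both sides
    // analyze each field the same way.
    IndexWriter writer = new IndexWriter(directory, analyzer, true); // directory: your Directory
    QueryParser parser = new QueryParser("emailTokens", analyzer);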