Thanks for pointing this out.

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Friday, May 06, 2005 8:15 PM
To: java-user@lucene.apache.org
Subject: Re: alpha numeric searching or highlighting problem.

On May 6, 2005, at 5:43 PM, Yagnesh Shah wrote:

> Hi folks,
>      I am playing with HighlightIt.java from the "Lucene in Action"
> code. I have modified the text string so that one "45 BC" now reads
> "45BC" and the other "45 BC" reads "Z3950". I have also modified the
> query line below, and my output file no longer contains any
> highlighting.
>
> Works:
> TermQuery query = new TermQuery(new Term("f", "ipsum"));
> TermQuery query = new TermQuery(new Term("f", "2000"));
>
> Does not work:
>
> TermQuery query = new TermQuery(new Term("f", "45BC"));
> TermQuery query = new TermQuery(new Term("f", "Z3950"));

This is a classic "analysis paralysis" issue. The HighlightIt code
uses the StandardAnalyzer. Analyzing 45BC and Z3950 yields the
following (from the Lucene In Action code, run "ant AnalyzerDemo"):

AnalyzerDemo:
     [echo]
     [echo]       Demonstrates analysis of sample text.
     [echo]
     [echo]       Refer to the "Analysis" chapter for much more on this
     [echo]       extremely crucial topic.
     [echo]
    [input] Press return to continue...
    [input] String to analyze: [This string will be analyzed.]
45BC Z3950
     [echo] Running lia.analysis.AnalyzerDemo...
     [java] Analyzing "45BC Z3950"
...
     [java]   StandardAnalyzer:
     [java]     [45bc] [z3950]

Notice that both tokens have been lowercased. A TermQuery is not
analyzed, so its term must match the tokens produced by the analysis
process exactly, including case. Change the term to "45bc" (and
likewise "z3950") in your TermQuery and you'll see highlighting.

     Erik

>
> Modified text string from:
>
>   private static final String text =
>       "Contrary to popular belief, Lorem Ipsum is" +
>       " not simply random text. It has roots in a piece of" +
>       " classical Latin literature from 45 BC, making it over" +
>       " 2000 years old. Richard McClintock, a Latin professor" +
>       " at Hampden-Sydney College in Virginia, looked up one" +
>       " of the more obscure Latin words, consectetur, from" +
>       " a Lorem Ipsum passage, and going through the cites" +
>       " of the word in classical literature, discovered the" +
>       " undoubtable source. Lorem Ipsum comes from sections" +
>       " 1.10.32 and 1.10.33 of \"de Finibus Bonorum et" +
>       " Malorum\" (The Extremes of Good and Evil) by Cicero," +
>       " written in 45 BC. This book is a treatise on the" +
>       " theory of ethics, very popular during the" +
>       " Renaissance. The first line of Lorem Ipsum, \"Lorem" +
>       " ipsum dolor sit amet..\", comes from a line in" +
>       " section 1.10.32.";
> to
>
>   private static final String text =
>       "Contrary to popular belief, Lorem Ipsum is" +
>       " not simply random text. It has roots in a piece of" +
>       " classical Latin literature from 45BC, making it over" +
>       " 2000 years old. Richard McClintock, a Latin professor" +
>       " at Hampden-Sydney College in Virginia, looked up one" +
>       " of the more obscure Latin words, consectetur, from" +
>       " a Lorem Ipsum passage, and going through the cites" +
>       " of the word in classical literature, discovered the" +
>       " undoubtable source. Lorem Ipsum comes from sections" +
>       " 1.10.32 and 1.10.33 of \"de Finibus Bonorum et" +
>       " Malorum\" (The Extremes of Good and Evil) by Cicero," +
>       " written in Z3950. This book is a treatise on the" +
>       " theory of ethics, very popular during the" +
>       " Renaissance. The first line of Lorem Ipsum, \"Lorem" +
>       " ipsum dolor sit amet..\", comes from a line in" +
>       " section 1.10.32.";
>
>
> Yagnesh N. Shah
> Senior Technology Engineer
> CS Dept., 4th Floor
> H. W. Wilson
> 950 University Avenue,
> Bronx NY 10452
> (718) 588 8400 x2721
> http://www.hwwilson.com
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
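
For anyone finding this thread later, here is a rough sketch of why the mixed-case terms miss. It does not use Lucene at all: TermMatchSketch, indexTerms and termMatches are hypothetical names made up for illustration, and the "analysis" is just a whitespace split plus lowercasing, which is only an approximation of the one StandardAnalyzer behavior discussed in this thread. The lookup is an exact, case-sensitive comparison, as described above for TermQuery.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for illustration only; this is NOT Lucene code.
class TermMatchSketch {

    // Assumption: split on whitespace and lowercase each token. This mimics
    // only the lowercasing step shown in the AnalyzerDemo output.
    static Set<String> indexTerms(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Exact, case-sensitive lookup: the query term is compared as given,
    // with no analysis applied to it.
    static boolean termMatches(Set<String> indexed, String queryTerm) {
        return indexed.contains(queryTerm);
    }

    public static void main(String[] args) {
        Set<String> indexed = indexTerms("45BC Z3950");
        System.out.println(indexed);                        // [45bc, z3950]
        System.out.println(termMatches(indexed, "45BC"));   // false
        System.out.println(termMatches(indexed, "45bc"));   // true
        System.out.println(termMatches(indexed, "Z3950"));  // false
        System.out.println(termMatches(indexed, "z3950"));  // true
    }
}
```

The same reasoning suggests a general habit: build the query term from whatever the analyzer actually emits for your input, rather than from the raw text.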