Hi Srinivas,

It works as you expect in the latest Lucene version, 3.3.0 (in fact in any version after 3.0.0). StandardAnalyzer splits text at word boundaries and discards a set of stop words such as "is", "in", etc.
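You don't need Lucene itself to see what Unicode word segmentation does with underscores: the JDK's java.text.BreakIterator also implements UAX #29-style word breaking. Here is a rough, JDK-only sketch (not Lucene's StandardTokenizer itself; the JDK's segmentation data may not match a given Lucene version exactly, so treat the underscore case as illustrative):

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBoundaryDemo {
    // Collect the word segments that the JDK's UAX #29-style
    // word-boundary iterator produces for the given text.
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // Keep only segments containing a letter or digit
            // (this skips the whitespace/punctuation segments).
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("xyz abc"));   // spaces always break: prints [xyz, abc]
        System.out.println(words("xyz_abc"));   // how '_' is treated may vary by JDK version
        System.out.println(words("xyz_1abc"));
    }
}
```

On a JDK whose break-iterator data follows current UAX #29 rules, the underscore acts as a joining character and xyz_abc comes out as a single segment, which mirrors the 3.1+ StandardTokenizer behavior you are after.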
The class documentation of StandardAnalyzer in the Lucene 3.3.0 API states it clearly: "As of 3.1, StandardTokenizer implements Unicode text segmentation, and StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords." I guess the Unicode text segmentation part is what takes care of the underscore: under the Unicode word-break rules (UAX #29), the underscore is a joining character (word-break class ExtendNumLet), so xyz_abc stays a single token. As far as I can tell, the old tokenizer grammar (now ClassicTokenizer) only kept underscores inside tokens that also contain digits, which would explain why xyz_1abc survived for you while xyz_abc did not.

So I suggest you switch to the latest version for better performance and functionality.

Hope that helps.

Regards,
Govind

On Mon, Aug 22, 2011 at 11:17 AM, <srinu.he...@gmail.com> wrote:
> Hello All,
> I observed some unexpected behavior when using StandardAnalyzer to
> parse a query. Here is a demonstration.
>
> I am passing the query (key:xyz_abc) && (text:blabla)
>
> Expected parsed query: +key:xyz_abc +text:blabla
>
> Actual result: +key:"xyz abc" +text:blabla
>
> As per my understanding, StandardAnalyzer splits text at word boundaries
> into multiple words, but xyz_abc above is a single word. Please correct
> me if I am wrong.
>
> I also observed that if there is a number after the underscore, the
> parsed query is as expected, i.e.
>
> if I give the query (key:xyz_1abc) && (text:blabla), the parsed query is
> +key:xyz_1abc +text:blabla
>
> This is the behavior I am expecting.
>
> Please help.
>
> Thanks,
> Srinivas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>

-- 
No trees were harmed in the creation of this message, but several thousand electrons were mildly inconvenienced.