Re: Issue with StandardAnalyzer which splits single word with _(Lucene Version: 3.0)

Erick Erickson Mon, 22 Aug 2011 06:52:51 -0700

No, that's expected. StandardAnalyzer breaks on '_' as far as I know.

NOTE: the behavior changed a bit as of Lucene/Solr 3.1. To get the old
StandardAnalyzer behavior there, I believe you need ClassicAnalyzer...
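Here is a rough illustration of why xyz_abc and xyz_1abc behave differently on 3.0. In the classic (pre-3.1) StandardTokenizer grammar, '_' appears to be kept inside a token only when the token matches the NUM rule, which requires a digit in every other '_'-separated part. Otherwise the letters on either side become separate tokens, and QueryParser turns those two tokens into a phrase. The sketch below is plain Java, NOT Lucene code. ClassicUnderscoreSketch, tokenize and renderFieldQuery are hypothetical names. It also assumes the word contains only letters, digits and '_', and it ignores stop words, hosts, acronyms and e-mail addresses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical, stdlib-only approximation (NOT Lucene code) of how the classic,
// pre-3.1 StandardTokenizer handles '_' inside a word made only of letters,
// digits and underscores, followed by StandardAnalyzer-style lowercasing.
final class ClassicUnderscoreSketch {

    static boolean hasDigit(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isDigit(s.charAt(i))) return true;
        }
        return false;
    }

    // Segments from..to (joined by single '_') form one NUM-style token when there
    // are at least two of them and every other segment contains a digit.
    static boolean isNumRun(List<String> segs, int from, int to) {
        if (to <= from) return false;
        boolean evenHaveDigit = true, oddHaveDigit = true;
        for (int i = from; i <= to; i++) {
            String seg = segs.get(i);
            if (seg.isEmpty()) return false; // "__" never joins parts into one token
            if ((i - from) % 2 == 0) evenHaveDigit &= hasDigit(seg);
            else oddHaveDigit &= hasDigit(seg);
        }
        return evenHaveDigit || oddHaveDigit;
    }

    static List<String> tokenize(String word) {
        List<String> segs = Arrays.asList(word.split("_"));
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < segs.size()) {
            // Try the longest NUM-style run first (JFlex scanners use longest match).
            int end = i;
            for (int k = segs.size() - 1; k > i; k--) {
                if (isNumRun(segs, i, k)) { end = k; break; }
            }
            String tok = String.join("_", segs.subList(i, end + 1));
            if (!tok.isEmpty()) tokens.add(tok.toLowerCase(Locale.ROOT));
            i = end + 1;
        }
        return tokens;
    }

    // Mimics how the 3.0 QueryParser prints the clause: one token gives a term
    // query, several tokens from one query term give a phrase query.
    static String renderFieldQuery(String field, String word) {
        List<String> tokens = tokenize(word);
        if (tokens.size() == 1) return field + ":" + tokens.get(0);
        return field + ":\"" + String.join(" ", tokens) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(renderFieldQuery("key", "xyz_abc"));  // key:"xyz abc"
        System.out.println(renderFieldQuery("key", "xyz_1abc")); // key:xyz_1abc
    }
}
```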


More than you ever wanted to know about word boundaries (3.1+):
http://unicode.org/reports/tr29/#Word_Boundaries
Linked to from:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory
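The UAX#29 rules linked above give '_' (LOW LINE) Word_Break=ExtendNumLet. Rules WB13a/WB13b forbid a break between it and adjacent letters or digits, so a tokenizer that follows them should keep xyz_abc as a single token. The plain-Java sketch below models only that small subset of the rules. Uax29UnderscoreSketch and its methods are hypothetical names, and using Character.isLetterOrDigit plus connector punctuation is only a rough stand-in for the real Unicode property tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, stdlib-only sketch (NOT Lucene code) of a tiny subset of the
// UAX#29 word rules: letters, digits and ExtendNumLet ('_') do not break.
final class Uax29UnderscoreSketch {

    // Rough stand-in for the ALetter/Numeric/ExtendNumLet classes: letters, digits
    // and connector punctuation such as '_'.
    static boolean joins(char c) {
        return Character.isLetterOrDigit(c)
                || Character.getType(c) == Character.CONNECTOR_PUNCTUATION;
    }

    // Splits text into maximal runs of joining chars; runs with no letter or
    // digit (e.g. a lone "___") are dropped. Tokens are lowercased.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!joins(text.charAt(i))) { i++; continue; }
            int start = i;
            boolean hasAlnum = false;
            while (i < text.length() && joins(text.charAt(i))) {
                hasAlnum |= Character.isLetterOrDigit(text.charAt(i));
                i++;
            }
            if (hasAlnum) tokens.add(text.substring(start, i).toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("xyz_abc"));         // [xyz_abc]
        System.out.println(tokenize("xyz_1abc blabla")); // [xyz_1abc, blabla]
    }
}
```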


Best,
Erick

On Mon, Aug 22, 2011 at 1:47 AM,  <srinu.he...@gmail.com> wrote:
> Hello All,
>           I observed some unexpected behavior when using StandardAnalyzer to
> parse a query. Here is a demonstration.
>
> I am passing the query as (key:xyz_abc) && (text:blabla)
>
> I expected the parsed query to be +key:xyz_abc +text:blabla
>
> The actual result is +key:"xyz abc" +text:blabla
>
> As I understand it, StandardAnalyzer splits text into multiple words at word
> boundaries, but xyz_abc above is a single word. Please correct me if I am
> wrong.
>
> I also observed that if a number follows the underscore, the parsed query is
> as expected, i.e.
>
> If I give the query as (key:xyz_1abc) && (text:blabla), the parsed query is
> +key:xyz_1abc +text:blabla
>
> This is the behavior I am expecting.
>
> Please help.
>
> Thanks,
> Srinivas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
