That's expected. Non-letter characters are not mapped to letters, which is correct behavior.
On Oct 19, 2017 9:38 AM, "Chitra" wrote:
> Hi,
> I indexed the term 'ⒶeŘꝋꝒɫⱯŋɇ' (aeroplane) and it was
> indexed as "er l n"; some characters were trimmed during indexing.
>
> Here is my code
>
> protected Analyz
StandardTokenizer is "standard" in the sense that it implements the Unicode
Standard text segmentation rules (UAX #29).
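The distinction can be seen with the JDK alone (no Lucene needed; the class name `LetterCheck` is just illustrative). Every code point in the problem term except 'Ⓐ' (U+24B6) has a Letter general category; 'Ⓐ' is classified as a symbol, yet the Unicode property data still treats it as word-forming for UAX #29 segmentation, which is why a tokenizer following UAX #29 keeps the term intact while grammar-based tokenizers with narrower letter tables split it:

```java
// Why a letter-based grammar splits 'ⒶeŘꝋꝒɫⱯŋɇ' while UAX #29 keeps it whole:
// every code point except 'Ⓐ' (U+24B6) has a Letter general category, but
// 'Ⓐ' is an OTHER_SYMBOL that Unicode nevertheless marks as word-forming
// for word-segmentation purposes.
public class LetterCheck {
    public static void main(String[] args) {
        "ⒶeŘꝋꝒɫⱯŋɇ".codePoints().forEach(cp ->
            System.out.printf("U+%04X  %c  isLetter=%b  category=%d%n",
                cp, cp, Character.isLetter(cp), Character.getType(cp)));
        // 'Ⓐ' reports isLetter=false, category=28 (OTHER_SYMBOL);
        // the other eight code points are all letters.
    }
}
```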
: Date: Fri, 20 Oct 2017 18:58:35 +0530
: From: Chitra
: Reply-To: java-user@lucene.apache.org
: To: Lucene Users
: Subject: Re: ClassicAnalyzer Behavior on accent character
Hi,
I found the difference and now understand the behavior of both
tokenizers. Could you please suggest which one is better to use:
ClassicTokenizer or StandardTokenizer?
--
Regards,
Chitra
Hi Robert,
Yes, StandardTokenizer solves my case... could you please
explain the difference between ClassicTokenizer and StandardTokenizer?
How does StandardTokenizer solve my case? I searched the web but was unable
to understand...
Any help is greatly appreciated.
On Fri, Oct 2
Easy: don't use ClassicTokenizer, use StandardTokenizer instead.
On Thu, Oct 19, 2017 at 9:37 AM, Chitra wrote:
> Hi,
> I indexed the term 'ⒶeŘꝋꝒɫⱯŋɇ' (aeroplane) and it was
> indexed as "er l n"; some characters were trimmed during indexing.
>
> Here is my code
>
> protected Ana
Hi,
I indexed the term 'ⒶeŘꝋꝒɫⱯŋɇ' (aeroplane) and it was
indexed as "er l n"; some characters were trimmed during indexing.
Here is my code
protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader)
{
    final ClassicTok
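Chitra's snippet is cut off above, but the fix under discussion — swapping ClassicTokenizer for StandardTokenizer — can be sketched minimally. This assumes a reasonably recent Lucene (5.x or later, where tokenizers have no-argument constructors and the input is supplied via `setReader`); the class and helper names here are illustrative, not the original code:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.ClassicTokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenizerComparison {
    // Collect the raw tokens a tokenizer emits for the given text.
    static List<String> tokens(Tokenizer tokenizer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        tokenizer.setReader(new StringReader(text));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
            out.add(term.toString());
        }
        tokenizer.end();
        tokenizer.close();
        return out;
    }

    public static void main(String[] args) throws IOException {
        String term = "ⒶeŘꝋꝒɫⱯŋɇ";
        // ClassicTokenizer's fixed grammar only recognizes letters from older
        // Unicode ranges, so it breaks the term into fragments.
        System.out.println("classic:  " + tokens(new ClassicTokenizer(), term));
        // StandardTokenizer follows UAX #29, under which every code point in
        // the term is word-forming, so the term survives as a single token.
        System.out.println("standard: " + tokens(new StandardTokenizer(), term));
    }
}
```

In a custom Analyzer, the change amounts to constructing a StandardTokenizer instead of a ClassicTokenizer inside `createComponents`.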