[ 
https://issues.apache.org/jira/browse/LUCENE-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912231#action_12912231
 ] 

Simon Willnauer commented on LUCENE-2653:
-----------------------------------------

bq. I don't understand the question. ThaiWordFilter has always been broken this 
way; it is broken by design.
Could somebody have used the broken behavior and come to rely on it? Just making 
sure it's not a backwards-compatibility break that we should document.



> ThaiAnalyzer assumes things about your jre
> ------------------------------------------
>
>                 Key: LUCENE-2653
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2653
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/analyzers
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-2653.patch
>
>
> The ThaiAnalyzer/ThaiWordFilter depends on the fact that 
> BreakIterator.getWordInstance(new Locale("th")) returns a dictionary-based 
> break iterator that can segment Thai phrases into words (Thai does not use 
> whitespace between words).
> But it is non-standard for the JRE to specialize this locale in this 
> way; it's nice, but you can't depend on it.
> For example, if you are running on the IBM JRE, this analyzer/wordfilter is 
> completely "broken" in the sense that it won't do what it claims to do.
> At a minimum, we need to document this and suggest users look at 
> ICUTokenizer for Thai, which always has this break iterator and is not 
> JRE-dependent.
> Better would be to check statically that the thing actually works.
> When creating a new ThaiWordFilter we could clone() the BreakIterator, which 
> is often cheaper than making a new one anyway.
> We could throw an exception if it's not supported, and add a boolean so the 
> user knows whether it works.
> And we could refer to this boolean with Assert.assume in its tests.
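
The static check proposed in the description above could be sketched roughly as follows. This is a minimal illustration, not the attached patch: the class name ThaiBreakSupport, the field DBBI_AVAILABLE, and the method newIterator() are hypothetical names, and the probe assumes that a dictionary-based iterator reports a word boundary at offset 4 of the sample text (between the two words of "Thai language"), while a generic iterator does not.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Hypothetical helper; names are illustrative and not taken from the Lucene patch. */
final class ThaiBreakSupport {
    // The phrase "Thai language" in Thai script (7 chars, two words of 4 and 3 chars),
    // written with escapes so the source file encoding does not matter.
    private static final String SAMPLE = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";

    // Same locale as new Locale("th") in the issue description.
    private static final BreakIterator PROTO =
            BreakIterator.getWordInstance(Locale.forLanguageTag("th"));

    /** True only if this JRE's Thai word iterator actually segments Thai text. */
    static final boolean DBBI_AVAILABLE;

    static {
        // Probe once at class-load time: is there a boundary between the two words?
        PROTO.setText(SAMPLE);
        DBBI_AVAILABLE = PROTO.isBoundary(4);
    }

    private ThaiBreakSupport() {}

    /** Returns a fresh iterator by cloning the prototype, or throws if unsupported. */
    static BreakIterator newIterator() {
        if (!DBBI_AVAILABLE) {
            throw new UnsupportedOperationException(
                    "This JRE does not provide Thai dictionary-based word segmentation");
        }
        return (BreakIterator) PROTO.clone();
    }

    public static void main(String[] args) {
        System.out.println("Thai segmentation available: " + DBBI_AVAILABLE);
    }
}
```

A test could then skip itself when the flag is false (for example with JUnit's Assume.assumeTrue), which is the role the description gives to Assert.assume.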


