[ https://issues.apache.org/jira/browse/LUCENE-8527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713437#comment-16713437 ]
Steve Rowe commented on LUCENE-8527:
------------------------------------
[~rcmuir] mentioned on LUCENE-8125 that StandardTokenizer should give such
sequences the {{<EMOJI>}} token type - see the logic in the {{icu}} module's
{{BreakIteratorWrapper}}.
JFlex 1.7.0 supports Unicode 9.0, which, if I'm interpreting the discussion at
http://www.unicode.org/L2/L2016/16315r-handling-seg-emoji.pdf properly, does
not (fully) include Emoji sequence support (though customized rules that would
do that properly in Unicode 9.0 are listed in that doc).
Should we include the (post-9.0) customized rules for Unicode 9.0?
> Upgrade JFlex to 1.7.0
> ----------------------
>
> Key: LUCENE-8527
> URL: https://issues.apache.org/jira/browse/LUCENE-8527
> Project: Lucene - Core
> Issue Type: Improvement
> Components: general/build, modules/analysis
> Reporter: Steve Rowe
> Priority: Minor
>
> JFlex 1.7.0, supporting Unicode 9.0, was released recently:
> [http://jflex.de/changelog.html#jflex-1.7.0]. We should upgrade.