Hi gophers, I’ve implemented Unicode text segmentation for 
Go: https://github.com/clipperhouse/uax29/words

It tokenizes text into words, sentences, or graphemes according to Unicode 
Standard Annex #29 <https://unicode.org/reports/tr29/>. I’d been tokenizing 
text in ad hoc ways, and then learned that there is a Unicode standard for it.
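
To illustrate what I mean by ad hoc, here is a rough sketch of the kind of 
splitting I had in mind, using only the standard library. The name adHocWords 
is made up for this example and is not part of uax29. It splits on anything 
that is not a letter or digit, which breaks apart tokens that the UAX #29 word 
rules keep whole, such as "can't" and "3.14".

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// adHocWords is a hypothetical helper for illustration only; it is not
// the uax29 API. It treats every rune that is neither a letter nor a
// digit as a separator and drops the separators.
func adHocWords(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

func main() {
	// The apostrophe and the decimal point are treated as separators,
	// so "can't" and "3.14" are each split in two.
	fmt.Printf("%q\n", adHocWords("I can't pay $3.14"))
	// prints ["I" "can" "t" "pay" "3" "14"]
}
```

Under the UAX #29 word boundary rules, an apostrophe between letters and a 
full stop between digits do not create a break, so a conforming tokenizer 
would keep "can't" and "3.14" as single tokens.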

Hopefully it’s useful to you; feedback is welcome. (I’m also talking to @mpvl 
about how such functionality might be useful in x/text.)
