Hi gophers, I’ve implemented Unicode text segmentation for Go: https://github.com/clipperhouse/uax29/words
It tokenizes text into words, sentences or graphemes according to the Unicode spec <https://unicode.org/reports/tr29/>. I’d been tokenizing text in ad hoc ways, and then learned that there is a Unicode standard. Hopefully useful for you, feedback welcome. (I’m also talking to @mpvl about how such functionality might be useful in x/text.)