Most of the x/text packages use tries rather than range tables. Tries allow arbitrary data (as long as it fits in an int) to be associated with runes, and they allow operating on UTF-8 directly without having to convert to runes; see https://godoc.org/golang.org/x/text/internal/triegen. That said, using a trie is not a requirement.
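The trie idea above can be illustrated with a small, self-contained sketch. It is not the triegen format or API: triegen is an internal x/text package and generates far more compact tables. The names `byteTrie`, `insert`, `lookup`, and the category codes are hypothetical, and the category assignments are illustrative only. What it shows is the key property: per-rune data (here a small integer) is found by walking the UTF-8 bytes of the input, so the caller never converts the text to runes. `utf8.DecodeRune` is used only to measure the width of runes that have no data.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Hypothetical category codes for illustration; 0 is reserved for "no data".
const (
	catNone uint8 = iota
	catALetter
	catNumeric
)

// node is one level of the trie, indexed by a single UTF-8 byte.
// Because UTF-8 is prefix-free, a given byte in a given node is either
// always a non-final byte (it has a child) or always the final byte of a
// rune (it has a value), so the two arrays never conflict.
type node struct {
	next  [256]int32 // index of the child node; 0 means "no child"
	value [256]uint8 // data for the rune whose encoding ends at this byte
}

// byteTrie is a minimal, hypothetical trie keyed on UTF-8 bytes. It is not
// the triegen format; it only shows per-rune lookup without decoding runes.
type byteTrie struct {
	nodes []node // nodes[0] is the root
}

func newByteTrie() *byteTrie {
	return &byteTrie{nodes: make([]node, 1)}
}

// insert associates v with r by walking (and extending) the path spelled
// by r's UTF-8 encoding.
func (t *byteTrie) insert(r rune, v uint8) {
	var buf [utf8.UTFMax]byte
	n := utf8.EncodeRune(buf[:], r)
	cur := 0
	for _, b := range buf[:n-1] {
		if t.nodes[cur].next[b] == 0 {
			t.nodes = append(t.nodes, node{})
			t.nodes[cur].next[b] = int32(len(t.nodes) - 1)
		}
		cur = int(t.nodes[cur].next[b])
	}
	t.nodes[cur].value[buf[n-1]] = v
}

// lookup returns the data for the first rune in s and its width in bytes,
// reading the bytes of s directly.
func (t *byteTrie) lookup(s []byte) (uint8, int) {
	cur := 0
	for i, b := range s {
		if next := t.nodes[cur].next[b]; next != 0 {
			cur = int(next) // non-final byte: descend one level
			continue
		}
		if v := t.nodes[cur].value[b]; v != 0 {
			return v, i + 1 // s[:i+1] encodes a rune that has data
		}
		break // rune without data, or invalid UTF-8
	}
	// No data: report the width the standard decoder would consume.
	_, size := utf8.DecodeRune(s)
	return catNone, size
}

func main() {
	tr := newByteTrie()
	for r := 'a'; r <= 'z'; r++ {
		tr.insert(r, catALetter)
	}
	tr.insert('é', catALetter)
	for r := '0'; r <= '9'; r++ {
		tr.insert(r, catNumeric)
	}
	tr.insert('०', catNumeric) // U+0966, a three-byte rune

	s := []byte("é1०?")
	for len(s) > 0 {
		v, size := tr.lookup(s)
		fmt.Printf("%q -> %d\n", s[:size], v)
		s = s[size:]
	}
}
```

A generated trie would typically compress the 256-entry blocks and share identical ones; this sketch trades memory for clarity.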
The package https://godoc.org/golang.org/x/text/internal/gen/bitfield converts Go structs to ints and can be used to pack the rune data in a convenient way. Furthermore, package https://godoc.org/golang.org/x/text/internal/ucd can be used for reading UCD files, and package https://godoc.org/golang.org/x/text/internal/gen can be used to generate Go tables other than the trie; it includes utilities to generate canonical x/text files, such as including the Unicode and CLDR versions. The top-level file gen.go is used to orchestrate building x/text and captures the dependencies between packages.

I may have some designs lying around for the API.

On Thu, 16 Apr 2020 at 21:46 Matt Sherman <mwsher...@gmail.com> wrote:

> Great. Yes, the data files are here:
> https://unicode.org/reports/tr41/tr41-26.html#Props0
>
> I’ve done a proof of concept here: https://github.com/clipperhouse/uax29
>
> To do it properly, I assume we’d want to use the house style here?
> https://github.com/golang/text/blob/master/unicode/rangetable/gen.go
>
> On Thu, Apr 16, 2020 at 1:52 PM <m...@golang.org> wrote:
>
>> Yes that would be interesting. Especially if it can be generated from the
>> Unicode raw data upon updates.
>>
>> On Wed, 15 Apr 2020 at 23:56 Ian Lance Taylor <i...@golang.org> wrote:
>>
>>> [ +mpvl ]
>>>
>>> On Wed, Apr 15, 2020 at 2:30 PM Matt Sherman <mwsher...@gmail.com> wrote:
>>> >
>>> > Hi, I am working on a tokenizer based on Unicode text segmentation (UAX 29). I am wondering if there would be an interest in adding range tables for word break categories to the x/text or unicode packages. It appears they could be code-gen’d alongside the rest of the range tables.
>>> >
>>> > Pardon if this is already being done and I have missed it. I see some mention of those categories (e.g. ALetter) in other places.
>>> >
>>> > My code is here. Thanks.
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google Groups "golang-nuts" group.
>>> > To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
>>> > To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/2a058556-da51-46d0-a41b-28e323541332%40googlegroups.com.