The specification mandates UTF-8 encoding for the string type [1]. UTF-16 might make sense as a canonical extension type; otherwise, UTF-16 data can simply be stored in a binary array.
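
As a rough sketch of the "binary array" option: the snippet below uses only the JDK's charset handling, not any Arrow API, and the class and method names are made up for illustration. It encodes a String as UTF-16LE bytes that could be stored as an opaque binary value (on the Java side, presumably in something like a VarBinaryVector), and shows that the same bytes do not decode back to the original text when read as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class Utf16AsBinary {
    // Encode a String as UTF-16LE bytes (no byte-order mark), i.e. an
    // opaque payload that a binary column could hold.
    static byte[] toUtf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    // Decode a UTF-16LE payload back into a String.
    static String fromUtf16Bytes(byte[] b) {
        return new String(b, StandardCharsets.UTF_16LE);
    }

    // Encode a String as UTF-8, the encoding the string type requires.
    static byte[] toUtf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo";
        byte[] utf8 = toUtf8Bytes(s);
        byte[] utf16 = toUtf16Bytes(s);

        // The two encodings produce different byte sequences.
        System.out.println("UTF-8 bytes:  " + utf8.length);
        System.out.println("UTF-16 bytes: " + utf16.length);

        // The UTF-16 payload round-trips when decoded with the same charset.
        System.out.println("round trip ok: " + fromUtf16Bytes(utf16).equals(s));

        // Decoding the UTF-16 payload as UTF-8 does not give back the text,
        // which is why such bytes do not belong in a UTF-8 string column.
        String misread = new String(utf16, StandardCharsets.UTF_8);
        System.out.println("misread as UTF-8 equals original: " + misread.equals(s));
    }
}
```

With this approach the reader has to know out of band that the binary values are UTF-16, which is the gap an extension type would fill.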
[1]: https://github.com/apache/arrow/blob/902781d1f3a41563a23d6755433a8e40ce82de7b/format/Schema.fbs#L155-L157

On Thu, Sep 29, 2022, at 13:57, Larry White wrote:
> Hi Kevin,
>
> I don't know of any particular restriction regarding string encoding.
> VarCharVector stores data as a byte array, and the encoding can be set
> using the Charset class when you convert Strings to and from bytes. Since
> java strings use UTF-16 internally, I would expect this to 'just work'.
>
> larry
>
> On Thu, Sep 29, 2022 at 12:46 PM Kevin Bambrick <kevinbambri...@gmail.com>
> wrote:
>
>> Hi.
>>
>> Was just wondering was support for UTF-16 Strings considered? As far as I
>> am aware VarChar vectors only support UTF-8. Are they something that may be
>> supported in the future?
>>
>> Regards.
>> Kevin.
>>