The specification mandates UTF-8 encoding [1].

UTF-16 may make sense as a canonical extension type, but otherwise it could
just go into a binary array.
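
To make the distinction concrete, here is a rough, stdlib-only Java sketch. It makes no Arrow calls, and the class and method names are made up for illustration. UTF-8 bytes are what a Utf8 column (VarCharVector in Java) should hold. UTF-16 bytes would instead go into a plain binary column (e.g. VarBinaryVector), with the byte order agreed out of band. UTF-16LE without a BOM is an arbitrary choice here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf16Storage {

    // Bytes for a Utf8 column (e.g. VarCharVector): must be valid UTF-8.
    public static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes for a plain binary column (e.g. VarBinaryVector) when the data
    // must stay UTF-16. Little-endian without a BOM is an arbitrary choice;
    // readers have to know the encoding out of band.
    public static byte[] toUtf16Le(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    // Decode bytes previously produced by toUtf16Le.
    public static String fromUtf16Le(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    // Strict UTF-8 check: with REPORT the decoder throws on malformed input
    // instead of silently substituting U+FFFD.
    public static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // "hello" with an e-acute
        byte[] utf8 = toUtf8(s);
        byte[] utf16 = toUtf16Le(s);
        System.out.println(utf8.length);                  // 6
        System.out.println(utf16.length);                 // 10
        System.out.println(isValidUtf8(utf8));            // true
        System.out.println(isValidUtf8(utf16));           // false
        System.out.println(fromUtf16Le(utf16).equals(s)); // true
    }
}
```

Note that ASCII-only text encoded as UTF-16LE happens to be valid UTF-8 (every other byte is 0x00). A strict validity check alone will therefore not catch every mix-up. The column's declared type, or an extension-type annotation, is what tells readers how to decode the bytes.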

[1]: https://github.com/apache/arrow/blob/902781d1f3a41563a23d6755433a8e40ce82de7b/format/Schema.fbs#L155-L157

On Thu, Sep 29, 2022, at 13:57, Larry White wrote:
> Hi Kevin,
>
> I don't know of any particular restriction regarding string encoding.
> VarCharVector stores data as a byte array, and the encoding can be set
> using the Charset class when you convert Strings to and from bytes. Since
> Java strings use UTF-16 internally, I would expect this to 'just work'.
>
> larry
>
> On Thu, Sep 29, 2022 at 12:46 PM Kevin Bambrick <kevinbambri...@gmail.com>
> wrote:
>
>> Hi.
>>
>> I was just wondering: was support for UTF-16 strings considered? As far as
>> I am aware, VarChar vectors only support UTF-8. Is this something that may
>> be supported in the future?
>>
>> Regards.
>> Kevin.
>>
