Also, the driver shouldn't assume UTF-8 (or any encoding) when constructing
a String from a Binary vector, since that defeats the point of a binary vector!
Perhaps this should somehow be configurable (though having a lot of little
configuration options is also not ideal). A parameterized extension
On 30/09/2022 at 18:57, Kevin Bambrick wrote:
The issue I am facing is sending a UTF-16 string over the wire.
Ok, then you can just transcode the strings before sending them as
String, *or* you can send them as Binary (not String).
Where do these UTF-16 strings come from?
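For illustration, the transcoding option could look roughly like the sketch below. It uses only the JDK; the class and method names (`Utf16ToUtf8`, `transcode`) are hypothetical, and it assumes the incoming bytes are UTF-16LE without a byte-order mark. The resulting UTF-8 bytes are what would then be written to a String (VarChar) vector.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Arrow: re-encodes UTF-16LE bytes as UTF-8.
final class Utf16ToUtf8 {

    /** Decodes the input as UTF-16LE and re-encodes the text as UTF-8. */
    static byte[] transcode(byte[] utf16le) {
        // new String(...) replaces malformed input with U+FFFD instead of throwing.
        String text = new String(utf16le, StandardCharsets.UTF_16LE);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "h\u00e9llo" is "hello" with an e-acute as the second character.
        byte[] utf16 = "h\u00e9llo".getBytes(StandardCharsets.UTF_16LE);
        byte[] utf8 = transcode(utf16);
        System.out.println(utf16.length + " UTF-16LE bytes -> " + utf8.length + " UTF-8 bytes");
    }
}
```

If the data is already held as Java `String` objects, only the final `getBytes(StandardCharsets.UTF_8)` step is needed.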
> What would t
The issue I am facing is sending a UTF-16 string over the wire. The
application I am working on needs to support UTF-16 strings. The specific
issue I am stuck on is integrating with the flight SQL driver
(experimentally working on adopting it for when it's released). Right now in
my implementation o
On Thu, 29 Sep 2022 15:19:59 -0400
Larry White wrote:
> Interesting. This doesn't seem to be a Java issue, per se then. I've seen
> admonitions in various Arrow Java threads to always specify the Charset for
> the conversion - and so assumed more than one Charset was legal - and have
> written Arr
>
> I've never attempted to transport that data over the wire or export it
> using the C-Data Interface, however. It seems like that's where it would
> fall down.
Yeah, there would be funny characters or validation failures someplace down
the line when trying to transfer the data.
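Both failure modes can be reproduced with the JDK alone. The sketch below is only an illustration under stated assumptions: the class and method names are hypothetical, and it stands in for whatever a UTF-8-assuming consumer on the other end would do with the bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows what a UTF-8 consumer sees when handed UTF-16 bytes.
final class EncodingMismatch {

    /** Strict check: true only if the bytes are well-formed UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Lenient decode, as a reader that simply assumes UTF-8 would do. */
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ASCII text in UTF-16LE is still valid UTF-8, but gains embedded NUL characters.
        byte[] ascii = "hi".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(decodeAsUtf8(ascii).length()); // 4 chars instead of 2

        // U+00E9 in UTF-16LE is the bytes E9 00, which is malformed UTF-8.
        byte[] accented = "\u00e9".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(isValidUtf8(accented)); // false
    }
}
```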
On Thu, Sep 29,
Interesting. This doesn't seem to be a Java issue, per se then. I've seen
admonitions in various Arrow Java threads to always specify the Charset for
the conversion - and so assumed more than one Charset was legal - and have
written Arrow Java test code that uses other charsets without ill effect.
>
> I was just wondering whether support for UTF-16 strings was considered. As far as I
> am aware VarChar vectors only support UTF-8. Are they something that may be
> supported in the future?
This hasn't really been discussed and is a pretty large change because it
would require specification updates and other imp
FWIW we'd made a similar assumption. In Schema.fbs [1] the type is called
Utf8, as well as the Java `ArrowType.Utf8` class - is this a required
assumption to work with other language Arrow libs, maybe?
James
[1] https://github.com/apache/arrow/blob/master/format/Schema.fbs
On Thu, 29 Sept 2022 a
The specification mandates UTF-8 encoding [1].
UTF-16 may make sense as a canonical extension type, but otherwise could just
go into a binary array.
[1]:
https://github.com/apache/arrow/blob/902781d1f3a41563a23d6755433a8e40ce82de7b/format/Schema.fbs#L155-L157
On Thu, Sep 29, 2022, at 13:57, La
Hi Kevin,
I don't know of any particular restriction regarding string encoding.
VarCharVector stores data as a byte array, and the encoding can be set
using the Charset class when you convert Strings to and from bytes. Since
Java strings use UTF-16 internally, I would expect this to 'just work'.
Hi.
I was just wondering whether support for UTF-16 strings was considered. As far as I
am aware VarChar vectors only support UTF-8. Are they something that may be
supported in the future?
Regards.
Kevin.