For better or for worse the Rust implementation requires the underlying buffer 
is UTF-8 including null slots, as this allows returning the buffer as a native 
string type, which in turn allows kernels to use Rust's native string 
functionality. Whilst I agree the specification is ambiguous on this note, this 
interpretation doesn't appear to have caused issues so far.

On 2 July 2023 13:07:20 BST, Antoine Pitrou <anto...@python.org> wrote:
>
>
>Le 02/07/2023 à 14:00, Raphael Taylor-Davies a écrit :
>> 
>> More an observation than an issue, but UTF-8 validation for StringArray can 
>> be done very efficiently by first verifying the entire buffer, and then 
>> verifying the offsets correspond to the start of a UTF-8 codepoint.
>
>Caveat: null slots could potentially contain invalid UTF-8 data. Not likely of 
>course, but it should probably not be an error.
>
>That said, yes, it is a smart strategy for the common case!
>
>Regards
>
>Antoine.

Reply via email to