Wes McKinney created ARROW-3536:
-----------------------------------

             Summary: [C++] Fast UTF8 validation functions
                 Key: ARROW-3536
                 URL: https://issues.apache.org/jira/browse/ARROW-3536
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++
            Reporter: Wes McKinney
             Fix For: 0.13.0

[~lemire] discusses this topic in https://lemire.me/blog/2018/05/16/validating-utf-8-strings-using-as-little-as-0-7-cycles-per-byte/

There is also a Java edition: https://lemire.me/blog/2018/10/16/validating-utf-8-bytes-java-edition/
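
For reference, here is a minimal scalar sketch in C++ of what such a validation function has to check. It is NOT the vectorized algorithm from the linked posts and NOT an existing Arrow API; the name `ValidateUtf8` is hypothetical. It follows the byte ranges of RFC 3629 (rejecting overlong encodings, UTF-16 surrogates and code points above U+10FFFF) and, as a simple assumed optimization, skips 8 bytes at a time while the input is pure ASCII.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical helper (not Arrow's API): returns true if data[0..len) is
// well-formed UTF-8 per RFC 3629.
bool ValidateUtf8(const uint8_t* data, std::size_t len) {
  std::size_t i = 0;
  while (i < len) {
    // ASCII fast path: if none of the next 8 bytes has its high bit set,
    // they are all ASCII and can be skipped at once.
    if (i + 8 <= len) {
      uint64_t word;
      std::memcpy(&word, data + i, sizeof(word));
      if ((word & 0x8080808080808080ULL) == 0) {
        i += 8;
        continue;
      }
    }
    const uint8_t b0 = data[i];
    if (b0 < 0x80) {  // single ASCII byte
      ++i;
      continue;
    }
    std::size_t n = 0;             // number of continuation bytes expected
    uint8_t lo = 0x80, hi = 0xBF;  // allowed range for the first continuation byte
    if (b0 >= 0xC2 && b0 <= 0xDF) {
      n = 1;
    } else if (b0 == 0xE0) {
      n = 2; lo = 0xA0;            // reject overlong 3-byte forms
    } else if (b0 == 0xED) {
      n = 2; hi = 0x9F;            // reject surrogates U+D800..U+DFFF
    } else if (b0 >= 0xE1 && b0 <= 0xEF) {
      n = 2;
    } else if (b0 == 0xF0) {
      n = 3; lo = 0x90;            // reject overlong 4-byte forms
    } else if (b0 >= 0xF1 && b0 <= 0xF3) {
      n = 3;
    } else if (b0 == 0xF4) {
      n = 3; hi = 0x8F;            // reject code points above U+10FFFF
    } else {
      return false;  // stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF
    }
    if (len - i <= n) return false;  // sequence truncated by end of input
    const uint8_t b1 = data[i + 1];
    if (b1 < lo || b1 > hi) return false;
    for (std::size_t k = 2; k <= n; ++k) {
      if ((data[i + k] & 0xC0) != 0x80) return false;  // must be 10xxxxxx
    }
    i += n + 1;
  }
  return true;
}

// Convenience overload for string-like input.
bool ValidateUtf8(std::string_view s) {
  return ValidateUtf8(reinterpret_cast<const uint8_t*>(s.data()), s.size());
}
```

A vectorized version along the lines of the linked posts would replace the per-byte branches with SIMD range checks over 16 or 32 bytes at a time, but it should accept and reject exactly the same inputs as a scalar reference like this one.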