[ https://issues.apache.org/jira/browse/AVRO-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17572777#comment-17572777 ]
Christophe Le Saec commented on AVRO-3532:
------------------------------------------

Another point to take into account is that the current Avro Java code ({_}at least from version 1.8.2 to 1.11.0{_}) already accepts field names such as {*}"Âge"{*}, because it uses the method "Character.isLetter", which returns true for Â.
So the idea here is to keep this code, change the documentation, and adapt the other language implementations.

(FYI: the Apache Arrow project [limits field names to UTF8|https://arrow.apache.org/docs/format/Columnar.html#struct-layout]: "{_}Each field must have a UTF8-encoded name{_}")

> Align naming rules on code
> --------------------------
>
>                 Key: AVRO-3532
>                 URL: https://issues.apache.org/jira/browse/AVRO-3532
>             Project: Apache Avro
>          Issue Type: Wish
>            Reporter: Christophe Le Saec
>            Priority: Major
>
> The description of the [naming rule in the documentation|https://avro.apache.org/docs/current/spec.html#names] is
> {noformat}
> - start with [A-Za-z_]
> - subsequently contain only [A-Za-z0-9_]
> {noformat}
> But the [Java code|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L1578] uses the Character.isLetter method:
> {code:java}
> char first = name.charAt(0);
> if (!(Character.isLetter(first) || first == '_'))
>   throw new SchemaParseException("Illegal initial character: " + name);
> for (int i = 1; i < length; i++) {
>   char c = name.charAt(i);
>   if (!(Character.isLetterOrDigit(c) || c == '_'))
>     throw new SchemaParseException("Illegal character in: " + name);
> }
> return name;
> {code}
> This method accepts accented characters (éùàçË ...) and also Chinese characters (我) ...
> So the aim of this ticket is to see whether we can update the documentation, and whether the other implementations (Rust, C#, ...) are also compatible with this behavior.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
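
To make the gap between the documented rule and the Java behavior concrete, here is a minimal sketch that runs both checks side by side. The class and method names (NameRules, validPerSpec, validPerJavaCode) are hypothetical and not part of Avro. The second method only mirrors the character loop quoted above, returning a boolean instead of throwing SchemaParseException, and its null/empty handling is an assumption, since the quoted excerpt does not show that part.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Avro: compares the documented naming
// rule with the character checks quoted from Schema.java.
final class NameRules {
    // Rule as written in the specification: [A-Za-z_] followed by [A-Za-z0-9_]*.
    private static final Pattern SPEC = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean validPerSpec(String name) {
        return name != null && SPEC.matcher(name).matches();
    }

    // Mirrors the quoted loop; returns false where the original throws.
    static boolean validPerJavaCode(String name) {
        // Assumption: treat null or empty names as invalid.
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        if (!(Character.isLetter(first) || first == '_')) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '_')) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of file encoding:
        // "\u00C2ge" is "Âge" and "\u6211" is the Chinese character 我.
        String[] samples = {"age", "\u00C2ge", "\u6211", "1abc"};
        for (String s : samples) {
            System.out.println(s + " spec=" + validPerSpec(s)
                    + " code=" + validPerJavaCode(s));
        }
    }
}
```

With these samples, "age" passes both checks and "1abc" fails both, while "Âge" and "我" are rejected by the documented pattern but accepted by the Character.isLetter-based check, which is the mismatch this ticket is about.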