[ https://issues.apache.org/jira/browse/FLINK-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16300881#comment-16300881 ]
ASF GitHub Bot commented on FLINK-8301:
---------------------------------------

GitHub user Xpray opened a pull request:

https://github.com/apache/flink/pull/5203

[FLINK-8301] Support Unicode in codegen for SQL && TableAPI

## What is the purpose of the change

*Support Unicode literals in SQL and handle their code generation correctly.*

## Brief change log

- SQL and the Table API produce different literals when Unicode escapes are used. After SQL parsing, the literal is "\\u0001" (length 6), whereas the Table API receives "\u0001" (length 1).
- Before generating code, the literal is first unescaped so that each \uxxxx becomes one character, and then escaped to produce a valid Java string.
- A '\uxxxx' literal from the Table API is already a one-character string, so it needs escaping before code generation.
- So in the SQL path a literal is unescaped and then escaped, while in the Table API path a literal is escaped first and then joins the same path as SQL.

## Verifying this change

This change added tests and can be verified as follows:

- *Added tests for both SQL and the Table API with Unicode parameters.*

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
- The S3 file system connector: no

## Documentation

- Does this pull request introduce a new feature?
  no

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Xpray/flink FLINK-8301

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5203.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5203

----
commit 945d997c612b1048d600adabb73bf16b637c7f2b
Author: Xpray <leonxpray@...>
Date:   2017-12-22T02:09:01Z

    [FLINK-8301] Support Unicode in codegen for TableAPI && SQL

----


> Support Unicode in codegen for SQL && TableAPI
> ----------------------------------------------
>
>                 Key: FLINK-8301
>                 URL: https://issues.apache.org/jira/browse/FLINK-8301
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Ruidong Li
>            Assignee: Ruidong Li
>
> The current code generation does not support Unicode: "\u0001" is
> generated as "\\u0001", so a function call like concat(str, "\u0001") leads to
> a wrong result.
> This issue intends to handle char/varchar literals correctly. Some examples
> follow below.
> literal: '\u0001abc' -> codegen: "\u0001abc"
> literal: '\u0022\' -> codegen: "\"\\"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
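The unescape-then-escape rule from the pull request can be checked against the two examples in the issue with the sketch below. It is a minimal, dependency-free Java sketch under stated assumptions, not Flink's code generator. The class and method names (`UnicodeLiteralSketch`, `unescapeUnicode`, `escapeJava`, `sqlLiteralToJava`, `tableApiLiteralToJava`) are hypothetical. The SQL path decodes only \uxxxx sequences, not the full set of Java escapes. The sketch also escapes Table API values directly instead of escaping them first and then joining the SQL path, which is a simplification of what the PR describes.

```java
/**
 * Hypothetical sketch: turn a char/varchar literal into a Java string literal
 * for generated code. Not Flink's actual implementation.
 */
public final class UnicodeLiteralSketch {

    private UnicodeLiteralSketch() {}

    /** SQL path: collapse each six-character backslash-u-hex sequence into the char it denotes. */
    public static String unescapeUnicode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            boolean escape = c == '\\' && i + 5 < s.length()
                    && s.charAt(i + 1) == 'u' && isHex(s, i + 2, i + 6);
            if (escape) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c); // any other char, including a lone backslash, is kept as-is
                i++;
            }
        }
        return out.toString();
    }

    /** Escapes a decoded value so it can be placed between double quotes in Java source. */
    public static String escapeJava(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                // Line terminators use short escapes: javac translates unicode escapes
                // before tokenizing, so an escaped newline would become a raw line break
                // inside the string literal and fail to compile.
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20 || c > 0x7e) {
                        // Other control chars and non-ASCII chars become a backslash-u escape.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    /** SQL path: the parsed literal still holds escape text, so unescape first, then escape. */
    public static String sqlLiteralToJava(String parsed) {
        return "\"" + escapeJava(unescapeUnicode(parsed)) + "\"";
    }

    /** Table API path: the value already holds the decoded char, so it is only escaped. */
    public static String tableApiLiteralToJava(String value) {
        return "\"" + escapeJava(value) + "\"";
    }

    private static boolean isHex(String s, int from, int to) {
        for (int i = from; i < to; i++) {
            char h = s.charAt(i);
            boolean hex = (h >= '0' && h <= '9') || (h >= 'a' && h <= 'f') || (h >= 'A' && h <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sqlLiteralToJava("\\u0001abc"));     // SQL: six-char escape text + abc
        System.out.println(tableApiLiteralToJava("\u0001abc")); // Table API: one control char + abc
        System.out.println(sqlLiteralToJava("\\u0022\\"));      // SQL: escaped quote, then a backslash
    }
}
```

Running `main` prints "\u0001abc" twice and then "\"\\", which matches the codegen column of the issue's examples. Both paths end in the same `escapeJava` step, so the generated source always contains a single, valid escape for each character.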