zhxiaoping commented on a change in pull request #4179: URL: https://github.com/apache/zeppelin/pull/4179#discussion_r673628437
##########
File path: livy/src/main/java/org/apache/zeppelin/livy/LivySparkSQLInterpreter.java
##########
@@ -197,7 +200,18 @@ public FormType getFormType() {
     return rows;
   }
 
-  protected List<String> parseSQLOutput(String output) {
+  protected List<String> parseSQLOutput(String str) {
+    String fullWidthRegex = "([" +
+        "\u1100-\u115F" +
+        "\u2E80-\uA4CF" +
+        "\uAC00-\uD7A3" +
+        "\uF900-\uFAFF" +
+        "\uFE10-\uFE19" +
+        "\uFE30-\uFE6F" +
+        "\uFF00-\uFF60" +
+        "\uFFE0-\uFFE6" +
+        "])";
+    String output = str.replaceAll(fullWidthRegex, "$1\u0001");

Review comment:
The regex is taken from org.apache.spark.util.Utils#fullWidthRegex.

Spark and Zeppelin use different standards for the width of a Chinese character. In Spark, a Chinese character occupies two placeholders, where one placeholder is the width of one char. In Zeppelin, a Chinese character occupies only one placeholder. As a result, Zeppelin cannot split a row into columns based on column size: it treats each Chinese character as one placeholder when it actually takes up two.

This PR does two things:

1. It inserts a special character (\u0001), which is never used in normal output, after every Chinese character so that Zeppelin can split the string correctly. The \u0001 characters are replaced with an empty string before the record is added to rows.
2. It prevents Chinese characters from being escaped.
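The padding idea described in the comment can be illustrated in isolation. The following is a minimal sketch, not the code in this PR: the class `FullWidthPadding` and the helpers `pad` and `splitByWidths` are hypothetical names, and the column widths are assumed to be given. It only reuses the character ranges and the `replaceAll` call shown in the diff.

```java
import java.util.ArrayList;
import java.util.List;

public class FullWidthPadding {
  // Same character ranges as in the diff (borrowed from Spark's Utils#fullWidthRegex).
  static final String FULL_WIDTH_REGEX = "(["
      + "\u1100-\u115F"
      + "\u2E80-\uA4CF"
      + "\uAC00-\uD7A3"
      + "\uF900-\uFAFF"
      + "\uFE10-\uFE19"
      + "\uFE30-\uFE6F"
      + "\uFF00-\uFF60"
      + "\uFFE0-\uFFE6"
      + "])";

  // Hypothetical helper: append U+0001 after each full-width character so that
  // the string length equals the display width used to align the columns.
  static String pad(String line) {
    return line.replaceAll(FULL_WIDTH_REGEX, "$1\u0001");
  }

  // Hypothetical helper: cut a padded row into cells by fixed column widths
  // (assumed to be known), then drop the U+0001 markers and surrounding spaces.
  static List<String> splitByWidths(String paddedLine, int[] widths) {
    List<String> cells = new ArrayList<>();
    int start = 0;
    for (int width : widths) {
      int end = Math.min(start + width, paddedLine.length());
      cells.add(paddedLine.substring(start, end).replace("\u0001", "").trim());
      start = end;
    }
    return cells;
  }
}
```

For a row whose first cell holds two Chinese characters followed by two spaces (display width 6, string length 4), splitting the unpadded string at offset 6 would cut into the second column. After `pad`, the first cell has length 6, so the fixed-width split lands on the column boundary, and removing the markers restores the original text.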