[ 
https://issues.apache.org/jira/browse/HIVE-16763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025790#comment-16025790
 ] 

Carter Shanklin commented on HIVE-16763:
----------------------------------------

If we're considering adding double-quoted identifiers, I have some additional 
unsolicited opinions here.

Hive has a number of non-standard restrictions on which characters are allowed. 
For example, Hive won't allow $, :, /, #, or |, even in a delimited (or quoted) 
identifier. This causes problems in many migration scenarios. Many users also 
hope to see better Unicode support. Aligning with the SQL standard would let us 
solve both of these problems.

Per the SQL:2011 spec:
A regular identifier starts with an <identifier start> and is optionally 
followed by a sequence of <identifier start> or <identifier extend> characters.
Both are defined in terms of Unicode character classes.

Additional details:
1) An <identifier start> is any character in the Unicode General Category 
classes “Lu”, “Ll”, “Lt”, “Lm”, “Lo”, or “Nl”.
NOTE 94 — The Unicode General Category classes “Lu”, “Ll”, “Lt”, “Lm”, “Lo”, 
and “Nl” are assigned to Unicode characters that are, respectively, upper-case 
letters, lower-case letters, title-case letters, modifier letters, other 
letters, and letter numbers.

2) An <identifier extend> is U+00B7, “Middle Dot”, or any character in the 
Unicode General Category classes “Mn”, “Mc”, “Nd”, “Pc”, or “Cf”.
NOTE 95 — The Unicode General Category classes “Mn”, “Mc”, “Nd”, “Pc”, and “Cf” 
are assigned to Unicode characters that are, respectively, nonspacing marks, 
spacing combining marks, decimal numbers, connector punctuations, and 
formatting codes.
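
To make the two rules above concrete, here is a rough sketch of a checker in 
Python. It is only an illustration, not Hive's lexer: the function names are 
made up for this example, it does not reject reserved words or enforce any 
length limit, and the results depend on the Unicode version bundled with the 
Python interpreter.

```python
import unicodedata

# General Category classes quoted from the SQL:2011 rules above.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
EXTEND_CATEGORIES = {"Mn", "Mc", "Nd", "Pc", "Cf"}
MIDDLE_DOT = "\u00b7"  # allowed explicitly as an <identifier extend>


def is_identifier_start(ch: str) -> bool:
    """True if ch is in one of the <identifier start> categories."""
    return unicodedata.category(ch) in START_CATEGORIES


def is_identifier_extend(ch: str) -> bool:
    """True if ch is U+00B7 or in one of the <identifier extend> categories."""
    return ch == MIDDLE_DOT or unicodedata.category(ch) in EXTEND_CATEGORIES


def is_regular_identifier(name: str) -> bool:
    """Hypothetical helper: check only the character rules, nothing else."""
    if not name or not is_identifier_start(name[0]):
        return False
    return all(
        is_identifier_start(ch) or is_identifier_extend(ch) for ch in name[1:]
    )
```

Under this sketch, an underscore is accepted after the first character (it is 
connector punctuation, “Pc”) but not as the first character, and a space or $ 
anywhere makes the name require quoting.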

Based on these definitions, the following are valid regular identifiers and 
should not require special quoting.
aďƪȸβҵᴟệⰼꜷꮾ𝐩𝖌𝛑
ミムㄪㅉ了泥
C̲̅r̲̅a̲̅y̲̅o̲̅l̲̲a̲̅

If you're skeptical, try any of these in Postgres and you will see that they 
work without quotes.

Delimited identifiers, per standard, start with a double quote (") and end with 
a double quote ("). Anything may be placed within the quotes, including 
whitespace.
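
As a small illustration of delimited identifiers, the sketch below wraps and 
unwraps a name in double quotes. One detail not mentioned above is assumed 
here: in the standard, a double quote inside a delimited identifier is written 
as two consecutive double quotes. The function names are hypothetical, and the 
sketch does not reject an empty identifier.

```python
def quote_delimited(name: str) -> str:
    """Wrap name in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def unquote_delimited(token: str) -> str:
    """Recover the name from a delimited identifier token."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("not a delimited identifier")
    body = token[1:-1]
    # After removing doubled quotes, any quote left over is unescaped.
    if '"' in body.replace('""', ""):
        raise ValueError("unescaped double quote inside identifier")
    return body.replace('""', '"')
```

With these helpers, the alias from this issue, k y, would be written as "k y" 
and read back unchanged.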

It seems to be fairly common in the SQL-on-Hadoop space (Presto, SparkSQL, 
maybe others) to allow both ` and " for quoting.

Punctuation is generally not allowed in regular identifiers and must be quoted.

> Support space in quoted column alias
> ------------------------------------
>
>                 Key: HIVE-16763
>                 URL: https://issues.apache.org/jira/browse/HIVE-16763
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>
> {code}
> select key as 'k y' from src;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
