[ 
https://issues.apache.org/jira/browse/HIVE-19250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448753#comment-16448753
 ] 

Naveen Gangam commented on HIVE-19250:
--------------------------------------

Thanks [~aihuaxu] As I was working thru the inconsistencies between the dumps 
as part of this jira, I had wondered that as well. Hive schema for MySQL use 
the latin1 CHARSET for all its tables. It does not specify theĀ 
{{collation_name}} at the table level which causes it to use 
{{{{latin1_swedish_ci}}}} forĀ {{latin1}} charset. However, the schema overrides 
this collation for select columns by specifying it in the column definition of 
the create table command. These overrides use \{{latin1_bin}} collation name. 
So some select columns use {{latin1_bin}} while the rest of them use the 
default.{{}}

I am not sure which is better for performance for comparing values or how it 
was determined what collation to use and what columns to use it on. I am just 
trying to make the schema definitions consistent.

I had assumed that this CHARACTER SET and COLLATION NAMEs have been vetted thru 
when this schema was first designed. Changing charsets entirely needs to scoped 
out separately as we have to ensure that it works across all DBs.

Hope this helps. Thanks

> Schema column definitions inconsistencies in MySQL
> --------------------------------------------------
>
>                 Key: HIVE-19250
>                 URL: https://issues.apache.org/jira/browse/HIVE-19250
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.1.0
>            Reporter: Naveen Gangam
>            Assignee: Naveen Gangam
>            Priority: Minor
>         Attachments: HIVE-19250.patch
>
>
> There are some inconsistencies in column definitions in MySQL between a 
> schema that was upgraded to 2.1 (from an older release) vs installing the 
> 2.1.0 schema directly.
> >   `CQ_TBLPROPERTIES` varchar(2048) DEFAULT NULL,
> 117d117
> <   `CQ_TBLPROPERTIES` varchar(2048) DEFAULT NULL,
> 135a136
> >   `CC_TBLPROPERTIES` varchar(2048) DEFAULT NULL,
> 143d143
> <   `CC_TBLPROPERTIES` varchar(2048) DEFAULT NULL,
> 156c156
> <   `CTC_TXNID` bigint(20) DEFAULT NULL,
> ---
> >   `CTC_TXNID` bigint(20) NOT NULL,
> 158c158
> <   `CTC_TABLE` varchar(256) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT 
> NULL,
> ---
> >   `CTC_TABLE` varchar(256) DEFAULT NULL,
> 476c476
> <   `TBL_NAME` varchar(256) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT 
> NULL,
> ---
> >   `TBL_NAME` varchar(256) DEFAULT NULL,
> 664c664
> <   KEY `PCS_STATS_IDX` 
> (`DB_NAME`,`TABLE_NAME`,`COLUMN_NAME`,`PARTITION_NAME`),
> ---
> >   KEY `PCS_STATS_IDX` 
> > (`DB_NAME`,`TABLE_NAME`,`COLUMN_NAME`,`PARTITION_NAME`) USING BTREE,
> 768c768
> <   `PARAM_VALUE` mediumtext,
> ---
> >   `PARAM_VALUE` mediumtext CHARACTER SET latin1 COLLATE latin1_bin,
> 814c814
> <   `PARAM_VALUE` mediumtext,
> ---
> >   `PARAM_VALUE` mediumtext CHARACTER SET latin1 COLLATE latin1_bin,
> 934c934
> <   `PARAM_VALUE` mediumtext,
> ---
> >   `PARAM_VALUE` mediumtext CHARACTER SET latin1 COLLATE latin1_bin,
> 1066d1065
> <   `TXN_HEARTBEAT_COUNT` int(11) DEFAULT NULL,
> 1067a1067
> >   `TXN_HEARTBEAT_COUNT` int(11) DEFAULT NULL,
> 1080c1080
> <   `TC_TXNID` bigint(20) DEFAULT NULL,
> ---
> >   `TC_TXNID` bigint(20) NOT NULL,
> 1082c1082
> <   `TC_TABLE` varchar(128) DEFAULT NULL,
> ---
> >   `TC_TABLE` varchar(128) NOT NULL,
> 1084c1084
> <   `TC_OPERATION_TYPE` char(1) DEFAULT NULL,
> ---
> >   `TC_OPERATION_TYPE` char(1) NOT NULL,



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to