[ https://issues.apache.org/jira/browse/HIVE-19250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448753#comment-16448753 ]
Naveen Gangam commented on HIVE-19250: -------------------------------------- Thanks [~aihuaxu] As I was working thru the inconsistencies between the dumps as part of this jira, I had wondered that as well. Hive schema for MySQL use the latin1 CHARSET for all its tables. It does not specify theĀ {{collation_name}} at the table level which causes it to use {{{{latin1_swedish_ci}}}} forĀ {{latin1}} charset. However, the schema overrides this collation for select columns by specifying it in the column definition of the create table command. These overrides use \{{latin1_bin}} collation name. So some select columns use {{latin1_bin}} while the rest of them use the default.{{}} I am not sure which is better for performance for comparing values or how it was determined what collation to use and what columns to use it on. I am just trying to make the schema definitions consistent. I had assumed that this CHARACTER SET and COLLATION NAMEs have been vetted thru when this schema was first designed. Changing charsets entirely needs to scoped out separately as we have to ensure that it works across all DBs. Hope this helps. Thanks > Schema column definitions inconsistencies in MySQL > -------------------------------------------------- > > Key: HIVE-19250 > URL: https://issues.apache.org/jira/browse/HIVE-19250 > Project: Hive > Issue Type: Bug > Components: Hive > Affects Versions: 2.1.0 > Reporter: Naveen Gangam > Assignee: Naveen Gangam > Priority: Minor > Attachments: HIVE-19250.patch > > > There are some inconsistencies in column definitions in MySQL between a > schema that was upgraded to 2.1 (from an older release) vs installing the > 2.1.0 schema directly. > > `CQ_TBLPROPERTIES` varchar(2048) DEFAULT NULL, > 117d117 > < `CQ_TBLPROPERTIES` varchar(2048) DEFAULT NULL, > 135a136 > > `CC_TBLPROPERTIES` varchar(2048) DEFAULT NULL, > 143d143 > < `CC_TBLPROPERTIES` varchar(2048) DEFAULT NULL, > 156c156 > < `CTC_TXNID` bigint(20) DEFAULT NULL, > --- > > `CTC_TXNID` bigint(20) NOT NULL, > 158c158 > < `CTC_TABLE` varchar(256) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT > NULL, > --- > > `CTC_TABLE` varchar(256) DEFAULT NULL, > 476c476 > < `TBL_NAME` varchar(256) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT > NULL, > --- > > `TBL_NAME` varchar(256) DEFAULT NULL, > 664c664 > < KEY `PCS_STATS_IDX` > (`DB_NAME`,`TABLE_NAME`,`COLUMN_NAME`,`PARTITION_NAME`), > --- > > KEY `PCS_STATS_IDX` > > (`DB_NAME`,`TABLE_NAME`,`COLUMN_NAME`,`PARTITION_NAME`) USING BTREE, > 768c768 > < `PARAM_VALUE` mediumtext, > --- > > `PARAM_VALUE` mediumtext CHARACTER SET latin1 COLLATE latin1_bin, > 814c814 > < `PARAM_VALUE` mediumtext, > --- > > `PARAM_VALUE` mediumtext CHARACTER SET latin1 COLLATE latin1_bin, > 934c934 > < `PARAM_VALUE` mediumtext, > --- > > `PARAM_VALUE` mediumtext CHARACTER SET latin1 COLLATE latin1_bin, > 1066d1065 > < `TXN_HEARTBEAT_COUNT` int(11) DEFAULT NULL, > 1067a1067 > > `TXN_HEARTBEAT_COUNT` int(11) DEFAULT NULL, > 1080c1080 > < `TC_TXNID` bigint(20) DEFAULT NULL, > --- > > `TC_TXNID` bigint(20) NOT NULL, > 1082c1082 > < `TC_TABLE` varchar(128) DEFAULT NULL, > --- > > `TC_TABLE` varchar(128) NOT NULL, > 1084c1084 > < `TC_OPERATION_TYPE` char(1) DEFAULT NULL, > --- > > `TC_OPERATION_TYPE` char(1) NOT NULL, -- This message was sent by Atlassian JIRA (v7.6.3#76005)