[ https://issues.apache.org/jira/browse/HIVE-9445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chao Sun updated HIVE-9445: --------------------------- Fix Version/s: (was: 1.0.1) > Revert HIVE-5700 - enforce single date format for partition column storage > -------------------------------------------------------------------------- > > Key: HIVE-9445 > URL: https://issues.apache.org/jira/browse/HIVE-9445 > Project: Hive > Issue Type: Bug > Components: Metastore > Affects Versions: 0.13.0, 0.14.0, 0.13.1, 0.15.0, 0.14.1 > Reporter: Brock Noland > Assignee: Brock Noland > Priority: Blocker > Fix For: 1.1.0 > > Attachments: HIVE-9445.1.patch, HIVE-9445.1.patch > > > HIVE-5700 has the following issues: > * HIVE-8730 - fails mysql upgrades > * Does not upgrade all metadata, e.g. {{PARTITIONS.PART_NAME}} See comments > in HIVE-5700. > * Completely corrupts postgres, see below. > With a postgres metastore on 0.12, I executed the following: > {noformat} > CREATE TABLE HIVE5700_DATE_PARTED (line string) PARTITIONED BY (ddate date); > CREATE TABLE HIVE5700_STRING_PARTED (line string) PARTITIONED BY (ddate > string); > ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='NOT_DATE'); > ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='20150121'); > ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='20150122'); > ALTER TABLE HIVE5700_DATE_PARTED ADD PARTITION (ddate='2015-01-23'); > ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='NOT_DATE'); > ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='20150121'); > ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='20150122'); > ALTER TABLE HIVE5700_STRING_PARTED ADD PARTITION (ddate='2015-01-23'); > LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE > HIVE5700_DATE_PARTED PARTITION (ddate='NOT_DATE'); > LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE > HIVE5700_DATE_PARTED PARTITION (ddate='20150121'); > LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE > HIVE5700_DATE_PARTED PARTITION (ddate='20150122'); > LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE > HIVE5700_DATE_PARTED PARTITION (ddate='2015-01-23'); > LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE > HIVE5700_STRING_PARTED PARTITION (ddate='NOT_DATE'); > LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE > HIVE5700_STRING_PARTED PARTITION (ddate='20150121'); > LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE > HIVE5700_STRING_PARTED PARTITION (ddate='20150122'); > LOAD DATA LOCAL INPATH '/tmp/single-line-of-data' INTO TABLE > HIVE5700_STRING_PARTED PARTITION (ddate='2015-01-23'); > hive> show partitions HIVE5700_DATE_PARTED; > OK > ddate=20150121 > ddate=20150122 > ddate=2015-01-23 > ddate=NOT_DATE > Time taken: 0.052 seconds, Fetched: 4 row(s) > hive> show partitions HIVE5700_STRING_PARTED; > OK > ddate=20150121 > ddate=20150122 > ddate=2015-01-23 > ddate=NOT_DATE > Time taken: 0.051 seconds, Fetched: 4 row(s) > {noformat} > I then took a dump of the database named {{postgres-pre-upgrade.sql}} and the > data in the dump looks good: > {noformat} > [root@hive5700-1-1 ~]# egrep -A9 '^COPY "PARTITIONS"|^COPY > "PARTITION_KEY_VALS"' postgres-pre-upgrade.sql > COPY "PARTITIONS" ("PART_ID", "CREATE_TIME", "LAST_ACCESS_TIME", "PART_NAME", > "SD_ID", "TBL_ID") FROM stdin; > 3 1421943647 0 ddate=NOT_DATE 6 2 > 4 1421943647 0 ddate=20150121 7 2 > 5 1421943648 0 ddate=20150122 8 2 > 6 1421943664 0 ddate=NOT_DATE 9 3 > 7 1421943664 0 ddate=20150121 10 3 > 8 1421943665 0 ddate=20150122 11 3 > 9 1421943694 0 ddate=2015-01-23 12 2 > 10 1421943695 0 ddate=2015-01-23 13 3 > \. > -- > COPY "PARTITION_KEY_VALS" ("PART_ID", "PART_KEY_VAL", "INTEGER_IDX") FROM > stdin; > 3 NOT_DATE 0 > 4 20150121 0 > 5 20150122 0 > 6 NOT_DATE 0 > 7 20150121 0 > 8 20150122 0 > 9 2015-01-23 0 > 10 2015-01-23 0 > \. > {noformat} > I then upgraded to 0.13 and subsequently upgraded the MS with the following > command: {{schematool -dbType postgres -upgradeSchema -verbose}} > The file {{postgres-post-upgrade.sql}} is the post-upgrade db dump. As you > can see the data is completely corrupt. > {noformat} > [root@hive5700-1-1 ~]# egrep -A9 '^COPY "PARTITIONS"|^COPY > "PARTITION_KEY_VALS"' postgres-post-upgrade.sql > COPY "PARTITIONS" ("PART_ID", "CREATE_TIME", "LAST_ACCESS_TIME", "PART_NAME", > "SD_ID", "TBL_ID") FROM stdin; > 3 1421943647 0 ddate=NOT_DATE 6 2 > 4 1421943647 0 ddate=20150121 7 2 > 5 1421943648 0 ddate=20150122 8 2 > 6 1421943664 0 ddate=NOT_DATE 9 3 > 7 1421943664 0 ddate=20150121 10 3 > 8 1421943665 0 ddate=20150122 11 3 > 9 1421943694 0 ddate=2015-01-23 12 2 > 10 1421943695 0 ddate=2015-01-23 13 3 > \. > -- > COPY "PARTITION_KEY_VALS" ("PART_ID", "PART_KEY_VAL", "INTEGER_IDX") FROM > stdin; > 3 2015-01-23 0 > 4 2015-01-23 0 > 5 2015-01-23 0 > 6 2015-01-23 0 > 7 2015-01-23 0 > 8 2015-01-23 0 > 9 2015-01-23 0 > 10 2015-01-23 0 > \. > {noformat} > Same corruption from the HIVE CLI: > {noformat} > hive> show partitions HIVE5700_DATE_PARTED; > OK > ddate=20150121 > ddate=20150122 > ddate=2015-01-23 > ddate=NOT_DATE > Time taken: 0.844 seconds, Fetched: 4 row(s) > hive> show partitions HIVE5700_STRING_PARTED; > OK > ddate=20150121 > ddate=20150122 > ddate=2015-01-23 > ddate=NOT_DATE > Time taken: 0.114 seconds, Fetched: 4 row(s) > hive> select * from HIVE5700_STRING_PARTED; > OK > some data! 2015-01-23 > some data! 2015-01-23 > some data! 2015-01-23 > some data! 2015-01-23 > Time taken: 0.536 seconds, Fetched: 4 row(s) > hive> select * from HIVE5700_DATE_PARTED; > OK > some data! 2015-01-23 > some data! 2015-01-23 > some data! 2015-01-23 > some data! 2015-01-23 > Time taken: 0.145 seconds, Fetched: 4 row(s) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)