-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/
-----------------------------------------------------------
(Updated 2011-08-08 20:55:11.546253)
Review request for hive, Ning Zhang and Paul Yang.
Changes
-------
Added a Derby upgrade script and a script to revert that upgrade.
Summary
-------
This patch tries to make minimal changes to the API while keeping migration
short and somewhat easy to revert.
The new schema can be described as follows:
- CDS is a table corresponding to Column Descriptor objects. Currently, it
only stores a CD_ID.
- COLUMNS_V2 is a table corresponding to MFieldSchema objects, or columns. A
Column Descriptor holds a list of columns. COLUMNS_V2 has a foreign key to the
CD_ID to which it belongs.
- SDS was modified to reference a Column Descriptor: it now has a foreign key
to the CD_ID that describes its columns.
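A minimal sketch of the three tables described above, using an in-memory SQLite
database purely for illustration. Only the table names and CD_ID come from this
review; SD_ID, COLUMN_NAME, TYPE_NAME and INTEGER_IDX, as well as the helper
names create_schema and columns_for_sd, are assumptions. The real DDL is in the
upgrade scripts listed under Diffs.

```python
import sqlite3


def create_schema(conn):
    """Create an illustrative version of CDS, COLUMNS_V2 and SDS."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- One row per column descriptor; currently only an id.
        CREATE TABLE CDS (CD_ID INTEGER PRIMARY KEY);

        -- One row per column, owned by exactly one column descriptor.
        -- Column names other than CD_ID are assumed for this sketch.
        CREATE TABLE COLUMNS_V2 (
            CD_ID       INTEGER NOT NULL REFERENCES CDS (CD_ID),
            COLUMN_NAME TEXT    NOT NULL,
            TYPE_NAME   TEXT,
            INTEGER_IDX INTEGER NOT NULL,
            PRIMARY KEY (CD_ID, COLUMN_NAME)
        );

        -- Storage descriptors point at a column descriptor instead of
        -- owning their own copy of the column list.
        CREATE TABLE SDS (
            SD_ID INTEGER PRIMARY KEY,
            CD_ID INTEGER REFERENCES CDS (CD_ID)
        );
        """
    )


def columns_for_sd(conn, sd_id):
    """Resolve a storage descriptor's columns through its column descriptor."""
    rows = conn.execute(
        "SELECT c.COLUMN_NAME, c.TYPE_NAME "
        "FROM SDS s JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID "
        "WHERE s.SD_ID = ? ORDER BY c.INTEGER_IDX",
        (sd_id,),
    )
    return rows.fetchall()
```

With this layout, any number of SDS rows can share one set of COLUMNS_V2 rows
by carrying the same CD_ID.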
During migration, we create Column Descriptors for tables in a straightforward
manner: their columns are now just wrapped inside a column descriptor. The SDS
of partitions use their parent table's column descriptor, since currently a
partition and its table share the same list of columns.
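The migration rule above can be sketched roughly as follows. This is an
in-memory illustration, not the SQL in the upgrade scripts: OldTable, migrate,
and the "table/partition" naming of storage descriptors are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class OldTable:
    """Pre-migration view of a table (hypothetical shape)."""
    name: str
    columns: list                      # list of (column_name, type_name)
    partitions: list = field(default_factory=list)  # partition names


def migrate(tables):
    """Wrap each table's columns in one column descriptor.

    Returns (cds, sds):
      cds maps CD_ID -> list of columns
      sds maps a storage descriptor key -> CD_ID
    """
    cd_ids = count(1)
    cds, sds = {}, {}
    for table in tables:
        cd_id = next(cd_ids)
        cds[cd_id] = list(table.columns)
        # The table's own storage descriptor references the new descriptor.
        sds[table.name] = cd_id
        # Every partition reuses its parent table's column descriptor.
        for part in table.partitions:
            sds[f"{table.name}/{part}"] = cd_id
    return cds, sds
```

For a table with many partitions this stores the column list once instead of
once per storage descriptor.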
When altering or adding a partition, give it its parent table's column
descriptor IF the columns they describe are the same. Otherwise, create a new
column descriptor for its columns.
When adding or altering a table, create a new column descriptor every time.
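A rough sketch of the two rules above, assuming column descriptors are kept in
a dict from CD_ID to a list of columns. The function names and the id counter
are hypothetical and do not mirror ObjectStore.java.

```python
from itertools import count

_next_cd_id = count(1)  # hypothetical id generator for this sketch


def new_column_descriptor(cds, columns):
    """Register a fresh column descriptor and return its id."""
    cd_id = next(_next_cd_id)
    cds[cd_id] = list(columns)
    return cd_id


def cd_for_table(cds, columns):
    """Adding or altering a table always creates a new column descriptor."""
    return new_column_descriptor(cds, columns)


def cd_for_partition(cds, table_cd_id, partition_columns):
    """Adding or altering a partition reuses the parent table's descriptor
    only if the column lists are identical; otherwise it gets its own."""
    if cds[table_cd_id] == list(partition_columns):
        return table_cd_id
    return new_column_descriptor(cds, partition_columns)
```

A partition whose columns differ from the table's (for example, one created
before the table was altered) ends up with its own descriptor.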
Whenever you drop a storage descriptor (e.g., when dropping tables or
partitions), check whether any other storage descriptor still points to its
column descriptor. If none do, delete that column descriptor. This check is in
place so we don't have unreferenced column descriptors and columns hanging
around after schema evolution for tables.
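The reference check on drop might look like the sketch below. It assumes the
same dict-based model (sds maps a storage descriptor key to a CD_ID, cds maps a
CD_ID to its columns); drop_storage_descriptor is a hypothetical name, not a
method from the patch.

```python
def drop_storage_descriptor(sds, cds, sd_key):
    """Drop one storage descriptor and garbage-collect its column
    descriptor if nothing else references it.

    Returns True if the column descriptor was deleted as well.
    """
    cd_id = sds.pop(sd_key)
    # Any remaining storage descriptor pointing at the same CD keeps it alive.
    if cd_id in sds.values():
        return False
    del cds[cd_id]
    return True
```

Dropping partitions one by one leaves the shared descriptor in place until the
last storage descriptor that references it is gone.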
This addresses bug HIVE-2246.
https://issues.apache.org/jira/browse/HIVE-2246
Diffs (updated)
-----
trunk/metastore/scripts/upgrade/derby/008-HIVE-2246.derby.sql PRE-CREATION
trunk/metastore/scripts/upgrade/derby/008-REVERT-HIVE-2246.derby.sql PRE-CREATION
trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql PRE-CREATION
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 1153927
trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1153927
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1153927
trunk/metastore/src/model/package.jdo 1153927
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1153927
trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/MetaDataFormatUtils.java 1153927
Diff: https://reviews.apache.org/r/1183/diff
Testing
-------
Passes Facebook's regression testing and all existing test cases. In one
instance, the overhead of storage descriptors and columns dropped from ~11 GB
before migration to ~1.5 GB after.
Thanks,
Sohan