-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/985/
-----------------------------------------------------------
Review request for hive.
Summary
-------
We can reorganize the JDO models to reduce space usage and keep the metastore
scalable. Currently, partitions are the fastest-growing objects in the
metastore, and the metastore keeps a separate copy of the column list for each
partition. We can normalize the metastore db by decoupling Columns from
Storage Descriptors so that duplicate column lists are no longer stored for
each partition.
One idea is to add a level of indirection: a "Column Descriptor" that holds a
list of columns. A table has a reference to its latest Column Descriptor
(note: a table may have more than one Column Descriptor in the case of schema
evolution). Partitions and Indexes can reference the same Column Descriptors
as their parent table.
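To make the indirection concrete, here is a minimal, self-contained sketch of
the idea. It is not the code in this patch: the class and method names below
(Column, ColumnDescriptor, addPartition, evolveSchema) are hypothetical
stand-ins, and the real JDO models (MColumnDescriptor, MStorageDescriptor,
MPartition, MTable) carry more state than shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ColumnDescriptorSketch {
    /** One column: a name and a type (simplified, hypothetical field schema). */
    static final class Column {
        final String name;
        final String type;
        Column(String name, String type) { this.name = name; this.type = type; }
    }

    /** A shared, read-only list of columns. */
    static final class ColumnDescriptor {
        final List<Column> columns;
        ColumnDescriptor(List<Column> columns) {
            this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
        }
    }

    /** A partition references a descriptor instead of copying the columns. */
    static final class Partition {
        final String name;
        final ColumnDescriptor cd;
        Partition(String name, ColumnDescriptor cd) { this.name = name; this.cd = cd; }
    }

    /** A table points at its latest descriptor; older ones stay referenced by old partitions. */
    static final class Table {
        final String name;
        ColumnDescriptor latest;
        final List<Partition> partitions = new ArrayList<>();
        Table(String name, ColumnDescriptor cd) { this.name = name; this.latest = cd; }

        /** New partitions reuse the table's current descriptor. */
        Partition addPartition(String partName) {
            Partition p = new Partition(partName, latest);
            partitions.add(p);
            return p;
        }

        /** Schema evolution: only the table's "latest" pointer changes. */
        void evolveSchema(ColumnDescriptor newCd) { this.latest = newCd; }
    }

    public static void main(String[] args) {
        ColumnDescriptor v1 = new ColumnDescriptor(
            Arrays.asList(new Column("id", "bigint"), new Column("msg", "string")));
        Table t = new Table("events", v1);
        Partition p1 = t.addPartition("ds=2011-06-01");
        Partition p2 = t.addPartition("ds=2011-06-02");
        System.out.println("p1 and p2 share one descriptor: " + (p1.cd == p2.cd));

        ColumnDescriptor v2 = new ColumnDescriptor(Arrays.asList(
            new Column("id", "bigint"), new Column("msg", "string"), new Column("ts", "bigint")));
        t.evolveSchema(v2);
        Partition p3 = t.addPartition("ds=2011-06-03");
        System.out.println("p3 uses the evolved descriptor: " + (p3.cd == v2));
        System.out.println("p1 keeps the old descriptor: " + (p1.cd == v1));
    }
}
```

In this sketch the column list is stored once per schema version rather than
once per partition, which is the property the proposal relies on.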
Currently, the COLUMNS table in the metastore has roughly (number of partitions
+ number of tables) * (average number of columns per table) rows. We can reduce
this to (number of tables) * (average number of columns per table) rows, while
incurring a small cost proportional to the number of tables to store the Column
Descriptors.
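As a rough illustration of the estimate above, the sketch below plugs in
made-up numbers (1,000 tables, 1,000,000 partitions, 20 columns per table on
average). The figures are illustrative only, not measurements from a real
metastore, and the "after" estimate assumes one Column Descriptor per table
(no schema evolution).

```java
public class ColumnRowEstimate {
    /** Rows in COLUMNS today: one column list per table and per partition. */
    static long rowsBefore(long tables, long partitions, long avgColumns) {
        return (partitions + tables) * avgColumns;
    }

    /** Rows in COLUMNS after normalization: one column list per table. */
    static long rowsAfter(long tables, long avgColumns) {
        return tables * avgColumns;
    }

    /** Extra rows for the descriptors themselves, assuming one per table. */
    static long descriptorRows(long tables) {
        return tables;
    }

    public static void main(String[] args) {
        long tables = 1_000L;          // hypothetical
        long partitions = 1_000_000L;  // hypothetical
        long avgColumns = 20L;         // hypothetical

        System.out.println("before: " + rowsBefore(tables, partitions, avgColumns)); // 20020000
        System.out.println("after:  " + rowsAfter(tables, avgColumns));              // 20000
        System.out.println("descriptor rows: " + descriptorRows(tables));            // 1000
    }
}
```

With partitions outnumbering tables by a large factor, the per-partition term
dominates the current row count, which is why removing it matters.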
This addresses bug HIVE-2246.
https://issues.apache.org/jira/browse/HIVE-2246
Diffs
-----
trunk/metastore/if/hive_metastore.thrift 1140399
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MDatabase.java 1140399
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MFieldSchema.java 1140399
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MIndex.java 1140399
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartition.java 1140399
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1140399
trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MTable.java 1140399
trunk/metastore/src/model/package.jdo 1140399
trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 1140399
trunk/ql/src/java/org/apache/hadoop/hive/ql/index/TableBasedIndexHandler.java 1140399
trunk/ql/src/java/org/apache/hadoop/hive/ql/index/bitmap/BitmapIndexHandler.java 1140399
trunk/ql/src/java/org/apache/hadoop/hive/ql/index/compact/CompactIndexHandler.java 1140399
trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 1140399
trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/MetaDataFormatUtils.java 1140399
trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java 1140399
trunk/ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 1140399
trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 1140399
Diff: https://reviews.apache.org/r/985/diff
Testing
-------
I haven't run any unit tests yet; only qualitative testing so far.
Thanks,
Sohan