[jira] [Commented] (HIVE-2246) Dedupe tables' column schemas from partitions in the metastore db

Hudson (JIRA) Tue, 09 Aug 2011 21:38:04 -0700

    [ https://issues.apache.org/jira/browse/HIVE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082131#comment-13082131 ]


Hudson commented on HIVE-2246:
------------------------------

Integrated in Hive-trunk-h0.21 #885 (See [https://builds.apache.org/job/Hive-trunk-h0.21/885/])
    HIVE-2246. Dedupe tables' column schemas from partitions in the metastore db (Sohan Jain via pauly)

pauly : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1155573
Files :
* /hive/trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java
* /hive/trunk/metastore/scripts/upgrade/derby/008-REVERT-HIVE-2246.derby.sql
* /hive/trunk/metastore/scripts/upgrade/derby/008-HIVE-2246.derby.sql
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java
* /hive/trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* /hive/trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java
* /hive/trunk/metastore/src/model/package.jdo


> Dedupe tables' column schemas from partitions in the metastore db
> -----------------------------------------------------------------
>
>                 Key: HIVE-2246
>                 URL: https://issues.apache.org/jira/browse/HIVE-2246
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Sohan Jain
>            Assignee: Sohan Jain
>             Fix For: 0.8.0
>
>         Attachments: HIVE-2246.2.patch, HIVE-2246.3.patch, HIVE-2246.4.patch, 
> HIVE-2246.8.patch
>
>
> Note: this patch proposes a schema change, and is therefore incompatible with 
> the current metastore.
> We can reorganize the JDO models to reduce space usage and keep the metastore
> scalable.  Currently, partitions are the fastest-growing objects in the
> metastore, and the metastore keeps a separate copy of the column list for
> each partition.  We can normalize the metastore db by decoupling Columns
> from Storage Descriptors and not storing a duplicate column list for each
> partition.
> An idea is to create an additional level of indirection with a "Column 
> Descriptor" that has a list of columns.  A table has a reference to its 
> latest Column Descriptor (note: a table may have more than one Column 
> Descriptor in the case of schema evolution).  Partitions and Indexes can 
> reference the same Column Descriptors as their parent table.
> Currently, the COLUMNS table in the metastore has roughly (number of
> partitions + number of tables) * (average number of columns per table) rows.
> We can reduce this to (number of tables) * (average number of columns per
> table) rows, while incurring a small cost proportional to the number of
> tables to store the Column Descriptors.
> Please see the latest Review Board request for additional implementation details.
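
The indirection and the row-count estimate in the description can be illustrated with a small in-memory model. This is a hypothetical Python sketch, not the patch itself: FieldSchema, ColumnDescriptor, StorageDescriptor, column_rows, rows_before and rows_after are names chosen here for illustration and are not the Java/JDO classes in the commit (MColumnDescriptor, MStorageDescriptor). It counts the rows a COLUMNS-style table would need when every partition carries its own copy of the column list versus when partitions share their table's descriptor, and it assumes no schema evolution in the "after" formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FieldSchema:
    """One column (name, type); hypothetical stand-in for one COLUMNS row."""
    name: str
    type: str


@dataclass(eq=False)
class ColumnDescriptor:
    """The extra level of indirection: a shareable list of columns."""
    cols: List[FieldSchema]


@dataclass(eq=False)
class StorageDescriptor:
    location: str
    cd: ColumnDescriptor  # a reference to a descriptor, not a private copy of the columns


def column_rows(sds: List[StorageDescriptor]) -> int:
    """Rows a normalized COLUMNS table would hold: one per column per *distinct* descriptor."""
    distinct = {id(sd.cd): sd.cd for sd in sds}
    return sum(len(cd.cols) for cd in distinct.values())


def rows_before(num_tables: int, num_partitions: int, avg_cols: int) -> int:
    # Old layout: every table and every partition stores its own column list.
    return (num_tables + num_partitions) * avg_cols


def rows_after(num_tables: int, avg_cols: int) -> int:
    # New layout, assuming no schema evolution: one column list per table.
    return num_tables * avg_cols


# Usage: one hypothetical table with 3 columns and 1,000 partitions.
cols = [FieldSchema("id", "bigint"), FieldSchema("name", "string"), FieldSchema("value", "double")]

# Deduplicated: the table and all partitions point at the same descriptor.
shared_cd = ColumnDescriptor(cols)
deduped = [StorageDescriptor("/warehouse/t", shared_cd)] + [
    StorageDescriptor(f"/warehouse/t/part={i}", shared_cd) for i in range(1000)
]

# Old style: each storage descriptor owns a separate copy of the columns.
copied = [StorageDescriptor("/warehouse/t", ColumnDescriptor(list(cols)))] + [
    StorageDescriptor(f"/warehouse/t/part={i}", ColumnDescriptor(list(cols))) for i in range(1000)
]

print(column_rows(copied), rows_before(1, 1000, 3))  # 3003 3003
print(column_rows(deduped), rows_after(1, 3))        # 3 3
```

Because partitions hold a reference rather than a copy, the column rows stop growing with the partition count. If a table's schema evolves, the table gets a new descriptor while older partitions keep pointing at the previous one, so column_rows counts both descriptors.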

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
