> On 2011-07-25 06:46:04, Ning Zhang wrote:
> > trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java, 
> > line 1752
> > <https://reviews.apache.org/r/1183/diff/2/?file=26825#file26825line1752>
> >
> >     here do you check if the 'alter table' command changes the schema 
> > (columns definition)? If it just set a table property, then you don't need 
> > to create a new ColumnDescriptor right?
> >     
> >     Also if a table's schema got changed, a new CD will be created, but the 
> > old partition will still have the old CDs. When we query the old partition, 
> > do we use the old partition's CD or the table's CD? 
> >     
> >     Also in the above case, when you run 'desc table partition 
> > <old_partition>', do you return the old partition's CD or the table's CD?
> 
> Sohan Jain wrote:
>     Good point; I should check whether the table columns have changed; I do 
> this already when altering partitions.  I added that in the next diff.
>     
>     If a table's schema changes, it does not update existing partition CDs.  
> If we ever grab the partition object after the schema change, it will refer 
> to its old CD, not the table's CD.  However, when querying tables on the CLI, 
> we almost always use the table's set of columns.  E.g., if we did:
>     > create table test (a string) partitioned by (p1 string, p2 string);
>     > alter table test add partition(p1=1, p2=1);
>     > # populate the p1=1, p2=1 partition with some data now
>     > alter table test add columns (b string);
>     > select * from test where p1 = 1 and p2 = 1;
>     
>     it'd use the table's latest schema; i.e., return the values of column 
> 'a' and NULL for every value of column 'b'.

Also, I fixed "desc table partition" to use the partition's column schema, 
not the table's.
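
To make the behavior above concrete, here is a minimal sketch in Java. It is 
only an illustration: PartitionSchemaSketch and everything inside it 
(ColumnDescriptor, Partition, Table, select, describePartition) are 
hypothetical names for this example, not the metastore's classes, and a column 
descriptor is reduced to a list of column names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; not Hive's metastore code.
class PartitionSchemaSketch {

    /** A column descriptor reduced to an ordered, read-only list of column names. */
    static final class ColumnDescriptor {
        final List<String> columns;

        ColumnDescriptor(List<String> columns) {
            this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
        }
    }

    static final class Partition {
        final ColumnDescriptor cd;                       // fixed when the partition is created
        final Map<String, String> row = new HashMap<>(); // one stored row: column -> value

        Partition(ColumnDescriptor cd) {
            this.cd = cd;
        }
    }

    static final class Table {
        ColumnDescriptor cd;
        final Map<String, Partition> partitions = new HashMap<>();

        Table(String... columns) {
            cd = new ColumnDescriptor(Arrays.asList(columns));
        }

        Partition addPartition(String spec) {
            Partition p = new Partition(cd); // shares the table's current descriptor
            partitions.put(spec, p);
            return p;
        }

        void addColumns(String... added) {
            List<String> all = new ArrayList<>(cd.columns);
            all.addAll(Arrays.asList(added));
            cd = new ColumnDescriptor(all); // table gets a new descriptor; partitions keep theirs
        }

        /** Like "select *": project the stored row onto the table's latest columns. */
        List<String> select(String spec) {
            Partition p = partitions.get(spec);
            List<String> out = new ArrayList<>();
            for (String c : cd.columns) {
                out.add(p.row.get(c)); // null when the partition never stored this column
            }
            return out;
        }

        /** Like "desc ... partition": report the partition's own columns. */
        List<String> describePartition(String spec) {
            return partitions.get(spec).cd.columns;
        }
    }

    public static void main(String[] args) {
        Table test = new Table("a");
        test.addPartition("p1=1/p2=1").row.put("a", "some value");
        test.addColumns("b");
        System.out.println(test.select("p1=1/p2=1"));
        System.out.println(test.describePartition("p1=1/p2=1"));
    }
}
```

In this sketch the select goes through the table's descriptor, so the added 
column comes back as null for the old partition, while describePartition 
still reports only the columns the partition was created with.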


- Sohan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/#review1176
-----------------------------------------------------------


On 2011-07-22 05:30:29, Sohan Jain wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1183/
> -----------------------------------------------------------
> 
> (Updated 2011-07-22 05:30:29)
> 
> 
> Review request for hive, Ning Zhang and Paul Yang.
> 
> 
> Summary
> -------
> 
> This patch tries to make minimal changes to the API while keeping migration 
> short and somewhat easy to revert.
> 
> The new schema can be described as follows:
> - CDS is a table corresponding to Column Descriptor objects.  Currently, it 
> only stores a CD_ID.
> - COLUMNS_V2 is a table corresponding to MFieldSchema objects, or columns.  A 
> Column Descriptor holds a list of columns.  COLUMNS_V2 has a foreign key to 
> the CD_ID to which it belongs.
> - SDS was modified to reference a Column Descriptor. So SDS now has a foreign 
> key to a CD_ID which describes its columns.
> 
> During migration, we create Column Descriptors for tables in a 
> straightforward manner: their columns are now just wrapped inside a column 
> descriptor.  The SDS of partitions use their parent table's column 
> descriptor, since currently a partition and its table share the same list of 
> columns.
> 
> When altering or adding a partition, give it its parent table's column 
> descriptor IF the columns they describe are the same.  Otherwise, create a 
> new column descriptor for its columns.
> 
> When adding or altering a table, create a new column descriptor every time.
> 
> Whenever you drop a storage descriptor (e.g., when dropping tables or 
> partitions), check to see if the related column descriptor has any other 
> references in the table.  That is, check to see if any other storage 
> descriptors point to that column descriptor.  If none do, then delete that 
> column descriptor.  This check is in place so we don't have unreferenced 
> column descriptors and columns hanging around after schema evolution for 
> tables.
> 
> 
> This addresses bug HIVE-2246.
>     https://issues.apache.org/jira/browse/HIVE-2246
> 
> 
> Diffs
> -----
> 
>   trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql PRE-CREATION 
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1148945 
>   trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION 
>   trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1148945 
>   trunk/metastore/src/model/package.jdo 1148945 
> 
> Diff: https://reviews.apache.org/r/1183/diff
> 
> 
> Testing
> -------
> 
> Passes Facebook's regression testing and all existing test cases.  In one 
> instance, before migration, the overhead involved with storage descriptors 
> and columns was ~11 GB.  After migration, the overhead was ~1.5 GB.
> 
> 
> Thanks,
> 
> Sohan
> 
>
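
The bookkeeping in the quoted summary (a fresh column descriptor for each 
table, partitions sharing the parent's descriptor when the columns match, and 
deleting a descriptor once no storage descriptor points to it) could be 
sketched in Java roughly as below. This is an assumption-laden illustration, 
not the code in ObjectStore.java: ColumnDescriptorStoreSketch and its methods 
are hypothetical, and in-memory maps stand in for the CDS/COLUMNS_V2 and SDS 
tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the CDS, COLUMNS_V2 and SDS tables.
class ColumnDescriptorStoreSketch {
    // CD_ID -> columns (plays the role of CDS plus COLUMNS_V2)
    final Map<Long, List<String>> cds = new HashMap<>();
    // SD_ID -> CD_ID (plays the role of the foreign key in SDS)
    final Map<Long, Long> sds = new HashMap<>();
    private long nextCdId = 1;
    private long nextSdId = 1;

    private long newCd(List<String> columns) {
        long cdId = nextCdId++;
        cds.put(cdId, new ArrayList<>(columns));
        return cdId;
    }

    private long newSd(long cdId) {
        long sdId = nextSdId++;
        sds.put(sdId, cdId);
        return sdId;
    }

    /** Adding a table: always create a new column descriptor. Returns the SD_ID. */
    long addTableSd(List<String> columns) {
        return newSd(newCd(columns));
    }

    /** Adding a partition: reuse the parent's descriptor only if the columns match. */
    long addPartitionSd(long tableSdId, List<String> columns) {
        long parentCdId = sds.get(tableSdId);
        if (cds.get(parentCdId).equals(columns)) {
            return newSd(parentCdId);
        }
        return newSd(newCd(columns));
    }

    /** Dropping a storage descriptor: delete its descriptor if nothing else references it. */
    void dropSd(long sdId) {
        Long cdId = sds.remove(sdId);
        if (cdId != null && !sds.containsValue(cdId)) {
            cds.remove(cdId);
        }
    }

    public static void main(String[] args) {
        ColumnDescriptorStoreSketch store = new ColumnDescriptorStoreSketch();
        long table = store.addTableSd(Arrays.asList("a"));
        long shared = store.addPartitionSd(table, Arrays.asList("a"));
        long own = store.addPartitionSd(table, Arrays.asList("a", "b"));
        store.dropSd(own); // removes the descriptor that only this partition used
        System.out.println("descriptors left: " + store.cds.size());
        store.dropSd(table);
        store.dropSd(shared);
        System.out.println("descriptors left: " + store.cds.size());
    }
}
```

The containsValue scan is the simplest way to express "does any other storage 
descriptor point to this column descriptor"; a real store would presumably 
answer that with a query against the storage descriptor table instead.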
