> On 2011-07-25 06:46:04, Ning Zhang wrote:
> > trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java, line 1752
> > <https://reviews.apache.org/r/1183/diff/2/?file=26825#file26825line1752>
> >
> >     Here, do you check whether the 'alter table' command changes the schema
> >     (column definitions)? If it just sets a table property, then you don't
> >     need to create a new ColumnDescriptor, right?
> >
> >     Also, if a table's schema gets changed, a new CD will be created, but the
> >     old partitions will still have the old CDs. When we query an old
> >     partition, do we use the old partition's CD or the table's CD?
> >
> >     Also, in the above case, when you run 'desc table partition
> >     <old_partition>', do you return the old partition's CD or the table's CD?
>
> Sohan Jain wrote:
>     Good point; I should check whether the table columns have changed; I do
>     this already when altering partitions. I added that in the next diff.
>
>     If a table's schema changes, it does not update existing partition CDs.
>     If we ever grab the partition object after the schema change, it will refer
>     to its old CD, not the table's CD. However, when querying tables on the CLI,
>     we almost always use the table's set of columns. E.g., if you did:
>
>     create table test (a string) partitioned by (p1 string, p2 string);
>
>     alter table test add partition(p1=1, p2=1);
>
>     # populate the p1=1, p2=1 partition with some data now
>
>     alter table test add columns (b string)
>
>     select * from test where p1 = 1 and p2 = 1,
>
>     it'd use the table's latest schema; i.e., return the column 'a's values
>     and the column 'b' as all NULL.

Also, I fixed "desc table partition" to use the partition's column schema, not the table's.


- Sohan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1183/#review1176
-----------------------------------------------------------


On 2011-07-22 05:30:29, Sohan Jain wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1183/
> -----------------------------------------------------------
>
> (Updated 2011-07-22 05:30:29)
>
>
> Review request for hive, Ning Zhang and Paul Yang.
>
>
> Summary
> -------
>
> This patch tries to make minimal changes to the API while keeping migration
> short and somewhat easy to revert.
>
> The new schema can be described as follows:
> - CDS is a table corresponding to Column Descriptor objects. Currently, it
> only stores a CD_ID.
> - COLUMNS_V2 is a table corresponding to MFieldSchema objects, or columns. A
> Column Descriptor holds a list of columns. COLUMNS_V2 has a foreign key to
> the CD_ID to which it belongs.
> - SDS was modified to reference a Column Descriptor. So SDS now has a foreign
> key to a CD_ID which describes its columns.
>
> During migration, we create Column Descriptors for tables in a
> straightforward manner: their columns are now just wrapped inside a column
> descriptor. The SDS of partitions use their parent table's column
> descriptor, since currently a partition and its table share the same list of
> columns.
>
> When altering or adding a partition, give it its parent table's column
> descriptor IF the columns they describe are the same. Otherwise, create a
> new column descriptor for its columns.
>
> When adding or altering a table, create a new column descriptor every time.
>
> Whenever you drop a storage descriptor (e.g., when dropping tables or
> partitions), check to see if the related column descriptor has any other
> references in the table. That is, check to see if any other storage
> descriptors point to that column descriptor. If none do, then delete that
> column descriptor. This check is in place so we don't have unreferenced
> column descriptors and columns hanging around after schema evolution for
> tables.
>
>
> This addresses bug HIVE-2246.
>     https://issues.apache.org/jira/browse/HIVE-2246
>
>
> Diffs
> -----
>
>   trunk/metastore/scripts/upgrade/mysql/008-HIVE-2246.mysql.sql PRE-CREATION
>   trunk/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 1148945
>   trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MColumnDescriptor.java PRE-CREATION
>   trunk/metastore/src/model/org/apache/hadoop/hive/metastore/model/MStorageDescriptor.java 1148945
>   trunk/metastore/src/model/package.jdo 1148945
>
> Diff: https://reviews.apache.org/r/1183/diff
>
>
> Testing
> -------
>
> Passes Facebook's regression testing and all existing test cases. In one
> instance, before migration, the overhead involved with storage descriptors
> and columns was ~11 GB. After migration, the overhead was ~1.5 GB.
>
>
> Thanks,
>
> Sohan
>
>
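
For readers following the thread, the sharing and cleanup rules in the summary above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not the code in ObjectStore.java: the names ToyStore, add_table_sd, add_partition_sd and drop_sd are hypothetical, and plain dictionaries stand in for the CDS and SDS tables.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class ColumnDescriptor:
    cd_id: int
    columns: list  # (name, type) pairs, standing in for COLUMNS_V2 rows


@dataclass
class StorageDescriptor:
    sd_id: int
    cd: ColumnDescriptor  # stands in for the SDS -> CDS foreign key


class ToyStore:
    """Hypothetical in-memory stand-in for the CDS and SDS tables."""

    def __init__(self):
        self._cd_ids = count(1)
        self._sd_ids = count(1)
        self.cds = {}  # cd_id -> ColumnDescriptor
        self.sds = {}  # sd_id -> StorageDescriptor

    def _new_cd(self, columns):
        cd = ColumnDescriptor(next(self._cd_ids), list(columns))
        self.cds[cd.cd_id] = cd
        return cd

    def _new_sd(self, cd):
        sd = StorageDescriptor(next(self._sd_ids), cd)
        self.sds[sd.sd_id] = sd
        return sd

    def add_table_sd(self, columns):
        # Adding (or altering) a table creates a new column descriptor.
        return self._new_sd(self._new_cd(columns))

    def add_partition_sd(self, table_sd, columns):
        # A partition shares its parent's descriptor only if the columns match.
        if list(columns) == table_sd.cd.columns:
            return self._new_sd(table_sd.cd)
        return self._new_sd(self._new_cd(columns))

    def drop_sd(self, sd):
        # Remove the storage descriptor, then delete its column descriptor
        # only if no remaining storage descriptor still points to it.
        del self.sds[sd.sd_id]
        if not any(other.cd is sd.cd for other in self.sds.values()):
            self.cds.pop(sd.cd.cd_id, None)


store = ToyStore()
table = store.add_table_sd([("a", "string")])
part = store.add_partition_sd(table, [("a", "string")])
store.drop_sd(table)  # the shared descriptor survives: part still uses it
store.drop_sd(part)   # last reference gone, so the descriptor is removed
```

In this sketch the reference check is a linear scan over the remaining storage descriptors; a real store would presumably express the same check as a query against SDS filtered by CD_ID.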