[ https://issues.apache.org/jira/browse/HIVE-25779?focusedWorklogId=787561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-787561 ]
ASF GitHub Bot logged work on HIVE-25779:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Jul/22 11:16
            Start Date: 04/Jul/22 11:16
    Worklog Time Spent: 10m
      Work Description: dengzhhu653 commented on code in PR #3221:
URL: https://github.com/apache/hive/pull/3221#discussion_r912900559


##########
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestObjectStore.java:
##########
@@ -1483,6 +1483,135 @@ public void testListPackage() throws Exception {
     Assert.assertEquals(1, result.size());
   }

+  @Test
+  public void testSerDeCreatedOnDemand() throws Exception {
+    // Enable USE_TABLE_SERDES
+    final boolean origUseTableSerDes = MetastoreConf.getBoolVar(conf, ConfVars.USE_TABLE_SERDES);
+    if (!origUseTableSerDes) {
+      MetastoreConf.setBoolVar(conf, ConfVars.USE_TABLE_SERDES, true);
+      objectStore.setConf(conf);
+    }
+
+    createPartitionedTable(true, true);
+    // Partitions should reuse table's serde info
+    checkBackendTableSize("PARTITIONS", 3);
+    checkBackendTableSize("SERDES", 1);
+    checkBackendTableSize("SERDE_PARAMS", 1);
+    Table tbl;
+    Partition newPart;
+    // Alters table's serde info
+    try (AutoCloseable c = deadline()) {
+      tbl = objectStore.getTable(DEFAULT_CATALOG_NAME, DB1, TABLE1);
+      Table newTbl = tbl.deepCopy();
+      newTbl.getSd().getSerdeInfo().setDescription("To test SerDe is created on demand");
+      objectStore.alterTable(DEFAULT_CATALOG_NAME, DB1, TABLE1, newTbl, null);
+    }
+    // A new SERDE should be created
+    checkBackendTableSize("SERDES", 2);
+    checkBackendTableSize("SERDE_PARAMS", 2);
+    // Alter a partition's serde info
+    List<String> partVals = Collections.singletonList("a0");
+    try (AutoCloseable c = deadline()) {
+      Partition part = objectStore.getPartition(DEFAULT_CATALOG_NAME, DB1, TABLE1, partVals);
+      newPart = part.deepCopy();
+      newPart.getSd().getSerdeInfo().setDescription("To test SerDe is created on demand");
+      objectStore.alterPartition(DEFAULT_CATALOG_NAME, DB1, TABLE1, partVals, newPart, null);
+    }
+    // A new SERDE should be created
+    checkBackendTableSize("SERDES", 3);
+    checkBackendTableSize("SERDE_PARAMS", 3);
+    // Adds a partition with different serde
+    newPart.setValues(Collections.singletonList("a3"));
+    newPart.getSd().getSerdeInfo().setDescription("To test adding a partition with different SerDe");
+    objectStore.addPartition(newPart);
+    // A new SERDE should be created
+    checkBackendTableSize("SERDES", 4);
+    checkBackendTableSize("SERDE_PARAMS", 4);
+
+    // Restore conf
+    if (!origUseTableSerDes) {
+      MetastoreConf.setBoolVar(conf, ConfVars.USE_TABLE_SERDES, false);
+      objectStore.setConf(conf);
+    }
+  }
+
+  @Test
+  public void testSerDesCleanup() throws Exception {
+    // Enable USE_TABLE_SERDES
+    final boolean origUseTableSerDes = MetastoreConf.getBoolVar(conf, ConfVars.USE_TABLE_SERDES);
+    if (!origUseTableSerDes) {
+      MetastoreConf.setBoolVar(conf, ConfVars.USE_TABLE_SERDES, true);
+      objectStore.setConf(conf);
+    }
+
+    createPartitionedTable(true, true);
+    // Partitions should reuse table's serde info
+    checkBackendTableSize("PARTITIONS", 3);
+    checkBackendTableSize("SERDES", 1);
+    checkBackendTableSize("SERDE_PARAMS", 1);
+    // Alters a partition's serde info
+    List<String> partVals = Collections.singletonList("a0");
+    Partition newPart;
+    try (AutoCloseable c = deadline()) {
+      Partition part = objectStore.getPartition(DEFAULT_CATALOG_NAME, DB1, TABLE1, partVals);
+      newPart = part.deepCopy();

Review Comment:
   Could we check the serde info before and after the change?


Issue Time Tracking
-------------------

    Worklog Id:     (was: 787561)
    Time Spent: 1h 50m  (was: 1h 40m)

> Deduplicate SerDe Info
> ----------------------
>
>                 Key: HIVE-25779
>                 URL: https://issues.apache.org/jira/browse/HIVE-25779
>             Project: Hive
>          Issue Type: New Feature
>          Components: Standalone Metastore
>            Reporter: Yu-Wen Lai
>            Assignee: Yu-Wen Lai
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The proposal is that we can reuse serde info the same way we reuse column
> descriptors (HIVE-2246).
> Currently, we store the metadata for partitions as PARTITIONS (N partitions)
> -> SDS (N locations) -> SERDES (N entries). However, all the SERDES for the
> partitions in a table are the same unless we explicitly specify otherwise.
> That is, each storage descriptor has an associated and exclusive serde info,
> but the partitions' serde infos are mostly just the same as the table's. By
> reusing the serde info, we can save some database storage and improve the
> performance of queries from HMS to the backend database.
> For backward compatibility, we also need to introduce a config for this
> feature, because there will be issues if an old HMS instance and a new HMS
> instance with this feature run together. With this feature, we need to check
> whether anything else references a serde before deleting it, but the old
> instance will just delete it.
> The other thing we need to take care of is custom serdes. If a partition's
> serde is modified, we need to create a new record in SERDES so that we don't
> interfere with other partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
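
The reuse, create-on-demand, and reference-check rules from the issue description can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `SerDeDedupSketch` and all of its methods are hypothetical names invented for illustration, serde info is reduced to a plain string, and none of this is Hive's actual `ObjectStore` code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of SDS -> SERDES sharing; not Hive's ObjectStore.
final class SerDeDedupSketch {
    // Models rows of the SERDES table: serde id -> serde info (simplified to a string).
    private final Map<Long, String> serdes = new HashMap<>();
    // Models SDS rows: storage descriptor owner (table or partition name) -> serde id.
    private final Map<String, Long> sdToSerde = new HashMap<>();
    private long nextId = 1;

    private long insertSerDe(String info) {
        long id = nextId++;
        serdes.put(id, info);
        return id;
    }

    // Delete a SERDES row only when no storage descriptor still points at it.
    private void deleteIfUnreferenced(long serdeId) {
        if (!sdToSerde.containsValue(serdeId)) {
            serdes.remove(serdeId);
        }
    }

    void createTable(String table, String serdeInfo) {
        sdToSerde.put(table, insertSerDe(serdeInfo));
    }

    // A partition reuses the table's SERDES row unless its serde info differs.
    void addPartition(String table, String partition, String serdeInfo) {
        long tableSerde = sdToSerde.get(table);
        boolean same = serdes.get(tableSerde).equals(serdeInfo);
        sdToSerde.put(partition, same ? tableSerde : insertSerDe(serdeInfo));
    }

    // Changing a serde creates a new row so other owners of the old row are unaffected.
    void alterSerDe(String owner, String newInfo) {
        long oldId = sdToSerde.get(owner);
        if (serdes.get(oldId).equals(newInfo)) {
            return;
        }
        sdToSerde.put(owner, insertSerDe(newInfo));
        deleteIfUnreferenced(oldId);
    }

    void drop(String owner) {
        Long id = sdToSerde.remove(owner);
        if (id != null) {
            deleteIfUnreferenced(id);
        }
    }

    int serdeCount() {
        return serdes.size();
    }

    public static void main(String[] args) {
        SerDeDedupSketch store = new SerDeDedupSketch();
        store.createTable("t", "lazy");
        store.addPartition("t", "a0", "lazy");
        store.addPartition("t", "a1", "lazy");
        System.out.println(store.serdeCount()); // one row shared by table and partitions
        store.alterSerDe("a0", "custom");
        System.out.println(store.serdeCount()); // a second row created for a0 only
        store.drop("a0");
        System.out.println(store.serdeCount()); // a0's private row is cleaned up
    }
}
```

The reference check in `deleteIfUnreferenced` is the step that, per the description, an old instance without the feature would skip, which is why the two should not run side by side.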