[ https://issues.apache.org/jira/browse/HIVE-18696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marta Kuczora updated HIVE-18696:
---------------------------------
    Attachment: HIVE-18696.3.patch

> The partition folders might not get cleaned up properly in the
> HiveMetaStore.add_partitions_core method if an exception occurs
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18696
>                 URL: https://issues.apache.org/jira/browse/HIVE-18696
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>            Priority: Major
>         Attachments: HIVE-18696.1.patch, HIVE-18696.2.patch,
>                      HIVE-18696.3.patch
>
>
> When adding multiple partitions, if one of them cannot be created
> successfully, none of the partitions are created, but the folders might not
> be cleaned up properly. See the test case "testAddPartitionsOneInvalid" in
> the TestAddPartitions test.
> This is the problematic code in the HiveMetaStore.add_partitions_core method:
> {code:java}
>       for (final Partition part : parts) {
>         if (!part.getTableName().equals(tblName) || !part.getDbName().equals(dbName)) {
>           throw new MetaException("Partition does not belong to target table "
>               + dbName + "." + tblName + ": " + part);
>         }
>         boolean shouldAdd = startAddPartition(ms, part, ifNotExists);
>         if (!shouldAdd) {
>           existingParts.add(part);
>           LOG.info("Not adding partition " + part + " as it already exists");
>           continue;
>         }
>         final UserGroupInformation ugi;
>         try {
>           ugi = UserGroupInformation.getCurrentUser();
>         } catch (IOException e) {
>           throw new RuntimeException(e);
>         }
>         partFutures.add(threadPool.submit(new Callable<Partition>() {
>           @Override
>           public Partition call() throws Exception {
>             ugi.doAs(new PrivilegedExceptionAction<Object>() {
>               @Override
>               public Object run() throws Exception {
>                 try {
>                   boolean madeDir = createLocationForAddedPartition(table, part);
>                   if (addedPartitions.put(new PartValEqWrapper(part), madeDir) != null) {
>                     // Technically, for ifNotExists case, we could insert one and discard the other
>                     // because the first one now "exists", but it seems better to report the problem
>                     // upstream as such a command doesn't make sense.
>                     throw new MetaException("Duplicate partitions in the list: " + part);
>                   }
>                   initializeAddedPartition(table, part, madeDir);
>                 } catch (MetaException e) {
>                   throw new IOException(e.getMessage(), e);
>                 }
>                 return null;
>               }
>             });
>             return part;
>           }
>         }));
>       }
> {code}
> When going through the partitions, let's say that for the first two
> partitions the tasks that create the folders are successfully submitted to
> the thread pool, but an exception occurs for the third partition in the code
> that runs before its task is submitted. (This can happen if the partition has
> a different table or db name than the others, or if it has an invalid value.)
> In this case the execution jumps to the finally block, where the folders
> in the "addedPartitions" map are cleaned up. However, the threads for the
> first two partitions may not have finished creating the folders yet, so the
> map can be empty or can contain only one of the partitions. Any folder
> created after the cleanup has run is then left behind.
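
The race described above can be reproduced outside Hive with a small, self-contained sketch. It is only an illustration: the class `AddPartitionsRaceSketch`, the method `foldersSeenByCleanup`, the string partition names and the `CountDownLatch` that simulates slow folder creation are all hypothetical and are not part of HiveMetaStore. Waiting for the already submitted futures before cleaning up is one possible mitigation, shown here as an assumption; it is not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class AddPartitionsRaceSketch {

  /**
   * Hypothetical stand-in for the loop in add_partitions_core. Returns how many
   * entries the cleanup step finds in the map after a validation failure.
   */
  static int foldersSeenByCleanup(List<String> parts, boolean waitForTasks) throws Exception {
    ExecutorService threadPool = Executors.newFixedThreadPool(2);
    Map<String, Boolean> addedPartitions = new ConcurrentHashMap<>();
    List<Future<String>> partFutures = new ArrayList<>();
    // Simulates slow folder creation: workers block until the latch is released.
    CountDownLatch release = new CountDownLatch(1);
    int seen;
    try {
      for (String part : parts) {
        // Validation runs on the calling thread, before the task is submitted.
        if (part.startsWith("invalid")) {
          throw new IllegalArgumentException("Partition does not belong to target table: " + part);
        }
        partFutures.add(threadPool.submit(() -> {
          release.await();                 // "creating the folder" takes a while
          addedPartitions.put(part, true); // recorded only after the folder exists
          return part;
        }));
      }
    } catch (IllegalArgumentException e) {
      // Swallowed in this sketch only, so the cleanup result can be returned.
    } finally {
      if (waitForTasks) {
        // Assumed mitigation: let the submitted tasks finish before cleaning up.
        release.countDown();
        for (Future<String> f : partFutures) {
          f.get();
        }
      }
      // The cleanup can only remove what is in the map at this moment.
      seen = addedPartitions.size();
      release.countDown();
      threadPool.shutdown();
      threadPool.awaitTermination(5, TimeUnit.SECONDS);
    }
    return seen;
  }

  public static void main(String[] args) throws Exception {
    List<String> parts = Arrays.asList("p1", "p2", "invalid");
    System.out.println(foldersSeenByCleanup(parts, false)); // prints 0
    System.out.println(foldersSeenByCleanup(parts, true));  // prints 2
  }
}
```

Without the wait, the cleanup sees an empty map even though two tasks are already in flight, which matches the leftover folders reported in this issue. With the wait, both entries are visible to the cleanup.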
> This issue also happens in the HiveMetaStore.add_partitions_pspec_core
> method, because this part of the code is the same as in the
> add_partitions_core method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)