[
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838974#comment-17838974
]
ASF GitHub Bot commented on HADOOP-18679:
-----------------------------------------
steveloughran commented on code in PR #6726:
URL: https://github.com/apache/hadoop/pull/6726#discussion_r1572230146
##########
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md:
##########
@@ -161,124 +105,20 @@ store.hasPathCapability(path, "fs.capability.bulk.delete")
### Invocation through Reflection.
-The need for many Libraries to compile against very old versions of Hadoop
+The need for many libraries to compile against very old versions of Hadoop
means that most of the cloud-first Filesystem API calls cannot be used except
through reflection - and the more complicated the API and its data types are,
the harder that reflection is to implement.
-To assist this, the class `org.apache.hadoop.fs.FileUtil` has two methods
+To assist this, the class `org.apache.hadoop.io.wrappedio.WrappedIO` has a few methods
which are intended to provide simple access to the API, especially
through reflection.
```java
- /**
- * Get the maximum number of objects/files to delete in a single request.
- * @param fs filesystem
- * @param path path to delete under.
- * @return a number greater than or equal to zero.
- * @throws UnsupportedOperationException bulk delete under that path is not supported.
- * @throws IllegalArgumentException path not valid.
- * @throws IOException problems resolving paths
- */
+
 public static int bulkDeletePageSize(FileSystem fs, Path path) throws IOException;
- /**
- * Delete a list of files/objects.
- * <ul>
- * <li>Files must be under the path provided in {@code base}.</li>
- * <li>The size of the list must be equal to or less than the page size.</li>
- * <li>Directories are not supported; the outcome of attempting to delete
- * directories is undefined (ignored; undetected, listed as failures...).</li>
- * <li>The operation is not atomic.</li>
- * <li>The operation is treated as idempotent: network failures may
- * trigger resubmission of the request -any new objects created under a
- * path in the list may then be deleted.</li>
- * <li>There is no guarantee that any parent directories exist after this call.
- * </li>
- * </ul>
- * @param fs filesystem
- * @param base path to delete under.
- * @param paths list of paths which must be absolute and under the base path.
- * @return a list of all the paths which couldn't be deleted for a reason other than "not found" and any associated error message.
- * @throws UnsupportedOperationException bulk delete under that path is not supported.
- * @throws IOException IO problems including networking, authentication and more.
- * @throws IllegalArgumentException if a path argument is invalid.
- */
- public static List<Map.Entry<Path, String>> bulkDelete(FileSystem fs, Path base, List<Path> paths)
-```
+ public static int bulkDeletePageSize(FileSystem fs, Path path) throws IOException;
-## S3A Implementation
-
-The S3A client exports this API.
Review Comment:
this needs to be covered, along with the default implementation "maps to delete(path, false)"
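
To make the reflection point above concrete, here is a minimal sketch of how a library compiled against an older Hadoop release might reach the `WrappedIO.bulkDelete` entry point. The class and method names come from the quoted markdown; the wrapper class below and the exact parameter types (shown as `List<Path>`, per the quoted signature) are illustrative assumptions rather than the committed API.

```java
// Illustrative sketch only: reflective access to the WrappedIO entry point described above.
// The method name and (FileSystem, Path, List) signature follow the quoted markdown and
// may differ in the final patch; ReflectiveBulkDelete is a hypothetical helper class.
import java.lang.reflect.Method;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class ReflectiveBulkDelete {

  private ReflectiveBulkDelete() {
  }

  @SuppressWarnings("unchecked")
  public static List<Map.Entry<Path, String>> bulkDelete(
      FileSystem fs, Path base, List<Path> paths) throws Exception {
    // Resolve the wrapper class at runtime so this code still compiles against old Hadoop.
    Class<?> wrappedIO = Class.forName("org.apache.hadoop.io.wrappedio.WrappedIO");
    Method method = wrappedIO.getMethod("bulkDelete",
        FileSystem.class, Path.class, List.class);
    // Static method: the first invoke() argument is null.
    return (List<Map.Entry<Path, String>>) method.invoke(null, fs, base, paths);
  }
}
```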
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DefaultBulkDeleteOperation.java:
##########
@@ -0,0 +1,84 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.hadoop.util.functional.Tuples;
+
+import static java.util.Objects.requireNonNull;
+import static org.apache.hadoop.fs.BulkDeleteUtils.validateBulkDeletePaths;
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+
+/**
+ * Default implementation of the {@link BulkDelete} interface.
+ */
+public class DefaultBulkDeleteOperation implements BulkDelete {
+
+ private final int pageSize;
+
+ private final Path basePath;
+
+ private final FileSystem fs;
+
+ public DefaultBulkDeleteOperation(int pageSize,
+ Path basePath,
+ FileSystem fs) {
+ checkArgument(pageSize == 1, "Page size must be equal to 1");
+ this.pageSize = pageSize;
+ this.basePath = requireNonNull(basePath);
+ this.fs = fs;
+ }
+
+ @Override
+ public int pageSize() {
+ return pageSize;
+ }
+
+ @Override
+ public Path basePath() {
+ return basePath;
+ }
+
+ @Override
+ public List<Map.Entry<Path, String>> bulkDelete(Collection<Path> paths)
+ throws IOException, IllegalArgumentException {
+ validateBulkDeletePaths(paths, pageSize, basePath);
+ List<Map.Entry<Path, String>> result = new ArrayList<>();
+ // this for loop doesn't make sense as pageSize must be 1.
+ for (Path path : paths) {
Review Comment:
there's only ever going to be 1 entry here
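
Sketching that point (a fragment only, reusing the fields and helpers of the quoted class; not the committed code): once the page size is pinned to 1, the validated collection contains at most one path, so the loop can be dropped.

```java
  // Sketch: body without the loop, assuming the fs/basePath/pageSize fields above.
  // Failure reporting is left out here; see the next comment for one way to handle it.
  @Override
  public List<Map.Entry<Path, String>> bulkDelete(Collection<Path> paths)
      throws IOException, IllegalArgumentException {
    validateBulkDeletePaths(paths, pageSize, basePath);
    List<Map.Entry<Path, String>> result = new ArrayList<>();
    if (!paths.isEmpty()) {
      // At most one entry after validation against pageSize == 1.
      Path path = paths.iterator().next();
      fs.delete(path, false);
    }
    return result;
  }
```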
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/DefaultBulkDeleteOperation.java:
##########
@@ -0,0 +1,84 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.fs;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.hadoop.util.functional.Tuples;
+
+import static java.util.Objects.requireNonNull;
+import static org.apache.hadoop.fs.BulkDeleteUtils.validateBulkDeletePaths;
+import static org.apache.hadoop.util.Preconditions.checkArgument;
+
+/**
+ * Default implementation of the {@link BulkDelete} interface.
+ */
+public class DefaultBulkDeleteOperation implements BulkDelete {
+
+ private final int pageSize;
+
+ private final Path basePath;
+
+ private final FileSystem fs;
+
+ public DefaultBulkDeleteOperation(int pageSize,
+ Path basePath,
+ FileSystem fs) {
+ checkArgument(pageSize == 1, "Page size must be equal to 1");
+ this.pageSize = pageSize;
+ this.basePath = requireNonNull(basePath);
+ this.fs = fs;
+ }
+
+ @Override
+ public int pageSize() {
+ return pageSize;
+ }
+
+ @Override
+ public Path basePath() {
+ return basePath;
+ }
+
+ @Override
+ public List<Map.Entry<Path, String>> bulkDelete(Collection<Path> paths)
+ throws IOException, IllegalArgumentException {
+ validateBulkDeletePaths(paths, pageSize, basePath);
+ List<Map.Entry<Path, String>> result = new ArrayList<>();
+ // this for loop doesn't make sense as pageSize must be 1.
+ for (Path path : paths) {
+ try {
+ fs.delete(path, false);
+ // What to do if this returns false?
+ // I think we should add the path to the result list with value "Not Deleted".
Review Comment:
good q. I'd say yes. Or actually do a getFileStatus() call and see what is there:
* file doesn't exist (not an error)
* path is a directory: add to result
key is that file not found isn't escalated
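
A sketch of that handling (illustrative only, written against the fields of the quoted `DefaultBulkDeleteOperation`; it additionally assumes imports of `java.io.FileNotFoundException` and `org.apache.hadoop.fs.FileStatus`, and that `Tuples.pair` builds a `Map.Entry`, as the existing import suggests):

```java
  // Sketch: when delete() returns false, probe the path and classify the outcome.
  // "Not found" is treated as success; a directory is reported back to the caller.
  private void deleteSinglePath(Path path, List<Map.Entry<Path, String>> result)
      throws IOException {
    if (fs.delete(path, false)) {
      return;   // deleted; nothing to report
    }
    try {
      FileStatus status = fs.getFileStatus(path);
      if (status.isDirectory()) {
        // Directories are unsupported: surface them in the result list.
        result.add(Tuples.pair(path, "path is a directory"));
      }
    } catch (FileNotFoundException e) {
      // File doesn't exist: not an error, so it is not escalated.
    }
  }
```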
##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractBulkDeleteTest.java:
##########
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.contract;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+import org.assertj.core.api.Assertions;
+import org.junit.Before;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hadoop.fs.CommonPathCapabilities;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.wrappedio.WrappedIO;
+
+import static org.apache.hadoop.fs.contract.ContractTestUtils.skip;
+import static org.apache.hadoop.fs.contract.ContractTestUtils.touch;
+import static org.apache.hadoop.io.wrappedio.WrappedIO.bulkDelete;
+import static org.apache.hadoop.test.LambdaTestUtils.intercept;
+
+/**
+ * Contract tests for bulk delete operation.
+ */
+public abstract class AbstractContractBulkDeleteTest extends AbstractFSContractTestBase {
+
+ private static final Logger LOG =
+ LoggerFactory.getLogger(AbstractContractBulkDeleteTest.class);
+
+ protected int pageSize;
+
+ protected Path basePath;
+
+ protected FileSystem fs;
+
+ @Before
+ public void setUp() throws Exception {
+ fs = getFileSystem();
+ basePath = path(getClass().getName());
+ pageSize = WrappedIO.bulkDeletePageSize(getFileSystem(), basePath);
+ fs.mkdirs(basePath);
+ }
+
+ public Path getBasePath() {
+ return basePath;
+ }
+
+ /**
+ * Validate the page size for bulk delete operation. Different stores can have different
+ * implementations for bulk delete operation and thus different page sizes.
+ */
+ @Test
+ public void validatePageSize() throws Exception {
+ Assertions.assertThat(pageSize)
+ .describedAs("Page size should be 1 by default for all stores")
+ .isEqualTo(1);
+ }
+
+ @Test
+ public void testPathsSizeEqualsPageSizePrecondition() throws Exception {
+ List<Path> listOfPaths = createListOfPaths(pageSize, basePath);
+ // Bulk delete call should pass with no exception.
+ bulkDelete(getFileSystem(), basePath, listOfPaths);
+ }
+
+ @Test
+ public void testPathsSizeGreaterThanPageSizePrecondition() throws Exception {
+ List<Path> listOfPaths = createListOfPaths(pageSize + 1, basePath);
+ intercept(IllegalArgumentException.class,
+ () -> bulkDelete(getFileSystem(), basePath, listOfPaths));
+ }
+
+ @Test
+ public void testPathsSizeLessThanPageSizePrecondition() throws Exception {
+ List<Path> listOfPaths = createListOfPaths(pageSize - 1, basePath);
+ // Bulk delete call should pass with no exception.
+ bulkDelete(getFileSystem(), basePath, listOfPaths);
+ }
+
+ @Test
+ public void testBulkDeleteSuccessful() throws Exception {
+ List<Path> listOfPaths = createListOfPaths(pageSize, basePath);
+ for (Path path : listOfPaths) {
+ touch(fs, path);
+ }
+ FileStatus[] fileStatuses = fs.listStatus(basePath);
+ Assertions.assertThat(fileStatuses)
+ .describedAs("File count after create")
+ .hasSize(pageSize);
+ assertSuccessfulBulkDelete(
+ bulkDelete(getFileSystem(), basePath, listOfPaths));
+ FileStatus[] fileStatusesAfterDelete = fs.listStatus(basePath);
+ Assertions.assertThat(fileStatusesAfterDelete)
+ .describedAs("File statuses should be empty after delete")
+ .isEmpty();
+ }
+
+ @Test
+ public void validatePathCapabilityDeclared() throws Exception {
+ Assertions.assertThat(fs.hasPathCapability(basePath, CommonPathCapabilities.BULK_DELETE))
+ .describedAs("Path capability BULK_DELETE should be declared")
+ .isTrue();
+ }
+
+ @Test
+ public void testDeletePathsNotUnderBase() throws Exception {
+ List<Path> paths = new ArrayList<>();
+ Path pathNotUnderBase = path("not-under-base");
+ paths.add(pathNotUnderBase);
+ // Should fail as path is not under the base path.
+ intercept(IllegalArgumentException.class,
+ () -> bulkDelete(getFileSystem(), basePath, paths));
+ }
+
+ @Test
+ public void testDeletePathsNotAbsolute() throws Exception {
+ List<Path> paths = new ArrayList<>();
+ Path pathNotAbsolute = new Path("not-absolute");
+ paths.add(pathNotAbsolute);
+ // Should fail as path is not absolute.
+ intercept(IllegalArgumentException.class,
+ () -> bulkDelete(getFileSystem(), basePath, paths));
+ }
+
+ @Test
+ public void testDeletePathsNotExists() throws Exception {
+ List<Path> paths = new ArrayList<>();
+ Path pathNotExists = new Path(basePath, "not-exists");
+ paths.add(pathNotExists);
+ // bulk delete call doesn't verify if a path exists or not before deleting.
+ assertSuccessfulBulkDelete(bulkDelete(getFileSystem(), basePath, paths));
+ }
+
+ @Test
+ public void testDeletePathsDirectory() throws Exception {
+ List<Path> paths = new ArrayList<>();
+ Path dirPath = new Path(basePath, "dir");
+ fs.mkdirs(dirPath);
+ paths.add(dirPath);
+ Path filePath = new Path(dirPath, "file");
+ touch(fs, filePath);
+ paths.add(filePath);
+ pageSizePreconditionForTest(paths.size());
+ // Outcome is undefined, but the call shouldn't fail. In the case of S3, directories will still be present.
+ assertSuccessfulBulkDelete(bulkDelete(getFileSystem(), basePath, paths));
+ }
+
+ @Test
+ public void testDeleteEmptyDirectory() throws Exception {
+ List<Path> paths = new ArrayList<>();
+ Path emptyDirPath = new Path(basePath, "empty-dir");
+ fs.mkdirs(emptyDirPath);
+ paths.add(emptyDirPath);
+ // Should pass as empty directory.
+ assertSuccessfulBulkDelete(bulkDelete(getFileSystem(), basePath, paths));
+ }
+
+ @Test
+ public void testDeleteEmptyList() throws Exception {
+ List<Path> paths = new ArrayList<>();
+ // Empty list should pass.
+ assertSuccessfulBulkDelete(bulkDelete(getFileSystem(), basePath, paths));
+ }
+
+ @Test
+ public void testDeleteSamePathsMoreThanOnce() throws Exception {
+ List<Path> paths = new ArrayList<>();
+ Path path = new Path(basePath, "file");
+ touch(fs, path);
+ paths.add(path);
+ paths.add(path);
+ Path another = new Path(basePath, "another-file");
+ touch(fs, another);
+ paths.add(another);
+ pageSizePreconditionForTest(paths.size());
Review Comment:
build the paths and do this precondition check before the touch; saves any network IO before skipping
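
A sketch of that reordering (same helpers as the quoted test; the closing assertion follows the pattern used by the other cases in this file):

```java
  @Test
  public void testDeleteSamePathsMoreThanOnce() throws Exception {
    // Build the path list and evaluate the page-size precondition first, so a
    // skipped test never performs the touch() calls (network IO on remote stores).
    List<Path> paths = new ArrayList<>();
    Path path = new Path(basePath, "file");
    paths.add(path);
    paths.add(path);
    Path another = new Path(basePath, "another-file");
    paths.add(another);
    pageSizePreconditionForTest(paths.size());
    touch(fs, path);
    touch(fs, another);
    assertSuccessfulBulkDelete(bulkDelete(getFileSystem(), basePath, paths));
  }
```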
##########
hadoop-common-project/hadoop-common/src/site/markdown/filesystem/bulkdelete.md:
##########
@@ -0,0 +1,124 @@
+<!---
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. See accompanying LICENSE file.
+-->
+
+# <a name="BulkDelete"></a> interface `BulkDelete`
+
+<!
> Add API for bulk/paged object deletion
> --------------------------------------
>
> Key: HADOOP-18679
> URL: https://issues.apache.org/jira/browse/HADOOP-18679
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.3.5
> Reporter: Steve Loughran
> Priority: Major
> Labels: pull-request-available
>
> iceberg and hbase could benefit from being able to give a list of individual
> files to delete - files which may be scattered round the bucket for better
> read performance.
> Add some new optional interface for an object store which allows a caller to
> submit a list of paths to files to delete, where
> the expectation is
> * if a path is a file: delete
> * if a path is a dir, outcome undefined
> For s3 that'd let us build these into DeleteRequest objects, and submit,
> without any probes first.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]