Steve Loughran created HADOOP-19388:
---------------------------------------

             Summary: S3A: Validate bulk delete through Iceberg HadoopFileIO
                 Key: HADOOP-19388
                 URL: https://issues.apache.org/jira/browse/HADOOP-19388
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3, test
    Affects Versions: 3.4.1
            Reporter: Steve Loughran
            Assignee: Steve Loughran




Now that Hadoop 3.4.1 has shipped, we can link Iceberg up to it
through reflection: https://github.com/apache/iceberg/pull/10233

However, we can't put a test in there, not even one that just talks to
the minio docker image which the S3FileIO tests use, because
the test would only work with Hadoop 3.4.1+.

Proposed: add a validation test here, initially using a JAR built from the
PR.
At first it simply shows "it works as expected".
Over time it becomes the regression test for "it still works",
so there's no need to wait for downstream tests to be run and failures to be
reported back.

We need a test suite which
* Adds a test-time dependency on the Iceberg JAR with bulk delete through the
HadoopFileIO class.
* Runs compliance tests: single/multi delete, complex names, directories,
missing paths.
* Is parameterized on whether single or multi delete is enabled in S3A, and on
whether Iceberg uses bulk delete.
* Includes IOStats assertions to verify bulk delete was actually used.
* Mixes in some local file:// files so as to validate multiple stores with
different page sizes.
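
To make the last point concrete, here is a minimal sketch of the behaviour such a mixed-store test would exercise: paths are grouped by store, and each store's paths are split into pages no larger than that store's page size. This is not the Iceberg or Hadoop implementation. The class BulkDeletePaging, its methods, and the page sizes (2 for the S3A bucket, 1 for the local filesystem) are hypothetical and chosen only for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical helper for illustration only; not Hadoop or Iceberg code. */
final class BulkDeletePaging {

  private BulkDeletePaging() {
  }

  /** Store key: scheme plus authority, e.g. "s3a://bucket" or "file://". */
  static String storeOf(URI path) {
    String authority = path.getAuthority() == null ? "" : path.getAuthority();
    return path.getScheme() + "://" + authority;
  }

  /**
   * Group paths by store, then split each store's paths into pages
   * holding at most that store's page size. Input order is preserved.
   */
  static Map<String, List<List<URI>>> paginate(
      List<URI> paths, ToIntFunction<String> pageSizeOf) {
    Map<String, List<List<URI>>> pages = new LinkedHashMap<>();
    for (URI path : paths) {
      String store = storeOf(path);
      int pageSize = pageSizeOf.applyAsInt(store);
      if (pageSize < 1) {
        throw new IllegalArgumentException("page size < 1 for " + store);
      }
      List<List<URI>> storePages =
          pages.computeIfAbsent(store, k -> new ArrayList<>());
      // Open a new page when there is none yet or the current one is full.
      if (storePages.isEmpty()
          || storePages.get(storePages.size() - 1).size() >= pageSize) {
        storePages.add(new ArrayList<>());
      }
      storePages.get(storePages.size() - 1).add(path);
    }
    return pages;
  }

  public static void main(String[] args) {
    List<URI> paths = List.of(
        URI.create("s3a://bucket/table/a.parquet"),
        URI.create("file:///tmp/table/x.parquet"),
        URI.create("s3a://bucket/table/b.parquet"),
        URI.create("s3a://bucket/table/c.parquet"),
        URI.create("file:///tmp/table/y.parquet"));
    // Assumed page sizes: 2 for the S3A bucket, 1 for the local filesystem.
    Map<String, List<List<URI>>> pages =
        paginate(paths, store -> store.startsWith("s3a:") ? 2 : 1);
    pages.forEach((store, p) ->
        System.out.println(store + " -> " + p.size() + " page(s): " + p));
  }
}
```

A real test would take the page size from each filesystem's bulk delete support rather than a hard-coded lambda, and would then check through IOStats that the expected number of bulk delete requests was issued.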

I had started this within HADOOP-19385, with the Iceberg JAR as one of the
formats and the new test module including the base contract test suite.

But as the Iceberg JAR is Java 17+, it rapidly becomes unworkable.

Instead, it will all go into s3a with a new java17 profile which will
* add the Iceberg JAR dependency
* add a new src/test/java17 test source tree
* contain a minimal abstract base test
* contain an s3a implementation of it

Once Hadoop is on Java 17, it can be moved into the main branch.

Note also: until an Iceberg release actually ships with the PR in, this cannot be
merged.




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

