Steve Loughran created HADOOP-18651:
---------------------------------------

             Summary: Add "versions" tool to s3a command line entry point
                 Key: HADOOP-18651
                 URL: https://issues.apache.org/jira/browse/HADOOP-18651
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
    Affects Versions: 3.3.9
            Reporter: Steve Loughran


Having just implemented some version command support in the cloudstore jar, I 
can see the benefit in actually implementing it in the hadoop-aws module.

https://github.com/steveloughran/cloudstore/blob/trunk/src/main/site/versioned-objects.md

https://github.com/steveloughran/cloudstore/blob/trunk/src/main/extra/org/apache/hadoop/fs/s3a/extra/
 

this code
* uses v1 sdk by asking the s3a fs for it; this will break with the move to v2 
sdk
* doesn't have any tests
* doesn't have any review or maintenance plan
* bypasses audit log/referrer header creation

we could just say "use the aws CLI", but there are some benefits in using the 
s3a connector code
* support for s3a:// urls
* can use the s3a auth/signing chain (knox, etc)
* plus proxy, region settings etc.
* could integrate with other bits of the stack (e.g. a Spark RDD to get at all 
versions of objects)
* would be really useful to have a tool to purge all directory delete markers 
down a path, to speed up listing on versioned buckets.
* gets bundled everywhere
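
As a rough sketch of the selection step such a purge tool would need: given 
a version listing, pick out the delete markers whose key ends with "/". The 
VersionEntry type and directoryDeleteMarkers() helper are hypothetical and 
not part of hadoop-aws; it also assumes that a delete marker on a key ending 
in "/" is a directory delete marker, and it leaves out the actual S3 calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch only: none of these types exist in hadoop-aws. */
final class DirMarkerPurgeSketch {

  /** Hypothetical, minimal view of one entry in an S3 version listing. */
  static final class VersionEntry {
    final String key;
    final String versionId;
    final boolean deleteMarker;

    VersionEntry(String key, String versionId, boolean deleteMarker) {
      this.key = key;
      this.versionId = versionId;
      this.deleteMarker = deleteMarker;
    }
  }

  /**
   * Select the entries a purge would remove: delete markers under the
   * prefix whose key ends with "/" (assumed to be directory markers).
   */
  static List<VersionEntry> directoryDeleteMarkers(
      List<VersionEntry> listing, String prefix) {
    List<VersionEntry> result = new ArrayList<>();
    for (VersionEntry e : listing) {
      if (e.deleteMarker && e.key.startsWith(prefix) && e.key.endsWith("/")) {
        result.add(e);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<VersionEntry> listing = Arrays.asList(
        new VersionEntry("data/dir1/", "v1", true),      // directory delete marker
        new VersionEntry("data/dir1/file", "v2", true),  // file delete marker
        new VersionEntry("data/dir2/", "v3", false),     // live directory marker
        new VersionEntry("other/dir3/", "v4", true));    // outside the prefix
    for (VersionEntry e : directoryDeleteMarkers(listing, "data/")) {
      System.out.println(e.key + " @ " + e.versionId);
    }
  }
}
```

Only the data/dir1/ entry is selected here; delete markers on files and 
anything outside the prefix are left alone.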

For use by downstream code we would want to have a public/evolving API to 
access operations, e.g. 

# taking an S3AFileStatus for rename/purge/restore operations
# listing all versions of objects under a path within a given time range and 
mapping to RemoteIterator.
# HADOOP-16387. S3A openFile() options to allow etag/version to be set
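
To make the second item concrete, here is a minimal sketch of filtering a 
version listing by time range behind an iterator whose methods may throw 
IOException. Everything here is an assumption for illustration: 
ObjectVersion, versionsBetween() and RemoteIteratorLike are hypothetical 
names. RemoteIteratorLike is a local stand-in with the same two-method shape 
as org.apache.hadoop.fs.RemoteIterator so that the sketch compiles without 
Hadoop on the classpath, and the half-open [from, to) range is just one 
possible choice.

```java
import java.io.IOException;
import java.time.Instant;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Sketch only: not a proposed hadoop-aws API. */
final class VersionListingSketch {

  /** Hypothetical summary of one object version. */
  static final class ObjectVersion {
    final String key;
    final String versionId;
    final Instant lastModified;

    ObjectVersion(String key, String versionId, Instant lastModified) {
      this.key = key;
      this.versionId = versionId;
      this.lastModified = lastModified;
    }
  }

  /** Local stand-in mirroring the shape of Hadoop's RemoteIterator. */
  interface RemoteIteratorLike<T> {
    boolean hasNext() throws IOException;
    T next() throws IOException;
  }

  /** Lazily yield versions with from <= lastModified < to. */
  static RemoteIteratorLike<ObjectVersion> versionsBetween(
      Iterator<ObjectVersion> source, Instant from, Instant to) {
    return new RemoteIteratorLike<ObjectVersion>() {
      private ObjectVersion pending; // next match, if already found

      @Override
      public boolean hasNext() {
        while (pending == null && source.hasNext()) {
          ObjectVersion v = source.next();
          if (!v.lastModified.isBefore(from) && v.lastModified.isBefore(to)) {
            pending = v;
          }
        }
        return pending != null;
      }

      @Override
      public ObjectVersion next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        ObjectVersion v = pending;
        pending = null;
        return v;
      }
    };
  }

  public static void main(String[] args) throws IOException {
    List<ObjectVersion> all = Arrays.asList(
        new ObjectVersion("a", "v1", Instant.parse("2023-01-01T00:00:00Z")),
        new ObjectVersion("a", "v2", Instant.parse("2023-02-01T00:00:00Z")),
        new ObjectVersion("b", "v3", Instant.parse("2023-03-01T00:00:00Z")));
    RemoteIteratorLike<ObjectVersion> it = versionsBetween(all.iterator(),
        Instant.parse("2023-01-15T00:00:00Z"),
        Instant.parse("2023-03-01T00:00:00Z"));
    while (it.hasNext()) {
      ObjectVersion v = it.next();
      System.out.println(v.key + " " + v.versionId);
    }
  }
}
```

In a real implementation the source would be a paged S3 version listing 
rather than an in-memory list, which is where the IOException on hasNext() 
and next() would come from.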

The core code is straightforward (it takes exactly two days to write, 
*excluding tests*); the public API and tests are more work.

Note: we should also move the entry point to being "s3a", with "s3guard" 
retained for compatibility.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
