[jira] [Created] (HADOOP-16950) Extend Hadoop S3a access from single endpoint to multiple endpoints

Ocean Lua (Jira) Tue, 31 Mar 2020 01:54:54 -0700

Ocean Lua created HADOOP-16950:
----------------------------------

             Summary: Extend Hadoop S3a access from single endpoint to multiple endpoints
                 Key: HADOOP-16950
                 URL: https://issues.apache.org/jira/browse/HADOOP-16950
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs/s3
    Affects Versions: 3.1.3
            Reporter: Ocean Lua



The hadoop-aws client API supports access through only a single endpoint. 
However, object stores such as Ceph expose multiple endpoints, so with a single 
endpoint the storage resources cannot be fully used. To address this, we 
created a new implementation of S3AFileSystem that supports multi-endpoint 
access. With this change, we expect system performance to improve significantly.
        
Usage:
1. Ensure the hadoop-aws API is available.
2. Copy hadoop-aws-3.1.1.jar and aws-java-sdk-bundle-1.11.271.jar to the 
directory share/hadoop/common/lib in the Hadoop installation (both jars are 
normally located in the directory share/hadoop/tools/lib).
3. In the file etc/hadoop/hadoop-env.sh, add the following:
export HADOOP_CLASSPATH=/(hadoop root directory)/share/hadoop/common/lib/hadoop-aws-3.1.1.jar:/(hadoop root directory)/share/hadoop/common/lib/hadoop-aws-3.1.3.jar:$HADOOP_CLASSPATH
4. Edit the configuration file "core-site.xml" and set the following properties:
  <property>
    <name>fs.s3a.s3.client.factory.impl</name>
    <value>org.apache.hadoop.fs.s3a.MultiAddrS3ClientFactory</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>http://addr1:port1,http://addr2:port2,...</value>
  </property>
5. Optional configuration in "core-site.xml":
  <property>
    <name>fs.s3a.S3ClientSelector.class</name>
    <value>org.apache.hadoop.fs.s3a.RandomS3ClientSelector</value>
  </property>
This property sets the S3A service selection policy. The default value is 
org.apache.hadoop.fs.s3a.RandomS3ClientSelector, which selects completely at 
random. It can instead be set to org.apache.hadoop.fs.s3a.PathS3ClientSelector, 
which selects according to the file path.
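
To illustrate the two selection policies described above, here is a minimal 
standalone sketch. It is not the code from the proposed patch: the names 
EndpointSelector, RandomSelector, PathSelector and parseEndpoints are 
hypothetical, the example addresses and ports are placeholders, and mapping a 
path to an endpoint by hashing is only an assumption about how a path-based 
selector might behave.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class EndpointSelectionSketch {

    /** Hypothetical policy interface; not part of hadoop-aws. */
    interface EndpointSelector {
        String select(List<String> endpoints, String path);
    }

    /** Picks an endpoint uniformly at random and ignores the path. */
    static final class RandomSelector implements EndpointSelector {
        private final Random random;

        RandomSelector(Random random) {
            this.random = random;
        }

        @Override
        public String select(List<String> endpoints, String path) {
            return endpoints.get(random.nextInt(endpoints.size()));
        }
    }

    /**
     * Assumed behaviour: hash the path onto an endpoint index, so the same
     * path is always served by the same endpoint.
     */
    static final class PathSelector implements EndpointSelector {
        @Override
        public String select(List<String> endpoints, String path) {
            // floorMod keeps the index non-negative for negative hash codes.
            return endpoints.get(Math.floorMod(path.hashCode(), endpoints.size()));
        }
    }

    /** Splits a comma-separated endpoint value, trimming and dropping empty entries. */
    static List<String> parseEndpoints(String value) {
        List<String> result = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        if (result.isEmpty()) {
            throw new IllegalArgumentException("no endpoints configured");
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder addresses standing in for the fs.s3a.endpoint value.
        List<String> endpoints = parseEndpoints("http://addr1:9000, http://addr2:9000");
        EndpointSelector byPath = new PathSelector();
        EndpointSelector atRandom = new RandomSelector(new Random());
        System.out.println(byPath.select(endpoints, "/data/part-0000"));
        System.out.println(atRandom.select(endpoints, "/data/part-0000"));
    }
}
```

Under these assumptions, a random policy spreads requests evenly without any 
coordination, while a path-based policy keeps all requests for a given file on 
one endpoint.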



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
