Maxim Martynov created HADOOP-18838:
---------------------------------------

             Summary: Some fs.s3a.* config values are different in sources and 
documentation
                 Key: HADOOP-18838
                 URL: https://issues.apache.org/jira/browse/HADOOP-18838
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.3.6
            Reporter: Maxim Martynov


For the config option {{fs.s3a.retry.throttle.interval}}, the default value in 
the source code is {{500ms}}:
{code:java}
public static final String RETRY_THROTTLE_INTERVAL_DEFAULT = "500ms";
{code}
https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java#L921

In {{core-default.xml}} it has the value {{100ms}}:
{code:xml}
<property>
  <name>fs.s3a.retry.throttle.interval</name>
  <value>100ms</value>
  <description>
    Initial between retry attempts on throttled requests, +/- 50%. chosen at 
random.
    i.e. for an intial value of 3000ms, the initial delay would be in the range 
1500ms to 4500ms.
    Backoffs are exponential; again randomness is used to avoid the thundering 
heard problem.
    500ms is the default value used by the AWS S3 Retry policy.
  </description>
</property>
{code}
https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml#L1750
This change was introduced in HADOOP-16823.

In the Hadoop-AWS module index it has the value {{1000ms}}:
{code:xml}
<property>
  <name>fs.s3a.retry.throttle.interval</name>
  <value>1000ms</value>
  <description>
    Interval between retry attempts on throttled requests.
  </description>
</property>
{code}
https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md?plain=1#L1223
The file was created in HADOOP-13786, and the value has been left unchanged since then.

In the performance tuning page it has the up-to-date value {{500ms}}:
{code:xml}
<property>
  <name>fs.s3a.retry.throttle.interval</name>
  <value>500ms</value>
  <description>
    Interval between retry attempts on throttled requests.
  </description>
</property>
{code}
https://github.com/apache/hadoop/blob/rel/release-3.3.6/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/performance.md?plain=1#L435

The same issue affects:
* {{fs.s3a.retry.throttle.limit}} - in source code it has value {{20}}, but some documents still show the old value ${fs.s3a.attempts.maximum}
* {{fs.s3a.connection.establish.timeout}} - in source code it has value {{50_000}}, in config file & documentation {{5_000}}
* {{fs.s3a.attempts.maximum}} - in source code it has value {{10}}, in config file & documentation {{20}}
* {{fs.s3a.threads.max}} - in source code & documentation it has value {{10}}, in config file {{64}}
* {{fs.s3a.max.total.tasks}} - in source code & config it has value {{32}}, in documentation {{5}}
* {{fs.s3a.connection.maximum}} - in source code & config it has value {{96}}, in documentation {{15}} or {{30}}
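
To make the mismatches above easy to re-check, here is a minimal sketch of a comparison between the source defaults and {{core-default.xml}}. The class name {{S3ADefaultsCheck}} and the method {{mismatches}} are hypothetical, the values are hard-coded from this report rather than read from the real files, and {{fs.s3a.retry.throttle.limit}} is left out because the report does not state its config file value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: compares two sets of defaults.
final class S3ADefaultsCheck {

    // Defaults in the Java source, as quoted in this report for 3.3.6.
    static final Map<String, String> SOURCE = new LinkedHashMap<>();
    // Defaults in core-default.xml, as quoted in this report for 3.3.6.
    static final Map<String, String> CORE_DEFAULT = new LinkedHashMap<>();

    static {
        SOURCE.put("fs.s3a.retry.throttle.interval", "500ms");
        SOURCE.put("fs.s3a.connection.establish.timeout", "50000");
        SOURCE.put("fs.s3a.attempts.maximum", "10");
        SOURCE.put("fs.s3a.threads.max", "10");
        SOURCE.put("fs.s3a.max.total.tasks", "32");
        SOURCE.put("fs.s3a.connection.maximum", "96");

        CORE_DEFAULT.put("fs.s3a.retry.throttle.interval", "100ms");
        CORE_DEFAULT.put("fs.s3a.connection.establish.timeout", "5000");
        CORE_DEFAULT.put("fs.s3a.attempts.maximum", "20");
        CORE_DEFAULT.put("fs.s3a.threads.max", "64");
        CORE_DEFAULT.put("fs.s3a.max.total.tasks", "32");
        CORE_DEFAULT.put("fs.s3a.connection.maximum", "96");
    }

    // Returns one line per key whose value differs or is missing in the config map.
    static List<String> mismatches(Map<String, String> source, Map<String, String> config) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : source.entrySet()) {
            String other = config.get(e.getKey());
            if (!e.getValue().equals(other)) {
                out.add(e.getKey() + ": source=" + e.getValue() + ", config=" + other);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        mismatches(SOURCE, CORE_DEFAULT).forEach(System.out::println);
    }
}
```

With the values listed in this report, four of the six keys are flagged. A real check would need to load both sides from the actual files instead of hard-coded maps.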

Please sync these values; outdated documentation is very painful to work with.
As an idea, is it possible to use {{core-default.xml}} directly in 
documentation, or generate this documentation from docstrings in Java code?
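
A rough sketch of the first idea, assuming the documentation build could run a small Java step: parse a {{core-default.xml}}-style file and emit a Markdown table of the {{fs.s3a.*}} defaults. The class {{CoreDefaultDocs}} and its methods are hypothetical names, not existing Hadoop code; only standard JDK XML APIs are used, and the sample XML is a trimmed stand-in for the real file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop: renders config defaults as Markdown.
final class CoreDefaultDocs {

    // Reads every <property> element and returns name -> value in file order.
    static Map<String, String> parseProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element property = (Element) props.item(i);
                String name = childText(property, "name");
                String value = childText(property, "value");
                if (name != null) {
                    result.put(name, value == null ? "" : value);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration XML", e);
        }
    }

    // Returns the trimmed text of the first child element with the given tag, or null.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Builds a two-column Markdown table for keys starting with the given prefix.
    static String toMarkdownTable(Map<String, String> props, String prefix) {
        StringBuilder sb = new StringBuilder("| Property | Default |\n|---|---|\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                sb.append("| `").append(e.getKey()).append("` | `")
                  .append(e.getValue()).append("` |\n");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Trimmed stand-in for core-default.xml, using values quoted in this report.
        String xml = "<configuration>"
            + "<property><name>fs.s3a.retry.throttle.interval</name><value>100ms</value></property>"
            + "<property><name>fs.s3a.threads.max</name><value>64</value></property>"
            + "</configuration>";
        System.out.print(toMarkdownTable(parseProperties(xml), "fs.s3a."));
    }
}
```

Generating the table from the config file would leave only one place to update, although it would not by itself catch a mismatch between {{core-default.xml}} and the constants in the Java source.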



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
