This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit fa64be7cc7074f201fff1eccc9cbf19520a19c55
Author: Riza Suminto <[email protected]>
AuthorDate: Thu Feb 23 16:05:31 2023 -0800

    IMPALA-11940: [DOCS] Document manifest caching settings for Iceberg

    IMPALA-11658 implements Iceberg manifest caching for Impala. This patch
    adds documentation for configuring the cache(s).

    Testing:
    - Built docs locally

    Change-Id: Idd761a81f5c81a25a5ec0889402f85157c23e9fe
    Reviewed-on: http://gerrit.cloudera.org:8080/19530
    Reviewed-by: Daniel Becker <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
    Reviewed-by: Zoltan Borok-Nagy <[email protected]>
---
 docs/topics/impala_iceberg.xml | 60 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/docs/topics/impala_iceberg.xml b/docs/topics/impala_iceberg.xml
index 32363f3de..62abca615 100644
--- a/docs/topics/impala_iceberg.xml
+++ b/docs/topics/impala_iceberg.xml
@@ -606,4 +606,64 @@ ALTER TABLE ice_tbl EXECUTE expire_snapshots(now() - interval 5 days);
       </p>
     </conbody>
   </concept>
+
+  <concept id="iceberg_manifest_caching">
+    <title>Iceberg manifest caching</title>
+    <conbody>
+      <p>
+        Starting from version 1.1.0, Apache Iceberg provides a mechanism to cache the
+        contents of Iceberg manifest files in memory. This manifest caching feature helps
+        to reduce repeated reads of small Iceberg manifest files from remote storage by
+        Coordinators and Catalogd. This feature can be enabled for Impala Coordinators and
+        Catalogd by setting properties in Hadoop's core-site.xml as in the following:
+        <codeblock>
+iceberg.io-impl=org.apache.iceberg.hadoop.HadoopFileIO;
+iceberg.io.manifest.cache-enabled=true;
+iceberg.io.manifest.cache.max-total-bytes=104857600;
+iceberg.io.manifest.cache.expiration-interval-ms=3600000;
+iceberg.io.manifest.cache.max-content-length=8388608;
+        </codeblock>
+      </p>
+      <p>
+        The description of each property is as follows:
+        <ul>
+          <li>
+            <codeph>iceberg.io-impl</codeph>: custom FileIO implementation to use in a
+            catalog. Must be set to enable manifest caching. Impala defaults to
+            HadoopFileIO. It is recommended to not change this to other than HadoopFileIO.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache-enabled</codeph>: enable/disable the
+            manifest caching feature.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>: maximum total
+            amount of bytes to cache in the manifest cache. Must be a positive value.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache.expiration-interval-ms</codeph>: maximum
+            duration for which an entry stays in the manifest cache. Must be a
+            non-negative value. Setting zero means cache entries expire only if it gets
+            evicted due to memory pressure from
+            <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>.
+          </li>
+          <li>
+            <codeph>iceberg.io.manifest.cache.max-content-length</codeph>: maximum length
+            of a manifest file to be considered for caching in bytes. Manifest files with
+            a length exceeding this property value will not be cached. Must be set with a
+            positive value and lower than
+            <codeph>iceberg.io.manifest.cache.max-total-bytes</codeph>.
+          </li>
+        </ul>
+      </p>
+      <p>
+        Manifest caching only works for tables that are loaded with either of
+        HadoopCatalogs or HiveCatalogs. Individual HadoopCatalog and HiveCatalog will have
+        separate manifest caches with the same configuration. By default, only 8 catalogs
+        can have their manifest cache active in memory. This number can be raised by
+        setting a higher value in the java system property
+        <codeph>iceberg.io.manifest.cache.fileio-max</codeph>.
+      </p>
+    </conbody>
+  </concept>
 </concept>
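The codeblock in this patch lists the manifest caching settings as `key=value;` pairs. As a minimal sketch, assuming the settings are placed in the core-site.xml read by the Impala Coordinators and Catalogd as the patch describes, the same values in Hadoop's standard `<configuration>`/`<property>` layout would look like the following. The values are the patch's own examples; the comments only convert them to larger units and restate the constraints from the property list.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the manifest caching settings from this patch written in
     Hadoop core-site.xml form. Values are the patch's examples. -->
<configuration>
  <!-- FileIO implementation; the patch recommends keeping HadoopFileIO. -->
  <property>
    <name>iceberg.io-impl</name>
    <value>org.apache.iceberg.hadoop.HadoopFileIO</value>
  </property>

  <!-- Enables the manifest cache. -->
  <property>
    <name>iceberg.io.manifest.cache-enabled</name>
    <value>true</value>
  </property>

  <!-- 104857600 bytes = 100 MiB per cache; must be positive. -->
  <property>
    <name>iceberg.io.manifest.cache.max-total-bytes</name>
    <value>104857600</value>
  </property>

  <!-- 3600000 ms = 1 hour; 0 means entries leave only on eviction. -->
  <property>
    <name>iceberg.io.manifest.cache.expiration-interval-ms</name>
    <value>3600000</value>
  </property>

  <!-- 8388608 bytes = 8 MiB; larger manifest files are not cached.
       Must be positive and lower than max-total-bytes. -->
  <property>
    <name>iceberg.io.manifest.cache.max-content-length</name>
    <value>8388608</value>
  </property>
</configuration>
```

Because each HadoopCatalog or HiveCatalog gets its own cache with this configuration, and by default at most 8 caches are active, these example values bound cached manifest content at roughly 8 × 100 MiB = 800 MiB per daemon (Catalogd or Coordinator). Raising `iceberg.io.manifest.cache.fileio-max` raises that bound proportionally.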
