This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 43603dc3ed95b0fcfb4407cdb01ca94fd9c9c76d
Author: Joe McDonnell <[email protected]>
AuthorDate: Fri Jan 9 11:25:16 2026 -0800

    IMPALA-14298: Add documentation for intermediate results caching
    
    This adds basic documentation about enabling the intermediate
    results caching feature.
    
    Tests:
     - Built PDF, asf-site-html, and plain-html
    
    Change-Id: I2e08c91a694f1d333bb903b105623fb73efc3a2e
    Reviewed-on: http://gerrit.cloudera.org:8080/23846
    Tested-by: Impala Public Jenkins <[email protected]>
    Reviewed-by: Peter Rozsa <[email protected]>
---
 docs/impala.ditamap                               |  1 +
 docs/topics/impala_intermediate_results_cache.xml | 87 +++++++++++++++++++++++
 2 files changed, 88 insertions(+)

diff --git a/docs/impala.ditamap b/docs/impala.ditamap
index 8a0f96292..88854d59a 100644
--- a/docs/impala.ditamap
+++ b/docs/impala.ditamap
@@ -324,6 +324,7 @@ under the License.
     <topicref href="topics/impala_data_cache.xml"/>
     <topicref href="topics/impala_perf_testing.xml"/>
     <topicref href="topics/impala_explain_plan.xml"/>
+    <topicref href="topics/impala_intermediate_results_cache.xml"/>
   </topicref>
   <topicref href="topics/impala_scalability.xml">
     <topicref href="topics/impala_scaling_limits.xml"/>
diff --git a/docs/topics/impala_intermediate_results_cache.xml b/docs/topics/impala_intermediate_results_cache.xml
new file mode 100644
index 000000000..bbc44c1a7
--- /dev/null
+++ b/docs/topics/impala_intermediate_results_cache.xml
@@ -0,0 +1,87 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="intermediate_results_cache">
+
+  <title>Intermediate Results Cache</title>
+
+  <conbody>
+
+    <p>
+      In Impala, query execution always starts from scratch, computing
+      intermediate results in several stages to produce the final results.
+      These intermediate results are discarded at the end of query execution,
+      so the computation must be repeated for a new run of the query even
+      if none of the underlying data has changed. Caching intermediate results
+      can improve the latency for repetitive work while also freeing up
+      resources for other queries.
+    </p>
+
+    <p>
+      The intermediate results cache is enabled via the following configurations:
+      <ul>
+        <li>
+          <codeph>--allow_tuple_caching</codeph> is a startup flag that gates
+          the intermediate results caching feature. It must be set to true on coordinators
+          and executors to allow the use of the intermediate results cache, but it does
+          not enable the cache by itself.
+        </li>
+        <li>
+          The <codeph>--tuple_cache</codeph> startup flag specifies the storage
+          directory and quota for the intermediate results cache on coordinators and
+          executors. The flag is set to a directory name followed by a <codeph>:</codeph>
+          and a capacity for that directory. For example:
+          <codeblock>--tuple_cache=/data/cache:20GB</codeblock>
+          This setting uses the <codeph>/data/cache</codeph> directory and allows the
+          cache to consume up to 20GB in that directory. The directory must exist in the
+          local filesystem of each Impala Daemon, or Impala will fail to start.
+        </li>
+        <li>
+          The <codeph>enable_tuple_caching</codeph> query option determines whether a
+          query uses the intermediate results cache. To use the feature, this must be
+          set to true, either for the session or through the <codeph>--default_query_options</codeph> startup flag.
+        </li>
+      </ul>
+      All three of these settings must be specified to use the intermediate results cache.
+      By default, all three configurations leave the feature disabled.
+    </p>
+
+    <p>
+      The cache key incorporates information about all the settings that can impact the
+      query results, including information about the base tables and any query options.
+      When any of those settings change, the key changes and a new cache entry is created.
+      For example, if new data is ingested into a base table, the key will change. As a
+      result, an administrator does not need to manually refresh or invalidate
+      the cache entries.
+    </p>
+
+    <p>
+      When the cache reaches the quota, cache entries are evicted to make space for new
+      entries. The cache eviction policy can be specified by the
+      <codeph>--tuple_cache_eviction_policy</codeph> startup flag. Currently, the cache
+      supports the following cache eviction policies:
+      <ul>
+        <li>LRU (Least Recently Used), the default</li>
+        <li>LIRS (Least Inter-reference Recency Set)</li>
+      </ul>
+      LIRS is a scan-resistant policy with low performance overhead.
+    </p>
+  </conbody>
+</concept>
