Csaba Ringhofer created IMPALA-13878:
----------------------------------------

             Summary: Prioritize recently read files in scan range order
                 Key: IMPALA-13878
                 URL: https://issues.apache.org/jira/browse/IMPALA-13878
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Csaba Ringhofer


Currently, for each query, every host computes a fixed scan range order, and fragment instances/scanner threads pop scan ranges from this queue (IMPALA-11539). The order only considers scan range size (larger ranges first) and HDFS caching.

Giving preference to recently read files could improve the efficiency of the caches involved in IO (the OS's IO cache, the remote data cache, the file handle cache). For example, assuming LRU eviction, if a query reads files f1 ... f100 (in this order) and only 10 files fit in the cache, then when the query is repeated it is more efficient to read in the opposite order, so that the last 10 files, which are still in the cache, are read first.

Ideally the order would be dynamic, since parallel queries may read a file and make it "hot" even if it was cold during planning.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
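
The LRU example from the description (files f1 ... f100, a cache that holds 10 files) can be checked with a small simulation. This is only an illustrative sketch in Python, not Impala code: the function names `count_cache_hits`, `second_run_hits` and `order_by_recency` are hypothetical, and the cache is assumed to be a plain LRU that tracks whole files.

```python
from collections import OrderedDict


def count_cache_hits(read_order, cache, capacity):
    """Read files in the given order through an LRU cache; return the hit count.

    `cache` is an OrderedDict used as the LRU state (oldest entry first). It is
    updated in place, so consecutive calls model consecutive query runs.
    """
    hits = 0
    for name in read_order:
        if name in cache:
            hits += 1
            cache.move_to_end(name)  # mark as most recently used
        else:
            cache[name] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used file
    return hits


def second_run_hits(first_order, second_order, capacity):
    """Warm an empty cache with one run, then count hits in a second run."""
    cache = OrderedDict()
    count_cache_hits(first_order, cache, capacity)
    return count_cache_hits(second_order, cache, capacity)


def order_by_recency(files, last_read):
    """Hypothetical ordering: most recently read files first.

    `last_read` maps a file name to a logical read timestamp (assumed input);
    files that were never read sort last.
    """
    return sorted(files, key=lambda f: last_read.get(f, -1), reverse=True)


files = [f"f{i}" for i in range(1, 101)]  # f1 ... f100
CAPACITY = 10  # only 10 files fit in the cache

# Repeat the query in the same order vs. in the opposite order.
same_order = second_run_hits(files, files, CAPACITY)
reversed_order = second_run_hits(files, files[::-1], CAPACITY)

# Timestamps recorded during the first run: f1 was read at 0, f100 at 99.
last_read = {name: i for i, name in enumerate(files)}
recency_order = second_run_hits(files, order_by_recency(files, last_read), CAPACITY)

print(same_order, reversed_order, recency_order)  # prints "0 10 10"
```

Repeating the query in the same order gets no hits, because each miss evicts a file that would have been needed later, while the reversed order hits all 10 files that are still cached. Sorting by last read time reproduces the reversed order here, and a dynamic variant would refresh the timestamps as parallel queries touch files.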