Steve Loughran created HADOOP-18852:
---------------------------------------
Summary: S3ACachingInputStream.ensureCurrentBuffer(): lazy seek
means all reads look like random IO
Key: HADOOP-18852
URL: https://issues.apache.org/jira/browse/HADOOP-18852
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 3.3.6
Reporter: Steve Loughran
Noticed in HADOOP-18184, but I think it's a big enough issue to be dealt with
separately.
# all seeks are lazy; no fetching is kicked off after an open
# the first read is treated as an out-of-order read, so it cancels any active
prefetches (I don't think there are any) and then asks for only 1 block
{code}
if (outOfOrderRead) {
  LOG.debug("lazy-seek({})", getOffsetStr(readPos));
  blockManager.cancelPrefetches();

  // We prefetch only 1 block immediately after a seek operation.
  prefetchCount = 1;
}
{code}
* for any readFully() call we should prefetch all blocks in the range requested
* for other reads, we may want a bigger prefetch count than 1, depending on:
split start/end, file read policy (random, sequential, whole-file)
* also, if a read is in a block other than the current one, but which is
already being fetched or cached, is this really an out-of-order (OOO) read to
the extent that outstanding fetches should be cancelled?
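
A rough sketch of how the three points above might fit together. Everything
here is hypothetical: PrefetchPlanner, ReadPolicy, maxPrefetch and the
cachedOrInFlight set are illustrative names and are not part of
S3ACachingInputStream or the block manager. It assumes fixed-size blocks and
that the stream can tell which block numbers are already cached or being
fetched.

```java
import java.util.Set;

/** Hypothetical helper; not an existing Hadoop class. */
final class PrefetchPlanner {

  /** Hypothetical stand-in for the file read policy. */
  enum ReadPolicy { RANDOM, SEQUENTIAL, WHOLE_FILE }

  private PrefetchPlanner() {
  }

  /** Number of blocks touched by the byte range [pos, pos + len). */
  static int blocksInRange(long pos, long len, int blockSize) {
    if (len <= 0) {
      return 0;
    }
    long first = pos / blockSize;
    long last = (pos + len - 1) / blockSize;
    return (int) (last - first + 1);
  }

  /**
   * Blocks to request after a seek for reads other than readFully().
   * Random IO keeps the current behaviour of 1 block; the other policies
   * ask for up to maxPrefetch blocks, bounded by the end of the split.
   */
  static int prefetchCount(ReadPolicy policy, long pos, long splitEnd,
      int blockSize, int maxPrefetch) {
    if (policy == ReadPolicy.RANDOM) {
      return 1;
    }
    int remaining = blocksInRange(pos, splitEnd - pos, blockSize);
    return Math.max(1, Math.min(maxPrefetch, remaining));
  }

  /**
   * Only cancel outstanding fetches when the out-of-order read lands in a
   * block which is neither cached nor already being fetched.
   */
  static boolean shouldCancelPrefetches(boolean outOfOrderRead,
      int targetBlock, Set<Integer> cachedOrInFlight) {
    return outOfOrderRead && !cachedOrInFlight.contains(targetBlock);
  }
}
```

For a readFully(position, buffer, offset, length) call, blocksInRange(position,
length, blockSize) would give the number of blocks to queue up front. Whether
that count should also be capped is an open question the sketch does not try
to answer.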