[ https://issues.apache.org/jira/browse/HADOOP-19229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran resolved HADOOP-19229.
-------------------------------------
    Fix Version/s: 3.4.2
     Release Note:
The thresholds at which adjacent vector IO read ranges are coalesced into a single range have been increased, as has the limit at which ranges are considered large enough that parallel reads are faster.

* The min/max for local filesystems and any other FS without custom support are now 16K and 1M.
* s3a and abfs use 128K as the minimum size, 2M for max.

These values are based on the Facebook Velox paper, which stated that their thresholds for merging were 20K for local SSD and 500K for cloud storage.

       Resolution: Fixed

> Vector IO on cloud storage: what is a good minimum seek size?
> -------------------------------------------------------------
>
>                 Key: HADOOP-19229
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19229
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.2
>
>
> Vector IO has a max size to coalesce ranges, but it also needs a maximum gap
> between ranges to justify the merge. Right now we could have a read where two
> vectors of size 8 bytes can be merged with a 1 MB gap between them, and
> that's wasteful.
> We could also consider an "efficiency" metric which looks at the ratio of
> bytes-read to bytes-discarded. Not sure what we'd do with it, but we could
> track it as an IOStat.
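
For illustration only, here is a minimal Python sketch of how a minimum-seek threshold and a maximum merged size can interact when coalescing read ranges. It is not Hadoop's implementation: the names `Range`, `coalesce`, `min_seek`, `max_size` and `efficiency` are hypothetical, the exact boundary conditions are an assumption, and the `efficiency` ratio (bytes requested over bytes read) is just one possible reading of the metric floated in the issue description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Range:
    """A requested read: `length` bytes starting at `offset` (hypothetical type)."""
    offset: int
    length: int

    @property
    def end(self) -> int:
        return self.offset + self.length


def coalesce(ranges: List[Range], min_seek: int, max_size: int) -> List[Range]:
    """Merge neighbouring ranges when the gap between them is smaller than
    min_seek and the merged range would not exceed max_size.
    Assumption: these boundary conditions are illustrative, not Hadoop's."""
    merged: List[Range] = []
    for r in sorted(ranges, key=lambda x: x.offset):
        if merged:
            last = merged[-1]
            gap = r.offset - last.end
            combined = max(r.end, last.end) - last.offset
            if gap < min_seek and combined <= max_size:
                # Reading through the gap is assumed cheaper than a new seek.
                merged[-1] = Range(last.offset, combined)
                continue
        merged.append(r)
    return merged


def efficiency(requested: List[Range], merged: List[Range]) -> float:
    """Fraction of bytes read that were actually requested
    (assumes the requested ranges do not overlap)."""
    wanted = sum(r.length for r in requested)
    read = sum(r.length for r in merged)
    return wanted / read if read else 1.0


if __name__ == "__main__":
    KB = 1024
    # Two 8-byte reads separated by a 1 MB gap, as in the issue description.
    far = [Range(0, 8), Range(8 + 1024 * KB, 8)]
    print(coalesce(far, min_seek=128 * KB, max_size=2048 * KB))
    # Two 8-byte reads a few KB apart.
    near = [Range(0, 8), Range(4096, 8)]
    out = coalesce(near, min_seek=128 * KB, max_size=2048 * KB)
    print(out, efficiency(near, out))
```

With the 128K/2M values quoted for s3a and abfs, the two 8-byte reads separated by 1 MB stay separate in this sketch, while the pair a few KB apart is merged into one read in which most of the bytes are discarded.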