[ https://issues.apache.org/jira/browse/HADOOP-17833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran resolved HADOOP-17833.
-------------------------------------
    Fix Version/s: 3.3.4
     Release Note: S3A filesystem's createFile() operation supports an option to disable all safety checks when creating a file. Consult the documentation and use with care.
       Resolution: Fixed

> Improve Magic Committer Performance
> -----------------------------------
>
>                 Key: HADOOP-17833
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17833
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.3.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.3.4
>
>          Time Spent: 14h
>  Remaining Estimate: 0h
>
> Magic committer tasks can be slow because every file created with
> overwrite=false triggers a HEAD (to verify there's no file) and a LIST (to
> verify there's no dir). And because of delayed manifestations, the check may
> not behave as expected.
> ParquetOutputFormat is one example of a library which does this.
> We could fix parquet to use overwrite=true, but (a) there may be surprises in
> other uses, (b) it'd still leave the LIST and (c) it'd do nothing for calls
> from other formats.
> Proposed: createFile() under a magic path to skip all probes for file/dir at
> end of path.
> Only a single task attempt will be writing to that directory and it should
> know what it is doing. If there are conflicting file names and parts across
> tasks, that won't even get picked up at this point. Oh, and none of the
> committers ever check for this: you'll get the last file manifested (s3a) or
> renamed (file).
> If we skip the checks we will save 2 HTTP requests/file.
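
To make the request arithmetic in the description concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not S3A code: the class CreateProbeModel, its methods and the key layout are all hypothetical names invented for illustration. It only mimics the behaviour described above (a HEAD when overwrite=false, a LIST for the directory probe, then the PUT) and counts requests with and without the probes.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of an object store that counts requests. Hypothetical; not S3A code. */
final class CreateProbeModel {
    private final Set<String> objects = new HashSet<>();
    int headRequests;
    int listRequests;
    int putRequests;

    /** HEAD probe: is there already an object at exactly this key? */
    private boolean head(String key) {
        headRequests++;
        return objects.contains(key);
    }

    /** LIST probe: is there any object under key + "/", i.e. is the key a directory? */
    private boolean listHasChildren(String key) {
        listRequests++;
        String prefix = key + "/";
        for (String k : objects) {
            if (k.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Create a file, optionally skipping every existence probe (assumed behaviour). */
    void create(String key, boolean overwrite, boolean skipChecks) {
        if (!skipChecks) {
            if (!overwrite && head(key)) {
                throw new IllegalStateException("file exists: " + key);
            }
            if (listHasChildren(key)) {
                throw new IllegalStateException("is a directory: " + key);
            }
        }
        putRequests++;
        objects.add(key);
    }

    int totalRequests() {
        return headRequests + listRequests + putRequests;
    }

    /** Simulate one task attempt writing n distinct files with overwrite=false. */
    static CreateProbeModel writeFiles(int n, boolean skipChecks) {
        CreateProbeModel store = new CreateProbeModel();
        for (int i = 0; i < n; i++) {
            // Hypothetical key layout, for illustration only.
            store.create("job/__magic/task_0/part-" + i, false, skipChecks);
        }
        return store;
    }

    public static void main(String[] args) {
        CreateProbeModel checked = writeFiles(100, false);
        CreateProbeModel unchecked = writeFiles(100, true);
        System.out.println("with probes:    " + checked.totalRequests() + " requests");
        System.out.println("without probes: " + unchecked.totalRequests() + " requests");
    }
}
```

In this model a task writing 100 files issues 300 requests with the probes and 100 without, which is the 2 requests/file saving quoted above. In Hadoop itself, I believe the switch is passed through the createFile() builder as the option fs.s3a.create.performance, but treat that key as an assumption and check the S3A documentation for your release before relying on it.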