[
https://issues.apache.org/jira/browse/HADOOP-19251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874140#comment-17874140
]
Alkis Evlogimenos commented on HADOOP-19251:
--------------------------------------------
Thank you Steve.
* path capabilities: while a filesystem may be able to provide rename
atomicity, it might not do so in all cases. For local filesystems, rename across
mounts/drives is practically never atomic. This means the call to rename itself
cannot be guaranteed to be atomic. Similarly for `fs.rename.file.fast`: while a
cloud store can provide this capability, in some implementations it can only
work intra-region.
* virtual filesystems (delegates) present issues because of laziness. Virtual
filesystems typically translate one path namespace to another, but this
translation happens late. So if `/my/path/to/file` translates to
`s3://path/to/file` and `/my/path/to/other-file` translates to
`gcs://path/to/file`, then querying the capabilities of `MyFileSystem` is path
dependent.
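To make the path dependence concrete, here is a minimal Java sketch. It does
not use the Hadoop API: `VirtualCapabilities`, `mount`, `resolve` and
`sameBackingStore` are hypothetical names invented for illustration, and the
exact-match translation table is an assumption (a real delegate would likely
use prefixes or rules).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a delegating filesystem; not a Hadoop class. */
final class VirtualCapabilities {
    // Translation table: virtual path -> backing store name (assumed exact match).
    private final Map<String, String> table = new HashMap<>();

    void mount(String virtualPath, String backingStore) {
        table.put(virtualPath, backingStore);
    }

    /** The late translation step: only here do we learn which store a path uses. */
    String resolve(String virtualPath) {
        String store = table.get(virtualPath);
        if (store == null) {
            throw new IllegalArgumentException("no translation for " + virtualPath);
        }
        return store;
    }

    /**
     * Necessary, but not sufficient, for an atomic rename: both paths must
     * resolve to the same backing store. The answer depends on the paths,
     * not on the virtual filesystem as a whole.
     */
    boolean sameBackingStore(String src, String dst) {
        return resolve(src).equals(resolve(dst));
    }
}
```

With `/my/path/to/file` mapped to `s3` and `/my/path/to/other-file` mapped to
`gcs`, `sameBackingStore` is false for that pair even though each store might
report rename support on its own.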
Given that we want this check to be dynamic and lazy (it needs to happen during
rename), I feel we need to pass an argument to make it work consistently.
Changing the API to builders is likely the most flexible approach, but it would
also involve a lot of work to change the call sites completely. Adding an option
to `rename` seems like a low-friction change that could get us closer to a safer
`rename`.
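A rough Java sketch of how such an option could behave at rename time. This is
not the existing Hadoop API: `RenameOptionSketch` and its `RenameOption` enum
are hypothetical stand-ins, `THROW_NON_ATOMIC` is only the flag proposed in this
issue, and treating a same-store rename as atomic is a simplifying assumption.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch of a delegating filesystem; not a Hadoop class. */
final class RenameOptionSketch {
    /** Stand-in for the proposed flag; not org.apache.hadoop.fs.Options.Rename. */
    enum RenameOption { OVERWRITE, THROW_NON_ATOMIC }

    // Maps a virtual path to the name of its backing store (assumed resolver).
    private final Function<String, String> resolver;

    RenameOptionSketch(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    /**
     * Decides lazily, per call, whether the rename can be atomic.
     * Returns a label for the strategy that would be used instead of
     * touching any real storage.
     */
    String rename(String src, String dst, RenameOption... options) throws IOException {
        Set<RenameOption> opts = EnumSet.noneOf(RenameOption.class);
        opts.addAll(Arrays.asList(options));

        // Simplification: same store is treated as "atomic rename available".
        boolean atomic = resolver.apply(src).equals(resolver.apply(dst));

        if (!atomic && opts.contains(RenameOption.THROW_NON_ATOMIC)) {
            throw new IOException("rename " + src + " -> " + dst + " would not be atomic");
        }
        return atomic ? "native-rename" : "copy-then-delete";
    }
}
```

The caller opts in per call, so existing call sites that pass no option keep
their current behavior, while callers that need atomicity get a failure instead
of a silent copy.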
> Add Options.Rename.THROW_NON_ATOMIC
> -----------------------------------
>
> Key: HADOOP-19251
> URL: https://issues.apache.org/jira/browse/HADOOP-19251
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Affects Versions: 3.3.6
> Reporter: Alkis Evlogimenos
> Priority: Major
>
> I propose we add an option `Options.Rename.THROW_NON_ATOMIC` to change
> `rename()` behavior to throw when the underlying filesystem's rename
> operation is not atomic.
> This would be useful for callers that expect to perform an atomic op and want
> the call to fail when the rename cannot be performed atomically.
>
> At first this might seem something that can be done by querying capabilities
> of the filesystem but that would only work on real filesystems. A motivating
> example would be a virtual filesystem for which paths can resolve to any
> concrete filesystem (s3, etc). If `rename()` is called with two virtual paths
> that resolve to different filesystems (s3 and gcs for example) then obviously
> the operation can't be atomic since bytes must be copied from one fs to
> another.
>
> What do you think [~steve_l] ?