[ https://issues.apache.org/jira/browse/HADOOP-19251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874140#comment-17874140 ]

Alkis Evlogimenos commented on HADOOP-19251:
--------------------------------------------

Thank you, Steve.
 * Path capabilities: even when a filesystem can provide atomic rename, it may 
not be able to do so in every case. On local filesystems, a rename across 
mounts/drives is practically never atomic. In that case the rename call itself 
cannot be atomic, even though the filesystem supports atomic rename in general. 
Similarly for `fs.rename.file.fast`: a cloud store can provide this capability, 
but some implementations support it only within a single region.
 * Virtual filesystems (delegates) present issues because of laziness. Virtual 
filesystems typically translate one path namespace to another, but this 
translation happens late. So if `/my/path/to/file` translates to 
`s3://path/to/file` and `/my/path/to/other-file` translates to 
`gcs://path/to/file`, then querying the capabilities of `MyFileSystem` is 
path-dependent.
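The local-filesystem case already has a per-call answer in the JDK: `java.nio.file.Files.move` with `StandardCopyOption.ATOMIC_MOVE` asks for an atomic move and throws `AtomicMoveNotSupportedException` when that cannot be honoured, for example when the target is on a different `FileStore`. The sketch below only illustrates that behaviour; the class name `AtomicMoveDemo` and the helper `moveAtomically` are illustrative, not part of Hadoop.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicMoveDemo {

    /**
     * Renames src to dst as a single atomic filesystem operation.
     * Throws AtomicMoveNotSupportedException (an IOException) instead of
     * silently falling back to copy + delete, e.g. when dst is on a
     * different FileStore (another mount or drive).
     */
    static void moveAtomically(Path src, Path dst) throws IOException {
        Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("atomic-move-demo");
        Path src = Files.writeString(dir.resolve("file"), "data");
        Path dst = dir.resolve("other-file");
        try {
            // Same directory, hence same FileStore: expected to succeed.
            moveAtomically(src, dst);
            System.out.println("atomic rename ok: " + Files.exists(dst));
        } catch (AtomicMoveNotSupportedException e) {
            // Reached when the platform cannot perform this move atomically.
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

Note that whether the move succeeds is a property of the (source, destination) pair, not of the filesystem alone, which is the same shape as the per-call option discussed here.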

Given that we want this check to be dynamic and lazy (it needs to happen during 
the rename), I feel we need to pass an argument to make it work consistently. 
Changing the API to builders is likely the most flexible option, but it would 
also involve a lot of work to change all the call sites. Adding an option to 
`rename` seems like a low-friction change that could get us closer to a safer 
`rename`.
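To make the proposal concrete, here is a minimal sketch of how a per-call option could behave in a virtual filesystem that resolves paths late. Everything in it is hypothetical: `RenameOption` only mirrors the `NONE`/`OVERWRITE` values of Hadoop's `Options.Rename` plus the proposed `THROW_NON_ATOMIC`, and `VirtualFs`, `Store`, `Resolved` and the mount prefixes are illustrative stand-ins, not Hadoop classes. The point it shows is that atomicity is decided inside `rename`, after both paths have been resolved.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class VirtualRenameSketch {

    // Hypothetical: modeled on Options.Rename (NONE, OVERWRITE) plus the
    // THROW_NON_ATOMIC value proposed in HADOOP-19251.
    enum RenameOption { NONE, OVERWRITE, THROW_NON_ATOMIC }

    /** Hypothetical in-memory backend; a rename within one store is treated as atomic. */
    static final class Store {
        final String scheme;
        final Map<String, String> files = new HashMap<>();
        Store(String scheme) { this.scheme = scheme; }
    }

    /** A virtual path resolved to a concrete store and a key inside it. */
    static final class Resolved {
        final Store store;
        final String key;
        Resolved(Store store, String key) { this.store = store; this.key = key; }
    }

    /** Hypothetical virtual filesystem mapping path prefixes to stores (first match wins). */
    static final class VirtualFs {
        private final Map<String, Store> mounts = new LinkedHashMap<>();

        void mount(String prefix, Store store) { mounts.put(prefix, store); }

        // Late translation: each path is resolved only when an operation runs.
        Resolved resolve(String path) throws IOException {
            for (Map.Entry<String, Store> m : mounts.entrySet()) {
                if (path.startsWith(m.getKey())) {
                    return new Resolved(m.getValue(), path.substring(m.getKey().length()));
                }
            }
            throw new IOException("no mount point for " + path);
        }

        void rename(String src, String dst, RenameOption... options) throws IOException {
            EnumSet<RenameOption> opts = EnumSet.noneOf(RenameOption.class);
            opts.addAll(Arrays.asList(options));
            Resolved from = resolve(src);
            Resolved to = resolve(dst);
            if (!from.store.files.containsKey(from.key)) throw new FileNotFoundException(src);
            if (to.store.files.containsKey(to.key) && !opts.contains(RenameOption.OVERWRITE)) {
                throw new FileAlreadyExistsException(dst);
            }
            if (from.store == to.store) {
                // Same backend: stands in for its native (assumed atomic) rename.
                to.store.files.put(to.key, from.store.files.remove(from.key));
                return;
            }
            // The check needs both resolved paths, so it can only happen here.
            if (opts.contains(RenameOption.THROW_NON_ATOMIC)) {
                throw new IOException("rename " + src + " -> " + dst + " spans "
                        + from.store.scheme + " and " + to.store.scheme + ": cannot be atomic");
            }
            // Default fallback: copy then delete, which is not atomic.
            to.store.files.put(to.key, from.store.files.get(from.key));
            from.store.files.remove(from.key);
        }
    }

    public static void main(String[] args) throws IOException {
        Store s3 = new Store("s3");
        Store gcs = new Store("gcs");
        VirtualFs fs = new VirtualFs();
        fs.mount("/my/s3/", s3);
        fs.mount("/my/gcs/", gcs);
        s3.files.put("file", "data");

        // Both paths resolve to s3: allowed even with THROW_NON_ATOMIC.
        fs.rename("/my/s3/file", "/my/s3/renamed", RenameOption.THROW_NON_ATOMIC);
        try {
            fs.rename("/my/s3/renamed", "/my/gcs/file", RenameOption.THROW_NON_ATOMIC);
        } catch (IOException e) {
            System.out.println("refused: " + e.getMessage());
        }
        fs.rename("/my/s3/renamed", "/my/gcs/file"); // no option: non-atomic fallback
        System.out.println("gcs now holds: " + gcs.files);
    }
}
```

Without the option the call silently degrades to copy-then-delete; with it, a caller that relies on atomic commit semantics gets an error it can handle, and existing call sites that pass no option keep their current behaviour.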

> Add Options.Rename.THROW_NON_ATOMIC
> -----------------------------------
>
>                 Key: HADOOP-19251
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19251
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 3.3.6
>            Reporter: Alkis Evlogimenos
>            Priority: Major
>
> I propose we add an option `Options.Rename.THROW_NON_ATOMIC` to change 
> `rename()` behavior to throw when the underlying filesystem's rename 
> operation is not atomic.
> This would be useful for callers that expect to perform an atomic operation 
> and want the call to fail when the rename cannot be done atomically.
>  
> At first this might seem like something that could be done by querying the 
> capabilities of the filesystem, but that would only work on real filesystems. A motivating
> example would be a virtual filesystem for which paths can resolve to any 
> concrete filesystem (s3, etc). If `rename()` is called with two virtual paths 
> that resolve to different filesystems (s3 and gcs for example) then obviously 
> the operation can't be atomic since bytes must be copied from one fs to 
> another.
>  
> What do you think [~steve_l] ?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
