Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-16 Thread Aldrin
Ah, okay. Then I suppose an approach between 1 and 2 makes some sense to me: add an option to disable creating the marker on object deletion/removal. I don't think this alone is the best solution, but it at least adds a mode where marker creation is more controlled. As an aside, ar

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-16 Thread Antoine Pitrou
Hello Aldrin, It's not either/or; the directory marker is created every time it is necessary, for example when CreateDir() is called. Regards, Antoine. On 15/07/2024 at 19:20, Aldrin wrote: Thanks Antoine! Preserving the property across multiple clients (and presumably across independent ses
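
To make Antoine's point concrete, here is a minimal sketch of what "creating a directory marker" means on a flat key space. It assumes a toy bucket modelled as a Python dict and the common convention of a zero-byte object whose key ends in "/"; `create_dir` and `dir_exists` are hypothetical helpers, not Arrow's S3FileSystem internals.

```python
# Toy, in-memory model of a flat object store: a dict of key -> bytes.
# NOT Arrow's S3FileSystem; it only illustrates a zero-byte "directory
# marker" object whose key ends with "/". Function names are hypothetical.

def create_dir(bucket: dict, path: str) -> None:
    """Create an empty 'directory' by writing a zero-byte marker object."""
    key = path.strip("/") + "/"
    bucket.setdefault(key, b"")


def dir_exists(bucket: dict, path: str) -> bool:
    """A directory exists if its marker exists or any key lives under it."""
    prefix = path.strip("/") + "/"
    return any(k.startswith(prefix) for k in bucket)


bucket = {}
print(dir_exists(bucket, "foo"))  # False: no key has the prefix "foo/"
create_dir(bucket, "foo")
print(dir_exists(bucket, "foo"))  # True: the marker "foo/" now exists
print(sorted(bucket))             # ['foo/']
```

Without the marker, an empty directory has no key at all, so nothing in the bucket records that it exists.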

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-15 Thread Aldrin
And to clarify, by "other clients" I mean "other remote clients on other systems concurrently accessing the same data." I still think that many clients on a single system could use a local filesystem to gate directory-based operations more efficiently (since a local filesystem is optimized for t

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-15 Thread Aldrin
Thanks Antoine! Preserving the property across multiple clients (and presumably across independent sessions of the same client) is the part that I was missing. From the link you shared, I saw an AWS page discussing the use of folders in the S3 console [1]. Their approach is to create the mark

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-15 Thread Antoine Pitrou
No, because these markers also communicate the information to other implementations of S3 abstractions. An example of this is: https://docs.cyberduck.io/protocols/s3/#folders Regards, Antoine. On 13/07/2024 at 07:15, Aldrin wrote: ...then I still expect the directory /foo to exist Rig
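
The following sketch, under the same toy-bucket assumption (a dict shared by every client), illustrates why purely local bookkeeping of directories, as Aldrin asks about, is not seen by other clients while a marker stored in the bucket is. The `Client` class and its `write_markers` switch are hypothetical.

```python
# Hypothetical sketch: two independent clients sharing one flat bucket.
# Client-side bookkeeping (a local set of directories) is invisible to
# other clients, whereas a marker object stored in the bucket is visible.

class Client:
    def __init__(self, bucket: dict, write_markers: bool):
        self.bucket = bucket          # shared remote state
        self.local_dirs = set()       # private, per-client state
        self.write_markers = write_markers

    def create_dir(self, path: str) -> None:
        key = path.strip("/") + "/"
        self.local_dirs.add(key)
        if self.write_markers:
            self.bucket.setdefault(key, b"")

    def dir_exists(self, path: str) -> bool:
        key = path.strip("/") + "/"
        if key in self.local_dirs:
            return True
        return any(k.startswith(key) for k in self.bucket)


shared = {}
a, b = Client(shared, write_markers=False), Client(shared, write_markers=False)
a.create_dir("foo")
print(a.dir_exists("foo"), b.dir_exists("foo"))  # True False

shared = {}
a, b = Client(shared, write_markers=True), Client(shared, write_markers=True)
a.create_dir("foo")
print(a.dir_exists("foo"), b.dir_exists("foo"))  # True True
```

The trade-off is one extra object per empty directory in exchange for state that every client reading the bucket can observe.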

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Aldrin
> ...then I still expect the directory /foo to exist Right, but if that is the sole purpose of empty directory markers, I'm curious: was there an attempt at keeping track of the prefixes/directories locally?

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Hyunseok Seo
I wonder why S3 (object storage) is made to operate with file system semantics. Python users are usually data scientists. They might not be familiar with the differences between object storage and file storage. Furthermore, I think there are a lot of pyarrow users. > Avoiding file by file operations so

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Weston Pace
> I think my question is still relevant: no matter what semantics `S3FileSystem` is trying to provide, I'm still not sure how the placeholder object helps. I assume it's for listing objects, but what else? If I have a local filesystem and I delete a file /foo/bar, then I still expect the directory /
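
A minimal sketch of the behaviour Weston describes, again assuming a dict-backed toy bucket in which directories are inferred from key prefixes. The `create_parent_marker` parameter is hypothetical; it only mirrors the kind of opt-out proposed in https://github.com/apache/arrow/issues/36275 and is not an existing Arrow option.

```python
# Hypothetical sketch of delete semantics on a flat object store.
# Directories are inferred from key prefixes, so deleting the last object
# under "foo/" makes "foo" disappear unless a marker is written for it.
# `create_parent_marker` is NOT an existing Arrow flag.

def dir_exists(bucket: dict, path: str) -> bool:
    prefix = path.strip("/") + "/"
    return any(k.startswith(prefix) for k in bucket)


def delete_file(bucket: dict, path: str, create_parent_marker: bool = True) -> None:
    key = path.strip("/")
    del bucket[key]  # raises KeyError if the file does not exist
    parent = key.rpartition("/")[0]  # "" for a top-level key
    if create_parent_marker and parent and not dir_exists(bucket, parent):
        # Extra write so that the parent "directory" keeps existing.
        bucket[parent + "/"] = b""


bucket = {"foo/bar": b"data"}
delete_file(bucket, "/foo/bar", create_parent_marker=False)
print(dir_exists(bucket, "foo"))  # False: the "directory" vanished

bucket = {"foo/bar": b"data"}
delete_file(bucket, "/foo/bar")
print(dir_exists(bucket, "foo"), sorted(bucket))  # True ['foo/']
```

The marker is only written when the delete would otherwise leave the parent prefix empty, which is the extra write the issue asks to make optional.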

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Aldrin
But I think the issue being addressed [1] is essentially, "`delete_file` shouldn't create additional files/directories in S3." I think discussion about the semantics at large is interesting but may be a digression? Also, I think there are varying degrees of "filesystem semantics" that are even

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Raphael Taylor-Davies
Many people are familiar with object stores these days. You could create a new abstraction `ObjectStore` which is very similar to `FileSystem` except the semantics are object store semantics and not filesystem semantics. FWIW in the Arrow Rust ecosystem we only provide an object store abstractio
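
As a rough illustration of Raphael's suggestion, here is a hypothetical `ObjectStore` interface with object-store semantics: flat keys, prefix listing, and no directory operations, so a delete never creates anything as a side effect. The names are invented for this sketch and are not an Arrow API; the object store abstraction in the Arrow Rust ecosystem has its own, different interface.

```python
from typing import Dict, Iterator, Protocol


class ObjectStore(Protocol):
    """Hypothetical interface with object-store semantics (not an Arrow API)."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...

    def delete(self, key: str) -> None: ...

    def list_keys(self, prefix: str = "") -> Iterator[str]: ...


class InMemoryObjectStore:
    """Flat key space: no directories, no markers, no implicit writes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        # Touches exactly one key; nothing is created as a side effect.
        self._objects.pop(key, None)

    def list_keys(self, prefix: str = "") -> Iterator[str]:
        return iter(sorted(k for k in self._objects if k.startswith(prefix)))


def copy_prefix(store: ObjectStore, src: str, dst: str) -> None:
    """Copy every object under `src` to `dst`; works with any ObjectStore."""
    for key in list(store.list_keys(src)):
        store.put(dst + key[len(src):], store.get(key))


store = InMemoryObjectStore()
store.put("foo/bar", b"data")
store.delete("foo/bar")
print(list(store.list_keys()))  # []
```

Because the interface never promises that a "directory" survives, implementations have no reason to write markers; callers who need directory semantics would layer them on top.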

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Weston Pace
> The markers are necessary to offer file system semantics on top of object stores. You will get a ton of subtle bugs otherwise. Yes, object stores and filesystems are different. If you expect your filesystem to act like a filesystem then these things need to be done in order to avoid these bug

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Aldrin
Hello! This may be naive, but why does the empty directory marker need to exist on the S3 side at all? If a local directory is created (because of filesystem semantics), then I am not sure why a fake object also needs to exist on the object-store side.

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Felipe Oliveira Carvalho
Hi, The markers are necessary to offer file system semantics on top of object stores. You will get a ton of subtle bugs otherwise. If instead of arrow::FileSystem, Arrow offered an arrow::ObjectStore interface that wraps local filesystems and object stores with object-store semantics (i.e. no con

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Hyunseok Seo
Hello. Thank you for your feedback! > In which situation does this make a sizable difference in number of requests? The issue I am addressing does not resolve the problem completely; there is also the problem caused by *EnsureParentExists*, as described in [2]. *The 42,129 requests with t
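
To see why per-file parent checks can add up, here is a toy request-count model. The cost model is an assumption made for illustration (one DELETE per file; with parent checks on, one LIST per file plus one marker PUT when a prefix becomes empty); it is not a measurement of Arrow's actual request pattern and does not reproduce the numbers reported in [2].

```python
from collections import Counter


def delete_all(keys, ensure_parent_exists):
    """Count the requests a toy client would issue to delete `keys` one by one.

    Assumed (hypothetical) cost model: each delete is one DELETE; when
    `ensure_parent_exists` is on, each delete is followed by one LIST to check
    whether the parent prefix is still non-empty, plus one PUT of a marker
    when it became empty.
    """
    remaining = set(keys)
    requests = Counter()
    for key in keys:
        remaining.discard(key)
        requests["DELETE"] += 1
        parent = key.rpartition("/")[0]
        if ensure_parent_exists and parent:
            requests["LIST"] += 1
            if not any(k.startswith(parent + "/") for k in remaining):
                requests["PUT"] += 1
    return requests


keys = [f"data/part-{i}.parquet" for i in range(1000)]
print(sum(delete_all(keys, ensure_parent_exists=False).values()))  # 1000
print(sum(delete_all(keys, ensure_parent_exists=True).values()))   # 2001
```

Under this model, parent checks roughly double the request count for a file-by-file bulk delete.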

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Antoine Pitrou
Hi, On 12/07/2024 at 12:21, Hyunseok Seo wrote: *### Why Maintain Empty Directory Markers?* From what I understand, object stores like S3 do not have a concept of directories. The motivation behind maintaining these markers could be to manage the object store as if it were a traditional fi

[DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Hyunseok Seo
Hello, community! I am currently working on addressing the issue described in [[C++] Add option to not create parent directory with S3 delete_file](https://github.com/apache/arrow/issues/36275). In this process, I have found it necessary to gather feedback on how best to resolve this issue. Below