Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Aldrin
> ...then I still expect the directory /foo to exist

Right, but if that is the sole purpose of empty directory markers, I'm curious whether there was an attempt at keeping track of the prefixes/directories locally?
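Aldrin's idea of tracking prefixes locally can be made concrete with a toy model. This is a hypothetical sketch, not Arrow code: `ToyObjectStore` stands in for a shared bucket and `PrefixTrackingClient` remembers directories in process memory instead of writing marker objects. The sketch shows one visible trade-off: the remembered directories are visible only to the client that recorded them.

```python
class ToyObjectStore:
    """Hypothetical shared flat key -> bytes map standing in for a bucket."""

    def __init__(self):
        self.objects = {}


class PrefixTrackingClient:
    """Hypothetical client that remembers directories in local memory
    instead of writing empty marker objects to the store."""

    def __init__(self, store):
        self.store = store
        self.known_dirs = set()  # lives only in this process

    def _remember_parents(self, key):
        # For "a/b/c.parquet", remember "a" and "a/b".
        parts = key.split("/")[:-1]
        for i in range(1, len(parts) + 1):
            self.known_dirs.add("/".join(parts[:i]))

    def create_dir(self, path):
        self.known_dirs.add(path.rstrip("/"))  # no request is sent

    def put(self, key, data):
        self.store.objects[key] = data
        self._remember_parents(key)

    def delete(self, key):
        del self.store.objects[key]  # no marker is written for the parent

    def dir_exists(self, path):
        path = path.rstrip("/")
        if path in self.known_dirs:
            return True
        # Fall back to an implicit prefix check against the shared store.
        return any(k.startswith(path + "/") for k in self.store.objects)


bucket = ToyObjectStore()
writer = PrefixTrackingClient(bucket)
reader = PrefixTrackingClient(bucket)

writer.put("foo/bar", b"data")
writer.delete("foo/bar")
print(writer.dir_exists("foo"))  # True: remembered locally
print(reader.dir_exists("foo"))  # False: nothing under "foo/" in the store
```

In this model, local tracking saves the marker requests, but a second process (or the same process after a restart) sees no trace of `foo`.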

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Hyunseok Seo
I wonder why S3 (object storage) operates based on file system semantics. Python users are usually data scientists. They might not be familiar with the differences between object storage and file storage. Furthermore, I think there are a lot of pyarrow users.

> Avoiding file by file operations so

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Weston Pace
> I think my question is still relevant: no matter what semantics `S3FileSystem` is trying to provide, I'm still not sure how the placeholder object helps. I assume it's for listing objects, but what else?

If I have a local filesystem and I delete a file /foo/bar then I still expect the directory /
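Weston's /foo/bar example can be illustrated with a toy in-memory model. This is a hypothetical sketch, not how Arrow's `S3FileSystem` is implemented, and S3 listing is simplified to a prefix scan. Directories exist only as key prefixes, so deleting the last key under `foo/` makes `foo` disappear unless a zero-byte marker object whose key is `foo/` is present (a common convention for directory markers on S3).

```python
class ToyObjectStore:
    """Hypothetical flat key -> bytes map; "directories" are only key prefixes."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data=b""):
        self.objects[key] = data

    def delete(self, key):
        del self.objects[key]

    def dir_exists(self, path):
        # A directory "exists" if any key starts with "path/",
        # including an empty marker object whose key is exactly "path/".
        prefix = path.rstrip("/") + "/"
        return any(k.startswith(prefix) for k in self.objects)


store = ToyObjectStore()
store.put("foo/bar", b"data")
store.delete("foo/bar")
print(store.dir_exists("foo"))  # False: nothing under "foo/" remains

store.put("foo/bar", b"data")
store.delete("foo/bar")
store.put("foo/")               # empty directory marker
print(store.dir_exists("foo"))  # True: the marker keeps "foo" visible
```

On a local filesystem the second result would hold without any extra object; in the toy object store, only the marker provides it.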

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Aldrin
But I think the issue being addressed [1] is essentially, "`delete_file` shouldn't create additional files/directories in S3." I think discussion about the semantics at large is interesting but may be a digression? Also, I think there are varying degrees of "filesystem semantics" that are even

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Raphael Taylor-Davies
Many people are familiar with object stores these days. You could create a new abstraction `ObjectStore` which is very similar to `FileSystem` except the semantics are object store semantics and not filesystem semantics. FWIW in the Arrow Rust ecosystem we only provide an object store abstractio

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Weston Pace
> The markers are necessary to offer file system semantics on top of object stores. You will get a ton of subtle bugs otherwise.

Yes, object stores and filesystems are different. If you expect your filesystem to act like a filesystem then these things need to be done in order to avoid these bug

Re: [VOTE] Release Apache Arrow 17.0.0 - RC2

2024-07-12 Thread Sam Albers
Hi all, We have also generated a release report between 16.1.0 and 17.0.0 - RC2 which is available here [1]. Note: Some folks looking at Conbench benchmark results may notice that we are now benchmarking exclusively on cloud machines. These cloud machines provide comparable environments in which

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Aldrin
Hello! This may be naive, but why does the empty directory marker need to exist on the S3 side at all? If a local directory is created (because of filesystem semantics), then I am not sure why a fake object needs to exist on the object-store side.

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Felipe Oliveira Carvalho
Hi, The markers are necessary to offer file system semantics on top of object stores. You will get a ton of subtle bugs otherwise. If instead of arrow::FileSystem, Arrow offered an arrow::ObjectStore interface that wraps local filesystems and object stores with object-store semantics (i.e. no con

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Hyunseok Seo
Hello. Thank you for your feedback!

> In which situation does this make a sizable difference in number of requests?

The issue I am addressing does not completely resolve the problem, but there is also the problem caused by *EnsureParentExists* as described in [2]. *The 42,129 requests with t

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Antoine Pitrou
Hi,

On 12/07/2024 at 12:21, Hyunseok Seo wrote:
> *Why Maintain Empty Directory Markers?* From what I understand, object stores like S3 do not have a concept of directories. The motivation behind maintaining these markers could be to manage the object store as if it were a traditional fi

[DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Hyunseok Seo
Hello, community! I am currently working on addressing the issue described in [[C++] Add option to not create parent directory with S3 delete_file](https://github.com/apache/arrow/issues/36275). In this process, I have found it necessary to gather feedback on how best to resolve this issue. Below
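The behavior named in the issue title, where `delete_file` may create a parent directory marker, can be sketched with a request-counting toy store. Everything here is hypothetical: `CountingStore` is not an Arrow class, `recreate_parent_marker` is a made-up parameter name (the option proposed in the issue may be named and implemented differently), and the request sequence only roughly models a parent-existence check, not Arrow's exact calls.

```python
class CountingStore:
    """Hypothetical flat key -> bytes map that counts simulated requests."""

    def __init__(self):
        self.objects = {}
        self.requests = 0

    def put(self, key, data=b""):
        self.requests += 1  # models a PUT request
        self.objects[key] = data

    def delete(self, key):
        self.requests += 1  # models a DELETE request
        del self.objects[key]

    def has_children(self, prefix):
        self.requests += 1  # models a LIST request
        return any(k.startswith(prefix) for k in self.objects)


def delete_file(store, path, recreate_parent_marker=True):
    """Hypothetical delete: optionally keep the parent "directory" alive."""
    store.delete(path)
    parent = path.rsplit("/", 1)[0] if "/" in path else ""
    if recreate_parent_marker and parent:
        marker = parent + "/"
        if not store.has_children(marker):
            store.put(marker)  # empty directory marker


with_marker = CountingStore()
with_marker.objects["foo/bar"] = b"data"
delete_file(with_marker, "foo/bar")
print(with_marker.requests)  # 3: DELETE, LIST, PUT marker

without_marker = CountingStore()
without_marker.objects["foo/bar"] = b"data"
delete_file(without_marker, "foo/bar", recreate_parent_marker=False)
print(without_marker.requests)  # 1: DELETE only
```

In this model, the opt-out removes the extra round trips per deleted file, at the cost of `foo` no longer appearing as a directory once its last file is gone.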

Re: [VOTE] Release Apache Arrow 17.0.0 - RC2

2024-07-12 Thread Raúl Cumplido
Hi,

An issue was identified while verifying the wheels on Linux with conda, due to a test failure with ORC. It can be solved by installing tzdata into the conda environment, applying this minor change to the verification script [1].

Thanks, Raul

[1] https://github.com/apache/arrow/pull/

[VOTE] Release Apache Arrow 17.0.0 - RC2

2024-07-12 Thread Raúl Cumplido
Hi, I would like to propose the following release candidate (RC2) of Apache Arrow version 17.0.0. This is a release consisting of 321 resolved GitHub issues [1]. This release candidate is based on commit: 6a2e19a852b367c72d7b12da4d104456491ed8b7 [2] The source release rc2 is hosted at [3]. The bi