Hi Ben, would you be able to elaborate on this part:

> Since URIs must be complete specifications of a filesystem, this necessitates 
> inclusion of the secrets required by S3 in the URI. Since anyone with a URI 
> has access to the filesystem to which it refers, these filesystem URIs are 
> transitively secret.

Why must URIs be complete specifications of a filesystem? And why does
having a URI confer access to the resource, or does that point simply
follow from your previous one?

On Wed, Apr 9, 2025 at 7:37 AM Benjamin Kietzman <bengil...@gmail.com> wrote:
>
> I have been working on modularizing the C++ library by extending FileSystem
> construction from URIs. I recently merged a PR which prompted some
> discussion [1] of how the library should handle secrets.
>
> Some FileSystems cannot be constructed without one or more secrets. For
> example, an S3FileSystem might require a proxy's username and password in
> order to configure the client which the S3FileSystem wraps. Since S3 and
> other filesystems that can only use default credentials are of very limited
> use, I think it's safe to say that any interface for construction of
> filesystems must accept secrets as parameters.
>
> In the C++ library and its bindings, FileSystems can be constructed from a
> URI. This modular interface means that libarrow can construct an
> S3FileSystem even without being compiled with/linked to the AWS SDK. Since
> URIs must be complete specifications of a filesystem, this necessitates
> inclusion of the secrets required by S3 in the URI. Since anyone with a URI
> has access to the filesystem to which it refers, these filesystem URIs are
> transitively secret.
>
> This can and should be better documented, but first we should discuss
> whether URIs-which-are-secrets is an acceptable interface. As a minimal
> example of an alternative design, we could extend the FileSystemFactory
> interface, allowing URIs to reference secrets registered by name elsewhere:
> "s3://{my-s3-key}:{my-s3-secret-key}@.../{my-secret-bucket}". (New secrets
> may be added like GetSecretRegistry()->AddSecret({.key =
> "my-s3-secret-key", .secret = "sw0rdf1sh"});)
>
> Is explicit out-of-URI secret management necessary, or is it sufficient to
> document that since filesystem URIs represent access to their referent they
> must be guarded accordingly?
>
> Ben Kietzman
> [1] https://github.com/apache/arrow/pull/41559#discussion_r1768836077
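
To make the "transitively secret" point concrete: the S3 credentials live in the URI's userinfo component, so any code holding the string can recover them without touching the AWS SDK. The sketch below assumes the `s3://ACCESS_KEY:SECRET_KEY@bucket/path` layout. `ExtractCredentials` and `EmbeddedCredentials` are hypothetical names, not Arrow API, and percent-decoding is skipped for brevity (a real secret containing `/` or `@` would have to be percent-encoded).

```cpp
#include <optional>
#include <string>

// Hypothetical type (not Arrow API): credentials recovered from a URI.
struct EmbeddedCredentials {
  std::string access_key;
  std::string secret_key;
};

// Hypothetical helper: pulls "user:password" out of the authority of a URI
// such as "s3://KEY:SECRET@bucket/path". Percent-decoding and IPv6 hosts are
// deliberately ignored to keep the sketch short.
std::optional<EmbeddedCredentials> ExtractCredentials(const std::string& uri) {
  const auto scheme_end = uri.find("://");
  if (scheme_end == std::string::npos) return std::nullopt;
  const auto authority_begin = scheme_end + 3;

  // The authority ends at the first '/', '?' or '#' after the scheme.
  auto authority_end = uri.find_first_of("/?#", authority_begin);
  if (authority_end == std::string::npos) authority_end = uri.size();
  const std::string authority =
      uri.substr(authority_begin, authority_end - authority_begin);

  // Userinfo, if present, precedes the last '@' in the authority.
  const auto at = authority.rfind('@');
  if (at == std::string::npos) return std::nullopt;
  const std::string userinfo = authority.substr(0, at);

  // Split "user:password"; a missing ':' means no password was given.
  const auto colon = userinfo.find(':');
  if (colon == std::string::npos) return EmbeddedCredentials{userinfo, ""};
  return EmbeddedCredentials{userinfo.substr(0, colon),
                             userinfo.substr(colon + 1)};
}
```

For example, `ExtractCredentials("s3://AKIA123:sw0rdf1sh@my-bucket/data")` yields an access key of `AKIA123` and a secret of `sw0rdf1sh`, which is why such a URI has to be guarded exactly like the secret itself.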
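
The registry alternative in the quoted proposal could look roughly like the following minimal sketch. `Secret`, `SecretRegistry`, `GetSecretRegistry` and `Expand` are hypothetical names modeled on the proposal, not existing Arrow API. The assumption is that a FileSystemFactory would call something like `Expand` on the URI template just before constructing the filesystem.

```cpp
#include <map>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical secret entry, mirroring {.key = ..., .secret = ...}.
struct Secret {
  std::string key;
  std::string secret;
};

// Hypothetical process-wide registry; not an existing Arrow API.
class SecretRegistry {
 public:
  void AddSecret(const Secret& s) {
    std::lock_guard<std::mutex> lock(mutex_);
    secrets_[s.key] = s.secret;
  }

  // Replaces every "{name}" in the template with the secret registered under
  // `name`. Throws if a name is unknown or a '{' is never closed.
  std::string Expand(const std::string& uri_template) const {
    std::lock_guard<std::mutex> lock(mutex_);
    std::string out;
    std::string::size_type pos = 0;
    while (pos < uri_template.size()) {
      const auto open = uri_template.find('{', pos);
      if (open == std::string::npos) {
        out.append(uri_template, pos, std::string::npos);  // copy the tail
        break;
      }
      out.append(uri_template, pos, open - pos);  // text before the '{'
      const auto close = uri_template.find('}', open + 1);
      if (close == std::string::npos) {
        throw std::invalid_argument("unclosed '{' in URI template");
      }
      const std::string name = uri_template.substr(open + 1, close - open - 1);
      const auto it = secrets_.find(name);
      if (it == secrets_.end()) {
        throw std::out_of_range("no secret registered as '" + name + "'");
      }
      out += it->second;
      pos = close + 1;
    }
    return out;
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, std::string> secrets_;
};

// Hypothetical accessor returning a single shared registry.
SecretRegistry* GetSecretRegistry() {
  static SecretRegistry registry;
  return &registry;
}
```

Because `{` and `}` may not appear unescaped in a URI (RFC 3986), placeholders cannot collide with ordinary URI text. The template can then be logged or shared freely, while the expanded string is as sensitive as any URI with inline credentials and would need to go straight to the factory.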
