Thanks Raphael,

Do you have a reference which explains the rationale for that separation?
It's not obvious to me what the priorities are.

I can guess that a URI without secrets might be shared between multiple
users, with each user's individual tokens etc. inserted to grant distinct
access. However, for that case it seems to me that there wouldn't be a
significant difference between

    FileSystem.from_uri(uri, **extra_options_and_secrets)
    FileSystem.from_uri(uri_template.format(**extra_options_and_secrets))
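
To make the comparison concrete, here is a minimal sketch in Python. The
from_uri function below is a hypothetical stand-in that only resolves a
configuration dict; it is not the Arrow C++ or pyarrow API, and the option
names (access_key, secret_key, region) are assumptions chosen for
illustration. Under those assumptions both call styles resolve to the same
configuration, provided the values are percent-encoded before being
formatted into the template.

```python
from urllib.parse import parse_qsl, quote, unquote, urlsplit


def from_uri(uri, **options):
    """Hypothetical stand-in for FileSystem.from_uri.

    Returns the resolved configuration as a dict instead of a filesystem.
    """
    parts = urlsplit(uri)
    config = {"scheme": parts.scheme, "host": parts.hostname, "path": parts.path}
    # Credentials embedded in the URI's userinfo component, if any.
    if parts.username is not None:
        config["access_key"] = unquote(parts.username)
    if parts.password is not None:
        config["secret_key"] = unquote(parts.password)
    # Options carried in the query string (parse_qsl percent-decodes them).
    config.update(parse_qsl(parts.query))
    # Options passed out-of-band take precedence over anything in the URI.
    config.update(options)
    return config


# Illustrative values only.
secrets = {
    "access_key": "AKIAEXAMPLE",
    "secret_key": "sw0rd/f1sh",
    "region": "us-east-1",
}

# Style 1: shareable URI without secrets, secrets passed alongside it.
separate = from_uri("s3://my-bucket/data", **secrets)

# Style 2: secrets formatted into a URI template. Each value is
# percent-encoded first so that characters like "/" cannot corrupt the URI.
uri_template = "s3://{access_key}:{secret_key}@my-bucket/data?region={region}"
encoded = {key: quote(value, safe="") for key, value in secrets.items()}
templated = from_uri(uri_template.format(**encoded))

print(separate == templated)
```

The one practical difference this sketch surfaces is the escaping step: the
templated form has to percent-encode each value, while the keyword form can
pass values through untouched.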


On Wed, Apr 9, 2025 at 9:44 AM Raphael Taylor-Davies
<r.taylordav...@googlemail.com.invalid> wrote:

> I'm not all that familiar with the C++ filesystem abstraction, but for
> ObjectStore, the closest equivalent abstraction in the Rust ecosystem,
> we follow what fsspec [1] and Hadoop [2] do and allow providing a set of
> key-value string pairs along with the URI [3]. This provides a great
> deal of flexibility to end-users as to where and how to source this
> configuration, including potentially fetching secrets from other sources
> or the environment.
>
> [1]:
> https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.filesystem
> [2]:
> https://hadoop.apache.org/docs/r3.0.0/api/org/apache/hadoop/fs/FileSystem.html#get-java.net.URI-org.apache.hadoop.conf.Configuration-
> [3]:
> https://docs.rs/object_store/latest/object_store/fn.parse_url_opts.html
>
> On 09/04/2025 15:35, Benjamin Kietzman wrote:
> > I have been working on modularizing the C++ library by extending FileSystem
> > construction from URIs. I recently merged a PR which prompted some
> > discussion [1] of how the library should handle secrets.
> >
> > Some FileSystems cannot be constructed without one or more secrets. For
> > example, an S3FileSystem might require a proxy's username and password in
> > order to configure the client which the S3FileSystem wraps. Since the
> > usefulness of S3 and other filesystems which may only use default
> > credentials is very limited, I think it's safe to say that any interface
> > for construction of filesystems must accept secrets as parameters.
> >
> > In the C++ library and its bindings, FileSystems can be constructed from a
> > URI. This modular interface means that libarrow can construct an
> > S3FileSystem even without being compiled with/linked to the AWS SDK. Since
> > URIs must be complete specifications of a filesystem, this necessitates
> > inclusion of the secrets required by S3 in the URI. Since anyone with a URI
> > has access to the filesystem to which it refers, these filesystem URIs are
> > transitively secret.
> >
> > This can and should be better documented, but first we should discuss
> > whether URIs-which-are-secrets is an acceptable interface. As a minimal
> > example of an alternative design, we could extend the FileSystemFactory
> > interface, allowing URIs to reference secrets registered by name elsewhere:
> > "s3://{my-s3-key}:{my-s3-secret-key}@.../{my-secret-bucket}". (New secrets
> > may be added like GetSecretRegistry()->AddSecret({.key =
> > "my-s3-secret-key", .secret = "sw0rdf1sh"});)
> >
> > Is explicit out-of-URI secret management necessary, or is it sufficient to
> > document that since filesystem URIs represent access to their referent they
> > must be guarded accordingly?
> >
> > Ben Kietzman
> > [1] https://github.com/apache/arrow/pull/41559#discussion_r1768836077
> >
>
