Hi Bryce,

I meant to say that since the C++ interface for constructing filesystems is
[1]

    Result<std::shared_ptr<FileSystem>> FileSystemFromUri(const std::string& uri);

it follows that the only argument (the URI) must contain all the
information required to construct the filesystem.

Alternative interfaces, such as the URI-plus-key-value-pairs design which
Raphael described, would certainly be possible. Sorry for the confusion; I
definitely did not mean MUST in the RFC 2119 sense.
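
To make the named-secret alternative from my original message (quoted below)
a bit more concrete, here is a minimal, self-contained sketch. Everything in
it is an assumption for illustration: SecretRegistry, its AddSecret and
Resolve methods, and the "{name}" placeholder syntax are hypothetical and are
not part of the Arrow API. It only shows the substitution step; a real
implementation would also need to percent-encode secrets and decide where the
registry lives.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Hypothetical registry mapping a secret's name to its value.
// Not an Arrow class; a sketch of the idea only.
class SecretRegistry {
 public:
  void AddSecret(const std::string& name, const std::string& secret) {
    secrets_[name] = secret;
  }

  // Replace each "{name}" in uri_template with the registered secret.
  // Returns std::nullopt if a name is unknown or a '{' is never closed.
  std::optional<std::string> Resolve(const std::string& uri_template) const {
    std::string out;
    std::size_t pos = 0;
    while (pos < uri_template.size()) {
      std::size_t open = uri_template.find('{', pos);
      if (open == std::string::npos) {
        // No more placeholders: copy the remainder verbatim.
        out.append(uri_template, pos, std::string::npos);
        break;
      }
      out.append(uri_template, pos, open - pos);
      std::size_t close = uri_template.find('}', open + 1);
      if (close == std::string::npos) return std::nullopt;
      auto it = secrets_.find(uri_template.substr(open + 1, close - open - 1));
      if (it == secrets_.end()) return std::nullopt;
      out += it->second;
      pos = close + 1;
    }
    return out;
  }

 private:
  std::map<std::string, std::string> secrets_;
};
```

With something like this, the URI template that gets passed around or logged
contains only secret names; the resolved URI, which is itself a secret, would
exist only at the point where the filesystem is constructed.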

[1]
https://arrow.apache.org/docs/cpp/api/filesystem.html#high-level-factory-functions

On Wed, Apr 9, 2025 at 1:09 PM Bryce Mecum <bryceme...@gmail.com> wrote:

> Hi Ben, would you be able to elaborate on this part:
>
> > Since URIs must be complete specifications of a filesystem, this
> > necessitates inclusion of the secrets required by S3 in the URI. Since
> > anyone with a URI has access to the filesystem to which it refers, these
> > filesystem URIs are transitively secret.
>
> Why must URIs be the complete specification of a filesystem? Why does
> having a URI confer access to the resource, or does that point just
> follow from your previous?
>
> On Wed, Apr 9, 2025 at 7:37 AM Benjamin Kietzman <bengil...@gmail.com>
> wrote:
> >
> > I have been working on modularizing the C++ library by extending
> > FileSystem construction from URIs. I recently merged a PR which prompted
> > some discussion [1] of how the library should handle secrets.
> >
> > Some FileSystems cannot be constructed without one or more secrets. For
> > example, an S3FileSystem might require a proxy's username and password in
> > order to configure the client which the S3FileSystem wraps. Since the
> > usefulness of S3 and other filesystems which may only use default
> > credentials is very limited, I think it's safe to say that any interface
> > for construction of filesystems must accept secrets as parameters.
> >
> > In the C++ library and its bindings, FileSystems can be constructed from a
> > URI. This modular interface means that libarrow can construct an
> > S3FileSystem even without being compiled with/linked to the AWS SDK. Since
> > URIs must be complete specifications of a filesystem, this necessitates
> > inclusion of the secrets required by S3 in the URI. Since anyone with a URI
> > has access to the filesystem to which it refers, these filesystem URIs are
> > transitively secret.
> >
> > This can and should be better documented, but first we should discuss
> > whether URIs-which-are-secrets is an acceptable interface. As a minimal
> > example of an alternative design, we could extend the FileSystemFactory
> > interface, allowing URIs to reference secrets registered by name elsewhere:
> > "s3://{my-s3-key}:{my-s3-secret-key}@.../{my-secret-bucket}". (New secrets
> > may be added like GetSecretRegistry()->AddSecret({.key =
> > "my-s3-secret-key", .secret = "sw0rdf1sh"});)
> >
> > Is explicit out-of-URI secret management necessary, or is it sufficient to
> > document that since filesystem URIs represent access to their referent they
> > must be guarded accordingly?
> >
> > Ben Kietzman
> > [1] https://github.com/apache/arrow/pull/41559#discussion_r1768836077
>
