Got it, thanks. This reminds me of some of the confusion we saw in the
R bindings [1] with constructing filesystems from options versus a URI,
where you seem to have to pick one path or the other. I think the
approach ObjectStore uses, where the URI and the extra options get
merged, is more flexible and more likely to be what users expect.
Currently, I think users who want to pass extra options to the from-URI
constructors do so with URL query parameters, which isn't as ergonomic
as other options.

To your original questions, I don't think an explicit out-of-URI
secrets manager is necessary or a good idea, but I wonder if we
could relax our URI-is-the-complete-specification rule:
disallow/ignore secrets in URIs, but allow them as extra parameters.
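
To make that concrete, here is a rough sketch of the merge-and-reject
behavior I have in mind. Everything in it is hypothetical: the names
FileSystemConfig and ConfigFromUri, and the option keys used below, are
made up for illustration and are not part of Arrow's API. It also skips
percent-decoding and other URI details, so treat it as a picture of the
idea rather than a proposed implementation.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Hypothetical type for illustration only; not an Arrow API.
struct FileSystemConfig {
  std::string scheme;
  std::string location;  // authority + path, without query parameters
  std::map<std::string, std::string> options;  // merged options
};

// Parses "scheme://location?k=v&k2=v2" and merges extra_options on top of
// the query parameters (extra_options win on conflict). Returns
// std::nullopt if the URI has no "://" or embeds credentials ("user:pass@").
std::optional<FileSystemConfig> ConfigFromUri(
    const std::string& uri,
    const std::map<std::string, std::string>& extra_options) {
  const auto scheme_end = uri.find("://");
  if (scheme_end == std::string::npos) return std::nullopt;

  FileSystemConfig config;
  config.scheme = uri.substr(0, scheme_end);

  // Split off the query string, if any.
  std::string rest = uri.substr(scheme_end + 3);
  std::string query;
  const auto query_start = rest.find('?');
  if (query_start != std::string::npos) {
    query = rest.substr(query_start + 1);
    rest = rest.substr(0, query_start);
  }

  // Disallow secrets in the URI: reject any userinfo in the authority.
  const std::string authority = rest.substr(0, rest.find('/'));
  if (authority.find('@') != std::string::npos) return std::nullopt;
  config.location = rest;

  // Collect non-secret options from the query parameters.
  std::size_t pos = 0;
  while (pos < query.size()) {
    auto amp = query.find('&', pos);
    if (amp == std::string::npos) amp = query.size();
    const std::string pair = query.substr(pos, amp - pos);
    if (!pair.empty()) {
      const auto eq = pair.find('=');
      if (eq == std::string::npos) {
        config.options[pair] = "";
      } else {
        config.options[pair.substr(0, eq)] = pair.substr(eq + 1);
      }
    }
    pos = amp + 1;
  }

  // Merge the out-of-URI parameters; this is where secrets would be passed.
  for (const auto& [key, value] : extra_options) {
    config.options[key] = value;
  }
  return config;
}
```

With something like this, a caller could write
ConfigFromUri("s3://bucket/path?region=us-east-1", {{"secret_key", "..."}})
and the URI itself would stay safe to log or share, while a URI such as
"s3://key:secret@bucket/path" would be rejected outright. Whether to
reject or silently ignore embedded secrets is a design choice; rejecting
seemed less surprising to me for the sketch.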

[1] https://github.com/apache/arrow/issues/33904


On Thu, Apr 10, 2025 at 10:39 AM Benjamin Kietzman <bengil...@gmail.com> wrote:
>
> Hi Bryce,
>
> I meant to say that since the C++ interface for constructing filesystems is [1]
>
>     Result<std::shared_ptr<FileSystem>> FileSystemFromUri(const std::string &uri);
>
> it follows that the only argument (the uri) must contain all the
> information required.
>
> Certainly alternate interfaces such as the uri-and-kv-pairs which Raphael
> described would be possible, sorry for the confusion; I definitely did not
> mean MUST in an rfc2119 sense.
>
> [1]
> https://arrow.apache.org/docs/cpp/api/filesystem.html#high-level-factory-functions
>
> On Wed, Apr 9, 2025 at 1:09 PM Bryce Mecum <bryceme...@gmail.com> wrote:
>
> > Hi Ben, would you be able to elaborate on this part:
> >
> > > Since URIs must be complete specifications of a filesystem, this
> > > necessitates inclusion of the secrets required by S3 in the URI. Since
> > > anyone with a URI has access to the filesystem to which it refers, these
> > > filesystem URIs are transitively secret.
> >
> > Why must URIs be the complete specification of a filesystem? Why does
> > having a URI confer access to the resource, or does that point just
> > follow from your previous?
> >
> > On Wed, Apr 9, 2025 at 7:37 AM Benjamin Kietzman <bengil...@gmail.com> wrote:
> > >
> > > I have been working on modularizing the C++ library by extending
> > > FileSystem construction from URIs. I recently merged a PR which prompted
> > > some discussion [1] of how the library should handle secrets.
> > >
> > > Some FileSystems cannot be constructed without one or more secrets. For
> > > example, an S3FileSystem might require a proxy's username and password in
> > > order to configure the client which the S3FileSystem wraps. Since the
> > > usefulness of S3 and other filesystems which may only use default
> > > credentials is very limited, I think it's safe to say that any interface
> > > for construction of filesystems must accept secrets as parameters.
> > >
> > > In the C++ library and its bindings, FileSystems can be constructed from a
> > > URI. This modular interface means that libarrow can construct an
> > > S3FileSystem even without being compiled with/linked to the AWS SDK. Since
> > > URIs must be complete specifications of a filesystem, this necessitates
> > > inclusion of the secrets required by S3 in the URI. Since anyone with a URI
> > > has access to the filesystem to which it refers, these filesystem URIs are
> > > transitively secret.
> > >
> > > This can and should be better documented, but first we should discuss
> > > whether URIs-which-are-secrets is an acceptable interface. As a minimal
> > > example of an alternative design, we could extend the FileSystemFactory
> > > interface, allowing URIs to reference secrets registered by name elsewhere:
> > > "s3://{my-s3-key}:{my-s3-secret-key}@.../{my-secret-bucket}". (New secrets
> > > may be added like GetSecretRegistry()->AddSecret({.key =
> > > "my-s3-secret-key", .secret = "sw0rdf1sh"});)
> > >
> > > Is explicit out-of-URI secret management necessary, or is it sufficient to
> > > document that since filesystem URIs represent access to their referent they
> > > must be guarded accordingly?
> > >
> > > Ben Kietzman
> > > [1] https://github.com/apache/arrow/pull/41559#discussion_r1768836077
> >
