Got it, thanks. This reminds me of some of the confusion we saw in the R bindings [1] around constructing filesystems from options versus from a URI, where you seem to have to pick one path or the other. I think the approach ObjectStore uses, where the two sources of information get merged, is more flexible and more likely to be what users expect. Currently, I think users who want to pass extra options to the from-URI constructors do so with URL query parameters, which isn't as ergonomic as the alternatives.
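To make the merging idea concrete, here is a minimal C++ sketch of a from-URI entry point that also accepts extra key/value options. Everything in it is hypothetical: `FileSystemSpec`, `ParseQuery`, `IsSecretKey`, `MergeUriAndOptions`, and the `access_key`/`secret_key` key names are illustrative assumptions, not Arrow or ObjectStore API. It merges URI query parameters with explicit options (explicit options win on conflict) and refuses secrets that appear in the URI itself, either as userinfo or as a secret query key.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical types and names for illustration; not Arrow's actual API.
using Options = std::map<std::string, std::string>;

struct FileSystemSpec {
  std::string base_uri;  // URI with the query string stripped
  Options options;       // merged view of URI query parameters and extras
};

// Keys treated as secrets in this sketch (an assumed list).
bool IsSecretKey(const std::string& key) {
  static const std::set<std::string> keys = {"access_key", "secret_key"};
  return keys.count(key) > 0;
}

// Split "k1=v1&k2=v2" into pairs. No percent-decoding, for brevity.
Options ParseQuery(const std::string& query) {
  Options out;
  std::size_t pos = 0;
  while (pos < query.size()) {
    std::size_t amp = query.find('&', pos);
    if (amp == std::string::npos) amp = query.size();
    const std::string pair = query.substr(pos, amp - pos);
    const std::size_t eq = pair.find('=');
    if (!pair.empty()) {
      if (eq == std::string::npos) {
        out[pair] = "";
      } else {
        out[pair.substr(0, eq)] = pair.substr(eq + 1);
      }
    }
    pos = amp + 1;
  }
  return out;
}

FileSystemSpec MergeUriAndOptions(const std::string& uri, const Options& extra) {
  FileSystemSpec spec;
  const std::size_t q = uri.find('?');
  spec.base_uri = uri.substr(0, q);  // npos keeps the whole string
  if (q != std::string::npos) spec.options = ParseQuery(uri.substr(q + 1));

  // Reject credentials embedded as userinfo ("user:pass@host").
  const std::size_t scheme_end = spec.base_uri.find("://");
  if (scheme_end != std::string::npos) {
    const std::size_t start = scheme_end + 3;
    const std::size_t end = spec.base_uri.find('/', start);
    const std::string authority = spec.base_uri.substr(
        start, end == std::string::npos ? std::string::npos : end - start);
    if (authority.find('@') != std::string::npos) {
      throw std::invalid_argument("credentials in URI userinfo are not allowed");
    }
  }

  // Reject secrets passed as query parameters.
  for (const auto& kv : spec.options) {
    if (IsSecretKey(kv.first)) {
      throw std::invalid_argument("secret '" + kv.first + "' must not appear in the URI");
    }
  }

  // Explicit options override URI query parameters on conflict.
  for (const auto& kv : extra) spec.options[kv.first] = kv.second;
  return spec;
}
```

With this shape, a call like `MergeUriAndOptions("s3://bucket/path?region=us-east-1", {{"secret_key", "..."}})` keeps the URI shareable while the secret travels separately. Whether explicit options should override or conflict with URI parameters is a design choice; override is just the assumption made here.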
To your original questions, I don't think an explicit out-of-URI secrets manager is necessary or a good idea but I wonder if we couldn't relax our URI-is-the-complete-specification rule, disallow/ignore secrets in URIs, but allow them as extra parameters.

[1] https://github.com/apache/arrow/issues/33904

On Thu, Apr 10, 2025 at 10:39 AM Benjamin Kietzman <bengil...@gmail.com> wrote:
>
> Hi Bryce,
>
> I meant to say that since the C++ interface for constructing filesystems is
> [1]
>
>     Result<std::shared_ptr<FileSystem>> FileSystemFromUri(const std::string &uri);
>
> it follows that the only argument (the uri) must contain all the
> information required.
>
> Certainly alternate interfaces such as the uri-and-kv-pairs which Raphael
> described would be possible, sorry for the confusion; I definitely did not
> mean MUST in an rfc2119 sense.
>
> [1] https://arrow.apache.org/docs/cpp/api/filesystem.html#high-level-factory-functions
>
> On Wed, Apr 9, 2025 at 1:09 PM Bryce Mecum <bryceme...@gmail.com> wrote:
>
> > Hi Ben, would you be able to elaborate on this part:
> >
> > > Since URIs must be complete specifications of a filesystem, this
> > > necessitates inclusion of the secrets required by S3 in the URI. Since
> > > anyone with a URI has access to the filesystem to which it refers, these
> > > filesystem URIs are transitively secret.
> >
> > Why must URIs be the complete specification of a filesystem? Why does
> > having a URI confer access to the resource, or does that point just
> > follow from your previous?
> >
> > On Wed, Apr 9, 2025 at 7:37 AM Benjamin Kietzman <bengil...@gmail.com>
> > wrote:
> > >
> > > I have been working on modularizing the C++ library by extending
> > > FileSystem construction from URIs. I recently merged a PR which prompted
> > > some discussion [1] of how the library should handle secrets.
> > >
> > > Some FileSystems cannot be constructed without one or more secrets. For
> > > example, an S3FileSystem might require a proxy's username and password in
> > > order to configure the client which the S3FileSystem wraps. Since the
> > > usefulness of S3 and other filesystems which may only use default
> > > credentials is very limited, I think it's safe to say that any interface
> > > for construction of filesystems must accept secrets as parameters.
> > >
> > > In the C++ library and its bindings, FileSystems can be constructed from
> > > a URI. This modular interface means that libarrow can construct an
> > > S3FileSystem even without being compiled with/linked to the AWS SDK.
> > > Since URIs must be complete specifications of a filesystem, this
> > > necessitates inclusion of the secrets required by S3 in the URI. Since
> > > anyone with a URI has access to the filesystem to which it refers, these
> > > filesystem URIs are transitively secret.
> > >
> > > This can and should be better documented, but first we should discuss
> > > whether URIs-which-are-secrets is an acceptable interface. As a minimal
> > > example of an alternative design, we could extend the FileSystemFactory
> > > interface, allowing URIs to reference secrets registered by name
> > > elsewhere:
> > > "s3://{my-s3-key}:{my-s3-secret-key}@.../{my-secret-bucket}". (New
> > > secrets may be added like GetSecretRegistry()->AddSecret({.key =
> > > "my-s3-secret-key", .secret = "sw0rdf1sh"});)
> > >
> > > Is explicit out-of-URI secret management necessary, or is it sufficient
> > > to document that since filesystem URIs represent access to their referent
> > > they must be guarded accordingly?
> > >
> > > Ben Kietzman
> > > [1] https://github.com/apache/arrow/pull/41559#discussion_r1768836077