Hi Bryce,

I meant to say that since the C++ interface for constructing filesystems is [1]

    Result<std::shared_ptr<FileSystem>> FileSystemFromUri(const std::string &uri);

it follows that the only argument (the uri) must contain all the information
required. Certainly alternate interfaces such as the uri-and-kv-pairs which
Raphael described would be possible, sorry for the confusion; I definitely did
not mean MUST in an rfc2119 sense.

[1] https://arrow.apache.org/docs/cpp/api/filesystem.html#high-level-factory-functions

On Wed, Apr 9, 2025 at 1:09 PM Bryce Mecum <bryceme...@gmail.com> wrote:

> Hi Ben, would you be able to elaborate on this part:
>
> > Since URIs must be complete specifications of a filesystem, this
> > necessitates inclusion of the secrets required by S3 in the URI. Since
> > anyone with a URI has access to the filesystem to which it refers, these
> > filesystem URIs are transitively secret.
>
> Why must URIs be the complete specification of a filesystem? Why does
> having a URI confer access to the resource, or does that point just
> follow from your previous?
>
> On Wed, Apr 9, 2025 at 7:37 AM Benjamin Kietzman <bengil...@gmail.com>
> wrote:
> >
> > I have been working on modularizing the C++ library by extending FileSystem
> > construction from URIs. I recently merged a PR which prompted some
> > discussion [1] of how the library should handle secrets.
> >
> > Some FileSystems cannot be constructed without one or more secrets. For
> > example, an S3FileSystem might require a proxy's username and password in
> > order to configure the client which the S3FileSystem wraps. Since the
> > usefulness of S3 and other filesystems which may only use default
> > credentials is very limited, I think it's safe to say that any interface
> > for construction of filesystems must accept secrets as parameters.
> >
> > In the C++ library and its bindings, FileSystems can be constructed from a
> > URI. This modular interface means that libarrow can construct an
> > S3FileSystem even without being compiled with/linked to the AWS SDK. Since
> > URIs must be complete specifications of a filesystem, this necessitates
> > inclusion of the secrets required by S3 in the URI. Since anyone with a URI
> > has access to the filesystem to which it refers, these filesystem URIs are
> > transitively secret.
> >
> > This can and should be better documented, but first we should discuss
> > whether URIs-which-are-secrets is an acceptable interface. As a minimal
> > example of an alternative design, we could extend the FileSystemFactory
> > interface, allowing URIs to reference secrets registered by name elsewhere:
> > "s3://{my-s3-key}:{my-s3-secret-key}@.../{my-secret-bucket}". (New secrets
> > may be added like GetSecretRegistry()->AddSecret({.key =
> > "my-s3-secret-key", .secret = "sw0rdf1sh"});)
> >
> > Is explicit out-of-URI secret management necessary, or is it sufficient to
> > document that since filesystem URIs represent access to their referent they
> > must be guarded accordingly?
> >
> > Ben Kietzman
> >
> > [1] https://github.com/apache/arrow/pull/41559#discussion_r1768836077
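
To make the named-secret alternative from the original message a little more concrete, here is a minimal sketch of what such a registry could look like. Everything in it is an assumption for illustration: `Secret`, `SecretRegistry`, `Expand` and this `GetSecretRegistry()` are hypothetical names that do not exist in Arrow, and the sketch only performs the `{name}` substitution; it does not construct a filesystem.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical types for illustration only; none of these exist in Arrow.
struct Secret {
  std::string key;     // name used inside "{...}" in a URI template
  std::string secret;  // the sensitive value substituted for that name
};

class SecretRegistry {
 public:
  // Register a secret under its name, overwriting any previous value.
  void AddSecret(Secret s) {
    assert(!s.key.empty());
    secrets_[s.key] = std::move(s.secret);
  }

  // Replace every "{name}" in uri_template with the secret registered under
  // that name. Returns false and leaves *out untouched if a name is unknown
  // or a '{' has no matching '}'.
  bool Expand(const std::string& uri_template, std::string* out) const {
    std::string result;
    std::size_t pos = 0;
    while (pos < uri_template.size()) {
      const std::size_t open = uri_template.find('{', pos);
      if (open == std::string::npos) {
        // No more placeholders: copy the remainder verbatim.
        result.append(uri_template, pos, std::string::npos);
        break;
      }
      result.append(uri_template, pos, open - pos);
      const std::size_t close = uri_template.find('}', open + 1);
      if (close == std::string::npos) return false;
      const auto it =
          secrets_.find(uri_template.substr(open + 1, close - open - 1));
      if (it == secrets_.end()) return false;
      result += it->second;
      pos = close + 1;
    }
    *out = std::move(result);
    return true;
  }

 private:
  std::map<std::string, std::string> secrets_;
};

// Process-wide registry, mirroring the GetSecretRegistry() call in the
// proposal (again, a hypothetical function).
SecretRegistry* GetSecretRegistry() {
  static SecretRegistry registry;
  return &registry;
}
```

With this sketch, a caller would register `Secret{"my-s3-secret-key", "sw0rdf1sh"}` once and then pass around only the template URI; the expanded string would exist just long enough to be handed to something like `FileSystemFromUri`. A real design would also need to percent-encode substituted values, since secrets may contain characters that are reserved in URIs, and would have to decide where expansion happens so the expanded URI is not logged.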