I have been working on modularizing the C++ library by extending FileSystem construction from URIs. I recently merged a PR which prompted some discussion [1] of how the library should handle secrets.

Some FileSystems cannot be constructed without one or more secrets. For example, an S3FileSystem might require a proxy's username and password in order to configure the client which the S3FileSystem wraps. Since the usefulness of S3 and other filesystems which may only use default credentials is very limited, I think it's safe to say that any interface for construction of filesystems must accept secrets as parameters.

In the C++ library and its bindings, FileSystems can be constructed from a URI. This modular interface means that libarrow can construct an S3FileSystem even without being compiled with/linked to the AWS SDK. Since URIs must be complete specifications of a filesystem, this necessitates inclusion of the secrets required by S3 in the URI. Since anyone with a URI has access to the filesystem to which it refers, these filesystem URIs are transitively secret.

This can and should be better documented, but first we should discuss whether URIs-which-are-secrets is an acceptable interface. As a minimal example of an alternative design, we could extend the FileSystemFactory interface, allowing URIs to reference secrets registered by name elsewhere:

    "s3://{my-s3-key}:{my-s3-secret-key}@.../{my-secret-bucket}"

New secrets may be added like:

    GetSecretRegistry()->AddSecret({.key = "my-s3-secret-key", .secret = "sw0rdf1sh"});

Is explicit out-of-URI secret management necessary, or is it sufficient to document that since filesystem URIs represent access to their referent they must be guarded accordingly?

Ben Kietzman

[1] https://github.com/apache/arrow/pull/41559#discussion_r1768836077
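
To make the alternative design above more concrete, here is a minimal, self-contained sketch of a name-based secret registry. Everything in it is hypothetical and not part of Arrow's API: the Secret struct, the SecretRegistry class, its Resolve method, and this particular GetSecretRegistry are assumptions made for illustration. It only shows the idea of substituting {name} placeholders just before a URI would be handed to a filesystem factory, so that the URI template itself is no longer a secret. It does not percent-encode the substituted values, which a real implementation would probably need to do.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical: a named secret, mirroring the {.key, .secret} shape above.
struct Secret {
  std::string key;
  std::string secret;
};

// Hypothetical registry mapping secret names to secret values.
class SecretRegistry {
 public:
  void AddSecret(const Secret& s) {
    assert(!s.key.empty());  // a secret must be registered under a name
    secrets_[s.key] = s.secret;
  }

  // Replace every "{name}" in uri_template with the secret registered
  // under that name. Throws if a name is unknown or a '{' is unterminated.
  // NOTE: values are inserted verbatim, without percent-encoding.
  std::string Resolve(const std::string& uri_template) const {
    std::string out;
    std::size_t pos = 0;
    while (pos < uri_template.size()) {
      const std::size_t open = uri_template.find('{', pos);
      if (open == std::string::npos) {
        // No more placeholders: copy the rest of the template.
        out.append(uri_template, pos, std::string::npos);
        break;
      }
      const std::size_t close = uri_template.find('}', open + 1);
      if (close == std::string::npos) {
        throw std::invalid_argument("unterminated '{' in URI template");
      }
      // Copy the literal text before the placeholder.
      out.append(uri_template, pos, open - pos);
      const std::string name =
          uri_template.substr(open + 1, close - open - 1);
      const auto it = secrets_.find(name);
      if (it == secrets_.end()) {
        throw std::out_of_range("no secret registered under name: " + name);
      }
      out += it->second;
      pos = close + 1;
    }
    return out;
  }

 private:
  std::map<std::string, std::string> secrets_;
};

// Hypothetical process-wide accessor, as in the example above.
SecretRegistry* GetSecretRegistry() {
  static SecretRegistry registry;
  return &registry;
}
```

With this sketch, a caller would register secrets once, for example GetSecretRegistry()->AddSecret({"my-s3-secret-key", "sw0rdf1sh"}), and then pass around only the template URI. The resolved URI would exist only transiently at the point of filesystem construction. Whether the extra moving part is worth it compared with simply documenting that URIs are secrets is exactly the open question.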