Hi all, I'll prepare a draft PR adding KV pairs to the interface.
Thanks,
Ben

On Wed, Apr 23, 2025 at 8:09 AM Antoine Pitrou <anto...@python.org> wrote:

> Hi Ben and all,
>
> Sorry for chiming in late. I do find the URI-and-kv-pairs interface
> attractive.
>
> That said, some filesystem options can't reasonably be expressed as
> strings. For example, `S3Options` has a
> `std::shared_ptr<const KeyValueMetadata> default_metadata` and a
> `std::shared_ptr<S3RetryStrategy>`.
>
> So, perhaps we want to allow for generic option values, and therefore
> have an interface looking like:
> ```
> /// \param[in] uri the URI to give access to
> /// \param[in] options a list of backend-specific filesystem options
> ///     Each option is a (name, value) pair.
> ///     The expected type is specific to the backend and
> ///     option name.
> Result<std::shared_ptr<FileSystem>> FileSystemFromUri(
>     std::string_view uri,
>     const std::vector<std::pair<std::string_view, std::any>>& options);
> ```
>
>
> On 10/04/2025 at 19:38, Benjamin Kietzman wrote:
> > Hi Bryce,
> >
> > I meant to say that since the C++ interface for constructing
> > filesystems is [1]
> >
> > Result<std::shared_ptr<FileSystem>> FileSystemFromUri(const std::string &uri);
> >
> > it follows that the only argument (the uri) must contain all the
> > information required.
> >
> > Certainly alternate interfaces such as the uri-and-kv-pairs which
> > Raphael described would be possible, sorry for the confusion; I
> > definitely did not mean MUST in an rfc2119 sense.
> >
> > [1]
> > https://arrow.apache.org/docs/cpp/api/filesystem.html#high-level-factory-functions
> >
> > On Wed, Apr 9, 2025 at 1:09 PM Bryce Mecum <bryceme...@gmail.com> wrote:
> >
> >> Hi Ben, would you be able to elaborate on this part:
> >>
> >>> Since URIs must be complete specifications of a filesystem, this
> >>> necessitates inclusion of the secrets required by S3 in the URI.
> >>> Since anyone with a URI has access to the filesystem to which it
> >>> refers, these filesystem URIs are transitively secret.
> >>
> >> Why must URIs be the complete specification of a filesystem? Why does
> >> having a URI confer access to the resource, or does that point just
> >> follow from your previous?
> >>
> >> On Wed, Apr 9, 2025 at 7:37 AM Benjamin Kietzman <bengil...@gmail.com>
> >> wrote:
> >>>
> >>> I have been working on modularizing the C++ library by extending
> >>> FileSystem construction from URIs. I recently merged a PR which
> >>> prompted some discussion [1] of how the library should handle secrets.
> >>>
> >>> Some FileSystems cannot be constructed without one or more secrets.
> >>> For example, an S3FileSystem might require a proxy's username and
> >>> password in order to configure the client which the S3FileSystem
> >>> wraps. Since the usefulness of S3 and other filesystems which may only
> >>> use default credentials is very limited, I think it's safe to say that
> >>> any interface for construction of filesystems must accept secrets as
> >>> parameters.
> >>>
> >>> In the C++ library and its bindings, FileSystems can be constructed
> >>> from a URI. This modular interface means that libarrow can construct
> >>> an S3FileSystem even without being compiled with/linked to the AWS
> >>> SDK. Since URIs must be complete specifications of a filesystem, this
> >>> necessitates inclusion of the secrets required by S3 in the URI. Since
> >>> anyone with a URI has access to the filesystem to which it refers,
> >>> these filesystem URIs are transitively secret.
> >>>
> >>> This can and should be better documented, but first we should discuss
> >>> whether URIs-which-are-secrets is an acceptable interface. As a
> >>> minimal example of an alternative design, we could extend the
> >>> FileSystemFactory interface, allowing URIs to reference secrets
> >>> registered by name elsewhere:
> >>> "s3://{my-s3-key}:{my-s3-secret-key}@.../{my-secret-bucket}". (New
> >>> secrets may be added like GetSecretRegistry()->AddSecret({.key =
> >>> "my-s3-secret-key", .secret = "sw0rdf1sh"});)
> >>>
> >>> Is explicit out-of-URI secret management necessary, or is it
> >>> sufficient to document that since filesystem URIs represent access to
> >>> their referent they must be guarded accordingly?
> >>>
> >>> Ben Kietzman
> >>> [1] https://github.com/apache/arrow/pull/41559#discussion_r1768836077
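
For illustration, here is a minimal sketch of how a backend might consume the `(name, value)` pairs with `std::any` values proposed in Antoine's message. Everything in it is an assumption made up for the example: `DemoS3Options`, `DemoRetryStrategy`, `ParseDemoOptions`, and the option names `"region"` and `"retry_strategy"` are hypothetical stand-ins and not part of Arrow. It also throws exceptions to stay self-contained, where Arrow code would more likely return a `Status` or `Result`.

```cpp
#include <any>
#include <memory>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a non-string option value; NOT Arrow's S3RetryStrategy.
struct DemoRetryStrategy {
  int max_attempts = 3;
};

// Hypothetical stand-in for backend options; NOT Arrow's S3Options.
struct DemoS3Options {
  std::string region;
  std::shared_ptr<DemoRetryStrategy> retry_strategy;
};

// Same shape as the `options` parameter in the proposed signature.
using OptionList = std::vector<std::pair<std::string_view, std::any>>;

// Applies each (name, value) pair to a DemoS3Options.
// Throws std::invalid_argument for an unknown name or a value whose
// stored type does not match what the option name expects.
DemoS3Options ParseDemoOptions(const OptionList& options) {
  DemoS3Options out;
  for (const auto& [name, value] : options) {
    if (name == "region") {
      // The pointer form of any_cast returns nullptr on a type mismatch.
      const auto* region = std::any_cast<std::string>(&value);
      if (region == nullptr) {
        throw std::invalid_argument("region: expected std::string");
      }
      out.region = *region;
    } else if (name == "retry_strategy") {
      const auto* strategy =
          std::any_cast<std::shared_ptr<DemoRetryStrategy>>(&value);
      if (strategy == nullptr) {
        throw std::invalid_argument(
            "retry_strategy: expected std::shared_ptr<DemoRetryStrategy>");
      }
      out.retry_strategy = *strategy;
    } else {
      throw std::invalid_argument("unknown option: " + std::string(name));
    }
  }
  return out;
}
```

One consequence of this design is that type checking moves to run time and must match the stored type exactly: a caller who passes the string literal `"us-east-1"` stores a `const char*` in the `std::any`, which the `std::string` cast above rejects. Callers would have to wrap it as `std::string("us-east-1")`, or the backend would have to accept several spellings of the same option type.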