Hi all,

I'll prepare a draft PR adding KV pairs to the interface.
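
As a rough, non-authoritative sketch of what consuming such pairs could look like if values are carried as `std::any` (as Antoine suggests below): everything here (`RetryStrategy`, `ExampleS3Config`, `ParseExampleOptions`, and the option names "region" and "retry_strategy") is hypothetical and is not part of Arrow. It only illustrates that each backend has to check the dynamic type of every value itself.

```cpp
#include <any>
#include <memory>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for an option that cannot reasonably be a string.
struct RetryStrategy {
  int max_attempts = 3;
};

// Hypothetical backend-specific configuration built from the option list.
struct ExampleS3Config {
  std::string region;
  std::shared_ptr<RetryStrategy> retry_strategy;
};

using Option = std::pair<std::string_view, std::any>;

// Interprets (name, value) pairs. Returns std::nullopt if an option name is
// unknown or a value does not hold the exact type expected for that name.
std::optional<ExampleS3Config> ParseExampleOptions(
    const std::vector<Option>& options) {
  ExampleS3Config config;
  for (const auto& [name, value] : options) {
    if (name == "region") {
      // The pointer form of any_cast yields nullptr on a type mismatch.
      const auto* region = std::any_cast<std::string>(&value);
      if (region == nullptr) return std::nullopt;
      config.region = *region;
    } else if (name == "retry_strategy") {
      const auto* strategy =
          std::any_cast<std::shared_ptr<RetryStrategy>>(&value);
      if (strategy == nullptr) return std::nullopt;
      config.retry_strategy = *strategy;
    } else {
      return std::nullopt;  // unknown option name
    }
  }
  return config;
}
```

One thing this makes visible: `std::any` matches on the exact stored type, so a caller passing a string literal (stored as `const char*`) for "region" would be rejected here unless the backend also accepts that type.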

Thanks,
Ben

On Wed, Apr 23, 2025 at 8:09 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Hi Ben and all,
>
> Sorry for chiming in late. I do find the URI-and-kv-pairs interface
> attractive.
>
> That said, some filesystem options can't reasonably be expressed as
> strings. For example, `S3Options` has a `std::shared_ptr<const
> KeyValueMetadata> default_metadata` and a
> `std::shared_ptr<S3RetryStrategy>`.
>
> So, perhaps we want to allow for generic option values, and therefore
> have an interface looking like:
> ```
> /// \param[in] uri the URI to give access to
> /// \param[in] options a list of backend-specific filesystem options
> ///            Each option is a (name, value) pair.
> ///            The expected type is specific to the backend and
> ///            option name.
> Result<std::shared_ptr<FileSystem>> FileSystemFromUri(
>      std::string_view uri,
>      const std::vector<std::pair<std::string_view, std::any>>& options);
> ```
>
>
> On 10/04/2025 at 19:38, Benjamin Kietzman wrote:
> > Hi Bryce,
> >
> > I meant to say that since the C++ interface for constructing filesystems
> > is [1]
> >
> >      Result<std::shared_ptr<FileSystem>> FileSystemFromUri(const std::string& uri);
> >
> > it follows that the only argument (the uri) must contain all the
> > information required.
> >
> > Certainly alternate interfaces such as the uri-and-kv-pairs which Raphael
> > described would be possible; sorry for the confusion. I definitely did not
> > mean MUST in an RFC 2119 sense.
> >
> > [1] https://arrow.apache.org/docs/cpp/api/filesystem.html#high-level-factory-functions
> >
> > On Wed, Apr 9, 2025 at 1:09 PM Bryce Mecum <bryceme...@gmail.com> wrote:
> >
> >> Hi Ben, would you be able to elaborate on this part:
> >>
> >>> Since URIs must be complete specifications of a filesystem, this
> >>> necessitates inclusion of the secrets required by S3 in the URI. Since
> >>> anyone with a URI has access to the filesystem to which it refers, these
> >>> filesystem URIs are transitively secret.
> >>
> >> Why must URIs be the complete specification of a filesystem? Why does
> >> having a URI confer access to the resource, or does that point just
> >> follow from your previous?
> >>
> >> On Wed, Apr 9, 2025 at 7:37 AM Benjamin Kietzman <bengil...@gmail.com>
> >> wrote:
> >>>
> >>> I have been working on modularizing the C++ library by extending
> >>> FileSystem construction from URIs. I recently merged a PR which prompted
> >>> some discussion [1] of how the library should handle secrets.
> >>>
> >>> Some FileSystems cannot be constructed without one or more secrets. For
> >>> example, an S3FileSystem might require a proxy's username and password in
> >>> order to configure the client which the S3FileSystem wraps. Since the
> >>> usefulness of S3 and other filesystems which may only use default
> >>> credentials is very limited, I think it's safe to say that any interface
> >>> for construction of filesystems must accept secrets as parameters.
> >>>
> >>> In the C++ library and its bindings, FileSystems can be constructed from
> >>> a URI. This modular interface means that libarrow can construct an
> >>> S3FileSystem even without being compiled with/linked to the AWS SDK.
> >>> Since URIs must be complete specifications of a filesystem, this
> >>> necessitates inclusion of the secrets required by S3 in the URI. Since
> >>> anyone with a URI has access to the filesystem to which it refers, these
> >>> filesystem URIs are transitively secret.
> >>>
> >>> This can and should be better documented, but first we should discuss
> >>> whether URIs-which-are-secrets is an acceptable interface. As a minimal
> >>> example of an alternative design, we could extend the FileSystemFactory
> >>> interface, allowing URIs to reference secrets registered by name
> >>> elsewhere:
> >>> "s3://{my-s3-key}:{my-s3-secret-key}@.../{my-secret-bucket}". (New
> >>> secrets may be added like GetSecretRegistry()->AddSecret({.key =
> >>> "my-s3-secret-key", .secret = "sw0rdf1sh"});)
> >>>
> >>> Is explicit out-of-URI secret management necessary, or is it sufficient
> >>> to document that since filesystem URIs represent access to their
> >>> referent they must be guarded accordingly?
> >>>
> >>> Ben Kietzman
> >>> [1] https://github.com/apache/arrow/pull/41559#discussion_r1768836077
> >>
> >
>
>
