yuqi1129 commented on code in PR #5209:
URL: https://github.com/apache/gravitino/pull/5209#discussion_r1810184103
##########
clients/client-python/gravitino/filesystem/gvfs.py:
##########
@@ -819,5 +840,40 @@ def _get_gcs_filesystem(self):
         return importlib.import_module("pyarrow.fs").GcsFileSystem()
+
+    def _get_s3_filesystem(self):
+        # Get all keys from the options that start with 'gravitino.bypass.s3.' and remove the prefix
+        s3_options = {
+            key[len(GVFSConfig.GVFS_FILESYSTEM_BY_PASS_S3) :]: value
+            for key, value in self._options.items()
+            if key.startswith(GVFSConfig.GVFS_FILESYSTEM_BY_PASS_S3)
+        }
+
+        # Get 'aws_access_key_id' from s3_options; if the key is not found, raise an exception
+        aws_access_key_id = s3_options.get(GVFSConfig.GVFS_FILESYSTEM_S3_ACCESS_KEY)
+        if aws_access_key_id is None:
+            raise GravitinoRuntimeException(
+                "AWS access key id is not found in the options."
+            )
+
+        # Get 'aws_secret_access_key' from s3_options; if the key is not found, raise an exception
+        aws_secret_access_key = s3_options.get(GVFSConfig.GVFS_FILESYSTEM_S3_SECRET_KEY)
+        if aws_secret_access_key is None:
+            raise GravitinoRuntimeException(
+                "AWS secret access key is not found in the options."
+            )
+
+        # Get 'aws_endpoint_url' from s3_options; if the key is not found, raise an exception
+        aws_endpoint_url = s3_options.get(GVFSConfig.GVFS_FILESYSTEM_S3_ENDPOINT)
+        if aws_endpoint_url is None:
+            raise GravitinoRuntimeException(
+                "AWS endpoint url is not found in the options."
+            )
+
+        return importlib.import_module("pyarrow.fs").S3FileSystem(

Review Comment:
   PyArrow's implementation provides a uniform API to users; for example, combined with `ArrowFSWrapper`, we can support all kinds of storage through the API exposed by `ArrowFSWrapper`. I have looked at the implementation provided by `fsspec`, and it seems there is no big difference compared to the one provided by `pyarrow`.
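The option handling in the hunk above can be sketched stand-alone. This is a minimal illustration, not the PR's code: the helper names, `ConfigError` (a stand-in for `GravitinoRuntimeException`), and the option key names (`access-key`, `secret-key`, `endpoint`) are hypothetical, since the actual `GVFSConfig` constant values are not shown in the diff.

```python
# Hypothetical sketch of the prefix-stripping and required-option pattern.
BYPASS_S3_PREFIX = "gravitino.bypass.s3."  # mirrors GVFS_FILESYSTEM_BY_PASS_S3


class ConfigError(Exception):
    """Stand-in for GravitinoRuntimeException."""


def extract_s3_options(options):
    # Keep only keys starting with the bypass prefix, with the prefix stripped.
    return {
        key[len(BYPASS_S3_PREFIX):]: value
        for key, value in options.items()
        if key.startswith(BYPASS_S3_PREFIX)
    }


def require(s3_options, name):
    # Fail fast when a mandatory credential/endpoint option is missing.
    value = s3_options.get(name)
    if value is None:
        raise ConfigError(f"{name} is not found in the options.")
    return value


opts = {
    "gravitino.bypass.s3.access-key": "AKIAEXAMPLE",
    "gravitino.bypass.s3.secret-key": "s3cr3t",
    "gravitino.bypass.gcs.service-account-key-path": "/tmp/key.json",
}
s3_opts = extract_s3_options(opts)
print(s3_opts)  # {'access-key': 'AKIAEXAMPLE', 'secret-key': 's3cr3t'}
print(require(s3_opts, "access-key"))  # AKIAEXAMPLE
try:
    require(s3_opts, "endpoint")
except ConfigError as e:
    print(e)  # endpoint is not found in the options.
```

Note how the `gravitino.bypass.gcs.` key is filtered out, so per-backend options stay isolated even though they share one options dict.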
   Considering the efficiency brought by `arrow`, and the fact that `arrow` is already used for `HDFS`, I will continue to use `pyarrow`.

##########
clients/client-python/gravitino/filesystem/gvfs_config.py:
##########
@@ -35,3 +35,8 @@ class GVFSConfig:
     GVFS_FILESYSTEM_BY_PASS = "gravitino.bypass"
     GVFS_FILESYSTEM_BY_PASS_GCS = "gravitino.bypass.gcs."
     GVFS_FILESYSTEM_KEY_FILE = "service-account-key-path"
+
+    GVFS_FILESYSTEM_BY_PASS_S3 = "gravitino.bypass.s3."

Review Comment:
   OK, let me think a bit.

##########
clients/client-python/gravitino/filesystem/gvfs.py:
##########
@@ -49,6 +49,8 @@ class StorageType(Enum):
     HDFS = "hdfs"
     LOCAL = "file"
     GCS = "gs"
+    S3A = "s3a"
+    S3 = "s3"

Review Comment:
   I was concerned that some existing locations might start with `s3` rather than `s3a`. As clarified by @xloya, there is only one entry point and Gravitino is the only way to create a fileset, so we can safely remove `s3` here.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gravitino.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org