yuqi1129 commented on code in PR #7782:
URL: https://github.com/apache/gravitino/pull/7782#discussion_r2250245003
##########
catalogs/catalog-fileset/src/main/java/org/apache/gravitino/catalog/fileset/FilesetCatalogOperations.java:
##########
@@ -1248,10 +1298,59 @@ private boolean hasCallerContext() {
&& !CallerContext.CallerContextHolder.get().context().isEmpty();
}
+ @VisibleForTesting
+ FileSystem getFileSystemWithCache(Path path, Map<String, String> conf) {
+ String pathString = path.toString();
+ // extract the prefix of the path to use as the cache key
+ String prefix = extractPrefix(pathString);
+ return fileSystemCache.get(
+ new FileSystemCacheKey(conf, prefix),
+ cacheKey -> {
+ try {
+ return getFileSystem(path, conf);
+ } catch (IOException e) {
+ throw new GravitinoRuntimeException(
+ e, "Failed to get FileSystem for fileset: path: %s, conf: %s", path, conf);
+ }
+ });
+ }
+
+ /**
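+ * Extracts the prefix from the given path. The prefix is the scheme and authority of the path,
+ * up to and including the first slash after the authority (for example, {@code s3://bucket1/}).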
+ * slash after the scheme.
+ *
+ * @param path the path from which to extract the prefix.
+ * @return the prefix of the path, or an empty string if the path is null or empty.
+ */
+ @VisibleForTesting
+ String extractPrefix(String path) {
+ if (path == null || path.isEmpty()) {
+ return "";
+ }
+
+ if (path.startsWith("file:/")) {
+ return "file:///";
+ }
+
+ String protocolSlash = "://";
+ int protocolStart = path.indexOf(protocolSlash);
+ if (protocolStart == -1) {
+ return path;
+ }
+
+ int firstSlash = path.indexOf('/', protocolStart + protocolSlash.length());
+ if (firstSlash == -1) {
+ return path + "/";
+ }
+
+ return path.substring(0, firstSlash + 1);
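For readers following the thread, here is a minimal, self-contained sketch of the caching pattern in `getFileSystemWithCache` above. It assumes that `fileSystemCache` behaves like a Caffeine `Cache`, whose `get(key, mappingFunction)` computes and stores a value only on a miss. Here a `ConcurrentHashMap.computeIfAbsent` stands in for that cache, and `FakeFileSystem` and `CacheKey` are hypothetical stand-ins for Hadoop's `FileSystem` and the PR's `FileSystemCacheKey`. `extractPrefix` is copied from the diff.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PrefixCacheSketch {

  // Hypothetical stand-in for org.apache.hadoop.fs.FileSystem.
  static final class FakeFileSystem {
    final String root;

    FakeFileSystem(String root) {
      this.root = root;
    }
  }

  // Hypothetical cache key: the configuration map plus the path prefix.
  static final class CacheKey {
    final Map<String, String> conf;
    final String prefix;

    CacheKey(Map<String, String> conf, String prefix) {
      this.conf = conf;
      this.prefix = prefix;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof CacheKey)) {
        return false;
      }
      CacheKey other = (CacheKey) o;
      return conf.equals(other.conf) && prefix.equals(other.prefix);
    }

    @Override
    public int hashCode() {
      return Objects.hash(conf, prefix);
    }
  }

  // ConcurrentHashMap stands in for the (assumed Caffeine-style) fileSystemCache.
  static final Map<CacheKey, FakeFileSystem> CACHE = new ConcurrentHashMap<>();
  // Counts how many FakeFileSystem instances were actually created.
  static final AtomicInteger CREATED = new AtomicInteger();

  // Same logic as extractPrefix in the diff.
  static String extractPrefix(String path) {
    if (path == null || path.isEmpty()) {
      return "";
    }
    if (path.startsWith("file:/")) {
      return "file:///";
    }
    String protocolSlash = "://";
    int protocolStart = path.indexOf(protocolSlash);
    if (protocolStart == -1) {
      return path;
    }
    int firstSlash = path.indexOf('/', protocolStart + protocolSlash.length());
    if (firstSlash == -1) {
      return path + "/";
    }
    return path.substring(0, firstSlash + 1);
  }

  // Builds a new "file system" only on a cache miss for (conf, prefix).
  static FakeFileSystem getFileSystemWithCache(String path, Map<String, String> conf) {
    CacheKey key = new CacheKey(conf, extractPrefix(path));
    return CACHE.computeIfAbsent(
        key,
        k -> {
          CREATED.incrementAndGet();
          return new FakeFileSystem(k.prefix);
        });
  }

  public static void main(String[] args) {
    Map<String, String> conf = Map.of("example.key", "value"); // placeholder configuration
    FakeFileSystem a = getFileSystemWithCache("s3://bucket1/dir1/file1", conf);
    FakeFileSystem b = getFileSystemWithCache("s3://bucket1/dir2/file2", conf);
    FakeFileSystem c = getFileSystemWithCache("gs://bucket1/dir1/file3", conf);
    System.out.println(a == b); // true: same scheme and bucket
    System.out.println(a == c); // false: different scheme
    System.out.println(CREATED.get()); // 2
  }
}
```

With this keying, every path under one `scheme://authority/` shares a single instance for a given configuration. The same bucket name under a different scheme still gets its own entry.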
Review Comment:
No, `path.toUri().getAuthority()` returns only the authority component of the URI, i.e. the host (with port, if any) or the bucket name; the scheme is dropped.
For example:
- `hdfs://127.0.0.1:9000/dir1/dir2/file1`: the result is `127.0.0.1:9000`.
- `s3://bucket1/dir1/dir2/file2`: the result is `bucket1`.
- `gs://bucket1/dir1/dir2/file3`: the result is also `bucket1`, so an authority-only key could not tell this path apart from the S3 path.
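Hadoop's `Path.toUri()` returns a `java.net.URI`, so the behaviour above can be checked with `java.net.URI` alone. The sketch below compares `getAuthority()` with a local copy of the diff's `extractPrefix` (the class name `AuthorityVsPrefix` is hypothetical). It shows that the authority maps the S3 and GCS paths to the same value, while the prefix keeps the scheme.

```java
import java.net.URI;

public class AuthorityVsPrefix {

  // Copy of extractPrefix from the diff, kept here so the example is self-contained.
  static String extractPrefix(String path) {
    if (path == null || path.isEmpty()) {
      return "";
    }
    if (path.startsWith("file:/")) {
      return "file:///";
    }
    String protocolSlash = "://";
    int protocolStart = path.indexOf(protocolSlash);
    if (protocolStart == -1) {
      return path;
    }
    int firstSlash = path.indexOf('/', protocolStart + protocolSlash.length());
    if (firstSlash == -1) {
      return path + "/";
    }
    return path.substring(0, firstSlash + 1);
  }

  // What path.toUri().getAuthority() yields for the same string.
  static String authority(String path) {
    return URI.create(path).getAuthority();
  }

  public static void main(String[] args) {
    String[] paths = {
      "hdfs://127.0.0.1:9000/dir1/dir2/file1",
      "s3://bucket1/dir1/dir2/file2",
      "gs://bucket1/dir1/dir2/file3"
    };
    for (String p : paths) {
      System.out.println(p + " -> authority=" + authority(p) + ", prefix=" + extractPrefix(p));
    }
    // hdfs://127.0.0.1:9000/dir1/dir2/file1 -> authority=127.0.0.1:9000, prefix=hdfs://127.0.0.1:9000/
    // s3://bucket1/dir1/dir2/file2 -> authority=bucket1, prefix=s3://bucket1/
    // gs://bucket1/dir1/dir2/file3 -> authority=bucket1, prefix=gs://bucket1/
  }
}
```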