github-actions[bot] commented on code in PR #64009:
URL: https://github.com/apache/doris/pull/64009#discussion_r3452401481


##########
fe/fe-filesystem/fe-filesystem-spi/src/main/java/org/apache/doris/filesystem/spi/S3CompatibleFileSystem.java:
##########
@@ -53,12 +54,26 @@ public abstract class S3CompatibleFileSystem extends 
ObjFileSystem {
 
     // Object stores do not have real directories; use a zero-byte marker with 
trailing slash.
     private static final String DIR_MARKER_SUFFIX = "/";
+    private static final int DEFAULT_MAX_DETERMINISTIC_HEAD_PATHS = 100;
 
     private final boolean usePathStyle;
+    private final boolean skipListForDeterministicPath;
+    private final int maxDeterministicHeadPaths;
 
     protected S3CompatibleFileSystem(ObjStorage<?> objStorage, boolean 
usePathStyle) {
+        this(objStorage, usePathStyle, true, 
DEFAULT_MAX_DETERMINISTIC_HEAD_PATHS);

Review Comment:
   This only wires the runtime controls through the AWS S3 provider. The shared 
base is also used by COS/OSS/OBS, but those constructors still call the 
two-argument constructor, which hard-codes `skipListForDeterministicPath=true` 
and the default limit. Since `FileSystemFactory` injects 
`s3_skip_list_for_deterministic_path` and `s3_head_request_max_paths` for every 
provider map, an operator who disables the feature or lowers the limit still 
gets deterministic HEAD expansion for COS/OSS/OBS exact/finite paths. Please 
move these controls into a common S3-compatible property path, or pass the 
effective values from the COS/OSS/OBS providers too, and add at least one 
non-S3 provider unit test for the disabled/limit fallback behavior. This is the 
remaining parallel-provider instance after the S3-only plumbing added in this 
PR.



##########
fe/fe-core/src/main/java/org/apache/doris/fs/FileSystemCache.java:
##########
@@ -208,10 +209,17 @@ public static class FileSystemCacheKey {
         // eg: hdfs://nameservices1
         private final String fsIdent;
         private final StorageProperties properties;
+        private final boolean s3SkipListForDeterministicPath;
+        private final int s3HeadRequestMaxPaths;
 
         public FileSystemCacheKey(String fsIdent, StorageProperties 
properties) {
             this.fsIdent = fsIdent;
             this.properties = properties;
+            boolean isS3Compatible = properties instanceof 
AbstractS3CompatibleProperties;
+            this.s3SkipListForDeterministicPath = isS3Compatible
+                    && Config.s3_skip_list_for_deterministic_path;

Review Comment:
   These values are captured into the cache key, but the filesystem loader does 
not use that captured snapshot. On a cache miss it passes only `key.properties` 
to `FileSystemFactory.getFileSystem()`, and `withRuntimeFileSystemProperties()` 
re-reads the mutable global `Config` while constructing the provider map. If an 
admin changes `s3_skip_list_for_deterministic_path` or 
`s3_head_request_max_paths` between key construction and the load, the cache 
can store a filesystem built with the new values under a key hashed with the 
old values; when the config later returns to the old value, callers can reuse a 
filesystem with the wrong deterministic-list behavior until eviction. Please 
construct the filesystem from the key's captured runtime values, or materialize 
those values into the effective property map before loading, and add a 
`FileSystemCacheTest` case that flips `Config` after key construction but 
before the loader runs.



##########
regression-test/suites/external_table_p0/tvf/test_s3_tvf_skip_list_for_exact_path.groovy:
##########
@@ -0,0 +1,35 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("test_s3_tvf_skip_list_for_exact_path", "p0,external") {
+    String ak = getS3AK()
+    String sk = getS3SK()
+    String s3_endpoint = getS3Endpoint()
+    String region = getS3Region()
+    String bucket = context.config.otherConfigs.get("s3BucketName")
+
+    sql """ desc function S3 (
+                "uri" = 
"s3://${bucket}/dir_deny_list_objects/test_s3_tvf_skip_list_for_exact_path.csv",
+                "format" = "csv",
+                "csv_schema" = "k1:int;k2:string",

Review Comment:
   This `DESC FUNCTION` can still pass without proving the exact object was 
matched. The test supplies `csv_schema`, and `getTableColumns()` returns that 
schema before fetching file-derived schema, while the new deterministic HEAD 
path treats a not-found exact key as an empty listing. So a missing or typoed 
`dir_deny_list_objects/...csv` fixture would still make this test green as long 
as no ListObjects request was issued. Please make this assert actual data from 
the object, for example with an `order_qt` `SELECT * FROM S3 (...)` and 
generated `.out`, so the no-ListBucket path is still exercised but a 
missing/wrong exact object fails the regression.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to