github-actions[bot] commented on code in PR #64009:
URL: https://github.com/apache/doris/pull/64009#discussion_r3450854455


##########
fe/fe-filesystem/fe-filesystem-spi/src/main/java/org/apache/doris/filesystem/spi/S3CompatibleFileSystem.java:
##########
@@ -53,6 +54,7 @@ public abstract class S3CompatibleFileSystem extends 
ObjFileSystem {
 
     // Object stores do not have real directories; use a zero-byte marker with 
trailing slash.
     private static final String DIR_MARKER_SUFFIX = "/";
+    private static final int MAX_DETERMINISTIC_HEAD_PATHS = 100;

Review Comment:
   This restores the deterministic HEAD optimization unconditionally with a 
hard-coded limit, but the FE still exposes mutable controls for this exact 
behavior: `s3_skip_list_for_deterministic_path` says setting it to false falls 
back to listing, and `s3_head_request_max_paths` controls the expansion 
threshold. Those names are now only declared in `Config.java`; the SPI path 
uses this constant and always takes the plain/deterministic HEAD branches. 
After this PR, an operator who disables the optimization or lowers the max path 
count still gets HEAD requests, so the documented config behavior is broken. 
Please either plumb these values into filesystem construction/properties, or 
remove/update the stale configs and tests so the behavior is explicit.



##########
fe/fe-filesystem/fe-filesystem-spi/src/main/java/org/apache/doris/filesystem/spi/S3CompatibleFileSystem.java:
##########
@@ -907,6 +909,232 @@ private static String expandNumericRanges(String pattern) 
{
         return sb.toString();
     }
 
+    private static boolean isDeterministicKeyPattern(String pattern) {
+        if (pattern.indexOf('*') >= 0 || pattern.indexOf('?') >= 0 || 
pattern.indexOf('\\') >= 0) {
+            return false;
+        }
+        int i = 0;
+        while (i < pattern.length()) {
+            if (pattern.charAt(i) != '[') {
+                i++;
+                continue;
+            }
+            int end = pattern.indexOf(']', i + 1);
+            if (end < 0 || end == i + 1) {
+                return false;
+            }
+            char first = pattern.charAt(i + 1);
+            if (first == '!' || first == '^') {
+                return false;
+            }
+            i = end + 1;
+        }
+        return true;
+    }
+
+    private static boolean isPlainObjectKeyPattern(String pattern) {
+        for (char c : new char[] {'*', '?', '[', '{', '\\'}) {
+            if (pattern.indexOf(c) >= 0) {
+                return false;
+            }
+        }
+        return true;
+    }
+
+    private static String expandBracketPatterns(String pattern) {
+        StringBuilder result = new StringBuilder();
+        int i = 0;
+        while (i < pattern.length()) {
+            char c = pattern.charAt(i);
+            if (c != '[') {
+                result.append(c);
+                i++;
+                continue;
+            }
+            int end = pattern.indexOf(']', i + 1);
+            if (end < 0) {
+                result.append(c);
+                i++;
+                continue;
+            }
+            List<Character> values = expandBracketContent(pattern.substring(i 
+ 1, end));
+            result.append('{');
+            for (int j = 0; j < values.size(); j++) {
+                if (j > 0) {
+                    result.append(',');
+                }
+                result.append(values.get(j));

Review Comment:
   This bracket-to-brace conversion is not semantics-preserving for bracket 
classes containing brace syntax characters. For example, `file[,].csv` is a 
deterministic bracket pattern that should HEAD `file,.csv`, but this emits 
`file{,}.csv`; `splitBraceContent` then treats the comma as an alternation 
separator and expands to empty alternatives, so the code HEADs `file.csv` 
instead. Similar cases exist for `{`/`}` inside the class. The deterministic 
fast path needs to carry concrete alternatives directly or escape/special-case 
characters that are meaningful to the later brace expander; otherwise valid S3 
TVF glob patterns silently return the wrong files.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to