LuciferYang opened a new issue, #64767:
URL: https://github.com/apache/doris/issues/64767

   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/doris/issues) and found no similar issues.
   
   ### Version
   
   master (`fe-foundation`, `org.apache.doris.foundation.util.PathUtils`).
   
   ### What's Wrong?
   
   `PathUtils.equalsIgnoreSchemeIfOneIsS3(p1, p2)` compares two storage-location URIs, treating the `s3` scheme as interchangeable with other object-store schemes. Its two branches use **inconsistent** rules:
   
   - **Same scheme** → `p1.equalsIgnoreCase(p2)`: full-string, **case-insensitive**, trailing slash **significant**.
   - **Cross-scheme (one is `s3`)** → compares `normalize(authority)`/`normalize(path)` with `Objects.equals`: **case-sensitive**, trailing slash **stripped**.
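   To make the divergence concrete, here is a hypothetical reconstruction of the two branches as described above, using `java.net.URI`. It is not the actual Doris source: the class name `LegacyPathCompare`, the `normalize` helper (assumed to strip one trailing slash), and the `false` result for two unrelated non-`s3` schemes are all assumptions.

   ```java
   import java.net.URI;
   import java.util.Objects;

   // Hypothetical reconstruction of the current two-branch logic described in
   // this issue; helper names and details are assumptions, not Doris code.
   public class LegacyPathCompare {

       // Assumed helper: strips a single trailing '/' (null-safe).
       static String normalize(String s) {
           if (s == null) {
               return null;
           }
           return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
       }

       static boolean equalsIgnoreSchemeIfOneIsS3(String p1, String p2) {
           URI u1 = URI.create(p1);
           URI u2 = URI.create(p2);
           String s1 = u1.getScheme();
           String s2 = u2.getScheme();
           if (Objects.equals(s1, s2)) {
               // Same scheme: whole string, case-insensitive, trailing slash significant.
               return p1.equalsIgnoreCase(p2);
           }
           if ("s3".equals(s1) || "s3".equals(s2)) {
               // Cross-scheme: case-sensitive, one trailing slash stripped.
               return Objects.equals(normalize(u1.getAuthority()), normalize(u2.getAuthority()))
                       && Objects.equals(normalize(u1.getPath()), normalize(u2.getPath()));
           }
           return false; // assumption: unrelated schemes never compare equal
       }
   }
   ```

   Fed the pairs from the reproduction section, this sketch returns the same `false` / `true` / `true` answers reported there.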
   
   Consequences:
   
   1. Whether a trailing slash matters depends on the *other* URI's scheme. For example, `s3://bucket/path/` vs `s3://bucket/path` are **unequal** (same-scheme branch), but `s3://bucket/path/` vs `cos://bucket/path` are **equal** (cross-scheme branch).
   2. The same-scheme branch ignores case across the whole string, so it can wrongly equate case-sensitive S3 object keys (`s3://b/A` == `s3://b/a`).
   
   The only caller is `HMSTransaction.prepareInsertExistingTable` (`fe-core`), which uses this check to decide whether a Hive commit needs a rename. Inconsistent equality can therefore lead to an incorrect rename decision.
   
   ### What You Expected?
   
   A single, consistent rule regardless of whether the two schemes match: compare authority + path (scheme ignored when equal or when one side is `s3`), with trailing slashes insignificant and the comparison case-sensitive (object-storage keys are case-sensitive).
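   A minimal sketch of that rule, assuming `java.net.URI` parsing and case-sensitive comparison of authority and path. The class `ConsistentPathCompare` and the helper `stripTrailingSlashes` are hypothetical, this is not the code in the linked PR, and it deliberately ignores the malformed-input cases mentioned under *Anything Else?*.

   ```java
   import java.net.URI;
   import java.util.Objects;

   // Hypothetical sketch of the expected rule; not the code in the linked PR.
   public class ConsistentPathCompare {

       // Removes every trailing '/' so "/path", "/path/" and "/path//" compare equal.
       static String stripTrailingSlashes(String s) {
           if (s == null) {
               return null;
           }
           int end = s.length();
           while (end > 0 && s.charAt(end - 1) == '/') {
               end--;
           }
           return s.substring(0, end);
       }

       static boolean equalsIgnoreSchemeIfOneIsS3(String p1, String p2) {
           URI u1 = URI.create(p1);
           URI u2 = URI.create(p2);
           String s1 = u1.getScheme();
           String s2 = u2.getScheme();
           // The scheme is ignored when both match or when either side is "s3".
           boolean schemesCompatible = Objects.equals(s1, s2)
                   || "s3".equals(s1) || "s3".equals(s2);
           if (!schemesCompatible) {
               return false;
           }
           // One rule for every case: case-sensitive authority and path,
           // trailing slashes insignificant.
           return Objects.equals(u1.getAuthority(), u2.getAuthority())
                   && Objects.equals(stripTrailingSlashes(u1.getPath()),
                                     stripTrailingSlashes(u2.getPath()));
       }
   }
   ```

   Under this rule the first two reproduction pairs both return `true`, and the case-differing pair returns `false`, independent of which branch the schemes would previously have selected.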
   
   ### How to Reproduce?
   
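   ```java
   import org.apache.doris.foundation.util.PathUtils;

   // Hypothetical harness class; only PathUtils comes from Doris.
   public class PathUtilsRepro {
       public static void main(String[] args) {
           // Same first URI, different scheme on the other side -> different answer (inconsistent):
           System.out.println(PathUtils.equalsIgnoreSchemeIfOneIsS3("s3://bucket/path/", "s3://bucket/path"));  // false
           System.out.println(PathUtils.equalsIgnoreSchemeIfOneIsS3("s3://bucket/path/", "cos://bucket/path")); // true

           // Same-scheme comparison wrongly ignores case:
           System.out.println(PathUtils.equalsIgnoreSchemeIfOneIsS3("s3://bucket/A", "s3://bucket/a"));         // true (should be false)
       }
   }
   ```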
   
   ### Anything Else?
   
   A fix is proposed in the linked PR. It also hardens several edge cases surfaced during review (opaque URIs, percent-encoded slashes, triple-slash / network-path forms) by falling back to exact string comparison for inputs that are malformed for object storage.
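   One way such a fallback could look, sketched under assumptions: the hypothetical `parseObjectStoreUri` rejects unparseable input, opaque URIs (`s3:bucket/key`), scheme-less network-path references (`//bucket/key`), empty authorities (`s3:///bucket/key`), and raw paths containing `%2F` (which `URI.getPath()` would decode into a real `/`). Any rejected input is compared with `String.equals`. The class and method names are illustrative, and the exact checks in the linked PR may differ.

   ```java
   import java.net.URI;
   import java.net.URISyntaxException;
   import java.util.Locale;

   // Hypothetical sketch of the hardening described above; not the PR's code.
   public class GuardedPathCompare {

       // Returns the parsed URI if it looks like scheme://bucket/key, otherwise null.
       static URI parseObjectStoreUri(String s) {
           URI u;
           try {
               u = new URI(s);
           } catch (URISyntaxException e) {
               return null; // unparseable
           }
           if (u.isOpaque()) {
               return null; // e.g. "s3:bucket/key"
           }
           if (u.getScheme() == null) {
               return null; // network-path reference such as "//bucket/key"
           }
           if (u.getAuthority() == null || u.getAuthority().isEmpty()) {
               return null; // triple-slash form such as "s3:///bucket/key"
           }
           String rawPath = u.getRawPath();
           if (rawPath != null && rawPath.toUpperCase(Locale.ROOT).contains("%2F")) {
               return null; // encoded slash would be decoded by getPath()
           }
           return u;
       }

       static String stripTrailingSlashes(String s) {
           int end = s.length();
           while (end > 0 && s.charAt(end - 1) == '/') {
               end--;
           }
           return s.substring(0, end);
       }

       static boolean equalsIgnoreSchemeIfOneIsS3(String p1, String p2) {
           URI u1 = parseObjectStoreUri(p1);
           URI u2 = parseObjectStoreUri(p2);
           if (u1 == null || u2 == null) {
               return p1.equals(p2); // malformed for object storage: exact string comparison
           }
           String s1 = u1.getScheme();
           String s2 = u2.getScheme();
           if (!s1.equals(s2) && !"s3".equals(s1) && !"s3".equals(s2)) {
               return false;
           }
           // Hierarchical URIs always have a non-null path, so no null check is needed here.
           return u1.getAuthority().equals(u2.getAuthority())
                   && stripTrailingSlashes(u1.getPath()).equals(stripTrailingSlashes(u2.getPath()));
       }
   }
   ```

   Exact string comparison is the conservative choice: two malformed inputs match only when they are byte-identical, so the check errs toward reporting the paths as different rather than silently equating distinct keys.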
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   