arturobernalg commented on code in PR #517:
URL: 
https://github.com/apache/httpcomponents-core/pull/517#discussion_r1996253015


##########
httpcore5/src/main/java/org/apache/hc/core5/net/PercentCodec.java:
##########
@@ -113,6 +113,37 @@ public class PercentCodec {
         RFC5987_UNRESERVED.set('~');
     }
 
+    static final BitSet PCHAR = new BitSet(256);
+    static final BitSet USERINFO = new BitSet(256);
+    static final BitSet REG_NAME = new BitSet(256);
+    static final BitSet PATH_SEGMENT = new BitSet(256);
+    static final BitSet QUERY = new BitSet(256);
+    static final BitSet FRAGMENT = new BitSet(256);
+
+    static {
+        PCHAR.or(UNRESERVED);
+        PCHAR.or(SUB_DELIMS);
+        PCHAR.set(':');
+        PCHAR.set('@');
+        USERINFO.or(UNRESERVED);
+        USERINFO.or(SUB_DELIMS);
+        USERINFO.set(':');
+        REG_NAME.or(UNRESERVED);
+        REG_NAME.or(SUB_DELIMS);
+        REG_NAME.clear('!');
+        PATH_SEGMENT.or(PCHAR);
+        QUERY.or(PCHAR);
+        QUERY.set('/');
+        QUERY.set('?');
+        FRAGMENT.or(PCHAR);
+        FRAGMENT.set('/');
+        FRAGMENT.set('?');
+        // Some sub-delims remain encoded (RFC 3986 allows them unencoded, but 
we choose to be strict).
+        PATH_SEGMENT.clear('(');
+        PATH_SEGMENT.clear(')');
+        PATH_SEGMENT.clear('&');
+    }

Review Comment:
   > @arturobernalg I cannot find anything in the RFC stating that. What am I 
missing? What a super strict mode we have `ALL_RESERVED`
   
   @olegk  The test expects `(`, `)`, `&` to remain percent-encoded in path 
segments (e.g., `blah-%28%20-blah-%20%26%20-blah-%20%29-blah/`), but without 
the `PATH_SEGMENT.clear('(')`, `clear(')')`, `clear('&')` in 
`PercentCodec.java`, it outputs `blah-(%20-blah-%20&%20-blah-%20)-blah/`. 
Reintroducing the `clear()` calls fixes it, but I’m wondering if this aligns 
with our intended behavior or if we should adjust `formatPath` to handle 
encoding differently based on `encodingPolicy`.
   I noticed the older `formatPath` used `PercentCodec.encode(buf, segment, 
charset, false)` with `UNRESERVED`, which worked, but the new version with 
`encodingPolicy` support might need tweaking. Could you take a look and advise?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@hc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@hc.apache.org
For additional commands, e-mail: dev-h...@hc.apache.org

Reply via email to