Copilot commented on code in PR #2875:
URL: https://github.com/apache/tika/pull/2875#discussion_r3363856577


##########
tika-core/src/main/java/org/apache/tika/io/FilenameUtils.java:
##########
@@ -268,6 +270,39 @@ public static String getSanitizedEmbeddedFilePath(Metadata 
metadata,
         return retPath;
     }
 
+    /**
+     * Resolves {@code name} against {@code dir} and returns the result only 
if it stays
+     * within {@code dir}.
+     * <p>
+     * The result is {@link Path#normalize() normalized} before it is checked. 
This is the
+     * load-bearing step: without it, resolving a name such as {@code ../x} 
yields the literal
+     * path {@code dir/../x}, whose path elements still begin with {@code 
dir}, so the
+     * {@link Path#startsWith(Path)} check below would wrongly accept it. 
Normalizing first
+     * collapses the {@code ..} so the check sees the real location.
+     * <p>
+     * Containment is tested with {@link Path#startsWith(Path)} (element by 
element) rather
+     * than by comparing path strings, so a sibling whose name merely shares a 
textual prefix
+     * with {@code dir} (for example {@code /a/bc} against {@code /a/b}) is 
correctly treated
+     * as being outside {@code dir}.
+     *
+     * @param dir  the directory the resolved path must stay within
+     * @param name the child name to resolve against {@code dir}; may contain 
relative
+     *             segments such as {@code ..}
+     * @return the resolved, normalized path, guaranteed to start with the 
normalized
+     *         {@code dir}
+     * @throws IOException if {@code name} resolves to a location outside 
{@code dir}
+     */
+    public static Path resolveWithin(Path dir, String name) throws IOException 
{
+        Path normalizedDir = dir.normalize();
+        Path resolved = normalizedDir.resolve(name).normalize();
+        if (!resolved.startsWith(normalizedDir)) {
+            throw new IOException(
+                    "'" + name + "' resolves to '" + resolved + "', which is 
outside of '" +
+                            normalizedDir + "'");
+        }
+        return resolved;
+    }

Review Comment:
   `resolveWithin` prevents `..` traversal via `normalize()` + `startsWith()`, 
but it can still be bypassed via symlinks inside `dir` (e.g., `dir/link -> 
/etc` and `name=link/passwd` will pass the current check while escaping the 
directory). There is precedent in `FileSystemFetcher` for a 
`toRealPath()`-based containment check when the target exists; adding the same 
defense-in-depth here would better match the method’s stated guarantee of 
staying within `dir`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to