Issue 135619
Summary clang-format-diff.py Fails to Correctly Handle Filenames with Spaces
Labels clang-format
Assignees
Reporter selimkeles
    I’ve encountered an issue with clang-format-diff.py where the script is unable to properly detect filenames that contain spaces. Upon investigating the source code, I discovered that the filename extraction relies on the following code block:
```python
filename = None
lines_by_file = {}
for line in sys.stdin:
    match = re.search(r'^\+\+\+\ (.*?/){%s}(\S*)' % args.p, line)
    if match:
        filename = match.group(2)
    if filename == None:
```
Because this regular _expression_ uses \S* to capture the filename, it stops at the first encountered space. This results in the filename not being recognized in its entirety when there are spaces—causing inaccurate processing of the diff.

**Steps to Reproduce:**

1- Create or rename a file such that its path contains one or more spaces (e.g., my folder/file.c).
2- Make some changes and stage them so that a diff is generated.
3- Run the process clang-format-diff.py (e.g., via a pre-commit hook).
4- Observe that the script fails to detect the file correctly, as the regex does not capture filenames with spaces.

**Proposed Fix:** 

One straightforward solution is to update the regex to capture everything after the scaled path components—even if that includes spaces. For example, replacing (\S*) with (.+) and applying .strip() can address the issue:
```python
match = re.search(r'^\+\+\+\s+(?:.*?/){%s}(.+)$' % args.p, line)
if match:
    filename = match.group(1).strip()
```
This change—matching one or more characters until the end of the line—should correctly capture filenames with spaces, while also handling any extraneous whitespace. I have tested this modification with various file paths and it resolves the issue.

**Additional Note:**

In my local integration of clang-format-diff.py, I also noted that the script always exits with a status code of 0—even when formatting differences are present. This can be misleading for integration in automated checks where one might expect a non-zero exit code in case of formatting issues. While I understand that the design intention might be to simply output the diff changes, it could be useful to mention this behavior in the documentation or consider an opt-in flag to affect the exit code.

**Would you be open to a patch based on the above changes?** I’d be happy to contribute a patch if the maintainers agree that this is an improvement.

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs

Reply via email to