I had some spare time over the holidays, so I spent some of it
doing something I've been thinking about for a while.

For those of you who are not familiar with it: Coccinelle is a tool for
pattern matching and text transformation of C code. It can be used to detect
problematic programming patterns and to make complex, tree-wide patches
easy. Because it is aware of the structure of C code, it is better suited to
complicated changes than ordinary text substitution tools like sed and Perl.

Coccinelle has been used successfully in the Linux project since 2008 and is
now an established tool for Linux development. A large number of semantic
patches have been added to the source tree to capture everything from
generic issues (like eliminating the redundant A in expressions like
"!A || (A && B)") to more Linux-specific problems like adding a missing
call to kfree().
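
As a quick sanity check of the "redundant A" example, here is a small Python sketch (not part of the patch) that compares "!A || (A && B)" with the simplified "!A || B" over every combination of inputs. The function names are made up for illustration.

```python
from itertools import product


def original(a: bool, b: bool) -> bool:
    """The expression as it might appear in code: !A || (A && B)."""
    return (not a) or (a and b)


def simplified(a: bool, b: bool) -> bool:
    """The same expression with the redundant A removed: !A || B."""
    return (not a) or b


# Exhaustively compare both forms over all four input combinations.
equivalent = all(
    original(a, b) == simplified(a, b)
    for a, b in product([False, True], repeat=2)
)
print(equivalent)
```

Since the two forms agree on all inputs, a semantic patch can rewrite one into the other without changing behavior.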

Although PostgreSQL is nowhere near the size of the Linux kernel, it is
nevertheless of significant size and would benefit from incorporating
Coccinelle into its development. I noticed it has been used in a few cases
way back (about 10 years ago) to fix issues in the PostgreSQL code, but I
thought it might be useful to make it part of normal development practice
to, among other things:

- Identify and correct bugs in the source code both during development and
review.
- Make large-scale changes to the source tree to improve the code based on
new insights.
- Encode and enforce APIs by ensuring that function calls are used
correctly.
- Use improved coding patterns for more efficient code.
- Allow extensions to automatically update code for later PostgreSQL
versions.

To that end, I created a series of patches to show how it could be used in
the PostgreSQL tree. It is a lot easier to discuss concrete code and I
split it up into separate messages since that makes it easier to discuss
each individual patch. The series contains code to make it easy to work
with Coccinelle during development and reviews, as well as examples of
semantic patches that capture problems, demonstrate how to make large-scale
changes, how to enforce APIs, and also improve some coding patterns.

This first patch contains the coccicheck.py script, which is a
re-implementation of the coccicheck script that the Linux kernel uses. We
cannot use the Linux coccicheck script directly since it is quite closely
tied to the Linux source tree, and we need something that supports both
autoconf and Meson. Since Python seems to be used more and more in the
tree, it seems to be the most natural choice. (I have no strong opinion
on what language to use, but think it would be good to have something that
is as platform-independent as possible.)

The intention is that we should be able to use the Linux semantic patches
directly, so it supports the "Requires" and "Options" keywords, which can
be used to require a specific version of spatch(1) and add options to the
execution of that semantic patch, respectively.
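
To make the handling of these keywords concrete, here is a minimal standalone sketch (not the code in the patch) of how they could be read from a semantic patch and acted on. It assumes the keywords appear in "//" comment lines, as I believe they do in the Linux semantic patches, and it compares versions as plain tuples of integers, which is a simplification. The helper names parse_headers, version_tuple and should_run are made up for illustration.

```python
import re

# Assumed header style (an assumption, modeled on Linux semantic patches):
#   // Options: --no-includes --include-headers
#   // Requires: 1.0.8
HEADER_RE = re.compile(r'(Options|Requires):\s*(.*)', re.IGNORECASE)


def parse_headers(text):
    """Collect "Options" and "Requires" values from semantic patch text."""
    headers = {}
    for line in text.splitlines():
        # search() rather than match(): the keyword follows a comment marker.
        match = HEADER_RE.search(line)
        if match:
            headers[match.group(1).lower()] = match.group(2).strip()
    return headers


def version_tuple(version_string):
    """Turn a dotted version such as "1.0.8" into (1, 0, 8).

    Simplification: only the leading dotted-number part is considered.
    """
    numeric = re.match(r'\d+(?:\.\d+)*', version_string)
    if numeric is None:
        raise ValueError(f'not a dotted version: {version_string!r}')
    return tuple(int(part) for part in numeric.group(0).split('.'))


def should_run(headers, installed_version):
    """Return True if the installed spatch satisfies "Requires"."""
    required = headers.get('requires')
    if required is None:
        return True
    return version_tuple(installed_version) >= version_tuple(required)


example = """\
// Options: --no-includes --include-headers
// Requires: 1.0.8
"""
headers = parse_headers(example)
# Each option becomes its own argument on the spatch command line.
extra_args = headers.get('options', '').split()
print(headers, extra_args, should_run(headers, '1.1.1'))
```

The semantic patch is skipped only when the installed spatch is older than the required version, and the options are split on whitespace so that each one is passed as a separate argument.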
--
Best wishes,
Mats Kindahl, Timescale
From 55f5caba3d6cb88e3729985571286c16171f36b3 Mon Sep 17 00:00:00 2001
From: Mats Kindahl <m...@kindahl.net>
Date: Sun, 29 Dec 2024 19:35:58 +0100
Subject: Add initial coccicheck script

The coccicheck.py script can be used to run several semantic patches on a
source tree to either generate a report, see the context of the modification
(which lines require changes), or generate a patch to correct an issue.

    python coccicheck.py <options> <pattern> <path> ...

Options:

    --spatch=SPATCH
        Path to spatch binary. Defaults to value of environment variable
        SPATCH.

    --mode={report,context,patch}
        Mode to run in. Defaults to value of environment variable MODE,
        or "report" if MODE is not set.

    <pattern>
        Pattern for all semantic patches to match. For example,
        src/tools/cocci/**/*.cocci to match all *.cocci files in the
        directory src/tools/cocci.

    <path>
        Path to source files to apply semantic patches to.
---
 src/tools/coccicheck.py | 176 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)
 create mode 100755 src/tools/coccicheck.py

diff --git a/src/tools/coccicheck.py b/src/tools/coccicheck.py
new file mode 100755
index 00000000000..1fe136b307f
--- /dev/null
+++ b/src/tools/coccicheck.py
@@ -0,0 +1,176 @@
+#!/usr/bin/env python3
+
+"""Run Coccinelle on a set of files and directories.
+
+This is a re-written version of the Linux ``coccicheck`` script.
+
+Coccicheck can run in three different modes (the original has four
+different modes):
+
+- *patch*: patch files using the cocci file.
+
+- *report*: report will report any improvements that this script can
+  make, but not show any patch.
+
+- *context*: show the context where the patch can be applied.
+
+The program will take a glob pattern matching cocci files and call
+spatch(1) for each file with a set of file or directory paths.
+
+When starting, the cocci file will be parsed and any lines containing
+"Options:" or "Requires:" will be treated specially.
+
+- Lines containing "Options:" will have a list of options to add to
+  the call of the spatch(1) program, after the default options.
+
+- Lines containing "Requires:" can contain a version of spatch(1) that
+  is required for this cocci file. If the version requirements are not
+  satisfied, the file will not be used.
+
+When calling spatch(1), it will set the virtual rule "patch", "report",
+or "context" and the cocci file can use these to act differently
+depending on the mode.
+
+You need to set the following environment variables to control the
+default:
+
+SPATCH: Path to spatch program. This will be used if no path is
+  passed using the option --spatch.
+
+You may set the following environment variables:
+
+SPFLAGS: Extra flags to use when calling spatch. These will be
+  added last.
+
+"""
+
+import argparse
+import os
+import sys
+import subprocess
+import re
+
+from pathlib import PurePath, Path
+from packaging import version
+
+VERSION_CRE = re.compile(
+    r'spatch version (\S+) compiled with OCaml version (\S+)'
+)
+
+
+def parse_metadata(cocci_file):
+    """Parse metadata in Cocci file."""
+    metadata = {}
+    with open(cocci_file) as fh:
+        for line in fh:
+            mre = re.search(r'(Options|Requires):(.*)', line, re.IGNORECASE)
+            if mre:
+                metadata[mre.group(1).lower()] = mre.group(2).strip()
+    return metadata
+
+
+def get_config(args):
+    """Compute configuration information."""
+    # Figure out spatch version. We just need to read the first line
+    config = {}
+    cmd = [args.spatch, '--version']
+    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
+        for line in proc.stdout:
+            mre = VERSION_CRE.match(line)
+            if mre:
+                config['spatch_version'] = mre.group(1)
+                break
+    return config
+
+
+def run_spatch(cocci_file, args, config, env):
+    """Run coccinelle on the provided file."""
+    if args.verbose > 1:
+        print("processing cocci file", cocci_file)
+    spatch_version = config['spatch_version']
+    metadata = parse_metadata(cocci_file)
+
+    # Check that we have a valid version
+    if 'requires' in metadata:
+        required_version = version.parse(metadata['requires'])
+        if version.parse(spatch_version) < required_version:
+            print(
+                f'Skipping SmPL patch {cocci_file}: '
+                f'requires {required_version} (had {spatch_version})'
+            )
+            return
+
+    command = [
+        args.spatch,
+        "-D",  args.mode,
+        "--cocci-file", cocci_file,
+        "--very-quiet",
+    ]
+
+    if 'options' in metadata:
+        command.extend(metadata['options'].split())
+    if args.mode == 'report':
+        command.append('--no-show-diff')
+    if args.spflags:
+        command.extend(str(args.spflags).split())
+
+    sp = subprocess.run(command + args.path, env=env)
+    if sp.returncode != 0:
+        sys.exit(sp.returncode)
+
+
+def coccinelle(args, config, env):
+    """Run coccinelle on all files matching the provided pattern."""
+    root = PurePath(args.cocci).anchor  # '' for a relative pattern
+    count = 0
+    for cocci_file in Path(root or '.').glob(args.cocci[len(root):]):
+        count += 1
+        run_spatch(cocci_file, args, config, env)
+    return count
+
+
+def main(argv):
+    """Run coccicheck."""
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--verbose', '-v', action='count', default=0)
+    parser.add_argument('--spatch', type=PurePath, metavar='SPATCH',
+                        default=os.environ.get('SPATCH'),
+                        help=('Path to spatch binary. Defaults to '
+                              'value of environment variable SPATCH.'))
+    parser.add_argument('--spflags', type=str,
+                        metavar='SPFLAGS',
+                        default=os.environ.get('SPFLAGS', None),
+                        help=('Flags to pass to spatch call. Defaults '
+                              'to value of environment variable SPFLAGS.'))
+    parser.add_argument('--mode', choices=['patch', 'report', 'context'],
+                        default=os.environ.get('MODE', 'report'),
+                        help=('Mode to use for coccinelle. Defaults to '
+                              'value of environment variable MODE.'))
+    parser.add_argument('--include', '-I', type=PurePath,
+                        metavar='DIR',
+                        help='Extra include directories.')
+    parser.add_argument('cocci', metavar='pattern',
+                        help='Pattern for Cocci files to use.')
+    parser.add_argument('path', nargs='+', type=PurePath,
+                        help='Directory or source path to process.')
+
+    args = parser.parse_args(argv)
+
+    if args.verbose > 1:
+        print("arguments:", args)
+
+    if args.spatch is None:
+        parser.error('no spatch binary given (use --spatch or SPATCH); '
+                     'Coccinelle is available at http://coccinelle.lip6.fr/')
+
+    if coccinelle(args, get_config(args), os.environ) == 0:
+        parser.error(f'no coccinelle files found matching {args.cocci}')
+
+
+if __name__ == '__main__':
+    try:
+        main(sys.argv[1:])
+    except KeyboardInterrupt:
+        sys.exit("Execution aborted")
+    except Exception as exc:
+        sys.exit(str(exc))
-- 
2.43.0
