I had some spare time over the holidays, so I spent some of it on something I've been thinking about for a while.

For those of you who are not aware of it: Coccinelle is a tool for pattern matching and text transformation for C code. It can be used to detect problematic programming patterns and to make complex, tree-wide patches easy. It is aware of the structure of C code and is better suited to complicated changes than normal text substitution tools like sed and Perl.

Coccinelle has been used successfully in the Linux project since 2008 and is now an established tool for Linux development. A large number of semantic patches have been added to the source tree to capture everything from generic issues (like eliminating the redundant A in expressions like "!A || (A && B)") to more Linux-specific problems like adding a missing call to kfree().

Although PostgreSQL is nowhere near the size of the Linux kernel, it is nevertheless of a significant size and would benefit from incorporating Coccinelle into the development process. I noticed it's been used in a few cases way back (like 10 years back) to fix issues in the PostgreSQL code, but I thought it might be useful to make it part of normal development practice to, among other things:

- Identify and correct bugs in the source code both during development and review.
- Make large-scale changes to the source tree to improve the code based on new insights.
- Encode and enforce APIs by ensuring that function calls are used correctly.
- Use improved coding patterns for more efficient code.
- Allow extensions to automatically update code for later PostgreSQL versions.

To that end, I created a series of patches to show how it could be used in the PostgreSQL tree. It is a lot easier to discuss concrete code, and I split it up into separate messages since that makes it easier to discuss each individual patch.
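
As a small aside on the generic example mentioned above: a quick way to convince yourself that the A in "!A || (A && B)" is redundant is to enumerate the truth table. The snippet below is only an illustration in Python and not part of the patch series; the function names are made up for this example.

```python
from itertools import product


def original(a: bool, b: bool) -> bool:
    """The expression as it might appear in the source: !A || (A && B)."""
    return (not a) or (a and b)


def simplified(a: bool, b: bool) -> bool:
    """The same expression with the redundant A removed: !A || B."""
    return (not a) or b


def equivalent() -> bool:
    """Compare both forms over all four combinations of truth values."""
    return all(
        original(a, b) == simplified(a, b)
        for a, b in product([False, True], repeat=2)
    )


if __name__ == "__main__":
    print("equivalent:", equivalent())
```

A semantic patch can apply this kind of rewrite across the whole tree because it matches on expression structure rather than on text.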

The series contains code to make it easy to work with Coccinelle during development and reviews, as well as examples of semantic patches that capture problems, demonstrate how to make large-scale changes, show how to enforce APIs, and improve some coding patterns.

This first patch contains the coccicheck.py script, which is a re-implementation of the coccicheck script that the Linux kernel uses. We cannot use the Linux coccicheck script directly since it is quite closely tied to the Linux source code tree, and we need something that supports both autoconf and Meson. Since Python seems to be used more and more in the tree, it seems to be the most natural choice. (I have no strong opinion on what language to use, but think it would be good to have something that is as platform-independent as possible.)

The intention is that we should be able to use the Linux semantic patches directly, so it supports the "Requires" and "Options" keywords, which can be used to require a specific version of spatch(1) and add options to the execution of that semantic patch, respectively.

-- 
Best wishes,
Mats Kindahl, Timescale
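
To make the "Requires" and "Options" handling described above concrete, here is a minimal sketch of how such header lines could be read and checked. It is an illustration under stated assumptions, not the code in the patch: the helper names (parse_metadata_text, version_tuple, is_supported, extra_options) and the sample header are made up for this example, and the version comparison uses a plain tuple of integers rather than a full version parser.

```python
import re
import shlex

# Made-up header of a semantic patch, for illustration only.
SAMPLE = """\
// Confidence: High
// Options: --no-includes --include-headers
// Requires: 1.0.4
"""

# Match the keyword anywhere in the line, since it sits inside a comment.
KEYWORD_RE = re.compile(r'\b(Options|Requires):\s*(.*)', re.IGNORECASE)


def parse_metadata_text(text):
    """Collect "Options:" and "Requires:" values from comment lines."""
    metadata = {}
    for line in text.splitlines():
        match = KEYWORD_RE.search(line)
        if match:
            metadata[match.group(1).lower()] = match.group(2).strip()
    return metadata


def version_tuple(text):
    """Turn "1.0.4" into (1, 0, 4); assumes purely numeric components."""
    return tuple(int(part) for part in text.split('.'))


def is_supported(metadata, spatch_version):
    """Return True if spatch_version satisfies the "Requires:" line, if any."""
    required = metadata.get('requires')
    if required is None:
        return True
    return version_tuple(spatch_version) >= version_tuple(required)


def extra_options(metadata):
    """Split the "Options:" value into separate command-line arguments."""
    return shlex.split(metadata.get('options', ''))


if __name__ == '__main__':
    meta = parse_metadata_text(SAMPLE)
    print(meta)
    print(is_supported(meta, '1.1.1'), extra_options(meta))
```

Splitting the "Options:" value into separate arguments matters because the command is passed to spatch as a list, so a single string holding several flags would be seen as one argument.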

From 55f5caba3d6cb88e3729985571286c16171f36b3 Mon Sep 17 00:00:00 2001
From: Mats Kindahl <m...@kindahl.net>
Date: Sun, 29 Dec 2024 19:35:58 +0100
Subject: Add initial coccicheck script

The coccicheck.py script can be used to run several semantic patches on
a source tree to either generate a report, see the context of the
modification (which lines require changes), or generate a patch to
correct an issue.

    python coccicheck.py <options> <pattern> <path> ...

Options:

--spatch=SPATCH
    Path to spatch binary. Defaults to value of environment variable
    SPATCH.

--mode={report,context,patch}
    Defaults to value of environment variable MODE.

<pattern>
    Pattern for all semantic patches to match. For example,
    src/tools/cocci/**/*.cocci to match all *.cocci files in the
    directory src/tools/cocci.

<path>
    Path to source files to apply semantic patches to.
---
 src/tools/coccicheck.py | 176 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 176 insertions(+)
 create mode 100755 src/tools/coccicheck.py

diff --git a/src/tools/coccicheck.py b/src/tools/coccicheck.py
new file mode 100755
index 00000000000..1fe136b307f
--- /dev/null
+++ b/src/tools/coccicheck.py
@@ -0,0 +1,176 @@
+#!/usr/bin/env python3
+
+"""Run Coccinelle on a set of files and directories.
+
+This is a re-written version of the Linux ``coccicheck`` script.
+
+Coccicheck can run in three different modes (the original has four
+different modes):
+
+- *patch*: patch files using the cocci file.
+
+- *report*: report any improvements that this script can
+  make, but do not show any patch.
+
+- *context*: show the context where the patch can be applied.
+
+The program will take a pattern for cocci files and call spatch(1) on
+each with a set of paths that can be either files or directories.
+
+When starting, the cocci file will be parsed and any lines containing
+"Options:" or "Requires:" will be treated specially.
+
+- Lines containing "Options:" will have a list of options to add to
+  the call of the spatch(1) program. These options will be added last.
+
+- Lines containing "Requires:" can contain a version of spatch(1) that
+  is required for this cocci file. If the version requirements are not
+  satisfied, the file will not be used.
+
+When calling spatch(1), it will set the virtual rule "patch",
+"report", or "context" and the cocci file can use these to act
+differently depending on the mode.
+
+You need to set the following environment variables to control the
+default:
+
+SPATCH: Path to spatch program. This will be used if no path is
+  passed using the option --spatch.
+
+You may set the following environment variables:
+
+SPFLAGS: Extra flags to use when calling spatch. These will be
+  added last.
+
+"""
+
+import argparse
+import os
+import sys
+import subprocess
+import re
+
+from pathlib import PurePath, Path
+from packaging import version
+
+VERSION_CRE = re.compile(
+    r'spatch version (\S+) compiled with OCaml version (\S+)'
+)
+
+
+def parse_metadata(cocci_file):
+    """Parse metadata in Cocci file."""
+    metadata = {}
+    with open(cocci_file) as fh:
+        for line in fh:
+            mre = re.search(r'(Options|Requires):(.*)', line, re.IGNORECASE)
+            if mre:
+                metadata[mre.group(1).lower()] = mre.group(2)
+    return metadata
+
+
+def get_config(args):
+    """Compute configuration information."""
+    # Figure out spatch version. We just need to read the first line
+    config = {}
+    cmd = [args.spatch, '--version']
+    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
+        for line in proc.stdout:
+            mre = VERSION_CRE.match(line)
+            if mre:
+                config['spatch_version'] = mre.group(1)
+                break
+    return config
+
+
+def run_spatch(cocci_file, args, config, env):
+    """Run coccinelle on the provided file."""
+    if args.verbose > 1:
+        print("processing cocci file", cocci_file)
+    spatch_version = config['spatch_version']
+    metadata = parse_metadata(cocci_file)
+
+    # Check that we have a valid version
+    if 'requires' in metadata:
+        required_version = version.parse(metadata['requires'].strip())
+        if version.parse(spatch_version) < required_version:
+            print(
+                f'Skipping SmPL patch {cocci_file}: '
+                f'requires {required_version} (had {spatch_version})'
+            )
+            return
+
+    command = [
+        args.spatch,
+        "-D", args.mode,
+        "--cocci-file", cocci_file,
+        "--very-quiet",
+    ]
+
+    if 'options' in metadata:
+        command.extend(metadata['options'].split())
+    if args.mode == 'report':
+        command.append('--no-show-diff')
+    if args.spflags:
+        command.extend(args.spflags.split())
+
+    sp = subprocess.run(command + args.path, env=env)
+    if sp.returncode != 0:
+        sys.exit(sp.returncode)
+
+
+def coccinelle(args, config, env):
+    """Run coccinelle on all files matching the provided pattern."""
+    root = '/' if PurePath(args.cocci).is_absolute() else '.'
+    count = 0
+    for cocci_file in Path(root).glob(args.cocci):
+        count += 1
+        run_spatch(cocci_file, args, config, env)
+    return count
+
+
+def main(argv):
+    """Run coccicheck."""
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--verbose', '-v', action='count', default=0)
+    parser.add_argument('--spatch', type=PurePath, metavar='SPATCH',
+                        default=os.environ.get('SPATCH'),
+                        help=('Path to spatch binary. Defaults to '
+                              'value of environment variable SPATCH.'))
+    parser.add_argument('--spflags', type=str,
+                        metavar='SPFLAGS',
+                        default=os.environ.get('SPFLAGS', None),
+                        help=('Flags to pass to spatch call. Defaults '
+                              'to value of environment variable SPFLAGS.'))
+    parser.add_argument('--mode', choices=['patch', 'report', 'context'],
+                        default=os.environ.get('MODE', 'report'),
+                        help=('Mode to use for coccinelle. Defaults to '
+                              'value of environment variable MODE.'))
+    parser.add_argument('--include', '-I', type=PurePath,
+                        metavar='DIR',
+                        help='Extra include directories.')
+    parser.add_argument('cocci', metavar='pattern',
+                        help='Pattern for Cocci files to use.')
+    parser.add_argument('path', nargs='+', type=PurePath,
+                        help='Directory or source path to process.')
+
+    args = parser.parse_args(argv)
+
+    if args.verbose > 1:
+        print("arguments:", args)
+
+    if args.spatch is None:
+        parser.error('spatch is part of the Coccinelle project and is '
+                     'available at http://coccinelle.lip6.fr/')
+
+    if coccinelle(args, get_config(args), os.environ) == 0:
+        parser.error(f'no coccinelle files found matching {args.cocci}')
+
+
+if __name__ == '__main__':
+    try:
+        main(sys.argv[1:])
+    except KeyboardInterrupt:
+        print("Execution aborted")
+    except Exception as exc:
+        print(exc)
-- 
2.43.0