Hi, On 2023-10-18 19:18:13 -0700, Andres Freund wrote: > On 2023-10-18 21:29:37 -0400, Tom Lane wrote: > Ah, I see. If I interpret that correctly, the code filters out symbols it > doesn't find in in some .[chly] file in the *source* directory. This code is, > uh, barely readable and massively underdocumented. > > I guess I need to reimplement that :/. Don't immediately see how this could > be implemented for in-tree autoconf builds... > > > > But in the attached patch I've implemented this slightly differently. If > > > the > > > tooling to do so is available, the indent-* targets explained above, > > > use/depend on src/tools/pgindent/typedefs.list.merged (in the build dir), > > > which is the combination of a src/tools/pgindent/typedefs.list.local > > > generated > > > for the local binaries/libraries and the source tree > > > src/tools/pgindent/typedefs.list. > > > > Hmm ... that allows indenting your C files, but how do you get from that > > to a non-noisy patch to commit to typedefs.list? > > It doesn't provide that on its own. Being able to painlessly indent the files > seems pretty worthwhile already. But clearly it'd much better if we can > automatically update typedefs.list.
With code for that added, things seem to work quite nicely. I added similar logic to the buildfarm code that builds a list of all tokens in the source code. With that, the in-tree typedefs.list can be updated with new tokens found locally *and* typdefs that aren't used anymore can be removed from the in-tree typedefs.list (detected by no matching tokens found in the source code). The only case this approach can't handle is newly referenced typedefs in code that isn't built locally - which I think isn't particularly common and seems somewhat fundamental. In those cases typedefs.list still can be updated manually (and the sorting will still be "fixed" if necessary). The code is still in a somewhat rough shape and I'll not finish polishing it today. I've attached the code anyway, don't be too rough :). The changes from running "ninja update-typedefs indent-tree" on debian and macos are attached as 0004 - the set of changes looks quite correct to me. The buildfarm code filtered out a few typedefs manually: push(@badsyms, 'date', 'interval', 'timestamp', 'ANY'); but I don't really see why? Possibly that was needed with an older pg_bsd_indent to prevent odd stuff? Right now building a new unified typedefs.list and copying it to the source tree are still separate targets, but that probably makes less sense now? Or perhaps it should be copied to the source tree when reindenting? I've only handled linux and macos in the typedefs gathering code. But the remaining OSs should be "just a bit of work" [TM]. Greetings, Andres Freund
>From f7520fd87764cddbde9d6c5d25fdfe5314ec5cbc Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Sat, 27 May 2023 11:38:27 -0700 Subject: [PATCH v4 1/4] Add meson targets to run pgindent Author: Reviewed-By: Discussion: https://postgr.es/m/ Backpatch: --- meson.build | 38 +++++++++++++++++++++++++++++++++++++ src/tools/pgindent/pgindent | 12 ++++++++++-- 2 files changed, 48 insertions(+), 2 deletions(-) diff --git a/meson.build b/meson.build index 862c955453f..2cfeb60eb07 100644 --- a/meson.build +++ b/meson.build @@ -3013,6 +3013,44 @@ run_target('install-test-files', +############################################################### +# Indentation and similar targets +############################################################### + +indent_base_cmd = [perl, files('src/tools/pgindent/pgindent'), + '--indent', pg_bsd_indent.full_path(), + '--sourcetree=@SOURCE_ROOT@'] +indent_depend = [pg_bsd_indent] + +# reindent the entire tree +# Reindent the entire tree +run_target('indent-tree', + command: indent_base_cmd + ['.'], + depends: indent_depend +) + +# Reindent the last commit +run_target('indent-head', + command: indent_base_cmd + ['--commit=HEAD'], + depends: indent_depend +) + +# Silently check if any file is not corectly indented (fails the build if +# there are differences) +run_target('indent-check', + command: indent_base_cmd + ['--silent-diff', '.'], + depends: indent_depend +) + +# Show a diff of all the unindented files (doesn't fail the build if there are +# differences) +run_target('indent-diff', + command: indent_base_cmd + ['--show-diff', '.'], + depends: indent_depend +) + + + ############################################################### # Test prep ############################################################### diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent index bce63d95daf..08b0c95814f 100755 --- a/src/tools/pgindent/pgindent +++ b/src/tools/pgindent/pgindent @@ -23,7 +23,7 @@ my $devnull = File::Spec->devnull; my ($typedefs_file, $typedef_str, @excludes, $indent, $build, $show_diff, - $silent_diff, $help, @commits,); + $silent_diff, $sourcetree, $help, @commits,); $help = 0; @@ -35,7 +35,8 @@ my %options = ( "excludes=s" => \@excludes, "indent=s" => \$indent, "show-diff" => \$show_diff, - "silent-diff" => \$silent_diff,); + "silent-diff" => \$silent_diff, + "sourcetree=s" => \$sourcetree); GetOptions(%options) || usage("bad command line argument"); usage() if $help; @@ -53,6 +54,13 @@ $typedefs_file ||= $ENV{PGTYPEDEFS}; # get indent location for environment or default $indent ||= $ENV{PGINDENT} || $ENV{INDENT} || "pg_bsd_indent"; +# When invoked from the build directory, change into source tree, otherwise +# the heuristics in locate_sourcedir() don't work. +if (defined $sourcetree) +{ + chdir $sourcetree; +} + my $sourcedir = locate_sourcedir(); # if it's the base of a postgres tree, we will exclude the files -- 2.38.0
>From dfe5ad9229e812d0f805d6a39cb296b9284e30f2 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Wed, 18 Oct 2023 16:18:58 -0700 Subject: [PATCH v4 2/4] heavily wip: update typedefs --- meson.build | 14 ++++- src/backend/jit/llvm/meson.build | 2 +- src/backend/snowball/meson.build | 2 +- src/timezone/meson.build | 2 +- src/tools/pgindent/merge_typedefs | 86 +++++++++++++++++++++++++++ src/tools/pgindent/meson.build | 69 +++++++++++++++++++++ src/tools/pgindent/typedefs_from_objs | 42 +++++++++++++ 7 files changed, 213 insertions(+), 4 deletions(-) create mode 100755 src/tools/pgindent/merge_typedefs create mode 100644 src/tools/pgindent/meson.build create mode 100755 src/tools/pgindent/typedefs_from_objs diff --git a/meson.build b/meson.build index 2cfeb60eb07..492ce9a48b1 100644 --- a/meson.build +++ b/meson.build @@ -2579,6 +2579,7 @@ add_project_link_arguments(ldflags, language: ['c', 'cpp']) # list of targets for various alias targets backend_targets = [] +data_targets = [] bin_targets = [] pl_targets = [] contrib_targets = [] @@ -2894,6 +2895,8 @@ subdir('src/interfaces/ecpg/test') subdir('doc/src/sgml') +subdir('src/tools/pgindent') + generated_sources_ac += {'': ['GNUmakefile']} # After processing src/test, add test_install_libs to the testprep_targets @@ -2982,6 +2985,7 @@ endif all_built = [ backend_targets, bin_targets, + data_targets, libpq_st, pl_targets, contrib_targets, @@ -3017,12 +3021,20 @@ run_target('install-test-files', # Indentation and similar targets ############################################################### +# If the dependencies for generating a local typedefs.list are fulfilled, we +# use a combination of a locally built and the source tree's typededefs.list +# file (see src/tools/pgindent/meson.build) for reindenting. That ensures +# newly added typedefs are indented correctly. indent_base_cmd = [perl, files('src/tools/pgindent/pgindent'), '--indent', pg_bsd_indent.full_path(), '--sourcetree=@SOURCE_ROOT@'] indent_depend = [pg_bsd_indent] -# reindent the entire tree +if typedefs_supported + indent_base_cmd += ['--typedefs', typedefs_merged.full_path()] + indent_depend += typedefs +endif + # Reindent the entire tree run_target('indent-tree', command: indent_base_cmd + ['.'], diff --git a/src/backend/jit/llvm/meson.build b/src/backend/jit/llvm/meson.build index 8ffaf414609..7e2f6d5bd5c 100644 --- a/src/backend/jit/llvm/meson.build +++ b/src/backend/jit/llvm/meson.build @@ -82,4 +82,4 @@ llvmjit_types = custom_target('llvmjit_types.bc', install_dir: dir_lib_pkg, depfile: '@basen...@.c.bc.d', ) -backend_targets += llvmjit_types +data_targets += llvmjit_types diff --git a/src/backend/snowball/meson.build b/src/backend/snowball/meson.build index 0f669c0bf3c..69e777399fd 100644 --- a/src/backend/snowball/meson.build +++ b/src/backend/snowball/meson.build @@ -94,4 +94,4 @@ install_subdir('stopwords', ) backend_targets += dict_snowball -backend_targets += snowball_create +data_targets += snowball_create diff --git a/src/timezone/meson.build b/src/timezone/meson.build index 7b85a01c6bd..779a51d85a4 100644 --- a/src/timezone/meson.build +++ b/src/timezone/meson.build @@ -50,7 +50,7 @@ if get_option('system_tzdata') == '' install_dir: dir_data, ) - bin_targets += tzdata + data_targets += tzdata endif subdir('tznames') diff --git a/src/tools/pgindent/merge_typedefs b/src/tools/pgindent/merge_typedefs new file mode 100755 index 00000000000..9826fd1a675 --- /dev/null +++ b/src/tools/pgindent/merge_typedefs @@ -0,0 +1,86 @@ +#!/usr/bin/env python3 + +# Tool to build a new typedefs.list file from +# a) the current typedefs.list file in the source tree +# b) newly generated typedefs.list fragments for build products +# c) the source code +# +# To avoid littering the typedefs file with typedefs that are not in our source +# code, the list of typedefs found is filtered with tokens found in source +# files. +# +# If the local build references a typedef that's not present in the source +# typedefs.list, we can discover such typedefs. +# +# If the source typedefs.list references a typedef that is not used anymore, it +# will not appear in the source tree anymore and thus can be filtered out. +# +# However, typedefs in newly written code for a different platform (or a +# disabled build option), will not be discovered. + +import argparse +import os +import subprocess +import sys +import re + +parser = argparse.ArgumentParser( + description='build new typedefs files from multiple inputs') + +parser.add_argument('--output-local', type=argparse.FileType('w'), required=True) +parser.add_argument('--output-merged', type=argparse.FileType('w'), required=True) +parser.add_argument('--current-typedefs', type=argparse.FileType('r'), required=True) +parser.add_argument('--filter-source', type=str, required=False) +parser.add_argument('input', type=argparse.FileType('r'), nargs='*') + + +def merge_typedefs(files): + typedefs = set() + for input in files: + for typedef in input.readlines(): + typedefs.add(typedef.strip()) + return typedefs + +def source_list(srcdir): + filelist = [] + os.chdir(srcdir) + accepted_files = re.compile(r'\.(c|h|l|y|.cpp)$') + for root, dirs, files in os.walk('.', topdown=True): + # don't go into hidden dirs or docs + for d in dirs: + if d.startswith('.') or d == 'doc': + dirs.remove(d) + for f in files: + if accepted_files.search(f): + filelist.append((os.path.join(root, f))) + return filelist + +def tokenize_sources(files): + comment_sub_re = re.compile(r'/\*.*?\*/', flags=re.MULTILINE|re.DOTALL) + token_find_re = re.compile(r'\b\w+\b') + + tokens = set() + for file in files: + with open(file, 'r') as fh: + content = fh.read() + content = comment_sub_re.sub('', content) + tokens.update(token_find_re.findall(content)) + return tokens + +def main(args): + local_typedefs = merge_typedefs(args.input) + current_typedefs = merge_typedefs([args.current_typedefs]) + + if args.filter_source: + filelist = source_list(args.filter_source) + tokens = tokenize_sources(filelist) + local_typedefs.intersection_update(tokens) + current_typedefs.intersection_update(tokens) + + current_typedefs.update(local_typedefs) + print('\n'.join(sorted(local_typedefs)), file=args.output_local) + print('\n'.join(sorted(current_typedefs)), file=args.output_merged) + +if __name__ == '__main__': + main(parser.parse_args()) + sys.exit(0) diff --git a/src/tools/pgindent/meson.build b/src/tools/pgindent/meson.build new file mode 100644 index 00000000000..de02c676a46 --- /dev/null +++ b/src/tools/pgindent/meson.build @@ -0,0 +1,69 @@ +# Currently the code requires meson 0.63 or upwards. This could likely be +# lifted (in fact, some older versions actually work, but only some), but for +# now it's not crucial. +typedefs_supported = meson.version().version_compare('>=0.63') +if not typedefs_supported + subdir_done() +endif + +typedefs_from_objs = files('typedefs_from_objs') +typedefs_from_objs_cmd = [typedefs_from_objs, '--host', host_system, '--output', '@OUTPUT@', '@INPUT@'] +merge_typedefs = files('merge_typedefs') + +# XXX: This list of targets should likely not be maintained here +typedef_src_tgts = [ + backend_targets, + bin_targets, + [libpq_st], + pl_targets, + contrib_targets, + ecpg_targets, +] + +# We generate partial typedefs files for each binary/library. That makes using +# this during incremental development much faster. +# +# The reason we process the object files instead of executables is that that +# doesn't work on macos. There doesn't seem to be a reason to target +# executables directly on other platforms. +typedef_tgts = [] +foreach tgts : typedef_src_tgts + foreach tgt : tgts + # can't use tgt.name(), as we have have targets that differ just in suffix + name = fs.name(tgt.full_path()) + tdname = 'typedefs.list.local-@0@'.format(name) + objs = tgt.extract_all_objects(recursive: true) + typedef_tgts += custom_target(tdname, + output: tdname, + command: typedefs_from_objs_cmd, + input: objs, + build_by_default: false, + ) + + endforeach +endforeach + + +# Build new typedefs.list files. typedefs.list.local will contain only +# typedefs in the local build, whereas typedefs.list.merged is the combination +# of typedefs.list.local and the source tree's typedefs.list. However, +# typedefs that are not used in the code anymore are removed even from +# typedefs.list.merged. +typedefs = custom_target('typedefs.list', output: ['typedefs.list.local', 'typedefs.list.merged'], + command: [merge_typedefs, + '--output-local', '@OUTPUT0@', + '--output-merged', '@OUTPUT1@', + '--current-typedefs', files('typedefs.list'), + '--filter-source', '@SOURCE_ROOT@', + '@INPUT@'], + input: typedef_tgts, + build_by_default: false, +) +typedefs_local = typedefs[0] +typedefs_merged = typedefs[1] + +if cp.found() + run_target('update-typedefs', + command: [cp, typedefs_merged, '@SOURCE_ROOT@/src/tools/pgindent/typedefs.list'], + ) +endif diff --git a/src/tools/pgindent/typedefs_from_objs b/src/tools/pgindent/typedefs_from_objs new file mode 100755 index 00000000000..821c84cb3d8 --- /dev/null +++ b/src/tools/pgindent/typedefs_from_objs @@ -0,0 +1,42 @@ +#!/usr/bin/env python3 + +# Tool to extract typedefs from object files. + +import argparse +import re +import shutil +import subprocess +import sys + +parser = argparse.ArgumentParser( + description='generate typedefs file for a set of object files') + +parser.add_argument('--output', type=argparse.FileType('w'), required=True) +parser.add_argument('--host', type=str, required=True) +parser.add_argument('input', nargs='*') + +args = parser.parse_args() + +if args.host == 'linux': + find_td_re = re.compile(r'^[^\n]+TAG_typedef\)\n[^\n]+DW_AT_name\s*:\s*\([^\)]+\): ([^\n]+)$', re.MULTILINE) + # FIXME: should probably be set by the caller? Except that the same binary + # name behaves very differently on different platforms :/ + cmd = [shutil.which('objdump'), '-Wi'] +elif args.host == 'darwin': + find_td_re = re.compile(r'^[^\n]+TAG_typedef\n\s*DW_AT_type[^\n]+\n\s+DW_AT_name\s*\(\"([^\n]+)\"\)$', re.MULTILINE) + cmd = [shutil.which('dwarfdump')] +else: + raise f'unsupported platform: {args.host}' + +lcmd = cmd + args.input +sp = subprocess.run(lcmd, stdout=subprocess.PIPE, universal_newlines=True) +if sp.returncode != 0: + print(f'{lcmd} failed with return code {sp.returncode}', file=sys.stderr) + sys.exit(sp.returncode) + +fa = find_td_re.findall(sp.stdout) + +for typedef in fa: + print(typedef, file=args.output) + +sys.exit(0) -- 2.38.0
>From 017183e75c7a633dedabdbf57aa1ac80ee955f93 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Wed, 18 Oct 2023 21:40:36 -0700 Subject: [PATCH v4 3/4] WIP: meson: nitpick mode + test checking indentation Author: Reviewed-by: Discussion: https://postgr.es/m/ Backpatch: --- meson.build | 20 +++++++++++++++++--- meson_options.txt | 3 +++ .cirrus.tasks.yml | 1 + 3 files changed, 21 insertions(+), 3 deletions(-) diff --git a/meson.build b/meson.build index 492ce9a48b1..854b7ecf56c 100644 --- a/meson.build +++ b/meson.build @@ -44,6 +44,7 @@ cc = meson.get_compiler('c') not_found_dep = dependency('', required: false) thread_dep = dependency('threads') auto_features = get_option('auto_features') +nitpick_mode = get_option('nitpick') @@ -3025,16 +3026,18 @@ run_target('install-test-files', # use a combination of a locally built and the source tree's typededefs.list # file (see src/tools/pgindent/meson.build) for reindenting. That ensures # newly added typedefs are indented correctly. -indent_base_cmd = [perl, files('src/tools/pgindent/pgindent'), +indent_base_args = [files('src/tools/pgindent/pgindent'), '--indent', pg_bsd_indent.full_path(), - '--sourcetree=@SOURCE_ROOT@'] + '--sourcetree=@0@'.format(meson.current_source_dir())] indent_depend = [pg_bsd_indent] if typedefs_supported - indent_base_cmd += ['--typedefs', typedefs_merged.full_path()] + indent_base_args += ['--typedefs', typedefs_merged.full_path()] indent_depend += typedefs endif +indent_base_cmd = [perl, indent_base_args] + # Reindent the entire tree run_target('indent-tree', command: indent_base_cmd + ['.'], @@ -3364,6 +3367,17 @@ if meson.version().version_compare('>=0.57') endif +# If we were asked to be nitpicky, add a test that fails if sources aren't +# properly indented +if nitpick_mode + test('indent-check', + perl, args: [indent_base_args, '--silent-diff', '.'], + depends: indent_depend, + suite: 'source', + priority: 10) +endif + + ############################################################### # Pseudo targets diff --git a/meson_options.txt b/meson_options.txt index d2f95cfec36..90cb2177a44 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -52,6 +52,9 @@ option('atomics', type: 'boolean', value: true, option('spinlocks', type: 'boolean', value: true, description: 'Use spinlocks') +option('nitpick', type: 'boolean', value: false, + description: 'annoy the hell out of you, committers ought to enable this') + # Compilation options diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml index e137769850d..5dd29bed4cb 100644 --- a/.cirrus.tasks.yml +++ b/.cirrus.tasks.yml @@ -358,6 +358,7 @@ task: meson setup \ --buildtype=debug \ -Dcassert=true \ + -Dnitpick=true \ ${LINUX_MESON_FEATURES} \ -DPG_TEST_EXTRA="$PG_TEST_EXTRA" \ build -- 2.38.0
>From 7e06ca28697b365561371b2843d88c4caaccd347 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Wed, 18 Oct 2023 21:31:22 -0700 Subject: [PATCH v4 4/4] Update typedefs, reindent using the new build targets >From debian sid and macos ventura. --- contrib/sepgsql/hooks.c | 2 +- contrib/sepgsql/label.c | 2 +- contrib/sepgsql/uavc.c | 2 +- src/tools/pgindent/typedefs.list | 14 +++++++++++--- 4 files changed, 14 insertions(+), 6 deletions(-) diff --git a/contrib/sepgsql/hooks.c b/contrib/sepgsql/hooks.c index fa734763602..443e30be994 100644 --- a/contrib/sepgsql/hooks.c +++ b/contrib/sepgsql/hooks.c @@ -50,7 +50,7 @@ typedef struct * command. Elsewhere (including the case of default) NULL. */ const char *createdb_dtemplate; -} sepgsql_context_info_t; +} sepgsql_context_info_t; static sepgsql_context_info_t sepgsql_context_info; diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c index 38ff4068ff2..e13640dc4d9 100644 --- a/contrib/sepgsql/label.c +++ b/contrib/sepgsql/label.c @@ -67,7 +67,7 @@ typedef struct { SubTransactionId subid; char *label; -} pending_label; +} pending_label; /* * sepgsql_get_client_label diff --git a/contrib/sepgsql/uavc.c b/contrib/sepgsql/uavc.c index 6e3a892da24..6f63a19aa96 100644 --- a/contrib/sepgsql/uavc.c +++ b/contrib/sepgsql/uavc.c @@ -44,7 +44,7 @@ typedef struct /* true, if tcontext is valid */ char *ncontext; /* temporary scontext on execution of trusted * procedure, or NULL elsewhere */ -} avc_cache; +} avc_cache; /* * Declaration of static variables diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index e69bb671bf8..d6178386252 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -2,6 +2,7 @@ ACCESS_ALLOWED_ACE ACL ACL_SIZE_INFORMATION AFFIX +ANY ASN1_INTEGER ASN1_OBJECT ASN1_OCTET_STRING @@ -1275,9 +1276,9 @@ JsonManifestWALRangeField JsonObjectAgg JsonObjectConstructor JsonOutput -JsonParseExpr JsonParseContext JsonParseErrorType +JsonParseExpr JsonPath JsonPathBool JsonPathExecContext @@ -1913,7 +1914,6 @@ ParallelHashJoinBatch ParallelHashJoinBatchAccessor ParallelHashJoinState ParallelIndexScanDesc -ParallelReadyList ParallelSlot ParallelSlotArray ParallelSlotResultHandler @@ -3135,6 +3135,7 @@ _ino_t _locale_t _resultmap _stringlist +access_vector_t acquireLocksOnSubLinks_context add_nulling_relids_context adjust_appendrel_attrs_context @@ -3167,6 +3168,7 @@ assign_collations_context auth_password_hook_typ autovac_table av_relation +avc_cache avl_dbase avl_node avl_tree @@ -3247,6 +3249,7 @@ crosstab_HashEnt crosstab_cat_desc datapagemap_iterator_t datapagemap_t +date dateKEY datetkn dce_uuid_t @@ -3402,6 +3405,7 @@ indexed_tlist inet inetKEY inet_struct +initRowMethod init_function inline_cte_walker_context inline_error_callback_arg @@ -3418,6 +3422,7 @@ int64 int64KEY int8 internalPQconninfoOption +interval intptr_t intset_internal_node intset_leaf_node @@ -3519,6 +3524,7 @@ parse_error_callback_arg parser_context partition_method_t pendingPosition +pending_label pgParameterStatus pg_atomic_flag pg_atomic_uint32 @@ -3710,7 +3716,9 @@ saophash_hash save_buffer scram_state scram_state_enum +security_class_t sem_t +sepgsql_context_info_t sequence_magic set_join_pathlist_hook_type set_rel_pathlist_hook_type @@ -3792,6 +3800,7 @@ time_t timeout_handler_proc timeout_params timerCA +timestamp tlist_vinfo toast_compress_header tokenize_error_callback_arg @@ -3869,7 +3878,6 @@ wchar2mb_with_len_converter wchar_t win32_deadchild_waitinfo wint_t -worker_spi_state worker_state worktable wrap -- 2.38.0