SECCOMP ("SECure COMPuting with filters") is a Linux kernel syscall
filtering mechanism that reduces the kernel attack surface by blocking
(or at least audit-logging) syscalls that are normally unused.

Quoting from this link:
https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt

   "A large number of system calls are exposed to every userland process
    with many of them going unused for the entire lifetime of the
    process. As system calls change and mature, bugs are found and
    eradicated. A certain subset of userland applications benefit by
    having a reduced set of available system calls. The resulting set
    reduces the total kernel surface exposed to the application. System
    call filtering is meant for use with those applications."

Recent security best practices recommend, and certain highly
security-conscious organizations are beginning to require, that SECCOMP
be used to the extent possible. The major web browsers, container
runtime engines, and systemd are all examples of software that already
supports seccomp.

---------
A seccomp (bpf) filter consists of a default action and a set of
rules with actions pertaining to specific syscalls (possibly with even
more specific sets of arguments). Once loaded into the kernel, a filter
is inherited by all child processes and cannot be removed. It can,
however, be overlaid with another filter. For any given syscall match,
the most restrictive (a.k.a. highest precedence) action will be taken by
the kernel. PostgreSQL has already been run "in the wild" under seccomp
control in containers, and possibly under systemd. Adding seccomp support
to PostgreSQL itself mitigates issues with these approaches, and has
several advantages:

* Container seccomp filters tend to be extremely broad/permissive,
  typically allowing about 6 out of 7 syscalls. They must do this
  because the use cases for containers vary widely.
* systemd does not implement seccomp filters by default. Packagers may
  decide to do so, but there is no guarantee. Adding them post-install
  potentially requires cooperation from groups outside the control of
  the database admins.
* In the container and systemd cases there is no particularly good way to
  inspect what filters are active. It is possible to observe actions
  taken, but again, control is possibly outside the database admin
  group. For example, the best way to understand what happened is to
  review the auditd log, which is likely not readable by the DBA.
* With built-in support, it is possible to lock down backend processes
  more tightly than the postmaster.
* With built-in support, it is possible to lock down different backend
  processes differently from one another, for example by using ALTER ROLE
  ... SET or ALTER DATABASE ... SET.
* With built-in support, it is possible to calculate and return (in the
  form of an SRF) the effective filters being applied to the postmaster
  and the current backend.
* With built-in support, it could be possible (this part not yet
  implemented) to have separate filters for different backend types,
  e.g. autovac workers, background writer, etc.
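
As a rough illustration of the overlay rule described above (for any given
syscall, the most restrictive action wins), the following shell sketch
models the precedence order allow < log < error < kill that the patch
documentation lists. The function names action_rank and effective_action
are hypothetical; they are not part of the patch or of libseccomp, and the
kernel applies this rule itself.

```shell
# Hypothetical helpers for illustration only; not part of the patch.

# Map an action name to its precedence rank (higher = more restrictive).
action_rank() {
    case "$1" in
        allow) echo 0 ;;
        log)   echo 1 ;;
        error) echo 2 ;;
        kill)  echo 3 ;;
        *)     echo "unknown action: $1" >&2; return 1 ;;
    esac
}

# Print the action that wins when a global and a session filter both
# match the same syscall: the one with the higher rank.
effective_action() {
    global_rank=$(action_rank "$1") || return 1
    session_rank=$(action_rank "$2") || return 1
    if [ "$session_rank" -gt "$global_rank" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

effective_action allow log    # prints "log"
effective_action kill allow   # prints "kill"
```

Under this rule a session filter can tighten, but never loosen, what the
postmaster's filter already imposes.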

---------
Attached is a patch for discussion, adding support for seccomp-bpf
(nowadays generally just called seccomp) syscall filtering at
configure-time using libseccomp. I would like to get this in shape to be
committed by the end of the November CF if possible.

The code itself has been through several rounds of revision based on
discussions I have had with the author of libseccomp as well as a few
other folks. However, at the moment the following gaps remain:

* Documentation - general discussion missing entirely
* No regression tests

---------
For convenience, here are a couple of additional links to relevant
information regarding seccomp:
https://en.wikipedia.org/wiki/Seccomp
https://github.com/seccomp/libseccomp

---------
Specific feedback requested:
1. Placement of pg_get_seccomp_filter() in
   src/backend/utils/adt/genfile.c
   originally made sense but after several rewrites no longer does.
   Ideas where it *should* go?
2. Where should a general discussion section go in the docs, if at all?
3. Currently this supports a global filter at the postmaster level,
   which is inherited by all child processes, and a secondary filter
   at the client backend session level. It likely makes sense to
   support secondary filters for other types of child processes,
   e.g. autovacuum workers, etc. Add that now (pg13), in a later release,
   or never?
4. What is the best way to approach testing of this feature? TAP
   testing perhaps?
5. Default GUC values - should we provide "starter" lists, or only a
   procedure for generating a list (as below)?

---------
Notes on usage:
===============
In order to determine your minimally required allow lists, do something
like the following on a non-production server with the same architecture
as production:

0. Setup:
 * install libseccomp, libseccomp-dev, and seccomp
 * install auditd if not already installed
 * configure postgres --with-seccomp and maybe --enable-tap-tests to
   improve feature coverage (see below)

1. Modify postgresql.conf and/or create <pg_source_dir>/postgresql_tmp.conf
8<--------------------
seccomp = on
global_syscall_default = allow
global_syscall_allow = ''
global_syscall_log = ''
global_syscall_error = ''
global_syscall_kill = ''
session_syscall_default = log
session_syscall_allow = '*'
session_syscall_log = '*'
session_syscall_error = '*'
session_syscall_kill = '*'
8<--------------------

2. Modify /etc/audit/auditd.conf
 * set disp_qos = lossless
 * set max_log_file_action = ignore

3. Stop auditd, clear out all audit.logs, start auditd:
 * systemctl stop auditd.service                        # if running
 * echo -n "" > /var/log/audit/audit.log
 * systemctl start auditd.service

4. Start/restart postgres.

5. Exercise postgres as much as possible (one or more of the following):
 * make installcheck-world
 * make check-world \
   EXTRA_REGRESS_OPTS=--temp-config=<pg_source_dir>/postgresql_tmp.conf
 * run your application through its paces
 * other random testing of relevant postgres features

  Note: at this point audit.log will start growing quickly. During
  `make check-world` mine grew to just under 1 GB.

6. Process results:
 a) systemctl stop auditd.service
 b) Run the provided "get_syscalls.sh" script
 c) Cut and paste the result as the value of session_syscall_allow.

7. Optional:
 a) global_syscall_default = 'log'
 b) Repeat steps 3-5
 c) Repeat steps 6a and 6b
 d) Cut and paste the result as the value of global_syscall_allow

8. Iterate steps 3-6b.
 * Output should be empty.
 * If there are any new syscalls, add to global_syscall_allow and
   session_syscall_allow.
 * Iterate until output of "get_syscalls.sh" script is empty.

9. Optional:
 * Change global and session defaults to "error" or "kill"
 * Reduce the allow lists if desired
 * This can be done for specific database users, by doing
   ALTER ROLE... SET session_syscall_allow to '<some reduced allow list>'

10. Adjust settings to taste, restart postgres, and monitor audit.log
    going forward.
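
The attached get_syscalls.sh is not reproduced here. As a minimal sketch
of the kind of processing step 6 involves, and assuming the audit records
of interest are type=SECCOMP lines carrying a syscall=<number> field (the
usual Linux audit record layout), the hypothetical function below collects
the distinct syscall numbers from a log. It is not the attached script;
seccomp_syscall_numbers is a name invented for this example.

```shell
# Hypothetical sketch; NOT the attached get_syscalls.sh.

# Read audit records on stdin and print the distinct syscall numbers
# found in SECCOMP records, one per line, in ascending numeric order.
seccomp_syscall_numbers() {
    grep 'type=SECCOMP' |
        grep -o ' syscall=[0-9][0-9]*' |
        cut -d= -f2 |
        sort -n -u
}
```

Something like `seccomp_syscall_numbers < /var/log/audit/audit.log` would
list the numbers. They could then be translated to names, for instance
with the scmp_sys_resolver utility that ships with libseccomp, before
being pasted into an allow list.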

Below are some values from my system. Note that I have made no attempt
thus far to do static code analysis -- this list was built by running
`make check-world` several times.
8<-------------------------
seccomp = on

global_syscall_default = log
global_syscall_allow = 'accept,access,bind,brk,chmod,clone,close,connect,dup,epoll_create1,epoll_ctl,epoll_wait,exit_group,fadvise64,fallocate,fcntl,fdatasync,fstat,fsync,ftruncate,futex,getdents,getegid,geteuid,getgid,getpeername,getpid,getppid,getrandom,getrusage,getsockname,getsockopt,getuid,ioctl,kill,link,listen,lseek,lstat,mkdir,mmap,mprotect,mremap,munmap,openat,pipe,poll,prctl,pread64,prlimit64,pwrite64,read,readlink,recvfrom,recvmsg,rename,rmdir,rt_sigaction,rt_sigprocmask,rt_sigreturn,seccomp,select,sendto,setitimer,set_robust_list,setsid,setsockopt,shmat,shmctl,shmdt,shmget,shutdown,socket,stat,statfs,symlink,sync_file_range,sysinfo,umask,uname,unlink,utime,wait4,write'
global_syscall_log = ''
global_syscall_error = ''
global_syscall_kill = ''

session_syscall_default = log
session_syscall_allow = 'access,brk,chmod,close,connect,epoll_create1,epoll_ctl,epoll_wait,exit_group,fadvise64,fallocate,fcntl,fdatasync,fstat,fsync,ftruncate,futex,getdents,getegid,geteuid,getgid,getpeername,getpid,getrandom,getrusage,getsockname,getsockopt,getuid,ioctl,kill,link,lseek,lstat,mkdir,mmap,mprotect,mremap,munmap,openat,poll,pread64,pwrite64,read,readlink,recvfrom,recvmsg,rename,rmdir,rt_sigaction,rt_sigprocmask,rt_sigreturn,select,sendto,setitimer,setsockopt,shutdown,socket,stat,symlink,sync_file_range,sysinfo,umask,uname,unlink,utime,write'
session_syscall_log = '*'
session_syscall_error = '*'
session_syscall_kill = '*'
8<-------------------------

That results in the following effective filters at the global and
session levels (distinguished by the "context" column):

8<-------------------------
select * from pg_get_seccomp_filter() order by 4,1;
     syscall     | syscallnum | filter_action  | context
-----------------+------------+----------------+---------
 accept          |         43 | global->allow  | global
 access          |         21 | global->allow  | global
 bind            |         49 | global->allow  | global
 brk             |         12 | global->allow  | global
 chmod           |         90 | global->allow  | global
 clone           |         56 | global->allow  | global
 close           |          3 | global->allow  | global
 connect         |         42 | global->allow  | global
 <default>       |         -1 | global->log    | global
 dup             |         32 | global->allow  | global
 epoll_create1   |        291 | global->allow  | global
 epoll_ctl       |        233 | global->allow  | global
 epoll_wait      |        232 | global->allow  | global
 exit_group      |        231 | global->allow  | global
 fadvise64       |        221 | global->allow  | global
 fallocate       |        285 | global->allow  | global
 fcntl           |         72 | global->allow  | global
 fdatasync       |         75 | global->allow  | global
 fstat           |          5 | global->allow  | global
 fsync           |         74 | global->allow  | global
 ftruncate       |         77 | global->allow  | global
 futex           |        202 | global->allow  | global
 getdents        |         78 | global->allow  | global
 getegid         |        108 | global->allow  | global
 geteuid         |        107 | global->allow  | global
 getgid          |        104 | global->allow  | global
 getpeername     |         52 | global->allow  | global
 getpid          |         39 | global->allow  | global
 getppid         |        110 | global->allow  | global
 getrandom       |        318 | global->allow  | global
 getrusage       |         98 | global->allow  | global
 getsockname     |         51 | global->allow  | global
 getsockopt      |         55 | global->allow  | global
 getuid          |        102 | global->allow  | global
 ioctl           |         16 | global->allow  | global
 kill            |         62 | global->allow  | global
 link            |         86 | global->allow  | global
 listen          |         50 | global->allow  | global
 lseek           |          8 | global->allow  | global
 lstat           |          6 | global->allow  | global
 mkdir           |         83 | global->allow  | global
 mmap            |          9 | global->allow  | global
 mprotect        |         10 | global->allow  | global
 mremap          |         25 | global->allow  | global
 munmap          |         11 | global->allow  | global
 openat          |        257 | global->allow  | global
 pipe            |         22 | global->allow  | global
 poll            |          7 | global->allow  | global
 prctl           |        157 | global->allow  | global
 pread64         |         17 | global->allow  | global
 prlimit64       |        302 | global->allow  | global
 pwrite64        |         18 | global->allow  | global
 read            |          0 | global->allow  | global
 readlink        |         89 | global->allow  | global
 recvfrom        |         45 | global->allow  | global
 recvmsg         |         47 | global->allow  | global
 rename          |         82 | global->allow  | global
 rmdir           |         84 | global->allow  | global
 rt_sigaction    |         13 | global->allow  | global
 rt_sigprocmask  |         14 | global->allow  | global
 rt_sigreturn    |         15 | global->allow  | global
 seccomp         |        317 | global->allow  | global
 select          |         23 | global->allow  | global
 sendto          |         44 | global->allow  | global
 setitimer       |         38 | global->allow  | global
 set_robust_list |        273 | global->allow  | global
 setsid          |        112 | global->allow  | global
 setsockopt      |         54 | global->allow  | global
 shmat           |         30 | global->allow  | global
 shmctl          |         31 | global->allow  | global
 shmdt           |         67 | global->allow  | global
 shmget          |         29 | global->allow  | global
 shutdown        |         48 | global->allow  | global
 socket          |         41 | global->allow  | global
 stat            |          4 | global->allow  | global
 statfs          |        137 | global->allow  | global
 symlink         |         88 | global->allow  | global
 sync_file_range |        277 | global->allow  | global
 sysinfo         |         99 | global->allow  | global
 umask           |         95 | global->allow  | global
 uname           |         63 | global->allow  | global
 unlink          |         87 | global->allow  | global
 utime           |        132 | global->allow  | global
 wait4           |         61 | global->allow  | global
 write           |          1 | global->allow  | global
 accept          |         43 | session->log   | session
 access          |         21 | session->allow | session
 bind            |         49 | session->log   | session
 brk             |         12 | session->allow | session
 chmod           |         90 | session->allow | session
 clone           |         56 | session->log   | session
 close           |          3 | session->allow | session
 connect         |         42 | session->allow | session
 <default>       |         -1 | session->log   | session
 dup             |         32 | session->log   | session
 epoll_create1   |        291 | session->allow | session
 epoll_ctl       |        233 | session->allow | session
 epoll_wait      |        232 | session->allow | session
 exit_group      |        231 | session->allow | session
 fadvise64       |        221 | session->allow | session
 fallocate       |        285 | session->allow | session
 fcntl           |         72 | session->allow | session
 fdatasync       |         75 | session->allow | session
 fstat           |          5 | session->allow | session
 fsync           |         74 | session->allow | session
 ftruncate       |         77 | session->allow | session
 futex           |        202 | session->allow | session
 getdents        |         78 | session->allow | session
 getegid         |        108 | session->allow | session
 geteuid         |        107 | session->allow | session
 getgid          |        104 | session->allow | session
 getpeername     |         52 | session->allow | session
 getpid          |         39 | session->allow | session
 getppid         |        110 | session->log   | session
 getrandom       |        318 | session->allow | session
 getrusage       |         98 | session->allow | session
 getsockname     |         51 | session->allow | session
 getsockopt      |         55 | session->allow | session
 getuid          |        102 | session->allow | session
 ioctl           |         16 | session->allow | session
 kill            |         62 | session->allow | session
 link            |         86 | session->allow | session
 listen          |         50 | session->log   | session
 lseek           |          8 | session->allow | session
 lstat           |          6 | session->allow | session
 mkdir           |         83 | session->allow | session
 mmap            |          9 | session->allow | session
 mprotect        |         10 | session->allow | session
 mremap          |         25 | session->allow | session
 munmap          |         11 | session->allow | session
 openat          |        257 | session->allow | session
 pipe            |         22 | session->log   | session
 poll            |          7 | session->allow | session
 prctl           |        157 | session->log   | session
 pread64         |         17 | session->allow | session
 prlimit64       |        302 | session->log   | session
 pwrite64        |         18 | session->allow | session
 read            |          0 | session->allow | session
 readlink        |         89 | session->allow | session
 recvfrom        |         45 | session->allow | session
 recvmsg         |         47 | session->allow | session
 rename          |         82 | session->allow | session
 rmdir           |         84 | session->allow | session
 rt_sigaction    |         13 | session->allow | session
 rt_sigprocmask  |         14 | session->allow | session
 rt_sigreturn    |         15 | session->allow | session
 seccomp         |        317 | session->log   | session
 select          |         23 | session->allow | session
 sendto          |         44 | session->allow | session
 setitimer       |         38 | session->allow | session
 set_robust_list |        273 | session->log   | session
 setsid          |        112 | session->log   | session
 setsockopt      |         54 | session->allow | session
 shmat           |         30 | session->log   | session
 shmctl          |         31 | session->log   | session
 shmdt           |         67 | session->log   | session
 shmget          |         29 | session->log   | session
 shutdown        |         48 | session->allow | session
 socket          |         41 | session->allow | session
 stat            |          4 | session->allow | session
 statfs          |        137 | session->log   | session
 symlink         |         88 | session->allow | session
 sync_file_range |        277 | session->allow | session
 sysinfo         |         99 | session->allow | session
 umask           |         95 | session->allow | session
 uname           |         63 | session->allow | session
 unlink          |         87 | session->allow | session
 utime           |        132 | session->allow | session
 wait4           |         61 | session->log   | session
 write           |          1 | session->allow | session
(170 rows)
8<-------------------------
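
The session rows above that show session->log (other than <default>) are
exactly the syscalls listed in global_syscall_allow but not in
session_syscall_allow. A small sketch for computing such a difference
between two comma-separated lists follows; list_difference is a
hypothetical helper, not something the patch provides.

```shell
# Hypothetical helper for illustration; not part of the patch.

# Print the entries of the first comma-separated list that do not appear
# in the second one, preserving their original order.
list_difference() {
    printf '%s\n%s\n' "$1" "$2" | awk -F, '
        NR == 1 { n = NF; for (i = 1; i <= NF; i++) first[i] = $i }
        NR == 2 { for (i = 1; i <= NF; i++) second[$i] = 1 }
        END {
            out = ""
            for (i = 1; i <= n; i++) {
                if (!(first[i] in second)) {
                    sep = (out == "") ? "" : ","
                    out = out sep first[i]
                }
            }
            print out
        }'
}

list_difference 'accept,access,bind,brk' 'access,brk'   # prints "accept,bind"
```

The same comparison, run with a freshly observed syscall list as the first
argument and the current allow list as the second, would show which
syscalls are still missing from that allow list (as in step 8).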

If you made it all the way here, thank you for your attention :-)

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Attachment: get_syscalls.sh
Description: application/shellscript

diff --git a/configure b/configure
index f14709e..18cdcd4 100755
*** a/configure
--- b/configure
*************** UUID_EXTRA_OBJS
*** 708,713 ****
--- 708,714 ----
  with_uuid
  with_systemd
  with_selinux
+ with_seccomp
  with_openssl
  with_ldap
  with_krb_srvnam
*************** with_bsd_auth
*** 853,858 ****
--- 854,860 ----
  with_ldap
  with_bonjour
  with_openssl
+ with_seccomp
  with_selinux
  with_systemd
  with_readline
*************** Optional Packages:
*** 1557,1562 ****
--- 1559,1565 ----
    --with-ldap             build with LDAP support
    --with-bonjour          build with Bonjour support
    --with-openssl          build with OpenSSL support
+   --with-seccomp          build with seccomp support
    --with-selinux          build with SELinux support
    --with-systemd          build with systemd support
    --without-readline      do not use GNU Readline nor BSD Libedit for editing
*************** $as_echo "$with_openssl" >&6; }
*** 7897,7902 ****
--- 7900,7940 ----
  
  
  #
+ # Seccomp
+ #
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with seccomp support" >&5
+ $as_echo_n "checking whether to build with seccomp support... " >&6; }
+ 
+ 
+ 
+ # Check whether --with-seccomp was given.
+ if test "${with_seccomp+set}" = set; then :
+   withval=$with_seccomp;
+   case $withval in
+     yes)
+ 
+ $as_echo "#define USE_SECCOMP 1" >>confdefs.h
+ 
+       ;;
+     no)
+       :
+       ;;
+     *)
+       as_fn_error $? "no argument expected for --with-seccomp option" "$LINENO" 5
+       ;;
+   esac
+ 
+ else
+   with_seccomp=no
+ 
+ fi
+ 
+ 
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_seccomp" >&5
+ $as_echo "$with_seccomp" >&6; }
+ 
+ 
+ #
  # SELinux
  #
  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build with SELinux support" >&5
*************** fi
*** 12407,12412 ****
--- 12445,12500 ----
  
  
  
+ if test "$with_seccomp" = yes; then
+   { $as_echo "$as_me:${as_lineno-$LINENO}: checking for seccomp_init in -lseccomp" >&5
+ $as_echo_n "checking for seccomp_init in -lseccomp... " >&6; }
+ if ${ac_cv_lib_seccomp_seccomp_init+:} false; then :
+   $as_echo_n "(cached) " >&6
+ else
+   ac_check_lib_save_LIBS=$LIBS
+ LIBS="-lseccomp  $LIBS"
+ cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+ /* end confdefs.h.  */
+ 
+ /* Override any GCC internal prototype to avoid an error.
+    Use char because int might match the return type of a GCC
+    builtin and then its argument prototype would still apply.  */
+ #ifdef __cplusplus
+ extern "C"
+ #endif
+ char seccomp_init ();
+ int
+ main ()
+ {
+ return seccomp_init ();
+   ;
+   return 0;
+ }
+ _ACEOF
+ if ac_fn_c_try_link "$LINENO"; then :
+   ac_cv_lib_seccomp_seccomp_init=yes
+ else
+   ac_cv_lib_seccomp_seccomp_init=no
+ fi
+ rm -f core conftest.err conftest.$ac_objext \
+     conftest$ac_exeext conftest.$ac_ext
+ LIBS=$ac_check_lib_save_LIBS
+ fi
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_seccomp_seccomp_init" >&5
+ $as_echo "$ac_cv_lib_seccomp_seccomp_init" >&6; }
+ if test "x$ac_cv_lib_seccomp_seccomp_init" = xyes; then :
+   cat >>confdefs.h <<_ACEOF
+ #define HAVE_LIBSECCOMP 1
+ _ACEOF
+ 
+   LIBS="-lseccomp $LIBS"
+ 
+ else
+   as_fn_error $? "library 'libseccomp' is required for seccomp support" "$LINENO" 5
+ fi
+ 
+ fi
+ 
  # for contrib/sepgsql
  if test "$with_selinux" = yes; then
    { $as_echo "$as_me:${as_lineno-$LINENO}: checking for security_compute_create_name in -lselinux" >&5
*************** else
*** 13050,13055 ****
--- 13138,13154 ----
  fi
  
  
+ fi
+ 
+ if test "$with_seccomp" = yes ; then
+   ac_fn_c_check_header_mongrel "$LINENO" "seccomp.h" "ac_cv_header_seccomp_h" "$ac_includes_default"
+ if test "x$ac_cv_header_seccomp_h" = xyes; then :
+ 
+ else
+   as_fn_error $? "header file <seccomp.h> is required for seccomp support" "$LINENO" 5
+ fi
+ 
+ 
  fi
  
  if test "$with_libxslt" = yes ; then
diff --git a/configure.in b/configure.in
index 805cf86..65b382d 100644
*** a/configure.in
--- b/configure.in
*************** AC_MSG_RESULT([$with_openssl])
*** 842,847 ****
--- 842,856 ----
  AC_SUBST(with_openssl)
  
  #
+ # Seccomp
+ #
+ AC_MSG_CHECKING([whether to build with seccomp support])
+ PGAC_ARG_BOOL(with, seccomp, no, [build with seccomp support],
+               [AC_DEFINE([USE_SECCOMP], 1, [Define to 1 to build with seccomp support. (--with-seccomp)])])
+ AC_MSG_RESULT([$with_seccomp])
+ AC_SUBST(with_seccomp)
+ 
+ #
  # SELinux
  #
  AC_MSG_CHECKING([whether to build with SELinux support])
*************** fi
*** 1234,1239 ****
--- 1243,1252 ----
  AC_SUBST(LDAP_LIBS_FE)
  AC_SUBST(LDAP_LIBS_BE)
  
+ if test "$with_seccomp" = yes; then
+   AC_CHECK_LIB(seccomp, seccomp_init, [], [AC_MSG_ERROR([library 'libseccomp' is required for seccomp support])])
+ fi
+ 
  # for contrib/sepgsql
  if test "$with_selinux" = yes; then
    AC_CHECK_LIB(selinux, security_compute_create_name, [],
*************** if test "$with_libxml" = yes ; then
*** 1389,1394 ****
--- 1402,1411 ----
    AC_CHECK_HEADER(libxml/parser.h, [], [AC_MSG_ERROR([header file <libxml/parser.h> is required for XML support])])
  fi
  
+ if test "$with_seccomp" = yes ; then
+   AC_CHECK_HEADER(seccomp.h, [], [AC_MSG_ERROR([header file <seccomp.h> is required for seccomp support])])
+ fi
+ 
  if test "$with_libxslt" = yes ; then
    AC_CHECK_HEADER(libxslt/xslt.h, [], [AC_MSG_ERROR([header file <libxslt/xslt.h> is required for XSLT support])])
  fi
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 89284dc..c346db7 100644
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** include_dir 'conf.d'
*** 1859,1864 ****
--- 1859,2031 ----
         </para>
        </listitem>
       </varlistentry>
+ 
+      <varlistentry id="guc-seccomp" xreflabel="seccomp">
+       <term><varname>seccomp</varname> (<type>bool</type>)
+       <indexterm>
+        <primary><varname>seccomp</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         <varname>seccomp</varname> turns on or off seccomp syscall enforcement.
+         This parameter can only be set at server start. The default value is
+         <literal>off</literal>.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry id="guc-global-syscall-default" xreflabel="global_syscall_default">
+       <term><varname>global_syscall_default</varname> (<type>enum</type>)
+       <indexterm>
+        <primary><varname>global_syscall_default</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         <varname>global_syscall_default</varname> determines the default action taken by
+         kernel seccomp enforcement. It is applied to the postmaster and inherited by
+         all child processes.
+        </para>
+ 
+        <para>
+         Valid values are as follows in increasing precedence order.
+         The default value is <literal>allow</literal>, which allows all
+         syscalls not in a specific action list without any action including logging.
+         A value of <literal>log</literal> turns on seccomp enforcement
+         in log-only mode. In this mode, disallowed kernel syscalls are logged by auditd
+         to the audit log. When set to <literal>error</literal>, disallowed kernel syscalls
+         will return with a permission denied error. Finally, <literal>kill</literal> will
+         cause the offending process to be killed as though by a
+         <literal>SIGSYS</literal> signal.
+        </para>
+ 
+        <para>
+         This parameter can only be set at server start.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry id="guc-session-syscall-default" xreflabel="session_syscall_default">
+       <term><varname>session_syscall_default</varname> (<type>enum</type>)
+       <indexterm>
+        <primary><varname>session_syscall_default</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         <varname>session_syscall_default</varname> helps determine the default action taken
+         by kernel seccomp enforcement. It is applied to client backend sessions. The
+         effective value is either this setting or that of the postmaster,
+         <varname>global_syscall_default</varname>, whichever has the higher precedence.
+        </para>
+ 
+        <para>
+         Valid values are the same as those for <varname>global_syscall_default</varname>.
+        </para>
+ 
+        <para>
+         This parameter can be set by the superuser; however, new values only take effect
+         at session start. This makes it possible to customize sessions with
+         <command>ALTER ROLE SET</command>. For example, a specific role might have it
+         set to <literal>error</literal> with a restrictive session allow list
+         (<varname>session_syscall_allow</varname>), while other roles have it set to
+         <literal>allow</literal>, assuming <varname>global_syscall_default</varname>
+         is also set to <literal>allow</literal>.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry id="guc-global-syscall-lists" xreflabel="global_syscall_lists">
+       <term><varname>global_syscall_allow</varname> (<type>string</type>)
+       <indexterm>
+        <primary><varname>global_syscall_allow</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <term><varname>global_syscall_log</varname> (<type>string</type>)
+       <indexterm>
+        <primary><varname>global_syscall_log</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <term><varname>global_syscall_error</varname> (<type>string</type>)
+       <indexterm>
+        <primary><varname>global_syscall_error</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <term><varname>global_syscall_kill</varname> (<type>string</type>)
+       <indexterm>
+        <primary><varname>global_syscall_kill</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+ 
+       <listitem>
+        <para>
+         These four configuration parameters are lists of kernel syscalls to be given
+         allow, log, error, and kill action rules in the global (postmaster) seccomp
+         filter. They are also inherited by all child processes. Any syscall not explicitly
+         enumerated in one of these lists will have an action as determined by the
+         <varname>global_syscall_default</varname> setting. These parameters can only be set
+         at server start.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
+      <varlistentry id="guc-session-syscall-lists" xreflabel="session_syscall_lists">
+       <term><varname>session_syscall_allow</varname> (<type>string</type>)
+       <indexterm>
+        <primary><varname>session_syscall_allow</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <term><varname>session_syscall_log</varname> (<type>string</type>)
+       <indexterm>
+        <primary><varname>session_syscall_log</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <term><varname>session_syscall_error</varname> (<type>string</type>)
+       <indexterm>
+        <primary><varname>session_syscall_error</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+       <term><varname>session_syscall_kill</varname> (<type>string</type>)
+       <indexterm>
+        <primary><varname>session_syscall_kill</varname> configuration parameter</primary>
+       </indexterm>
+       </term>
+ 
+       <listitem>
+        <para>
+         These four configuration parameters are lists of kernel syscalls to be given
+         allow, log, error, and kill action rules in the client session backend seccomp
+         filter. Any syscall not explicitly enumerated in one of these lists will have
+         a session filter action as determined by the <varname>session_syscall_default</varname>
+         setting. The actual effective action for any given syscall is the highest
+         precedence action, for that syscall, from either the session filter or the
+         global filter. These settings take effect at session start and may not be
+         changed once a session is established.
+        </para>
+ 
+        <para>
+         The intent of this feature is to allow further restriction of the syscalls
+         available in an interactive user session. It is also possible to customize
+         sessions with <command>ALTER ROLE SET</command>. For example, a specific
+         role might be allowed to use the necessary syscalls to enable an untrusted
+         procedural-language function to execute arbitrary system commands, while
+         other roles are denied that permission.
+        </para>
+ 
+        <para>
+         Each of these lists may also be set to the single character <literal>'*'</literal>.
+         When set this way, the global list for the corresponding action is used
+         without modification.
+        </para>
+ 
+        <para>
+         These parameters can be changed without restarting the server, but changes only
+         take effect when a new session is started.
+        </para>
+ 
+       </listitem>
+      </varlistentry>
       </variablelist>
      </sect2>
  
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a7abf8c..51f5964 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*************** SELECT collation for ('foo' COLLATE "de_
*** 19678,19683 ****
--- 19678,19767 ----
      </tgroup>
     </table>
  
+    <para>
+     The functions shown in <xref linkend="functions-seccomp"/>
+     print information about active seccomp filters, both at the
+     global (postmaster) level and session (client backend) level.
+     In particular, they calculate the session level based
+     on the kernel's rules for overlaying the session filter
+     on top of the global filter. Essentially, for any given syscall
+     the most restrictive (highest precedence) rule will govern
+     the action taken.
+    </para>
+ 
+    <para>
+     These functions return no rows unless the <option>--with-seccomp</option>
+     option was used during <command>configure</command>.
+    </para>
+ 
+    <table id="functions-seccomp">
+     <title>Seccomp Functions</title>
+     <tgroup cols="3">
+      <thead>
+       <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry></row>
+      </thead>
+ 
+      <tbody>
+       <row>
+        <entry>
+         <indexterm><primary>pg_get_seccomp_filter</primary></indexterm>
+         <literal><function>pg_get_seccomp_filter()</function></literal>
+        </entry>
+        <entry><type>setof record</type></entry>
+        <entry>
+         Returns information about active seccomp filters.
+        </entry>
+       </row>
+ 
+      </tbody>
+     </tgroup>
+    </table>
+ 
+    <para>
+     <function>pg_get_seccomp_filter</function> returns rows with the columns
+     shown in <xref linkend="functions-pg-get-seccomp-filter"/>.
+    </para>
+ 
+    <table id="functions-pg-get-seccomp-filter">
+     <title><function>pg_get_seccomp_filter</function> Columns</title>
+     <tgroup cols="3">
+      <thead>
+       <row>
+        <entry>Column Name</entry>
+        <entry>Data Type</entry>
+        <entry>Description</entry>
+       </row>
+      </thead>
+ 
+      <tbody>
+ 
+       <row>
+        <entry><literal>syscall</literal></entry>
+        <entry><type>text</type></entry>
+        <entry>Name of the kernel syscall</entry>
+       </row>
+ 
+       <row>
+        <entry><literal>syscallnum</literal></entry>
+        <entry><type>int</type></entry>
+        <entry>Kernel syscall number, or -1 for a default rule</entry>
+       </row>
+ 
+       <row>
+        <entry><literal>filter_action</literal></entry>
+        <entry><type>text</type></entry>
+        <entry>Source context -> rule action</entry>
+       </row>
+ 
+       <row>
+        <entry><literal>context</literal></entry>
+        <entry><type>text</type></entry>
+        <entry>Context in which rule is applied</entry>
+       </row>
+ 
+      </tbody>
+     </tgroup>
+    </table>
    </sect1>
  
    <sect1 id="functions-admin">
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 4493862..939e9e3 100644
*** a/doc/src/sgml/installation.sgml
--- b/doc/src/sgml/installation.sgml
*************** su - postgres
*** 254,259 ****
--- 254,266 ----
  
      <listitem>
       <para>
+       You need <productname>seccomp</productname>, if you want to support
+       kernel syscall filtering.
+      </para>
+     </listitem>
+ 
+     <listitem>
+      <para>
        You need <application>Kerberos</application>, <productname>OpenLDAP</productname>,
        and/or <application>PAM</application>, if you want to support authentication
        using those services.
*************** su - postgres
*** 843,848 ****
--- 850,869 ----
           before proceeding.
          </para>
         </listitem>
+       </varlistentry>
+ 
+       <varlistentry>
+        <term><option>--with-seccomp</option></term>
+        <listitem>
+         <para>
+          Build with support for <indexterm><primary>seccomp</primary></indexterm>
+          kernel syscall filtering. This requires <productname>seccomp</productname>
+          packages to be installed. <filename>configure</filename> will check
+          for the required header files and libraries to make sure that
+          your <productname>seccomp</productname> installation is sufficient
+          before proceeding.
+         </para>
+        </listitem>
        </varlistentry>
  
        <varlistentry>
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index dc3f207..bbdc69b 100644
*** a/src/Makefile.global.in
--- b/src/Makefile.global.in
*************** with_perl	= @with_perl@
*** 185,190 ****
--- 185,191 ----
  with_python	= @with_python@
  with_tcl	= @with_tcl@
  with_openssl	= @with_openssl@
+ with_seccomp	= @with_seccomp@
  with_selinux	= @with_selinux@
  with_systemd	= @with_systemd@
  with_gssapi	= @with_gssapi@
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 1119e21..bb2e899 100644
*** a/src/backend/commands/variable.c
--- b/src/backend/commands/variable.c
***************
*** 17,22 ****
--- 17,25 ----
  #include "postgres.h"
  
  #include <ctype.h>
+ #ifdef USE_SECCOMP
+ #include <seccomp.h>
+ #endif
  
  #include "access/htup_details.h"
  #include "access/parallel.h"
*************** show_role(void)
*** 901,903 ****
--- 904,988 ----
  	/* Otherwise we can just use the GUC string */
  	return role_string ? role_string : "none";
  }
+ 
+ #ifdef USE_SECCOMP
+ /*
+  * check_syscall_list: GUC check_hook
+  * check various lists of syscalls used for seccomp enforcement
+  */
+ static bool
+ check_syscall_list(char **newval, void **extra, GucSource source)
+ {
+ 	char		   *rawstring = NULL;
+ 	List		   *elemlist = NIL;
+ 	ListCell	   *l;
+ 	bool			result = true;
+ 
+ 	/* Need a modifiable copy of string */
+ 	rawstring = pstrdup(*newval);
+ 
+ 	/* Parse string into list of syscalls */
+ 	if (!SplitIdentifierString(rawstring, ',', &elemlist))
+ 	{
+ 		GUC_check_errdetail("List syntax is invalid.");
+ 		result = false;
+ 		goto out;
+ 	}
+ 
+ 	foreach(l, elemlist)
+ 	{
+ 		char   *cursyscall = (char *) lfirst(l);
+ 		int		syscallnum;
+ 
+ 		/* resolve the syscall name to its number on the current arch */
+ 		syscallnum = seccomp_syscall_resolve_name(cursyscall);
+ 		if (syscallnum < 0)
+ 		{
+ 			/* invalid syscall name */
+ 			GUC_check_errcode(ERRCODE_INVALID_PARAMETER_VALUE);
+ 			GUC_check_errdetail("Seccomp failed to resolve syscall: \"%s\"",
+ 								cursyscall);
+ 			result = false;
+ 			goto out;
+ 		}
+ 	}
+ 
+ out:
+ 	/* safe to release if NIL */
+ 	list_free(elemlist);
+ 
+ 	/* but pfree is not */
+ 	if (rawstring)
+ 		pfree(rawstring);
+ 
+ 	return result;
+ }
+ #endif
+ 
+ bool
+ check_global_syscall_list(char **newval, void **extra, GucSource source)
+ {
+ #ifdef USE_SECCOMP
+ 	return check_syscall_list(newval, extra, source);
+ #else
+ 	return true;
+ #endif
+ }
+ 
+ bool
+ check_session_syscall_list(char **newval, void **extra, GucSource source)
+ {
+ #ifdef USE_SECCOMP
+ 	/*
+ 	 * If the passed *newval string is exactly "*", the corresponding
+ 	 * global list is used instead, so there is nothing to validate.
+ 	 * Only applies to children of the postmaster.
+ 	 */
+ 	if (strcmp(*newval, "*") == 0)
+ 		return true;
+ 	else
+ 		return check_syscall_list(newval, extra, source);
+ #else
+ 	return true;
+ #endif
+ }
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 62dc93d..2216d49 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** PostmasterMain(int argc, char *argv[])
*** 963,968 ****
--- 963,982 ----
  	 */
  	LocalProcessControlFile(false);
  
+ #ifdef USE_SECCOMP
+ 	/*
+ 	 * If seccomp filtering is requested, load the global filter.
+ 	 * The list of allowed syscalls may be ratcheted down further
+ 	 * in specific backends based on the actual needs by backend type.
+ 	 */
+ 	if (!load_seccomp_filter("postmaster"))
+ 	{
+ 		ereport(FATAL,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("failed to load global seccomp filter")));
+ 	}
+ #endif
+ 
  	/*
  	 * Initialize SSL library, if specified.
  	 */
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index 5d4f26a..a9df9e4 100644
*** a/src/backend/utils/adt/genfile.c
--- b/src/backend/utils/adt/genfile.c
***************
*** 15,20 ****
--- 15,23 ----
   */
  #include "postgres.h"
  
+ #ifdef USE_SECCOMP
+ #include <seccomp.h>
+ #endif
  #include <sys/file.h>
  #include <sys/stat.h>
  #include <unistd.h>
***************
*** 28,39 ****
--- 31,46 ----
  #include "funcapi.h"
  #include "mb/pg_wchar.h"
  #include "miscadmin.h"
+ #include "nodes/bitmapset.h"
  #include "postmaster/syslogger.h"
  #include "storage/fd.h"
  #include "utils/builtins.h"
+ #include "utils/guc.h"
+ #include "utils/hsearch.h"
  #include "utils/memutils.h"
  #include "utils/syscache.h"
  #include "utils/timestamp.h"
+ #include "utils/varlena.h"
  
  typedef struct
  {
*************** pg_ls_archive_statusdir(PG_FUNCTION_ARGS
*** 669,671 ****
--- 676,1113 ----
  {
  	return pg_ls_dir_files(fcinfo, XLOGDIR "/archive_status", true);
  }
+ 
+ #define NUM_SECCOMP_FILTER_ATTS		4
+ #define NUM_SECCOMP_RULES			400
+ 
+ #ifdef USE_SECCOMP
+ typedef struct seccomp_rule
+ {
+ 	int			syscallnum;		/* syscall number */
+ 	char	   *syscall;		/* syscall name string */
+ 	int			rule_action;	/* action level for this rule */
+ 	char	   *source;			/* filter source for this rule */
+ } seccomp_rule;
+ 
+ typedef struct seccompHashEntry
+ {
+ 	int					syscallnum;
+ 	seccomp_rule	   *scr_entry;
+ } seccompHashEntry;
+ 
+ extern const struct config_enum_entry seccomp_options[];
+ 
+ static void
+ init_hash_from_bitmap(Bitmapset *A, int raction, char *source,
+ 					  HTAB *seccompHash)
+ {
+ 	bool				found;
+ 	int					syscallnum;
+ 	char			   *cursyscall;
+ 	seccompHashEntry   *hentry;
+ 
+ 	syscallnum = -1;
+ 	while ((syscallnum = bms_next_member(A, syscallnum)) >= 0)
+ 	{
+ 		seccomp_rule   *scr = palloc(sizeof(seccomp_rule));
+ 
+ 		scr->syscallnum = syscallnum;
+ 
+ 		/*
+ 		 * Resolver returns NULL on error. Given how we got here that
+ 		 * should never happen. We must free() the result to avoid leakage.
+ 		 */
+ 		cursyscall =  seccomp_syscall_resolve_num_arch(seccomp_arch_native(),
+ 													   syscallnum);
+ 		if (cursyscall)
+ 		{
+ 			scr->syscall = pstrdup(cursyscall);
+ 			free(cursyscall);
+ 		}
+ 		scr->rule_action = raction;
+ 		scr->source = source;
+ 
+ 		hentry = (seccompHashEntry *) hash_search(seccompHash,
+ 												  (const void *) &syscallnum,
+ 												  HASH_ENTER, &found);
+ 
+ 		/* should not happen */
+ 		if (found)
+ 			elog(ERROR, "duplicate syscall entry found: source \"%s\"",
+ 						 source);
+ 
+ 		hentry->syscallnum = syscallnum;
+ 		hentry->scr_entry = scr;
+ 	}
+ }
+ 
+ static void
+ ovly_hash_from_bitmap(Bitmapset *A, int raction, char *gsource,
+ 					  int sdef, char *ssource, HTAB *seccompHash)
+ {
+ 	bool				found;
+ 	int					syscallnum;
+ 	char			   *cursyscall;
+ 	seccompHashEntry   *hentry;
+ 
+ 	syscallnum = -1;
+ 	while ((syscallnum = bms_next_member(A, syscallnum)) >= 0)
+ 	{
+ 		seccomp_rule   *scr;
+ 
+ 		hentry = (seccompHashEntry *) hash_search(seccompHash,
+ 												  (const void *) &syscallnum,
+ 												  HASH_ENTER, &found);
+ 
+ 		/*
+ 		 * If an entry does not exist, we can just add it. However,
+ 		 * the default action from the session still wins if it takes
+ 		 * precedence over that of the global rule.
+ 		 *
+ 		 * If an entry does exist, we must determine whether the new
+ 		 * rule precedence overrides the old one.
+ 		 */
+ 		if (!found)
+ 		{
+ 			scr = palloc(sizeof(seccomp_rule));
+ 			scr->syscallnum = syscallnum;
+ 
+ 			/*
+ 			 * Resolver returns NULL on error. Given how we got here that
+ 			 * should never happen. We must free() the result to avoid leakage.
+ 			 */
+ 			cursyscall =  seccomp_syscall_resolve_num_arch(seccomp_arch_native(),
+ 														   syscallnum);
+ 			if (cursyscall)
+ 			{
+ 				scr->syscall = pstrdup(cursyscall);
+ 				free(cursyscall);
+ 			}
+ 			if (raction > sdef)
+ 			{
+ 				scr->rule_action = raction;
+ 				scr->source = gsource;
+ 			}
+ 			else
+ 			{
+ 				scr->rule_action = sdef;
+ 				scr->source = ssource;
+ 			}
+ 
+ 			hentry->syscallnum = syscallnum;
+ 			hentry->scr_entry = scr;
+ 		}
+ 		else
+ 		{
+ 			/* determine if adjustment is necessary */
+ 			scr = hentry->scr_entry;
+ 			if (raction > scr->rule_action)
+ 			{
+ 				/* new rule takes precedence */
+ 				scr->rule_action = raction;
+ 				scr->source = gsource;
+ 			}
+ 		}
+ 	}
+ }
+ 
+ static void
+ ovly_hash_from_default(Bitmapset *A, int raction, char *source,
+ 					  HTAB *seccompHash)
+ {
+ 	bool				found;
+ 	int					syscallnum;
+ 	seccompHashEntry   *hentry;
+ 
+ 	syscallnum = -1;
+ 	while ((syscallnum = bms_next_member(A, syscallnum)) >= 0)
+ 	{
+ 		seccomp_rule   *scr;
+ 
+ 		hentry = (seccompHashEntry *) hash_search(seccompHash,
+ 												  (const void *) &syscallnum,
+ 												  HASH_FIND, &found);
+ 
+ 		/*
+ 		 * If an entry does not already exist at this point, something
+ 		 * odd is amiss. Should not happen.
+ 		 */
+ 		if (!found)
+ 			elog(ERROR, "failed to find expected session filter syscall " \
+ 						"entry: syscall number %d", syscallnum);
+ 		else
+ 		{
+ 			/* determine if adjustment is necessary */
+ 			scr = hentry->scr_entry;
+ 			if (raction > scr->rule_action)
+ 			{
+ 				/* new rule takes precedence */
+ 				scr->rule_action = raction;
+ 				scr->source = source;
+ 			}
+ 		}
+ 	}
+ }
+ 
+ static const char *
+ get_seccomp_opt_str(int val)
+ {
+ 	const struct config_enum_entry *entry;
+ 
+ 	/* stringify the enforcement action levels */
+ 	for (entry = seccomp_options; entry->name; entry++)
+ 		if (entry->val == val)
+ 			return entry->name;
+ 
+ 	return "unknown";
+ }
+ 
+ static void
+ put_global_rules(Bitmapset *A, int raction, char *source,
+ 				 TupleDesc tupdesc, Tuplestorestate *tupstore)
+ {
+ 	int					syscallnum;
+ 
+ 	syscallnum = -1;
+ 	while ((syscallnum = bms_next_member(A, syscallnum)) >= 0)
+ 	{
+ 		Datum				values[NUM_SECCOMP_FILTER_ATTS];
+ 		bool				nulls[NUM_SECCOMP_FILTER_ATTS];
+ 		char			   *cursyscall;
+ 
+ 		memset(values, 0, sizeof(values));
+ 		memset(nulls, 0, sizeof(nulls));
+ 
+ 		/*
+ 		 * Resolver returns NULL on error. Given how we got here that
+ 		 * should never happen. We must free() the result to avoid leakage.
+ 		 */
+ 		cursyscall =  seccomp_syscall_resolve_num_arch(seccomp_arch_native(),
+ 													   syscallnum);
+ 		if (cursyscall)
+ 		{
+ 			char	   *buf;
+ 
+ 			values[0] = PointerGetDatum(cstring_to_text(cursyscall));
+ 			free(cursyscall);
+ 
+ 			values[1] = Int32GetDatum(syscallnum);
+ 
+ 			buf = psprintf("%s->%s", source, get_seccomp_opt_str(raction));
+ 			values[2] = PointerGetDatum(cstring_to_text(buf));
+ 
+ 			values[3] = PointerGetDatum(cstring_to_text("global"));
+ 
+ 			/* shove row into tuplestore */
+ 			tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+ 		}
+ 	}
+ }
+ #endif /* USE_SECCOMP */
+ 
+ Datum
+ pg_get_seccomp_filter(PG_FUNCTION_ARGS)
+ {
+ #ifdef USE_SECCOMP
+ 	seccomp_filter	   *g = global_filter;
+ 	seccomp_filter	   *s = session_filter;
+ 	HASHCTL         	ctl;
+ 	HTAB			   *seccompHash = NULL;
+ 	seccompHashEntry   *hentry;
+ 	seccomp_rule	   *scr = palloc(sizeof(seccomp_rule));
+ 	HASH_SEQ_STATUS		status;
+ 	int					syscallnum = -1;	/* hash key of the "<default>" entry */
+ 	char			   *gsource = "global";
+ 	char			   *ssource = "session";
+ 	int					gdef = g->def;
+ 	int					sdef = s->def;
+ 	int					mdef = (gdef > sdef) ? gdef : sdef;
+ 	char			   *msource = (gdef > sdef) ? gsource : ssource;
+ 	Bitmapset		   *gunion = NULL;
+ 	Bitmapset		   *sunion = NULL;
+ 	char			   *buf;
+ 	Datum				values[NUM_SECCOMP_FILTER_ATTS];
+ 	bool				nulls[NUM_SECCOMP_FILTER_ATTS];
+ #endif
+ 	ReturnSetInfo	   *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ 	TupleDesc			tupdesc;
+ 	Tuplestorestate	   *tupstore;
+ 	MemoryContext		per_query_ctx;
+ 	MemoryContext		oldcontext;
+ 
+ 	/* Check to see if caller supports us returning a tuplestore */
+ 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("set-valued function called in context that cannot accept a set")));
+ 	if (!(rsinfo->allowedModes & SFRM_Materialize))
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 				 errmsg("materialize mode required, but it is not " \
+ 						"allowed in this context")));
+ 
+ 	/* Switch into long-lived context to construct returned data structures */
+ 	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
+ 	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+ 
+ 	/* Build a tuple descriptor for our result type */
+ 	/* need a tuple descriptor with three TEXT columns and one INT4 column */
+ 	tupdesc = CreateTemplateTupleDesc(NUM_SECCOMP_FILTER_ATTS);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "syscall",
+ 					   TEXTOID, -1, 0);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "syscallnum",
+ 					   INT4OID, -1, 0);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "filter_action",
+ 					   TEXTOID, -1, 0);
+ 	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "context",
+ 					   TEXTOID, -1, 0);
+ 
+ 	/* Build a tuplestore to return our results in */
+ 	tupstore = tuplestore_begin_heap(true, false, work_mem);
+ 	rsinfo->returnMode = SFRM_Materialize;
+ 	rsinfo->setResult = tupstore;
+ 	rsinfo->setDesc = tupdesc;
+ 
+ #ifdef USE_SECCOMP
+ 
+ 	/*
+ 	 * We need to iterate through 4 bitmap sets each, across two filters
+ 	 * (global and session), applying the below logic, in order to
+ 	 * determine which action applies to what syscall. The most
+ 	 * straightforward way to do that seems to be to build a hash
+ 	 * table since the two filter sets may overlap, and the syscall
+ 	 * numbers may vary with architecture.
+ 	 *
+ 	 * The aforementioned logic is:
+ 	 * 1. The most recently installed filter is evaluated first (session)
+ 	 * 2. For a given filter, each syscall action is either the action
+ 	 *    value given in a syscall-specific rule, or the default action.
+ 	 * 3. For any given syscall, the "first-seen action value of highest
+ 	 *    precedence" is applied. The precedence in order of high-to-low
+ 	 *    is: kill, error, log, allow.
+ 	 *
+ 	 * There are four combinations of the possible sets of rules to
+ 	 * consider:
+ 	 * g - global (postmaster)
+ 	 * s - session (backend)
+ 	 *
+ 	 * C1. Intersection of g + s
+ 	 * C2. In g, not in s
+ 	 * C3. In s, not in g
+ 	 * C4. Not in g or s
+ 	 *
+ 	 * C1 and C2 are handled by init_hash_from_bitmap()
+ 	 * and ovly_hash_from_bitmap(). C3 is handled by
+ 	 * ovly_hash_from_default(). C4 is covered by the final
+ 	 * "<default>" entry in the hash table.
+ 	 */
+ 	memset(&ctl, 0, sizeof(ctl));
+ 	ctl.keysize = sizeof(int);
+ 	ctl.entrysize = sizeof(seccompHashEntry);
+ 	seccompHash = hash_create("syscall rules", NUM_SECCOMP_RULES,
+ 							  &ctl, HASH_ELEM | HASH_BLOBS);
+ 
+ 	/*
+ 	 * Build up the hash table initially from the session filter.
+ 	 * We ensured no overlap of syscalls within a given filter in
+ 	 * load_seccomp_filter(), so it should be safe to just add
+ 	 * all the syscall numbers found in the 4 bitmap sets.
+ 	 */
+ 	init_hash_from_bitmap(s->kill, PG_SECCOMP_KILL, ssource, seccompHash);
+ 	init_hash_from_bitmap(s->error, PG_SECCOMP_ERROR, ssource, seccompHash);
+ 	init_hash_from_bitmap(s->log, PG_SECCOMP_LOG, ssource, seccompHash);
+ 	init_hash_from_bitmap(s->allow, PG_SECCOMP_ALLOW, ssource, seccompHash);
+ 
+ 	/*
+ 	 * Now overlay the global filter. Again we ensured no overlap
+ 	 * of syscalls within this filter in load_seccomp_filter(),
+ 	 * so it should be safe to just overlay all the syscall numbers
+ 	 * found in the 4 global bitmap sets.
+ 	 */
+ 	ovly_hash_from_bitmap(g->kill, PG_SECCOMP_KILL, gsource,
+ 						  sdef, ssource, seccompHash);
+ 	ovly_hash_from_bitmap(g->error, PG_SECCOMP_ERROR, gsource,
+ 						  sdef, ssource, seccompHash);
+ 	ovly_hash_from_bitmap(g->log, PG_SECCOMP_LOG, gsource,
+ 						  sdef, ssource, seccompHash);
+ 	ovly_hash_from_bitmap(g->allow, PG_SECCOMP_ALLOW, gsource,
+ 						  sdef, ssource, seccompHash);
+ 
+ 	/*
+ 	 * If rules from the session filter are not also explicitly
+ 	 * in the global filter, they must be compared against, and
+ 	 * possibly adjusted to, the global default action.
+ 	 */
+ 	gunion = bms_union(bms_union(bms_union(g->kill, g->error), g->log),
+ 					   g->allow);
+ 	sunion = bms_union(bms_union(bms_union(s->kill, s->error), s->log),
+ 					   s->allow);
+ 	ovly_hash_from_default(bms_difference(sunion, gunion),
+ 						   gdef, gsource, seccompHash);
+ 
+ 	/* create entry for the session default rule */
+ 	scr->syscallnum = -1;
+ 	scr->syscall = "<default>";
+ 	scr->rule_action = mdef;
+ 	scr->source = msource;
+ 	hentry = (seccompHashEntry *) hash_search(seccompHash,
+ 											  (const void *) &syscallnum,
+ 											  HASH_ENTER, NULL);
+ 	hentry->syscallnum = syscallnum;
+ 	hentry->scr_entry = scr;
+ 
+ 	/* Process the "session" results and fill the tuplestore */
+ 	hash_seq_init(&status, seccompHash);
+ 
+ 	while ((hentry = (seccompHashEntry *) hash_seq_search(&status)) != NULL)
+ 	{
+ 		char	   *buf;
+ 
+ 		memset(values, 0, sizeof(values));
+ 		memset(nulls, 0, sizeof(nulls));
+ 
+ 		scr = hentry->scr_entry;
+ 		buf = psprintf("%s->%s", scr->source,
+ 								 get_seccomp_opt_str(scr->rule_action));
+ 
+ 		values[0] = PointerGetDatum(cstring_to_text(scr->syscall));
+ 		values[1] = Int32GetDatum(scr->syscallnum);
+ 		values[2] = PointerGetDatum(cstring_to_text(buf));
+ 		values[3] = PointerGetDatum(cstring_to_text("session"));
+ 
+ 		/* shove row into tuplestore */
+ 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+ 	}
+ 
+ 	/*
+ 	 * Add rows for the "global" context. This is far simpler, since
+ 	 * we can simply iterate through the global bitmaps and do not
+ 	 * need to take care for rule precedence, etc., due to there
+ 	 * only being one filter (that we know about in any case).
+ 	 */
+ 	put_global_rules(g->kill, PG_SECCOMP_KILL, gsource, tupdesc, tupstore);
+ 	put_global_rules(g->error, PG_SECCOMP_ERROR, gsource, tupdesc, tupstore);
+ 	put_global_rules(g->log, PG_SECCOMP_LOG, gsource, tupdesc, tupstore);
+ 	put_global_rules(g->allow, PG_SECCOMP_ALLOW, gsource, tupdesc, tupstore);
+ 
+ 	/* create entry for the global default rule */
+ 	values[0] = PointerGetDatum(cstring_to_text("<default>"));
+ 	values[1] = Int32GetDatum(-1);
+ 
+ 	buf = psprintf("%s->%s", gsource, get_seccomp_opt_str(gdef));
+ 	values[2] = PointerGetDatum(cstring_to_text(buf));
+ 
+ 	values[3] = PointerGetDatum(cstring_to_text("global"));
+ 
+ 	/* shove row into tuplestore */
+ 	tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+ 
+ #endif	/* USE_SECCOMP */
+ 
+ 	tuplestore_donestoring(tupstore);
+ 
+ 	/* Reset context */
+ 	MemoryContextSwitchTo(oldcontext);
+ 
+ 	return (Datum) 0;
+ }
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 83c9514..1bf2e41 100644
*** a/src/backend/utils/init/miscinit.c
--- b/src/backend/utils/init/miscinit.c
***************
*** 29,34 ****
--- 29,38 ----
  #ifdef HAVE_UTIME_H
  #include <utime.h>
  #endif
+ #ifdef USE_SECCOMP
+ #include <seccomp.h>
+ #include <sys/prctl.h>
+ #endif
  
  #include "access/htup_details.h"
  #include "catalog/pg_authid.h"
***************
*** 36,41 ****
--- 40,46 ----
  #include "libpq/libpq.h"
  #include "mb/pg_wchar.h"
  #include "miscadmin.h"
+ #include "nodes/bitmapset.h"
  #include "pgstat.h"
  #include "postmaster/autovacuum.h"
  #include "postmaster/postmaster.h"
*************** pg_bindtextdomain(const char *domain)
*** 1617,1619 ****
--- 1622,2010 ----
  	}
  #endif
  }
+ 
+ /*-------------------------------------------------------------------------
+  *				seccomp filtering support
+  *-------------------------------------------------------------------------
+  */
+ 
+ /*
+  * GUC variables: lists of syscall names to be filtered at postmaster
+  * start and at backend start
+  */
+ 
+ const struct config_enum_entry seccomp_options[] = {
+ 	{"allow", PG_SECCOMP_ALLOW, false},
+ 	{"log", PG_SECCOMP_LOG, false},
+ 	{"error", PG_SECCOMP_ERROR, false},
+ 	{"kill", PG_SECCOMP_KILL, false},
+ 	{NULL, 0}
+ };
+ 
+ seccomp_filter *global_filter = NULL;
+ seccomp_filter *session_filter = NULL;
+ bool	seccomp_enabled = false;
+ int		global_syscall_default = PG_SECCOMP_ALLOW;
+ char   *global_syscall_allow_string = NULL;
+ char   *global_syscall_log_string = NULL;
+ char   *global_syscall_error_string = NULL;
+ char   *global_syscall_kill_string = NULL;
+ int		session_syscall_default = PG_SECCOMP_ALLOW;
+ char   *session_syscall_allow_string = NULL;
+ char   *session_syscall_log_string = NULL;
+ char   *session_syscall_error_string = NULL;
+ char   *session_syscall_kill_string = NULL;
+ 
+ #ifdef USE_SECCOMP
+ static bool apply_seccomp_list(scmp_filter_ctx	*ctx, const char *slist,
+ 							   uint32_t rule_action, uint32_t def_action,
+ 							   seccomp_filter *current_filter);
+ static const char *expand_seccomp_list(const char *slist, const char *glist,
+ 									   const char *saction);
+ static void set_filter_def_action(int default_action,
+ 								  seccomp_filter *current_filter,
+ 								  char *context);
+ #endif
+ 
+ /*
+  * Create and load seccomp filter for the requested context.
+  *
+  * Return false on error and let the caller decide what to do
+  * rather than throwing an ERROR (or FATAL) here.
+  */
+ bool
+ load_seccomp_filter(char *context)
+ {
+ #ifdef USE_SECCOMP
+ 	const char	   *allow_list = NULL;
+ 	const char	   *log_list = NULL;
+ 	const char	   *error_list = NULL;
+ 	const char	   *kill_list = NULL;
+ 	int				default_action;
+ 	uint32_t		def_action;
+ 	scmp_filter_ctx	ctx = NULL;
+ 	int				rc;
+ 	bool			result = true;
+ 	MemoryContext	oldcontext;
+ 	seccomp_filter *current_filter = NULL;
+ 
+ 	/* should not happen */
+ 	if (context == NULL)
+ 	{
+ 		ereport(WARNING, (errmsg("invalid seccomp context")));
+ 		return false;
+ 	}
+ 
+ 	/* if seccomp is disabled just return with success */
+ 	if (!seccomp_enabled)
+ 	{
+ 		ereport(LOG, (errmsg("seccomp disabled")));
+ 		return true;
+ 	}
+ 
+ 	/*
+ 	 * If a session syscall list consists solely of the character '*',
+ 	 * then use the corresponding global list. Only applies to children
+ 	 * of the postmaster.
+ 	 */
+ 	if (strcmp(context, "postmaster") != 0)
+ 	{
+ 		/* in a backend session */
+ 		/* we are going to need this later */
+ 		oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+ 		session_filter = palloc0(sizeof(seccomp_filter));
+ 		session_filter->source = pstrdup("session");
+ 		MemoryContextSwitchTo(oldcontext);
+ 		current_filter = session_filter;
+ 
+ 		allow_list = expand_seccomp_list(session_syscall_allow_string,
+ 										 global_syscall_allow_string,
+ 										 "allow");
+ 		log_list = expand_seccomp_list(session_syscall_log_string,
+ 										 global_syscall_log_string,
+ 										 "log");
+ 		error_list = expand_seccomp_list(session_syscall_error_string,
+ 										 global_syscall_error_string,
+ 										 "error");
+ 		kill_list = expand_seccomp_list(session_syscall_kill_string,
+ 										 global_syscall_kill_string,
+ 										 "kill");
+ 
+ 		default_action = session_syscall_default;
+ 		/*
+ 		 * Fastpath: if the lists were all defaulted to their
+ 		 * respective global list, and the session value of
+ 		 * default_action is also the same as the global setting,
+ 		 * just exit with success immediately. This avoids creating
+ 		 * another identical seccomp bpf filter which will just
+ 		 * slow everything down for no particular reason.
+ 		 */
+ 		if (default_action == global_syscall_default &&
+ 				allow_list == global_syscall_allow_string &&
+ 				log_list == global_syscall_log_string &&
+ 				error_list == global_syscall_error_string &&
+ 				kill_list == global_syscall_kill_string)
+ 			return true;
+ 	}
+ 	else
+ 	{
+ 		/* in the postmaster */
+ 		/* we are going to need this later */
+ 		oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+ 		global_filter = palloc0(sizeof(seccomp_filter));
+ 		global_filter->source = pstrdup("global");
+ 		MemoryContextSwitchTo(oldcontext);
+ 		current_filter = global_filter;
+ 
+ 		allow_list = global_syscall_allow_string;
+ 		log_list = global_syscall_log_string;
+ 		error_list = global_syscall_error_string;
+ 		kill_list = global_syscall_kill_string;
+ 		default_action = global_syscall_default;
+ 	}
+ 
+ 	/* Disable ptrace bypass */
+ 	rc = prctl(PR_SET_DUMPABLE, 0, 0, 0, 0);
+ 	if (rc < 0)
+ 	{
+ 		ereport(WARNING,
+ 				(errcode(ERRCODE_SYSTEM_ERROR),
+ 				 errmsg("seccomp could not set dumpable: %m")));
+ 		result = false;
+ 		goto out;
+ 	}
+ 
+ 	/* set the seccomp default action */
+ 	if (default_action == PG_SECCOMP_ERROR)
+ 		def_action = SCMP_ACT_ERRNO(EACCES);
+ 	else if (default_action == PG_SECCOMP_KILL)
+ 		def_action = SCMP_ACT_KILL;
+ 	else if (default_action == PG_SECCOMP_LOG)
+ 		def_action = SCMP_ACT_LOG;
+ 	else if (default_action == PG_SECCOMP_ALLOW)
+ 		def_action = SCMP_ACT_ALLOW;
+ 	else
+ 	{
+ 		/* unknown enforce action type */
+ 		ereport(WARNING,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("seccomp default action unknown")));
+ 		result = false;
+ 		goto out;
+ 	}
+ 	/* preserve and log the setting */
+ 	set_filter_def_action(default_action, current_filter, context);
+ 
+ 	/* Initialize seccomp with default action */
+ 	ctx = seccomp_init(def_action);
+ 	if (ctx == NULL)
+ 	{
+ 		ereport(WARNING, (errcode(ERRCODE_OUT_OF_MEMORY),
+ 						  errmsg("out of memory")));
+ 		result = false;
+ 		goto out;
+ 	}
+ 
+ 	/*
+ 	 * By default, libseccomp will set up audit logging
+ 	 * such that actions KILL and LOG will get audit records,
+ 	 * however ERRNO will not. Arrange to have all not-allowed
+ 	 * syscalls logged instead.
+ 	 */
+ 	rc = seccomp_attr_set(ctx, SCMP_FLTATR_CTL_LOG, 1);
+ 	if (rc != 0)
+ 	{
+ 		ereport(WARNING,
+ 				(errcode(ERRCODE_SYSTEM_ERROR),
+ 				 errmsg("seccomp failed to set audit actions")));
+ 		result = false;
+ 		goto out;
+ 	}
+ 
+ 	if (!
+ 		 (apply_seccomp_list(&ctx, allow_list, SCMP_ACT_ALLOW,
+ 							 def_action, current_filter) &&
+ 		  apply_seccomp_list(&ctx, log_list, SCMP_ACT_LOG,
+ 							 def_action, current_filter) &&
+ 		  apply_seccomp_list(&ctx, error_list, SCMP_ACT_ERRNO(EACCES),
+ 							 def_action, current_filter) &&
+ 		  apply_seccomp_list(&ctx, kill_list, SCMP_ACT_KILL,
+ 							 def_action, current_filter)))
+ 	{
+ 		result = false;
+ 		goto out;
+ 	}
+ 
+ 	/*
+ 	 * Although libseccomp will silently throw away repeated filter
+ 	 * rules against the same syscall (unless arguments are checked,
+ 	 * which we are not supporting here), it can lead to confusing
+ 	 * results, so disallow that here.
+ 	 */
+ 	if (bms_overlap(current_filter->allow, current_filter->log) ||
+ 		bms_overlap(current_filter->error, current_filter->kill) ||
+ 		bms_overlap(bms_union(current_filter->allow, current_filter->log),
+ 					bms_union(current_filter->error, current_filter->kill)))
+ 	{
+ 		ereport(WARNING,
+ 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 				 errmsg("seccomp failed due to overlapping rule sets")));
+ 		result = false;
+ 		goto out;
+ 	}
+ 
+ 	/* Finally, actually load the filter */
+ 	rc = seccomp_load(ctx);
+ 	if (rc != 0)
+ 	{
+ 		ereport(WARNING,
+ 				(errcode(ERRCODE_SYSTEM_ERROR),
+ 				 errmsg("seccomp failed to load rule set")));
+ 		result = false;
+ 		goto out;
+ 	}
+ 
+ out:
+ 	/* safe to release if NULL/NIL */
+ 	seccomp_release(ctx);
+ 
+ 	return result;
+ #else
+ 	return false;
+ #endif
+ }
+ 
+ #ifdef USE_SECCOMP
+ static bool
+ apply_seccomp_list(scmp_filter_ctx	*ctx, const char *slist,
+ 				   uint32_t rule_action, uint32_t def_action,
+ 				   seccomp_filter *current_filter)
+ {
+ 	char		   *rawstring = NULL;
+ 	List		   *elemlist = NIL;
+ 	ListCell	   *l;
+ 	bool			result = true;
+ 	MemoryContext	oldcontext;
+ 
+ 	/*
+ 	 * libseccomp disallows per-syscall rules whose action is the same
+ 	 * as the filter's default action. Therefore, be careful not to add
+ 	 * those rules to the filter we are creating.
+ 	 */
+ 	if (rule_action == def_action)
+ 		return true;
+ 
+ 	/* Need a modifiable copy */
+ 	rawstring = pstrdup(slist);
+ 
+ 	/* Parse string into list of syscalls */
+ 	if (!SplitIdentifierString(rawstring, ',', &elemlist))
+ 	{
+ 		result = false;
+ 		goto out;
+ 	}
+ 
+ 	/* add syscall specific rules to the filter */
+ 	foreach(l, elemlist)
+ 	{
+ 		char   *cursyscall = (char *) lfirst(l);
+ 		int		syscallnum;
+ 		int		rc;
+ 
+ 		/*
+ 		 * Resolve the syscall name to its number on the current arch.
+ 		 * This should have already been validated by the GUC
+ 		 * check function.
+ 		 */
+ 		syscallnum = seccomp_syscall_resolve_name(cursyscall);
+ 		if (syscallnum < 0)
+ 		{
+ 			/* should not happen */
+ 			ereport(WARNING,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("seccomp failed to resolve: syscall \"%s\"",
+ 							cursyscall)));
+ 			result = false;
+ 			goto out;
+ 		}
+ 		else
+ 		{
+ 			rc = seccomp_rule_add(*ctx, rule_action, syscallnum, 0);
+ 			if (rc != 0)
+ 			{
+ 				/* should not be reachable */
+ 				ereport(WARNING,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("seccomp failed to add rule: syscall \"%s\", %d",
+ 								 cursyscall, syscallnum)));
+ 				result = false;
+ 				goto out;
+ 			}
+ 			oldcontext = MemoryContextSwitchTo(TopMemoryContext);
+ 
+ 			if (rule_action == SCMP_ACT_ALLOW)
+ 				current_filter->allow = bms_add_member(current_filter->allow,
+ 													   syscallnum);
+ 			else if (rule_action == SCMP_ACT_LOG)
+ 				current_filter->log = bms_add_member(current_filter->log,
+ 													   syscallnum);
+ 			else if (rule_action == SCMP_ACT_ERRNO(EACCES))
+ 				current_filter->error = bms_add_member(current_filter->error,
+ 													   syscallnum);
+ 			else if (rule_action == SCMP_ACT_KILL)
+ 				current_filter->kill = bms_add_member(current_filter->kill,
+ 													   syscallnum);
+ 
+ 			MemoryContextSwitchTo(oldcontext);
+ 		}
+ 	}
+ 
+ out:
+ 	/* safe to release if still NIL */
+ 	list_free(elemlist);
+ 
+ 	/* but pfree is not */
+ 	if (rawstring)
+ 		pfree(rawstring);
+ 
+ 	return result;
+ }
+ 
+ static const char *
+ expand_seccomp_list(const char *slist, const char *glist,
+ 					const char *saction)
+ {
+ 	
+ 	if (slist && strlen(slist) == 1 && slist[0] == '*')
+ 	{
+ 		/* use the global list as promised */
+ 		ereport(LOG,
+ 				(errmsg("seccomp \"%s\" list inherited from postmaster", saction)));
+ 
+ 		return glist;
+ 	}
+ 	else
+ 		return slist;
+ }
+ 
+ static void
+ set_filter_def_action(int default_action, seccomp_filter *current_filter,
+ 					  char *context)
+ {
+ 	const struct config_enum_entry *entry;
+ 
+ 	current_filter->def = default_action;
+ 	/* stringify the enforcement action levels */
+ 	for (entry = seccomp_options; entry->name; entry++)
+ 	{
+ 		if (entry->val == default_action)
+ 		{
+ 			current_filter->def_str = entry->name;
+ 			break;
+ 		}
+ 	}
+ 	ereport(LOG,
+ 			(errmsg("seccomp default action set to \"%s\": context \"%s\"",
+ 					current_filter->def_str, context)));
+ }
+ #endif /* USE_SECCOMP */
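
The overlap check in load_seccomp_filter() rejects a configuration in which one syscall ends up in more than one action list. Below is a minimal standalone sketch of that same three-part test, assuming a plain 64-bit mask in place of PostgreSQL's Bitmapset. The names `syscall_set`, `sets_overlap`, and `lists_conflict` are hypothetical and are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit n set means syscall number n. */
typedef uint64_t syscall_set;

/* True if the two sets share at least one member. */
static bool
sets_overlap(syscall_set a, syscall_set b)
{
	return (a & b) != 0;
}

/*
 * Same shape as the test in the patch: allow vs. log, error vs. kill,
 * then (allow or log) vs. (error or kill). Together these three tests
 * cover all six pairs of lists.
 */
static bool
lists_conflict(syscall_set allow_set, syscall_set log_set,
			   syscall_set error_set, syscall_set kill_set)
{
	return sets_overlap(allow_set, log_set) ||
		sets_overlap(error_set, kill_set) ||
		sets_overlap(allow_set | log_set, error_set | kill_set);
}
```

With this sketch, four disjoint sets pass, while putting the same syscall number in, say, both the allow and kill sets is reported as a conflict.
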
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 43b9f17..aac1940 100644
*** a/src/backend/utils/init/postinit.c
--- b/src/backend/utils/init/postinit.c
*************** InitPostgres(const char *in_dbname, Oid
*** 1056,1061 ****
--- 1056,1076 ----
  	/* Process pg_db_role_setting options */
  	process_settings(MyDatabaseId, GetSessionUserId());
  
+ #ifdef USE_SECCOMP
+ 	/* If seccomp filtering is requested, do the backend lockdown */
+ 	if (!bootstrap &&
+ 		!IsAutoVacuumWorkerProcess() &&
+ 		IsUnderPostmaster)
+ 	{
+ 		if (!load_seccomp_filter("session"))
+ 		{
+ 			ereport(FATAL,
+ 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 					 errmsg("failed to load session seccomp filter")));
+ 		}
+ 	}
+ #endif
+ 
  	/* Apply PostAuthDelay as soon as we've read all options */
  	if (PostAuthDelay > 0)
  		pg_usleep(PostAuthDelay * 1000000L);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 90ffd89..8b548f1 100644
*** a/src/backend/utils/misc/guc.c
--- b/src/backend/utils/misc/guc.c
*************** extern const struct config_enum_entry ar
*** 478,483 ****
--- 478,484 ----
  extern const struct config_enum_entry recovery_target_action_options[];
  extern const struct config_enum_entry sync_method_options[];
  extern const struct config_enum_entry dynamic_shared_memory_options[];
+ extern const struct config_enum_entry seccomp_options[];
  
  /*
   * GUC option variables that are exported from this module
*************** static struct config_bool ConfigureNames
*** 1952,1957 ****
--- 1953,1968 ----
  		NULL, NULL, NULL
  	},
  
+ 	{
+ 		{"seccomp", PGC_POSTMASTER, RESOURCES_KERNEL,
+ 			gettext_noop("Turns on seccomp syscall enforcement."),
+ 			NULL
+ 		},
+ 		&seccomp_enabled,
+ 		false,
+ 		NULL, NULL, NULL
+ 	},
+ 
  	/* End-of-list marker */
  	{
  		{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
*************** static struct config_string ConfigureNam
*** 4199,4204 ****
--- 4210,4303 ----
  		NULL, NULL, NULL
  	},
  
+ 	{
+ 		{"global_syscall_allow", PGC_POSTMASTER, RESOURCES_KERNEL,
+ 			gettext_noop("Seccomp global syscall allow list."),
+ 			NULL,
+ 			GUC_LIST_INPUT | GUC_SUPERUSER_ONLY
+ 		},
+ 		&global_syscall_allow_string,
+ 		"",
+ 		check_global_syscall_list, NULL, NULL
+ 	},
+ 
+ 	{
+ 		{"global_syscall_log", PGC_POSTMASTER, RESOURCES_KERNEL,
+ 			gettext_noop("Seccomp global syscall log list."),
+ 			NULL,
+ 			GUC_LIST_INPUT | GUC_SUPERUSER_ONLY
+ 		},
+ 		&global_syscall_log_string,
+ 		"",
+ 		check_global_syscall_list, NULL, NULL
+ 	},
+ 
+ 	{
+ 		{"global_syscall_error", PGC_POSTMASTER, RESOURCES_KERNEL,
+ 			gettext_noop("Seccomp global syscall error list."),
+ 			NULL,
+ 			GUC_LIST_INPUT | GUC_SUPERUSER_ONLY
+ 		},
+ 		&global_syscall_error_string,
+ 		"",
+ 		check_global_syscall_list, NULL, NULL
+ 	},
+ 
+ 	{
+ 		{"global_syscall_kill", PGC_POSTMASTER, RESOURCES_KERNEL,
+ 			gettext_noop("Seccomp global syscall kill list."),
+ 			NULL,
+ 			GUC_LIST_INPUT | GUC_SUPERUSER_ONLY
+ 		},
+ 		&global_syscall_kill_string,
+ 		"",
+ 		check_global_syscall_list, NULL, NULL
+ 	},
+ 
+ 	{
+ 		{"session_syscall_allow", PGC_SUSET, RESOURCES_KERNEL,
+ 			gettext_noop("Seccomp backend session syscall allow list."),
+ 			NULL,
+ 			GUC_LIST_INPUT | GUC_SUPERUSER_ONLY
+ 		},
+ 		&session_syscall_allow_string,
+ 		"*",
+ 		check_session_syscall_list, NULL, NULL
+ 	},
+ 
+ 	{
+ 		{"session_syscall_log", PGC_SUSET, RESOURCES_KERNEL,
+ 			gettext_noop("Seccomp backend session syscall log list."),
+ 			NULL,
+ 			GUC_LIST_INPUT | GUC_SUPERUSER_ONLY
+ 		},
+ 		&session_syscall_log_string,
+ 		"*",
+ 		check_session_syscall_list, NULL, NULL
+ 	},
+ 
+ 	{
+ 		{"session_syscall_error", PGC_SUSET, RESOURCES_KERNEL,
+ 			gettext_noop("Seccomp backend session syscall error list."),
+ 			NULL,
+ 			GUC_LIST_INPUT | GUC_SUPERUSER_ONLY
+ 		},
+ 		&session_syscall_error_string,
+ 		"*",
+ 		check_session_syscall_list, NULL, NULL
+ 	},
+ 
+ 	{
+ 		{"session_syscall_kill", PGC_SUSET, RESOURCES_KERNEL,
+ 			gettext_noop("Seccomp backend session syscall kill list."),
+ 			NULL,
+ 			GUC_LIST_INPUT | GUC_SUPERUSER_ONLY
+ 		},
+ 		&session_syscall_kill_string,
+ 		"*",
+ 		check_session_syscall_list, NULL, NULL
+ 	},
+ 
  	/* End-of-list marker */
  	{
  		{NULL, 0, 0, NULL, NULL}, NULL, NULL, NULL, NULL, NULL
*************** static struct config_enum ConfigureNames
*** 4537,4542 ****
--- 4636,4661 ----
  		NULL, NULL, NULL
  	},
  
+ 	{
+ 		{"global_syscall_default", PGC_POSTMASTER, RESOURCES_KERNEL,
+ 			gettext_noop("Seccomp global syscall default action."),
+ 			NULL
+ 		},
+ 		&global_syscall_default,
+ 		PG_SECCOMP_ALLOW, seccomp_options,
+ 		NULL, NULL, NULL
+ 	},
+ 
+ 	{
+ 		{"session_syscall_default", PGC_SUSET, RESOURCES_KERNEL,
+ 			gettext_noop("Seccomp backend session syscall default action."),
+ 			NULL
+ 		},
+ 		&session_syscall_default,
+ 		PG_SECCOMP_ALLOW, seccomp_options,
+ 		NULL, NULL, NULL
+ 	},
+ 
  	/* End-of-list marker */
  	{
  		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 39fc787..82187c1 100644
*** a/src/backend/utils/misc/postgresql.conf.sample
--- b/src/backend/utils/misc/postgresql.conf.sample
***************
*** 154,159 ****
--- 154,177 ----
  
  #max_files_per_process = 1000		# min 25
  					# (change requires restart)
+ #seccomp = off				# enable seccomp syscall filtering
+ 					# (change requires restart)
+ 
+ #global_syscall_default = allow		# postmaster default syscall action:
+ 					# allow, log, error, kill
+ #global_syscall_allow = ''			# postmaster syscall allow list
+ #global_syscall_log = ''			# postmaster syscall log list
+ #global_syscall_error = ''			# postmaster syscall error list
+ #global_syscall_kill = ''			# postmaster syscall kill list
+ 					# (global_syscall* change requires restart)
+ 
+ #session_syscall_default = allow	# session default syscall action:
+ 					# allow, log, error, kill
+ #session_syscall_allow = '*'		# backend session syscall allow list
+ #session_syscall_log = '*'			# backend session syscall log list
+ #session_syscall_error = '*'		# backend session syscall error list
+ #session_syscall_kill = '*'		# backend session syscall kill list
+ 					# session_syscall* default '*' = use global list
  
  # - Cost-Based Vacuum Delay -
  
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cf1f409..a32522e 100644
*** a/src/include/catalog/pg_proc.dat
--- b/src/include/catalog/pg_proc.dat
***************
*** 10678,10683 ****
--- 10678,10690 ----
    proallargtypes => '{oid,text,int8,timestamptz}', proargmodes => '{i,o,o,o}',
    proargnames => '{tablespace,name,size,modification}',
    prosrc => 'pg_ls_tmpdir_1arg' },
+ { oid => '8657', descr => 'get current effective seccomp filter actions',
+   proname => 'pg_get_seccomp_filter', prorows => '100', proretset => 't',
+   provolatile => 's', proparallel => 'r', prorettype => 'record',
+   proargtypes => '', proallargtypes => '{text,int4,text,text}',
+   proargmodes => '{o,o,o,o}',
+   proargnames => '{syscall,syscallnum,filter_action,context}',
+   prosrc => 'pg_get_seccomp_filter' },
  
  # hash partitioning constraint function
  { oid => '5028', descr => 'hash partition CHECK constraint',
diff --git a/src/include/commands/variable.h b/src/include/commands/variable.h
index 5f43414..58cf427 100644
*** a/src/include/commands/variable.h
--- b/src/include/commands/variable.h
*************** extern void assign_session_authorization
*** 34,38 ****
--- 34,40 ----
  extern bool check_role(char **newval, void **extra, GucSource source);
  extern void assign_role(const char *newval, void *extra);
  extern const char *show_role(void);
+ extern bool check_global_syscall_list(char **newval, void **extra, GucSource source);
+ extern bool check_session_syscall_list(char **newval, void **extra, GucSource source);
  
  #endif							/* VARIABLE_H */
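
The session_syscall_* lists default to '*', which the sample configuration describes as "use global list". Below is a minimal standalone sketch of that rule, modelled on expand_seccomp_list() in this patch but without the logging. The helper name `effective_session_list` is hypothetical.

```c
#include <string.h>

/*
 * Hypothetical helper: return the list a backend session would use.
 * A session list that is exactly "*" means "inherit the global list";
 * anything else (including an empty string) is used as given.
 */
static const char *
effective_session_list(const char *session_list, const char *global_list)
{
	if (session_list != NULL && strcmp(session_list, "*") == 0)
		return global_list;

	return session_list;
}
```

Only the exact one-character string "*" triggers inheritance, so an empty session list stays empty rather than falling back to the global one.
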
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index bc6e03f..1e6745f 100644
*** a/src/include/miscadmin.h
--- b/src/include/miscadmin.h
***************
*** 26,31 ****
--- 26,32 ----
  #include <signal.h>
  
  #include "datatype/timestamp.h" /* for TimestampTz */
+ #include "nodes/bitmapset.h"	/* for seccomp */
  #include "pgtime.h"				/* for pg_time_t */
  
  
*************** extern void ChangeToDataDir(void);
*** 333,338 ****
--- 334,341 ----
  extern void SwitchToSharedLatch(void);
  extern void SwitchBackToLocalLatch(void);
  
+ extern bool load_seccomp_filter(char *context);
+ 
  /* in utils/misc/superuser.c */
  extern bool superuser(void);	/* current user is superuser */
  extern bool superuser_arg(Oid roleid);	/* given user is superuser */
*************** extern void process_session_preload_libr
*** 447,452 ****
--- 450,485 ----
  extern void pg_bindtextdomain(const char *domain);
  extern bool has_rolreplication(Oid roleid);
  
+ typedef struct seccomp_filter
+ {
+ 	char		   *source;
+ 	int				def;
+ 	const char	   *def_str;
+ 	Bitmapset	   *allow;
+ 	Bitmapset	   *log;
+ 	Bitmapset	   *error;
+ 	Bitmapset	   *kill;
+ } seccomp_filter;
+ extern seccomp_filter *global_filter;
+ extern seccomp_filter *session_filter;
+ extern bool seccomp_enabled;
+ extern int global_syscall_default;
+ extern int session_syscall_default;
+ extern char *global_syscall_allow_string;
+ extern char *global_syscall_log_string;
+ extern char *global_syscall_error_string;
+ extern char *global_syscall_kill_string;
+ extern char *session_syscall_allow_string;
+ extern char *session_syscall_log_string;
+ extern char *session_syscall_error_string;
+ extern char *session_syscall_kill_string;
+ /* seccomp enforcement actions in increasing order of precedence */
+ #define PG_SECCOMP_ALLOW    0  /* allow */
+ #define PG_SECCOMP_LOG      1  /* log */
+ #define PG_SECCOMP_ERROR    2  /* permission denied error */
+ #define PG_SECCOMP_KILL     3  /* kill process */
+ 
+ 
  /* in access/transam/xlog.c */
  extern bool BackupInProgress(void);
  extern void CancelBackup(void);
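
The PG_SECCOMP_* constants are numbered in increasing order of precedence, and the session filter is overlaid on the global one, with the most restrictive action winning for any given syscall. Under that reading, the outcome for a syscall can be modelled as the larger of the two values. The sketch below only illustrates the ordering: the kernel makes this decision at run time, and the helper `effective_action` is hypothetical and does not appear in the patch.

```c
/* Values copied from the miscadmin.h hunk of this patch. */
#define PG_SECCOMP_ALLOW	0	/* allow */
#define PG_SECCOMP_LOG		1	/* log */
#define PG_SECCOMP_ERROR	2	/* permission denied error */
#define PG_SECCOMP_KILL		3	/* kill process */

/*
 * Hypothetical helper: given the action the global filter assigns to a
 * syscall and the action the session filter assigns to it, return the
 * one with the higher precedence (the more restrictive of the two).
 */
static int
effective_action(int global_action, int session_action)
{
	return (global_action > session_action) ? global_action : session_action;
}
```

For example, a syscall that the global filter logs and the session filter errors on would be modelled as PG_SECCOMP_ERROR. A session filter cannot relax a global "kill" back to "allow".
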
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index d876926..dc7fdaf 100644
*** a/src/include/pg_config.h.in
--- b/src/include/pg_config.h.in
***************
*** 353,358 ****
--- 353,361 ----
  /* Define if you have a function readline library */
  #undef HAVE_LIBREADLINE
  
+ /* Define to 1 if you have the `seccomp' library (-lseccomp). */
+ #undef HAVE_LIBSECCOMP
+ 
  /* Define to 1 if you have the `selinux' library (-lselinux). */
  #undef HAVE_LIBSELINUX
  
***************
*** 935,940 ****
--- 938,946 ----
  /* Define to 1 to build with PAM support. (--with-pam) */
  #undef USE_PAM
  
+ /* Define to 1 to build with seccomp support. (--with-seccomp) */
+ #undef USE_SECCOMP
+ 
  /* Define to 1 to use software CRC-32C implementation (slicing-by-8). */
  #undef USE_SLICING_BY_8_CRC32C
  
