Hi, Here is an idea of how to read masking options from a file. Please, take a look.
пн, 10 окт. 2022 г. в 14:54, Олег Целебровский <oleg_tselebrovs...@mail.ru>: > Hi, > > I applied most of suggestions: used separate files for most of added code, > fixed typos/mistakes, got rid of that pesky TODO that was already > implemented, just not deleted. > > Added tests (and functionality) for cases when you need to mask columns in > tables with the same name in different schemas. If schema is not specified, > then columns in all tables with specified name are masked (Example - > pg_dump -t ‘t0’ --mask-columns id --mask-function mask_int will mask all > ids in all tables with names ‘t0’ in all existing schemas). > > Wrote comments for all ‘magic numbers’ > > About that > > >- Also it can be hard to use a lot of different functions for different > fields, maybe it would be better to set up functions in a file. > > I agree with that, but I know about at least 2 other patches (both are > WIP, but still) that are interacting with reading command-line options from > file. And if everyone will write their own version of reading command-line > options from file, it will quickly get confusing. > > A solution to that problem is another patch that will put all options from > file (one file for any possible options, from existing to future ones) into > **argv in main, so that pg_dump can process them as if they came form > command line. > > > Пятница, 7 октября 2022, 8:01 +07:00 от Виктория Шепард < > we.vikt...@gmail.com>: > > Hi, > I took a look, here are several suggestions for improvement: > > - Masking is not a main functionality of pg_dump and it is better to write > most of the connected things in a separate file like parallel.c or > dumputils.c. This will help slow down the growth of an already huge pg_dump > file. > > - Also it can be hard to use a lot of different functions for different > fields, maybe it would be better to set up functions in a file. > > - How will it work for the same field and tables in the different schemas? > Can we set up the exact schema for the field? > > - misspelling in a word > >/* > >* Add all columns and funcions to list of MaskColumnInfo structures, > >*/ > > - Why did you use 256 here? > > char* table = (char*) pg_malloc(256 * sizeof(char)); > Also for malloc you need malloc on 1 symbol more because you have to store > '\0' symbol. > > - Instead of addFuncToDatabase you can run your query using something > already defined from fe_utils/query_utils.c. And It will be better to set > up a connection only once and create all functions. Establishing a > connection is a resource-intensive procedure. There are a lot of magic > numbers, better to leave some comments explaining why there are 64 or 512. > > - It seems that you are not using temp_string > > char *temp_string = (char*)malloc(256 * sizeof(char)); > > - Grammar issues > >/* > >* mask_column_info_list contains info about every to-be-masked column: > >* its name, a name its table (if nothing is specified - mask all columns > with this name), > >* name of masking function and name of schema containing this function > (public if not specified) > >*/ > the name of its table > > > пн, 3 окт. 2022 г. в 20:45, Julien Rouhaud <rjuju...@gmail.com > <//e.mail.ru/compose/?mailto=mailto%3arjuju...@gmail.com>>: > > Hi, > > On Mon, Oct 03, 2022 at 06:30:17PM +0300, Олег Целебровский wrote: > > > > Hello, here's my take on masking data when using pg_dump > > > > The main idea is using PostgreSQL functions to replace data during a > SELECT. > > When table data is dumped SELECT a,b,c,d ... from ... query is > generated, the columns that are marked for masking are replaced with result > of functions on those columns > > Example: columns name, count are to be masked, so the query will look as > such: SELECT id, mask_text(name), mask_int(count), date from ... > > > > So about the interface: I added 2 more command-line options: > > > > --mask-columns, which specifies what columns from what tables will be > masked > > usage example: > > --mask-columns "t1.name, t2.description" - both columns > will be masked with the same corresponding function > > or --mask-columns name - ALL columns with name "name" from > all dumped tables will be masked with correspoding function > > > > --mask-function, which specifies what functions will mask data > > usage example: > > --mask-function mask_int - corresponding columns will be > masked with function named "mask_int" from default schema (public) > > or --mask-function my_schema.mask_varchar - same as above > but with specified schema where the function is stored > > or --mask-function somedir/filename - the function is > "defined" here - more on the structure below > > FTR I wrote an extension POC [1] last weekend that does that but on the > backend > side. The main advantage is that it's working with any existing versions > of > pg_dump (or any client relying on COPY or even plain interactive SQL > statements), and that the DBA can force a dedicated role to only get a > masked > dump, even if they forgot to ask for it. > > I only had a quick look at your patch but it seems that you left some todo > in > russian, which isn't helpful at least to me. > > [1] https://github.com/rjuju/pg_anonymize > > > > >
From 0ac0027defe4b312481cd18280fa417e6aa6d5e1 Mon Sep 17 00:00:00 2001 From: Victoria Shepard <5807469+demonol...@users.noreply.github.com> Date: Wed, 12 Oct 2022 04:42:24 +0500 Subject: [PATCH] C4a pg dump masking option --- doc/src/sgml/ref/pg_dump.sgml | 59 +++ src/bin/pg_dump/Makefile | 7 +- src/bin/pg_dump/masking.c | 731 ++++++++++++++++++++++++++ src/bin/pg_dump/masking.h | 79 +++ src/bin/pg_dump/pg_dump.c | 183 +++++-- src/bin/pg_dump/t/001_basic.pl | 5 + src/bin/pg_dump/t/011_dump_masking.pl | 456 ++++++++++++++++ src/include/fe_utils/simple_list.h | 2 + src/tools/msvc/Mkvcbuild.pm | 1 + 9 files changed, 1488 insertions(+), 35 deletions(-) create mode 100644 src/bin/pg_dump/masking.c create mode 100644 src/bin/pg_dump/masking.h create mode 100644 src/bin/pg_dump/t/011_dump_masking.pl diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml index 8b9d9f4cad..fe9503e6a4 100644 --- a/doc/src/sgml/ref/pg_dump.sgml +++ b/doc/src/sgml/ref/pg_dump.sgml @@ -881,6 +881,65 @@ PostgreSQL documentation </listitem> </varlistentry> + <varlistentry> + <term><option>--masking=<replaceable class="parameter">filename</replaceable></option></term> + <listitem> + <para> + Specify a filename from which to read masking patterns for using masking functions + to fields. You have ability to use user-defined functions from file, functions that + already defined in the cluster and default functions. + + Default masking functions - are full masking according to the data types. + Returns 'XXXX' for string data types [text, varchar, character]. + Returns 0 for numeric data types [int, numeric, real, smallint, bigint]. + Returns '1900-01-01' for date and '1900-01-01 00:00:00' for timestamp. + + Here is an example of masking pattern file: + + masking_file.txt + <programlisting> + Schema1 { + Table1 { + field11 : function_name11 + , field12 : function_name12 + , field13 : function_name13 + } + + Table2 { + field21 : function_name21 + , field22 : function_name22 + /* This function will be created by script from the path 'path_to_file_with_function' + and used for 'field23' */ + , field23 : "path_to_file_with_function" + } + } + + /* Functions inside this block will be used for all schemes */ + default { + /* Functions inside this block will be used for all tables */ + default { + /* Function 'for_all_fields' will be used for all fields + did not covered by exact functions */ + default: for_all_fields, + field1: value1, + /* Using default masking function for field2 */ + field2: default + } + + /* Functions inside this block will be used for tables with name 'Table' + in the all schemes */ + Table { + field : function_name + } + } + </programlisting> + + Masking using only with <command>INSERT</command> command. You can't use it with + <command>COPY</command> command. + </para> + </listitem> + </varlistentry> + <varlistentry> <term><option>--no-comments</option></term> <listitem> diff --git a/src/bin/pg_dump/Makefile b/src/bin/pg_dump/Makefile index 9dc5a784dd..e5e46327fe 100644 --- a/src/bin/pg_dump/Makefile +++ b/src/bin/pg_dump/Makefile @@ -19,13 +19,14 @@ include $(top_builddir)/src/Makefile.global export GZIP_PROGRAM=$(GZIP) export with_icu -override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS) +override CPPFLAGS := -g -I$(libpq_srcdir) $(CPPFLAGS) LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport) OBJS = \ $(WIN32RES) \ compress_io.o \ dumputils.o \ + masking.o \ parallel.o \ pg_backup_archiver.o \ pg_backup_custom.o \ @@ -43,8 +44,8 @@ pg_dump: pg_dump.o common.o pg_dump_sort.o $(OBJS) | submake-libpq submake-libpg pg_restore: pg_restore.o $(OBJS) | submake-libpq submake-libpgport submake-libpgfeutils $(CC) $(CFLAGS) pg_restore.o $(OBJS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X) -pg_dumpall: pg_dumpall.o dumputils.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils - $(CC) $(CFLAGS) pg_dumpall.o dumputils.o $(WIN32RES) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X) +pg_dumpall: pg_dumpall.o dumputils.o masking.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils + $(CC) $(CFLAGS) pg_dumpall.o dumputils.o masking.o $(WIN32RES) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o $@$(X) install: all installdirs $(INSTALL_PROGRAM) pg_dump$(X) '$(DESTDIR)$(bindir)'/pg_dump$(X) diff --git a/src/bin/pg_dump/masking.c b/src/bin/pg_dump/masking.c new file mode 100644 index 0000000000..962d5b6d43 --- /dev/null +++ b/src/bin/pg_dump/masking.c @@ -0,0 +1,731 @@ +/*------------------------------------------------------------------------- + * + * masking.c + * + * Data masking tool for pg_dump + * + * + * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/bin/pg_dump/masking.c + * + *------------------------------------------------------------------------- + */ +#include <ctype.h> +#include "masking.h" +#include "common/logging.h" + +#define REL_SIZE 64 * 8 /* Length of relation name - 64 bytes */ +#define DEFAULT_NAME "default" +#define COL_WITH_FUNC_SIZE 3 * REL_SIZE + 3 /* schema_name.function name + '(' + column_name + ') */ + +char REL_SEP = '.'; /* Relation separator */ + +/* Initialise masking map */ +MaskingMap * +newMaskingMap() +{ + MaskingMap *map = malloc(sizeof(MaskingMap)); + map->size = 0; + map->capacity = 8; + map->data = malloc(sizeof(Pair) * map->capacity); + memset(map->data, 0, sizeof(Pair) * map->capacity); + return map; +} + +int +getMapIndexByKey(MaskingMap *map, char *key) +{ + int index = 0; + while (map->data[index] != NULL) + { + if (strcmp(map->data[index]->key, key) == 0) + { + return index; + } + index++; + } + return -1; +} + +void +cleanMap(MaskingMap *map) +{ + if (map != NULL && map->data != NULL) + { + for (int i = 0; map->data[i] != NULL; i++) + { + free(map->data[i]->key); + free(map->data[i]->value); + free(map->data[i]); + } + free(map->data); + free(map); + } +} + +/* + * Add value to map or rewrite, if key already exists + */ +void +setMapValue(MaskingMap *map, char *key, char *value) +{ + if (key != NULL) + { + int index = getMapIndexByKey(map, key); + if (index != -1) /* Already have key in map */ + { + free(map->data[index]->value); + map->data[index]->value = malloc(strlen(value) + 1); + strcpy(map->data[index]->value, value); + } + else + { + Pair *pair = malloc(sizeof(Pair)); + pair->key = malloc(strlen(key) + 1); + pair->value = malloc(strlen(value) + 1); + memset(pair->key, 0, strlen(key)); + memset(pair->value, 0, strlen(value)); + strcpy(pair->key, key); + strcpy(pair->value, value); + + map->data[map->size] = malloc(sizeof(Pair)); + *map->data[map->size] = *pair; + map->size++; + free(pair); + } + if (map->size == map->capacity) /* Increase capacity */ + { + map->capacity *= 1.5; + map->data = realloc(map->data, sizeof(Pair) * map->capacity); + } + } + free(key); +} + +void +printParsingError(struct MaskingDebugDetails *md, char *message, char current_symbol) +{ + pg_log_error("Error position (symbol '%c'): line: %d pos: %d. %s\n", current_symbol, md->line_num, md->symbol_num, + message); +} + +bool +isTerminal(char c) +{ + return c == ':' || c == ',' || c == '{' || c == '}' || c == EOF; +} + +bool +isSpace(char c) +{ + return c == ' ' || c == '\t' || c == '\n' || c == EOF; +} + +/* + * Read symbol and change place of cursor in MaskingDebugDetails + * md->line_num - increasing when we meet '\n' + * md->symbol_num - increasing after reading any symbol and reset + * when we meet '\n' + */ +char +readNextSymbol(struct MaskingDebugDetails *md, FILE *fin) +{ + char c = fgetc(fin); + if (c == '\n') + { + md->line_num++; + md->symbol_num = 1; + } + else + { + md->symbol_num++; + } + return c; +} + +/* Read relation name */ +char +nameReader(char *rel_name, char c, struct MaskingDebugDetails *md, FILE *fin, int size) +{ + memset(rel_name, 0, size); + while (!isTerminal(c)) + { + switch (c) + { + case ' ': + case '\t': + case '\n': + break; /* Skip space symbols */ + case EOF: + return c; /* Handling `EOF` outside the function */ + + default: + strncat(rel_name, &c, 1); + break; + } + c = readNextSymbol(md, fin); + } + return c; +} + +/* Concat schema name, table name and column name */ +char * +getFullRelName(char *schema_name, char *table_name, char *column_name) +{ + char *full_name = malloc(REL_SIZE * 3); /* Schema.Table.Field */ + memset(full_name, 0, REL_SIZE * 3); + strcpy(full_name, schema_name); + strncat(full_name, &REL_SEP, 1); + strcat(full_name, table_name); + strncat(full_name, &REL_SEP, 1); + strcat(full_name, column_name); + return full_name; +} + +/** + * Parsing file with masking pattern + * ------------------------------------ + * Schema1 + * { + * Table1 + * { + * field11 : function_name11 + * , field12 : function_name12 + * , field13 : function_name13 + * } + * + * Table2 + * { + * field21 : function_name21 + * , field22 : function_name22 + * --This function will be stored in `masking_func_query_path` list, and these functions will be + * --created by script from the path 'path_to_file_with_function' - `pg_dump.c:createMaskingFunctions` + * --and used for 'field23' - `masking.c:addFunctionToColumn` + * , field23 : "path_to_file_with_function" + * } + * } + * + * --Functions inside this block will be used for all schemes + * default + * { + * --Functions inside this block will be used for all tables + * default + * { + * --Function 'for_all_fields' will be used for all fields did not covered by exact functions + * default: for_all_fields, + * field1: value1, + * field2: value2 + * } + * --Functions inside this block will be used for tables with name 'Table' in the all schemes + * Table + * { + * field : function_name + * } + * } + */ +int +readMaskingPatternFromFile(FILE *fin, MaskingMap *map, SimpleStringList *masking_func_query_path) +{ + int exit_status; + int brace_counter; + int close_brace_counter; + char *schema_name; + char *table_name; + char *column_name; + char *func_name; + bool skip_reading; + char c; + + struct MaskingDebugDetails md; + md.line_num = 1; + md.symbol_num = 0; + md.parsing_state = SCHEMA_NAME; + exit_status = EXIT_SUCCESS; + + schema_name = malloc(REL_SIZE + 1); + table_name = malloc(REL_SIZE + 1); + column_name = malloc(REL_SIZE + 1); + func_name = malloc(PATH_MAX + 1); /* We can get function name or path to file with a creating function query */ + + brace_counter = 0; + close_brace_counter = 0; + skip_reading = false; + + c = ' '; + while (c != EOF) + { + if (skip_reading) + { + skip_reading = false; + } + else if (!isTerminal(c)) + { + c = readNextSymbol(&md, fin); + } + switch (md.parsing_state) + { + case SCHEMA_NAME: + c = nameReader(schema_name, c, &md, fin, REL_SIZE); + md.parsing_state = WAIT_OPEN_BRACE; + memset(table_name, 0, sizeof REL_SIZE); + break; + + case TABLE_NAME: + c = nameReader(table_name, c, &md, fin, REL_SIZE); + md.parsing_state = WAIT_OPEN_BRACE; + break; + + case COLUMN_NAME: + c = nameReader(column_name, c, &md, fin, REL_SIZE); + md.parsing_state = WAIT_COLON; + break; + + case FUNCTION_NAME: + c = nameReader(func_name, c, &md, fin, PATH_MAX); + extractFuncNameIfPath(func_name, masking_func_query_path); + setMapValue(map, getFullRelName(schema_name, table_name, column_name), func_name); + md.parsing_state = WAIT_COMMA; + break; + + case WAIT_COLON: + if (isSpace(c)) + break; + if (c != ':') + { + printParsingError(&md, "Waiting symbol ':'", c); + exit_status = EXIT_FAILURE; + goto clear_resources; + } + md.parsing_state = FUNCTION_NAME; + c = readNextSymbol(&md, fin); + skip_reading = true; + break; + + case WAIT_OPEN_BRACE: + if (isSpace(c)) + break; + if (c == '}' && brace_counter > 0) + { + md.parsing_state = WAIT_CLOSE_BRACE; + break; + } + if (c != '{') + { + printParsingError(&md, "Waiting symbol '{'", c); + exit_status = EXIT_FAILURE; + goto clear_resources; + } + if (table_name[0] != '\0') /* we have already read table_name */ + { + md.parsing_state = COLUMN_NAME; + } + else + { + md.parsing_state = TABLE_NAME; + } + c = readNextSymbol(&md, fin); + skip_reading = true; + brace_counter++; + break; + + case WAIT_CLOSE_BRACE: + if (isSpace(c)) + break; + if (c != '}') + { + printParsingError(&md, "Waiting symbol '}'", c); + exit_status = EXIT_FAILURE; + goto clear_resources; + } + md.parsing_state = TABLE_NAME; + c = readNextSymbol(&md, fin); + brace_counter--; + break; + + case WAIT_COMMA: + if (isSpace(c)) + break; + if (c == '}') + { + c = readNextSymbol(&md, fin); + skip_reading = true; + close_brace_counter++; + break; + } + if (c != ',' && !isTerminal(c)) /* Schema_name or Table_name */ + { + if (close_brace_counter == 1) + { + md.parsing_state = TABLE_NAME; + } + else if (close_brace_counter == 2) + { + md.parsing_state = SCHEMA_NAME; + } + else + { + printParsingError(&md, "Too many symbols '}'", c); + exit_status = EXIT_FAILURE; + goto clear_resources; + } + skip_reading = true; + close_brace_counter = 0; + break; + } + else if (c != ',') + { + printParsingError(&md, "Waiting symbol ','", c); + exit_status = EXIT_FAILURE; + goto clear_resources; + } + md.parsing_state = COLUMN_NAME; + c = readNextSymbol(&md, fin); + skip_reading = true; + break; + } + } + clear_resources: + free(schema_name); + free(table_name); + free(column_name); + free(func_name); + return exit_status; +} + +/* Creating string in format `schema_name.function name(column_name)` */ +void +concatFunctionAndColumn(char *col_with_func, char *schema_name, char *column_name, char *function_name) +{ + /* Default function */ + if (strcmp(function_name, DEFAULT_NAME)==0) + { + strcpy(col_with_func, "_masking_function."); + strcat(col_with_func, function_name); + } + /* Function name already contains schema name. If not, then add the same scheme */ + else if (strrchr(function_name, '.') != NULL) + { + strcpy(col_with_func, function_name); + } + else + { + strcpy(col_with_func, schema_name); + strcat(col_with_func, "."); + strcat(col_with_func, function_name); + } + strcat(col_with_func, "("); + strcat(col_with_func, column_name); + strcat(col_with_func, ")"); +} + +/* + * Wrapping columns with functions + * schema_name.function_name(schema_name.table_name.column_name) + * return ' ' - if + */ +char * +addFunctionToColumn(char *schema_name, char *table_name, char *column_name, MaskingMap *map) +{ + char *col_with_func; + /* Try to find for exact schema, table and column */ + int index = getMapIndexByKey(map, getFullRelName(schema_name, table_name, column_name)); + if (index == -1) /* If didn't find, try to find function, that used for all schemas */ + { + /* Try to find for exact table and column [default.table.field] */ + index = getMapIndexByKey(map, getFullRelName(DEFAULT_NAME, table_name, column_name)); + if (index == -1) /* If didn't find, try to find function, that used for all schemas and all tables */ + { + /* Try to find for exact column [default.default.field] */ + index = getMapIndexByKey(map, getFullRelName(DEFAULT_NAME, DEFAULT_NAME, column_name)); + if (index == -1) /* If didn't find, try to find function, that used for all schemas and all tables and all columns */ + { + /* Try to find function that used for all fields in all schemas and tables [default.default.default] */ + index = getMapIndexByKey(map, getFullRelName(DEFAULT_NAME, DEFAULT_NAME, DEFAULT_NAME)); + } + } + } + col_with_func = malloc(COL_WITH_FUNC_SIZE + 1); + memset(col_with_func, 0, COL_WITH_FUNC_SIZE + 1); + if (index != -1) + { + char *function_name = map->data[index]->value; + concatFunctionAndColumn(col_with_func, schema_name, column_name, function_name); + } + return col_with_func; +} + +/* Remove the first and the last symbol in func_name */ +void +removeQuotes(char *func_name) +{ + char *new_func_name = malloc(PATH_MAX + 1); + strncpy(new_func_name, func_name + 1, strlen(func_name) - 2); + memset(func_name, 0, PATH_MAX); + strcpy(func_name, new_func_name); + free(new_func_name); +} + + +/* Read a word from a query */ +char * +readWord(FILE *fin, char *word) +{ + char c; + memset(word, 0, strlen(word)); + do + { + c = tolower(getc(fin)); + if (isSpace(c) || c == '(') /* Space or open brace before function arguments */ + { + if (word[0] == '\0') /* Spaces before the word */ + continue; + else + break; /* Spaces after the word */ + } + else + { + strncat(word, &c, 1); + } + } while (c != EOF); + return word; +} + +/** + * Extract function name from query. During extracting we also check + * the query, but only the start of it. We expecting the pattern: + * `create [or replace] function {func_name}` + * If something is wrong we will not use function and leave + * the field without transforming. + */ +int +extractFunctionNameFromQueryFile(char *filename, char *func_name) +{ + FILE *fin; + char *word; + + memset(func_name, 0, REL_SIZE); + fin = NULL; + if (filename[0] != '\0') + { + fin = fopen(filename, "r"); + } + if (fin == NULL) + { + pg_log_warning("Problem with file \'%s\"", filename); + } + else + { + word = malloc(REL_SIZE + 1); + memset(word, 0, REL_SIZE); + if (strcmp(readWord(fin, word), "create") == 0) /* reading 'create' */ + { + if (strcmp(readWord(fin, word), "or") == 0) /* reading 'or' | 'function' */ + { + if (strcmp(readWord(fin, word), "replace") != 0) /* reading 'replace' */ + { + pg_log_warning("Keyword 'replace' was expected, but found '%s'. Check query for creating a function '%s'.\n", + word, filename); + goto free_resources; + } + else + { + readWord(fin, word); /* reading 'function' */ + } + } + } + else + { + pg_log_warning("Keyword 'create' was expected, but found '%s'. Check query for creating a function '%s'.\n", word, + filename); + goto free_resources; + } + if (strcmp(word, "function") == 0) + { + strcpy(func_name, readWord(fin, word)); + } + else + { + pg_log_warning("Keyword 'function' was expected, but found '%s'. Check query for creating a function '%s'.\n", word, + filename); + goto free_resources; + } + free_resources: + free(word); + fclose(fin); + } + return func_name[0] != '\0'; /* If we got a function name, then - return 0, else - return 1 */ +} + +/** + * If there is a path (the first symbol is a quote '"'), then store this path in masking_func_query_path + * and write to the first argument (func_path) name of the function from the query in the file + * If there is not a path - do nothing +*/ +void +extractFuncNameIfPath(char *func_path, SimpleStringList *masking_func_query_path) +{ + char *func_name; + if (func_path[0] == '"') + { + func_name = malloc(REL_SIZE + 1); + removeQuotes(func_path); + if (extractFunctionNameFromQueryFile(func_path, func_name) != 0) /* Read function name from query and store in func_name */ + { + if (!simple_string_list_member(masking_func_query_path, func_path)) + { + simple_string_list_append(masking_func_query_path, func_path); + } + strcpy(func_path, func_name); /* Store func_name in func_path to throw it to upper function */ + } + free(func_name); + } +} + +/* Read whole script from the file `filename` */ +char * +readQueryForCreatingFunction(char *filename) +{ + FILE *fin; + char *query; + long fsize; + query = malloc(sizeof(char)); + memset(query, 0, sizeof(char)); + fin = fopen(filename, "r"); + if (fin != NULL) + { + fseek(fin, 0L, SEEK_END); + fsize = ftell(fin); + fseek(fin, 0L, SEEK_SET); + + query = (char *) calloc(fsize + 1, sizeof(char)); + + fsize = (int) fread(query, sizeof(char), fsize, fin); + if (fsize==0) + { + pg_log_error("File is empty `%s`", filename); + } + fclose(fin); + } + return query; +} + +/** + * Default masking function + * Full masking according to the data types. Returns 'XXXX' for string data types [text, varchar, character]. + * Returns 0 for numeric data types [int, numeric, real, smallint, bigint] + * Returns '1900-01-01' for date and '1900-01-01 00:00:00' for timestamp + */ +char * +default_functions() +{ + return "CREATE SCHEMA IF NOT EXISTS _masking_function;\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in text, out text)\n" + " AS $$ SELECT 'XXXX' $$\n" + " LANGUAGE SQL;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in real, out real)\n" + " AS $$ SELECT 0 $$\n" + " LANGUAGE SQL;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in date, out date)\n" + " AS $$ SELECT DATE '1900-01-01' $$\n" + " LANGUAGE SQL;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in timestamp, out timestamp)\n" + " AS $$ SELECT TIMESTAMP '1900-01-01 00:00:00' $$\n" + " LANGUAGE SQL;\n" + " \n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in timestamptz, out timestamptz)\n" + " AS $$ SELECT TIMESTAMPTZ '1900-01-01 00:00:00-00' $$\n" + " LANGUAGE SQL;\n" + " \n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in time, out time)\n" + " AS $$ SELECT TIME '00:00:00' $$\n" + " LANGUAGE SQL;\n" + " \n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in timetz, out timetz)\n" + " AS $$ SELECT TIMETZ '00:00:00-00' $$\n" + " LANGUAGE SQL;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in interval, out interval)\n" + " AS $$ SELECT INTERVAL '1 year 2 months 3 days' $$\n" + " LANGUAGE SQL;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in box, out box)\n" + " AS $$ SELECT box(circle '((0,0),2.0)') $$\n" + " LANGUAGE SQL;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in circle, out circle)\n" + " AS $$ SELECT circle(point '(0,0)', 0) $$\n" + " LANGUAGE SQL;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in path, out path)\n" + " AS $$ SELECT '[ ( 0 , 1 ) , ( 1 , 2 ) ]'::path $$\n" + " LANGUAGE SQL;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in point, out point)\n" + " AS $$ SELECT '(0, 0)'::point $$\n" + " LANGUAGE SQL;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in polygon , out polygon)\n" + " AS $$ SELECT '( ( 0 , 0 ) , ( 0 , 0 ) )'::polygon $$\n" + " LANGUAGE SQL;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in bytea, out bytea)\n" + " AS $$ SELECT '\\000'::bytea $$\n" + " LANGUAGE sql;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in inet, out inet)\n" + " AS $$ SELECT '0.0.0.0'::inet $$\n" + " LANGUAGE sql;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in cidr, out cidr)\n" + " AS $$ SELECT '0.0.0.0'::cidr $$\n" + " LANGUAGE sql;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in macaddr, out macaddr)\n" + " AS $$ SELECT macaddr '0:0:0:0:0:ab' $$\n" + " LANGUAGE sql;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in json, out json)\n" + " AS $$ SELECT '{\"a\":\"foo\", \"b\":\"bar\"}'::json $$\n" + " LANGUAGE sql;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in jsonb, out jsonb)\n" + " AS $$ SELECT '{\"a\":1, \"b\":2}'::jsonb $$\n" + " LANGUAGE sql;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in line, out line)\n" + " AS $$ SELECT '{1,2,3}'::line $$\n" + " LANGUAGE sql;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in lseg, out lseg)\n" + " AS $$ SELECT '((0,0),(0,0))'::lseg $$\n" + " LANGUAGE sql;\n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in bit, out bit)\n" + " AS $$ SELECT '0'::bit $$\n" + " LANGUAGE sql; \n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in boolean, out boolean)\n" + " AS $$ SELECT true $$\n" + " LANGUAGE sql; \n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in money, out money)\n" + " AS $$ SELECT 0 $$\n" + " LANGUAGE sql; \n" + "\n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in pg_lsn, out pg_lsn)\n" + " AS $$ SELECT '0/0'::pg_lsn $$\n" + " LANGUAGE sql; \n" + " \n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in uuid, out uuid)\n" + " AS $$ SELECT '00000000-0000-0000-0000-000000000000'::uuid $$\n" + " LANGUAGE sql; \n" + " \n" + "CREATE OR REPLACE FUNCTION _masking_function.default(in tsvector, out tsvector)\n" + " AS $$ SELECT 'a:1'::tsvector $$\n" + " LANGUAGE sql; "; +} diff --git a/src/bin/pg_dump/masking.h b/src/bin/pg_dump/masking.h new file mode 100644 index 0000000000..0634f58078 --- /dev/null +++ b/src/bin/pg_dump/masking.h @@ -0,0 +1,79 @@ +/*------------------------------------------------------------------------- + * + * masking.h + * + * Data masking tool for pg_dump + * + * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * IDENTIFICATION + * src/bin/pg_dump/masking.h + * + *------------------------------------------------------------------------- + */ +#ifndef MASKING_H +#define MASKING_H + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <stdbool.h> +#include <limits.h> +#include "fe_utils/simple_list.h" + +typedef struct _pair +{ + char *key; + char *value; +} Pair; + +typedef struct MaskingMap +{ + Pair **data; + int size; + int capacity; +} MaskingMap; + +enum +ParsingState +{ + SCHEMA_NAME, + TABLE_NAME, + COLUMN_NAME, + FUNCTION_NAME, + WAIT_COLON, + WAIT_OPEN_BRACE, + WAIT_CLOSE_BRACE, + WAIT_COMMA +}; + +struct +MaskingDebugDetails +{ + int line_num; + int symbol_num; + enum ParsingState parsing_state; +}; + +MaskingMap *newMaskingMap(void); +void cleanMap(MaskingMap *map); +void setMapValue(MaskingMap *map, char *key, char *value); +void printParsingError(struct MaskingDebugDetails *md, char *message, char current_symbol); +bool isTerminal(char c); +bool isSpace(char c); +char readNextSymbol(struct MaskingDebugDetails *md, FILE *fin); +char nameReader(char *rel_name, char c, struct MaskingDebugDetails *md, FILE *fin, int size); +int getMapIndexByKey(MaskingMap *map, char *key); +extern int readMaskingPatternFromFile(FILE *fin, MaskingMap *map, SimpleStringList *masking_func_query_path); +char *addFunctionToColumn(char *schema_name, char *table_name, char *column_name, MaskingMap *map); +char *getFullRelName(char *schema_name, char *table_name, char *column_name); +void concatFunctionAndColumn(char *col_with_func, char *schema_name, char *column_name, char *function_name); +char *readQueryForCreatingFunction(char *filename); +extern void extractFuncNameIfPath(char *func_path, SimpleStringList *masking_func_query_path); +void removeQuotes(char *func_name); +char *readWord(FILE *fin, char *word); +int extractFunctionNameFromQueryFile(char *filename, char *func_name); +char *default_functions(void); + +#endif /* MASKING_H */ diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index bd9b066e4e..cb82d63a64 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -59,8 +59,11 @@ #include "dumputils.h" #include "fe_utils/option_utils.h" #include "fe_utils/string_utils.h" +#include "fe_utils/query_utils.h" +#include "fe_utils/simple_list.h" #include "getopt_long.h" #include "libpq/libpq-fs.h" +#include "masking.h" #include "parallel.h" #include "pg_backup_db.h" #include "pg_backup_utils.h" @@ -127,6 +130,8 @@ static SimpleOidList foreign_servers_include_oids = {NULL, NULL}; static SimpleStringList extension_include_patterns = {NULL, NULL}; static SimpleOidList extension_include_oids = {NULL, NULL}; +static SimpleStringList masking_func_query_path = {NULL, NULL}; /* List of path to query with masking functions, that must be created before starting dump */ +static MaskingMap *masking_map; /* Map of fields and functions for data masking */ static const CatalogId nilCatalogId = {0, 0}; @@ -318,7 +323,8 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions, static char *get_synchronized_snapshot(Archive *fout); static void setupDumpWorker(Archive *AH); static TableInfo *getRootTableInfo(const TableInfo *tbinfo); - +static int getMaskingPatternFromFile(const char *filename); +static int createMaskingFunctions(Archive *AH, SimpleStringList *masking_func_query_path); int main(int argc, char **argv) @@ -397,7 +403,8 @@ main(int argc, char **argv) {"no-tablespaces", no_argument, &dopt.outputNoTablespaces, 1}, {"quote-all-identifiers", no_argument, "e_all_identifiers, 1}, {"load-via-partition-root", no_argument, &dopt.load_via_partition_root, 1}, - {"role", required_argument, NULL, 3}, + {"masking", required_argument, NULL, 13}, + {"role", required_argument, NULL, 3}, {"section", required_argument, NULL, 5}, {"serializable-deferrable", no_argument, &dopt.serializable_deferrable, 1}, {"snapshot", required_argument, NULL, 6}, @@ -414,7 +421,7 @@ main(int argc, char **argv) {"rows-per-insert", required_argument, NULL, 10}, {"include-foreign-data", required_argument, NULL, 11}, - {NULL, 0, NULL, 0} + {NULL, 0, NULL, 0} }; pg_logging_init(argv[0]); @@ -623,6 +630,12 @@ main(int argc, char **argv) optarg); break; + case 13: /* masking */ + getMaskingPatternFromFile(optarg); + if (dopt.dump_inserts == 0) /* Masking works only with inserts */ + dopt.dump_inserts = DUMP_DEFAULT_ROWS_PER_INSERT; + break; + default: /* getopt_long already emitted a complaint */ pg_log_error_hint("Try \"%s --help\" for more information.", progname); @@ -743,6 +756,10 @@ main(int argc, char **argv) * death. */ ConnectDatabase(fout, &dopt.cparams, false); + if (masking_map) + { + createMaskingFunctions(fout, &masking_func_query_path); + } setup_connection(fout, dumpencoding, dumpsnapshot, use_role); /* @@ -820,7 +837,7 @@ main(int argc, char **argv) dopt.outputBlobs = true; /* - * Collect role names so we can map object owner OIDs to names. + * Collect role names, so we can map object owner OIDs to names. */ collectRoleNames(fout); @@ -1035,6 +1052,7 @@ help(const char *progname) printf(_(" --inserts dump data as INSERT commands, rather than COPY\n")); printf(_(" --load-via-partition-root load partitions via the root table\n")); printf(_(" --no-comments do not dump comments\n")); + printf(_(" --masking data masking, help with hiding sensitive data\n")); printf(_(" --no-publications do not dump publications\n")); printf(_(" --no-security-labels do not dump security label assignments\n")); printf(_(" --no-subscriptions do not dump subscriptions\n")); @@ -1777,7 +1795,7 @@ selectDumpableType(TypeInfo *tyinfo, Archive *fout) * Mark a default ACL as to be dumped or not * * For per-schema default ACLs, dump if the schema is to be dumped. - * Otherwise dump if we are dumping "everything". Note that dataOnly + * Otherwise, dump if we are dumping "everything". Note that dataOnly * and aclsSkip are checked separately. */ static void @@ -1918,7 +1936,7 @@ selectDumpableExtension(ExtensionInfo *extinfo, DumpOptions *dopt) * Mark a publication object as to be dumped or not * * A publication can have schemas and tables which have schemas, but those are - * ignored in decision making, because publications are only dumped when we are + * ignored in decision-making, because publications are only dumped when we are * dumping everything. */ static void @@ -2010,13 +2028,13 @@ dumpTableData_copy(Archive *fout, const void *dcontext) fmtQualifiedDumpable(tbinfo), tdinfo->filtercond ? tdinfo->filtercond : ""); } - else - { - appendPQExpBuffer(q, "COPY %s %s TO stdout;", - fmtQualifiedDumpable(tbinfo), - column_list); - } - res = ExecuteSqlQuery(fout, q->data, PGRES_COPY_OUT); + else + { + appendPQExpBuffer(q, "COPY %s %s TO stdout;", + fmtQualifiedDumpable(tbinfo), + column_list); + } + res = ExecuteSqlQuery(fout, q->data, PGRES_COPY_OUT); PQclear(res); destroyPQExpBuffer(clistBuf); @@ -2036,9 +2054,9 @@ dumpTableData_copy(Archive *fout, const void *dcontext) /* ---------- * THROTTLE: * - * There was considerable discussion in late July, 2000 regarding + * There was considerable discussion in late July 2000 regarding * slowing down pg_dump when backing up large tables. Users with both - * slow & fast (multi-processor) machines experienced performance + * slow & fast (multiprocessor) machines experienced performance * degradation when doing a backup. * * Initial attempts based on sleeping for a number of ms for each ms @@ -2061,7 +2079,7 @@ dumpTableData_copy(Archive *fout, const void *dcontext) * Most of the hard work is done in the backend, and this solution * still did not work particularly well: on slow machines, the ratio * was 50:1, and on medium paced machines, 1:1, and on fast - * multi-processor machines, it had little or no effect, for reasons + * multiprocessor machines, it had little or no effect, for reasons * that were unclear. * * Further discussion ensued, and the proposal was dropped. @@ -2153,8 +2171,28 @@ dumpTableData_insert(Archive *fout, const void *dcontext) appendPQExpBufferStr(q, ", "); if (tbinfo->attgenerated[i]) appendPQExpBufferStr(q, "NULL"); + else if (masking_map) /* If we read masking options successfully, we can use masking functions */ + { + char *column_with_fun; + column_with_fun=addFunctionToColumn(tbinfo->dobj.namespace->dobj.name, tbinfo->dobj.name, + tbinfo->attnames[i], masking_map); + if (column_with_fun[0] == ' ') + { + pg_log_warning("Function\"%s\" was not found", column_with_fun); + } + if (column_with_fun[0] != '\0') + { + appendPQExpBufferStr(q, column_with_fun); + } + else + { + appendPQExpBufferStr(q, fmtId(tbinfo->attnames[i])); + } + free(column_with_fun); + } else - appendPQExpBufferStr(q, fmtId(tbinfo->attnames[i])); + appendPQExpBufferStr(q, fmtId(tbinfo->attnames[i])); + attgenerated[nfields] = tbinfo->attgenerated[i]; nfields++; } @@ -2292,7 +2330,7 @@ dumpTableData_insert(Archive *fout, const void *dcontext) * strtod() and friends might accept NaN, so we * can't use that to test. * - * In reality we only need to defend against + * In reality, we only need to defend against * infinity and NaN, so we need not get too crazy * about pattern matching here. */ @@ -5978,7 +6016,7 @@ getFuncs(Archive *fout, int *numFuncs) * otherwise we might not get creation ordering correct. * * 3. Otherwise, we normally exclude functions in pg_catalog. However, if - * they're members of extensions and we are in binary-upgrade mode then + * they're members of extensions, and we are in binary-upgrade mode then * include them, since we want to dump extension members individually in * that mode. Also, if they are used by casts or transforms then we need * to gather the information about them, though they won't be dumped if @@ -6707,7 +6745,7 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables) /* * We want to perform just one query against pg_index. However, we * mustn't try to select every row of the catalog and then sort it out on - * the client side, because some of the server-side functions we need + * the client side, because some server-side functions we need * would be unsafe to apply to tables we don't have lock on. Hence, we * build an array of the OIDs of tables we care about (and now have lock * on!), and use a WHERE clause to constrain which rows are selected. @@ -7049,7 +7087,7 @@ getExtendedStatistics(Archive *fout) * * Get info about constraints on dumpable tables. * - * Currently handles foreign keys only. + * Currently, handles foreign keys only. * Unique and primary key constraints are handled with indexes, * while check constraints are processed in getTableAttrs(). */ @@ -7074,7 +7112,7 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables) /* * We want to perform just one query against pg_constraint. However, we * mustn't try to select every row of the catalog and then sort it out on - * the client side, because some of the server-side functions we need + * the client side, because some server-side functions we need * would be unsafe to apply to tables we don't have lock on. Hence, we * build an array of the OIDs of tables we care about (and now have lock * on!), and use a WHERE clause to constrain which rows are selected. @@ -7464,7 +7502,7 @@ getTriggers(Archive *fout, TableInfo tblinfo[], int numTables) /* * We want to perform just one query against pg_trigger. However, we * mustn't try to select every row of the catalog and then sort it out on - * the client side, because some of the server-side functions we need + * the client side, because some server-side functions we need * would be unsafe to apply to tables we don't have lock on. Hence, we * build an array of the OIDs of tables we care about (and now have lock * on!), and use a WHERE clause to constrain which rows are selected. @@ -8084,7 +8122,7 @@ getTransforms(Archive *fout, int *numTransforms) /* * getTableAttrs - * for each interesting table, read info about its attributes - * (names, types, default values, CHECK constraints, etc) + * (names, types, default values, CHECK constraints, etc.) * * modifies tblinfo */ @@ -9426,7 +9464,7 @@ getAdditionalACLs(Archive *fout) * If a matching pg_description entry is found, it is dumped. * * Note: in some cases, such as comments for triggers and rules, the "type" - * string really looks like, e.g., "TRIGGER name ON". This is a bit of a hack + * string really looks like, e.g., "TRIGGER name ON". This is a bit of a hack, * but it doesn't seem worth complicating the API for all callers to make * it cleaner. * @@ -11138,7 +11176,7 @@ dumpCompositeType(Archive *fout, const TypeInfo *tyinfo) else { /* - * This is a dropped attribute and we're in binary_upgrade mode. + * This is a dropped attribute, and we're in binary_upgrade mode. * Insert a placeholder for it in the CREATE TYPE command, and set * length and alignment with direct UPDATE to the catalogs * afterwards. See similar code in dumpTableSchema(). @@ -11747,6 +11785,7 @@ dumpFunc(Archive *fout, const FuncInfo *finfo) fmtId(finfo->dobj.namespace->dobj.name), funcsig); + if (prokind[0] == PROKIND_PROCEDURE) keyword = "PROCEDURE"; else @@ -14921,7 +14960,7 @@ dumpTable(Archive *fout, const TableInfo *tbinfo) if (fout->remoteVersion >= 90600) { /* - * In principle we should call acldefault('c', relowner) to + * In principle, we should call acldefault('c', relowner) to * get the default ACL for a column. However, we don't * currently store the numeric OID of the relowner in * TableInfo. We could convert the owner name using regrole, @@ -15284,7 +15323,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo) /* * Not Null constraint --- suppress if inherited, except - * if partition, or in binary-upgrade case where that + * if partitioned, or in binary-upgrade case where that * won't work. */ print_notnull = (tbinfo->notnull[j] && @@ -15514,7 +15553,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo) * * We process foreign and partitioned tables here, even though they * lack heap storage, because they can participate in inheritance - * relationships and we want this stuff to be consistent across the + * relationships, and we want this stuff to be consistent across the * inheritance tree. We can exclude indexes, toast tables, sequences * and matviews, even though they have storage, because we don't * support altering or dropping columns in them, nor can they be part @@ -17517,9 +17556,9 @@ getExtensionMembership(Archive *fout, ExtensionInfo extinfo[], * Due to the FKs being created at CREATE EXTENSION time and therefore before * the data is loaded, we have to work out what the best order for reloading * the data is, to avoid FK violations when the tables are restored. This is - * not perfect- we can't handle circular dependencies and if any exist they - * will cause an invalid dump to be produced (though at least all of the data - * is included for a user to manually restore). This is currently documented + * not perfect - we can't handle circular dependencies and if any exist they + * will cause an invalid dump to be produced (though at least all the data + * is included for a user to manually restore). This is currently documented, * but perhaps we can provide a better solution in the future. */ void @@ -18198,3 +18237,83 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions, if (!res) pg_log_warning("could not parse %s array", "reloptions"); } + +/** + * getMaskingPatternFromFile + * + * Parse the specified masking file with description of what we need to mask + * If the filename is "-" then filters will be + * read from STDIN rather than a file. + */ +int +getMaskingPatternFromFile(const char *filename) +{ + FILE *fin; + int exit_result; + if (filename[0]=='\0') + { + pg_log_error("--masking filename shouldn't be empty"); + exit_nicely(1); + } + + fin = fopen(filename, "r"); + + if (fin == NULL) + { + exit_nicely(1); + } + + masking_map = newMaskingMap(); + + exit_result = readMaskingPatternFromFile(fin, masking_map, &masking_func_query_path); + fclose(fin); + return exit_result; +} + +/** + * Read paths to functions from `masking_func_query_path`, + * read query inside the files and run them. We checked them + * in function masking.c:extractFunctionNameFromQueryFile. + */ +int +createMaskingFunctions(Archive *AH, SimpleStringList *masking_func_query_path) +{ + int exit_result; + PGconn *conn = GetConnection(AH); + char *filename; + char *query; + bool result; + + exit_result=0; + result = false; + /* Read all custom masking functions and create them */ + for (SimpleStringListCell *cell = masking_func_query_path->head; cell; cell = cell->next) + { + filename=cell->val; + query = readQueryForCreatingFunction(filename); + if (query[0]=='\0') + { + pg_log_warning("Query is empty. Check file `%s`", filename); + exit_result++; + } + else + { + result = executeMaintenanceCommand(conn, query, true); + } + + if (!result) + { + pg_log_warning("Failed execution of query from file \"%s\"", filename); + exit_result++; + } + free(query); + } + /* Read all default functions and create them */ + result = executeMaintenanceCommand(conn, default_functions(), true); + if (!result) + { + pg_log_warning("Problem during creating default functions from method `masking.c:default_functions`"); + exit_result++; + } + return exit_result; +} diff --git a/src/bin/pg_dump/t/001_basic.pl b/src/bin/pg_dump/t/001_basic.pl index a583c8a6d2..5072057ad7 100644 --- a/src/bin/pg_dump/t/001_basic.pl +++ b/src/bin/pg_dump/t/001_basic.pl @@ -151,6 +151,11 @@ command_fails_like( qr/\Qpg_dump: error: --rows-per-insert must be in range\E/, 'pg_dump: --rows-per-insert must be in range'); +command_fails_like( + [ 'pg_dump', '--masking', '' ], + qr/\Qpg_dump: error: --masking filename shouldn't be empty\E/, + 'pg_dump: --masking filename shouldn\'t be empty'); + command_fails_like( [ 'pg_restore', '--if-exists', '-f -' ], qr/\Qpg_restore: error: option --if-exists requires option -c\/--clean\E/, diff --git a/src/bin/pg_dump/t/011_dump_masking.pl b/src/bin/pg_dump/t/011_dump_masking.pl new file mode 100644 index 0000000000..be0a1ce01f --- /dev/null +++ b/src/bin/pg_dump/t/011_dump_masking.pl @@ -0,0 +1,456 @@ +# Copyright (c) 2021, PostgreSQL Global Development Group + +use strict; +use warnings; + +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Test::More tests => 65; + +my $tempdir = PostgreSQL::Test::Utils::tempdir; +my $inputfile; + +my $node = PostgreSQL::Test::Cluster->new('main'); +my $port = $node->port; +my $backupdir = $node->backup_dir; +my $plainfile = "$backupdir/plain.sql"; +my $dumpfile = "$backupdir/options_plain.sql"; +my $dumpdir = "$backupdir/parallel"; +my $dumpjobfile = "$backupdir/parallel/toc.dat'"; + +$node->init; +$node->start; + +# Generate test objects +$node->safe_psql('postgres', 'CREATE FOREIGN DATA WRAPPER dummy;'); +$node->safe_psql('postgres', 'CREATE SERVER dummyserver FOREIGN DATA WRAPPER dummy;'); +$node->safe_psql('postgres', "CREATE SCHEMA schema1;"); +$node->safe_psql('postgres', "CREATE SCHEMA schema2;"); +$node->safe_psql('postgres', "CREATE SCHEMA schema3;"); +$node->safe_psql('postgres', "CREATE SCHEMA schema4;"); +$node->safe_psql('postgres', "CREATE SCHEMA schema5;"); +$node->safe_psql('postgres', "CREATE SCHEMA schema6;"); +$node->safe_psql('postgres', "CREATE SCHEMA schema7;"); + +$node->safe_psql('postgres', "CREATE TABLE schema1.table1(field1 varchar)"); +$node->safe_psql('postgres', "CREATE TABLE schema2.table2(field2 varchar)"); +$node->safe_psql('postgres', "CREATE TABLE schema3.table3(field3 varchar)"); +$node->safe_psql('postgres', "CREATE TABLE schema4.table4(field41 varchar, field42 varchar)"); +$node->safe_psql('postgres', "CREATE TABLE schema5.table51(field511 varchar, email varchar)"); +$node->safe_psql('postgres', "CREATE TABLE schema5.table52(email varchar, field522 varchar)"); +$node->safe_psql('postgres', "CREATE TABLE schema6.table61(email varchar, field612 varchar)"); +$node->safe_psql('postgres', "CREATE TABLE schema6.table62(field621 varchar, email varchar)"); +$node->safe_psql('postgres', "CREATE TABLE schema7.table7(field71 varchar, phone varchar)"); +$node->safe_psql('postgres', "CREATE TABLE schema7.table8(phone varchar, field82 varchar)"); + +$node->safe_psql('postgres', "INSERT INTO schema1.table1 VALUES('value1')"); +$node->safe_psql('postgres', "INSERT INTO schema2.table2 VALUES('value2')"); +$node->safe_psql('postgres', "INSERT INTO schema3.table3 VALUES('value3')"); +$node->safe_psql('postgres', "INSERT INTO schema4.table4 VALUES('value41', 'value42')"); +$node->safe_psql('postgres', "INSERT INTO schema5.table51 VALUES('value511', 'value512')"); +$node->safe_psql('postgres', "INSERT INTO schema5.table52 VALUES('value521', 'value522')"); +$node->safe_psql('postgres', "INSERT INTO schema6.table61 VALUES('value611', 'value612')"); +$node->safe_psql('postgres', "INSERT INTO schema6.table62 VALUES('value621', 'value622')"); +$node->safe_psql('postgres', "INSERT INTO schema7.table7 VALUES('value71', 'value72')"); +$node->safe_psql('postgres', "INSERT INTO schema7.table8 VALUES('value81', 'value82')"); + + +######################################### +# Use masking with custom function from file + +# Create masking pattern file and file with custom function + +open $inputfile, '>>', "$tempdir/custom_function_file.txt" + or die "unable to open custom_function_file for writing"; +print $inputfile " + CREATE FUNCTION schema3.custom_function(in text, out text) + AS \$\$ SELECT \$1 || ' custom' \$\$ + LANGUAGE SQL;"; +close $inputfile; + +open $inputfile, '>>', "$tempdir/mask_email.sql" + or die "unable to open mask_email.sql for writing"; +print $inputfile " + CREATE FUNCTION public.mask_email(in text, out text) + AS \$\$ SELECT \$1 || ' email' \$\$ + LANGUAGE SQL;"; +close $inputfile; + +open $inputfile, '>>', "$tempdir/mask_phone.sql" + or die "unable to open mask_phone.sql for writing"; +print $inputfile " + CREATE FUNCTION public.mask_phone(in text, out text) + AS \$\$ SELECT \$1 || ' phone' \$\$ + LANGUAGE SQL;"; +close $inputfile; + +open $inputfile, '>>', "$tempdir/masking_file.txt" + or die "unable to open masking file for writing"; +print $inputfile "schema1 + { + table1 + { + field1: default + } + } + schema2 + { + table2 { + not_exist_field: default + } + } + schema3 + { + table3 + { + field3: \"$tempdir/custom_function_file.txt\" + } + } + schema4 + { + table4 + { + field41: schema3.custom_function, + field42: \"$tempdir/custom_function_file.txt\" + } + } + default + { + default + { + email: \"$tempdir/mask_email.sql\" + } + table7 + { + phone: \"$tempdir/mask_phone.sql\" + } + } + "; +close $inputfile; + + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $plainfile, + "--masking=$tempdir/masking_file.txt" + ], + "1. Run masking without options"); + +my $dump = slurp_file($plainfile); +ok($dump =~ qr/^INSERT INTO schema1\.table1 VALUES \(\'XXXX\'\)/m, "2. [Default function] Field1 was masked"); +ok($dump =~ qr/^CREATE FUNCTION _masking_function\.\"default\"\(text\, OUT text\) RETURNS text/m, "3. [Default function] Default functions were created"); +ok($dump =~ qr/^INSERT INTO schema2\.table2 VALUES \(\'value2\'\)/m, "4. Field2 was not masked"); +ok($dump =~ qr/^CREATE FUNCTION schema3\.custom_function\(text\, OUT text\) RETURNS text/m, "5. [Function from file] Custom function was created"); +ok($dump =~ qr/^INSERT INTO schema3\.table3 VALUES \(\'value3 custom\'\)/m, "6. [Function from file] Field3 was masked"); +ok($dump =~ qr/^INSERT INTO schema4\.table4 VALUES \(\'value41 custom\'\, \'value42 custom\'\)/m, "7. [Function from file] Already created custom function"); +ok($dump =~ qr/^INSERT INTO schema5\.table51 VALUES \(\'value511\'\, \'value512 email\'\)/m, "8. [Default schema and table] Masked only field with name `email`"); +ok($dump =~ qr/^INSERT INTO schema5\.table52 VALUES \(\'value521 email\'\, \'value522\'\)/m, "9. [Default schema and table] Masked only field with name `email`"); +ok($dump =~ qr/^INSERT INTO schema6\.table61 VALUES \(\'value611 email\'\, \'value612\'\)/m, "10. [Default schema and table] Masked only field with name `email`"); +ok($dump =~ qr/^INSERT INTO schema6\.table62 VALUES \(\'value621\'\, \'value622 email\'\)/m, "11. [Default schema and table] Masked only field with name `email`"); +ok($dump =~ qr/^INSERT INTO schema7\.table7 VALUES \(\'value71\'\, \'value72 phone\'\)/m, "12. [Default schema] Masked only field with name `phone` from table7"); +ok($dump =~ qr/^INSERT INTO schema7\.table8 VALUES \(\'value81\'\, \'value82\'\)/m, "13. [Default schema] Masked only field with name `phone` from table7"); + +######################################### +# Run masking with other options +######################################### + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + '--data-only' + ], + "14. Run masking with option --data-only"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema7\.table7 VALUES \(\'value71\'\, \'value72 phone\'\)/m, "15. Check dump after running masking with option --data-only"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + '--clean' + ], + "16. Run masking with option --clean"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema7\.table7 VALUES \(\'value71\'\, \'value72 phone\'\)/m, "17. Check dump after running masking with option --clean"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + '--create' + ], + "18. Run masking with option --create"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema7\.table7 VALUES \(\'value71\'\, \'value72 phone\'\)/m, "19. Check dump after running masking with option --create"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--encoding=UTF-8" + ], + "20. Run masking with option --encoding"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema7\.table7 VALUES \(\'value71\'\, \'value72 phone\'\)/m, "21. Check dump after running masking with option --encoding"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpdir, + "--masking=$tempdir/masking_file.txt", + "--format=directory", + "--jobs=2" + ], + "22. Run masking with option --jobs"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--schema=schema6" + ], + "23. Run masking with option --schema"); +ok(slurp_file($dumpfile) !~ qr/^INSERT INTO schema7\.table7 VALUES \(\'value71\'\, \'value72 phone\'\)/m, "24. Check dump after running masking with option --schema"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema6\.table62 VALUES \(\'value621\'\, \'value622 email\'\)/m, "25. Check dump after running masking with option --schema"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--table=schema6.table62" + ], + "26. Run masking with option --table"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema6\.table62 VALUES \(\'value621\'\, \'value622 email\'\)/m, "27. Check dump after running masking with option --table"); +ok(slurp_file($dumpfile) !~ qr/^INSERT INTO schema6\.table61 VALUES \(\'value611 email\'\, \'value612\'\)/m, "28. Check dump after running masking with option --table"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--exclude-table=schema6.table62" + ], + "29. Run masking with option --exclude-table"); +ok(slurp_file($dumpfile) !~ qr/^INSERT INTO schema6\.table62 VALUES \(\'value621\'\, \'value622\'\)/m, "30. Check dump after running masking with option --exclude-table"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--verbose" + ], + "31. Run masking with option --verbose"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--no-privileges" + ], + "32. Run masking with option --no-privileges"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--binary-upgrade" + ], + "33. Run masking with option --binary-upgrade"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema7\.table7 VALUES \(\'value71\'\, \'value72 phone\'\)/m, +"34. Check dump after running masking with option --binary-upgrade"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--column-inserts" + ], + "35. Run masking with option --column-inserts"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema7\.table7 \(field71\, mask_phone\) VALUES \(\'value71\'\, \'value72 phone\'\)/m, +"36. Check dump after running masking with option --column-inserts"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--disable-dollar-quoting" + ], + "37. Run masking with option --column-inserts"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema7\.table7 VALUES \(\'value71\'\, \'value72 phone\'\)/m, +"38. Check dump after running masking with option --disable-dollar-quoting"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--disable-triggers", + "--masking=$tempdir/masking_file.txt" + ], + "39. Run masking with option --disable-triggers"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--if-exists", + "--clean", + ], + "40. Run masking with option --if-exists"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema6\.table62 VALUES \(\'value621\'\, \'value622 email\'\)/m, +"41. Check dump after running masking with option --if-exists"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--inserts" + ], + "42. Run masking with option --inserts"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema6\.table62 VALUES \(\'value621\'\, \'value622 email\'\)/m, +"43. Check dump after running masking with option --inserts"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--load-via-partition-root" + ], + "44. Run masking with option --load-via-partition-root"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema6\.table62 VALUES \(\'value621\'\, \'value622 email\'\)/m, +"45. Check dump after running masking with option --load-via-partition-root"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--lock-wait-timeout=10" + ], + "46. Run masking with option --lock-wait-timeout"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--no-comments", + "--no-publications", + "--no-security-labels", + "--no-subscriptions", + "--no-sync", + "--no-tablespaces", + "--no-toast-compression", + "--no-unlogged-table-data" + ], + "47. Run masking with skip options"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--quote-all-identifiers" + ], + "48. Run masking with option --quote-all-identifiers"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO \"schema7\"\.\"table7\" VALUES \(\'value71\'\, \'value72 phone\'\)/m, +"49. Check dump after running masking with option --quote-all-identifiers"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--rows-per-insert=10" + ], + "50. Run masking with option --rows-per-insert"); +ok(slurp_file($dumpfile) =~ qr/^ \(\'value71\'\, \'value72 phone\'\)/m, +"51. Check dump after running masking with option --rows-per-insert"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--section=pre-data" + ], + "52. Run masking with option --section"); +ok(slurp_file($dumpfile) =~ qr/^CREATE FUNCTION public\.mask_phone\(text\, OUT text\) RETURNS text/m, +"53. Check dump after running masking with option --section"); +ok(slurp_file($dumpfile) !~ qr/^INSERT INTO schema7\.table7 VALUES \(\'value71\'\, \'value72 phone\'\)/m, +"54. Check dump after running masking with option --section"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--serializable-deferrable" + ], + "55. Run masking with option --serializable-deferrable"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema7\.table7 VALUES \(\'value71\'\, \'value72 phone\'\)/m, +"56. Check dump after running masking with option --serializable-deferrable"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + '--strict-names', 'postgres' + ], + "57. Run masking with option --strict names"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema7\.table7 VALUES \(\'value71\'\, \'value72 phone\'\)/m, +"58. Check dump after running masking with option --serializable-deferrable"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + '--use-set-session-authorization' + ], + "59. Run masking with option --use-set-session-authorization"); +ok(slurp_file($dumpfile) =~ qr/^INSERT INTO schema7\.table7 VALUES \(\'value71\'\, \'value72 phone\'\)/m, +"60. Check dump after running masking with option --use-set-session-authorization"); + +command_ok( + [ + 'pg_dump', '-p', $port, '-f', $dumpfile, + "--masking=$tempdir/masking_file.txt", + "--compress=9" + ], + "61. Run masking with option --compress"); + +open $inputfile, '>>', "$tempdir/drop_table_script.sql" + or die "unable to open mask_phone.sql for writing"; +print $inputfile "DROP TABLE schema1.table1;"; +close $inputfile; + +open $inputfile, '>>', "$tempdir/masking_file_2.txt" + or die "unable to open masking file for writing"; +print $inputfile "schema1 + { + table1 + { + field1: \"$tempdir/drop_table_script.sql\" + } + } + "; +close $inputfile; + + +command_fails_like( + [ + 'pg_dump', '-p', $port, '-f', $plainfile, + "--masking=$tempdir/masking_file_2.txt" + ], + qr/pg_dump: warning: Keyword 'create' was expected, but found 'drop'. Check query for creating a function/, + "62, 63. Run masking with wrong query"); + +open $inputfile, '>>', "$tempdir/masking_file_2.txt" + or die "unable to open masking file for writing"; +print $inputfile "schema1 + { + table1 + { + field, + } + } + "; +close $inputfile; + +command_fails_like( + [ + 'pg_dump', '-p', $port, '-f', $plainfile, + "--masking=$tempdir/masking_file_2.txt" + ], + qr/\Qpg_dump: error: Error position (symbol ','): line: 12 pos: 31. Waiting symbol ':'\E/, + "64, 65. Run masking with wrong masking file"); + +done_testing(); + diff --git a/src/include/fe_utils/simple_list.h b/src/include/fe_utils/simple_list.h index 757fd6ac5a..848858c817 100644 --- a/src/include/fe_utils/simple_list.h +++ b/src/include/fe_utils/simple_list.h @@ -16,6 +16,8 @@ */ #ifndef SIMPLE_LIST_H #define SIMPLE_LIST_H +#include "postgres_ext.h" +#include "c.h" typedef struct SimpleOidListCell { diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm index 83a3e40425..e6755b9b15 100644 --- a/src/tools/msvc/Mkvcbuild.pm +++ b/src/tools/msvc/Mkvcbuild.pm @@ -457,6 +457,7 @@ sub mkvcbuild $pgrestore->{name} = 'pg_restore'; $pgrestore->AddIncludeDir('src/backend'); $pgrestore->AddFile('src/bin/pg_dump/pg_restore.c'); + $pgdumpall->AddFile('src/bin/pg_dump/masking.c'); $pgrestore->AddLibrary('ws2_32.lib'); my $zic = $solution->AddProject('zic', 'exe', 'utils'); -- 2.25.1