Re: Error-safe user functions

Andres Freund Mon, 05 Dec 2022 15:48:07 -0800

Hi,

On 2022-12-05 16:40:06 -0500, Tom Lane wrote:
> +/*
> + * errsave_start --- begin a "safe" error-reporting cycle
> + *
> + * If "context" isn't an ErrorSaveContext node, this behaves as
> + * errstart(ERROR, domain), and the errsave() macro ends up acting
> + * exactly like ereport(ERROR, ...).
> + *
> + * If "context" is an ErrorSaveContext node, but the node creator only wants
> + * notification of the fact of a safe error without any details, just set
> + * the error_occurred flag in the ErrorSaveContext node and return false,
> + * which will cause us to skip the remaining error processing steps.
> + *
> + * Otherwise, create and initialize error stack entry and return true.
> + * Subsequently, errmsg() and perhaps other routines will be called to 
> further
> + * populate the stack entry.  Finally, errsave_finish() will be called to
> + * tidy up.
> + */
> +bool
> +errsave_start(void *context, const char *domain)


Why is context a void *?


> +{
> +     ErrorSaveContext *escontext;
> +     ErrorData  *edata;
> +
> +     /*
> +      * Do we have a context for safe error reporting?  If not, just punt to
> +      * errstart().
> +      */
> +     if (context == NULL || !IsA(context, ErrorSaveContext))
> +             return errstart(ERROR, domain);

I don't think we should "accept" !IsA(context, ErrorSaveContext) - that
seems likely to hide things like use-after-free.


> +     if (++errordata_stack_depth >= ERRORDATA_STACK_SIZE)
> +     {
> +             /*
> +              * Wups, stack not big enough.  We treat this as a PANIC 
> condition
> +              * because it suggests an infinite loop of errors during error
> +              * recovery.
> +              */
> +             errordata_stack_depth = -1; /* make room on stack */
> +             ereport(PANIC, (errmsg_internal("ERRORDATA_STACK_SIZE 
> exceeded")));
> +     }

This is the fourth copy of this code...



> +/*
> + * errsave_finish --- end a "safe" error-reporting cycle
> + *
> + * If errsave_start() decided this was a regular error, behave as
> + * errfinish().  Otherwise, package up the error details and save
> + * them in the ErrorSaveContext node.
> + */
> +void
> +errsave_finish(void *context, const char *filename, int lineno,
> +                        const char *funcname)
> +{
> +     ErrorSaveContext *escontext = (ErrorSaveContext *) context;
> +     ErrorData  *edata = &errordata[errordata_stack_depth];
> +
> +     /* verify stack depth before accessing *edata */
> +     CHECK_STACK_DEPTH();
> +
> +     /*
> +      * If errsave_start punted to errstart, then elevel will be ERROR or
> +      * perhaps even PANIC.  Punt likewise to errfinish.
> +      */
> +     if (edata->elevel >= ERROR)
> +             errfinish(filename, lineno, funcname);

I'd put a pg_unreachable() or such after the errfinish() call.


> +     /*
> +      * Else, we should package up the stack entry contents and deliver them 
> to
> +      * the caller.
> +      */
> +     recursion_depth++;
> +
> +     /* Save the last few bits of error state into the stack entry */
> +     if (filename)
> +     {
> +             const char *slash;
> +
> +             /* keep only base name, useful especially for vpath builds */
> +             slash = strrchr(filename, '/');
> +             if (slash)
> +                     filename = slash + 1;
> +             /* Some Windows compilers use backslashes in __FILE__ strings */
> +             slash = strrchr(filename, '\\');
> +             if (slash)
> +                     filename = slash + 1;
> +     }
> +
> +     edata->filename = filename;
> +     edata->lineno = lineno;
> +     edata->funcname = funcname;
> +     edata->elevel = ERROR;          /* hide the LOG value used above */
> +
> +     /*
> +      * We skip calling backtrace and context functions, which are more 
> likely
> +      * to cause trouble than provide useful context; they might act on the
> +      * assumption that a transaction abort is about to occur.
> +      */

This seems like a fair bit of duplicated code.


> + * This is the same as InputFunctionCall, but the caller may also pass a
> + * previously-initialized ErrorSaveContext node.  (We declare that as
> + * "void *" to avoid including miscnodes.h in fmgr.h.)

It seems way cleaner to forward declare ErrorSaveContext instead of
using void *.


> If escontext points
> + * to an ErrorSaveContext, any "safe" errors detected by the input function
> + * will be reported by filling the escontext struct.  The caller must
> + * check escontext->error_occurred before assuming that the function result
> + * is meaningful.

I wonder if we shouldn't instead make InputFunctionCallSafe() return a
boolean and return the Datum via a pointer. As callers are otherwise
going to need to do SAFE_ERROR_OCCURRED(escontext) themselves, I think
it should also lead to more concise (and slightly more efficient) code.


> +Datum
> +InputFunctionCallSafe(FmgrInfo *flinfo, char *str,
> +                                       Oid typioparam, int32 typmod,
> +                                       void *escontext)

Is there a reason not to provide this infrastructure for
ReceiveFunctionCall() as well?


Not that I have a suggestion for a better name, but I don't particularly
like "Safe" denoting non-erroring input function calls. There's too many
interpretations of safe - e.g. safe against privilege escalation issues
or such.



> @@ -252,10 +254,13 @@ record_in(PG_FUNCTION_ARGS)
>                       column_info->column_type = column_type;
>               }
>  
> -             values[i] = InputFunctionCall(&column_info->proc,
> -                                                                       
> column_data,
> -                                                                       
> column_info->typioparam,
> -                                                                       
> att->atttypmod);
> +             values[i] = InputFunctionCallSafe(&column_info->proc,
> +                                                                             
>   column_data,
> +                                                                             
>   column_info->typioparam,
> +                                                                             
>   att->atttypmod,
> +                                                                             
>   escontext);
> +             if (SAFE_ERROR_OCCURRED(escontext))
> +                     PG_RETURN_NULL();

It doesn't *quite* seem right to set ->isnull in case of an error. Not
that it has an obvious harm.

Wonder if it's perhaps worth to add VALGRIND_MAKE_MEM_UNDEFINED() calls
to InputFunctionCallSafe() to more easily detect cases where a caller
ignores that an error occured.


> +                     if (safe_mode)
> +                     {
> +                             ErrorSaveContext *es_context = 
> cstate->es_context;
> +
> +                             /* Must reset the error_occurred flag each time 
> */
> +                             es_context->error_occurred = false;

I'd put that into the if (es_context->error_occurred) path. Likely the
window for store-forwarding issues is smaller than
InputFunctionCallSafe(), but it's trivial to write it differently...


> diff --git a/src/test/regress/sql/copy.sql b/src/test/regress/sql/copy.sql
> index 285022e07c..ff77d27cfc 100644
> --- a/src/test/regress/sql/copy.sql
> +++ b/src/test/regress/sql/copy.sql
> @@ -268,3 +268,23 @@ a        c       b
>  
>  SELECT * FROM header_copytest ORDER BY a;
>  drop table header_copytest;
> +
> +-- "safe" error handling
> +create table on_error_copytest(i int, b bool, ai int[]);
> +
> +copy on_error_copytest from stdin with (null_on_error);
> +1    a       {1,}
> +err  1       {x}
> +2    f       {3,4}
> +bad  x       {,
> +\.
> +
> +copy on_error_copytest from stdin with (warn_on_error);
> +3    0       [3:4]={3,4}
> +4    b       [0:1000]={3,4}
> +err  t       {}
> +bad  z       {"zed"}
> +\.
> +
> +select * from on_error_copytest;
> +drop table on_error_copytest;

Think it'd be good to have a test for a composite type where one of the
columns safely errors out and the other doesn't.

Greetings,

Andres Freund

Re: Error-safe user functions

Reply via email to