GNU C extension: Function Error vs. Success

Shahbaz Youssefi Mon, 10 Mar 2014 07:28:33 -0700

Hi,

First, let me say that I'm not subscribed to the mailing list, so
please CC me when responding.


This post is to discuss a possible extension to the GNU C language.
Note that this is still a rough idea, not a refined proposal.

Background
==========

In C, the following code structure is ubiquitous:

    return_value = function_call(arguments);
    if (return_value == ERROR_VALUE)
        goto exit_fail;

For examples, take a look at the uses of `goto` in the Linux kernel
(https://github.com/torvalds/linux/search?q=goto).

However, besides its verbosity (among other issues), this method has
one particular drawback: each function has to designate (at least) one
special value as ERROR_VALUE. Trivial as it may seem, this alone has
resulted in many inconsistencies and problems. For example, `malloc`
signals failure by returning `NULL`; `strtod` may return 0,
`±HUGE_VAL`, etc.; `fread` returns 0, which is not necessarily an
error either; `fgetc` returns `EOF`; `remove` returns nonzero on
failure; `clock` returns `(clock_t)-1`; and so on.

Sometimes no such special value is available at all, in which case a
workaround is required: pass a pointer argument to receive the result
and return the success status instead.
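To illustrate how much a caller must know about `strtod` alone, here is
a minimal sketch of a wrapper (the name `parse_double` is hypothetical)
that converts it to the pointer-argument workaround just described. A
returned 0 cannot be trusted without also checking `endptr` and `errno`:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns 0 and stores the value in *out on
 * success, returns -1 on failure (leaving *out untouched).
 * Trailing characters after the number are not rejected. */
static int parse_double(const char *s, double *out)
{
    char *end;
    double v;

    errno = 0;                 /* strtod sets errno only on error */
    v = strtod(s, &end);
    if (end == s)              /* no conversion performed; v is 0 */
        return -1;
    if (errno == ERANGE)       /* overflow (±HUGE_VAL) or underflow */
        return -1;
    *out = v;
    return 0;
}
```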

The following suggestion allows clearer and shorter error handling.

The Extension (Basic)
=====================

First, let's introduce a new syntax (note again that this is just an
example; I'm not suggesting these particular symbols):

    float inverse(int x)
    {
        if (x == 0)
            fail;
        return 1.0f / x;
    }

    ...
    y = inverse(x) !! goto exit_inverse_failed;

The semantics of this syntax would be as follows. The function
`inverse` can choose to `fail` instead of `return`, in which case it
doesn't return a value at all. At the call site, this failure is
signaled (see the speculation on details below), `y` is not assigned,
and `goto exit_inverse_failed` is executed. The observed behavior would
be equivalent to:

    int inverse(int x, float *y)
    {
        if (x == 0)
            return -1;
        *y = 1.0f / x;
        return 0;
    }

    ...
    if (inverse(x, &y))
        goto exit_inverse_failed;
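The same semantics can also be modeled without an out-parameter, by
returning a small struct that pairs a failure flag with the value. This
is a minimal sketch in standard C; `float_result`, `inverse_r` and
`use_inverse` are hypothetical names, not part of the proposal or of
any library:

```c
/* Hypothetical result type: a failure flag plus a value. */
struct float_result {
    int failed;   /* nonzero if the function "failed" */
    float value;  /* meaningful only when failed == 0 */
};

static struct float_result inverse_r(int x)
{
    struct float_result r = { 0, 0.0f };
    if (x == 0) {
        r.failed = 1;          /* models `fail;` */
        return r;
    }
    r.value = 1.0f / x;        /* models `return 1.0f / x;` */
    return r;
}

/* Models `*y = inverse(x) !! goto exit_inverse_failed;`:
 * *y is assigned only on success. */
static int use_inverse(int x, float *y)
{
    struct float_result r = inverse_r(x);
    if (r.failed)
        goto exit_inverse_failed;
    *y = r.value;
    return 0;

exit_inverse_failed:
    return -1;
}
```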

The Extension (Advanced)
========================

Sometimes error handling involves more than a single `goto` (although
it can always be reduced to one). For example:

    return_value = function_call(arguments);
    if (return_value == ERROR_VALUE)
    {
        /* a small piece of code, such as printing an error */
        goto exit_fail;
    }

This could be shortened as:

    return_value = function_call(arguments) !! {
        /* a small piece of code, such as printing an error */
        goto exit_fail;
    }

A generic syntax could therefore be used:

    return_value = function_call(arguments) !! goto exit_fail;
    return_value = function_call(arguments) !! fail;
    return_value = function_call(arguments) !! return 0;
    return_value = function_call(arguments) !! {
        /* more than one statement */
    }
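Something close to this generic form can be approximated in today's C
with a variadic macro that takes the handler as its last argument. This
is only a sketch, assuming the callee returns a small hypothetical
`struct float_result` (a failure flag plus a value); `ASSIGN_OR`,
`inverse_r`, `sum_inverses` and `inverse_or_minus_one` are likewise
hypothetical names:

```c
/* Hypothetical result type: a failure flag plus a value. */
struct float_result {
    int failed;
    float value;
};

static struct float_result inverse_r(int x)
{
    struct float_result r = { 0, 0.0f };
    if (x == 0)
        r.failed = 1;            /* models `fail;` */
    else
        r.value = 1.0f / x;      /* models `return 1.0f / x;` */
    return r;
}

/* Approximates `dst = call !! handler;`. The handler is the variadic
 * part, so commas inside a braced block are allowed. dst is assigned
 * only when the call succeeds. */
#define ASSIGN_OR(dst, call, ...)            \
    do {                                     \
        struct float_result r_ = (call);     \
        if (r_.failed) {                     \
            __VA_ARGS__;                     \
        } else {                             \
            (dst) = r_.value;                \
        }                                    \
    } while (0)

/* Handler is a single `goto`. */
static int sum_inverses(int a, int b, float *out)
{
    float ya, yb;

    ASSIGN_OR(ya, inverse_r(a), goto exit_fail);
    ASSIGN_OR(yb, inverse_r(b), goto exit_fail);
    *out = ya + yb;
    return 0;

exit_fail:
    return -1;
}

/* Handler is a block with more than one statement. */
static float inverse_or_minus_one(int x)
{
    float y = 0.0f;

    ASSIGN_OR(y, inverse_r(x), {
        y = -1.0f;
        return y;
    });
    return y;
}
```

One limitation compared with the proposed syntax: because the expansion
is wrapped in `do { } while (0)`, a `break` or `continue` in the handler
would act on that dummy loop rather than on an enclosing loop.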

Another need is an error code. While `errno` is usable, it's far from
ideal. Extending the syntax further, the following could be used
(again, the syntax is only an example; I'm not suggesting these
particular symbols):

    float inverse(int x)
    {
        if (x == 0)
            fail EDOM;
        return 1.0f / x;
    }

    ...
    y = inverse(x) !!= error_code !! goto exit_inverse_failed;

This way, the function `inverse` can `fail` with an error code (again,
see the speculation on details below), which can be stored in a
variable (`error_code`) at the call site.

Some Details
============

The state of failure and success as well as the failure code can be
kept in registers, to keep the ABI backward-compatible.
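As a rough model of this idea in today's C, the success state and the
error code can travel together with the value in a small struct returned
by value. In this sketch, `float_or_err`, `inverse_e` and `call_inverse`
are hypothetical names, and an `err` of 0 is assumed to mean success.
Under the x86-64 System V ABI, for example, an 8-byte struct like this
one is returned in a register rather than through memory:

```c
#include <errno.h>

/* Hypothetical result type: err == 0 means success, anything else is
 * the code the function "failed" with. */
struct float_or_err {
    int err;
    float value;  /* valid only when err == 0 */
};

static struct float_or_err inverse_e(int x)
{
    struct float_or_err r = { 0, 0.0f };
    if (x == 0) {
        r.err = EDOM;            /* models `fail EDOM;` */
        return r;
    }
    r.value = 1.0f / x;
    return r;
}

/* Models `*y = inverse(x) !!= *error_code !! goto exit_inverse_failed;` */
static int call_inverse(int x, float *y, int *error_code)
{
    struct float_or_err r = inverse_e(x);
    if (r.err != 0) {
        *error_code = r.err;
        goto exit_inverse_failed;
    }
    *y = r.value;
    return 0;

exit_inverse_failed:
    return -1;
}
```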

If backward compatibility is required, a `fail`able function must
still provide a fail value (simply to keep older code intact), which
could be written as follows (for example):

    float inverse(int x) !! 0
    {
        if (x == 0)
            fail EDOM;
        return 1.0f / x;
    }

    ...
    y = inverse(x);

In this example, the caller doesn't check for failure and would
receive the fail value indicated by the function signature. If no such
fail value is given, the caller must check for failure. This allows
older code, such as the standard library, to be used either the way it
always has been (by providing a fail value) or with this extension,
while still allowing cleaner and more robust code to be written (by not
providing a fail value).
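The two ways of calling such a function can be mimicked today with two
entry points: one that reports failure explicitly and one that
substitutes the declared fail value. In this minimal sketch,
`float_or_err`, `inverse_checked` and `inverse_legacy` are hypothetical
names; `inverse_legacy` plays the role of `float inverse(int x) !! 0`
for callers that never check:

```c
#include <errno.h>

struct float_or_err {
    int err;      /* 0 on success, otherwise the failure code */
    float value;  /* valid only when err == 0 */
};

/* Checkable view: callers must inspect .err. */
static struct float_or_err inverse_checked(int x)
{
    struct float_or_err r = { 0, 0.0f };
    if (x == 0)
        r.err = EDOM;            /* models `fail EDOM;` */
    else
        r.value = 1.0f / x;
    return r;
}

/* Legacy view: a caller that doesn't check for failure simply
 * receives the fail value 0. */
static float inverse_legacy(int x)
{
    struct float_or_err r = inverse_checked(x);
    return r.err != 0 ? 0.0f : r.value;
}
```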

Examples
========

Here are some examples.

Opening a file and reading a number (normal C):

        int n;
        FILE *fin = fopen("filename", "r");
        if (fin == NULL)
            goto exit_no_file;

        if (fscanf(fin, "%d", &n) != 1) {
            if (ferror(fin))
                goto exit_io_error;
            else
                { /* complain about format */ }
        }

        fclose(fin);
        return 0;

    exit_io_error:
        /* print error: I/O error */
        fclose(fin);
        goto exit_fail;
    exit_no_file:
        /* print error: no file */
        goto exit_fail;
    exit_fail:
        return -1;

This has two major sections: the main functionality and error recovery.
However, the main functionality is cluttered with error checks and
per-function error return values. With this extension, the same
function could look like this:

        int n, ret;
        FILE *fin = fopen("filename", "r") !! goto exit_no_file;

        ret = fscanf(fin, "%d", &n) !! goto exit_io_error;
        if (ret != 1)
            { /* complain about format */ }

        fclose(fin);
        return 0;

    exit_io_error:
        /* print error: I/O error */
        fclose(fin);
        goto exit_fail;
    exit_no_file:
        /* print error: no file */
        goto exit_fail;
    exit_fail:
        return -1;

Notice how the code is easier to read and understand simply by ignoring
what comes after `!!`. Also notice how the two separate `fscanf`
failures are properly distinguished: an I/O error vs. a format error,
the latter of which is not necessarily a failure and could be part of
the program's logic.
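In today's C, telling these cases apart requires combining the return
value of `fscanf` with `ferror`. The helper below is a minimal sketch;
`read_int` and `enum read_status` are hypothetical names. It also
separates end of file, which in the converted example above would
presumably end up in the `ret != 1` branch along with format errors:

```c
#include <stdio.h>

/* Hypothetical classification of the outcomes of fscanf(fin, "%d", n). */
enum read_status {
    READ_OK,            /* one integer was converted */
    READ_FORMAT_ERROR,  /* input present but not an integer */
    READ_EOF,           /* end of file before any conversion */
    READ_IO_ERROR       /* the stream's error indicator is set */
};

static enum read_status read_int(FILE *fin, int *n)
{
    int ret = fscanf(fin, "%d", n);

    if (ret == 1)
        return READ_OK;
    /* fscanf returns EOF both at end of file and on a read error,
     * so the error indicator is checked first. */
    if (ferror(fin))
        return READ_IO_ERROR;
    if (ret == EOF)
        return READ_EOF;
    return READ_FORMAT_ERROR;   /* ret == 0: matching failure */
}
```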

Here's another example, taken rather randomly from the Linux kernel:

    int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
            struct btrfs_root *root)
    {
        struct btrfs_path *path = NULL;
        struct btrfs_key key;
        int ret = 0;
        int wret;
        int level;
        int next_key_ret = 0;
        u64 last_ret = 0;
        u64 min_trans = 0;

        if (root->fs_info->extent_root == root) {
            /*
             * there's recursion here right now in the tree locking,
             * we can't defrag the extent root without deadlock
             */
            goto out;
        }

        if (root->ref_cows == 0)
            goto out;

        if (btrfs_test_opt(root, SSD))
            goto out;

        path = btrfs_alloc_path();
        if (!path)
            return -ENOMEM;

        level = btrfs_header_level(root->node);

        if (level == 0)
            goto out;

        if (root->defrag_progress.objectid == 0) {
            struct extent_buffer *root_node;
            u32 nritems;

            root_node = btrfs_lock_root_node(root);
            btrfs_set_lock_blocking(root_node);
            nritems = btrfs_header_nritems(root_node);
            root->defrag_max.objectid = 0;
            /* from above we know this is not a leaf */
            btrfs_node_key_to_cpu(root_node, &root->defrag_max,
                    nritems - 1);
            btrfs_tree_unlock(root_node);
            free_extent_buffer(root_node);
            memset(&key, 0, sizeof(key));
        } else {
            memcpy(&key, &root->defrag_progress, sizeof(key));
        }

        path->keep_locks = 1;

        ret = btrfs_search_forward(root, &key, path, min_trans);
        if (ret < 0)
            goto out;
        if (ret > 0) {
            ret = 0;
            goto out;
        }
        btrfs_release_path(path);
        wret = btrfs_search_slot(trans, root, &key, path, 0, 1);

        if (wret < 0) {
            ret = wret;
            goto out;
        }
        if (!path->nodes[1]) {
            ret = 0;
            goto out;
        }
        path->slots[1] = btrfs_header_nritems(path->nodes[1]);
        next_key_ret = btrfs_find_next_key(root, path, &key, 1,
                min_trans);
        ret = btrfs_realloc_node(trans, root,
                path->nodes[1], 0,
                &last_ret,
                &root->defrag_progress);
        if (ret) {
            WARN_ON(ret == -EAGAIN);
            goto out;
        }
        if (next_key_ret == 0) {
            memcpy(&root->defrag_progress, &key, sizeof(key));
            ret = -EAGAIN;
        }
    out:
        if (path)
            btrfs_free_path(path);
        if (ret == -EAGAIN) {
            if (root->defrag_max.objectid > root->defrag_progress.objectid)
                goto done;
            if (root->defrag_max.type > root->defrag_progress.type)
                goto done;
            if (root->defrag_max.offset > root->defrag_progress.offset)
                goto done;
            ret = 0;
        }
    done:
        if (ret != -EAGAIN) {
            memset(&root->defrag_progress, 0,
                    sizeof(root->defrag_progress));
            root->defrag_trans_start = trans->transid;
        }
        return ret;
    }

And this is how it could be converted to use this extension:

    int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
            struct btrfs_root *root)
    {
        struct btrfs_path *path = NULL;
        struct btrfs_key key;
        int ret = 0;
        int level;
        int next_key_ret = 0;
        u64 last_ret = 0;
        u64 min_trans = 0;

        if (root->fs_info->extent_root == root) {
            /*
             * there's recursion here right now in the tree locking,
             * we can't defrag the extent root without deadlock
             */
            goto out;
        }

        if (root->ref_cows == 0)
            goto out;

        btrfs_test_opt(root, SSD) !! goto out;

        path = btrfs_alloc_path() !! return -ENOMEM;

        level = btrfs_header_level(root->node);

        if (level == 0)
            goto out;

        if (root->defrag_progress.objectid == 0) {
            struct extent_buffer *root_node;
            u32 nritems;

            root_node = btrfs_lock_root_node(root);
            btrfs_set_lock_blocking(root_node);
            nritems = btrfs_header_nritems(root_node);
            root->defrag_max.objectid = 0;
            /* from above we know this is not a leaf */
            btrfs_node_key_to_cpu(root_node, &root->defrag_max,
                    nritems - 1);
            btrfs_tree_unlock(root_node);
            free_extent_buffer(root_node);
            memset(&key, 0, sizeof(key));
        } else {
            memcpy(&key, &root->defrag_progress, sizeof(key));
        }

        path->keep_locks = 1;

        btrfs_search_forward(root, &key, path, min_trans) !!= ret !! {
            if (ret > 0)
                ret = 0;
            goto out;
        }
        btrfs_release_path(path);
        btrfs_search_slot(trans, root, &key, path, 0, 1) !!= ret !! goto out;

        if (!path->nodes[1]) {
            ret = 0;
            goto out;
        }
        path->slots[1] = btrfs_header_nritems(path->nodes[1]);
        next_key_ret = btrfs_find_next_key(root, path, &key, 1,
                min_trans);
        btrfs_realloc_node(trans, root,
                path->nodes[1], 0,
                &last_ret,
                &root->defrag_progress) !!= ret !! {
            WARN_ON(ret == -EAGAIN);
            goto out;
        }
        if (next_key_ret == 0) {
            memcpy(&root->defrag_progress, &key, sizeof(key));
            ret = -EAGAIN;
        }
    out:
        if (path)
            btrfs_free_path(path);
        if (ret == -EAGAIN) {
            if (root->defrag_max.objectid > root->defrag_progress.objectid)
                goto done;
            if (root->defrag_max.type > root->defrag_progress.type)
                goto done;
            if (root->defrag_max.offset > root->defrag_progress.offset)
                goto done;
            ret = 0;
        }
    done:
        if (ret != -EAGAIN) {
            memset(&root->defrag_progress, 0,
                    sizeof(root->defrag_progress));
            root->defrag_trans_start = trans->transid;
        }
        return ret;
    }

which is not just 11 lines shorter, but also has the advantage that
the replaced lines look like this:

    btrfs_do_something(...) !! ...

instead of

    ret = btrfs_do_something(...);
    if (ret < 0)
        ...

which makes it easier to understand what the code is trying to do,
because the focus is on the `btrfs_do_something` function rather than
the surrounding `ret` and `if`. With the traditional method, one cannot
easily tell which return values are errors and which are expected
values.

Another rather random example from the Linux kernel:

    static __net_init int sunrpc_init_net(struct net *net)
    {
        int err;
        struct sunrpc_net *sn = net_generic(net, sunrpc_net_id);

        err = rpc_proc_init(net);
        if (err)
            goto err_proc;

        err = ip_map_cache_create(net);
        if (err)
            goto err_ipmap;

        err = unix_gid_cache_create(net);
        if (err)
            goto err_unixgid;

        err = rpc_pipefs_init_net(net);
        if (err)
            goto err_pipefs;

        INIT_LIST_HEAD(&sn->all_clients);
        spin_lock_init(&sn->rpc_client_lock);
        spin_lock_init(&sn->rpcb_clnt_lock);
        return 0;

    err_pipefs:
        unix_gid_cache_destroy(net);
    err_unixgid:
        ip_map_cache_destroy(net);
    err_ipmap:
        rpc_proc_exit(net);
    err_proc:
        return err;
    }

which would be converted to:

    static __net_init void sunrpc_init_net(struct net *net)
    {
        int err;
        struct sunrpc_net *sn = net_generic(net, sunrpc_net_id);

        rpc_proc_init(net)            !!= err !! goto err_proc;
        ip_map_cache_create(net)      !!= err !! goto err_ipmap;
        unix_gid_cache_create(net)    !!= err !! goto err_unixgid;
        rpc_pipefs_init_net(net)      !!= err !! goto err_pipefs;

        INIT_LIST_HEAD(&sn->all_clients);
        spin_lock_init(&sn->rpc_client_lock);
        spin_lock_init(&sn->rpcb_clnt_lock);
        return;

    err_pipefs:
        unix_gid_cache_destroy(net);
    err_unixgid:
        ip_map_cache_destroy(net);
    err_ipmap:
        rpc_proc_exit(net);
    err_proc:
        fail err;
    }

which shows even more clearly than the previous example how this
syntax simplifies things.
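For comparison, part of this compactness can already be approximated
with a macro, as long as the callees follow the kernel convention of
returning 0 on success and a negative errno value on failure. The sketch
below uses a hypothetical `TRY` macro and hypothetical
`step_init`/`step_exit` functions standing in for `rpc_proc_init()` and
friends. On a failure at step N, the labels undo steps N-1 down to 1 in
reverse order, just like the `err_*` labels in the original function:

```c
#include <errno.h>

/* Hypothetical: approximates `call !!= err !! goto label;` for
 * functions returning 0 on success and nonzero on failure. */
#define TRY(err, call, label)  \
    do {                       \
        (err) = (call);        \
        if ((err) != 0)        \
            goto label;        \
    } while (0)

static int fail_at;        /* which step should fail (0 = none) */
static char undo_log[16];  /* records the order of teardown calls */
static int undo_len;

static int step_init(int n)
{
    return n == fail_at ? -ENOMEM : 0;
}

static void step_exit(int n)
{
    undo_log[undo_len++] = (char)('0' + n);
}

static int init_all(void)
{
    int err;

    TRY(err, step_init(1), err_step1);
    TRY(err, step_init(2), err_step2);
    TRY(err, step_init(3), err_step3);
    return 0;

err_step3:
    step_exit(2);
err_step2:
    step_exit(1);
err_step1:
    return err;
}
```

Unlike the proposed syntax, such a macro cannot distinguish a failure
from a legitimate nonzero result; it works only because of the
0-means-success convention.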

Feedback
========

Please let me know what you think. In particular, what would be the
limitations of such a syntax? Would you be interested in seeing this
extension to the GNU C language? What alternative symbols do you think
would better show the intention/simplify parsing/look more beautiful?

Have a nice day,
Shahbaz
