Hi, First, let me say that I'm not subscribed to the mailing list, so please CC myself when responding.
This post is to discuss a possible extension to the GNU C language. Note that this is still an idea and not refined. Background ========== In C, the following code structure is ubiquitous: return value = function_call(arguments); if (return_value == ERROR_VALUE) goto exit_fail; You can take a look at goto usages in the Linux kernel just for examples (https://github.com/torvalds/linux/search?q=goto). However, this method has one particular drawback, besides verbosity among others. This drawback is that each function has to designate (at least) one special value as ERROR_VALUE. Trivial as it may seem, this has by itself resulted in many inconsistencies and problems. For example, `malloc` signals failure by returning `NULL`, `strtod` may return 0, `HUGE_VAL*` etc, `fread` returns 0 which is not necessarily an error case either, `fgetc` returns `EOF`, `remove` returns nonzero if failed, `clock` returns -1 and so on. Sometimes such a special value may not even be possible, in which case a workaround is required (put the return value as a pointer argument and return the state of success). The following suggestion allows clearer and shorter error handling. The Extension (Basic) ===================== First, let's introduce a new syntax (note again, this is just an example. I don't suggest these particular symbols): float inverse(int x) { if (x == 0) fail; return 1.0f / x; } ... y = inverse(x) !! goto exit_inverse_failed; The semantics of this syntax would be as follows. The function `inverse` can choose to `fail` instead of `return`, in which case it doesn't actually return anything. From the caller site, this failure is signaled (speculations on details below), `y` is not assigned and a `goto exit_inverse_failed` is executed. The observed behavior would be equivalent to: int inverse(int x, float *y) { if (x == 0) return -1; *y = 1.0f / x; return 0; } ... if (inverse(x, &y)) goto exit_inverse_failed; The Extension (Advanced) ======================== Sometimes, error handling is done not just by a single `goto` (although they can all be reduced to this). For example: return value = function_call(arguments); if (return_value == ERROR_VALUE) { /* a small piece of code, such as printing an error */ goto exit_fail; } This could be shortened as: return value = function_call(arguments) !! { /* a small piece of code, such as printing an error */ goto exit_fail; } A generic syntax could therefore be used: return value = function_call(arguments) !! goto exit_fail; return value = function_call(arguments) !! fail; return value = function_call(arguments) !! return 0; return value = function_call(arguments) !! { /* more than one statement */ } Another necessity is for the error code. While `errno` is usable, it's not the best solution in the world. Extending the syntax further, the following could be used (again, syntax is just for the sake of example, I'm not suggesting these particular symbols): float inverse(int x) { if (x == 0) fail EDOM; return 1.0f / x; } ... y = inverse(x) !!= error_code !! goto exit_inverse_failed; By this, the function `inverse` can `fail` with an error code (again, speculations of details below), which can be stored in a variable (`error_code`) in call site. Some Details ========== The state of failure and success as well as the failure code can be kept in registers, to keep the ABI backward-compatible. If backward compatibility is required, a `fail`able function must still provide a fail value (simply to keep older code intact), which could have a syntax as follows (for example): float inverse(int x) !! 0 { if (x == 0) fail EDOM; return 1.0f / x; } ... y = inverse(x); In this example, the caller doesn't check for failure and would receive the fail value indicated by the function signature. If no such fail value is given, the caller must check for failure. This allows older code, such as the standard library to be possibly used in the way it has always been (by providing fail value) or with this extension, while allowing cleaner and more robust code to be written (by not providing fail value). Examples ======== Here are some examples. Opening a file and reading a number (normal C): int n; FILE *fin = fopen("filename", "r"); if (fin == NULL) goto exit_no_file; if (fscanf(fin, "%d", &n) != 1) if (ferror(fin)) goto exit_io_error; else { /* complain about format */ } fclose(fin); return 0; exit_io_error: /* print error: I/O error */ fclose(fin); goto exit_fail; exit_no_file; /* print error: no file */ goto exit_fail; exit_fail: return -1; This has two major sections, main functionality and error recovery. However, the main functionality is cluttered with error checking and per-function error return values. With this extension, the same function could look like this: int n, ret; FILE *fin = fopen("filename", "r") !! goto exit_no_file; ret = fscanf(fin, "%d", &n) !! goto exit_io_error; if (ret != 1) { /* complain about format */ } fclose(fin); return 0; exit_io_error: /* print error: I/O error */ fclose(fin); goto exit_fail; exit_no_file; /* print error: no file */ goto exit_fail; exit_fail: return -1; Notice how reading the code and understanding its functionality is easier simply by ignoring what comes after `!!`. Also notice how two separate `fscanf` failures are properly distinguished; I/O error vs. format error, the later of which is not necessarily a failure and could be part of the logic of the program. Here's another example rather randomly taken from the Linux kernel: int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, struct btrfs_root *root) { struct btrfs_path *path = NULL; struct btrfs_key key; int ret = 0; int wret; int level; int next_key_ret = 0; u64 last_ret = 0; u64 min_trans = 0; if (root->fs_info->extent_root == root) { /* * there's recursion here right now in the tree locking, * we can't defrag the extent root without deadlock */ goto out; } if (root->ref_cows == 0) goto out; if (btrfs_test_opt(root, SSD)) goto out; path = btrfs_alloc_path(); if (!path) return -ENOMEM; level = btrfs_header_level(root->node); if (level == 0) goto out; if (root->defrag_progress.objectid == 0) { struct extent_buffer *root_node; u32 nritems; root_node = btrfs_lock_root_node(root); btrfs_set_lock_blocking(root_node); nritems = btrfs_header_nritems(root_node); root->defrag_max.objectid = 0; /* from above we know this is not a leaf */ btrfs_node_key_to_cpu(root_node, &root->defrag_max, nritems - 1); btrfs_tree_unlock(root_node); free_extent_buffer(root_node); memset(&key, 0, sizeof(key)); } else { memcpy(&key, &root->defrag_progress, sizeof(key)); } path->keep_locks = 1; ret = btrfs_search_forward(root, &key, path, min_trans); if (ret < 0) goto out; if (ret > 0) { ret = 0; goto out; } btrfs_release_path(path); wret = btrfs_search_slot(trans, root, &key, path, 0, 1); if (wret < 0) { ret = wret; goto out; } if (!path->nodes[1]) { ret = 0; goto out; } path->slots[1] = btrfs_header_nritems(path->nodes[1]); next_key_ret = btrfs_find_next_key(root, path, &key, 1, min_trans); ret = btrfs_realloc_node(trans, root, path->nodes[1], 0, &last_ret, &root->defrag_progress); if (ret) { WARN_ON(ret == -EAGAIN); goto out; } if (next_key_ret == 0) { memcpy(&root->defrag_progress, &key, sizeof(key)); ret = -EAGAIN; } out: if (path) btrfs_free_path(path); if (ret == -EAGAIN) { if (root->defrag_max.objectid > root->defrag_progress.objectid) goto done; if (root->defrag_max.type > root->defrag_progress.type) goto done; if (root->defrag_max.offset > root->defrag_progress.offset) goto done; ret = 0; } done: if (ret != -EAGAIN) { memset(&root->defrag_progress, 0, sizeof(root->defrag_progress)); root->defrag_trans_start = trans->transid; } return ret; } And this is how it could be converted to use this extension: int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, struct btrfs_root *root) { struct btrfs_path *path = NULL; struct btrfs_key key; int ret = 0; int level; int next_key_ret = 0; u64 last_ret = 0; u64 min_trans = 0; if (root->fs_info->extent_root == root) { /* * there's recursion here right now in the tree locking, * we can't defrag the extent root without deadlock */ goto out; } if (root->ref_cows == 0) goto out; btrfs_test_opt(root, SSD) !! goto out; path = btrfs_alloc_path() !! return -ENOMEM; level = btrfs_header_level(root->node); if (level == 0) goto out; if (root->defrag_progress.objectid == 0) { struct extent_buffer *root_node; u32 nritems; root_node = btrfs_lock_root_node(root); btrfs_set_lock_blocking(root_node); nritems = btrfs_header_nritems(root_node); root->defrag_max.objectid = 0; /* from above we know this is not a leaf */ btrfs_node_key_to_cpu(root_node, &root->defrag_max, nritems - 1); btrfs_tree_unlock(root_node); free_extent_buffer(root_node); memset(&key, 0, sizeof(key)); } else { memcpy(&key, &root->defrag_progress, sizeof(key)); } path->keep_locks = 1; btrfs_search_forward(root, &key, path, min_trans) !!= ret !! { if (ret > 0) ret = 0; goto out; } btrfs_release_path(path); btrfs_search_slot(trans, root, &key, path, 0, 1) !!= ret !! goto out; if (!path->nodes[1]) { ret = 0; goto out; } path->slots[1] = btrfs_header_nritems(path->nodes[1]); next_key_ret = btrfs_find_next_key(root, path, &key, 1, min_trans); btrfs_realloc_node(trans, root, path->nodes[1], 0, &last_ret, &root->defrag_progress) !!= ret !! { WARN_ON(ret == -EAGAIN); goto out; } if (next_key_ret == 0) { memcpy(&root->defrag_progress, &key, sizeof(key)); ret = -EAGAIN; } out: if (path) btrfs_free_path(path); if (ret == -EAGAIN) { if (root->defrag_max.objectid > root->defrag_progress.objectid) goto done; if (root->defrag_max.type > root->defrag_progress.type) goto done; if (root->defrag_max.offset > root->defrag_progress.offset) goto done; ret = 0; } done: if (ret != -EAGAIN) { memset(&root->defrag_progress, 0, sizeof(root->defrag_progress)); root->defrag_trans_start = trans->transid; } return ret; } which is not just 11 lines shorter, but also has the advantage that the replaced lines look like this: btrfs_do_something(...) !! ... instead of ret = btrfs_do_something(...) if (ret < 0) ... which makes it easier to understand what the code tries to do because the focus is on the `btrfs_do_something` function rather than the surrounding `ret` and `if`. With the traditional method, one cannot easily understand what return values are errors and what are expected values. Another rather random example from the Linux kernel: static __net_init int sunrpc_init_net(struct net *net) { int err; struct sunrpc_net *sn = net_generic(net, sunrpc_net_id); err = rpc_proc_init(net); if (err) goto err_proc; err = ip_map_cache_create(net); if (err) goto err_ipmap; err = unix_gid_cache_create(net); if (err) goto err_unixgid; err = rpc_pipefs_init_net(net); if (err) goto err_pipefs; init_list_head(&sn->all_clients); spin_lock_init(&sn->rpc_client_lock); spin_lock_init(&sn->rpcb_clnt_lock); return 0; err_pipefs: unix_gid_cache_destroy(net); err_unixgid: ip_map_cache_destroy(net); err_ipmap: rpc_proc_exit(net); err_proc: return err; } which would be converted to: static __net_init void sunrpc_init_net(struct net *net) { int err; struct sunrpc_net *sn = net_generic(net, sunrpc_net_id); rpc_proc_init(net) !!= err !! goto err_proc; ip_map_cache_create(net) !!= err !! goto err_ipmap; unix_gid_cache_create(net) !!= err !! goto err_unixgid; rpc_pipefs_init_net(net) !!= err !! goto err_pipefs; init_list_head(&sn->all_clients); spin_lock_init(&sn->rpc_client_lock); spin_lock_init(&sn->rpcb_clnt_lock); return; err_pipefs: unix_gid_cache_destroy(net); err_unixgid: ip_map_cache_destroy(net); err_ipmap: rpc_proc_exit(net); err_proc: fail err; } which shows even clearer than the previous example how this syntax simplifies things. Feedback ======== Please let me know what you think. In particular, what would be the limitations of such a syntax? Would you be interested in seeing this extension to the GNU C language? What alternative symbols do you think would better show the intention/simplify parsing/look more beautiful? Have a nice day, Shahbaz