A small (preprocessor) problem, and a modest enhancement proposal

Ronald F. Guilmette Tue, 30 Jun 2009 21:28:57 -0700


Greetings old friends.  Long time no see.  I have been off fighting
the spam wars for about the past 10-15 years, but I hope I'll still
be welcome in these parts.


I just dropped in again because I've got a bit of a C programming
difficulty, and if nobody already has a solution, I'd like to propose
what I believe is a modest one.

First, the problem...

I have quite a lot of code (my own) in which I define various
function-like macros to do certain useful things for me.  Some of these
macros invoke yet other macros.  Here's a brief sample (I have a lot of
code containing this kind of stuff).  Please bear with me while I
explain.

==============================================================================
...
#define for_named_file_e_0(PATH,MODE,EXPR) \
  __extension__ ({ \
    register char const *const path = (PATH); \
    register char const *const mode = (MODE); \
    ... \
  })

#define for_named_file_e(PATH,MODE,EXPR) \
  __extension__ ({ \
    register char const *const path = (PATH); \
    register char const *const mode = (MODE); \
    ... \
    for_named_file_e_0 (path, mode, (EXPR)); \
  })
...
==============================================================================

Some of you may have run into this same problem yourselves in the past,
so perhaps some of you already see it.

The problem is that in the final expansion of any call to `for_named_file_e',
inside the (nested) expansion of `for_named_file_e_0', we will unintentionally
end up with code that says:

        ...
        register char const *const path = path;
        register char const *const mode = mode;
        ...

Unfortunately, because of C's scoping rules, these two declarations do not
have the desired effect at all.  The scope of each newly declared `path' and
`mode' begins immediately after its own declarator, so the identifier in each
initializer refers to the new, still-uninitialized inner variable rather than
to the outer one.  The values of the original PATH and MODE arguments passed
in to the outer, `first-level' call (to `for_named_file_e') are therefore
never copied into the `path' and `mode' variables that belong to the *inner*
macro expansion (i.e. those of `for_named_file_e_0').

As someone taught me a long, long time ago, it is always Good, within macros,
to evaluate each macro argument that happens to be an expression exactly once.
This prevents problems like those some people experienced way back in the Bad
Old Days (before the language was standardized), when people ended up
scratching their heads over macro calls like:

        ... = tolower (*p++);

When using careless early implementations of C, in which `tolower' was a
macro that mentioned its argument more than once, the above code often caused
the value of `p' to be incremented two or more times, which was almost
always _not_ what was desired or intended.

Following the rule of always evaluating each expression argument of a
function-like macro once, and saving its value in a local variable, prevents
such problems.

Unfortunately, however, obeying that rule universally can give rise to cases
of the _other_ (scoping-related) problem I have illustrated above via my
two example macro definitions.

I have an idea that I think could help to solve the latter problem, and one
which might also perhaps be useful in other contexts.

I'd like to propose a small enhancement for the GNU preprocessor, i.e.
the addition of a new pre-defined built-in symbol, __MACRO__.  This new
built-in would be analogous to the existing __LINE__, __FILE__, and
__FUNCTION__ built-in symbols (although, strictly speaking, __FUNCTION__ is
supplied by the compiler proper rather than by the preprocessor).  Unlike
those, of course, at any point where it was referenced, the preprocessor
would replace the __MACRO__ symbol with the exact name of the ``innermost''
macro currently being expanded.  For example, two macros might be written
as follows:

        #define foo(ARG1,ARG2) \
          { \
            register int __MACRO__##_macro_arg1 = ARG1; \
            register int __MACRO__##_macro_arg2 = ARG2; \
            foobar = __MACRO__##_macro_arg1 + __MACRO__##_macro_arg2; \
          }

        #define bar(ARG1,ARG2) \
          { \
            register int __MACRO__##_macro_arg1 = ARG1; \
            register int __MACRO__##_macro_arg2 = ARG2; \
            foo (__MACRO__##_macro_arg1, __MACRO__##_macro_arg2); \
          }

In such a case, the following macro call:

        bar (*p++, *q++);

would result in the following code expansion (reformatted for legibility):

        {
          register int bar_macro_arg1 = (*p++);
          register int bar_macro_arg2 = (*q++);
          {
            register int foo_macro_arg1 = bar_macro_arg1;
            register int foo_macro_arg2 = bar_macro_arg2;
            foobar = foo_macro_arg1 + foo_macro_arg2;
          }
        }

This expansion would have the desirable effects of (a) _not_ improperly
incrementing the two pointers, `p' and `q', multiple times (as a side-
effect of the call to `bar') and also (b) generating and using identifiers
(used here as variable names) that are unique to their immediately
enclosing macro definition(s).  This avoids the C/C++ scoping problem noted
above, and it may perhaps also have other salutary effects in other contexts.

Note that the standard preprocessor ## (token-pasting) operator is
defined (by the standard) in such a way that its evaluation must
necessarily be delayed until macro-expansion time, and that fact suggests
... to me anyway... that only a minimum of special hacks would be needed
in the preprocessor in order to implement this small extension.

There is one small case, however, in which the existing preprocessor rules
would need some fiddling.  In order to make this newly proposed __MACRO__
feature usable in the way intended, the evaluation rules for the operands
of the preprocessor ## operator would have to be adjusted very slightly
(and in a completely backward-compatible manner).

Currently, the ## operator is defined (by the standard) so that it performs
its special magic _only_ when one or the other of its (preceding or
following) operands is an identifier naming some parameter of the current
function-like macro; such a parameter is replaced by the corresponding
argument's spelling, without first macro-expanding it.  Any other identifier
adjacent to ##, including a built-in such as __LINE__, is pasted exactly as
written.  Thus, the code:

        #define example(BAR) foo##BAR foo##__LINE__
        example(bar)

produces the macro-expanded text:

        foobar foo__LINE__

It _does not_ generate the arguably more useful expansion (for a call
that appears on line 2, as here):

        foobar foo2

In order to allow the new __MACRO__ built-in symbol to be used as intended,
at the very least that one new preprocessor built-in symbol would have to
be handled specially whenever it appeared either before or after the binary
## operator.  In such contexts, it would have to be macro-replaced _before_
being supplied as an operand to the ## operator.

In short, the rule regarding the operands of binary ## would necessarily
be relaxed ever so slightly, i.e. so as to permit macro-replacement (with the
corresponding value), prior to ## evaluation/concatenation, of either:

        (a) any named parameter of the current function-like macro, or
        (b) the special new (extension) built-in named `__MACRO__'

This small adjustment should not be difficult.  (And if _I_ were implementing
this proposal, that's how I would do it.)

Alternatively, a more general variant of the preprocessor's binary ##
operator might be implemented instead, i.e. one that doesn't break
backward compatibility for any existing code, but that has more generally
useful semantics.  Such an operator might conceivably be spelled `@@'.
Assuming that it were in fact spelled that way, and that it were defined
so that both of its operands are fully macro-expanded _prior_ to being
operated upon by the new concatenation operator (_regardless_ of whether
they are named parameters of the current function-like macro or not), then
it would be possible to write

        #define example(BAR) foo@@BAR foo@@__LINE__
        example(bar)

and thence obtain (from the preprocessor) the following text:

        foobar foo2

This possible additional preprocessor binary operator is, however, beyond
the scope of what I intend to propose, namely a new __MACRO__ built-in
together with some very slightly augmented semantics (a hack?) for the
existing ## operator.  If that latter aspect of this proposal offends anyone
(i.e. due to its special treatment of one special new built-in symbol), then
I would instead propose a new __MACRO__ built-in, together with a new
``enhanced semantics'' preprocessor concatenation operator (which, as I say,
might possibly be spelled `@@' or perhaps `###').

Comments?
