Hi Derek,

On 2026-05-08T10:01:08-0400, Derek Martin wrote:
> On Fri, May 08, 2026 at 03:20:00PM +0200, Alejandro Colomar via Mutt-dev 
> wrote:
> > I'd like to see evidence that compilers are unable to perform this
> > optimization these days.
> 
> Sure!
> 
> $ gcc -O2 -o loop_vs_strspn loop_vs_strspn.c
> $ ./loop_vs_strspn /////12345 /
> Loop method time: 0.178080 seconds
> strspn method time: 0.314601 seconds

Here's my experiment compiling exactly your code with GCC 15:

        alx@devuan:~/tmp$ gcc -O2 loop_vs_strspn.c 
        alx@devuan:~/tmp$ ./a.out /////12345 /
        Loop method time: 0.163531 seconds
        strspn method time: 0.155075 seconds
        alx@devuan:~/tmp$ gcc -O3 loop_vs_strspn.c 
        alx@devuan:~/tmp$ ./a.out /////12345 /
        Loop method time: 0.167269 seconds
        strspn method time: 0.000002 seconds

With -O3, GCC clearly realizes that the strspn(3) loop does nothing
after the first iteration, and drops the remaining iterations.  That in
itself shows that using strspn(3) may enable optimizations we're not
even aware of, farther away in the code.  It's interesting that GCC is
not able to apply the same simplification to the manual loop.

Rewriting the test to use a function, so that the compiler can't
optimize all the iterations away, I get other results.  Here's the
program:

        #include <stdio.h>
        #include <string.h>
        #include <time.h>

        #define ITERATIONS 100000000

        [[gnu::noipa]]
        void
        manual(const char *p, const char *accept)
        {
                while (*p && *p == *accept)
                        p++;
                if (*p != '1')
                        printf("%s\n", p);
        }

        [[gnu::noipa]]
        void
        alx(const char *p, const char *accept)
        {
                p += strspn(p, accept);
                if (*p != '1')
                        printf("%s\n", p);
        }

        int
        main(int argc, char **argv)
        {
            char *str = argv[1];
            char *accept = argv[2];

            clock_t start, end;
            double cpu_time_used[2];

            // Timing the loop method
            start = clock();
            for (int i = 0; i < ITERATIONS; i++)
                manual(str, accept);
            end = clock();

            cpu_time_used[0] = ((double)(end - start)) / CLOCKS_PER_SEC;

            // Timing the strspn method
            start = clock();
            for (int i = 0; i < ITERATIONS; i++)
                alx(str, accept);
            end = clock();

            cpu_time_used[1] = ((double)(end - start)) / CLOCKS_PER_SEC;
            printf("Loop method time: %f seconds\n", cpu_time_used[0]);
            printf("strspn method time: %f seconds\n", cpu_time_used[1]);

            return 0;
        }

If I compile this with GCC 15, the results are indistinguishable from
each other:

        alx@devuan:~/tmp$ gcc -O2 test.c 
        alx@devuan:~/tmp$ ./a.out /////12345 /
        Loop method time: 0.178783 seconds
        strspn method time: 0.178006 seconds
        alx@devuan:~/tmp$ gcc -O3 test.c 
        alx@devuan:~/tmp$ ./a.out /////12345 /
        Loop method time: 0.174688 seconds
        strspn method time: 0.174829 seconds

> Test program attached.  The if block is necessary so that the compiler
> doesn't completely optimize away the loop in the strspn case.

With -O3, GCC still optimizes the loop away.  To prevent that, you need
[[gnu::noipa]], and thus a separate function.

>  It's
> realistic since you're going to actually use the value in real code.
> Note that I chose the values so the printf would not actually execute.
> 
> > But even if I saw evidence (which I doubt), I think the fact that it
> > doesn't allow silly mistakes when writing the loop already counters the
> > small efficiency theoretical gains.
> 
> I don't know what you mean by silly mistakes;

The following loop has several expressions.  One could have a typo in
any of them, introducing subtle bugs.

                while (*p && *p == *accept)
                        p++;

A call

                p += strspn(p, accept);

is significantly more robust, in that it admits fewer typos.  I still
don't think it's perfect, and a wrapper that returns the pointer is even
more robust (it has even fewer places where one can have a typo):

                p = stpspn(p, accept);
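
For reference, stpspn() is not a standard function; it would have to be
provided by the project.  A minimal sketch of such a wrapper, assuming
nothing beyond strspn(3) (the name and signature here are only
illustrative), could look like this:

```c
#include <string.h>

/*
 * Hypothetical wrapper; not part of ISO C or POSIX.
 * Returns a pointer to the first byte of s that is not in accept,
 * or to the terminating null byte if every byte of s is in accept.
 */
static inline char *
stpspn(const char *s, const char *accept)
{
        /* Cast away const, in the same way strchr(3) does. */
        return (char *) s + strspn(s, accept);
}
```

Returning the pointer directly means the caller names the string only
once, so there is no way to advance one pointer by the span of another.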

> regardless the code I
> wrote was unit-tested and has no silly mistakes.
> 
> On Fri, May 08, 2026 at 03:16:24PM +0200, Alejandro Colomar via Mutt-dev 
> wrote:
> > The name means "string span".  It is the span (length) of the substring
> > composed exclusively of characters in the second parameter.
> 
> I know what it is and what it does.  But it has a stupid name and it
> is not used often, regardless of why. [By comparison, C++ has
> std::string::find_first_not_of, which is a much clearer name.]
> 
> > I believe it's not often used because few people know it, and thus few
> > people spread the knowledge of when it's useful and how.
> 
> Which is a great reason not to use it.  The while loop solution is
> both more efficient and more explicit.  That means it wins every time
> in my book.
> 
> > There are mainly two cases where it's useful:
> 
> There's really one, in my opinion: when you want to find the first
> character not in a list of more than two or three characters.
> Otherwise I'd use the loop.
> 
> > Then there are a few other cases, but these two cover the most common
> > ones.  Once you learn this function, it crams a lot of logic into a
> > single function, making the code more compact and readable.
> 
> No it doesn't, as demonstrated by my test program.  The while loop
> version takes up almost exactly the same amount of space.

If you write

        char *ptr = str;
        while (*ptr && *ptr == *accept)
                ptr++;
vs
        char *ptr = NULL;
        ptr = str + strspn(str, accept);

as you wrote it, maybe.  However, the latter can be simplified:

        char *ptr = str;
        ptr += strspn(ptr, accept);

And if you add a wrapper that returns a pointer (because that's what
you're often interested in, after all):

        char *ptr = stpspn(str, accept);

This is significantly less text.
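
One caveat when comparing these forms: the loop only looks at the first
character of accept, while strspn(3) treats accept as a set of
characters.  With a single-character accept, as in the benchmark, both
stop at the same place.  A small sketch to check that (skip_loop and
skip_strspn are hypothetical names, used only for illustration):

```c
#include <string.h>

/* The hand-written loop: compares only against accept[0]. */
static const char *
skip_loop(const char *p, const char *accept)
{
        while (*p && *p == *accept)
                p++;
        return p;
}

/* The strspn(3) form: skips any byte that appears in accept. */
static const char *
skip_strspn(const char *p, const char *accept)
{
        return p + strspn(p, accept);
}
```

For "/////12345" and "/", both return a pointer to the '1'.  For
"/./12" and "/.", the loop stops at the '.', while strspn(3) skips all
three leading characters.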


Cheers,
Alex

> 
> -- 
> Derek D. Martin    http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02
> -=-=-=-=-
> This message is posted from an invalid address.  Replying to it will result in
> undeliverable mail due to spam prevention.  Sorry for the inconvenience.
> 

> #include <stdio.h>
> #include <string.h>
> #include <time.h>
> 
> #define ITERATIONS 100000000
> 
> int main(int argc, char **argv) {
>     char *str = argv[1];
>     char *accept = argv[2];
>     
>     clock_t start, end;
>     double cpu_time_used[2];
> 
>     // Timing the loop method
>     start = clock();
>     for (int i = 0; i < ITERATIONS; i++) {
>         char *ptr = str;
>         while (*ptr && *ptr == *accept) ptr++;
>         if (*ptr != '1') printf("%s\n", ptr);
>     }
>     end = clock();
> 
>     cpu_time_used[0] = ((double)(end - start)) / CLOCKS_PER_SEC;
> 
>     // Timing the strspn method
>     start = clock();
>     for (int i = 0; i < ITERATIONS; i++) {
>         char *ptr = NULL;
>         ptr = str + strspn(str, accept);
>         if (*ptr != '1') printf("%s\n", ptr);
>     }
>     end = clock();
>     cpu_time_used[1] = ((double)(end - start)) / CLOCKS_PER_SEC;
>     printf("Loop method time: %f seconds\n", cpu_time_used[0]);
>     printf("strspn method time: %f seconds\n", cpu_time_used[1]);
> 
>     return 0;
> }


-- 
<https://www.alejandro-colomar.es>
