Question for the experts.  Let's take the following example:

----->8------------->8--------------------
#include <stdio.h>
#include <string.h>
#include <wchar.h>

#define period          0x2e
#define question        0x3f
#define exclam          0x21
#define ellipsis        L'\u2026'

const wchar_t p[] = { period, question, exclam, ellipsis };

int
main(void)
{
        const wchar_t s[] = L". Hello.";

        printf("%ls\n", s);
        printf("%zu\n", wcsspn(s, p));  /* wcsspn() returns size_t */

        return 0;
}
-------------8<-----------8<----------------


Now run:

  $ cc -Wall example.c -o example && ./example
  . Hello.
  8
  $ egcc -Wall example.c -o example && ./example
  . Hello.
  1

As you see, compiled with GCC (egcc) the program prints the expected 1,
since only the leading '.' of s is in p[], while the clang build (cc)
prints 8.  To get the desired result with clang you have to write the
set as a string literal.  Change the declaration of p[] above to:

  const wchar_t p[] = L".?!…";
                           ^ This is a UTF-8 ellipsis.

And now:

  $ cc -Wall example.c -o example && ./example
  . Hello.
  1

A brace-initialized array that contains only ASCII characters, or only
non-ASCII characters, also gives the expected result with clang.

Is this a bug in clang's wcsspn(), or am I wrong in assuming that the
array can be declared the way I did?


-- 
Walter
