Re: [XEN PATCH v2 1/3] EFI: address a violation of MISRA C Rule 13.6

2024-10-08 Thread Roberto Bagnara

On 2024-10-08 07:59, Jan Beulich wrote:

On 02.10.2024 08:54, Roberto Bagnara wrote:

On 2024-10-02 08:09, Jan Beulich wrote:

On 01.10.2024 23:36, Stefano Stabellini wrote:

On Tue, 1 Oct 2024, Jan Beulich wrote:

On 01.10.2024 07:25, Roberto Bagnara wrote:

On 2024-09-30 15:07, Jan Beulich wrote:

On 30.09.2024 14:49, Federico Serafini wrote:

guest_handle_ok()'s expansion contains a sizeof() involving its
first argument which is guest_handle_cast().
The expansion of the latter, in turn, contains a variable
initialization.

Since MISRA considers the initialization (even of a local variable)
a side effect, the chain of expansions mentioned above violates
MISRA C:2012 Rule 13.6 (The operand of the `sizeof' operator shall not
contain any expression which has potential side effect).


I'm afraid I need to ask for clarification of terminology and alike here.
While the Misra doc has a section on Persistent Side Effects in its
Glossary appendix, what constitutes a side effect from its pov isn't
really spelled out anywhere. Which in turn raises the question whether it
is indeed Misra (and not just Eclair) which deems initialization a side
effect. This is even more so relevant as 13.6 talks of only expressions,
yet initializers fall under declarations (in turn involving an expression
on the rhs of the equal sign).

All the same of course affects patch 2 then, too.


MISRA C leaves the definition of "side effect" to the C Standard.
E.g., C18 5.1.2.3p2:

 Accessing a volatile object, modifying an object, modifying a file,
 or calling a function that does any of those operations are all
 side effects,[omitted irrelevant footnote reference] which are
 changes in the state of the execution environment.

The MISRA C:2012/2023 Glossary entry for "Persistent side effect"
indirectly confirms that initialization is always a side effect.


Hmm, that's interesting: There's indeed an example with an initializer
there. Yet to me the text you quote from the C standard does not say
that initialization is a side effect - it would be "modifying an
object" aiui, yet ahead of initialization being complete the object
doesn't "exist" imo, and hence can be "modified" only afterwards.


I feel it's becoming a bit too philosophical. Since there's some room
for interpretation and only two violations left to address, I believe
it's best to stick with the stricter interpretation of the definition.
Therefore, I'd proceed with this series in its current form.


Proceeding with the series in its current form may be okay (as you say,
you view the changes as readability improvements anyway), but imo the
interpretation needs settling on no matter what. In fact even for these
two patches it may affect what their descriptions ought to say (would
be nice imo to avoid permanently recording potentially misleading
information by committing as is). And of course clarity would help
dealing with future instances that might appear. I take it you realize
that if someone had submitted a patch adding code similar to the
original forms of what's being altered here, it would be relatively
unlikely for a reviewer to spot the issue. IOW here we're making
ourselves heavily dependent upon Eclair spotting (supposed) issues,
adding extra work and delays for such changes to go in.


You can do two things to obtain a second opinion:

1) Use the MISRA forum (here is the link to the forum
 section devoted to the side-effect rules of MISRA C:2012
 and MISRA C:2023 (https://forum.misra.org.uk/forumdisplay.php?fid=168).
 The MISRA C Working Group will, in due course, provide
 you with an official answer to your questions about what,
 for the interpretation of Rule 13.6, has to be considered
 a side effect.

2) Reach out to your ISO National Body and try to obtain
 an official answer from ISO/IEC JTC1/SC22/WG14 (the
 international standardization working group for the
 programming language C) to your questions about what the
 C Standard considers to be side effects.


I took the latter route, and to my (positive) surprise I got back an answer
the same day. There was a request for someone to confirm, but so far I didn't
see further replies. Since this is a German institution I raised the question
in German and got back an answer in German (attached); I've tried my best to
translate this without falsifying anything, but I've omitted non-technical
parts:

"Initialization of an object is never a side effect of the initialization
by itself. Of course expressions used for initialization can themselves have
side effects on other objects.

@Andreas: Do you agree? C after all has a far simpler object model than C++.
The (potential) change in object representation (i.e. the bytes underlying
the object) is not a side effect of object initialization, but its primary
purpose."

Further for M

Re: [XEN PATCH v2 1/3] EFI: address a violation of MISRA C Rule 13.6

2024-09-30 Thread Roberto Bagnara

On 2024-09-30 15:07, Jan Beulich wrote:

On 30.09.2024 14:49, Federico Serafini wrote:

guest_handle_ok()'s expansion contains a sizeof() involving its
first argument which is guest_handle_cast().
The expansion of the latter, in turn, contains a variable
initialization.

Since MISRA considers the initialization (even of a local variable)
a side effect, the chain of expansions mentioned above violates
MISRA C:2012 Rule 13.6 (The operand of the `sizeof' operator shall not
contain any expression which has potential side effect).


I'm afraid I need to ask for clarification of terminology and alike here.
While the Misra doc has a section on Persistent Side Effects in its
Glossary appendix, what constitutes a side effect from its pov isn't
really spelled out anywhere. Which in turn raises the question whether it
is indeed Misra (and not just Eclair) which deems initialization a side
effect. This is even more so relevant as 13.6 talks of only expressions,
yet initializers fall under declarations (in turn involving an expression
on the rhs of the equal sign).

All the same of course affects patch 2 then, too.


MISRA C leaves the definition of "side effect" to the C Standard.
E.g., C18 5.1.2.3p2:

  Accessing a volatile object, modifying an object, modifying a file,
  or calling a function that does any of those operations are all
  side effects,[omitted irrelevant footnote reference] which are
  changes in the state of the execution environment.

The MISRA C:2012/2023 Glossary entry for "Persistent side effect"
indirectly confirms that initialization is always a side effect.

Kind regards,

   Roberto



Re: [XEN PATCH v2 1/3] EFI: address a violation of MISRA C Rule 13.6

2024-10-01 Thread Roberto Bagnara

On 2024-10-02 08:09, Jan Beulich wrote:

On 01.10.2024 23:36, Stefano Stabellini wrote:

On Tue, 1 Oct 2024, Jan Beulich wrote:

On 01.10.2024 07:25, Roberto Bagnara wrote:

On 2024-09-30 15:07, Jan Beulich wrote:

On 30.09.2024 14:49, Federico Serafini wrote:

guest_handle_ok()'s expansion contains a sizeof() involving its
first argument which is guest_handle_cast().
The expansion of the latter, in turn, contains a variable
initialization.

Since MISRA considers the initialization (even of a local variable)
a side effect, the chain of expansions mentioned above violates
MISRA C:2012 Rule 13.6 (The operand of the `sizeof' operator shall not
contain any expression which has potential side effect).


I'm afraid I need to ask for clarification of terminology and alike here.
While the Misra doc has a section on Persistent Side Effects in its
Glossary appendix, what constitutes a side effect from its pov isn't
really spelled out anywhere. Which in turn raises the question whether it
is indeed Misra (and not just Eclair) which deems initialization a side
effect. This is even more so relevant as 13.6 talks of only expressions,
yet initializers fall under declarations (in turn involving an expression
on the rhs of the equal sign).

All the same of course affects patch 2 then, too.


MISRA C leaves the definition of "side effect" to the C Standard.
E.g., C18 5.1.2.3p2:

Accessing a volatile object, modifying an object, modifying a file,
or calling a function that does any of those operations are all
side effects,[omitted irrelevant footnote reference] which are
changes in the state of the execution environment.

The MISRA C:2012/2023 Glossary entry for "Persistent side effect"
indirectly confirms that initialization is always a side effect.


Hmm, that's interesting: There's indeed an example with an initializer
there. Yet to me the text you quote from the C standard does not say
that initialization is a side effect - it would be "modifying an
object" aiui, yet ahead of initialization being complete the object
doesn't "exist" imo, and hence can be "modified" only afterwards.


I feel it's becoming a bit too philosophical. Since there's some room
for interpretation and only two violations left to address, I believe
it's best to stick with the stricter interpretation of the definition.
Therefore, I'd proceed with this series in its current form.


Proceeding with the series in its current form may be okay (as you say,
you view the changes as readability improvements anyway), but imo the
interpretation needs settling on no matter what. In fact even for these
two patches it may affect what their descriptions ought to say (would
be nice imo to avoid permanently recording potentially misleading
information by committing as is). And of course clarity would help
dealing with future instances that might appear. I take it you realize
that if someone had submitted a patch adding code similar to the
original forms of what's being altered here, it would be relatively
unlikely for a reviewer to spot the issue. IOW here we're making
ourselves heavily dependent upon Eclair spotting (supposed) issues,
adding extra work and delays for such changes to go in.


You can do two things to obtain a second opinion:

1) Use the MISRA forum (here is the link to the forum
   section devoted to the side-effect rules of MISRA C:2012
   and MISRA C:2023 (https://forum.misra.org.uk/forumdisplay.php?fid=168).
   The MISRA C Working Group will, in due course, provide
   you with an official answer to your questions about what,
   for the interpretation of Rule 13.6, has to be considered
   a side effect.

2) Reach out to your ISO National Body and try to obtain
   an official answer from ISO/IEC JTC1/SC22/WG14 (the
   international standardization working group for the
   programming language C) to your questions about what the
   C Standard considers to be side effects.

Kind regards,

   Roberto

Roberto Bagnara, Ph.D.

Software Verification Expert and Evangelist, BUGSENG (http://bugseng.com)
Professor of Computer Science, University of Parma
Member, ISO/IEC JTC1/SC22/WG14 - C Standardization Working Group
Member, MISRA C Working Group



Re: [PATCH 0/7] Fix MISRA C 2012 Rule 20.7 violations

2022-09-18 Thread Roberto Bagnara

On 03/09/22 02:52, Stefano Stabellini wrote:

+Roberto

I think we need Roberto's advice on Rule 20.7. (Full thread below.)


Hi there, sorry for the delay: I missed this message.
Please see below, where I took the freedom of rearranging the
cases.


The question is on the interpretation of Rule 20.7. Are parenthesis
required by Rule 20.7 in the following cases:

- macro parameters used as function arguments

> [...]
> - macro parameter used as lhs in assignment

You can obtain different semantics depending on whether parentheses
are or are not used (in the macro call and/or macro expansion
depending on the case):


#include 

void g(int v) {
  printf("%d\n", v);
}

#define m1(x, y, ...) g(y)

void f1(int x, int y, ...) {
  g(y);
}

#define p 0, 1

void test1() {
  m1(p, 2);
  f1(p, 2);
}

#define m4(x) x = 4

void f4(int &x) {
  x = 4;
}


void test4() {
  int y;
  int z;
  z = 3;
  m4(y = z);
  printf("%d\n", z);
  z = 3;
  f4(y = z);
  printf("%d\n", z);
}

int main() {
  test1();
  test4();
}


- macro parameters used as macro arguments


Please note that Rule 20.7 depends on the final expansion:
so whether parentheses are or are not used in a certain
macro body is irrelevant, the point being that, at the
end of all expansions, expressions resulting from the
expansion of macro parameters are enclosed in parentheses.


- macro parameter used as array index


This is safe today, but my understanding is that in C++23
the [] operator will accept more than one expression.
A similar change might (who knows?) be considered for C26
or even offered before (intentionally or by mistake) by some
C compiler.


Some of these cases are interesting because they should function
correctly even without parenthesis, hence the discussion. In particular
parenthesis don't seem necessary at least for the function argument
case.


This is not the right spirit for MISRA compliance: why would you want
splitting hairs when inserting a pair of parentheses is so easy?
C and C++ are very complex languages, and the MISRA coding standards
are the result of a (very difficult!) compromise between simplicity
and effectiveness: rules that are exactly targeted to all and only all
the problematic instances would be very difficult to express and to remember.
So, yes: in many cases you might spend time to demonstrate that a particular
(real) MISRA violation does not imply the existence of a real issue,
but this time is not well spent.  Critical code must be boring and obviously
right, in the sense that whomever is reading the code should not be
distracted by thoughts like "there are no parentheses here: am I sure
nothing bad can happen?"


Regardless of the MISRA C interpretation, Xenia noticed that Eclair
reports violations on these cases (cppcheck does not, I don't know other
checkers).


I am not aware of any false positives (or flse negatives) for the
current version of ECLAIR on Rule 20.7.  Nonetheless, ECLAIR can
be configured to selectively deviate on each of the cases mentioned
above by means of checker configuration.  However, as I said,
it only makes sense deviating the rule in the cases where you are
not allowed to add the parentheses (e.g., because both the macro
definition and the macro invocations are in legacy code you are
not allowed to touch).

In contrast, cppcheck is no more than a toy when MISRA compliance
is concerned.  It claims to support 153 out of 175 MISRA C:2012 guidelines.
For 103 of those 153 it has a significant number of false negatives (FN)
and false positives (FP).  I recently participated to an evaluation
of cppcheck 2.8 and here is a summary I can disclose:

Rule 1.3   FP
Rule 2.1   FN
Rule 2.2   FN+FP
Rule 2.4   FN+FP
Rule 2.5   FP
Rule 2.7   FP
Rule 3.2   FN
Rule 4.2   FN
Rule 5.1   FP
Rule 5.3   FN
Rule 5.6   FN+FP
Rule 5.7   FN+FP
Rule 5.8   FN+FP
Rule 5.9   FN+FP
Rule 6.1   FN+FP
Rule 7.1   FN
Rule 7.3   FN
Rule 7.4   FN+FP
Rule 8.1   FN
Rule 8.2   FN+FP
Rule 8.3   FN
Rule 8.4   FP
Rule 8.5   FN+FP
Rule 8.6   FP
Rule 8.7   FN
Rule 8.8   FN
Rule 8.9   FN
Rule 8.10  FN
Rule 8.13  FN
Rule 8.14  FP
Rule 9.1   FN+FP
Rule 9.3   FN
Rule 10.1  FN
Rule 10.2  FN
Rule 10.3  FN+FP
Rule 10.4  FP
Rule 10.5  FN+FP
Rule 10.6  FP
Rule 10.7  FN+FP
Rule 10.8  FP
Rule 11.1  FN+FP
Rule 11.2  FN
Rule 11.3  FN+FP
Rule 11.4  FP
Rule 11.5  FP
Rule 11.7  FN
Rule 11.8  FN+FP
Rule 11.9  FN
Rule 12.1  FN
Rule 12.2  FP
Rule 12.3  F

Re: [PATCH 0/9] MISRA C 2012 8.1 rule fixes

2022-06-23 Thread Roberto Bagnara

Hi there.

Rule 8.1 only applies to C90 code, as all the violating instances are
syntax errors in C99 and later versions of the language.  So,
the following line does not contain a violation of Rule 8.1:

unsigned x;

It does contain a violation of Directive 4.6, though, whose correct
handling depends on the intention (uint32_t, uin64_t, size_t, ...).

As a side note (still on the theme of the many ways of referring
to a concrete type) Rule 6.1 says not to use plain int for a bitfield
and using an explicitly signed or unsigned type instead.
(Note that Directive 4.6 does not apply to bitfield types.)
So

int field1:2;

is not compliant; the following are compliant:

signed int   field1:2;
unsigned int field2:3;

But also the following are compliant, and we much favor
this variant as the specification of "int" buys nothing
and can even mislead someone into thinking that more bits
are reserved:

signed   field1:2;
unsigned field2:3;

I mention this to encourage, as a matter of style, not to add
"int" on bitfield types currently specified as "signed" or "unsigned".
Kind regards,

   Roberto

On 22/06/22 21:23, Stefano Stabellini wrote:

+Roberto


Hi Roberto,

A quick question about Rule 8.1.


Michal sent a patch series to fix Xen against Rule 8.1 (here is a link
if you are interested: https://marc.info/?l=xen-devel&m=165570851227125)

Although we all generally agree that the changes are a good thing, there
was a question about the rule itself. Specifically, is the following
actually a violation?

   unsigned x;


Looking through the examples in the MISRA document I can see various
instances of more confusing and obvious violations such as:

   const x;
   extern x;

but no examples of using "unsigned" without "int". Do you know if it is
considered a violation?


Thanks!

Cheers,

Stefano



On Wed, 22 Jun 2022, Jan Beulich wrote:

On 22.06.2022 12:25, Jan Beulich wrote:

On 20.06.2022 09:02, Michal Orzel wrote:

This series fixes all the findings for MISRA C 2012 8.1 rule, reported by
cppcheck 2.7 with misra addon, for Arm (arm32/arm64 - target allyesconfig).
Fixing this rule comes down to replacing implicit 'unsigned' with explicit
'unsigned int' type as there are no other violations being part of that rule
in the Xen codebase.


I'm puzzled, I have to admit. While I agree with all the examples in the
doc, I notice that there's no instance of "signed" or "unsigned" there.
Which matches my understanding that "unsigned" and "signed" on their own
(just like "long") are proper types, and hence the omission of "int"
there is not an "omission of an explicit type".


[...]


Neither the name of the variable nor the comment clarify that this is about
the specific case of "unsigned". As said there's also the fact that they
don't appear to point out the lack of "int" when seeing plain "long" (or
"long long"). I fully agree that "extern x;" or "const y;" lack explicit
"int".




Re: [PATCH 0/9] MISRA C 2012 8.1 rule fixes

2022-06-23 Thread Roberto Bagnara



Hi Jan.

I know I will sound pedantic ;-)  but an important fact about
the MISRA standards is that reading the headline alone is almost
never enough.  In the specific of (advisory) Directive 4.6,
the Rationale says, among other things:

It might be desirable not to apply this guideline when
interfacing with The Standard Library or code outside
the project’s control.

For this reason, size_t is typically set as an exception in the
tool configuration.  To properly deal with the many Standard Library
functions returning int, one can use a typedef named something
like "lib_int_t" to write, e.g.,

  const lib_int_t r = strncmp(...);

The lib_int_t typedef can be used with a suitable tool configuration,
just as I mentioned one would do with size_t.
Kind regards,

   Roberto

On 23/06/22 09:51, Jan Beulich wrote:

On 23.06.2022 09:37, Roberto Bagnara wrote:

Rule 8.1 only applies to C90 code, as all the violating instances are
syntax errors in C99 and later versions of the language.  So,
the following line does not contain a violation of Rule 8.1:

  unsigned x;

It does contain a violation of Directive 4.6, though, whose correct
handling depends on the intention (uint32_t, uin64_t, size_t, ...).


Interesting - this goes straight against a rule we have set in
./CODING_STYLE. I'm also puzzled by you including size_t in your list
of examples, when the spec doesn't. The sole "goal" of the directive
(which is advisory only anyway) is to be able to determine allocation
size. size_t size, however, varies as much as short, int, long, etc
do.

Jan




Re: MISRA C Rule 20.7 disambiguation

2022-12-19 Thread Roberto Bagnara

On 2022-12-09 23:46, Stefano Stabellini wrote:

Eclair marks it as a violation too. Eclair thinks "nr" needs
parenthesis. Roberto, we have already discussed how the comma operator
"," being the lower precedence doesn't require extra parenthesis.
Roberto, what's your take on the [] square brakets?


Yes, we agreed upon the fact that square brackets are as good as
parentheses, but the ECLAIR configuration on eclairit.com was not
updated consequently: now it should be OK.



Invalid _Static_assert expanded from HASH_CALLBACKS_CHECK

2021-05-23 Thread Roberto Bagnara



Hi there.

I stumbled upon parsing errors due to invalid uses of
_Static_assert expanded from HASH_CALLBACKS_CHECK where
the tested expression is not constant, as mandated by
the C standard.

Judging from the following comment, there is partial awareness
of the fact this is an issue:

#ifndef __clang__ /* At least some versions dislike some of the uses. */
#define HASH_CALLBACKS_CHECK(mask) \
BUILD_BUG_ON((mask) > (1U << ARRAY_SIZE(callbacks)) - 1)

Indeed, this is not a fault of Clang: the point is that some
of the expansions of this macro are not C.  Moreover,
the fact that GCC sometimes accepts them is not
something we can rely upon:

$ cat p.c
void f() {
static const int x = 3;
_Static_assert(x < 4, "");
}
$ gcc -c -O p.c
$ gcc -c p.c
p.c: In function ‘f’:
p.c:3:20: error: expression in static assertion is not constant
3 | _Static_assert(x < 4, "");
| ~^~
$

Finally, I think this can be easily avoided: instead
of initializing a static const with a constant expression
and then static-asserting the static const, just static-assert
the constant initializer.

Kind regards,

   Roberto Bagnara



Re: Invalid _Static_assert expanded from HASH_CALLBACKS_CHECK

2021-05-28 Thread Roberto Bagnara

Hi Jan.

Please see below.

On 25/05/21 10:58, Jan Beulich wrote:

On 24.05.2021 06:29, Roberto Bagnara wrote:

I stumbled upon parsing errors due to invalid uses of
_Static_assert expanded from HASH_CALLBACKS_CHECK where
the tested expression is not constant, as mandated by
the C standard.

Judging from the following comment, there is partial awareness
of the fact this is an issue:

#ifndef __clang__ /* At least some versions dislike some of the uses. */
#define HASH_CALLBACKS_CHECK(mask) \
  BUILD_BUG_ON((mask) > (1U << ARRAY_SIZE(callbacks)) - 1)

Indeed, this is not a fault of Clang: the point is that some
of the expansions of this macro are not C.  Moreover,
the fact that GCC sometimes accepts them is not
something we can rely upon:

$ cat p.c
void f() {
static const int x = 3;
_Static_assert(x < 4, "");
}
$ gcc -c -O p.c
$ gcc -c p.c
p.c: In function ‘f’:
p.c:3:20: error: expression in static assertion is not constant
3 | _Static_assert(x < 4, "");
| ~^~
$


I'd nevertheless like to stick to this as long as not proven
otherwise by future gcc.


Just two observations:

1) Violating the C standard makes MISRA complicance significantly
   more difficult.  In addition, it complicates also compiler
   qualification, for those who are required to do it.

2) GCC is already proving otherwise: if you try compiling
   without optimization, compilation fails.

Kind regards,

   Roberto



Re: [RFC PATCH] xen: Remove -Wdeclaration-after-statement

2024-08-12 Thread Roberto Bagnara

On 09/08/24 21:25, Stefano Stabellini wrote:

Adding Roberto

Does MISRA have a view on this? I seem to remember this is discouraged?


As far as I know, there is nothing in MISRA C against or in favor of
mixing declaration with statements.  The only (slightly) relevant
guideline is advisory Rule 8.9 (An object should be declared at block
scope if its identifier only appears in a single function), which advises
to declare objects at function scope when this is possible.
The rationale of the same rule says, among other things:

  Within a function, whether objects are declared at the outermost
  or innermost block is largely a matter of style.

On the other hand, MISRA C recognizes that reduction of scope
is, all other things being equal, to be preferred, but there
are no guidelines similar to  -Wdeclaration-after-statement.
Rather, the point is "declare it at block scope, if you can;
otherwise declare it at file scope, if you can; otherwise,
declare it at global scope, if you must."
Kind regards,

   Roberto


On Fri, 9 Aug 2024, Alejandro Vallejo wrote:

This warning only makes sense when developing using a compiler with C99
support on a codebase meant to be built with C89 compilers too, and
that's no longer the case (nor should it be, as it's been 25 years since
C99 came out already).

Signed-off-by: Alejandro Vallejo 
---
Yes, I'm opening this can of worms. I'd like to hear others people's
thoughts on this and whether this is something MISRA has views on. If
there's an ulterior non-obvious reason besides stylistic preference I
think it should be documented somewhere, but I haven't seen such an
explanation.

IMO, the presence of this warning causes several undesirable effects:

   1. Small functions are hampered by the preclusion of check+declare
  patterns that improve readability via concision. e.g: Consider a
  silly example like:

  /* with warning */ /* without warning */
  void foo(uint8_t *p)   void foo(uint8_t *p)
  {  {
  uint8_t  tmp1; if ( !p )
  uint16_t tmp2; return;
  uint32_t tmp3;
 uint8_t  tmp1 = OFFSET1 + *p;
  if ( !p )  uint16_t tmp2 = OFFSET2 + *p;
  return;uint32_t tmp3 = OFFSET3 + *p;

  tmp1 = OFFSET1 + *p;   /* Lots of uses of `tmpX` */
  tmp2 = OFFSET2 + *p;   }
  tmp2 = OFFSET2 + *p;

  /* Lots of uses of tmpX */
  }

   2. Promotes scope-creep. On small functions it doesn't matter much,
  but on bigger ones to prevent declaring halfway through the body
  needlessly increases variable scope to the full scope in which they
  are defined rather than the subscope of point-of-declaration to
  end-of-current-scope. In cases in which they can be trivially
  defined at that point, it also means they can be trivially misused
  before they are meant to. i.e: On the example in (1) assume the
  conditional in "with warning" is actually a large switch statement.

   3. It facilitates a disconnect between time-of-declaration and
  time-of-use that lead very easily to "use-before-init" bugs.
  While a modern compiler can alleviate the most egregious cases of
  this, there's cases it simply cannot cover. A conditional
  initialization on anything with external linkage combined with a
  conditional use on something else with external linkage will squash
  the warning of using an uninitialised variable. Things are worse
  where the variable in question is preinitialised to something
  credible (e.g: a pointer to NULL), as then that can be misused
  between its declaration and its original point of intended use.

So... thoughts? yay or nay?


In my opinion, there are some instances where mixing declarations and
statements would enhance the code, but these are uncommon. Given the
choice between:

1) declarations always first
2) declarations always mixed with statements

I would choose 1).

One could say that mixing declarations with statements should be done
only when appropriate, but the challenge lies in the subjective nature
of "when appropriate." Different individuals have varying
interpretations of this, which could lead to unnecessary bikeshedding.

For these reasons, I do not support mixing declarations and statements.





Violations of mandatory MISRA C:2012 Rule 19.1 in X86_64 build

2023-07-11 Thread Roberto Bagnara



Hi there.

Mandatory Rule 19.1 (An object shall not be assigned or copied to an
overlapping object) is directly targeted at two undefined behaviors,
one of which is the subject of 6.5.16.1p3, namely:

  If the value being stored in an object is read from another object
  that overlaps in any way the storage of the first object, then the
  overlap shall be exact and the two objects shall have qualified or
  unqualified versions of a compatible type; otherwise, the behavior
  is undefined.

You can see a number of definite violations in the X86_64 build
at this link:

  
https://saas.eclairit.com:3787/fs/var/local/eclair/XEN.ecdf/ECLAIR_normal/origin/staging/X86_64-Set1/149/PROJECT.ecd;/by_service/MC3R1.R19.1.html

As the rule is mandatory, it cannot be deviated.
Kind regards,

   Roberto



Xen reliance on non-standard GCC features

2023-06-04 Thread Roberto Bagnara

Hi there.

It appears Xen uses lots of GCC features that are outside the C99
standard.  Some of them are documented GNU extensions to the language.
Some of them seem not to be documented, so they do not qualify
as language extensions (at least, not as far as the spirit and letter
of functional-safety standards are concerned).

Language extensions come with two problems:

a) They limit portability.
b) They significantly increase the cost of compiler qualification
   (commercial C compiler validation suites do not cover the extensions,
   and the cost of the extra validation effort will have to be born
   by whomever wants to use Xen for safety-related development).

Said that, maybe the use of certain extensions is intentional, but for
others it may be unintentional.  The purpose of this message is to
distinguish the ones from the others.  In the following, when I mention
the C99 standard, I refer to compiler uses with -std=c99 or -std=gnu99.

Let us start from the (as far as I can tell) undocumented extensions:

U1) Use of _Static_assert in C99 mode.

U2) Empty initialization lists, both in C99 mode (ARM64 and X86_64)
and C18 mode (only X86_64).

U3) Returning void expressions.

U4) Static functions or variables used in inline functions with external
linkage.

U5) Enumerator values outside the range of `int'.

U6) Empty declarations.

U7) Empty enum definitions.

U8) Conversion between incompatible pointer types.

U9) Conversion between function pointers and object pointers.

U10) \m escape sequence.
 Is this an undocumented GCC extension or just a typo?

In case you know where these are documented in the GCC manual,
I would appreciate if you could tell me.

Here is a list of extensions that are documented in the GCC manual:

D1) Arithmetic operators on `void *' type:
See Section "6.24 Arithmetic on void- and Function-Pointers"
of the GCC manual."

D2) Sizeof and alignof on `void *' type:
See Section "6.24 Arithmetic on void- and Function-Pointers"
of the GCC manual."

D3) Use of GNU statement expressions.
See Section "6.1 Statements and Declarations in Expressions"
of the GCC manual.

D4) Use of GNU statement expressions from macro expansion.
See Section "6.1 Statements and Declarations in Expressions"
of the GCC manual.
(My advice is to read it fully, as there are implications.)

D5) Record declarations with no members.
See Section "6.19 Structures with No Members" of the GCC manual."

D6) Structures containing a flexible array member as member of a structure.
See Section "6.18 Arrays of Length Zero" of the GCC manual.

D7) Structures containing a flexible array member as element of an array.
See Section "6.18 Arrays of Length Zero" of the GCC manual.

D8) Binary conditional operator.
See Section "6.8 Conditionals with Omitted Operands" of the GCC manual.

D9) Case labels with upper/lower values.
See Section "6.30 Case Ranges" of the GCC manual.

D10) Anonymous structure and union fields.
 See Section "6.63 Unnamed Structure and Union Fields" of the GCC manual.

D11) Variadic macros called without any argument for '...'
 See Section "6.21 Macros with a Variable Number of Arguments"
 of the GCC manual.

D12) Forward references to enum definitions.
 See Section "6.49 Incomplete enum Types" of the GCC manual.

Finally, Xen seems to rely on explicitly undefined behavior, namely
C99 UB 58: "A structure or union is defined as containing no named
members (6.7.2.1)." All instances but one occur via macro BUILD_BUG_ON_ZERO,
the remaining instance concerns struct
`cpu_policy'.
Reliance on undefined behavior might be hard to justify.

I look forward to receiving your feedback.
Kind regards,

   Roberto



Re: Xen reliance on non-standard GCC features

2023-06-05 Thread Roberto Bagnara

On 05/06/23 09:35, Jan Beulich wrote:

On 05.06.2023 07:28, Roberto Bagnara wrote:

Finally, Xen seems to rely on explicitly undefined behavior, namely
C99 UB 58: "A structure or union is defined as containing no named
members (6.7.2.1)." All instances but one occur via macro BUILD_BUG_ON_ZERO,
the remaining instance concerns struct
`cpu_policy'.
Reliance on undefined behavior might be hard to justify.


For starters just a comment here: I'm pretty sure this case was discussed
before, and that it was agreed that if need be we'd simply add _ as the
name there. Yet then - how did you notice this? Code inspection? I would
expect no analysis tool would spot it because it's used for gcc < 4.6
only. And I doubt you run analysis tools in combination with this old a
gcc?


It is detected by ECLAIR, but the analysis is based on GCC 12,
and the detection of implementation-defined behaviors (including
predefined macros) is completely automatic, so I don't think there
is any configuration error.

One of the instances arises from

xen/arch/arm/dm.c:50.19-50.37:
all the members of struct `dm_op(const struct dmop_args*)::' are 
unnamed (undefined for the C99 standard, ISO/IEC 9899:1999 Annex J.2 item 58: "A structure 
or union is defined as containing no named members (6.7.2.1)." [STD.anonstct]). Tool used 
is `/usr/bin/aarch64-linux-gnu-gcc-12'

and, in turn:

xen/include/xen/lib.h:54.12-54.17: expanded from macro `BUILD_BUG_ON_ZERO'
xen/include/xen/compiler.h:126.3-126.77: expanded from macro `__must_be_array'
xen/include/xen/lib.h:77.53-77.70: expanded from macro `ARRAY_SIZE'

I apologize in advance if I have misunderstood something.
Kind regards,

   Roberto



Re: Xen reliance on non-standard GCC features

2023-06-05 Thread Roberto Bagnara

On 05/06/23 10:58, Andrew Cooper wrote:

On 05/06/2023 6:28 am, Roberto Bagnara wrote:

U10) \m escape sequence.
  Is this an undocumented GCC extension or just a typo?


Where are you seeing this?

The only examples I see are in asm macros, and they're all parameter
substitutions.

This includes the one in x86's bug.h where it's a parameter substitution
into an .ascii string literal, not an escaped special character in the
literal.


The point is that escape sequences must be considered for tokenization,
which happens in translation phase 3, as they are relevant for phase 4,
which is where preprocessing directives are processed and removed.

See C99 Section "5.1.1.2 Translation phases" for the definition of the
phases.  So \m is non-compliant even in areas guarded by

#if defined(__ASSEMBLY__)

and a conforming compiler should flag it.  On this, see footnote 65 to 
6.4.4.4p8,
where the 'm' in '\m' is one of "any other character":

  65) The semantics of these characters were discussed in 5.2.2.
  If any other character follows a backslash, the result is not a token
  and a diagnostic is required. See ‘‘future language directions’’ (6.11.4).

Kind regards,

   Roberto

P.S. Note on passing that the comments on the preprocessing directives
 are a bit misleading due to the use of the exclamation mark
 (IMHO "defined(__ASSEMBLY__)" would be better than "!__ASSEMBLY__"):

#ifndef __ASSEMBLY__

...

#else  /* !__ASSEMBLY__ */

...

#endif /* !__ASSEMBLY__ */




Re: Xen reliance on non-standard GCC features

2023-06-05 Thread Roberto Bagnara

On 05/06/23 11:28, Jan Beulich wrote:

On 05.06.2023 07:28, Roberto Bagnara wrote:

U1) Use of _Static_assert in C99 mode.

U2) Empty initialization lists, both in C99 mode (ARM64 and X86_64)
  and C18 mode (only X86_64).

U3) Returning void expressions.


As per above, tiny extensions like these are, I think, unlikely to be
mentioned anywhere explicitly (or more than in passing). For the last of
the three it may further be that it pre-dates when gcc started to
properly document extensions. Oh, actually - U3 is documented along with
-Wreturn-type.


Noted: thanks!


Uses are generally intentional afaik, but eliminating cases of U2 and U3
would likely be possible with just slight overall impact.


Ok.  As this has an impact on MISRA compliance at some stage we will need
an official position on the subject.


As to U2, it's not clear why you distinguish C99 and C18 mode.


I specified that because for MISRA compliance we need to stick
to one version of the language: while most translation units
are compiled in C99 mode, some are compiled in C18 mode and some
in C90 mode.  However, I agree this is OT for the current discussion.


Throughout
this summary of yours it would likely have been helpful if an example was
provided for the behavior your describing, when the wording doesn't make
it crystal clear (e.g. no example needed for U1 and U3 above).


You are right: here are a few examples for U2:

xen/arch/arm/cpuerrata.c:92.12-92.35:
empty initializer list (ill-formed for the C99 standard, ISO/IEC 9899:1999 Section 6.7.8: 
"An empty initialization list." [STD.emptinit]). Tool used is 
`/usr/bin/aarch64-linux-gnu-gcc-12'
xen/include/xen/spinlock.h:31.21-31.23: expanded from macro `_LOCK_DEBUG'
xen/include/xen/spinlock.h:143.57-143.67: expanded from macro 
`SPIN_LOCK_UNLOCKED'
xen/include/xen/spinlock.h:144.43-144.60: expanded from macro `DEFINE_SPINLOCK'

xen/arch/arm/cpuerrata.c:678.5-678.6:
empty initializer list (ill-formed for the C99 standard, ISO/IEC 9899:1999 Section 6.7.8: 
"An empty initialization list." [STD.emptinit]). Tool used is 
`/usr/bin/aarch64-linux-gnu-gcc-12'

xen/arch/arm/cpufeature.c:33.5-33.6:
empty initializer list (ill-formed for the C99 standard, ISO/IEC 9899:1999 Section 6.7.8: 
"An empty initialization list." [STD.emptinit]). Tool used is 
`/usr/bin/aarch64-linux-gnu-gcc-12'



As to U1, I'm afraid this statement very early in gcc's section
documenting C extensions simply hasn't been properly updated for newer
versions of the standard: "Some features that are in ISO C99 but not C90
or C++ are also, as extensions, accepted by GCC in C90 mode and in C++."
A somewhat similar statement in the middle of 2.1 "C Language" is
slightly better, thus covering at least the specific case of
_Static_assert.


Noted: thanks!


U4) Static functions or variables used in inline functions with external
  linkage.


There's not a lot of "extern inline" that we've accumulated so far, I
think. The only ones I'm aware of are sort() and bsearch(), and there
the use is precisely for allowing the compiler to optimize away function
calls.

The documentation of this functionality is that of the gnu_inline
function attribute, afaict. That would be 6.33.1 "Common Function
Attributes" in 13.1.0 doc.


No, it is not that one:

xen/common/spinlock.c:316.29-316.40:
static function `observe_head(spinlock_tickets_t*)' is used in an inline function with 
external linkage (ill-formed for the C99 standard, ISO/IEC 9899:1999: "An ill-formed 
source detected by the parser." [STD.diag/ext_internal_in_extern_inline_quiet]). 
Tool used is `/usr/bin/aarch64-linux-gnu-gcc-12'
xen/common/spinlock.c:301.26-301.37:
`observe_head(spinlock_tickets_t*)' declared here
xen/include/xen/spinlock.h:180.1-180.4:
use 'static' to give inline function `_spin_lock_cb(spinlock_t*, 
void(*)(void*), void*)' internal linkage


U5) Enumerator values outside the range of `int'.


Examples:

xen/arch/x86/include/asm/guest/hyperv-tlfs.h:477.9-477.26: Loc #1 [culprit: ISO C 
restricts enumerator values to range of 'int' (2147483648 is too large) (ill-formed for 
the C99 standard, ISO/IEC 9899:1999: "An ill-formed source detected by the 
parser." [STD.diag/ext_enum_value_not_int]). Tool used is 
`/usr/bin/x86_64-linux-gnu-gcc-12']
xen/arch/x86/include/asm/guest/hyperv-tlfs.h:477.43-477.52: Loc #2 [evidence: 
ISO C restricts enumerator values to range of 'int' (2147483648 is too large)]

xen/arch/x86/include/asm/guest/hyperv-tlfs.h:478.9-478.27: Loc #1 [culprit: ISO C 
restricts enumerator values to range of 'int' (2147483649 is too large) (ill-formed for 
the C99 standard, ISO/IEC 9899:1999: "An ill-formed source detected by the 
parser." [STD.diag/ext_enum_value_not_int]). Tool used is 
`/usr/bin/x86_6

Re: [PATCH] docs/misra: new rules addition

2023-06-08 Thread Roberto Bagnara



Hi there.

Please see below.

On 07/06/23 23:53, Stefano Stabellini wrote:

On Wed, 7 Jun 2023, Jan Beulich wrote:

+   * - `Rule 5.6 
`_
+ - Required
+ - A typedef name shall be a unique identifier
+ -


Considering that the rule requires uniqueness across the entire code
base (and hence precludes e.g. two functions having identically named
local typedefs), I'm a little puzzled this was adopted. I for one
question that a use like the one mentioned is really at risk of being
confusing. Instead I think that the need to change at least one of
the names is at risk of making the code less readable then, even if
ever so slightly. (All of this said - I don't know if we have any
violations of this rule.)


I don't think we have many local typedefs and I think we have only few
violations if I remember right. I'll let Roberto confirm how many. This
rule was considered not a difficult rule (some difficult rules were
found, namely 2.1, 5.5 and 7.4.)


There currently are 30 violations for ARM64:

- some involve a typedef module_name (in the call it was said this should
  be renamed module_name_t, which will solve the issue);
- most concern repeated typedefs (UINT64 and similar) which are repeated
  in xen/arch/arm/include/asm/arm64/efibind.h
  and xen/include/acpi/actypes.h
- ret_t and phys_addr_t are also redefined in a couple of places.

There are 90 violations for X86_64, for the same reasons, plus

- another set of typedefs for basic types (such as u8);
- repeated typedefs for things like guest_l1e_t in the same header file:

xen/arch/x86/include/asm/guest_pt.h:60.39-60.49:
for program `xen/.xen-syms.0', the identifier for typedef `guest_l1e_t' is 
reused
xen/arch/x86/include/asm/guest_pt.h:128.22-128.32:
for program `xen/.xen-syms.0', the identifier for typedef `guest_l1e_t' is 
reused

The indicated lines read as follows:

typedef struct { guest_intpte_t l1; } guest_l1e_t;
typedef l1_pgentry_t guest_l1e_t;


+   * - `Rule 6.1 
`_
+ - Required
+ - Bit-fields shall only be declared with an appropriate type
+ -


This requires you have settled on what "an appropriate type" is, i.e.
whether our uses of e.g. types wider than int is meant to become a
deviation, or will need eliminating. I suppose the outcome of this
could do with mentioning as a remark here.


Yes, Roberto showed the "appropriate types" for gcc and there were
plenty including unsigned long if I remember right. I didn't write the
full list down.

Roberto do you have the list ready? I can add it in the Notes section
here.


GCC supports all integer types including enums.  In the analyzed
configurations of the project, bit-fields are declared of the
following types, in addition to the ones the compiler *has* to
support according to C99:

ARM64:   unsigned char, unsigned short, unsigned long, unsigned long long;
X86_64:  unsigned char, unsigned short, unsigned long, enum.


@@ -143,6 +166,12 @@ existing codebase are work-in-progress.
   - Octal constants shall not be used
   -
  
+   * - `Rule 7.2 `_

+ - Required
+ - A "u" or "U" suffix shall be applied to all integer constants
+   that are represented in an unsigned type
+ -


"Represented in an unsigned type" means there is a dependency on the
target architecture and compiler capabilities: Representation can only
be determined once knowing the underlying binary ABI, and uses in #if
and alike require knowing the widest integer type's size that the
compiler supports. As a consequence this looks like it may require, in
certain cases, to add u/U conditionally. Any such conditionals pose a
risk of cluttering the code.


I don't think there is any plan to add u/U conditionally, and I can see
that would be undesirable. I think the idea is to add u/U to all
constants meant to be unsigned. But also here I'll Roberto chime in --
he said they already have a draft patch to fix this.


Yes, the patch will add U to all implicitly unsigned literals.
An open thing is whether it should also add that suffix in order
to avoid inconsistencies.  Here is an example:

/* INIT Record (for IPF) */
#define CPER_NOTIFY_INIT\
UUID_LE(0xCC5263E8, 0x9308, 0x454a, 0x89, 0xD0, 0x34, 0x0B, \
0xD3, 0x9B, 0xC9, 0x8E)
/* Non-Maskable Interrupt */
#define CPER_NOTIFY_NMI \
UUID_LE(0x5BAD89FF, 0xB7E6, 0x42c9, 0x81, 0x4A, 0xCF, 0x24, \
0x85, 0xD6, 0xE9, 0x8A)

While 0xCC5263E8 is implicitly unsigned, 0x5BAD89FF is signed.
My inclination would be to add a U suffix to both, in order to
restore consistency in addition of complying with Rule 7.2.
Someone might say "I want to minimize the number

Re: Xen reliance on non-standard GCC features

2023-06-08 Thread Roberto Bagnara

On 07/06/23 09:39, Jan Beulich wrote:

On 05.06.2023 15:26, Roberto Bagnara wrote:

On 05/06/23 11:28, Jan Beulich wrote:

On 05.06.2023 07:28, Roberto Bagnara wrote:

You are right: here are a few examples for U2:

xen/arch/arm/cpuerrata.c:92.12-92.35:
empty initializer list (ill-formed for the C99 standard, ISO/IEC 9899:1999 Section 6.7.8: 
"An empty initialization list." [STD.emptinit]). Tool used is 
`/usr/bin/aarch64-linux-gnu-gcc-12'
xen/include/xen/spinlock.h:31.21-31.23: expanded from macro `_LOCK_DEBUG'
xen/include/xen/spinlock.h:143.57-143.67: expanded from macro 
`SPIN_LOCK_UNLOCKED'
xen/include/xen/spinlock.h:144.43-144.60: expanded from macro `DEFINE_SPINLOCK'


I'm afraid this is a bad example, as it goes hand-in-hand with using
another extension. I don't think using a non-empty initialization list
is going to work with

union lock_debug { };


Yes, this is C99 undefined behavior 58:
"A structure or union is defined as containing no named members (6.7.2.1)."

Here is another example:

lpae_t pte = {};

whereas we have

typedef union {
uint64_t bits;
lpae_pt_t pt;
lpae_p2m_t p2m;
lpae_walk_t walk;
} lpae_t;



xen/arch/arm/cpuerrata.c:678.5-678.6:
empty initializer list (ill-formed for the C99 standard, ISO/IEC 9899:1999 Section 6.7.8: 
"An empty initialization list." [STD.emptinit]). Tool used is 
`/usr/bin/aarch64-linux-gnu-gcc-12'

xen/arch/arm/cpufeature.c:33.5-33.6:
empty initializer list (ill-formed for the C99 standard, ISO/IEC 9899:1999 Section 6.7.8: 
"An empty initialization list." [STD.emptinit]). Tool used is 
`/usr/bin/aarch64-linux-gnu-gcc-12'


Both of these are a common idiom we use: The "sentinel" of an array
of compound type initializer.


Wouldn't it be possible writing such sentinels in a standard-compliant
way, like {0} or similar, instead of {}?


U6) Empty declarations.


Examples:

xen/arch/arm/arm64/lib/find_next_bit.c:57.29:
empty declaration (ill-formed for the C99 standard, ISO/IEC 9899:1999 Section 6.7: 
"An empty declaration." [STD.emptdecl]). Tool used is 
`/usr/bin/aarch64-linux-gnu-gcc-12'

xen/arch/arm/arm64/lib/find_next_bit.c:103.34:
empty declaration (ill-formed for the C99 standard, ISO/IEC 9899:1999 Section 6.7: 
"An empty declaration." [STD.emptdecl]). Tool used is 
`/usr/bin/aarch64-linux-gnu-gcc-12'


Looks like these could be taken care of by finally purging our
EXPORT_SYMBOL() stub.


xen/arch/arm/include/asm/vreg.h:143.26:
empty declaration (ill-formed for the C99 standard, ISO/IEC 9899:1999 Section 6.7: 
"An empty declaration." [STD.emptdecl]). Tool used is 
`/usr/bin/aarch64-linux-gnu-gcc-12'

xen/arch/arm/include/asm/vreg.h:144.26:
empty declaration (ill-formed for the C99 standard, ISO/IEC 9899:1999 Section 6.7: 
"An empty declaration." [STD.emptdecl]). Tool used is 
`/usr/bin/aarch64-linux-gnu-gcc-12'


I'm having trouble spotting anything suspicious there.


The macro expands to definitions of inline functions
and after the macro invocation there is a ";".

The preprocessed code is then:

static inline void foo() { ... }
;

where the final ";" is an empty declaration not allowed by
the C99 language standard.

Removing the ";" after the macro invocation is a possible solution,
but other possibilities exist if this is strongly unwanted.

We are preparing a new list in RST format, which would go
under docs/misra and constitute a more convenient way for
the community to take note of the used extensions and
contribute to the discussion about their adequacy.



Re: [PATCH] docs/misra: new rules addition

2023-06-12 Thread Roberto Bagnara

On 09/06/23 10:46, Jan Beulich wrote:

On 08.06.2023 13:02, Roberto Bagnara wrote:

On 07/06/23 23:53, Stefano Stabellini wrote:

On Wed, 7 Jun 2023, Jan Beulich wrote:

+   * - `Rule 5.6 
<https://gitlab.com/MISRA/MISRA-C/MISRA-C-2012/Example-Suite/-/blob/master/R_05_06.c>`_
+ - Required
+ - A typedef name shall be a unique identifier
+ -


Considering that the rule requires uniqueness across the entire code
base (and hence precludes e.g. two functions having identically named
local typedefs), I'm a little puzzled this was adopted. I for one
question that a use like the one mentioned is really at risk of being
confusing. Instead I think that the need to change at least one of
the names is at risk of making the code less readable then, even if
ever so slightly. (All of this said - I don't know if we have any
violations of this rule.)


I don't think we have many local typedefs and I think we have only few
violations if I remember right. I'll let Roberto confirm how many. This
rule was considered not a difficult rule (some difficult rules were
found, namely 2.1, 5.5 and 7.4.)


There currently are 30 violations for ARM64:

- some involve a typedef module_name (in the call it was said this should
be renamed module_name_t, which will solve the issue);
- most concern repeated typedefs (UINT64 and similar) which are repeated
in xen/arch/arm/include/asm/arm64/efibind.h
and xen/include/acpi/actypes.h
- ret_t and phys_addr_t are also redefined in a couple of places.

There are 90 violations for X86_64, for the same reasons, plus

- another set of typedefs for basic types (such as u8);
- repeated typedefs for things like guest_l1e_t in the same header file:

xen/arch/x86/include/asm/guest_pt.h:60.39-60.49:
for program `xen/.xen-syms.0', the identifier for typedef `guest_l1e_t' is 
reused
xen/arch/x86/include/asm/guest_pt.h:128.22-128.32:
for program `xen/.xen-syms.0', the identifier for typedef `guest_l1e_t' is 
reused

The indicated lines read as follows:

typedef struct { guest_intpte_t l1; } guest_l1e_t;
typedef l1_pgentry_t guest_l1e_t;


So this is an example where I don't think we can sensibly do away with the
re-use of the same typedef name: We use it so we can build the same source
files multiple ways, dealing with different paging modes guests may be in.


Typedefs being used this way can be deviated with tool configuration.
Here is a list of candidates for that treatment:

guest_intpte_t
guest_l1e_t
guest_l2e_t
ret_

I am not sure about the latter.  Please let me know if this is what
you would prefer and possible additions to/removals from the above list.
Kind regards,

   Roberto





Re: [PATCH v2] docs/misra: new rules addition

2023-06-12 Thread Roberto Bagnara

On 12/06/23 09:33, Jan Beulich wrote:

On 09.06.2023 19:45, Stefano Stabellini wrote:

@@ -133,6 +146,13 @@ existing codebase are work-in-progress.
 headers (xen/include/public/) are allowed to retain longer
 identifiers for backward compatibility.
  
+   * - `Rule 6.1 `_

+ - Required
+ - Bit-fields shall only be declared with an appropriate type
+ - In addition to the C99 types, we also consider appropriate types:
+   unsigned char, unsigned short, unsigned long, unsigned long long,
+   enum.


What about their signed equivalents? I'm surprised that I found only very
few uses (in Arm insn decoding afaict), but they generally have a purpose.
Are the uses we have (and new ones which may appear) intended to become
deviations?


Explicitly signed integer types are all supported by GCC as well.
So they can be added to the list.


@@ -143,6 +163,12 @@ existing codebase are work-in-progress.
   - Octal constants shall not be used
   -
  
+   * - `Rule 7.2 `_

+ - Required
+ - A "u" or "U" suffix shall be applied to all integer constants
+   that are represented in an unsigned type
+ -


I continue to consider "represented in" problematic here without
further qualification.


We should distinguish two things here.  The headline of Rule 7.2
is non negotiable: it is simply as it is.  As all headlines,
it is a compromise between conciseness and mnemonic value.
If what is wanted there is not the headline, then you can add
"implicitly" before "represented".  Or you may leave the headline
and add an explanatory note afterwards.


@@ -314,6 +340,11 @@ existing codebase are work-in-progress.
 used following a subsequent call to the same function
   -
  
+   * - Rule 21.21

+ - Required
+ - The Standard Library function system of  shall not be used
+ -


Still no "inapplicable" note (whichever way it would be worded to also
please Roberto)?


I am not the one to be pleased ;-)

But really, I don't follow: when you say the rule is inapplicable
your reasoning is, IIUC, "nobody would even dream using system() in Xen".
Which is exactly what the rule is asking.  If Xen adopts the rule,
tooling will make sure system() is not used, and seeing that the rule
is applied, assessors will be pleased.



Re: [PATCH v2] docs/misra: new rules addition

2023-06-12 Thread Roberto Bagnara

On 12/06/23 11:50, Jan Beulich wrote:

On 12.06.2023 11:34, Roberto Bagnara wrote:

On 12/06/23 09:33, Jan Beulich wrote:

On 09.06.2023 19:45, Stefano Stabellini wrote:

@@ -143,6 +163,12 @@ existing codebase are work-in-progress.
- Octal constants shall not be used
-
   
+   * - `Rule 7.2 <https://gitlab.com/MISRA/MISRA-C/MISRA-C-2012/Example-Suite/-/blob/master/R_07_02.c>`_

+ - Required
+ - A "u" or "U" suffix shall be applied to all integer constants
+   that are represented in an unsigned type
+ -


I continue to consider "represented in" problematic here without
further qualification.


We should distinguish two things here.  The headline of Rule 7.2
is non negotiable: it is simply as it is.


I understand this, and ...


  As all headlines,
it is a compromise between conciseness and mnemonic value.
If what is wanted there is not the headline, then you can add
"implicitly" before "represented".  Or you may leave the headline
and add an explanatory note afterwards.


... such a note is what my comment was heading towards.


Here is an attempt.  "The rule asks that any integer literal
that is implicitly unsigned is made explicitly unsigned by
using one of the indicated suffixes.  As an example, on
a machine where the int type is 32-bit wide, 0x
is signed whereas 0x8000 is (implicitly) unsigned.
In order to comply with the rule, the latter should be
rewritten as either 0x8000u or 0x8000U.  Consistency
considerations may suggest using the same suffix even
when not required by the rule.  For instance, if one has

   f(0x);  // Original
   f(0x8000);

one might prefer

   f(0xU); // Solution 1
   f(0x8000U);

over

   f(0x);  // Solution 2
   f(0x8000U);

after having ascertained that "Solution 1" is compatible
with the intended semantics."



@@ -314,6 +340,11 @@ existing codebase are work-in-progress.
  used following a subsequent call to the same function
-
   
+   * - Rule 21.21

+ - Required
+ - The Standard Library function system of  shall not be used
+ -


Still no "inapplicable" note (whichever way it would be worded to also
please Roberto)?


I am not the one to be pleased ;-)

But really, I don't follow: when you say the rule is inapplicable
your reasoning is, IIUC, "nobody would even dream using system() in Xen".
Which is exactly what the rule is asking.  If Xen adopts the rule,
tooling will make sure system() is not used, and seeing that the rule
is applied, assessors will be pleased.


My point is that "not using functions of stdlib.h" is ambiguous: It may
mean functions implemented in an external library (which the hypervisor
doesn't use), or it may mean functions of identical name (and purpose).
The full text goes even further and forbids the use of these
identifiers (plural; see next paragraph), so it's clearly not only
about an external library, and we also can't put it off as inapplicable.
(I wouldn't be surprised if we had a local variable or label named
"exit" or "abort".)

Btw - I can't find a rule 21.21 in my two (slightly different) copies
of the doc, nor one with this headline and a different number. What I
have is "21.8 The Standard Library functions abort, exit and system of
 shall not be used". (I further wonder why neither of the two
docs allows me to copy-and-paste a line out of it.)


Rule 21.21 was added in MISRA C:2012 Amendment 2, which you can download
(free of charge) from 
https://www.misra.org.uk/app/uploads/2021/06/MISRA-C-2012-AMD2.pdf



Re: [PATCH] xen/evtchn: Purge ERROR_EXIT{,_DOM}()

2023-06-13 Thread Roberto Bagnara

On 13/06/23 19:45, Andrew Cooper wrote:

On 13/06/2023 6:39 pm, Julien Grall wrote:

Hi,

On 13/06/2023 17:22, Andrew Cooper wrote:

These are disliked specifically by MISRA, but they also interfere
with code


Please explicitly name the rule.


I can't remember it off the top of my head.

Stefano/Bertrand?


Rule 2.1


legibility by hiding control flow.  Expand and drop them.

   * Rearrange the order of actions to write into rc, then render rc
in the
     gdprintk().
   * Drop redundant "rc = rc" assignments
   * Switch to using %pd for rendering domains

No functional change.  Resulting binary is identical.

Signed-off-by: Andrew Cooper 


Reviewed-by: Julien Grall 


Thanks.




base-commit: f29363922c1b41310c3d87fd9a861ffa9db9204a


I notice you have a few e-mail contain this tag. This is a pretty
useful when reviewing patches. Do you know which tool is creating it?


Plain git, and the capability has been around for years and years but
nigh on impossible to search for or find in the manual.  Searching for
"base-commit" gets you a tonne of intros of how to rebase.

You want

[format]
     useAutoBase = "whenAble"

or pass --base=auto to git-format-patch

~Andrew





[XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.

2023-06-15 Thread Roberto Bagnara
This document specifies the C language dialect used by Xen and
the assumptions Xen makes on the translation toolchain.

Signed-off-by: Roberto Bagnara 
---
 docs/misra/C-language-toolchain.rst | 465 
 1 file changed, 465 insertions(+)
 create mode 100644 docs/misra/C-language-toolchain.rst

diff --git a/docs/misra/C-language-toolchain.rst 
b/docs/misra/C-language-toolchain.rst
new file mode 100644
index 00..013cef071c
--- /dev/null
+++ b/docs/misra/C-language-toolchain.rst
@@ -0,0 +1,465 @@
+=
+C Dialect and Translation Assumptions for Xen
+=
+
+This document specifies the C language dialect used by Xen and
+the assumptions Xen makes on the translation toolchain.
+It covers, in particular:
+
+1. the used language extensions;
+2. the translation limits that the translation toolchains must be able
+   to accommodate;
+3. the implementation-defined behaviors upon which Xen may depend.
+
+All points are of course relevant for portability.  In addition,
+programming in C is impossible without a detailed knowledge of the
+implementation-defined behaviors.  For this reason, it is recommended
+that Xen developers have familiarity with this document and the
+documentation referenced therein.
+
+This document needs maintenance and adaptation in the following
+circumstances:
+
+- whenever the compiler is changed or updated;
+- whenever the use of a certain language extension is added or removed;
+- whenever code modifications cause exceeding the stated translation limits.
+
+
+Applicable C Language Standard
+__
+
+Xen is written in C99 with extensions.  The relevant ISO standard is
+
+*ISO/IEC 9899:1999/Cor 3:2007*: Programming Languages - C,
+Technical Corrigendum 3.
+ISO/IEC, Geneva, Switzerland, 2007.
+
+
+Reference Documentation
+___
+
+The following documents are referred to in the sequel:
+
+GCC_MANUAL:
+  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc.pdf
+CPP_MANUAL:
+  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/cpp.pdf
+ARM64_ABI_MANUAL:
+  
https://github.com/ARM-software/abi-aa/blob/60a8eb8c55e999d74dac5e368fc9d7e36e38dda4/aapcs64/aapcs64.rst
+X86_64_ABI_MANUAL:
+  
https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build
+ARM64_LIBC_MANUAL:
+  https://www.gnu.org/software/libc/manual/pdf/libc.pdf
+X86_64_LIBC_MANUAL:
+  https://www.gnu.org/software/libc/manual/pdf/libc.pdf
+
+
+C Language Extensions
+_
+
+
+The following table lists the extensions currently used in Xen.
+The table columns are as follows:
+
+   Extension
+  a terse description of the extension;
+   Architectures
+  a set of Xen architectures making use of the extension;
+   References
+  when available, references to the documentation explaining
+  the syntax and semantics of (each instance of) the extension.
+
+
+.. list-table::
+   :widths: 30 15 55
+   :header-rows: 1
+
+   * - Extension
+ - Architectures
+ - References
+
+   * - Non-standard tokens
+ - ARM64, X86_64
+ - _Static_assert:
+  see Section "2.1 C Language" of GCC_MANUAL.
+   asm, __asm__:
+  see Sections "6.48 Alternate Keywords" and "6.47 How to Use Inline 
Assembly Language in C Code" of GCC_MANUAL.
+   __volatile__:
+  see Sections "6.48 Alternate Keywords" and "6.47.2.1 Volatile" of 
GCC_MANUAL.
+   __const__, __inline__, __inline:
+  see Section "6.48 Alternate Keywords" of GCC_MANUAL.
+   typeof, __typeof__:
+  see Section "6.7 Referring to a Type with typeof" of GCC_MANUAL.
+   __alignof__, __alignof:
+  see Sections "6.48 Alternate Keywords" and "6.44 Determining the 
Alignment of Functions, Types or Variables" of GCC_MANUAL.
+   __attribute__:
+  see Section "6.39 Attribute Syntax" of GCC_MANUAL.
+   __builtin_types_compatible_p:
+  see Section "6.59 Other Built-in Functions Provided by GCC" of 
GCC_MANUAL.
+   __builtin_va_arg:
+  non-documented GCC extension.
+   __builtin_offsetof:
+  see Section "6.53 Support for offsetof" of GCC_MANUAL.
+   __signed__:
+  non-documented GCC extension.
+
+   * - Empty initialization list
+ - ARM64, X86_64
+ - Non-documented GCC extension.
+
+   * - Arithmetic operator on void type
+ - ARM64, X86_64
+ - See Section "6.24 Arithmetic on void- and Function-Pointers" of 
GCC_MANUAL."
+
+   * - GNU statement expression
+ - ARM64, X86_64
+ - See Section "6.1 Statements and Declarations in Expressions" of 
GCC_MANUAL.
+
+   * - Structure or union definition with no members
+ - ARM64, X86_64
+ - See Section "6.19 Structures with No Members" of GCC_MANUAL.
+

Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.

2023-06-16 Thread Roberto Bagnara

On 16/06/23 08:53, Jan Beulich wrote:

On 16.06.2023 01:26, Stefano Stabellini wrote:

On Thu, 15 Jun 2023, Roberto Bagnara wrote:
I have a few comments below, mostly to clarify the description of some
of the less documented GCC extensions, for the purpose of having all
community members be able to understand what they can and cannot use.


What do you mean by "can and cannot use"? Is this document intended to
forbid the use of any extensions we may not currently use, or we use
but which aren't enumerated here?

One of the reasons that kept me from replying to this submission is
that the full purpose of this new doc isn't stated in the description.


My full purpose was to give the community a starting point for the
discussion on the assumptions the project makes on the programming
language and the translation toolchains that are intended to be used
now or in the future.  As far as I know, no documentation is currently
provided on these topics, so I believe the document fills a gap and
I hope it is good enough as a starting point.


Which in turn leaves open whether certain items actually need to be
here (see e.g. the libc related remark below).


Because the analyzed build used to included some of the tools, which in turn
relied on libc for program termination.  Once confirmation is given
that the analyzed build is now what is intended, all references to
libc can be removed.


Another is that it's
hard to tell how to convince oneself of this being an exhaustive
enumeration. One extension we use extensively yet iirc is missing here
is omission of the middle operand of the ternary operator.


Not sure I understand: do you mean something different from the following
entry in the document?

   * - Binary conditional expression
 - ARM64, X86_64
 - See Section "6.8 Conditionals with Omitted Operands" of GCC_MANUAL.



+Reference Documentation
+___
+
+The following documents are referred to in the sequel:
+
+GCC_MANUAL:
+  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc.pdf
+CPP_MANUAL:
+  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/cpp.pdf


Why 12.1 when meanwhile there's 12.3 and 13.1?


For no special reason: as I said, my purpose is only to provide
a starting point for discussion and customization of the
assumptions.


+ARM64_ABI_MANUAL:
+  
https://github.com/ARM-software/abi-aa/blob/60a8eb8c55e999d74dac5e368fc9d7e36e38dda4/aapcs64/aapcs64.rst
+X86_64_ABI_MANUAL:
+  
https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build
+ARM64_LIBC_MANUAL:
+  https://www.gnu.org/software/libc/manual/pdf/libc.pdf
+X86_64_LIBC_MANUAL:
+  https://www.gnu.org/software/libc/manual/pdf/libc.pdf


How is libc relevant to the hypervisor?


See above.


+   * - Empty declaration
+ - ARM64, X86_64
+ - Non-documented GCC extension.


For the non-documented GCC extensions, would it be possible to add a
very brief example or a couple of words in the "References" sections?
Otherwise I think people might not understand what we are talking about.

For instance in this case I would say:

An empty declaration is a semicolon with nothing before it.
Non-documented GCC extension.


Which then could be confused with empty statements. I think in a document
like this language needs to be very precise, to avoid ambiguities and
confusion as much as possible. (Iirc from going over this doc yesterday
this applies elsewhere as well.)


OK.


+   * - Ill-formed source detected by the parser


As we are documenting compiler extensions that we are using, I am a bit
confused by the name of this category of compiler extensions, and the
reason why they are bundled together. After all, they are all separate
compiler extensions? Should each of them have their own row?


+1


OK.


+
+   * - Unspecified escape sequence is encountered in a character constant or a 
string literal token
+ - X86_64
+ - \\m:
+  non-documented GCC extension.


Are you saying that we are using \m and \m is not allowed by the C
standard?


This exists in the __ASSEMBLY__ part of a header, and I had previously
commented on Roberto's diagnosis (possibly derived from Eclair's) here.
As per that I don't think the item should be here, but I'm of course
open to be shown that my understanding of translation phases is wrong.


I was not convinced by your explanation but, as I think I have said already,
I am not the one to be convinced.  In the specific case, independently
from __ASSEMBLY__ or any other considerations, that thing reaches the C
preprocessor and, to the best of my knowledge, the C preprocessor documentation
does not say how that would be handled.  I have spent a lot of time in the
past 10 years on the study of functional-safety standards, and what I
am providing is a honest opinion on what I believe is compliant
and what is not.  But I may be wrong of course: if you or anyone else feels
like they would not have

Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.

2023-06-16 Thread Roberto Bagnara

On 16/06/23 12:03, Jan Beulich wrote:

On 16.06.2023 09:45, Roberto Bagnara wrote:

On 16/06/23 08:53, Jan Beulich wrote:

On 16.06.2023 01:26, Stefano Stabellini wrote:

+   * - Unspecified escape sequence is encountered in a character constant or a 
string literal token
+ - X86_64
+ - \\m:
+  non-documented GCC extension.


Are you saying that we are using \m and \m is not allowed by the C
standard?


This exists in the __ASSEMBLY__ part of a header, and I had previously
commented on Roberto's diagnosis (possibly derived from Eclair's) here.
As per that I don't think the item should be here, but I'm of course
open to be shown that my understanding of translation phases is wrong.


I was not convinced by your explanation but, as I think I have said already,
I am not the one to be convinced.  In the specific case, independently
from __ASSEMBLY__ or any other considerations, that thing reaches the C
preprocessor and, to the best of my knowledge, the C preprocessor documentation
does not say how that would be handled.  I have spent a lot of time in the
past 10 years on the study of functional-safety standards, and what I
am providing is a honest opinion on what I believe is compliant
and what is not.  But I may be wrong of course: if you or anyone else feels
like they would not have any problems in arguing a different position
from mine in front of an assessor, then please go for it, but please
do not ask me to go beyond my judgment.


Well, disagreement on purely a technical matter can usually be resolved,
unless something is truly unspecified. Since you referred to translation
phases, and since I pointed out that preprocessing directives are carried
out before escape sequences are converted to the execution character set
(which is the point where unknown escape sequences would matter afaict),
there must be something you view differently in this process. It would be
helpful if you could point out what this is, possibly leading to me
recognizing a mistake of mine.

Actually, maybe I figured what you're concerned about: Already at the
stage of decomposing into preprocessing-token-s there is an issue, as
e.g. "\mode" doesn't form a valid string-literal. For other, unquoted
\m I would assume though that the final "each non-white-space character
that cannot be one of the above" (in the enumeration of what a
preprocessing-token is) would catch it.


Yes but, more generally, my concern is that the behavior in presence
of unspecified escape sequences is not specified in the C99 standard
and it is not a documented extension according to the documentation
I have examined.  For this reason, I don't think that feature is
usable for safety-related development unless other (potentially
quite expensive) activities are performed (such as prescribing
extra validation activities for the preprocessor).


Furthermore it is entirely unclear to me what it is that you suggest we
do instead. It can't reasonably be "name all you assembler macro
parameters such that they start with a, b, f, n, r, t, or v". Splitting
headers also wouldn't be very nice - we try to keep related things
together, after all. It also doesn't look like __stringify(\mode) would
be okay, as macro expansion shares a translation phase with execution
of preprocessing directives (so in principle the body of "#if 0" could
be macro-expanded before being discarded). (Plus I think this would
result in "\\mode", i.e. also wouldn't work in the first place. But it
would rule out other possible C macro trickery as well.)


My suggestion is avoiding the use of the C preprocessor
outside its specification.  This includes, among other
possibilities:

a) using a different preprocessor or substitution mechanism;
b) amend the preprocessor specification by, e.g., submitting
   patches with suitable additions for "The C Preprocessor"
   manual of GCC.

In view of that, naming macro parameters so that you never
have an unspecified escape sequence is probably the cheapest
(yet bulletproof) solution.
Kind regards,

   Roberto



Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.

2023-06-16 Thread Roberto Bagnara

On 16/06/23 01:26, Stefano Stabellini wrote:

On Thu, 15 Jun 2023, Roberto Bagnara wrote:

This document specifies the C language dialect used by Xen and
the assumptions Xen makes on the translation toolchain.

Signed-off-by: Roberto Bagnara 


Thanks Roberto for the amazing work of research and archaeology.

I have a few comments below, mostly to clarify the description of some
of the less documented GCC extensions, for the purpose of having all
community members be able to understand what they can and cannot use.

+   * - Arithmetic operator on void type
+ - ARM64, X86_64
+ - See Section "6.24 Arithmetic on void- and Function-Pointers" of 
GCC_MANUAL."
+
+   * - GNU statement expression


"GNU statement expression" is not very clear, at least for me. I would
call it "Statements and Declarations in Expressions".


Agreed.


+   * - Empty declaration
+ - ARM64, X86_64
+ - Non-documented GCC extension.


For the non-documented GCC extensions, would it be possible to add a
very brief example or a couple of words in the "References" sections?
Otherwise I think people might not understand what we are talking about.


Ok.


+   * - Incomplete enum declaration
+ - ARM64
+ - Non-documented GCC extension.


Is this 6.49 of the GCC manual perhaps?


Indeed, on a second reading, I think that section covers also the case
of an enum declaration that is never completed in the course of the
translation unit.


+   * - Implicit conversion from a pointer to an incompatible pointer
+ - ARM64, X86_64
+ - Non-documented GCC extension.


Is this related to -Wincompatible-pointer-types?


In my opinion, this does not specify what the result of the
conversion is.


+   * - Pointer to a function is converted to a pointer to an object or a 
pointer to an object is converted to a pointer to a function
+ - X86_64
+ - Non-documented GCC extension.


Is this J.5.7 of n1570?
https://www.iso-9899.info/n1570.html


This says that function pointer casts are a common extension.
What we need here is documentation for GCC that assures us
that the extension is implemented and what its semantics is.


Or maybe we should link https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83584


My opinion is that this might not be accepted by an assessor:
if I was an assessor, I would not accept it.


+   * - Ill-formed source detected by the parser


As we are documenting compiler extensions that we are using, I am a bit
confused by the name of this category of compiler extensions, and the
reason why they are bundled together. After all, they are all separate
compiler extensions? Should each of them have their own row?


Agreed.


+ - ARM64, X86_64
+ - token pasting of ',' and __VA_ARGS__ is a GNU extension:
+  see Section "6.21 Macros with a Variable Number of Arguments" of 
GCC_MANUAL.
+   must specify at least one argument for '...' parameter of variadic 
macro:
+  see Section "6.21 Macros with a Variable Number of Arguments" of 
GCC_MANUAL.
+   void function should not return void expression:


I understand that GCC does a poor job at documenting several of these
extensions. In fact a few of them are not even documented at all.
However, if they are extensions, they should be described for what they
do, not for the rule they violate. What do you think?


The point is that we don't know what they do.  We might make observations,
and our observations might substantiate what we believe they do.
But this would not allow us to generalize them.


For example, in this case maybe we should say "void function can return
a void expression" ?


We can certainly say that, but this might not convince an assessor.
One possibility would be to submit patches to the GCC manual and see
whether they are accepted.


+  see the documentation for -Wreturn-type in Section "3.8 Options to 
Request or Suppress Warnings" of GCC_MANUAL.
+   use of GNU statement expression extension from macro expansion:
+  see Section "6.1 Statements and Declarations in Expressions" of 
GCC_MANUAL.
+   invalid application of sizeof to a void type:
+  see Section "6.24 Arithmetic on void- and Function-Pointers" of 
GCC_MANUAL.
+   redeclaration of already-defined enum is a GNU extension:
+  see Section "6.49 Incomplete enum Types" of GCC_MANUAL.
+   static function is used in an inline function with external linkage:
+  non-documented GCC extension.


I am not sure if I follow about this one. Did you mean "static is used
in an inline function with external linkage" ?


An inline function with external linkage can be inlined everywhere.
If that calls a static functions, which is not available everywhere,
the behavior is not defined.


+   struct may not be nested in a struct due to flexible array member:
+  s

Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.

2023-06-19 Thread Roberto Bagnara

On 19/06/23 09:54, Jan Beulich wrote:

On 16.06.2023 17:54, Roberto Bagnara wrote:

On 16/06/23 01:26, Stefano Stabellini wrote:

On Thu, 15 Jun 2023, Roberto Bagnara wrote:

+   static function is used in an inline function with external linkage:
+  non-documented GCC extension.


I am not sure if I follow about this one. Did you mean "static is used
in an inline function with external linkage" ?


An inline function with external linkage can be inlined everywhere.
If that calls a static functions, which is not available everywhere,
the behavior is not defined.


I guess I could do with an example where this leads to UB. What I'd expect
is that it leads to a compilation error.


Here are the two occurrences we have in ARM64 code:

violation for rule MC3R1.R1.1: (required) The program shall contain no 
violations of the standard C syntax and constraints, and shall not exceed the 
implementation's translation limits.
xen/common/spinlock.c:316.29-316.40: Loc #1 [culprit: static function 
`observe_head(spinlock_tickets_t*)' is used in an inline function with external linkage 
(ill-formed for the C99 standard, ISO/IEC 9899:1999: "An ill-formed source detected 
by the parser."
xen/common/spinlock.c:301.26-301.37: Loc #2 [evidence: 
`observe_head(spinlock_tickets_t*)' declared here]
xen/include/xen/spinlock.h:180.1-180.4: Loc #3 [evidence: use 'static' to give 
inline function `_spin_lock_cb(spinlock_t*, void(*)(void*), void*)' internal 
linkage]

violation for rule MC3R1.R1.1: (required) The program shall contain no 
violations of the standard C syntax and constraints, and shall not exceed the 
implementation's translation limits.
xen/common/spinlock.c:324.5-324.12: Loc #1 [culprit: static function `got_lock(union 
lock_debug*)' is used in an inline function with external linkage (ill-formed for the C99 
standard, ISO/IEC 9899:1999: "An ill-formed source detected by the parser."
xen/common/spinlock.c:227.13-227.20: Loc #2 [evidence: `got_lock(union 
lock_debug*)' declared here]
xen/include/xen/spinlock.h:180.1-180.4: Loc #3 [evidence: use 'static' to give 
inline function `_spin_lock_cb(spinlock_t*, void(*)(void*), void*)' internal 
linkage]





Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.

2023-06-19 Thread Roberto Bagnara

On 19/06/23 09:54, Jan Beulich wrote:

On 16.06.2023 17:54, Roberto Bagnara wrote:

On 16/06/23 01:26, Stefano Stabellini wrote:

On Thu, 15 Jun 2023, Roberto Bagnara wrote:

+   * - Unspecified escape sequence is encountered in a character constant or a 
string literal token
+ - X86_64
+ - \\m:
+  non-documented GCC extension.


Are you saying that we are using \m and \m is not allowed by the C
standard?


The C standard does not specify that escape sequence, so what is
done with it, in particular by the preprocessor, is not specified.


Isn't it rather that gcc doesn't follow the spec to the word here?
As per what preprocessing-token can be, anything that isn't (among
other things) a string-literal or a character-constants falls under
"each non-white-space character that cannot be one of the above".
Hence since "\mode" doesn't form a valid string literal, it would
need to become (using '' notation for separation purposes, not to
indicate character constants) '"' '\' 'mode'. Which of course would
break what subsequently are string literals, as the supposedly
closing double-quote would now be an opening one. Which in turn is
presumably the reason why gcc (and probably other compilers as well)
behaves the way it does.


After a significant amount of work on the matter, we came to the
following conclusions:

1) In this matter, the C Standard is not at all clear regarding
   the conditions upon which it is legitimate placing undefined
   escape sequences in the sources.
2) The GNU C preprocessor manual says nothing in this regard.
3) Experimenting with a lot of compilers, it seems all implementers
   have filled the dots in the same way, that is: during translation
   phase 3, escape sequences are considered for the sole purpose
   of getting preprocessing tokens right; escape sequences, whether
   defined or undefined, are left untouched and passed over to translation
   phase 4.

Summarizing, we are now convinced that what we are facing is one
of the cases (there are many of them), where the C Standard is
not being clear, and not a case of undefined behavior.  Xen use
of \m guarded by __ASSEMBLY__ is thus correct and not problematic.
Indeed, the check for undefined escape sequences can only
be done after preprocessing.  I have asked that ECLAIR
is suitably amended.

Thank you all, and particularly to Jan, for the perseverance.
Kind regards,

   Roberto



Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.

2023-06-20 Thread Roberto Bagnara

On 19/06/23 13:47, Jan Beulich wrote:

On 19.06.2023 12:53, Roberto Bagnara wrote:

On 19/06/23 09:54, Jan Beulich wrote:

On 16.06.2023 17:54, Roberto Bagnara wrote:

On 16/06/23 01:26, Stefano Stabellini wrote:

On Thu, 15 Jun 2023, Roberto Bagnara wrote:

+   static function is used in an inline function with external linkage:
+  non-documented GCC extension.


I am not sure if I follow about this one. Did you mean "static is used
in an inline function with external linkage" ?


An inline function with external linkage can be inlined everywhere.
If that calls a static functions, which is not available everywhere,
the behavior is not defined.


I guess I could do with an example where this leads to UB. What I'd expect
is that it leads to a compilation error.


Here are the two occurrences we have in ARM64 code:

violation for rule MC3R1.R1.1: (required) The program shall contain no 
violations of the standard C syntax and constraints, and shall not exceed the 
implementation's translation limits.
xen/common/spinlock.c:316.29-316.40: Loc #1 [culprit: static function 
`observe_head(spinlock_tickets_t*)' is used in an inline function with external linkage 
(ill-formed for the C99 standard, ISO/IEC 9899:1999: "An ill-formed source detected 
by the parser."
xen/common/spinlock.c:301.26-301.37: Loc #2 [evidence: 
`observe_head(spinlock_tickets_t*)' declared here]
xen/include/xen/spinlock.h:180.1-180.4: Loc #3 [evidence: use 'static' to give 
inline function `_spin_lock_cb(spinlock_t*, void(*)(void*), void*)' internal 
linkage]

violation for rule MC3R1.R1.1: (required) The program shall contain no 
violations of the standard C syntax and constraints, and shall not exceed the 
implementation's translation limits.
xen/common/spinlock.c:324.5-324.12: Loc #1 [culprit: static function `got_lock(union 
lock_debug*)' is used in an inline function with external linkage (ill-formed for the C99 
standard, ISO/IEC 9899:1999: "An ill-formed source detected by the parser."
xen/common/spinlock.c:227.13-227.20: Loc #2 [evidence: `got_lock(union 
lock_debug*)' declared here]
xen/include/xen/spinlock.h:180.1-180.4: Loc #3 [evidence: use 'static' to give 
inline function `_spin_lock_cb(spinlock_t*, void(*)(void*), void*)' internal 
linkage]


I know _spin_lock_cb() was an example of a violation (it isn't anymore),
but this does not serve as an example for the UB you claim may occur.
The "inline" there was in a .c file, and hence the function could only
be inlined with its (static) helper also in scope.


This is a constraint violation according to C99 6.7.4p3: "An inline definition
of a function with external linkage shall not contain a definition of a 
modifiable
object with static storage duration, and shall not contain a reference to an 
identifier
with internal linkage."  A standard-compliant C compiler ought to diagnose all
constraint violations: when it does not, as is the case for GCC in these 
specific
examples, the behavior is implicitly undefined.




Re: [XEN PATCH] docs/misra: document the C dialect and translation toolchain assumptions.

2023-06-20 Thread Roberto Bagnara

On 16/06/23 22:43, Stefano Stabellini wrote:

On Fri, 16 Jun 2023, Roberto Bagnara wrote:

+   * - Implicit conversion from a pointer to an incompatible pointer
+ - ARM64, X86_64
+ - Non-documented GCC extension.


Is this related to -Wincompatible-pointer-types?


In my opinion, this does not specify what the result of the
conversion is.


Fair enough. However, if -Wincompatible-pointer-types and "Implicit
conversion from a pointer to an incompatible pointer" are related, it
would add -Wincompatible-pointer-types as extra info about it. See also
below.



+   * - Pointer to a function is converted to a pointer to an object or a
pointer to an object is converted to a pointer to a function
+ - X86_64
+ - Non-documented GCC extension.


Is this J.5.7 of n1570?
https://www.iso-9899.info/n1570.html


This says that function pointer casts are a common extension.
What we need here is documentation for GCC that assures us
that the extension is implemented and what its semantics is.


Or maybe we should link https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83584


My opinion is that this might not be accepted by an assessor:
if I was an assessor, I would not accept it.


I understand your point and I think it is valid. My observation was
that it is better to provide as much information for these undocumented
extensions as we can. Not necessarily to help with an assessors, but for
a new engineer working on this project, reading this document and
understanding what can be done.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83584 might not be an
official documentation of the extension but it is better than no
documentation at all. Even better might be a code example.

I am not saying we should document ourselves what GCC failed to
document. I am only saying we should add enough description to
understand what we are talking about.

For instance, I read "Pointer to a function is converted to a pointer to
an object or a pointer to an object is converted to a pointer to a
function" and I have an idea about what this is but I am not really
sure. I googled the sentence and found information on Stackoverflow. I
think it is better to link
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83584 or a couple of
sentences from it, although it might not be official.



+   * - Ill-formed source detected by the parser


As we are documenting compiler extensions that we are using, I am a bit
confused by the name of this category of compiler extensions, and the
reason why they are bundled together. After all, they are all separate
compiler extensions? Should each of them have their own row?


Agreed.


+ - ARM64, X86_64
+ - token pasting of ',' and __VA_ARGS__ is a GNU extension:
+  see Section "6.21 Macros with a Variable Number of Arguments"
of GCC_MANUAL.
+   must specify at least one argument for '...' parameter of variadic
macro:
+  see Section "6.21 Macros with a Variable Number of Arguments"
of GCC_MANUAL.
+   void function should not return void expression:


I understand that GCC does a poor job at documenting several of these
extensions. In fact a few of them are not even documented at all.
However, if they are extensions, they should be described for what they
do, not for the rule they violate. What do you think?


The point is that we don't know what they do.  We might make observations,
and our observations might substantiate what we believe they do.
But this would not allow us to generalize them.


For example, in this case maybe we should say "void function can return
a void expression" ?


We can certainly say that, but this might not convince an assessor.
One possibility would be to submit patches to the GCC manual and see
whether they are accepted.


I think we have two different target audiences for this document. One
target is an assessors, and I understand that extra unofficial
information might not help there.

However another target is the community. This document should help the
Xen community write better code, not just the assessors raise red flags.
Right? It should help us have better compiler compatibility, and making
sure that we are clear about the C dialect we use. Actually, I think
this document could be of great help. Do you agree?

 From that point of view "void function should not return void
expression" is not understandable. At least I don't understand it.

A different approach would be to say:

- this is a MISRA C violation or compiler warning/error
- it is not C99 compliant
- it is not documented behavior by GCC

Not try to describe what the extension is at all, and instead focus on
what the MISRA C violation or compiler warning is.

I think it is OK to go down that route, but in that case we need to
reorganize the document so that:
- all documented extensions are referred to as extensions
- all undocumented extensions are referred to by the warning they
   trigger


[XEN PATCH v2] docs/misra: document the C dialect and translation toolchain assumptions.

2023-06-20 Thread Roberto Bagnara
This document specifies the C language dialect used by Xen and
the assumptions Xen makes on the translation toolchain.

Signed-off-by: Roberto Bagnara 
---
 docs/misra/C-language-toolchain.rst | 478 
 1 file changed, 478 insertions(+)
 create mode 100644 docs/misra/C-language-toolchain.rst

diff --git a/docs/misra/C-language-toolchain.rst 
b/docs/misra/C-language-toolchain.rst
new file mode 100644
index 00..0ffb14
--- /dev/null
+++ b/docs/misra/C-language-toolchain.rst
@@ -0,0 +1,478 @@
+=
+C Dialect and Translation Assumptions for Xen
+=
+
+This document specifies the C language dialect used by Xen and
+the assumptions Xen makes on the translation toolchain.
+It covers, in particular:
+
+1. the used language extensions;
+2. the translation limits that the translation toolchains must be able
+   to accommodate;
+3. the implementation-defined behaviors upon which Xen may depend.
+
+All points are of course relevant for portability.  In addition,
+programming in C is impossible without a detailed knowledge of the
+implementation-defined behaviors.  For this reason, it is recommended
+that Xen developers have familiarity with this document and the
+documentation referenced therein.
+
+This document needs maintenance and adaptation in the following
+circumstances:
+
+- whenever the compiler is changed or updated;
+- whenever the use of a certain language extension is added or removed;
+- whenever code modifications cause exceeding the stated translation limits.
+
+
+Applicable C Language Standard
+__
+
+Xen is written in C99 with extensions.  The relevant ISO standard is
+
+*ISO/IEC 9899:1999/Cor 3:2007*: Programming Languages - C,
+Technical Corrigendum 3.
+ISO/IEC, Geneva, Switzerland, 2007.
+
+
+Reference Documentation
+___
+
+The following documents are referred to in the sequel:
+
+GCC_MANUAL:
+  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc.pdf
+CPP_MANUAL:
+  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/cpp.pdf
+ARM64_ABI_MANUAL:
+  
https://github.com/ARM-software/abi-aa/blob/60a8eb8c55e999d74dac5e368fc9d7e36e38dda4/aapcs64/aapcs64.rst
+X86_64_ABI_MANUAL:
+  
https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build
+
+
+C Language Extensions
+_
+
+
+The following table lists the extensions currently used in Xen.
+The table columns are as follows:
+
+   Extension
+  a terse description of the extension;
+   Architectures
+  a set of Xen architectures making use of the extension;
+   References
+  when available, references to the documentation explaining
+  the syntax and semantics of (each instance of) the extension.
+
+
+.. list-table::
+   :widths: 30 15 55
+   :header-rows: 1
+
+   * - Extension
+ - Architectures
+ - References
+
+   * - Non-standard tokens
+ - ARM64, X86_64
+ - _Static_assert:
+  see Section "2.1 C Language" of GCC_MANUAL.
+   asm, __asm__:
+  see Sections "6.48 Alternate Keywords" and "6.47 How to Use Inline 
Assembly Language in C Code" of GCC_MANUAL.
+   __volatile__:
+  see Sections "6.48 Alternate Keywords" and "6.47.2.1 Volatile" of 
GCC_MANUAL.
+   __const__, __inline__, __inline:
+  see Section "6.48 Alternate Keywords" of GCC_MANUAL.
+   typeof, __typeof__:
+  see Section "6.7 Referring to a Type with typeof" of GCC_MANUAL.
+   __alignof__, __alignof:
+  see Sections "6.48 Alternate Keywords" and "6.44 Determining the 
Alignment of Functions, Types or Variables" of GCC_MANUAL.
+   __attribute__:
+  see Section "6.39 Attribute Syntax" of GCC_MANUAL.
+   __builtin_types_compatible_p:
+  see Section "6.59 Other Built-in Functions Provided by GCC" of 
GCC_MANUAL.
+   __builtin_va_arg:
+  non-documented GCC extension.
+   __builtin_offsetof:
+  see Section "6.53 Support for offsetof" of GCC_MANUAL.
+   __signed__:
+  non-documented GCC extension.
+
+   * - Empty initialization list
+ - ARM64, X86_64
+ - Non-documented GCC extension.
+
+   * - Arithmetic operator on void type
+ - ARM64, X86_64
+ - See Section "6.24 Arithmetic on void- and Function-Pointers" of 
GCC_MANUAL."
+
+   * - Statements and declarations in expressions
+ - ARM64, X86_64
+ - See Section "6.1 Statements and Declarations in Expressions" of 
GCC_MANUAL.
+
+   * - Structure or union definition with no members
+ - ARM64, X86_64
+ - See Section "6.19 Structures with No Members" of GCC_MANUAL.
+
+   * - Zero size array type
+ - ARM64, X86_64
+ - See Section "6.18 Arrays of Length Zero" of GCC_MANUAL.
+
+   * - Bi

Re: [XEN PATCH v2] docs/misra: document the C dialect and translation toolchain assumptions.

2023-06-21 Thread Roberto Bagnara

On 20/06/23 16:52, Jan Beulich wrote:

On 20.06.2023 14:10, Roberto Bagnara wrote:

+   * - Non-standard tokens
+ - ARM64, X86_64
+ - _Static_assert:
+  see Section "2.1 C Language" of GCC_MANUAL.
+   asm, __asm__:
+  see Sections "6.48 Alternate Keywords" and "6.47 How to Use Inline 
Assembly Language in C Code" of GCC_MANUAL.
+   __volatile__:
+  see Sections "6.48 Alternate Keywords" and "6.47.2.1 Volatile" of 
GCC_MANUAL.
+   __const__, __inline__, __inline:
+  see Section "6.48 Alternate Keywords" of GCC_MANUAL.
+   typeof, __typeof__:
+  see Section "6.7 Referring to a Type with typeof" of GCC_MANUAL.
+   __alignof__, __alignof:
+  see Sections "6.48 Alternate Keywords" and "6.44 Determining the 
Alignment of Functions, Types or Variables" of GCC_MANUAL.
+   __attribute__:
+  see Section "6.39 Attribute Syntax" of GCC_MANUAL.
+   __builtin_types_compatible_p:
+  see Section "6.59 Other Built-in Functions Provided by GCC" of 
GCC_MANUAL.
+   __builtin_va_arg:
+  non-documented GCC extension.
+   __builtin_offsetof:
+  see Section "6.53 Support for offsetof" of GCC_MANUAL.
+   __signed__:
+  non-documented GCC extension.


I'd like to note that there is a patch [1] pending which removes all uses
of this extension. Hopefully in Prague we can settle on how to progress
this ...

Jan

[1] https://lists.xen.org/archives/html/xen-devel/2023-02/msg00445.html


Ok, removed from the document and from the tool configuration.



Re: [XEN PATCH v2] docs/misra: document the C dialect and translation toolchain assumptions.

2023-06-21 Thread Roberto Bagnara

On 20/06/23 16:56, Jan Beulich wrote:

On 20.06.2023 14:10, Roberto Bagnara wrote:

+   * - Arithmetic operator on void type
+ - ARM64, X86_64
+ - See Section "6.24 Arithmetic on void- and Function-Pointers" of 
GCC_MANUAL."


The first line is misleading - we don't (and can't) do arithmetic on void.
What we do is arithmetic on pointers to void (like the gcc section title
properly says).


You are right: corrected now.
Thanks,

   Roberto




Re: [XEN PATCH v2] docs/misra: document the C dialect and translation toolchain assumptions.

2023-06-21 Thread Roberto Bagnara

On 21/06/23 12:27, Jan Beulich wrote:

On 20.06.2023 14:10, Roberto Bagnara wrote:

+   * - static function is used in an inline function with external linkage
+ - ARM64, X86_64
+ - Non-documented GCC extension. An inline function with external linkage
+   can be inlined everywhere. If that calls a static functions, which is
+   not available everywhere, it is a constraint violation according to
+   C99 6.7.4p3: "An inline definition of a function with external linkage
+   shall not contain a definition of a modifiable object with static
+   storage duration, and shall not contain a reference to an identifier
+   with internal linkage."  A standard-compliant C compiler ought
+   to diagnose all constraint violations: when it does not, as is the
+   case for GCC, the behavior is implicitly undefined.


With _spin_lock_cb() taken care of, do we have any left? Or else can this
be dropped?


Dropped both from the document and the tool configuration.

   Roberto




Re: [XEN PATCH v2] docs/misra: document the C dialect and translation toolchain assumptions.

2023-06-21 Thread Roberto Bagnara

On 20/06/23 17:05, Jan Beulich wrote:

On 20.06.2023 14:10, Roberto Bagnara wrote:

+   * - Token pasting of ',' and __VA_ARGS__
+ - ARM64, X86_64
+ - See Section "6.21 Macros with a Variable Number of Arguments" of 
GCC_MANUAL.
+
+   * - No arguments for '...' parameter of variadic macro
+ - ARM64, X86_64
+ - See Section "6.21 Macros with a Variable Number of Arguments" of 
GCC_MANUAL.


Seeing these I think you want to also mention our extensive use of
the pre-C99 way of variadic macros, which also includes pasting of
',' with something.


Good catch.  Added

   * - Named variadic macro arguments
 - ARM64, X86_64
 - See Section "6.21 Macros with a Variable Number of Arguments" of 
GCC_MANUAL.

Thanks,

   Roberto



[XEN PATCH v3] docs/misra: document the C dialect and translation toolchain assumptions.

2023-06-21 Thread Roberto Bagnara
This document specifies the C language dialect used by Xen and
the assumptions Xen makes on the translation toolchain.

Signed-off-by: Roberto Bagnara 

Changes in V2:
  - Clarified several entries.
  - Removed entry about the use of the undefined escape sequence \m.
Changes in V3:
  - Removed mention of the __signed__ non-standard keyword.
  - Fixed the entry about arithmetic operator on pointer to void.
  - Added entry for named variadic macro arguments.
  - Removed entry about static function used in an inline function
with external linkage.

---
 docs/misra/C-language-toolchain.rst | 468 
 1 file changed, 468 insertions(+)
 create mode 100644 docs/misra/C-language-toolchain.rst

diff --git a/docs/misra/C-language-toolchain.rst 
b/docs/misra/C-language-toolchain.rst
new file mode 100644
index 00..8651f59118
--- /dev/null
+++ b/docs/misra/C-language-toolchain.rst
@@ -0,0 +1,468 @@
+=
+C Dialect and Translation Assumptions for Xen
+=
+
+This document specifies the C language dialect used by Xen and
+the assumptions Xen makes on the translation toolchain.
+It covers, in particular:
+
+1. the used language extensions;
+2. the translation limits that the translation toolchains must be able
+   to accommodate;
+3. the implementation-defined behaviors upon which Xen may depend.
+
+All points are of course relevant for portability.  In addition,
+programming in C is impossible without a detailed knowledge of the
+implementation-defined behaviors.  For this reason, it is recommended
+that Xen developers have familiarity with this document and the
+documentation referenced therein.
+
+This document needs maintenance and adaptation in the following
+circumstances:
+
+- whenever the compiler is changed or updated;
+- whenever the use of a certain language extension is added or removed;
+- whenever code modifications cause exceeding the stated translation limits.
+
+
+Applicable C Language Standard
+__
+
+Xen is written in C99 with extensions.  The relevant ISO standard is
+
+*ISO/IEC 9899:1999/Cor 3:2007*: Programming Languages - C,
+Technical Corrigendum 3.
+ISO/IEC, Geneva, Switzerland, 2007.
+
+
+Reference Documentation
+___
+
+The following documents are referred to in the sequel:
+
+GCC_MANUAL:
+  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc.pdf
+CPP_MANUAL:
+  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/cpp.pdf
+ARM64_ABI_MANUAL:
+  
https://github.com/ARM-software/abi-aa/blob/60a8eb8c55e999d74dac5e368fc9d7e36e38dda4/aapcs64/aapcs64.rst
+X86_64_ABI_MANUAL:
+  
https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build
+
+
+C Language Extensions
+_
+
+
+The following table lists the extensions currently used in Xen.
+The table columns are as follows:
+
+   Extension
+  a terse description of the extension;
+   Architectures
+  a set of Xen architectures making use of the extension;
+   References
+  when available, references to the documentation explaining
+  the syntax and semantics of (each instance of) the extension.
+
+
+.. list-table::
+   :widths: 30 15 55
+   :header-rows: 1
+
+   * - Extension
+ - Architectures
+ - References
+
+   * - Non-standard tokens
+ - ARM64, X86_64
+ - _Static_assert:
+  see Section "2.1 C Language" of GCC_MANUAL.
+   asm, __asm__:
+  see Sections "6.48 Alternate Keywords" and "6.47 How to Use Inline 
Assembly Language in C Code" of GCC_MANUAL.
+   __volatile__:
+  see Sections "6.48 Alternate Keywords" and "6.47.2.1 Volatile" of 
GCC_MANUAL.
+   __const__, __inline__, __inline:
+  see Section "6.48 Alternate Keywords" of GCC_MANUAL.
+   typeof, __typeof__:
+  see Section "6.7 Referring to a Type with typeof" of GCC_MANUAL.
+   __alignof__, __alignof:
+  see Sections "6.48 Alternate Keywords" and "6.44 Determining the 
Alignment of Functions, Types or Variables" of GCC_MANUAL.
+   __attribute__:
+  see Section "6.39 Attribute Syntax" of GCC_MANUAL.
+   __builtin_types_compatible_p:
+  see Section "6.59 Other Built-in Functions Provided by GCC" of 
GCC_MANUAL.
+   __builtin_va_arg:
+  non-documented GCC extension.
+   __builtin_offsetof:
+  see Section "6.53 Support for offsetof" of GCC_MANUAL.
+
+   * - Empty initialization list
+ - ARM64, X86_64
+ - Non-documented GCC extension.
+
+   * - Arithmetic operator on pointer to void
+ - ARM64, X86_64
+ - See Section "6.24 Arithmetic on void- and Function-Pointers" of 
GCC_MANUAL."
+
+   * - Statements and declarations in expressions
+ - ARM64, X86_64
+ - See Section "6.1 Statement

Re: ECLAIR Xen x86 results and progress

2022-05-15 Thread Roberto Bagnara

On 09/05/22 21:55, Stefano Stabellini wrote:

On Mon, 9 May 2022, Bertrand Marquis wrote:

On 6 May 2022, at 17:31, Stefano Stabellini  wrote:

Hi all,

Roberto kindly provided the ECLAIR x86 results:

https://eclairit.com:8443/job/XEN/Target=X86_64,agent=public/lastSuccessfulBuild/eclair/

Click on "See ECLAIR in action", then you can select "Show 100 entries"
and see all the results in one page. As an example MC3R1.R1.3
corresponds to Rule 1.3 in the spreadsheet.


If you are OK with this, I would like to aim at a follow-up meeting on
Tue May 17 at the same time (8AM California / 4PM UK). If the date/time
doesn't work, I'll run another Doodle poll.


Works for me.


Actually, to make sure more people are able to attend, I would like to
suggest May 19 8AM California / 4PM UK / 5PM Europe (which is the same
slot typically used by the Xen Community Call). Please let me know if
that works or if it is a problem.



By then, I am hoping that the group has already gone through the first
20 rules in the list, up until Rule 8.10. Is that reasonable for all of
you?


I completed that part of the table this morning (up to 8.14), it took me 30 
minutes.


Thank you! I did so as well in about the same amount of time.

I think I should provide a clarification on a couple of rules that are
not clear from the examples.


# Rule 5.4 "Macro identifiers shall be distinct"

This one is about the length of the Macro itself. C90 requires the first
31 characters to be different, while C99 requires the first 63
characteres to be different.

So the problem is the following:

#define this_macro_is_way_way_way_too_long
#define this_macro_is_way_way_way_too_lng

I don't think we have any violations.


# Rule 8.6 " An identifier with external linkage shall have exactly one external 
definition"

This one is meant to catch cases where there are 2 definitions for 1
declaration:

header.h:
extern int hello;

file1.c:
int hello;

file2:
int hello;

There was a question on whether having 1 declaration with no definitions
would be OK, so only the following:

header.h:
extern int hello;

for instance because file1.c has been removed from the build due to a
kconfig. Reading MISRA, I don't think it is a violation of Rule 8.6.
Roberto please correct me if I am wrong.


Hi there.

As I am not 100% sure I will be able to attend the meeting on the 19th,
I am providing some input here:

# Rule 5.4 "Macro identifiers shall be distinct"

The point here is that every standard-conformant C/C++ compiler has the right
of ignoring the characters in an identifier that are outside the "significant
part."  Different kind of identifiers have different minimum lengths
for the significant part.  For macros and non-external identifiers,
we have:

- for C90 and C95: 31 characters;
- for C99, C11 and C18: 63 characters;
- for C++: 1024 characters.

So, for example, a C90 compiler (or, and it is the same, any compiler
in C90 mode) can freely ignore all characters in the name of macros
beyond the 31st.  And if two macro identifiers differ only in the
non-significant part, the behavior is undefined (for the simple reason
that the preprocessor can choose any of the conflicting macros).
Now, the builds of XEN that are analized at https://eclairit.com
are built with (versions of) GCC and GCC goes well beyond the
standard minimal requirements by stipulating that "For internal names,
all characters are significant. For external names, the number of
significant characters are defined by the linker; for almost all
targets, all characters are significant."
(https://gcc.gnu.org/onlinedocs/gcc/Identifiers-implementation.html#Identifiers-implementation).
Now, given that ECLAIR takes into account (completely automatically)
all the implementation-defined behaviors of the actually used toolchain
(including how those are influenced by the command-line options),
why do we have a violation report when macro names only differ
after the 31st character in C90 mode?
For example,

12345678901234567890123456789012345678901
XEN_NETIF_CTRL_TYPE_SET_HASH_MAPPING_SIZE
XEN_NETIF_CTRL_TYPE_SET_HASH_MAPPING

?

The reason is pragmatic: yes, there is no undefined behavior
with GCC, but do you really want to admit macro names that are
different only in characters coming hundreds or thousands positions
from the beginning of the identifier?  I guess not: then choose your
threshold, add a one-liner to the ECLAIR configuration, and let ECLAIR
warn you if and when the rule is violated with *that* threshold.

This brings us to a related topic: why is part of XEN compiled in C90
mode while part of it is compiled in C99 mode?  This is a very important
violation of MISRA C Rule 1.1, which is not in your coding standard,
but it really should be included.  C90 and C99 are not 100% compatible,
and mixing them is asking for trouble.  My advice is choosing one version
of the language (C90, C99, C11 or C18) and stick to it, enabling Rule 1.1
to make sure the source code complies to the chosen language stan

Re: [PATCH 1/3] x86/p2m.h: Add include guards

2022-05-17 Thread Roberto Bagnara

On 17/05/22 17:38, Jan Beulich wrote:

On 09.05.2022 14:24, Andrew Cooper wrote:

Spotted by Eclair MISRA scanner.


I'm sorry, but what exactly was it that the scanner spotted? It was
actually deliberate to introduce this file without guards. I'm of
the general opinion that (private) headers not to be included by
other headers (but only by .c files) are not in need of guards. If
it is project-wide consensus that _all_ header files should have
guards, then I'll try to keep this in mind (in "x86emul: a few
small steps towards disintegration" for example I introduce
another such instance), but then it should also be put down in
./CODING_STYLE.


The rationale of this rule is as follows:

- With a complex hierarchy of nested header files, it is possible
  for a header file to be included more than once.

- This can bring to circular references of header files, which
  can result in undefined behavior and/or be difficult to debug.

- If multiple inclusion leads to multiple or conflicting definitions,
  then this can result in undefined or erroneous behavior.

- Compilation and analysis time is needlessly increased.

There has been a period (which lasted until the end of the '70s
or the beginning of the '80s, I would have to dig up to be
more precise) when the solution was thought to be "headers
shall not to be included by other headers but only by .c files."
Experience then showed that, in medium to large projects,
each .c file had to begin with a long list of #include
directives;  such lists needed to be ordered to accommodate
the dependencies between header files;  in some cases the
lists were so long that:

a) it was a kind of black magic to find out the right
   inclusion order, one that would work in any of
   possibly many project configurations;
b) the lists of #include directives often contained duplicates,
   possibly because the desperate programmers where trying
   to find the right order.

In the end, the software engineering community converged
on the idea that guards against multiple inclusion are
a much better alternative.

Of course there are valid reasons to deviate the rule:
some header files might be conceived to be included
multiple times.  A one-line configuration for ECLAIR
will do the trick to make sure such header files are
not reported.

Kind regards,

   Roberto


Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
---
  xen/arch/x86/mm/p2m.h | 5 +
  1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/mm/p2m.h b/xen/arch/x86/mm/p2m.h
index cc0f6766e4df..dc706b8e4799 100644
--- a/xen/arch/x86/mm/p2m.h
+++ b/xen/arch/x86/mm/p2m.h
@@ -15,6 +15,9 @@
   * along with this program; If not, see .
   */
  
+#ifndef __ARCH_MM_P2M_H__

+#define __ARCH_MM_P2M_H__
+
  struct p2m_domain *p2m_init_one(struct domain *d);
  void p2m_free_one(struct p2m_domain *p2m);
  
@@ -39,6 +42,8 @@ int ept_p2m_init(struct p2m_domain *p2m);

  void ept_p2m_uninit(struct p2m_domain *p2m);
  void p2m_init_altp2m_ept(struct domain *d, unsigned int i);
  
+#endif /* __ARCH_MM_P2M_H__ */

+
  /*
   * Local variables:
   * mode: C







Re: MOVING COMMUNITY CALL Call for agenda items for 9 June Community Call @ 1500 UTC

2022-06-09 Thread Roberto Bagnara

On 07/06/22 04:17, Stefano Stabellini wrote:
> # Rule 9.1 "The value of an object with automatic storage duration shall not be 
read before it has been set"
>
> The question is whether -Wuninitalised already covers this case or not.
> I think it does.
>
> Eclair is reporting a few issues where variables are "possibly
> uninitialized". We should ask Roberto about them, I don't think they are
> actual errors? More like extra warnings?

No, -Wuninitialized is not reliable, as it has plenty of (well known)
false negatives.  This is typical of compilers, for which the generation
of warnings is only a secondary objective.  I wrote about that here:

  https://www.bugseng.com/blog/compiler-warnings-use-them-dont-trust-them

On the specifics:

$ cat p.c
int foo (int b)
{
int a;

if (b)
{
a = 1;
}

return a;
}

$ gcc -c -W -Wall -Wmaybe-uninitialized -O3 p.c
$ gcc -c -W -Wall -Wuninitialized -O3 p.c
$

Note that the example is less contrived than you might think.
See, JF Bastien's talk at 2019 LLVM Developers' Meeting:

  https://www.youtube.com/watch?v=I-XUHPimq3o

More generally, you can only embrace MISRA if you agree on
its preventive nature, which is radically different from
the "bug finding" approach.  The point is rather simple:

1) static analysis alone cannot guarantee correctness;
2) peer review is unavoidable;
3) testing is unavoidable.

In order to effectively conduct a peer review, you cannot
afford being distracted every minute by the thought
"is this initialized?  where is it initialized?  with which
value is it initialized?"
In a MISRA setting, you want that the answer to such questions
is immediately clear to anyone.
In contrast, if you embrace bug finding (that is, checkers with
false negatives like the ones implemented by compilers),
you will miss instances that you may miss also with testing
(testing a program with UB does not give reliable results);
and you will likely miss them with peer review, unless you
can spend a lot of time and resources in the activity.

The checker implemented by ECLAIR for Rule 9.1 embodies this
principle: if it says "violation", then it is a definite
violation;  if it says "caution", then maybe there is no
UB, but a human will have to spend more than 30 seconds
in order to convince herself that there is no UB.

I understand this may sound frustrating to virtuoso programmers,
and there are many of them in the open source world.
But the truth is that virtuosity in programming is not a good
thing for safety-related development.   For safety you want
code that is simple and straightforward to reason about.
Kind regards,

   Roberto






Re: MISRA C meeting tomorrow, was: MOVING COMMUNITY CALL Call for agenda items for 9 June Community Call @ 1500 UTC

2022-06-09 Thread Roberto Bagnara

On 09/06/22 09:04, Jan Beulich wrote:

On 09.06.2022 03:20, Stefano Stabellini wrote:

Finally, for Rule 13.2, I updated the link to ECLAIR's results. There
are a lot more violations than just 4, but I don't know if they are
valid or false positives.


I've picked just the one case in xen/common/efi/ebmalloc.c to check,
and it says "possibly". That's because evaluation of function call
arguments involves the calling of (in this case two) further
functions. If those functions had side effects (which apparently the
tool can't figure), there would indeed be a problem.

The (Arm based) count of almost 10k violations is clearly a concern.
I don't consider it even remotely reasonable to add 10k comments, no
matter how brief, to cover all the false positives.


Again, the MISRA approach is a preventive one.
If you have reasons you want to write

   f(g(), h());

then declare g() and h() as pure (or const, if they are const).
E.g.:

#if COMPILER_SUPPORTS_PURE
#define PURE __attribute__((pure))
#else
#define PURE
#endif

int g(void) PURE;
int h(void) PURE;

It's good documentation, it improves compiler diagnostics,
and it satisfies Rule 13.2.
Kind regards,

   Roberto





Re: [PATCH] xen: Use -Wuninitialized and -Winit-self

2024-01-04 Thread Roberto Bagnara

On 2024-01-04 15:33, Andrew Cooper wrote:

On 04/01/2024 1:41 pm, Jan Beulich wrote:

On 28.12.2023 20:39, Andrew Cooper wrote:

The use of uninitialised data is undefined behaviour.  At -O2 with trivial
examples, both Clang and GCC delete the variable, and in the case of a
function return, the caller gets whatever was stale in %rax prior to the call.

Clang includes -Wuninitialized within -Wall, but GCC only includes it in
-Wextra, which is not used by Xen at this time.

Furthermore, the specific pattern of assigning a variable to itself in its
declaration is only diagnosed by GCC with -Winit-self.  Clang does diagnoise
simple forms of this pattern with a plain -Wuninitialized, but it fails to
diagnose the instances in Xen that GCC manages to find.

GCC, with -Wuninitialized and -Winit-self notices:

   arch/x86/time.c: In function ‘read_pt_and_tsc’:
   arch/x86/time.c:297:14: error: ‘best’ is used uninitialized in this function 
[-Werror=uninitialized]
 297 | uint32_t best = best;
 |  ^~~~
   arch/x86/time.c: In function ‘read_pt_and_tmcct’:
   arch/x86/time.c:1022:14: error: ‘best’ is used uninitialized in this 
function [-Werror=uninitialized]
1022 | uint64_t best = best;
 |  ^~~~

and both have logic paths where best can be returned while uninitalised.

I disagree. In both cases the variables are reliably set during the first
loop iteration.


I suggest you pay attention to the precision of the integers.

It is hard (likely prohibitively hard) to avoid entering the if(), but
it is not impossible.

The compiler really has emitted logic paths where stack rubble is returned.


Furthermore this initialize-to-self is a well known pattern to suppress the
-Wuninitialized induced warnings, originally used by Linux'es
uninitialized_var().


I'm glad you cited this, because it proves my point.

Notice how it was purged from Linux slowly over the course of 8 years
because it had been shown to create real bugs, by hiding real uses of
uninitialised variables.


There is a worse problem for initialize-to-self: it is undefined behavior
per se.  If this is done to suppress a warning, then what happens is
paradoxical: in order to suppress a warning about a potential undefined
behavior (the variable might indeed be always written before being read)
one introduces a definite undefined behavior.
Kind regards,

   Roberto



Re: [PATCH] xen: Use -Wuninitialized and -Winit-self

2024-01-05 Thread Roberto Bagnara

On 2024-01-05 07:56, Jan Beulich wrote:

On 04.01.2024 21:43, Roberto Bagnara wrote:

On 2024-01-04 15:33, Andrew Cooper wrote:

On 04/01/2024 1:41 pm, Jan Beulich wrote:

On 28.12.2023 20:39, Andrew Cooper wrote:

The use of uninitialised data is undefined behaviour.  At -O2 with trivial
examples, both Clang and GCC delete the variable, and in the case of a
function return, the caller gets whatever was stale in %rax prior to the call.

Clang includes -Wuninitialized within -Wall, but GCC only includes it in
-Wextra, which is not used by Xen at this time.

Furthermore, the specific pattern of assigning a variable to itself in its
declaration is only diagnosed by GCC with -Winit-self.  Clang does diagnoise
simple forms of this pattern with a plain -Wuninitialized, but it fails to
diagnose the instances in Xen that GCC manages to find.

GCC, with -Wuninitialized and -Winit-self notices:

arch/x86/time.c: In function ‘read_pt_and_tsc’:
arch/x86/time.c:297:14: error: ‘best’ is used uninitialized in this 
function [-Werror=uninitialized]
  297 | uint32_t best = best;
  |  ^~~~
arch/x86/time.c: In function ‘read_pt_and_tmcct’:
arch/x86/time.c:1022:14: error: ‘best’ is used uninitialized in this 
function [-Werror=uninitialized]
 1022 | uint64_t best = best;
  |  ^~~~

and both have logic paths where best can be returned while uninitalised.

I disagree. In both cases the variables are reliably set during the first
loop iteration.


I suggest you pay attention to the precision of the integers.

It is hard (likely prohibitively hard) to avoid entering the if(), but
it is not impossible.

The compiler really has emitted logic paths where stack rubble is returned.


Furthermore this initialize-to-self is a well known pattern to suppress the
-Wuninitialized induced warnings, originally used by Linux'es
uninitialized_var().


I'm glad you cited this, because it proves my point.

Notice how it was purged from Linux slowly over the course of 8 years
because it had been shown to create real bugs, by hiding real uses of
uninitialised variables.


There is a worse problem for initialize-to-self: it is undefined behavior
per se.  If this is done to suppress a warning, then what happens is
paradoxical: in order to suppress a warning about a potential undefined
behavior (the variable might indeed be always written before being read)
one introduces a definite undefined behavior.


I don't think so - aiui this is another of the many gcc extensions to the
language (no code is generated for this type of initialization, iirc).


Well, it is undefined behavior for the C Standard: "undefined behavior"
is a technical term (see, e.g., C18 3.4.3p1) that is not dependent on
any compiler or any implementation technique for the C language.
Any implementation can deal with undefined behavior with absolute
freedom, which includes, among infinite other possibilities, doing what
you believe it does (iyrc ;-)

In addition, the compiler is typically not the only element of the toolchain
that needs to reliably interpret and process C code: so, even if we could reach
absolute certainty about what a specific version of one compiler does in 
response
to one instance of undefined behavior (and I think we cannot), then we would
need to obtain similar guarantees for other tools.



Re: [PATCH 0/7] Fix MISRA C 2012 Rule 20.7 violations

2022-09-28 Thread Roberto Bagnara

Hi Xenia.  Please see below.

On 9/26/22 10:50, Xenia Ragiadakou wrote:

On 9/18/22 16:02, Roberto Bagnara wrote:

The question is on the interpretation of Rule 20.7. Are parenthesis
required by Rule 20.7 in the following cases:

- macro parameters used as function arguments

 > [...]
 > - macro parameter used as lhs in assignment

You can obtain different semantics depending on whether parentheses
are or are not used (in the macro call and/or macro expansion
depending on the case):


#include 

void g(int v) {
   printf("%d\n", v);
}

#define m1(x, y, ...) g(y)

void f1(int x, int y, ...) {
   g(y);
}

#define p 0, 1

void test1() {
   m1(p, 2);
   f1(p, 2);
}



In the example above something bothers me. Let me explain.

Running the above example gives:
2
1

The results differ mainly because m1() is substituted before p.
Thus, adding parentheses around the macro parameter 'y' of m1() i.e
#define m1(x, y, ...) g((y))
has no impact.

If the example is changed into the following:

#include 

void g(int v) {
    printf("%d\n", v);
}

#define m1(y, ...) g(y)

void f1(int y, ...) {
    g(y);
}

#define p 0, 1

void test1() {
     m1(p, 2);
     f1(p, 2);
}

if no parentheses are added around 'y' in the definition of m1(), the compiler complains 
with "too many arguments to function g".
If parentheses are added around 'y', the compiler does not complain but the 
behavior will still differ and the result will be
1
0

This happens because in the case of m1(), p is interpreted as an expression 
(due to the parentheses added there) and the comma is evaluated as a comma 
operator, while in f1(), p is interpreted as a list of expressions and the 
comma is evaluated as a comma separator.

Hence, in my opinion, parentheses should not be added around macro parameters 
used as function arguments because they can hide a bug due to missing 
parentheses around the entire macro definition.
Since macro 'p' is supposed to represent an expression, and the semantics of 
the comma token are those of a comma operator and not a comma separator, then 
parentheses need to be placed around the entire macro definition i.e
#define p (0, 1)


Your analysis is correct: the example was meant only to show that
the use of a macro or a function with the same actual parameters
and apparently equivalent bodies can make a difference and that the
addition of parentheses (around the body of p, as you suggest, or around
the occurrence of p in the call to f1()) can avoid this problem.
All this, however, is outside the scope of Rule 20.7, so the example
may have been confusing: sorry about that.


AFAIK, there is no requirement in MISRA C guidelines to add parentheses around 
the entire macro definition when it is used as an expression and this is 
something I cannot understand.
Unless I got it all wrong I guess ...


Yes, this is known and it is has also been brought to the attention of
the MISRA C working group.


Can a deviation being added in the basis of C99 standard since according to the 
standard, E1[E2] is identical to (*((E1)+(E2))), and therefore, macro 
parameters used as subscript expressions are implicitly
parenthesized and can be exempted from the rule.


Sure, you can always deviate any non-mandatory guideline: just be ware
of the fact that complying is often cheaper than deviating.


For instance, if you noticed Rule 20.7 reports given by
ECLAIR and not by cppcheck, then maybe line

Rule 20.7  FP

should be

Rule 20.7  FN+FP

If you can let me have an indication of the code that
ECLAIR is flagging for Rule 20.7 and cppcheck does not
flag, I will be happy to double-check.


ECLAIR flags as violations of Rule 20.7 the cases where unparenthesized macro 
parameters are used as (1) function arguments or (2) array indexes, while 
cppcheck does not.

For instance:
(1) in xen/arch/arm/include/asm/atomic.h
#define read_atomic(p) ({   \
     union { typeof(*(p)) val; char c[0]; } x_;  \
     read_atomic_size(p, x_.c, sizeof(*(p)));    \
     x_.val; \
})
ECLAIR flags as violations missing parentheses around 'p', when used as an 
argument of read_atomic_size().


ECLAIR is right in reporting these violations of Rule 20.7;
these are false negatives of cppcheck.


(2) in xen/arch/arm/arm64/cpufeature.c
#define SANITIZE_REG(field, num, reg)  \
 sanitize_reg(&system_cpuinfo.field.bits[num], new->field.bits[num], \
  #reg, ftr_##reg)
ECLAIR flags as violations missing parentheses around 'num'.


Same as above.

I am probably repeating myself, but the MISRA guidelines are the result
of carefully-chosen compromises between the simplicity of the guideline
and its ability to protect against the targeted bad thing.  As Rule 20.7
is required, any violati

Re: [PATCH] misra: increase identifiers length to 64

2024-11-17 Thread Roberto Bagnara

On 2024-11-16 01:23, Stefano Stabellini wrote:

Currently the identifiers characters limit is arbitrarily set to 40. It
causes a few violations as we have some identifiers longer than 40.

Increase the limit to another rather arbitrary limit of 64. Thanks to
this change, we remove a few violations, getting us one step closer to
marking Rules 5.2 and 5.4 as clean.

Also update the ECLAIR config that was actually set to 63 as character
limit.

Signed-off-by: Stefano Stabellini 

diff --git a/automation/eclair_analysis/ECLAIR/toolchain.ecl 
b/automation/eclair_analysis/ECLAIR/toolchain.ecl
index 86e9a79b52..8fb1778bce 100644
--- a/automation/eclair_analysis/ECLAIR/toolchain.ecl
+++ b/automation/eclair_analysis/ECLAIR/toolchain.ecl
@@ -155,8 +155,8 @@
  -doc_end
  
  -doc_begin="See Section \"4.3 Identifiers\" of "GCC_MANUAL"."

--config=STD.extidsig, behavior+={c99, GCC_ARM64, "63"}
--config=STD.extidsig, behavior+={c99, GCC_X86_64, "63"}
+-config=STD.extidsig, behavior+={c99, GCC_ARM64, "64"}
+-config=STD.extidsig, behavior+={c99, GCC_X86_64, "64"}
  -doc_end
  
  #

diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
index 4a144da8d6..3ed5801bff 100644
--- a/docs/misra/rules.rst
+++ b/docs/misra/rules.rst
@@ -154,7 +154,7 @@ maintainers if you want to suggest a change.
 * - `Rule 5.1 
`_
   - Required
   - External identifiers shall be distinct
- - The Xen characters limit for identifiers is 40. Public headers
+ - The Xen characters limit for identifiers is 64. Public headers
 (xen/include/public/) are allowed to retain longer identifiers
 for backward compatibility.
  
@@ -162,7 +162,7 @@ maintainers if you want to suggest a change.

   - Required
   - Identifiers declared in the same scope and name space shall be
 distinct
- - The Xen characters limit for identifiers is 40. Public headers
+ - The Xen characters limit for identifiers is 64. Public headers
 (xen/include/public/) are allowed to retain longer identifiers
 for backward compatibility.
  
@@ -177,7 +177,7 @@ maintainers if you want to suggest a change.

 * - `Rule 5.4 
`_
   - Required
   - Macro identifiers shall be distinct
- - The Xen characters limit for macro identifiers is 40. Public
+ - The Xen characters limit for macro identifiers is 64. Public
 headers (xen/include/public/) are allowed to retain longer
 identifiers for backward compatibility.


While for external identifiers 64 can be considered as random as 63,
for internal identifiers and macro names 63, which is what the C99
standard guarantees, is better than 64 (which is one more than the
standard guarantees).

Kind regards,

   Roberto