On Wed, May 28, 2014 at 12:32:35AM +0200, Alain Miniussi wrote:
> Unfortunately, the debug library works like a charm (which makes the
> uninitialized variable issue more likely).
> 
> Still, the stack trace points to mca_btl_openib_add_procs in
> ompi/mca/btl/openib/btl_openib.c and there is only one division in that
> function (although not floating point) at the end:
> 
>     openib_btl->local_procs += local_procs;
>     openib_btl->device->mem_reg_max = calculate_max_reg () /
>         openib_btl->local_procs;
> 
> now, I'm not sure how much I would trust the local_procs initialization:
> 
> for (i = 0, local_procs = 0 ; i < (int) nprocs; i++) {
> 
> I suspect that a compiler could (wrongly) decide to skip the init of
> local_procs if nprocs = 0 or in a few other corner cases.

If that is the case, the compiler has a bug. From C99 §6.8.5.3:

"
6.8.5.3 The for statement

1 The statement

    for ( clause-1 ; expression-2 ; expression-3 ) statement

behaves as follows: The expression expression-2 is the controlling
expression that is evaluated before each execution of the loop body.
The expression expression-3 is evaluated as a void expression after
each execution of the loop body. If clause-1 is a declaration, the
scope of any variables it declares is the remainder of the declaration
and the entire loop, including the other two expressions; it is reached
in the order of execution before the first evaluation of the
controlling expression. If clause-1 is an expression, it is evaluated
as a void expression before the first evaluation of the controlling
expression.

2 Both clause-1 and expression-3 can be omitted. An omitted
expression-2 is replaced by a nonzero constant.


If clause-1 is an expression, it is evaluated as a void expression
before the first evaluation of the controlling expression.
"

That final bit says that clause-1 will always get evaluated.
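
To make that concrete, here is a minimal sketch (count_local and its
is_local argument are hypothetical stand-ins, not code from
btl_openib.c). With nprocs == 0 the body never runs, yet clause-1 has
still reset local_procs to 0:

```c
#include <stddef.h>

/* Hypothetical stand-in for the counting loop in
 * mca_btl_openib_add_procs: counts the nonzero flags in is_local. */
int count_local(const int *is_local, size_t nprocs)
{
    int i, local_procs = -1;   /* deliberately wrong starting value */

    /* clause-1 (i = 0, local_procs = 0) is evaluated once, before the
     * first test of i < nprocs, even when the body executes zero times. */
    for (i = 0, local_procs = 0; i < (int) nprocs; i++) {
        if (is_local[i]) {
            local_procs++;
        }
    }
    return local_procs;
}
```

A conforming compiler has to return 0 from count_local(NULL, 0), never
the -1 left over from the declaration.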

> Anyway, applying the attached patch to btl_openib.c seems to fix the issue
> on my small case (but I have no exhaustive test suite to run).

Not sure what is going on here. Is local_procs = 0 when
openib_btl->device->mem_reg_max is calculated? I don't see how that can
happen, but if it is possible, the correct fix is to check whether
openib_btl->local_procs is zero before dividing.
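
Something along these lines is what I mean by checking the divisor. This
is only a sketch: per_proc_reg_limit is a hypothetical helper, not an
existing Open MPI function, and returning the undivided limit when there
are no local procs is an assumption about the desired behavior. Note
that on x86 an integer division by zero is typically delivered as
SIGFPE, so it can be reported as a "floating point exception" even
though no floating point is involved.

```c
#include <stdint.h>

/* Hypothetical helper (not part of Open MPI): split a registration
 * limit evenly among the local processes, guarding the divisor. */
uint64_t per_proc_reg_limit(uint64_t max_reg, int local_procs)
{
    if (local_procs <= 0) {
        /* Assumption: with no local procs counted, keep the whole limit
         * rather than dividing by zero. */
        return max_reg;
    }
    return max_reg / (uint64_t) local_procs;
}
```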

-Nathan
