Thanks for digging into this!

The assembly portion of OMPI is quite squirrelly and dangerous to mess with. We'll need to check into this carefully to make sure that it works properly on all supported architectures...

As for other bounds checking, would you mind checking the OMPI development SVN trunk instead of the v1.2 series? We're working on releasing the new version (v1.3 series) and there have been many, many changes since the v1.2 series. There's a little instability on the trunk right now with some recent PML changes that went in, but hopefully we'll have those solved soon.


On Jun 13, 2008, at 5:13 AM, Gabriele Fatigati wrote:

I'm sorry.
The error refers to the 32-bit function, not the 64-bit one in the code block I posted previously. So the correct code block is:

static inline int opal_atomic_cmpset_32( volatile int32_t *addr,
                                         int32_t oldval, int32_t newval)
{
   unsigned char ret;
   __asm__ __volatile (
                       SMPLOCK "cmpxchgl %1,%2   \n\t"
                               "sete     %0      \n\t"
                       : "=qm" (ret)
                       : "q"(newval), "m"(*(volatile long*)addr), "a"(oldval) //<<<<< HERE
                       : "memory");

   return (int)ret;
}
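
A minimal sketch, not Open MPI's code, of a 32-bit compare-and-set whose memory operand keeps the pointed-to type (`*addr`, 4 bytes) instead of going through a `long` cast. The name `sketch_cmpset_32` is hypothetical and the constraint layout is an assumption. It has not been checked against bounds-checking gcc or against the other architectures Open MPI supports. On non-x86 targets it falls back to the GCC builtin `__sync_bool_compare_and_swap`.

```c
#include <stdint.h>

/* Hypothetical helper, not part of Open MPI.
 * Returns 1 if *addr equaled oldval and was replaced by newval, 0 otherwise. */
static inline int sketch_cmpset_32(volatile int32_t *addr,
                                   int32_t oldval, int32_t newval)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned char ret;
    __asm__ __volatile__ (
        "lock; cmpxchgl %3, %1 \n\t"  /* compare EAX with *addr, store newval on match */
        "sete %0               \n\t"  /* ret = 1 if the exchange happened */
        : "=q" (ret),
          "+m" (*addr),               /* 4-byte memory operand, no long cast */
          "+a" (oldval)               /* EAX is read and may be overwritten */
        : "r" (newval)
        : "memory", "cc");
    return (int) ret;
#else
    /* Assumed fallback: GCC atomic builtin with the same contract. */
    return __sync_bool_compare_and_swap(addr, oldval, newval) ? 1 : 0;
#endif
}
```

With `volatile int32_t v = 5;`, calling `sketch_cmpset_32(&v, 5, 7)` should succeed and leave `v` at 7, while a second call with `oldval` 5 should fail and leave `v` unchanged.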

2008/6/13 Gabriele Fatigati <g.fatig...@cineca.it>:
I may have solved this bug by deleting the long cast.
Now it works at compile time, but at runtime there are other problems, like this:

../../../opal/class/opal_object.h:428:Bounds error: pointer arithmetic would overrun the end of the object.
../../../opal/class/opal_object.h:428:  Pointer value: 0x8, Size: 8
../../../opal/class/opal_object.h:428:  Object `orte_system_info':
../../../opal/class/opal_object.h:428:    Address in memory:    0x0 .. 0xf
../../../opal/class/opal_object.h:428:    Size:                 64 bytes
../../../opal/class/opal_object.h:428:    Element size:         1 bytes
../../../opal/class/opal_object.h:428:    Number of elements:   64
../../../opal/class/opal_object.h:428:    Created at:           util/sys_info.c, line 43
../../../opal/class/opal_object.h:428:    Storage class:        static

There are a great many errors of this type, differing only in the line number reported in /opal/class/opal_object.h. All of them are generated by the same line of code:

util/sys_info.c, line 43

The final status of the MPI job is always "Undefined".

Another bug?


2008/6/12 Gabriele Fatigati <g.fatig...@cineca.it>:
I found that the error starts at this line of code:

static opal_atomic_lock_t class_lock = { { OPAL_ATOMIC_UNLOCKED } };

in class/opal_object.c, line 52

and generates the bounds error in this code block:

static inline int opal_atomic_cmpset_64( volatile int64_t *addr,
                                         int64_t oldval, int64_t newval)
{
   unsigned char ret;
   __asm__ __volatile (
                       SMPLOCK "cmpxchgq %1,%2   \n\t"
                               "sete     %0      \n\t"
                       : "=qm" (ret)
                       : "q"(newval), "m"(*((volatile long*)addr)), "a"(oldval) //<<<<< HERE
                       : "memory");

   return (int)ret;
}

in /opal/include/opal/sys/amd64/atomic.h, at line 89

The environment variable I mentioned previously is GCC_BOUNDS_OPTS.

Thanks in advance.


2008/6/12 Gabriele Fatigati <g.fatig...@cineca.it>:
Hi,

I have installed OpenMPI 1.2.6, using gcc with bounds checking. But when I compile an MPI program, I get the same error many times:

../opal/include/opal/sys/amd64/atomic.h:89:    Address in memory:    0x8 .. 0xb
../opal/include/opal/sys/amd64/atomic.h:89:    Size:                 4 bytes
../opal/include/opal/sys/amd64/atomic.h:89:    Element size:         1 bytes
../opal/include/opal/sys/amd64/atomic.h:89:    Number of elements:   4
../opal/include/opal/sys/amd64/atomic.h:89:    Created at:           class/opal_object.c, line 52
../opal/include/opal/sys/amd64/atomic.h:89:    Storage class:        static
../opal/include/opal/sys/amd64/atomic.h:89:Bounds error: attempt to reference memory overrunning the end of an object.
../opal/include/opal/sys/amd64/atomic.h:89:  Pointer value: 0x8, Size: 8
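
The arithmetic behind this report can be sketched as follows. The object occupies addresses 0x8 through 0xb (4 bytes), while the access starts at 0x8 with a size of 8, so it would reach 0xf. The helper name `sketch_access_overruns` is hypothetical and is not part of gcc's bounds checker. It only illustrates the comparison, on the assumption that the reported addresses are inclusive.

```c
#include <stdint.h>

/* Hypothetical helper for illustration only.
 * Returns 1 when an access of `size` bytes starting at `ptr` extends past
 * `last`, the last valid byte address of the object (inclusive), else 0.
 * Assumes size >= 1. */
static int sketch_access_overruns(uintptr_t ptr, uintptr_t size, uintptr_t last)
{
    uintptr_t access_last = ptr + size - 1;  /* last byte touched by the access */
    return access_last > last;
}
```

With the values from the log, `sketch_access_overruns(0x8, 8, 0xb)` is 1 because 0xf lies past 0xb, whereas a 4-byte access at the same pointer stays inside the object. An 8-byte access would be consistent with the `volatile long*` cast at the reported line, since `long` is 8 bytes on amd64 Linux.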

With the environment variable set to "-never-fatal", the compile phase ends successfully. But at runtime I always get the error above, many times over, and the program fails with "undefined status".

Is this an OpenMPI bug?





--
Gabriele Fatigati

CINECA Systems & Tecnologies Department

Supercomputing Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it Tel: +39 051 6171722

g.fatig...@cineca.it



_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems
