Thanks for digging into this!
The assembly portion of OMPI is quite squirrelly and dangerous to mess
with. We'll need to check into this carefully to make sure that it
works properly on all supported architectures...
As for other bounds checking, would you mind checking the OMPI
development SVN trunk instead of the v1.2 series? We're working on
releasing the new version (v1.3 series) and there have been many, many
changes since the v1.2 series. There's a little instability on the
trunk right now with some recent PML changes that went in, but
hopefully we'll have those solved soon.
On Jun 13, 2008, at 5:13 AM, Gabriele Fatigati wrote:
I'm sorry.
The code block I reported previously was the 64-bit one, but the error
refers to the 32-bit function. So the right code block is:
static inline int opal_atomic_cmpset_32( volatile int32_t *addr,
                                         int32_t oldval, int32_t newval)
{
    unsigned char ret;
    __asm__ __volatile (
                        SMPLOCK "cmpxchgl %1,%2   \n\t"
                                "sete     %0      \n\t"
                        : "=qm" (ret)
                        : "q"(newval), "m"(*(volatile long*)addr),
                          "a"(oldval)   //<<<<< HERE
                        : "memory");
    return (int)ret;
}
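
For comparison, the same compare-and-set semantics can be sketched without inline assembly. This is only an illustration, not OMPI code: the name cmpset_32_sketch is hypothetical, and it assumes a GCC-compatible compiler that provides the __sync_bool_compare_and_swap builtin. The builtin takes its operand width from the pointer type, so the 32-bit variant needs no cast to long.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit compare-and-set (not the OMPI code).
 * Atomically: if (*addr == oldval) { *addr = newval; return 1; }
 * otherwise leave *addr unchanged and return 0. */
static inline int cmpset_32_sketch(volatile int32_t *addr,
                                   int32_t oldval, int32_t newval)
{
    /* GCC builtin; the operand width comes from the int32_t pointer type. */
    return __sync_bool_compare_and_swap(addr, oldval, newval) ? 1 : 0;
}
```

Called on a variable holding 0, cmpset_32_sketch(&v, 0, 1) returns 1 and stores 1; a second call with oldval 0 then returns 0 and leaves the value alone.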
2008/6/13 Gabriele Fatigati <g.fatig...@cineca.it>:
I may have solved this bug by deleting the long cast.
Now it works at compile time, but at runtime there are other
problems, like this:
../../../opal/class/opal_object.h:428:Bounds error: pointer arithmetic would overrun the end of the object.
../../../opal/class/opal_object.h:428:  Pointer value: 0x8, Size: 8
../../../opal/class/opal_object.h:428:  Object `orte_system_info':
../../../opal/class/opal_object.h:428:    Address in memory:    0x0 .. 0xf
../../../opal/class/opal_object.h:428:    Size:                 64 bytes
../../../opal/class/opal_object.h:428:    Element size:         1 bytes
../../../opal/class/opal_object.h:428:    Number of elements:   64
../../../opal/class/opal_object.h:428:    Created at:           util/sys_info.c, line 43
../../../opal/class/opal_object.h:428:    Storage class:        static
There are a great many errors of this type, differing only in the line
number reported in opal/class/opal_object.h. All of them refer to the
object created at the same line of code:
util/sys_info.c, line 43
The final status of the MPI job is always "Undefined".
Is this another bug?
2008/6/12 Gabriele Fatigati <g.fatig...@cineca.it>:
I found that the error starts at this line of code:
static opal_atomic_lock_t class_lock = { { OPAL_ATOMIC_UNLOCKED } };
in class/opal_object.c, line 52,
and generates the bounds error in this code block:
static inline int opal_atomic_cmpset_64( volatile int64_t *addr,
                                         int64_t oldval, int64_t newval)
{
    unsigned char ret;
    __asm__ __volatile (
                        SMPLOCK "cmpxchgq %1,%2   \n\t"
                                "sete     %0      \n\t"
                        : "=qm" (ret)
                        : "q"(newval), "m"(*((volatile long*)addr)),
                          "a"(oldval)   //<<<<< HERE
                        : "memory");
    return (int)ret;
}
in /opal/include/opal/sys/amd64/atomic.h, at line 89.
The environment variable mentioned in my previous message is GCC_BOUNDS_OPTS.
Thanks in advance.
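
The bounds report quoted below describes the object created at class/opal_object.c, line 52 as 4 bytes, which would fit a lock that wraps a single 32-bit word. A minimal sketch of such a layout follows. It is an assumption made for illustration: sketch_lock_t, SKETCH_UNLOCKED and its value 0 are hypothetical and are not the real opal_atomic_lock_t definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value, for illustration only. */
#define SKETCH_UNLOCKED 0

/* Hypothetical lock layout: one 32-bit word inside a nested union, which
 * matches the doubly braced initializer style { { ... } }. */
typedef struct {
    union {
        volatile int32_t lock;
    } u;
} sketch_lock_t;

static sketch_lock_t sketch_class_lock = { { SKETCH_UNLOCKED } };

/* Current value of the sketch lock word. */
static inline int32_t sketch_lock_value(void)
{
    return sketch_class_lock.u.lock;
}

/* Size in bytes of the whole lock object. */
static inline size_t sketch_lock_size(void)
{
    return sizeof(sketch_lock_t);
}
```

Under this assumption the lock is 4 bytes, whereas sizeof(long) is 8 on LP64 platforms such as Linux on amd64, so an lvalue of type long formed over it would be wider than the object.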
2008/6/12 Gabriele Fatigati <g.fatig...@cineca.it>:
Hi,
I have installed OpenMPI 1.2.6, built using gcc with bounds checking. But
when I compile an MPI program, I get the same error many times:
../opal/include/opal/sys/amd64/atomic.h:89:    Address in memory:    0x8 .. 0xb
../opal/include/opal/sys/amd64/atomic.h:89:    Size:                 4 bytes
../opal/include/opal/sys/amd64/atomic.h:89:    Element size:         1 bytes
../opal/include/opal/sys/amd64/atomic.h:89:    Number of elements:   4
../opal/include/opal/sys/amd64/atomic.h:89:    Created at:           class/opal_object.c, line 52
../opal/include/opal/sys/amd64/atomic.h:89:    Storage class:        static
../opal/include/opal/sys/amd64/atomic.h:89:Bounds error: attempt to reference memory overrunning the end of an object.
../opal/include/opal/sys/amd64/atomic.h:89:  Pointer value: 0x8, Size: 8
With the environment variable set to "-never-fatal", the compile
phase ends successfully. But at runtime I still get the error above
many times, and the program fails with "undefined status".
Is this an OpenMPI bug?
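
The figures in the report can be checked by hand: the object occupies 0x8 .. 0xb (4 bytes), and the flagged reference starts at 0x8 with size 8, so it would end at 0xf. A small sketch of that arithmetic follows; the helper name bytes_past_end is made up for illustration and is not part of the bounds-checking gcc.

```c
#include <stdint.h>

/* Returns how many bytes of the access [ptr, ptr + access_size) fall beyond
 * an object whose first and last byte addresses are given (inclusive).
 * Returns 0 when the access ends within the object. */
static inline uint64_t bytes_past_end(uint64_t first, uint64_t last,
                                      uint64_t ptr, uint64_t access_size)
{
    uint64_t access_end = ptr + access_size;  /* one past the last byte accessed */
    uint64_t object_end = last + 1;           /* one past the last byte of the object */

    (void)first;  /* only the end of the object matters for an overrun */
    return access_end > object_end ? access_end - object_end : 0;
}
```

With the values from the report, bytes_past_end(0x8, 0xb, 0x8, 8) evaluates to 4, while a 4-byte access at the same address gives 0.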
--
Gabriele Fatigati
CINECA Systems & Tecnologies Department
Supercomputing Group
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it Tel: +39 051 6171722
g.fatig...@cineca.it
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
Cisco Systems