Aha!! I found this in our users mailing list archives:
http://www.open-mpi.org/community/lists/users/2012/01/18091.php
Looks like this is a known compiler vectorization issue.
On Jun 4, 2014, at 1:52 PM, Fischer, Greg A. wrote:
> Ralph,
>
> Thanks for looking. Let me know if there's any othe
Ralph,
Thanks for looking. Let me know if there's any other testing that I can do.
I recompiled with GCC and it works fine, so that lends credence to your theory
that it has something to do with the Intel compilers, and possibly their
interplay with SUSE.
Greg
-----Original Message-----
From:
Urggg...unfortunately, the people who know the most about that code are all
at the MPI Forum this week, so we may not be able to fully address it until
their return. It looks like you are still going down into that malloc
interceptor, so I'm not correctly blocking it for you.
This run segfa
Ralph,
It segfaults. Here's the backtrace:
Core was generated by `ring_c'.
Program terminated with signal 11, Segmentation fault.
#0 opal_memory_ptmalloc2_int_malloc (av=0x2b82b5300020, bytes=47840385564856)
at ../../../../../openmpi-1.8.1/opal/mca/memory/linux/malloc.c:4098
4098 bck->
Sorry for delay - digging my way out of the backlog. This is very strange as
you are failing in a simple asprintf call. We check that all the players are
non-NULL, and it appears that you are failing to allocate the memory for the
resulting (rather short) string.
I'm wondering if this is some s
He isn't getting that far - he's failing in MPI_Init when the RTE attempts to
connect to the local daemon
On Jun 4, 2014, at 9:53 AM, Gus Correa wrote:
> Hi Greg
>
> From your original email:
>
> >> [binf102:fischega] $ mpirun -np 2 --mca btl openib,self ring_c
>
> This may not fix the prob
Hi Greg
From your original email:
>> [binf102:fischega] $ mpirun -np 2 --mca btl openib,self ring_c
This may not fix the problem,
but have you tried to add the shared memory btl to your mca parameter?
mpirun -np 2 --mca btl openib,sm,self ring_c
As far as I know, sm is the preferred transport
Hi,
I'd like to revive this thread, since I am still periodically getting
errors of this type. I have built 1.8.1 with --enable-debug and run with
-mca btl_openib_verbose 10. Unfortunately, this doesn't seem to provide any
additional information that I can find useful. I've gone ahead and attached
Thanks!! Really appreciate your help - I'll try to figure out what went wrong
and get back to you
On Jun 4, 2014, at 8:07 AM, Fischer, Greg A. wrote:
> I re-ran with 1 processor and got more information. How about this?
>
> Core was generated by `ring_c'.
> Program terminated with signal 11,
I re-ran with 1 processor and got more information. How about this?
Core was generated by `ring_c'.
Program terminated with signal 11, Segmentation fault.
#0 opal_memory_ptmalloc2_int_malloc (av=0x2b48f6300020, bytes=47592367980728)
at ../../../../../openmpi-1.8.1/opal/mca/memory/linux/malloc.c:
Does the trace go any further back? Your prior trace seemed to indicate an
error in our OOB framework, but in a very basic place. Looks like it could be
an uninitialized variable, and having the line number down as deep as possible
might help identify the source
On Jun 4, 2014, at 7:55 AM, Fis
Oops, ulimit was set improperly. I generated a core file, loaded it in GDB, and
ran a backtrace:
Core was generated by `ring_c'.
Program terminated with signal 11, Segmentation fault.
#0 opal_memory_ptmalloc2_int_malloc (av=0x2b8e4fd00020, bytes=47890224382136)
at ../../../../../openmpi-1.8.1/o
I recompiled with "--enable-debug" but it doesn't seem to be providing any more
information or a core dump. I'm compiling ring_c.c with:
mpicc ring_c.c -g -traceback -o ring_c
and running with:
mpirun -np 4 --mca btl openib,self ring_c
and I'm getting:
[binf112:05845] *** Process received signal