Yvan,

Shame on me for bringing this bug back into 1.1. Until we release 1.1.1, please use the nightly build or the beta from our web site. The bug is fixed as of revision 10710. Forever, I hope :)

  Thanks,
    george.

On Jul 10, 2006, at 5:27 PM, Yvan Fournier wrote:

Hello,

I just retried replicating the datatype bug on a SUSE Linux 10.1 system
(on a 32-bit Pentium-M system). Actually, I even get a segmentation
fault at some point. I attach the logfile for the test case
compiled in debug mode, run once directly, then again with valgrind,
as well as my ompi_info output.

I have also encountered the bug on the "parent" case (similar, but
more complex) on my work machine (dual Xeon under Debian Sarge),
but I'll check this simpler test on it just in case.

Best regards,

        Yvan Fournier



On Sun, 2006-07-09 at 12:00 -0400, users-requ...@open-mpi.org wrote:

Today's Topics:

   1. Re: Datatype bug regression from Open MPI 1.0.2 to Open MPI
      1.1 (George Bosilca)


----------------------------------------------------------------------

Message: 1
Date: Sat, 8 Jul 2006 13:47:05 -0400 (Eastern Daylight Time)
From: George Bosilca <bosi...@cs.utk.edu>
Subject: Re: [OMPI users] Datatype bug regression from Open MPI 1.0.2
        to Open MPI 1.1
To: Open MPI Users <us...@open-mpi.org>
Message-ID: <Pine.WNT.4.64.0607081344080.2944@bosilca>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

Yvan,

I'm unable to replicate this one with the latest Open MPI trunk version. As there is no difference in the datatype code between the trunk and the latest 1.1 version, I think the bug cannot be reproduced using 1.1 either. I compiled the test twice, once using the indexed datatype and once without, and the output is exactly the same. I ran it on my Apple G5 desktop as well as on a cluster of AMD 64, over shared memory and TCP. Can you please recheck that your error is coming from the indexed datatype?

   Thanks,
     george.


On Sat, 1 Jul 2006, Yvan Fournier wrote:

Hello,

I had encountered a bug in Open MPI 1.0.1 using indexed datatypes
with MPI_Recv (which seems to be of the "off by one" sort), which
was corrected in Open MPI 1.0.2.

It seems to have resurfaced in Open MPI 1.1 (I encountered it using
different data and did not recognize it immediately, but it seems
it can be reproduced using the same simplified test I had sent
the first time, which I re-attach here just in case).

Here is a summary of the case:

------------------

Each processor reads a file ("data_p0" or "data_p1") giving a list of
global element ids. Some elements (vertices from a partitioned mesh)
may belong to both processors, so their ids may appear on both
processors: we have 7178 global vertices, 3654 and 3688 of them being
known by ranks 0 and 1 respectively.

In this simplified version, we assign coordinates {x, y, z} to each
vertex equal to its global id number for rank 1, and the negative of
that for rank 0 (assigning the same values to x, y, and z). After
finishing the "ordered gather", rank 0 prints the global id and
coordinates of each vertex.

Lines should print (for example) as:
 6456 ;   6456.00000   6456.00000   6456.00000
 6457 ;  -6457.00000  -6457.00000  -6457.00000
depending on whether a vertex belongs only to rank 0 (negative
coordinates) or belongs to rank 1 (positive coordinates).
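
The expected result described above can be sketched without MPI. This is only a minimal simulation, not the attached gather_test.c: the function names are hypothetical, and the plain loops merely stand in for what the "ordered gather" is supposed to produce (rank 0 values written first, rank 1 values overwriting shared vertices).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: for each listed 1-based global id, write
 * sign * id into the three coordinate slots of that vertex.
 * coords holds 3 doubles per global vertex. */
void scatter_coords(double *coords, size_t n_global,
                    const int *ids, size_t n_ids, double sign)
{
    for (size_t i = 0; i < n_ids; i++) {
        assert(ids[i] >= 1 && (size_t)ids[i] <= n_global);
        size_t base = 3 * (size_t)(ids[i] - 1);
        for (size_t k = 0; k < 3; k++)
            coords[base + k] = sign * (double)ids[i];
    }
}

/* Simulated result of the gather on rank 0: vertices known only to
 * rank 0 end up negative, vertices known to rank 1 end up positive. */
void expected_gather(double *coords, size_t n_global,
                     const int *ids_p0, size_t n_p0,
                     const int *ids_p1, size_t n_p1)
{
    scatter_coords(coords, n_global, ids_p0, n_p0, -1.0);
    scatter_coords(coords, n_global, ids_p1, n_p1, +1.0);
}
```

With rank 0 knowing vertices {1, 2, 3} and rank 1 knowing {3, 4}, vertices 1 and 2 come out negative and vertices 3 and 4 positive, each with x = y = z.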

With the OMPI 1.0.1 bug (observed on SUSE Linux 10.0 with gcc 4.0 and on
Debian Sarge with gcc 3.4), we have for example for the last vertices:
 7176 ;   7175.00000   7175.00000   7176.00000
 7177 ;   7176.00000   7176.00000   7177.00000
seeming to indicate an "off by one" type bug in datatype handling.
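
A check for this symptom could look like the sketch below. It assumes the gathered coordinates sit in one array with 3 doubles per global vertex; the function name is hypothetical and not part of the test case. It reports the first vertex whose x, y, z do not all have magnitude equal to its global id.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checker: returns the 1-based global id of the first
 * vertex whose coordinates are not all +id or -id in magnitude,
 * or 0 if every vertex is consistent. */
size_t first_bad_vertex(const double *coords, size_t n_global)
{
    assert(coords != NULL);
    for (size_t i = 0; i < n_global; i++) {
        double id = (double)(i + 1);
        for (size_t k = 0; k < 3; k++) {
            double v = coords[3 * i + k];
            double mag = (v < 0.0) ? -v : v;
            if (mag != id)
                return i + 1;
        }
    }
    return 0;
}
```

A row such as "7176 ; 7175 7175 7176" would be flagged, since two of the three components belong to the previous vertex.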

Not using an indexed datatype (i.e. not defining USE_INDEXED_DATATYPE
in the gather_test.c file), the bug disappears.
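
How gather_test.c builds its indexed datatype is not shown in this thread. As an illustration only, one plausible layout is a block of 3 doubles per vertex, displaced by 3 * (id - 1) in a global coordinate array. The helper below is hypothetical; it fills the two arrays in the form MPI_Type_indexed expects (block lengths and displacements, both counted in units of the old type, assumed here to be MPI_DOUBLE).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: for each 1-based global id, describe one block
 * of 3 elements starting at element offset 3 * (id - 1).
 * The output arrays must each have room for n_ids entries. */
void build_indexed_layout(const int *ids, size_t n_ids,
                          int *blocklengths, int *displacements)
{
    for (size_t i = 0; i < n_ids; i++) {
        assert(ids[i] >= 1);
        blocklengths[i] = 3;                  /* x, y, z */
        displacements[i] = 3 * (ids[i] - 1);  /* offset in doubles */
    }
}
```

An off-by-one in how such displacements are interpreted would shift values between neighbouring vertices, which is consistent with the output shown above, though the thread does not confirm the actual cause.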

------------------

Best regards,

       Yvan Fournier



"We must accept finite disappointment, but we must never lose infinite
hope."
                                   Martin Luther King



------------------------------

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

End of users Digest, Vol 328, Issue 1
*************************************

<ompi_info>
<logfile.gz>
<logfile_valgrind.gz>