Diego,

Your problem might be numerically unstable; that is why results can differ from one run to another. Floating-point numbers have their own limitations (rounding errors, absorption, ...).

Are you running single or double precision?
If you are running single precision, you might give double precision a try.
(If your code is written in Fortran, you can use the -r8 flag to treat REAL (single precision) as double precision.)
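
For example (assuming Intel Fortran; mysolver.f90 is just a placeholder file name, and gfortran's equivalent flag is -fdefault-real-8):

ifort -r8 mysolver.f90 -o mysolver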

Let me give you a (theoretical) example:

1 / (1.e+100 + 1 - 1.e+100) = ?

If you do this by hand, the answer is 1.
Now if you ask a computer using floating-point numbers to do the same, it might compute

1.e+100 + 1 ~= 1.e+100
1.e+100 - 1.e+100 = 0
1 / 0 = division by zero
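
Here is a minimal, self-contained sketch (mine, not from your code) that reproduces the absorption step above in double precision:

program absorption
  implicit none
  real(8) :: big
  big = 1.d+100
  ! adding 1 is absorbed: the result rounds back to big itself
  print *, (big + 1.d0) - big   ! prints 0, not 1
end program absorption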

Another classic example is the harmonic sum:

sum = 0.0
do i = 1, n
   sum = sum + 1.0 / i
end do

That might look trivial, but it is surprisingly hard to get accurate results on a computer: in a naive left-to-right loop the partial sum grows while the terms shrink, so the small terms get partially or completely absorbed and the result is inaccurate.
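
A minimal standalone sketch (single precision; the value of n is arbitrary) comparing the naive loop with compensated (Kahan) summation:

program harmonic
  implicit none
  integer :: i, n
  real :: s_naive, s_kahan, c, y, t
  n = 10000000
  s_naive = 0.0
  s_kahan = 0.0
  c = 0.0                      ! running compensation for lost low-order bits
  do i = 1, n
     s_naive = s_naive + 1.0 / real(i)
     y = 1.0 / real(i) - c     ! apply the correction from the previous step
     t = s_kahan + y
     c = (t - s_kahan) - y     ! recover what was just lost in the addition
     s_kahan = t
  end do
  print *, 'naive:', s_naive, '  kahan:', s_kahan
end program harmonic

(Note that aggressive compiler optimizations can defeat the compensation, so such a comparison should be built without fast-math style flags.)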

Bottom line: you notice differences, and that is normal.
The question is how you compare your results and how much they differ. If you do a bitwise comparison of the results, it is very likely they will differ. If you compare a and b and the relative difference abs(a-b) / abs(a) is very small (how small depends on whether you use single or double precision; machine epsilon is roughly 1.e-7 vs. 1.e-16), then this is likely the normal behaviour.
Now if this number is large, that could be a bug in your code (never say never ...) or your algorithm might be numerically unstable (at least for your test case).
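
A sketch of such a comparison (a, b, and tol are placeholders; something like tol = 1.e-6 for single and 1.e-12 for double precision is a common starting point):

if (abs(a - b) <= tol * max(abs(a), abs(b))) then
   print *, 'results agree within floating-point noise'
else
   print *, 'results differ significantly, investigate'
end if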

Cheers,

Gilles

On 10/29/2015 7:58 AM, Diego Avesani wrote:
Dear Damien,
I wrote the solver myself. I have not understood your answer.

Diego


On 28 October 2015 at 23:09, Damien <dam...@khubla.com> wrote:

    Diego,

    There aren't many linear solvers that are bit-consistent, where
    the answer is the same no matter how many cores or processes you
    use.  Intel's version of Pardiso is bit-consistent and I think
    MUMPS 5.0 might be, but that's all.  You should assume your answer
    will not be exactly the same as you change the number of cores or
    processes, although you should reach the same overall error
    tolerance in approximately the same number of iterations.

    Damien


    On 2015-10-28 3:51 PM, Diego Avesani wrote:
    Dear Andreas, dear all,
    The code is quite long. It is a conjugate gradient algorithm to
    solve a complex system.

    I have noticed that when a do loop is short, let's say

    do i=1,3
    enddo

    the results are identical. If the loop is longer, let's say
    do i=1,20, the results are different and the difference increases
    with the number of iterations.

    What do you think?



    Diego


    On 28 October 2015 at 22:32, Andreas Schäfer <gent...@gmx.de> wrote:

        On 22:03 Wed 28 Oct, Diego Avesani wrote:
        > When I use a single CPU I get one result; when I use 4 CPUs I
        > get another one. I do not think that there is a bug.

        Sounds like a bug to me, most likely in your code.

        > Do you think that these small differences are normal?

        It depends on what small means. Floating-point operations in a
        computer are generally not associative, so parallelization
        (which changes the order of operations) may indeed lead to
        different results.
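
        A minimal standalone sketch (mine, not from this thread) showing
        that non-associativity in single precision:

        program assoc
          implicit none
          real :: a, b, c
          a = 1.0e20
          b = -1.0e20
          c = 1.0
          print *, (a + b) + c   ! (a+b) is exactly 0, so this prints 1.0
          print *, a + (b + c)   ! c is absorbed into b, so this prints 0.0
        end program assoc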

        > Is there any way to get the same results? Is it some
        > alignment problem?

        Impossible to say without knowing your code.

        Cheers
        -Andreas


        --
        ==========================================================
        Andreas Schäfer
        HPC and Grid Computing
        Department of Computer Science 3
        Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
        +49 9131 85-27910
        PGP/GPG key via keyserver
        http://www.libgeodecomp.org
        ==========================================================











