Diego,
Assuming the bug is in how you coded the solver:

Set up a problem whose exact solution you know. That is, pick A (a very
simple non-singular SPD matrix) and an exact solution x_, with x_ != 0.
Make x_ a linear function or a constant, so it is super easy to spot
where the bad x's appear.

I assume A has the boundary conditions built into it and you are not
handling them outside of A. If you are, you will have to check those too.

Then compute b = A*x_.
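
For example, something along these lines (a minimal serial sketch in
Fortran; the 1D Laplacian and the linear x_ are just one convenient
choice, and the names are mine, not anything in your code):

    ! Manufactured test problem: A = 1D Laplacian (tridiagonal, SPD,
    ! Dirichlet boundary conditions folded into the matrix), exact
    ! solution x_(i) = i (linear), right-hand side b = A*x_.
    program make_problem
      implicit none
      integer, parameter :: n = 16
      real(8) :: A(n,n), x_(n), b(n)
      integer :: i

      A = 0.0d0
      do i = 1, n
         A(i,i) = 2.0d0                  ! diagonal
         if (i > 1) A(i,i-1) = -1.0d0    ! sub-diagonal
         if (i < n) A(i,i+1) = -1.0d0    ! super-diagonal
      end do

      do i = 1, n
         x_(i) = dble(i)                 ! exact solution, easy to eyeball
      end do

      b = matmul(A, x_)                  ! right-hand side
    end program make_problem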
Given A and b, initialize x to zero.
Then use your CG to solve for x.
Display/plot x, or the error abs(x - x_), as the iterations proceed.
Display them in a map where you can easily locate your cores and see how
A, x, and b are distributed.
You will soon see where your x is not being updated properly (e.g., data
not being exchanged correctly between your cores).
I assume you are also using a simple 1D or 2D partitioning of your data,
so you can spot the issues easily.
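
To make the per-iteration check concrete, here is a textbook serial CG
skeleton with the error printout (a sketch only, not your code; cg_debug
and maxit are my names). In the MPI version the dot products become
global reductions (MPI_Allreduce) and the matvec needs the halo exchange,
and those are exactly the places where a wrong max|x - x_| on one rank
will point you to the bug:

    ! Textbook CG on the test problem, printing max|x - x_| every
    ! iteration so a bad update shows up immediately.
    subroutine cg_debug(n, A, b, x_, maxit)
      implicit none
      integer, intent(in) :: n, maxit
      real(8), intent(in) :: A(n,n), b(n), x_(n)
      real(8) :: x(n), r(n), p(n), Ap(n)
      real(8) :: alpha, rs_old, rs_new
      integer :: it

      x = 0.0d0                      ! initialize x to zero, as above
      r = b - matmul(A, x)
      p = r
      rs_old = dot_product(r, r)

      do it = 1, maxit
         Ap = matmul(A, p)           ! parallel: needs the halo exchange
         alpha = rs_old / dot_product(p, Ap)  ! parallel: global reduction
         x = x + alpha*p
         r = r - alpha*Ap
         rs_new = dot_product(r, r)  ! parallel: global reduction

         ! The debugging payoff: per iteration (and per rank, in MPI),
         ! how far is x from the known solution?
         print '(a,i4,a,es12.4)', 'it=', it, '  max|x-x_|=', &
               maxval(abs(x - x_))

         if (sqrt(rs_new) < 1.0d-12) exit
         p = r + (rs_new/rs_old)*p
         rs_old = rs_new
      end do
    end subroutine cg_debug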

Good luck.
Joshua

PS. Usually you have to build a communication library for your algebra
that you trust (thoroughly tested). Then you build the data types of the
algebra bit by bit: scalar, vector, matrix. Then the operators (addition,
product), and finally your solver: CG, BiCGSTAB, GMRESR, ...
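
As an illustration, the first such building block is usually the global
dot product (a sketch, assuming a block-distributed vector; par_dot,
xloc, yloc, nloc are illustrative names):

    ! Global dot product over block-distributed vectors: the first
    ! communication primitive to test thoroughly before CG uses it.
    function par_dot(nloc, xloc, yloc) result(d)
      use mpi
      implicit none
      integer, intent(in) :: nloc
      real(8), intent(in) :: xloc(nloc), yloc(nloc)
      real(8) :: d, dloc
      integer :: ierr

      dloc = dot_product(xloc, yloc)          ! local contribution
      call MPI_Allreduce(dloc, d, 1, MPI_DOUBLE_PRECISION, &
                         MPI_SUM, MPI_COMM_WORLD, ierr)
    end function par_dot

Test it against the serial dot_product on replicated data before the
solver ever calls it.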

------ Original Message ------
Received: 05:58 PM CDT, 10/28/2015
From: Diego Avesani <diego.aves...@gmail.com>
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] single CPU vs four CPU result differences, is it
normal?

> dear Damien,
> I wrote the solver myself. I have not understood your answer.
> 
> Diego
> 
> 
> On 28 October 2015 at 23:09, Damien <dam...@khubla.com> wrote:
> 
> > Diego,
> >
> > There aren't many linear solvers that are bit-consistent, where the
> > answer is the same no matter how many cores or processes you use.
> > Intel's version of Pardiso is bit-consistent and I think MUMPS 5.0
> > might be, but that's all.  You should assume your answer will not be
> > exactly the same as you change the number of cores or processes,
> > although you should reach the same overall error tolerance in
> > approximately the same number of iterations.
> >
> > Damien
> >
> >
> > On 2015-10-28 3:51 PM, Diego Avesani wrote:
> >
> > dear Andreas, dear all,
> > The code is quite long. It is a conjugate gradient algorithm to solve a
> > complex system.
> >
> > I have noticed that when a do cycle is small, let's say
> > do i=1,3
> >
> > enddo
> >
> > the results are identical. If the cycle is big, let's say do i=1,20, the
> > results are different and the difference increases with the number of
> > iterations.
> >
> > What do you think?
> >
> >
> >
> > Diego
> >
> >
> > On 28 October 2015 at 22:32, Andreas Schäfer <gent...@gmx.de> wrote:
> >
> >> On 22:03 Wed 28 Oct, Diego Avesani wrote:
> >> > When I use a single CPU I get one result; when I use 4 CPUs I get
> >> > another one. I do not think that it is a bug.
> >>
> >> Sounds like a bug to me, most likely in your code.
> >>
> >> > Do you think that these small differences are normal?
> >>
> >> It depends on what small means. Floating point operations in a
> >> computer are generally not associative, so parallelization may indeed
> >> lead to different results.
> >>
> >> > Is there any way to get the same results? is some align problem?
> >>
> >> Impossible to say without knowing your code.
> >>
> >> Cheers
> >> -Andreas
> >>
> >>
> >> --
> >> ==========================================================
> >> Andreas Schäfer
> >> HPC and Grid Computing
> >> Department of Computer Science 3
> >> Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
> >> +49 9131 85-27910
> >> PGP/GPG key via keyserver
> >> http://www.libgeodecomp.org
> >> ==========================================================
> >>
> >> (\___/)
> >> (+'.'+)
> >> (")_(")
> >> This is Bunny. Copy and paste Bunny into your
> >> signature to help him gain world domination!
> >>
> >
> >
> >
> 


