2009/7/17 Dr. David Kirkby <david.kir...@onetel.net>:
> Bill Hart wrote:
>>
>> This sage ticket 6453 is too hard. It actually looks like a "bug" in MPIR
>> to me.
>
> Hi,
>
> I copied this to sage-devel too.
>
>> The only way to test David Kirkby's fix on T2 it is to do sage -sh and
>> then run the spkg-install, etc.
>
>> That means I'd need my own version of sage built on that machine,
>> otherwise the spkg-install fails due to permissions problems.
>
>
>> From
>> what I recall, Sage takes many hours to build on that machine. Well
>> that is if it did actually build in the first place with that
>> compiler.
>
> I've not managed to build Sage completely on either t2 or my home machine.
> Both fail at the same point though
>
> "sage: There was an error installing modified sage library code."
>
>
> The build speed problem is due to ATLAS. Here is the list of packages
> currently built on t2 (it's building now). Look at the times between them
>
> kir...@t2:[/tmp/kirkby/sage-4.1/spkg/installed] $ ls -lrt
> total 640
> -rw-r--r--   1 kirkby   1093           0 Jul 16 20:45 dir-0.1
> -rw-r--r--   1 kirkby   1093           0 Jul 16 20:45 prereq-0.3
> -rw-r--r--   1 kirkby   1093           0 Jul 16 20:45 bzip2-1.0.5
> -rw-r--r--   1 kirkby   1093         221 Jul 16 20:45 sage_scripts-4.1
> -rw-r--r--   1 kirkby   1093         227 Jul 16 20:45 conway_polynomials-0.2
> -rw-r--r--   1 kirkby   1093         216 Jul 16 20:58 mpir-1.2.p4
> -rw-r--r--   1 kirkby   1093         221 Jul 16 20:59 termcap-1.3.1.p0
> -rw-r--r--   1 kirkby   1093         220 Jul 16 21:00 readline-5.2.p7
> -rw-r--r--   1 kirkby   1093         218 Jul 16 21:12 pari-2.3.3.p1
> -rw-r--r--   1 kirkby   1093         217 Jul 16 21:30 ntl-5.4.2.p9
> -rw-r--r--   1 kirkby   1093         222 Jul 16 22:04 eclib-20080310.p7
> -rw-r--r--   1 kirkby   1093         220 Jul 16 22:04 graphs-20070722
> -rw-r--r--   1 kirkby   1093         224 Jul 16 22:04 elliptic_curves-0.1
> -rw-r--r--   1 kirkby   1093         216 Jul 16 22:04 extcode-4.1
> -rw-r--r--   1 kirkby   1093         219 Jul 16 22:06 flint-1.3.0.p2
> -rw-r--r--   1 kirkby   1093         218 Jul 16 22:06 zlib-1.2.3.p4
> -rw-r--r--   1 kirkby   1093         220 Jul 16 22:12 sqlite-3.5.3.p4
> -rw-r--r--   1 kirkby   1093         224 Jul 16 22:14 libgpg_error-1.6.p1
> -rw-r--r--   1 kirkby   1093         223 Jul 16 22:21 libgcrypt-1.4.3.p1
> -rw-r--r--   1 kirkby   1093         218 Jul 16 22:24 opencdk-0.6.6
> -rw-r--r--   1 kirkby   1093         220 Jul 16 22:37 gnutls-2.2.1.p2
> -rw-r--r--   1 kirkby   1093         218 Jul 16 22:38 libpng-1.2.35
> -rw-r--r--   1 kirkby   1093         220 Jul 16 22:58 python-2.6.2.p1
> -rw-r--r--   1 kirkby   1093         222 Jul 16 23:04 freetype-2.3.5.p1
> -rw-r--r--   1 kirkby   1093         217 Jul 16 23:07 gd-2.0.35.p2
> -rw-r--r--   1 kirkby   1093         221 Jul 16 23:07 gdmodule-0.56.p6
> -rw-r--r--   1 kirkby   1093         224 Jul 16 23:07 fortran-20071120.p5
> -rw-r--r--   1 kirkby   1093         223 Jul 16 23:15 lapack-20071123.p0
> -rw-r--r--   1 kirkby   1093         219 Jul 17 07:43 atlas-3.8.3.p5
> -rw-r--r--   1 kirkby   1093         216 Jul 17 08:14 gsl-1.10.p1
> -rw-r--r--   1 kirkby   1093         218 Jul 17 08:16 iml-1.0.1.p11
> -rw-r--r--   1 kirkby   1093         221 Jul 17 08:16 ipython-0.9.1.p0
> -rw-r--r--   1 kirkby   1093         221 Jul 17 08:23 givaro-3.2.13rc2
> -rw-r--r--   1 kirkby   1093         220 Jul 17 08:52 linbox-1.1.6.p0
> -rw-r--r--   1 kirkby   1093         220 Jul 17 08:53 f2c-20070816.p1
> -rw-r--r--   1 kirkby   1093         218 Jul 17 08:53 blas-20070724
> -rw-r--r--   1 kirkby   1093         219 Jul 17 09:02 numpy-1.3.0.p0
> -rw-r--r--   1 kirkby   1093         238 Jul 17 09:09
> matplotlib-0.98.5.3rc0-svn6910.p4
> -rw-r--r--   1 kirkby   1093         223 Jul 17 09:09 mercurial-1.1.2.p0
> -rw-r--r--   1 kirkby   1093         217 Jul 17 09:25 mpfr-2.4.1p0
> -rw-r--r--   1 kirkby   1093         230 Jul 17 09:27
> mpfi-1.3.4-cvs20071125.p7
> -rw-r--r--   1 kirkby   1093         219 Jul 17 09:27 pexpect-2.0.p4
> -rw-r--r--   1 kirkby   1093         222 Jul 17 09:28 pycrypto-2.0.1.p4
>
> What you may notice is a huge time between lapack-20071123.p0 @ 2315 and
> atlas-3.8.3.p5 @ 0743 the next day. In other words ATLAS is taking 8.5 hours
> to build. It is the 'tuning' process which is taking 90% of that time.
>
> That could be reduced dramatically if we could save the temporary files it
> builds, produce a set of tuning values for the sun4v architecture, then
> ATLAS would not need to be tuned every time. But despite trying, and asking
> for some help, I can't stop Sage deleting the intermediate files. See my
> thread:
>
> "Can I keep build data once a package is  installed ok?"
>
> There is a bit of code in $SAGE_ROOT/local/bin/sage-spkg
>
>
> DELETE_TMP=1
>
> changing that to zero, which is what William suggested to do, causes a
> syntax error. I can't work out what's happening. I suspect fixing this
> should be a high priority.

I agree, this is should be a high priority.

>
> I'll create a trac ticket for this.
>
>> I'm not the right person to be reviewing this.
>>
>> I'm also completely unsure whether the patch is valid or not. David
>> has done things like turn testing on which I've no idea is the right
>> thing to do or not, from a Sage point of view. That should be bundled
>> as a separate patch in my view.
>
> I did *not* turn testing on. Perhaps I should have made that clearer.

I am referring to the trac ticket which says:

"A few others things are done in this patch.

Removed a comment at the bottom of spkg-install to bypass the checks
on release systems. Given failures have occurred, it would be unwise
to bypass any checks.
Add a comment to remind people not to bypass the tests.
Update the source to use the latest patches, as strongly recommenced
in the INSTALL file. This brings the code to MPFR 2.4.1 patch level 5,
though I'm considering it mpfr-2.4.1p0 for Sage purposes."

>
> The last few lines of spkg-install in mpfr-2.4.1.spkg were:
>
> #uncomment the line below to avoid running the test suite.
> #this should be done in all final releases
> #exit 0
> cd ..; ./spkg-check
>
> In other words, despite the comment, the testing was on. I changed only the
> comments there, to:
>
> # Do NOT uncomment this line and bypass the checks, as some failures
> # have been observed and this should be carefully tested.
> cd ..; ./spkg-check
>
> (I just realised, I should have said do not comment the line and bypass the
> checks. )

OK.

>
> Since this was 4.1, I assume that is a 'release'. Either way, the testing
> was on.
>
> My personal opinion is that given
>
> * Sage takes many hours to almost build
> * Testing take about 10 minutes,
> * Failures have been seen in mpfr
>
> I feel it's best to leave mpfr testing on permanently.

I agree totally. But some sage devs complain that building Sage takes
too long, so there is always a tradeoff between better testing and
taking too long to build. It all adds up.

>
>> Besides that the issue seems to be a compiler bug, which occurs from
>> gcc 4.3.0 onwards which makes MPN_ZERO miscompile when supplied with a
>> zero operand. But MPN_ZERO is supposed to work with a zero operand, so
>> this is actually a bug in MPIR. Well it would be if it weren't a
>> compiler bug.
>
> The gcc guys don't think it is. They appear to think it is a bug in Solaris,
> as they think the assembly code is correct.

Oh dear. Groan.

>
>> David Kirkby claims the problem is due to memset being implemented
>> differently on that architecture with that version of Sun. But I don't
>> think MPN_ZERO actually uses memset. It is way too slow. We have a
>> macro which gets expanded for MPN_ZERO.
>
> OK.
>
>> Basically on T2 we should be patching MPIR so that it uses code which
>> compiles correctly on T2, otherwise it is a bug in MPIR which may
>> cause an infinite number of bugs in Sage. (MPN_ZERO is used a *LOT*).
>> For this reason, I'm declaring the fix incorrect. I hate doing this,
>> because David has put a huge amount of effort into this one.
>
> OK, no problem. I'll just use it as a private one, if I need to use another
> version of gcc.

Yeah, see my comments in the latter email. The MPN_ZERO which is
broken is the one in MPFR not MPIR. So my comments are just wrong. The
issue is with memset, which has nothing to do with gcc probably, but
libc (or the solaris equivalent whatever that is).

>
>> Note there is no native mpn_store on T2.
>>
>> The *only* correct fix for this is to write a native assembly language
>> mpn_store for this architecture. We should also do the same for ia64
>> while we are at it.
>
>
>> Warning: this is not going to happen in the next day, or even week.
>> We'll open an MPIR ticket for it. In the mean time, the *only* correct
>> fix for Sage is to not compile Sage on T2 with gcc 4.3.0 or later. It
>> should bail out with a warning. The compiler bug should also be
>> reported by the MPIR crew to gcc, once we have verified that it is
>> indeed a miscompilation. It is clear they haven't fixed this since it
>> appeared in gcc 4.3.0.
>
> It has already been reported. In fact, the URL to the report is in SPKG.txt:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40757

I see.

>
>> You *cannot trust* Sage built on T2 with gcc 4.3.0 or later regardless
>> of whether you get the test suite to pass or not. MPN_ZERO is a
>> critical function used extensively in MPIR itself and many things
>> which use it, e.g. it is used in the FFT.
>
> I can understand your point of view here Bill.
>
> One concern I have over not patching is that using gcc 4.2.4 might break the
> build of polybori. I believe it does, but I am in the process of doing a
> fresh build from scratch and verifying this. Polybori is seriously broken in
> too many ways to list, but Alexander Dreyer is looking at fixing that. He
> now has an account on t2.
>

I see, so you are saying we'd actually have to use an earlier version
of gcc 4.2.4 or earlier.

> There is no significant specific issues on t2 that I am aware - I'm
> observing the same on my machine at home, though I am not experiencing the
> MPFR issue.
>

In retrospect, I think your patch is the right way to go, and I'd have
no issue with using a later gcc (4.3.1 or later) on T2. The most
important thing which needs to be done is to fix the ATLAS issue, then
someone can begin to test some of these patches you have been coming
up with.

An alternative might just be to not bother testing the patch as part
of the review process, on t2, but only test that it doesn't break the
build on some other (non-t2) machine. I mean, it is not like this
patch is going to stop sage building on t2, as it doesn't build
anyway.

I'm prepared to sign off on the fact that the premise behind the patch
is fine. If someone else will glance at the actual patch code and test
it against an already working sage on a non-t2 machine, perhaps people
will allow the patch to pass review without further testing.

Bill.

--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to 
sage-devel-unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---

Reply via email to