There are two test suites with validated results at
http://axiom-developer.org/axiom-website/CATS/
The CATS (Computer Algebra Test Suite) effort targets
the development of known-good answers that get run
against several systems. These "end result" suites test
large portions of the system. Because they are checked against
published results, they can be used by any system.
The integration suite found several bugs in the published
results which are noted in the suite. It also found a bug
introduced by an improper patch to Axiom.
It would be generally useful if Sage developed known-good
test suites in other areas, say infinite sequences and series.
Perhaps such a suite would make a good GSoC project with
several mentors from different systems.
I have done some more work toward a trigonometric test
suite. So far I have found that Mathematica and Maxima
tend to agree with each other on branch cuts, while Axiom
and Maple tend to agree with each other. The choice is
arbitrary, but it affects answers. I am having an internal
debate about whether to choose MMA/Maxima-compatible answers
just to "regularize" the expected results users will see.
Standardized test suites give our users confidence that
we are generating known-good results for some (small)
range of expected inputs.
An academically based effort (which Axiom is not) could
approach NIST for funding to develop such suites. NIST runs
the Digital Library of Mathematical Functions website
(http://dlmf.nist.gov/). I proposed developing computer
algebra test suites for that site, but NIST does not fund
independent open-source projects. Sage, however, could
probably get ongoing funding to develop such suites, which
would benefit all of the existing CAS efforts.
NSF might also be convinced since such test suites raise
the level of expected quality of answers without directly
competing against commercial efforts. I'd like to see a
CAS testing research lab that published standardized
answers to a lot of things we all end up debating, such
as branch cuts, sqrt-of-squares, foo^0, etc.
Tim Daly
Dr. David Kirkby wrote:
Joshua Herman wrote:
Is there a Mathematica test suite we could adapt, or a standardized set
of tests we could use? Maybe we could take the 100 most frequently used
functions and make a test suite?
I'm not aware of one. A Google search found very little of any real use.
I'm sure Wolfram Research have such test suites internally, but they
are not public. There is discussion of how they have an internal
version of Mathematica which runs very slowly, but tests things in
greater detail.
http://reference.wolfram.com/mathematica/tutorial/TestingAndVerification.html
Of course, comparing 100 things is useful, but comparing millions of
them in the way I propose would more likely show up problems.
I think we are all aware that it is best to test on the hardware you
are using to be as confident as possible that the results are right.
Of course, Wolfram Research could supply a test suite to check
Mathematica on an end user's computer, but they do not do that. They
could even encrypt it, so users would not know exactly what was wrong
but could at least alert Wolfram Research.
I'm aware of one bug in Mathematica that only affected old/slower
SPARC machines if Solaris was updated to Solaris 10. I suspect it
would have affected newer machines too, had they been heavily loaded.
(If I were sufficiently motivated, I would probably prove that, but I'm
not, so my hypothesis remains unproven.)
It did not produce incorrect results, but it pegged the CPU at 100%
forever if you computed something as simple as 1+1. It was amazing
how that was solved between myself, Casper Dik (a kernel engineer at
Sun), and various other people on the Internet. It was Casper who
finally nailed the problem: after I posted the output of lsof, he
could see what Mathematica was doing.
I've got a collection of a few Mathematica bugs, mainly affecting only
Solaris, although one affected at least one Linux distribution too.
http://www.g8wrb.org/mathematica/
One thing I know Mathematica does do, which Sage could do too, is to
automatically generate a bug report if it finds a problem. At the most
primitive level, that code might be:
if x < 0:
    function_less()
elif x == 0:
    function_equal()
elif x > 0:
    function_greater()
else:
    # Unreachable for an ordinary real x; reached for e.g. a NaN or an
    # undecidable symbolic comparison, so report it as a bug.
    function_error()
If the error branch is reached, a URL is given which you can click to
send a bug report to them. It lists the name of the file and the line
number which generated the error. That's something that could be done
in Sage and might catch some bugs.
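A hedged sketch of what function_error() might do in Sage, following
that idea. Everything here is illustrative: the ticket URL and its
query parameters are made up, and Sage has no such hook today.

    import inspect
    import urllib.parse

    def function_error():
        # Grab the caller's file name and line number and turn them into a
        # bug-report link the user can click.  The URL is hypothetical.
        caller = inspect.currentframe().f_back
        params = urllib.parse.urlencode({
            "file": caller.f_code.co_filename,
            "line": caller.f_lineno,
            "summary": "internal consistency check failed",
        })
        print("Unexpected internal state; please report it at:")
        print("https://trac.sagemath.org/newticket?" + params)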
Dave
On Wed, Mar 3, 2010 at 12:04 AM, David Kirkby
<david.kir...@onetel.net> wrote:
Has anyone ever considered randomised testing of Sage against
Mathematica?
As long as the result is either
a) True or False
b) An integer
then comparison should be very easy. As a dead simple example,
1) Generate a large random number n.
2) Use is_prime(n) in Sage to determine if n is prime or composite.
3) Use PrimeQ[n] in Mathematica to see if n is prime or composite.
4) If Sage and Mathematica disagree, write it to a log file.
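A minimal sketch of those four steps, assuming it is run inside Sage
with a working Mathematica installation so that the optional
mathematica pexpect interface is available; the function and log-file
names are just illustrative.

    import random

    def compare_primality(trials=1000, bits=256, logfile="primality_mismatches.log"):
        # Steps 1-4: random n, ask both systems, log any disagreement.
        for _ in range(trials):
            n = random.getrandbits(bits)
            sage_says = is_prime(n)
            mma_says = mathematica.eval("PrimeQ[%d]" % n).strip() == "True"
            if sage_says != mma_says:
                with open(logfile, "a") as log:
                    log.write("n=%d sage=%s mathematica=%s\n" % (n, sage_says, mma_says))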
Something a bit more complex.
1) Generate a random function f(x) - something that one could integrate.
2) Generate random upper and lower limits, 'a' and 'b'.
3) Perform a numerical integration of f(x) between 'a' and 'b' in Sage.
4) Perform a numerical integration of f(x) between 'a' and 'b' in Mathematica.
5) Compare the outputs of Sage and Mathematica.
A floating-point result would be more difficult to compare, as one
would need to consider what is a reasonable level of difference.
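A hedged sketch of steps 1-5 plus such a tolerance check, again
assuming Sage and its optional Mathematica interface; the paired
building blocks keep both syntaxes explicit so nothing relies on
automatic translation between the systems.

    import random

    def compare_integrals(trials=100, rtol=1e-6, logfile="integral_mismatches.log"):
        x = var('x')
        # Paired (Sage expression, Mathematica string) integrands, kept tame
        # so both numerical integrators should converge without trouble.
        blocks = [(sin(x), "Sin[x]"), (cos(x), "Cos[x]"),
                  (exp(-x^2), "Exp[-x^2]"), (1/(1 + x^2), "1/(1 + x^2)")]
        for _ in range(trials):
            (f1, m1), (f2, m2) = random.sample(blocks, 2)
            f, mma_f = f1 + f2, "%s + %s" % (m1, m2)
            a, b = sorted(random.uniform(-5, 5) for _ in range(2))
            sage_val, _err = numerical_integral(f, a, b)
            mma_val = float(mathematica.eval(
                "N[NIntegrate[%s, {x, %r, %r}]]" % (mma_f, a, b)))
            # A relative tolerance sidesteps the "how close is close enough"
            # question for well-behaved integrands; it is not a general answer.
            if abs(sage_val - mma_val) > rtol * max(1.0, abs(sage_val)):
                with open(logfile, "a") as log:
                    log.write("f=%s  [%r, %r]  sage=%r  mathematica=%r\n"
                              % (f, a, b, sage_val, mma_val))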
Comparing symbolic results directly would be a much more difficult
task, and probably impossible without a huge effort, since you can
often write an expression in several different ways which are equal,
but a computer program cannot easily determine that they are equal.
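One cheap (and non-rigorous) workaround is to compare the two forms
numerically at random sample points rather than structurally; a small
Sage-flavoured sketch:

    import random

    x = var('x')
    f = sin(x)^2
    g = (1 - cos(2*x))/2          # the same function, written differently
    print(str(f) == str(g))       # False: the printed forms differ
    samples = [random.uniform(-10, 10) for _ in range(20)]
    # Agreement at many random points is strong (not conclusive) evidence
    # that the two expressions are equal.
    print(all(abs(float((f - g).subs(x=t))) < 1e-9 for t in samples))   # True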
One could potentially let a computer crunch away all the time, looking
for differences. Then, when they are found, a human would have to
investigate why the difference occurs.
One could then add a trac item for "Mathematica bugs". There was once a
push for a public list of Mathematica bugs. I got involved a bit with
that, but it died a death and I became more interested in Sage.
Some of you may know of Vladimir Bondarenko, who is a strange
character who regularly used to publish Mathematica and Maple bugs he
had found. In some discussions I've had with him, he was of the
opinion that Wolfram Research took bug reports more seriously than
Maplesoft. I've never worked out what technique he uses, but I believe
he is doing some randomised testing, though it is more sophisticated
than what I'm suggesting above.
There must be a big range of problem types where this is practical -
and a much larger range where it is not.
You could at the same time also compare the time taken to execute the
operation, to find areas where Sage is much faster or slower than
Mathematica.
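Timing both sides is easy to bolt on; a rough sketch (wall-clock time,
so the Mathematica figure includes the pexpect round-trip overhead and
only large ratios mean much):

    import time

    def timed(thunk):
        # Return (result, elapsed seconds) for a zero-argument callable.
        t0 = time.time()
        result = thunk()
        return result, time.time() - t0

    # e.g. inside the primality loop sketched earlier:
    # sage_ans, sage_t = timed(lambda: is_prime(n))
    # mma_ans, mma_t = timed(lambda: mathematica.eval("PrimeQ[%d]" % n))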
Dave