On 08/11/2010 07:19 PM, Robert Bradshaw wrote:
> On Wed, Aug 11, 2010 at 4:15 PM, Mitesh Patel <qed...@gmail.com> wrote:
>> On 08/11/2010 03:25 AM, Mitesh Patel wrote:
>>> On 08/05/2010 11:12 PM, Robert Bradshaw wrote:
>>>> On Thu, Aug 5, 2010 at 4:26 AM, Mitesh Patel <qed...@gmail.com> wrote:
>>>>> On 08/04/2010 03:10 AM, Robert Bradshaw wrote:
>>>>>> So it looks like you're getting segfaults all over the place as
>>>>>> well... Hmm... Could you test with
>>>>>> https://sage.math.washington.edu:8091/hudson/job/sage-build/163/artifact/cython-devel.spkg
>>>>>> ?
>>>>>
>>>>> With the new package, I get similar results, i.e., apparently random
>>>>> segfaults.  The core score is about the same:
>>>>
>>>> Well, it's clear there's something going on. What about testing on a
>>>> plain-vanilla sage (with the old Cython)?
>>>
>>> I get similar results on sage.math with the released 4.5.3.alpha0:
>>>
>>> $ cd /scratch/mpatel/tmp/cython/sage-4.5.3.alpha0-segs
>>> $ find -name core_x\* -type f | wc
>>>      30      30    1315
>>> $ grep egmentation ptestlong-j20-*log | wc
>>>      11      70     939
>>>
>>> So the problem could indeed lie elsewhere, e.g., in the doctesting system.
>>>
>>> I'll try to run some experiments with the attached make-based parallel
>>> doctester.
>>
>> With the alternate tester and vanilla 4.5.3.alpha0, I get "only" the 20
>> cores in
>>
>> data/extcode/genus2reduction/
>>
>> (I don't know yet how many times and with which file(s) this fault
>> happens during each long doctest run nor whether certain files
>> reproducibly trigger the fault.)


I made 20 copies of 4.5.3.alpha0.  In each copy, I ran the long doctest
suite serially with the alternate tester, which I modified to rename any
new cores after each test.  All copies end up with a "stealth" core in

data/extcode/genus2reduction/

and point to

sage/interfaces/genus2reduction.py

as the only source of this core.  I'll open a ticket.
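
For reference, the core-renaming step can be sketched roughly as below.
This is a minimal illustration, not the actual modification to the
alternate tester: the function name rename_new_cores, the "seen" set, and
the assumption that core files have names beginning with "core" are all
hypothetical.

```python
import os

def rename_new_cores(root, test_name, seen):
    """Rename core files under root that are not in `seen`.

    Hypothetical helper: assumes core files have names starting with
    "core".  Each new core gets the test name appended, so it can be
    traced back to the test that was running when it appeared.
    """
    tag = test_name.strip(os.sep).replace(os.sep, "_")
    renamed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            old = os.path.join(dirpath, fname)
            if not fname.startswith("core") or old in seen:
                continue
            new = "%s.%s" % (old, tag)
            os.rename(old, new)
            seen.add(new)  # do not rename this core again later
            renamed.append(new)
    return renamed
```

Calling this after every test with the same "seen" set attributes each
new core to the test that just finished.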


>> Moreover, the only failed doctests are in startup.py (19 times) and
>> decorate.py (once).  The logs maketestlong-j20-* don't explicitly
>> mention segmentation faults.
>>
>> So sage-ptest may be responsible, somehow, for the faults that leave
>> cores in apparently random directories and perhaps also for random test
>> failures.


Here's one problem: When we test

/path/to/foo.py

sage-doctest writes

SAGE_TESTDIR/.doctest_foo.py

runs the new file through 'python', and then deletes it.  This can cause
collisions when we test multiple files with the same basename in
parallel, since two tests may then write, run, or delete the same
temporary file.  Examples: __init__, all, misc, conf, constructor,
morphism, index, tests, homset, element, twist, tutorial, sagetex,
crystals, cartesian_product, template, ring, etc.  (There's a similar
problem with testing non-library files, which sage-doctest first
effectively copies to SAGE_TESTDIR.)  We could instead use

.doctest_path_to_foo.py

or

.doctest_path_to_foo_ABC123.py

where ABC123 is unique.  With the latter we could run multiple
simultaneous tests of the same file.  I'll open a ticket or maybe use an
existing one.
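
A minimal sketch of the second naming scheme, assuming POSIX-style paths.
The function name unique_doctest_file is hypothetical and this is not how
sage-doctest currently works; tempfile.mkstemp from the standard library
supplies the unique part (the "ABC123" above) and creates the file
atomically.

```python
import os
import tempfile

def unique_doctest_file(source_path, testdir):
    """Create an empty, uniquely named scratch file for testing source_path.

    Hypothetical helper.  The name encodes the full path, so files that
    share a basename do not collide, and mkstemp adds a random token, so
    even two simultaneous tests of the same file get distinct names.
    """
    root, ext = os.path.splitext(os.path.abspath(source_path))
    # /path/to/foo -> path_to_foo
    flat = root.strip(os.sep).replace(os.sep, "_")
    fd, name = tempfile.mkstemp(prefix=".doctest_%s_" % flat,
                                suffix=ext or ".py",
                                dir=testdir)
    os.close(fd)  # the caller reopens the file by name to write the tests
    return name
```

One caveat: very deep paths could exceed the file system's limit on file
name length, so a real implementation might need to shorten or hash the
path part.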


>> The Cython beta, at least, may be off the hook.  But I'll check with the
>> alternate tester.


I get the same results with 4.5.3.alpha0 + the Cython beta (rev 3629).


> Thanks for looking into this, another data point is really helpful. I
> put a vanilla Sage in hudson and for a while it was passing all of its
> tests every time, then all of a sudden it started failing too. Very
> strange... For now I've resorted to starting up Sage in a loop (as the
> segfault always happened during startup) and am seeing about a 0.5%
> failure rate (which is the same that I see with a vanilla Sage).
> Hopefully we can get the parallel testing to work much more reliably
> so we can use it as a good indicator in our Cython build farm to keep
> people from breaking Sage (and I'm honestly really surprised we
> haven't run into these issues during release management as well...)

Strange, indeed.

Could the startup segfault be distinct from and not present among the
doctest faults?  Testing about 2500 files with a 0.5% startup failure
rate would give us about 13 extra, random(?) faults per run, which we
don't see.
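
The arithmetic, as a quick check.  This assumes each tested file starts
Sage once and that the startups fail independently, which is only a rough
model; the variable names are just for illustration.

```python
n_files = 2500   # approximate number of files tested per run
p_fail = 0.005   # observed startup failure rate (0.5%)

# Expected number of startup faults in one full run.
expected_faults = n_files * p_fail

# Probability that a full run shows no startup fault at all,
# under the independence assumption.
p_no_faults = (1 - p_fail) ** n_files

print("expected faults per run: %.1f" % expected_faults)
print("chance of a fault-free run: %.2g" % p_no_faults)
```

Under this model a run with no startup faults would be very unlikely, so
not seeing them suggests that the startup segfault and the doctest faults
are different problems, or that the model doesn't apply here.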

