Sage Enhancement Proposal - Valgrind & Sage integration: Hunting
memory leaks

Forward: Please excuse the rambling - I have meant to take more time
to write this up but had meetings all day and need to finish a couple
other work related things. Certain points may have been made more than
once and others which I should have been made were omitted by
accident. Feel free to chip in corrections or ask about details - I
plan to move this documents (shall the idea get embraced) into the
wiki. If there is enough interest I am also willing to do a
presentation/demo about the subject at Sage Days 6.

This document might benefit from further proofreading, all spelling
errors are mine.

--------------
I. The Problem
--------------

As bug #274 clearly shows Sage has a problem with memory leaks inside
the pyrex code. Those bugs come in various flavors (I am using the
valgrind terminology):

 - lost memory
 - indirectly lost memory
 - missing deallocations

I am not going to go into details about that at the moment, but plan
to add proper explanations once I start moving this to the wiki.

Another interesting issue has been bug #454 (not resolved yet) which
I will discuss below in detail. It illustrates a problem found when
running "ModularSymbols(501,2).decomposition(3)"

--------------------
II. Sage vs. Valgrind
--------------------

Valgrind Sage is not as staight forward as one would hope. Sage's
Cython code sits underneath python and the python interpreter
itself has some issues when it is being valgrinded. There is
currently no  way to limit the valgrinding to the Cython code,
so we have to devise some technics to work around that.

1) The diff-approach: Start sage with the valgrind option and quit
immediately. Then start sage with the valgrind option again and
run some piece of code. Quit sage and compare the two logs. This
requires some experience to spot the interesting bits of the log.
The log itself is also quite big (on my Linux x86-64 box with
sage 2.8.2pre the start & quit log of sage is about 5.5MB. If you
hit some additional problems it might easily be another couple
megabytes

An example is ticket #454 (slightly edited because the commands were
issued as one single line and I would like to avoid wrapping):

 sage: get_memory_usage()
 m = ModularSymbols(501,2).decomposition(3)
 del m
 ModularSymbols_clear_cache()
 get_memory_usage()
 159.16015625
 185.04296875
 sage: get_memory_usage()
 m = ModularSymbols(501,2).decomposition(3)
 del m
 ModularSymbols_clear_cache()
 get_memory_usage()
 185.04296875
 190.796875
 sage: get_memory_usage()
 m = ModularSymbols(501,2).decomposition(3)
 del m
 ModularSymbols_clear_cache()
 get_memory_usage()
 190.796875
 195.05859375

A valgrind run results in the following output (just the result):

==14358== LEAK SUMMARY:
==14358==    definitely lost: 3,297,622 bytes in 194,267 blocks.
==14358==    indirectly lost: 703,345 bytes in 4,603 blocks.
==14358==      possibly lost: 387,022 bytes in 984 blocks.
==14358==    still reachable: 153,105,982 bytes in 1,028,503 blocks.
==14358==         suppressed: 0 bytes in 0 blocks.

Compared to the empty run you see that there are some problems:

==2609== LEAK SUMMARY:
==2609==    definitely lost: 2,500 bytes in 1 blocks.
==2609==      possibly lost: 276,902 bytes in 769 blocks.
==2609==    still reachable: 130,182,544 bytes in 159,833 blocks.
==2609==         suppressed: 0 bytes in 0 blocks.


2) Finding meta leaks vs. finding specific leaks:

A meta leak is a leak caused by an algorithm which can be compose of
many
specific leaks. It is usually easier to spot specific leaks because
the
code paths should be less complicated and therefore easier to deal
with
from the standpoint of complexity.

Take the following code from ticket #274:

 K = GF(2^16, 'a')
 print type(K)
 R.<x> = K[]
 print type(R)
 s = K.random_element()

 def leak(n):
    m = get_memory_usage()
    for i in range(n):
        _ = R([1])
    print get_memory_usage() - m

 leak(10000)

The code illustrates a memory leak specific to a rather low level
datastructure.

-----------------------
III. Suggested Approach
-----------------------

a) Measure the amount of memory used. Run your favorite piece of code
repeatedly and print out the amount of memory consumed after each
iteration.
If it increases repeatedly open a ticket in the bug tracker.
b) Method (a) can be used on algorithms as well as more simple code
snippets.
Tickets #454 and #274 illustrate those two different cases.
c) Over time all subsystems and data structures should have some
generic code
that checks if repeated execution of the same command leaks memory.
This is
obviously a dream that might never come true, but at least one can
hope. It
is very important in my opinion that Sage cames as close to memory
leak free
as humanly possible because otherwise it might be difficult to run
certain
types of code. If Sage crashes due to a memory leak people will get a
bad
impression. The freeness of (major) memory leaks should also become a
part
of the review process. Correctness should come before features.
d) There needs to be better documentation and also some tutorial on
how
to run sage in valgrind mode as well as interpret the result and most
importantly how to avoid introducing memory leaks in the first place.
This
document is intended as a starting point to do that.

If you look at many commercial as well as open source software
projects you
will be surprised how many of them don't run cleanly under valgrind.
Sage
should not be one of those programs.

While I certainly don't wield any power in the Sage project to make
people
work on certain issues I hope that others will joins me on this
journey. I
am not very familiar with the pyrex code in Sage but somewhat an Idiot
Savant
with valgrind, so I hope that together we can get rid of those pesky
memory
leaks over time.

-------------------------------------
IV. Detailed Analysis of ticket #454
-------------------------------------

The ModularSymbols(501,2).decomposition(3) command uses lots
of linear algebra commands of which several do have leaks and
other problems. So while the valgrind session reveals several
interesting problems it is a good idea to decompose the meta
leak into several specific leaks. More about this see (IV)

For the complete log see 
http://sage.math.washington.edu/home/mabshoff/sage-2.8.1-ticket-454.valgrind
(caution 6.5MB)

Specifically this valgrind run reveals problems in the following
areas:

- with the dense matrix code:

==14358== 1,409,576 bytes in 176,197 blocks are definitely lost in
loss record 7,571 of 7,579
==14358==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
==14358==    by 0x95A2697: __gmpz_init (in /tmp/Work2/sage-2.8.1/
sage-2.8.1/local/lib/libgmp.so.3.4.1)
==14358==    by 0x206BC4EF:
__pyx_f_20matrix_integer_dense_20Matrix_integer_dense__zero_out_matrix
(matrix_integer_dense.c:4622)
==14358==    by 0x206EA824:
__pyx_f_20matrix_integer_dense_20Matrix_integer_dense___init__
(matrix_integer_dense.c:3740)
==14358==    by 0x459FB1: type_call (typeobject.c:436)
==14358==    by 0x4156B2: PyObject_Call (abstract.c:1860)
==14358==    by 0x480413: PyEval_EvalFrameEx (ceval.c:3775)
==14358==    by 0x48627F: PyEval_EvalCodeEx (ceval.c:2831)
==14358==    by 0x484243: PyEval_EvalFrameEx (ceval.c:3660)
==14358==    by 0x48627F: PyEval_EvalCodeEx (ceval.c:2831)
==14358==    by 0x4CFA40: function_call (funcobject.c:517)
==14358==    by 0x4156B2: PyObject_Call (abstract.c:1860)

- with the sparse matrix code:

==14358== 65,552 (63,520 direct, 2,032 indirect) bytes in 12 blocks
are definitely lost in loss record 7,511 of 7,579
==14358==    at 0x4A05809: malloc (vg_replace_malloc.c:149)
==14358==    by 0x2018411A:
__pyx_tp_new_18matrix_modn_sparse_Matrix_modn_sparse
(matrix_modn_sparse.c:2793)
==14358==    by 0x45A59A: tp_new_wrapper (typeobject.c:4022)
==14358==    by 0x4156B2: PyObject_Call (abstract.c:1860)
==14358==    by 0x47D801: PyEval_CallObjectWithKeywords (ceval.c:3433)
==14358==    by 0x20D2F169:
__pyx_f_21matrix_integer_sparse_21Matrix_integer_sparse__mod_int_c
(matrix_integer_sparse.c:6968)
==14358==    by 0x20D2B7CB:
__pyx_f_21matrix_integer_sparse_21Matrix_integer_sparse__mod_int
(matrix_integer_sparse.c:6883)
==14358==    by 0x4156B2: PyObject_Call (abstract.c:1860)
==14358==    by 0x47D801: PyEval_CallObjectWithKeywords (ceval.c:3433)
==14358==    by 0x27BB59AA:
__pyx_f_4misc_matrix_rational_echelon_form_multimodular (misc.c:11850)
==14358==    by 0x4156B2: PyObject_Call (abstract.c:1860)
==14358==    by 0x47D801: PyEval_CallObjectWithKeywords (ceval.c:3433)

- with the LinBox Glue code:

==14358== 132,536 (3,640 direct, 128,896 indirect) bytes in 13 blocks
are definitely lost in loss record 7,482 of 7,579
==14358==    at 0x4A06019: operator new(unsigned long)
(vg_replace_malloc.c:167)
==14358==    by 0x1FCDA467:
LinBox::MatrixHom::BlasBlackboxMAP<LinBox::Modular<double>,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> >,
LinBox::MatrixContainerCategory::Container>::operator()
(LinBox::BlasBlackbox<LinBox::Modular<double> >*&,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> > const&,
LinBox::Modular<double> const&,
LinBox::MatrixContainerCategory::Container) (in /tmp/Work2/sage-2.8.1/
sage-2.8.1/local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1FCDA685: void
LinBox::MatrixHom::map<LinBox::Modular<double>,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> >
>(LinBox::BlasBlackbox<LinBox::Modular<double> >*&,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> > const&,
LinBox::Modular<double> const&) (in /tmp/Work2/sage-2.8.1/sage-2.8.1/
local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1FD28331: LinBox::GivPolynomial<Integer>&
LinBox::cia<LinBox::GivPolynomial<Integer>,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> >
>(LinBox::GivPolynomial<Integer>&,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> > const&,
LinBox::BlasEliminationTraits const&) (in /tmp/Work2/sage-2.8.1/
sage-2.8.1/local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1FD28B97: LinBox::GivPolynomial<Integer>&
LinBox::charpoly<LinBox::GivPolynomial<Integer>,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> >
>(LinBox::GivPolynomial<Integer>&,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> > const&,
LinBox::RingCategories::IntegerTag const&,
LinBox::BlasEliminationTraits const&) (in /tmp/Work2/sage-2.8.1/
sage-2.8.1/local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1FD28BD9: LinBox::GivPolynomial<Integer>&
LinBox::charpoly<LinBox::GivPolynomial<Integer>,
LinBox::UnparametricField<Integer>,
LinBox::RingCategories::IntegerTag>(LinBox::GivPolynomial<Integer>&,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> > const&,
LinBox::RingCategories::IntegerTag const&, LinBox::HybridSpecifier
const&) (in /tmp/Work2/sage-2.8.1/sage-2.8.1/local/lib/
liblinboxwrap.so.0.0.0)
==14358==    by 0x1FD28C0D: LinBox::GivPolynomial<Integer>&
LinBox::charpoly<LinBox::DenseMatrix<LinBox::UnparametricField<Integer>
>, LinBox::GivPolynomial<Integer>,
LinBox::HybridSpecifier>(LinBox::GivPolynomial<Integer>&,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> > const&,
LinBox::HybridSpecifier const&) (in /tmp/Work2/sage-2.8.1/sage-2.8.1/
local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1FD28C39: LinBox::GivPolynomial<Integer>&
LinBox::charpoly<LinBox::DenseMatrix<LinBox::UnparametricField<Integer>
>, LinBox::GivPolynomial<Integer> >(LinBox::GivPolynomial<Integer>&,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> > const&) (in /
tmp/Work2/sage-2.8.1/sage-2.8.1/local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1FC73567: linbox_integer_dense_charpoly (in /tmp/
Work2/sage-2.8.1/sage-2.8.1/local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1F98F30C:
__pyx_f_6linbox_20Linbox_integer_dense__poly(_object*, _object*,
_object*) (linbox.cpp:1121)
==14358==    by 0x4156B2: PyObject_Call (abstract.c:1860)
==14358==    by 0x47D801: PyEval_CallObjectWithKeywords (ceval.c:3433)

In LinBox itself, even though I have never seen this in LinBox itself
(yet):

==14358== 380,688 bytes in 27 blocks are indirectly lost in loss
record 7,539 of 7,579
==14358==    at 0x4A06019: operator new(unsigned long)
(vg_replace_malloc.c:167)
==14358==    by 0x1FC9C896:
__gnu_cxx::new_allocator<double>::allocate(unsigned long, void const*)
(in /tmp/Work2/sage-2.8.1/sage-2.8.1/local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1FC9C8BE: std::_Vector_base<double,
std::allocator<double> >::_M_allocate(unsigned long) (in /tmp/Work2/
sage-2.8.1/sage-2.8.1/local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1FC9C8F0: std::_Vector_base<double,
std::allocator<double> >::_Vector_base(unsigned long,
std::allocator<double> const&) (in /tmp/Work2/sage-2.8.1/sage-2.8.1/
local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1FC9C975: std::vector<double, std::allocator<double>
>::vector(unsigned long, double const&, std::allocator<double> const&)
(in /tmp/Work2/sage-2.8.1/sage-2.8.1/local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1FC9CA47:
LinBox::DenseMatrixBase<double>::DenseMatrixBase(unsigned long,
unsigned long) (in /tmp/Work2/sage-2.8.1/sage-2.8.1/local/lib/
liblinboxwrap.so.0.0.0)
==14358==    by 0x1FC9CC07:
LinBox::BlasMatrix<double>::BlasMatrix(int, int) (in /tmp/Work2/
sage-2.8.1/sage-2.8.1/local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1FCCBDF9:
LinBox::BlasBlackbox<LinBox::Modular<double>
>::BlasBlackbox(LinBox::Modular<double> const&, unsigned long,
unsigned long) (in /tmp/Work2/sage-2.8.1/sage-2.8.1/local/lib/
liblinboxwrap.so.0.0.0)
==14358==    by 0x1FCDA487:
LinBox::MatrixHom::BlasBlackboxMAP<LinBox::Modular<double>,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> >,
LinBox::MatrixContainerCategory::Container>::operator()
(LinBox::BlasBlackbox<LinBox::Modular<double> >*&,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> > const&,
LinBox::Modular<double> const&,
LinBox::MatrixContainerCategory::Container) (in /tmp/Work2/sage-2.8.1/
sage-2.8.1/local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1FCDA685: void
LinBox::MatrixHom::map<LinBox::Modular<double>,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> >
>(LinBox::BlasBlackbox<LinBox::Modular<double> >*&,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> > const&,
LinBox::Modular<double> const&) (in /tmp/Work2/sage-2.8.1/sage-2.8.1/
local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1FD28331: LinBox::GivPolynomial<Integer>&
LinBox::cia<LinBox::GivPolynomial<Integer>,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> >
>(LinBox::GivPolynomial<Integer>&,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> > const&,
LinBox::BlasEliminationTraits const&) (in /tmp/Work2/sage-2.8.1/
sage-2.8.1/local/lib/liblinboxwrap.so.0.0.0)
==14358==    by 0x1FD28B97: LinBox::GivPolynomial<Integer>&
LinBox::charpoly<LinBox::GivPolynomial<Integer>,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> >
>(LinBox::GivPolynomial<Integer>&,
LinBox::DenseMatrix<LinBox::UnparametricField<Integer> > const&,
LinBox::RingCategories::IntegerTag const&,
LinBox::BlasEliminationTraits const&) (in /tmp/Work2/sage-2.8.1/
sage-2.8.1/local/lib/liblinboxwrap.so.0.0.0)
==14358==

and so on. You can see all the gory details in the log.

Thoughts?

Cheers,

Michael


--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://sage.scipy.org/sage/ and http://modular.math.washington.edu/sage/
-~----------~----~----~----~------~----~------~--~---

Reply via email to