I would say that the getheap behavior is a symptom of the same memory management bug(s). Also, it is not as simple as you suggest. It is not always true that every call to getheap leaves something on the heap:

>>> from cypari import pari
>>> pari.getheap()
[4, 41]
>>> pari.getheap()
[5, 58]
>>> pari.getheap()
[5, 58]
>>> pari.getheap()
[5, 58]
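If you want to watch this yourself, here is a small ad hoc helper (the name heap_growth is mine; it is not part of cypari) that reports what a single call leaves behind on the Pari heap. It relies only on pari.getheap(), which returns a pair [number of objects, total size in words]:

from cypari import pari

def heap_growth(fn, *args):
    # pari.getheap() returns [number of objects, total size in words].
    before = pari.getheap()
    fn(*args)
    after = pari.getheap()
    # Caveat: as the session above shows, the getheap() call itself can
    # add an object to the heap, so treat off-by-one deltas with care.
    return int(after[0]) - int(before[0]), int(after[1]) - int(before[1])

# For example, what one ellinit/ellrootno round trip leaves behind:
print(heap_growth(lambda a: pari.ellrootno(pari.ellinit([a, 0])), 1))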
It is also not true that the cypari and cypari2 behaviors are equivalent, although they are similar. (Running the test loop with 10**5 iterations does not use anything close to 28 GB with cypari.)

I believe that some, but definitely not all, of the memory management issues -- namely the most glaring ones described in issue #112 -- were addressed in cypari by removing the code which tries to keep wrapped Pari GENs on the stack instead of moving them to the heap.

The behavior where Pari GENs go on the heap and never get removed is intertwined with Python memory management. Python and the Pari heap use independent reference counting schemes. The __dealloc__ method of a Python Gen object calls gunclone to reduce the reference count of the Pari heap GEN which is referenced by the Python Gen object. However, as demonstrated in issue #112, cypari2 was creating other internal Python objects which held references to a Python Gen wrapping a vector or matrix entry. That prevented those Python Gens from being destroyed, and therefore prevented the Pari GENs wrapped by those Python Gens from being removed from the Pari heap.

Apart from the initial call to gclone when a Python Gen is created, I haven't found any code within cypari which could increase the reference count of a Pari GEN on the Pari heap. Unless Pari sometimes increases that reference count, this would suggest that the core issue lies with Python reference counts. Evidently Python Gen objects are not being dealloc'ed because other Python objects hold references to them, and this is preventing Pari from removing GENs from its heap.
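To make the two interlocking reference counts concrete, here is a toy pure-Python model of the scheme just described. The class names mirror cypari's, but this is only an illustration of the mechanism, not the actual implementation (which lives in Cython, where __dealloc__ plays the role of __del__ below and the real gclone/gunclone are C functions):

class PariHeap:
    """Toy stand-in for the Pari heap's own reference counting."""
    def __init__(self):
        self.refcounts = {}              # GEN address -> Pari refcount

    def gclone(self, addr):
        # Called when a Python Gen is created: the GEN is cloned to the
        # heap with refcount 1 (or an existing refcount is bumped).
        self.refcounts[addr] = self.refcounts.get(addr, 0) + 1

    def gunclone(self, addr):
        self.refcounts[addr] -= 1
        if self.refcounts[addr] == 0:
            del self.refcounts[addr]     # the GEN finally leaves the heap

heap = PariHeap()

class Gen:
    """Toy stand-in for the Python wrapper object."""
    def __init__(self, addr):
        self.addr = addr
        heap.gclone(addr)

    def __del__(self):
        # Python only runs this when the *Python* refcount reaches zero.
        # If some other Python object (an internal cache, say, as in
        # issue #112) still references this Gen, gunclone is never
        # called and the GEN sits on the Pari heap forever.
        heap.gunclone(self.addr)

g = Gen(0x1000)
cache = [g]            # a stray internal reference
del g                  # __del__ does not run; the heap entry survives
print(heap.refcounts)  # {4096: 1}

In the real code the stray reference comes from cypari2's internal bookkeeping rather than from user code, but the effect on the Pari heap is the same.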
On Saturday, August 31, 2024 at 1:11:45 PM UTC-5 dim...@gmail.com wrote:

> On Sat, Aug 31, 2024 at 4:35 AM Marc Culler <marc....@gmail.com> wrote:
> >
> > As Dima says, and as the issue he mentions supports, the current
> > cypari2 code which attempts to keep Pari Gens on the Pari stack as
> > much as possible is badly broken. There are many situations where
> > Python Gen objects cannot be garbage-collected after being destroyed.
> > I am sure that is a big part of this problem. But I don't think it is
> > the whole story.
> >
> > CyPari has returned to the older design which moves the Pari Gen
> > wrapped by a Python Gen to the Pari heap when the Python object is
> > created. This eliminates the leaks reported in cypari2 issue #112.
> > But in this context, I am seeing 12 GB of memory (including several
> > gigabytes of swap) in use after I do the following in ipython:
> >
> > In [1]: from cypari import *
> > In [2]: def test(N):
> >    ...:     for a in range(1, N):
> >    ...:         e = pari.ellinit([a, 0])
> >    ...:         m = pari.ellrootno(e)
> > In [3]: %time test(10**5)
> > CPU times: user 699 ms, sys: 38.3 ms, total: 737 ms
> > Wall time: 757 ms
> > In [4]: %time test(10**6)
> > CPU times: user 7.47 s, sys: 392 ms, total: 7.86 s
> > Wall time: 7.93 s
> > In [5]: %time test(10**7)
> > CPU times: user 1min 41s, sys: 6.62 s, total: 1min 47s
> > Wall time: 1min 49s
>
> the picture is very similar to cypari2.
> (with cypari2, one has to run
>
> pari = Pari()
>
> after the
>
> from cypari2 import *
>
> but otherwise it's the same code)
>
> You can inspect the Pari heap by calling
>
> pari.getheap()
>
> Each call to test() gets you about 3 new objects per iteration on the
> heap, so after 10^5 iterations you get around 300000 objects there
> (and with 10^6 iterations, around 3 million are added).
>
> The following is with cypari:
>
> In [4]: %time test(10**5)
> CPU times: user 1.46 s, sys: 85.8 ms, total: 1.55 s
> Wall time: 1.55 s
> In [5]: pari.getheap()
> Out[5]: [300001, 14394782]
> In [6]: %time test(10**6)
> CPU times: user 14.9 s, sys: 756 ms, total: 15.7 s
> Wall time: 15.7 s
> In [7]: pari.getheap()
> Out[7]: [3299999, 163655656]
>
> With cypari2, similar:
>
> In [9]: pari.getheap()  # 10^5
> Out[9]: [299969, 14392931]
>
> In [12]: pari.getheap()  # 10^6
> Out[12]: [3299662, 163635286]
>
> And gc.collect() does not do anything; in either case the Pari heap
> remains this big.
>
> As well, with cypari, a call to pari.getheap() adds 1 object there -- a
> bug, I guess. (This does not happen with cypari2.)
>
> In [14]: pari.getheap()
> Out[14]: [3300004, 163655741]
> In [15]: pari.getheap()
> Out[15]: [3300005, 163655758]
> In [16]: pari.getheap()
> Out[16]: [3300006, 163655775]
> In [17]: pari.getheap()
> Out[17]: [3300007, 163655792]
> In [18]: pari.getheap()
> Out[18]: [3300008, 163655809]
>
> Looks like a memory management bug in both cypari and cypari2.
>
> Dima
>
> > - Marc
> >
> > On Thursday, August 29, 2024 at 1:19:05 PM UTC-5 dim...@gmail.com wrote:
> >>
> >> It would be good to reproduce this with cypari2 alone.
> >> cypari2 is known to have similar kind (?) of problems:
> >> https://github.com/sagemath/cypari2/issues/112
> >>
> >> On Thu, Aug 29, 2024 at 6:47 PM Nils Bruin <nbr...@sfu.ca> wrote:
> >> >
> >> > On Thursday 29 August 2024 at 09:51:04 UTC-7 Georgi Guninski wrote:
> >> >
> >> > I observe that the following does not leak:
> >> >
> >> > E=EllipticCurve([5*13,0]) #no leak
> >> > rn=E.root_number()
> >> >
> >> > How do you know that doesn't leak? Do you mean that repeated
> >> > execution of those commands in the same session does not swell
> >> > memory use?
> >> >
> >> > The size of the leak is suspiciously close to a power of two.
> >> >
> >> > I don't think you can draw conclusions from that. Processes
> >> > generally request memory in large blocks from the operating
> >> > system, to amortize the high overhead in the operation. It may
> >> > even be the case that 128 Mb is the chunk size involved here!
> >> > The memory allocated to a process by the operating system isn't
> >> > a fully accurate measure of memory allocation use in the process
> >> > either: a heap manager can decide it's cheaper to request some
> >> > new pages from the operating system than to reorganize its heap
> >> > and reuse the fragmented space on it. I think for this loop,
> >> > memory allocation consistently swells with repeated execution,
> >> > so there probably really is something leaking. But given that
> >> > it's not in GC-tracked objects on the python heap, one would
> >> > probably need valgrind information or a keen look at the code
> >> > involved to locate where it's coming from.
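P.S. Regarding Nils's point about needing valgrind: before going that far, one cheap probe from inside Python is to check the reference count of a freshly created Gen. This is only a sketch, not a definitive test; sys.getrefcount counts its own argument, so a value above 2 here would mean some other Python object is keeping the Gen -- and, with it, the wrapped GEN on the Pari heap -- alive:

import sys
from cypari import pari

e = pari.ellinit([1, 0])
entry = e[0]   # a Python Gen wrapping one entry of the vector
# 2 = our reference plus getrefcount's own argument; anything larger
# means a stray reference of the kind described above.
print(sys.getrefcount(entry))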