Georgi, sorry for misspelling your name. - Marc

On Tuesday, September 3, 2024 at 12:56:49 PM UTC-5 Marc Culler wrote:

That is a pretty interesting exchange; it indicates that this memory leak has been known to exist for 6 years, and the remedy is also discussed there. Details and answers to questions asked in this thread follow.

To answer Dima's question first, the Pari documentation that I am referring to is the User's Guide to the PARI library (libpari.pdf). Now for the details ...

The Pari GEN constructed by pari.ellinit is what Pari calls a "lazy vector". These are discussed in section 12.3, which begins by saying: "The functions in this section are used to implement ell structures and analogous objects, which are vectors some of whose components are initialized to dummy values, later computed on demand." The only example known to me of a lazy vector is the GEN constructed by the ellinit function.

A "Technical Note" in that section contains more detail: "In the current implementation, S is a t_VEC with d + 1 entries. The first d components are ordinary t_GEN entries, which you can read or assign to in the customary way. But the last component gel(S, d + 1), a t_VEC of length n initialized to zerovec(n), must be handled in a special way: you should never access or modify its components directly, only through the API we are about to describe."

You can see an example of this behavior in Sage as follows:

sage: import cypari2
sage: pari = cypari2.Pari()
sage: e = pari.ellinit([11, 0]); e
[0, 0, 0, 11, 0, 0, 22, 0, -121, -528, 0, -85184, 1728, Vecsmall([1]), [Vecsmall([64, -1])], [0, 0, 0, 0, 0, 0, 0, 0]]

The last entry of e is a t_VEC containing 8 references to the PARI zero GEN, created by obj_init. The ellrootno function fills in this vector, replacing the zero placeholders by pointers to Pari GENs on the heap. To see this in Sage:

sage: pari.ellrootno(e)
1
sage: e
[0, 0, 0, 11, 0, 0, 22, 0, -121, -528, 0, -85184, 1728, Vecsmall([1]), [Vecsmall([64, -1])], [0, 0, 0, 0, 0, [3872, 4, [2, 5; 11, 2], [[5, 3, 0, 2], [2, 3, 0, 2]]], [1, Vecsmall([-1, 1])], [[2, 11]~]]]

To answer Giorgi's question, one can observe that, while ellrootno *usually* fills in the lazy components, sometimes it doesn't need to:

sage: e = pari.ellinit([2**70, 0]); e
[0, 0, 0, 1180591620717411303424, 0, 0, 2361183241434822606848, 0, -1393796574908163946345982392040522594123776, -56668397794435742564352, 0, -105312291668557186697918027683670432318895095400549111254310977536, 1728, Vecsmall([1]), [Vecsmall([64, -1])], [0, 0, 0, 0, 0, 0, 0, 0]]
sage: pari.ellrootno(e)
1
sage: e
[0, 0, 0, 1180591620717411303424, 0, 0, 2361183241434822606848, 0, -1393796574908163946345982392040522594123776, -56668397794435742564352, 0, -105312291668557186697918027683670432318895095400549111254310977536, 1728, Vecsmall([1]), [Vecsmall([64, -1])], [0, 0, 0, 0, 0, [32, 4, Mat([2, 5]), [[5, -7, 0, 4]]], [1, Vecsmall([-1])], [[2]~, [131072, 0, 0, 0], [0, 0, 0, 4, 0, 0, 8, 0, -16, -192, 0, -4096, 1728, Vecsmall([1]), [Vecsmall([64, -1])], [0, 0, 0, 0, 0, 0, 0, 0]]]]]

(Note that the vector in the last entry is still all zeroes.)

In any case, the heap GENs added lazily by ellrootno cause the memory leak. If the PARI GEN managed by the Python Gen e is passed to gunclone (as happens in the Gen.__dealloc__ method), those added heap GENs are not freed.

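One way to watch this happening from Python is to bracket the calls with pari.getheap(). The following is only a sketch, using cypari2 as in the session above; it assumes the first entry returned by getheap is the object count on the PARI heap, as in the getheap output quoted further down in this thread:

import gc
import cypari2

pari = cypari2.Pari()

def heap_count():
    # pari.getheap() returns a pair; the first entry counts objects on the
    # PARI heap, the second their total size
    return int(pari.getheap()[0])

base = heap_count()

e = pari.ellinit([11, 0])
pari.ellrootno(e)   # side effect: clones new GENs onto the heap and stores
                    # them in the lazy slots of e
del e               # drop the wrapper; its own GEN is cleaned up by the
                    # wrapper's normal memory management
gc.collect()

# ...but the GENs parked in the lazy slots are never freed, so the heap stays
# above the baseline (about three extra objects per curve in the runs quoted
# further down in this thread)
print(heap_count() - base)
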
However, the documented API addresses this issue via the function obj_free:

"void obj_free(GEN S) destroys all clones stored in the n tagged components, and replace them by the initial value 0. The regular entries of S are unaffected, and S remains a valid object. This is used to avoid memory leaks."

Thus the fix for this memory leak (now implemented in CyPari) is to track which Python Gen objects are managing "lazy vector" GENs, and to modify the Gen.__dealloc__ method to make it call obj_free before calling gunclone for those objects. Unfortunately, if obj_free is passed a GEN which is not a lazy vector, i.e. was not initialized by obj_init, then a segfault results. So the tracking is essential. In CyPari I added a cdef field to the Gen class which is used to track whether the GEN is a lazy vector. I had to modify the auto_gen code to allow for setting that flag for all such GENs. (Currently that means only the GENs produced by calling ellinit. I couldn't find any others, although obj_init does seem to be called by rnfinit as well.)

- Marc

On Monday, September 2, 2024 at 12:13:43 PM UTC-5 dim...@gmail.com wrote:

It appears that all these gunclone_deep() etc. aren't documented in any proper way (and this is a root cause of this; the same applies to memory leaks we have in the libsingular interface). Can you post a link to the docs you are reading?

I can only find an email exchange from 2018 where similar issues are discussed. It seems that gunclone_deep should be used sparingly. The last message, from Jeroen Demeyer, in that thread says:

> On 2018-08-20 18:14, Karim Belabas wrote:
>> In libpari code, it's usually easy to explicitly call a destructor [e.g. obj_free] any time a function such as 'ellinit' was used
>
> It's not easy if the list of functions "such as 'ellinit'" is not known.

<https://pari.math.u-bordeaux.fr/archives/pari-dev-1808/msg00012.html>

On 2 September 2024 03:30:14 BST, Marc Culler <marc....@gmail.com> wrote:

Unsurprisingly, changing gunclone to gunclone_deep in Gen.__dealloc__ is not a full fix. It stops this leak, but it also causes a crash when running the doctests. On macOS the crash error is "freeing a pointer which was not allocated". On Linux the error is "corrupted size vs. prev_size in fastbins".

I won't force this list to participate in the full debugging experience. The basic problem has been identified. The remedy is far from clear.

- Marc

On Sun, Sep 1, 2024, 5:37 PM Marc Culler <marc....@gmail.com> wrote:

This is now fixed in cypari. The fix consisted of replacing gunclone by gunclone_deep in the Gen.__dealloc__ method. The same change would probably help in cypari2, although I would not expect it to fix the other memory leaks reported in issue #112.

Here is the explanation. The Pari GEN returned by pari.ellinit is of type t_VEC with entries of various types, including t_INT and t_VECSMALL. The function pari.ellrootno returns a Python int, but it has side effects. It also inserts items into the t_VEC GEN representing the elliptic curve. That t_VEC GEN is managed by a Python Gen object, and that Python Gen object would pass the t_VEC GEN to gunclone in its __dealloc__ method. However, that was not sufficient to cause the new items which were inserted into the t_VEC GEN by pari.ellrootno to also be uncloned. Consequently, those extra GENs remained on the Pari heap forever. Once the Python object was gone, there was no event which would cause them to be destroyed.

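Since two independent reference counts are involved (Python's and the PARI heap's), a toy model in plain Python may help fix the picture. Nothing below touches PARI; every class and name is purely illustrative, standing in for the roles played by the heap clones, the lazy slots, obj_free, and the wrapper's __dealloc__:

class HeapGen:
    """Stands in for a cloned GEN on the PARI heap, with its own refcount."""
    live = set()                 # the simulated PARI heap

    def __init__(self, label):
        self.label = label
        self.refcount = 1
        HeapGen.live.add(self)

    def gunclone(self):
        # decrement the clone's refcount; free it when the count reaches zero
        self.refcount -= 1
        if self.refcount == 0:
            HeapGen.live.discard(self)


class EllStructure(HeapGen):
    """A 'lazy vector': the hidden last component holds slots filled on demand."""

    def __init__(self):
        super().__init__("ell structure")
        self.lazy_slots = []

    def ellrootno(self):
        # side effect: clone a new GEN onto the heap and park it in a lazy slot
        self.lazy_slots.append(HeapGen("lazy component"))
        return 1

    def obj_free(self):
        # analogue of PARI's obj_free: unclone the lazily added components and
        # reset the slots, leaving the structure itself valid
        for component in self.lazy_slots:
            component.gunclone()
        self.lazy_slots = []


class Gen:
    """Stands in for the Python wrapper; __del__ plays the role of __dealloc__."""

    def __init__(self, g, lazy_fix=False):
        self.g = g
        self.lazy_fix = lazy_fix

    def __del__(self):
        if self.lazy_fix:
            self.g.obj_free()    # the fix: release the lazy components first
        self.g.gunclone()        # plain gunclone never looks inside the slots


e = Gen(EllStructure())
e.g.ellrootno()
del e
print(len(HeapGen.live))         # 1: the lazily added component is stranded

HeapGen.live.clear()             # reset the simulated heap

e = Gen(EllStructure(), lazy_fix=True)
e.g.ellrootno()
del e
print(len(HeapGen.live))         # 0: nothing is left behind
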
If you read the Pari documentation you will "learn" that gunclone_deep is only useful in the context of the peculiar memory model used by gp. Well, apparently it is also useful in the context of the peculiar memory model used by cypari and cypari2.

I tested the cypari fix on Ubuntu Linux using the test function described earlier in this thread. With 10**8 iterations it never used over 8.4 MB of memory.

- Marc

On Sunday, September 1, 2024 at 12:07:26 PM UTC-6 Marc Culler wrote:

My experiments with cypari so far indicate the opposite of what I expected. (This is with one small but significant bug fix that I found along the way.) The main observations are:

(1) Without the call to pari.ellrootno there is no memory leak. The call to pari.getheap returns the same value before and after creating many elliptic curves in a for loop.

(2) Calling pari.ellrootno(e) in a loop, with the same elliptic curve e, also does not increase the Pari heap size.

(3) With both the call to pari.ellinit and the call to pari.ellrootno in the loop, every Python Gen object created in the loop gets deallocated and its Pari GEN gets uncloned with reference count 1, meaning it is removed from the heap. Nonetheless, the heap grows considerably during the loop.

I conclude that Pari is creating and cloning its own GENs (i.e. GENs not managed by any Python object) during the call to ellrootno and is not uncloning all of those GENs. But it only does that with a newly created curve.

- Marc

On Sunday, September 1, 2024 at 10:24:33 AM UTC-6 Marc Culler wrote:

I would say that the getheap behavior is a symptom of the same memory management bug(s). Also, it is not as simple as you suggest. It is not always true that every call to getheap leaves something on the heap:

>>> from cypari import pari
>>> pari.getheap()
[4, 41]
>>> pari.getheap()
[5, 58]
>>> pari.getheap()
[5, 58]
>>> pari.getheap()
[5, 58]

It is also not true that the cypari and cypari2 behaviors are equivalent, although they are similar. (Running the test loop with 10**5 iterations does not use anything close to 28 GB with cypari.) I believe that some, but definitely not all, of the memory management issues, namely the most glaring ones described in issue #112, were improved in cypari by removing the code which tries to keep wrapped Pari GENs on the stack instead of moving them to the heap.

The behavior where Pari GENs go on the heap and never get removed is intertwined with Python memory management. Python and the Pari heap use independent reference counting schemes. The __dealloc__ method of a Python Gen object calls gunclone to reduce the reference count of the Pari heap GEN which is referenced by the Python Gen object. However, as demonstrated in issue #112, cypari2 was creating other internal Python objects which held references to a Python Gen wrapping a vector or matrix entry. That prevented those Python Gens from being destroyed and therefore prevented the Pari GENs wrapped by those Python Gens from being removed from the Pari heap.

Apart from the initial call to gclone when a Python Gen is created, I haven't found any code within cypari which could increase the reference count of a Pari GEN on the Pari heap. Unless Pari sometimes increases that reference count, this would suggest that the core issue lies with Python reference counts. Evidently Python Gen objects are not being dealloc'ed because other Python objects hold references to them, and this is preventing Pari from removing GENs from its heap.

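A short script along the lines of observations (1)-(3) above makes the contrast easy to reproduce. This is a sketch only, using cypari as in the snippets above; cypari2 shows the same qualitative behavior, as the getheap numbers quoted below indicate, and the small per-call getheap artifact mentioned later in the thread only perturbs the counts by a few units:

from cypari import pari

# (1): ellinit alone in a loop; the heap count should not grow with the
# number of curves created
for a in range(1, 1001):
    e = pari.ellinit([a, 0])
print(pari.getheap())

# (2): the same curve, many ellrootno calls; the lazy slots are filled once,
# so the heap should stay essentially flat
e = pari.ellinit([11, 0])
for _ in range(1000):
    pari.ellrootno(e)
print(pari.getheap())

# (3): a fresh curve per iteration; each one leaves its lazily added GENs
# behind after its wrapper Gen is deallocated, roughly three heap objects
# per curve in the runs quoted below
for a in range(1, 1001):
    e = pari.ellinit([a, 0])
    pari.ellrootno(e)
print(pari.getheap())
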
On Saturday, August 31, 2024 at 1:11:45 PM UTC-5 dim...@gmail.com wrote:

On Sat, Aug 31, 2024 at 4:35 AM Marc Culler <marc....@gmail.com> wrote:
> As Dima says, and as the issue he mentions supports, the current cypari2 code which attempts to keep Pari Gens on the Pari stack as much as possible is badly broken. There are many situations where Python Gen objects cannot be garbage-collected after being destroyed. I am sure that is a big part of this problem. But I don't think it is the whole story.
>
> CyPari has returned to the older design which moves the Pari Gen wrapped by a Python Gen to the Pari heap when the Python object is created. This eliminates the leaks reported in cypari2 issue #112. But in this context, I am seeing 12 GB of memory (including several gigabytes of swap) in use after I do the following in ipython:
>
> In [1]: from cypari import *
> In [2]: def test(N):
>    ...:     for a in range(1, N):
>    ...:         e = pari.ellinit([a, 0])
>    ...:         m = pari.ellrootno(e)
> In [3]: %time test(10**5)
> CPU times: user 699 ms, sys: 38.3 ms, total: 737 ms
> Wall time: 757 ms
> In [4]: %time test(10**6)
> CPU times: user 7.47 s, sys: 392 ms, total: 7.86 s
> Wall time: 7.93 s
> In [5]: %time test(10**7)
> CPU times: user 1min 41s, sys: 6.62 s, total: 1min 47s
> Wall time: 1min 49s

The picture is very similar with cypari2. (With cypari2 one has to run

pari = Pari()

after the

from cypari2 import *

but otherwise it's the same code.)

You can inspect the Pari heap by calling pari.getheap(). Each iteration of the loop in test() leaves about 3 new objects on the heap, so after 10^5 calls you get around 300000 objects there (and with 10^6 calls, around 3 million are added).

The following is with cypari:

In [4]: %time test(10**5)
CPU times: user 1.46 s, sys: 85.8 ms, total: 1.55 s
Wall time: 1.55 s
In [5]: pari.getheap()
Out[5]: [300001, 14394782]
In [6]: %time test(10**6)
CPU times: user 14.9 s, sys: 756 ms, total: 15.7 s
Wall time: 15.7 s
In [7]: pari.getheap()
Out[7]: [3299999, 163655656]

With cypari2, similar:

In [9]: pari.getheap()  # 10^5
Out[9]: [299969, 14392931]

In [12]: pari.getheap()  # 10^6
Out[12]: [3299662, 163635286]

And gc.collect() does not do anything; in either case the Pari heap remains this big.

As well, with cypari, each call to pari.getheap() itself adds one object to the heap; a bug, I guess. (This does not happen with cypari2.)

In [14]: pari.getheap()
Out[14]: [3300004, 163655741]
In [15]: pari.getheap()
Out[15]: [3300005, 163655758]
In [16]: pari.getheap()
Out[16]: [3300006, 163655775]
In [17]: pari.getheap()
Out[17]: [3300007, 163655792]
In [18]: pari.getheap()
Out[18]: [3300008, 163655809]

Looks like a memory management bug in both cypari and cypari2.

Dima

On Thursday, August 29, 2024 at 1:19:05 PM UTC-5 dim...@gmail.com wrote:

It would be good to reproduce this with cypari2 alone. cypari2 is known to have similar kinds (?) of problems:
https://github.com/sagemath/cypari2/issues/112

On Thu, Aug 29, 2024 at 6:47 PM Nils Bruin <nbr...@sfu.ca> wrote:

On Thursday 29 August 2024 at 09:51:04 UTC-7 Georgi Guninski wrote:

> I observe that the following does not leak:
>
> E = EllipticCurve([5*13, 0])  # no leak
> rn = E.root_number()

How do you know that doesn't leak? Do you mean that repeated execution of those commands in the same session does not swell memory use?

> The size of the leak is suspiciously close to a power of two.

I don't think you can draw conclusions from that. Processes generally request memory in large blocks from the operating system, to amortize the high overhead of the operation. It may even be the case that 128 MB is the chunk size involved here! The memory allocated to a process by the operating system isn't a fully accurate measure of memory use within the process either: a heap manager can decide it's cheaper to request some new pages from the operating system than to reorganize its heap and reuse the fragmented space on it. I think for this loop, memory allocation consistently swells with repeated execution, so there probably really is something leaking. But given that it's not in GC-tracked objects on the Python heap, one would probably need valgrind information or a keen look at the code involved to locate where it's coming from.

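For what it's worth, the three quantities in question (GC-tracked Python objects, the PARI heap, and process memory) can be compared from inside the process with nothing but the standard library and cypari. This is a rough sketch; ru_maxrss is a high-water mark and its units differ between Linux (kilobytes) and macOS (bytes), so treat it only as an indicator:

import gc
import resource
from cypari import pari

def rss():
    # peak resident set size so far; a rough, monotone indicator only
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def test(N):
    # the loop used throughout this thread
    for a in range(1, N):
        e = pari.ellinit([a, 0])
        m = pari.ellrootno(e)

for n in (10**4, 10**5):
    test(n)
    gc.collect()
    # the number of GC-tracked Python objects stays roughly constant while
    # both the PARI heap count and the process memory keep growing
    print(n, len(gc.get_objects()), pari.getheap(), rss())
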