Hi,
The following function is completely reasonable. It shouldn't be hard
to implement (a few lines of C code).

def reset_peak_memory():
    # in _tracemalloc.c
    tracemalloc_peak_trace_memory = tracemalloc_traced_memory;

Resetting the peak to tracemalloc_traced_memory is correct :-)
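
To make the intended behaviour concrete, here is a small sketch from the
Python side, assuming the reset_peak_memory() name proposed in your
message (the function does not exist yet):

import tracemalloc

tracemalloc.start()

big = [0] * 1_000_000      # the ~8 MB list buffer drives the peak up
del big                    # current usage drops back, the peak stays high

current, peak = tracemalloc.get_traced_memory()
assert peak > current      # the peak still remembers the freed list

tracemalloc.reset_peak_memory()   # proposed: peak := current traced size

small = [0] * 100_000      # a smaller allocation after the reset
current, peak = tracemalloc.get_traced_memory()
assert peak < 8_000_000    # the old ~8 MB peak is gone; only `small` counts

tracemalloc.stop()
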
Can you please open an issue at https://bugs.python.org/ to request
the feature? Do you want to implement it?
Please add me (vstinner) to the nosy list of the issue. I wrote
tracemalloc, so I can help you implement the feature ;-)
Victor
On Thu, 14 May 2020 at 15:06, <[email protected]> wrote:
>
> Hi,
>
> It would be helpful for us if tracemalloc had a function that reset the peak
> memory usage counter, without clearing the current traces. At the moment, I
> don't think there's a way to find the peak memory of a subset of the code
> since the initial tracemalloc.start() call, without calling
> tracemalloc.clear_traces(). The latter disturbs other parts of the tracing.
>
> Specifically, it might be a function like (pseudo-implementation):
>
> def reset_peak_memory():
>     # in _tracemalloc.c
>     tracemalloc_peak_trace_memory = tracemalloc_traced_memory;
>
> This would make it easy to determine the peak memory usage of a specific piece
> of code, without disturbing all of the traces. For example, the following
> would set specific_peak to the peak traced memory size during just line X:
>
> tracemalloc.start()
> # ... code where allocations matter, but the peak does not ...
> peak_memory_doesnt_matter()
>
> tracemalloc.reset_peak_memory()
> peak_memory_is_important() # X
> _, specific_peak = tracemalloc.get_traced_memory()
>
> # ... more code with relevant allocations ...
> peak_memory_doesnt_matter()
>
> tracemalloc.stop()
>
> As sketched above, the implementation of this should be quite small, with the
> core being the line mentioned above, plus all the required extras (locking,
> wrapping, documentation, tests, ...). Thoughts?
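>
> Just to sketch the "tests" part of those extras, a unit test for the
> proposed function might look something like this (reset_peak_memory() is
> of course hypothetical until the feature exists):
>
> import tracemalloc
> import unittest
>
> class ResetPeakMemoryTests(unittest.TestCase):
>     def test_reset_lowers_peak_without_clearing_traces(self):
>         tracemalloc.start()
>         try:
>             data = [0] * 1_000_000            # push the peak up (~8 MB)
>             del data
>             _, old_peak = tracemalloc.get_traced_memory()
>
>             tracemalloc.reset_peak_memory()   # the proposed API
>
>             current, new_peak = tracemalloc.get_traced_memory()
>             self.assertLess(new_peak, old_peak)         # the peak was reset
>             self.assertGreaterEqual(new_peak, current)  # invariant still holds
>             # existing traces are untouched by the reset
>             self.assertTrue(tracemalloc.take_snapshot().traces)
>         finally:
>             tracemalloc.stop()
>
> if __name__ == "__main__":
>     unittest.main()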
>
>
> Full motivation for why we want to do this:
>
> In <https://github.com/stellargraph/stellargraph>, we're using the
> tracemalloc module to understand the memory usage of our core StellarGraph
> graph class (a nodes-and-edges graph, not a plot, to be clear). It stores
> some NumPy arrays of feature vectors associated with each node in the graph,
> along with all of the edge information. Any of these pieces can be large, and
> we want to keep the resource usage as small as possible. We're monitoring
> this by instrumenting the construction: start from a raw set of nodes
> (including potentially large amounts of features) and edges, and build a
> StellarGraph object, recording some metrics:
>
> 1. the time
> 2. the total memory usage of the graph instance
> 3. the additional memory usage that's not shared with the raw data (in
> particular, if the raw data is 1GB, it's useful to know whether a 1.5GB graph
> instance consists of 0.5GB of new memory or 1.5GB of new memory)
> 4. the peak memory usage during construction
>
> We record 2, 3 and 4 using a combination of tracemalloc.take_snapshot() and
> tracemalloc.get_traced_memory(), something like:
>
> def diff(after, before):
>     return sum(elem.size_diff for elem in after.compare_to(before, "lineno"))
>
> snap_start = take_snapshot()
>
> raw = load_data_from_disk()
> snap_raw = take_snapshot()
>
> # X
>
> graph = create_graph(raw)
> snap_raw_graph = take_snapshot()
> _, mem_peak = get_traced_memory() # 4
>
> del raw
> snap_graph = take_snapshot()
>
> mem_raw = diff(snap_raw, snap_start) # baseline
> mem_graph = diff(snap_graph, snap_start) # 2
> mem_graph_not_shared = diff(snap_raw_graph, snap_raw) # 3
>
> ('measure_memory' in
> <https://nbviewer.jupyter.org/github/stellargraph/stellargraph/blob/93fce46166645dd0d1ca2ea2862b68355826e3fc/demos/zzz-internal-developers/graph-resource-usage.ipynb#Measurement>
> has all the gory details.)
>
> Unfortunately, we want to ignore any peak during data loading: the peak
> during create_graph is all we care about, even if the overall peak (in data
> loading) is higher. That is, we want to only consider the peak memory usage
> after line X. One way to do this would be to call clear_traces() at X, but
> this invalidates the traces used for the 2 and 3 comparisons. I believe
> tracemalloc.reset_peak_memory() is the necessary function to call at X. (Why
> do we want to ignore the peak during data loading? The loading is under the
> control of a user of stellargraph, since it's typically done via Pandas or
> NumPy; those libraries are out of our control and offer a variety of options
> for tweaking data-loading behavior, whereas the internals of the
> `StellarGraph` instance are in our control and not as configurable by users.)
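>
> With the proposed function, the measurement above would only need one extra
> call at X (just a sketch, reusing the helper names from the snippet above;
> reset_peak_memory() is the hypothetical new function):
>
> snap_start = take_snapshot()
>
> raw = load_data_from_disk()
> snap_raw = take_snapshot()
>
> reset_peak_memory()  # X: forget the loading peak, keep all existing traces
>
> graph = create_graph(raw)
> snap_raw_graph = take_snapshot()
> _, mem_peak = get_traced_memory()  # 4: now reflects only create_graph
>
> del raw
> snap_graph = take_snapshot()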
>
> Thanks,
> Huon Wilson
--
Night gathers, and now my watch begins. It shall not end until my death.