Hi,
It would be helpful for us if tracemalloc had a function that resets the peak
memory usage counter without clearing the current traces. At the moment, I
don't think there's a way to find the peak memory usage of a subset of the code
traced since the initial tracemalloc.start() call, other than by calling
tracemalloc.clear_traces(), which also disturbs the other parts of the tracing.
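For example, here is a minimal sketch of the workaround available today
(load_baseline and interesting_section are hypothetical placeholders):
clear_traces() does reset the peak, but it also discards every trace recorded
so far, so later snapshot comparisons against the earlier state are lost:

import tracemalloc

tracemalloc.start()
baseline = load_baseline()        # hypothetical: allocations we still want to compare against later
tracemalloc.clear_traces()        # resets the peak, but also throws away the traces recorded above
interesting_section()             # hypothetical: the code whose peak we actually care about
_, peak = tracemalloc.get_traced_memory()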
Specifically, it might be a function tracemalloc.reset_peak_memory(), whose
core is a single assignment in _tracemalloc.c (pseudo-implementation):

    tracemalloc_peak_traced_memory = tracemalloc_traced_memory;
This would make it easy to determine the peak memory usage of a specific piece
of code without disturbing any of the traces. For example, the following would
set specific_peak to the peak size of traced memory during just line X:
import tracemalloc

tracemalloc.start()

# ... code where allocations matter, but the peak does not ...
peak_memory_doesnt_matter()

tracemalloc.reset_peak_memory()
peak_memory_is_important()  # X
_, specific_peak = tracemalloc.get_traced_memory()

# ... more code with relevant allocations ...
peak_memory_doesnt_matter()

tracemalloc.stop()
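In other words, the call at X would snap the peak back to the current traced
size, so the peak reported afterwards covers only the code that runs after it.
A quick sketch of the expected behavior, assuming the name above:

import tracemalloc

tracemalloc.start()
big = bytearray(10 * 1024 * 1024)   # drive the peak up to roughly 10MB
del big
tracemalloc.reset_peak_memory()     # proposed: peak snaps back to the current size
size, peak = tracemalloc.get_traced_memory()
assert peak < 10 * 1024 * 1024      # the old ~10MB peak is forgotten
assert peak >= size                 # the invariant peak >= current size still holds
tracemalloc.stop()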
As sketched above, the implementation of this should be quite small, with the
core being the line mentioned above, plus all the required extras (locking,
wrapping, documentation, tests, ...). Thoughts?
Full motivation for why we want to do this:
In <https://github.com/stellargraph/stellargraph>, we're using the tracemalloc
module to understand the memory usage of our core StellarGraph graph class (a
nodes-and-edges graph, not a plot, to be clear). It stores some NumPy arrays of
feature vectors associated with each node in the graph, along with all of the
edge information. Any of these pieces can be large, and we want to keep the
resource usage as small as possible. We're monitoring this by instrumenting the
construction: start from a raw set of nodes (including potentially large
amounts of features) and edges, and build a StellarGraph object, recording some
metrics:
1. the time
2. the total memory usage of the graph instance
3. the additional memory usage that isn't shared with the raw data (in
particular, if the raw data is 1GB, it's useful to know whether a 1.5GB graph
instance consists of 0.5GB of new memory or 1.5GB of new memory)
4. the peak memory usage during construction
We record 2, 3 and 4 using a combination of tracemalloc.take_snapshot() and
tracemalloc.get_traced_memory(), something like:

from tracemalloc import get_traced_memory, take_snapshot

def diff(after, before):
    return sum(elem.size_diff for elem in after.compare_to(before, "lineno"))

# (assumes tracemalloc.start() was called earlier)
snap_start = take_snapshot()
raw = load_data_from_disk()
snap_raw = take_snapshot()
# X
graph = create_graph(raw)
snap_raw_graph = take_snapshot()
_, mem_peak = get_traced_memory()  # 4
del raw
snap_graph = take_snapshot()

mem_raw = diff(snap_raw, snap_start)  # baseline
mem_graph = diff(snap_graph, snap_start)  # 2
mem_graph_not_shared = diff(snap_raw_graph, snap_raw)  # 3
('measure_memory' in
<https://nbviewer.jupyter.org/github/stellargraph/stellargraph/blob/93fce46166645dd0d1ca2ea2862b68355826e3fc/demos/zzz-internal-developers/graph-resource-usage.ipynb#Measurement>
has all the gory details.)
Unfortunately, we want to ignore any peak during data loading: the peak during
create_graph is all we care about, even if the overall peak (during data
loading) is higher. That is, we want to consider only the peak memory usage
after line X. One way to do this would be to call clear_traces() at X, but that
invalidates the traces used for comparisons 2 and 3. I believe
tracemalloc.reset_peak_memory() is exactly the function that needs to be called
at X. (Why do we want to ignore the peak during data loading? The loading is
under the control of a user of stellargraph, since it's typically done via
Pandas or NumPy; those libraries are out of our control and offer a variety of
options for tweaking data-loading behavior, whereas the internals of the
`StellarGraph` instance are in our control and not as configurable by users.)
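Concretely, with the proposed function the measurement code above would only
need one extra call at the "# X" marker; a sketch, assuming reset_peak_memory
is imported alongside take_snapshot and get_traced_memory:

# ... same as above, up to the "# X" marker ...
snap_raw = take_snapshot()
reset_peak_memory()                # X: proposed call; forgets the peak reached during loading
graph = create_graph(raw)
snap_raw_graph = take_snapshot()
_, mem_peak = get_traced_memory()  # 4: now reflects only the peak inside create_graph
# ... rest unchanged ...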
Thanks,
Huon Wilson