Copilot commented on code in PR #136:
URL: https://github.com/apache/datasketches-rust/pull/136#discussion_r3321957366
##########
datasketches/src/cpc/pair_table.rs:
##########
@@ -246,4 +246,9 @@ impl PairTable {
}
}
}
+
+ /// Returns the size of the heap allocations in bytes
+ pub fn heap_size(&self) -> usize {
+ self.slots.len() * std::mem::size_of::<u32>()
+ }
Review Comment:
`slots` is a `Vec<u32>`, whose true heap footprint is `capacity() *
size_of::<u32>()` rather than `len() * size_of::<u32>()`. Although the current
`new`/`rebuild` paths happen to allocate with `vec![..; n]` (giving `capacity
== len`), relying on that invariant is fragile; using `capacity()` makes the
helper correct under any future change.
##########
datasketches/src/cpc/sketch.rs:
##########
@@ -450,6 +450,18 @@ impl CpcSketch {
matrix
}
+
+ /// Returns the size of the sketch in bytes
+ pub fn size(&self) -> usize {
+ let heap_size = self.sliding_window.len()
+ + self
+ .surprising_value_table
+ .as_ref()
+ .map(|t| t.heap_size())
+ .unwrap_or(0);
Review Comment:
For `Vec`-backed storage the actual heap footprint is `capacity() *
size_of::<T>()`, not `len() * size_of::<T>()`. `Vec` may over-allocate, so
using `len()` can under-report the true memory usage. Consider using
`self.sliding_window.capacity()` here (and `self.slots.capacity()` in
`PairTable::heap_size`). The `Box<[T]>`-backed helpers in HLL don't have this
issue because boxed slices don't have spare capacity.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]