Hi,
Currently we have two similar but distinct approaches to measuring memory
usage
in Servo.
- Servo uses the heapsize and heapsize_derive crates, from crates.io.
- Gecko uses the malloc_size_of and malloc_size_of_derive crates, which are
in
the tree.
Because of this, you see this pattern quite a bit in style code:
> #[cfg_attr(feature = "gecko", derive(MallocSizeOf))]
> #[cfg_attr(feature = "servo", derive(HeapSizeOf))]
> struct Foo {
> ...
> }
Why the difference? heapsize is the original approach. It sorta works, but
has
some big flaws. malloc_size_of is a redesign that addresses these flaws.
- heapsize assumes you only want a single number for the size of any data
structure, when sometimes you want to break it down into different
buckets.
malloc_size_of provides both "shallow" and "deep" measurements, which give
greater flexibilty, which helps with the multi-bucket case.
- heapsize assumes jemalloc is the allocator. This causes build problems in
some configurations, e.g. https://github.com/servo/heapsize/issues/80.
It also means it doesn't integrate with DMD, which is the tool we use to
identify heap-unclassified memory in Firefox.
malloc_size_of doesn't assume a particular allocator. You pass in
functions
that measure heap allocations. This avoids the build problems and also
allows
integration with DMD.
- heapsize doesn't measure HashMap/HashSet properly -- it computes an
estimate
of the size, instead of getting the true size from the allocator. This
estimate can be (and in practice often will be) an underestimate.
malloc_size_of does measure HashMap/HashSet properly. However, this
requires
that the allocator provide a function that can measure the size of an
allocation from an interior pointer. (Unlike Vec, HashMap/HashSet don't
provide a function that gives the raw pointer to the storage.) I had to
add
support for this to mozjemalloc, and vanilla jemalloc doesn't support it.
(I
guess we could fall back to computing the size when the allocator doesn't
support this, e.g. for Servo, which uses vanilla jemalloc.)
- heapsize doesn't measure Rc/Arc properly -- currently it just defaults to
measuring through the Rc/Arc, which can lead to double-counting.
Especially
when you use derive, where it's easy to overlook that Rc/Arc typically
need
special handling. (This is https://github.com/servo/heapsize/issues/37.)
malloc_size_of does measure Rc/Arc properly. It lets you provide a table
that
tracks which pointers have already been measured, which is used to prevent
double-counting. malloc_size_of also doesn't implement the standard
MallocSizeOf trait for Rc/Arc, which means they can't be unintentionally
measured via derive. (You can still use derive if you explicitly choose to
ignore all Rc/Arc fields, however.)
Basically, malloc_size_of is heapsize done right, and using both is silly. I
went with this dual-track approach while adding memory reporting to Stylo
because time was tight and the exact design choices required to handle all
the
necessary cases weren't clear. But now that things have settled down I'd
like
to pay back this technical debt by removing the duplication.
I see two options.
- Overwrite the heapsize crate on crates.io with the malloc_size_of code. So
the crate name wouldn't change, but the API would change significantly,
and
it would still be on crates.io. Then switch Servo over to using heapsize
everywhere.
- Switch Servo over to using malloc_size_of everywhere. (This leaves open
the
question of what should happen to the heapsize crate.)
I personally prefer the second option, mostly because I view all of this
code
as basically unstable -- much like the allocator APIs in Rust itself -- and
publishing it on crates.io makes me uneasy. Also, keeping the code in the
tree
makes it easier to modify.
Thoughts?
Nick
_______________________________________________
dev-servo mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-servo