David,
I don't have a direct solution, but I have been playing with an analysis
technique for solving this particular type of problem: scads of Strings
and a need to "make sense of the reference graph".
If you look at this example[1], you'll see that there are 202K instances
of String (this example is from a GemStone db). The nodes pointing to
the String node show which objects reference those 202K strings: 70K
instances of Array reference one or more of them, as do 47K instances of
MCVersionInfo and 43K instances of MethodVersionRecord.
I've found that this approach can help you understand why you have so
many strings ...
The basic technique is to gather the instances of String, then, for each
String instance, gather the objects that reference it. Summarize those
references as instance counts per class, and keep an IdentitySet of the
referencing instances for each class so that you can build the next
level of references.
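A minimal sketch of that technique in Python, assuming a hypothetical heap model in which each object id maps to its class name and the ids it references (a real Pharo or GemStone analysis would walk the live image or a backup instead). Python sets of ids stand in for the IdentitySet, and `instances_of` and `referencer_summary` are illustrative names, not GemStone or obex API. Rather than looping over each String and asking for its referencers, it makes one pass over all objects, which yields the same per-class counts.

```python
from collections import Counter, defaultdict

# Hypothetical heap model (an assumption for this sketch): each object id maps to
# (class name, list of ids that object references). The analysis state below lives
# outside this model, so it never shows up among the objects being analyzed.
HEAP = {
    1: ("String", []),
    2: ("String", []),
    3: ("String", []),
    10: ("Array", [1, 2]),
    11: ("Array", [3]),
    20: ("MCVersionInfo", [1, 10]),
    30: ("MethodVersionRecord", [3]),
    40: ("Dictionary", [20, 11]),
}


def instances_of(heap, class_name):
    """Return the ids of every instance of class_name."""
    return {oid for oid, (cls, _) in heap.items() if cls == class_name}


def referencer_summary(heap, targets):
    """Find every object that references at least one target.

    Returns (counts, referencers_by_class): the number of referencing
    instances per class, and, per class, the set of referencing ids,
    which can be fed back in as the targets for the next level.
    """
    referencers_by_class = defaultdict(set)
    for oid, (cls, refs) in heap.items():
        if any(ref in targets for ref in refs):
            referencers_by_class[cls].add(oid)
    counts = Counter({cls: len(ids) for cls, ids in referencers_by_class.items()})
    return counts, dict(referencers_by_class)


# Level 1: which objects reference the Strings?
strings = instances_of(HEAP, "String")
counts, by_class = referencer_summary(HEAP, strings)
print(counts.most_common())

# Level 2: which objects reference the Arrays that reference Strings?
level2_counts, _ = referencer_summary(HEAP, by_class["Array"])
print(level2_counts.most_common())
```

Feeding one class's referencer set (here the Arrays) back into `referencer_summary` is what builds the next level of the graph.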
For GemStone, the calculations are done by scanning a backup of the
repository, and the results are displayed using Roassal2.
You could probably brute-force this calculation in Pharo. Be sure to
isolate the objects you are using in your analysis from the set of
objects being analyzed; otherwise things get out of control.
HTH,
Dale
[1] https://github.com/dalehenrich/obex#class-instance-counts-based-on-selected-set-of-instances