On Mon, Jul 29, 2013 at 05:53:21PM +0100, Michael Meeks wrote:
> > I couldn't immediately find the duplication of the names.
> > In this case the strings are the full zip file entry paths. e.g.
> > "sw/res/sidebar/pageproppanel/portraitcopy_24x24.png"
>
> Riight - that's interesting :-) IIRC in the past there were two chunks
> of code in package/ that duplicated those names (I think). The fragment
> from the (AMD) report from December 2006 shows:
>
> 'package' zip code
> -1022k
> +500k
> reading the large images.zip file creates a huge hash
> table with lots of duplicated string stems – 3 days
>
> Of course, I couldn't tell you if this is still the case; possibly
> we're no longer duplicating those strings in that way. The problem was
> around 'images.zip' - the archive that has all of our icons in it for
> the UI - at least back ~7 years ago ;-)

It makes sense that this is about image paths. Most paths seem to come
from opt/share/config/images.zip. But that file contains 3800+ entries
and only a few seem to be reused later.

> > And as far as I can see all the full path names are unique, so no
> > actual sharing is taking place here. But is there a place where these
> > strings are reused (and also interned)?
>
> Interesting; of course - we can dump the contents of the interned table
> to see if they have ref-count 1 quite simply (?).
>
> > Replacing the intern with a normal OUString constructor like:
> ...
> > Seems to save ~200K of memory at least for a quick:
>
> Nice :-) well - we should just do that then :-)

I am tempted to. I will do some more testing first to make sure I am not
missing something.

> > But that might be too quick to see any effects of this intern action.
>
> The reason it was added was for images.zip - if the package code has
> improved then we should take & save that space/time.

I haven't yet found the code which references the image/resources; maybe
it needs interning itself. But it certainly looks like the current code
is a bit too eager, interning everything.

> > So I guess my general question is how to measure the effects of
> > OUString::intern?
>
> I'd dump the ref-count + string contents of the intern table to see if
> there is more wasteage.

I'll try that next. For now I used systemtap, which happens to have utf16
user string support. It looks like all interned strings go through the
function rtl_ustring_intern_internal. So probing that and printing the
string gives an interesting overview.

$ stap -e 'probe process("./solver/unxlngx6.pro/lib/libuno_sal.so").function("rtl_ustring_intern_internal")
  { log("interning: ". $str$$ . " " . user_string_utf16($str->buffer)); }' -c ./install/program/soffice
interning: {.refCount=1, .length=9, .buffer=[108, ...]} links.txt
interning: {.refCount=1, .length=18, .buffer=[114, ...]} res/mainapp_16.png
interning: {.refCount=1, .length=18, .buffer=[114, ...]} res/mainapp_32.png
interning: {.refCount=1, .length=15, .buffer=[114, ...]} res/sx03251.png
interning: {.refCount=1, .length=15, .buffer=[114, ...]} res/lx03251.png
interning: {.refCount=1, .length=18, .buffer=[99, ...]} cmd/lc_openurl.png
interning: {.refCount=1, .length=20, .buffer=[99, ...]} cmd/lc_adddirect.png
interning: {.refCount=1, .length=17, .buffer=[99, ...]} cmd/lc_newdoc.png
[...]

That shows (full output attached, if the mailing list allows that) that
interning (at least during startup) is done 4192 times. Only 128 strings
are reused. And only 6 are interned 5 times or more:

    115 Regular
     57 Bold
     14 Bold Italic
     13 Italic
      5 Light
      5 Book

> You saw the OUString debugging code: RTL_LOG_STRING_NEW /
> _STRING_DELETE etc. that can produce a long but crunch-able set of
> printfs on stdout: many of which are sadly not that useful due to
> OUStringBuffer mutation (IIRC - but presumably some more work could
> clean that up).

I hadn't seen that yet, but it might be useful for seeing which strings
are recreated multiple times and so are candidates for interning. Is
there already code to enable/trigger RTL_LOG_STRING_NEW? Or should I
just write my own hooks?

Thanks,

Mark
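
For anyone wanting to crunch a similar log: the reuse figures quoted above (total intern calls, strings interned more than once, strings interned 5 times or more) can be derived from the probe output with a few lines of Python. This is only a sketch, assuming every probe line has the form `interning: {...} <string>` as in the excerpt; the names `count_interned` and `summarize` and the sample lines are illustrative and are not taken from the attached output or from any LibreOffice tooling.

```python
import re
from collections import Counter

# One probe line looks like: interning: {.refCount=1, ...} res/mainapp_16.png
# The struct dump contains no nested braces, so a non-greedy match suffices.
LINE_RE = re.compile(r"^interning: \{.*?\} (.*)$")

def count_interned(lines):
    """Count how many times each string was passed to the intern function."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:  # skip anything that is not a probe line
            counts[m.group(1)] += 1
    return counts

def summarize(counts, threshold=5):
    """Return (total intern calls, strings seen more than once,
    list of (string, count) pairs with count >= threshold)."""
    total = sum(counts.values())
    reused = sum(1 for n in counts.values() if n > 1)
    frequent = [(s, n) for s, n in counts.most_common() if n >= threshold]
    return total, reused, frequent

if __name__ == "__main__":
    # Hypothetical sample in the same shape as the stap output.
    sample = [
        "interning: {.refCount=1, .length=9, .buffer=[108, ...]} links.txt",
        "interning: {.refCount=1, .length=7, .buffer=[82, ...]} Regular",
        "interning: {.refCount=1, .length=7, .buffer=[82, ...]} Regular",
        "interning: {.refCount=1, .length=11, .buffer=[66, ...]} Bold Italic",
    ]
    total, reused, frequent = summarize(count_interned(sample), threshold=2)
    print(total, reused, frequent)
```

On a real run the list of lines would come from the saved stap log (for example `open("interned.out")` after decompressing), with `threshold=5` to match the cut-off used above.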
interned.out.bz2
Description: BZip2 compressed data