Executive summary: I've shaved somewhere around 20MB off d-i's memory use in a netboot test. Share and enjoy.

Firstly, I fixed a number of reference-counting bugs and other memory leaks in cdebconf. Most notably, the process of loading a templates file didn't properly free each parsed RFC822 stanza, so it leaked memory roughly equivalent to the size of the templates database at boot time (about 2.5MB in the netboot case). I've tested this quite extensively, but there's always the possibility that in correcting memory leaks I went too far and freed memory that was still in use, or created some other similar bug. Please let me know if you see any weirdness in cdebconf that looks like corrupted memory.

Secondly, I made a change to the way translations are handled. The core observation is that cdebconf doesn't really need to store all the translations for inactive languages in memory: all it needs is English (well, C in the general case, but it's a lot simpler to read both C and English throughout) and the currently-selected language. The reason that it hasn't skipped the others up to now is that, in order to save the templates database without losing data, it needs to have read everything into memory.

Various people noted that it would be OK if we didn't support changing the language after anna has run; that's far enough through the installer that it's an edge case. anna also happens to be the first time that the templates database is saved (at least while dirty, so that it actually gets written out) after startup. This suggests a somewhat cheesy hack, which I've implemented: we add a reload method to the templates database implementation to allow it to reload the database and replace localised strings in memory with those from the filesystem, and call that method each time the language is changed. It's not especially pretty, but it does work. If you change the language before anna runs, you'll still get correct translations thenceforth; once anna runs, the translations you aren't using will be irreversibly forgotten.
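
To make the "forget inactive languages" idea concrete, here is a minimal sketch in C. It is not cdebconf's actual code: struct l10n_field, push_field and prune_translations are hypothetical names invented for illustration, and it only shows the in-memory pruning, not the reload from the filesystem that happens when the language changes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-language copy of a template field (not cdebconf's type). */
struct l10n_field {
    char *language;            /* "" or "C" for untranslated, else e.g. "fr" */
    char *description;
    struct l10n_field *next;
};

/* Portable string duplication (strdup is not in ISO C before C23). */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Prepend a new entry to the list; returns the new head, or NULL on failure. */
static struct l10n_field *push_field(struct l10n_field *next,
                                     const char *lang, const char *desc)
{
    struct l10n_field *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    f->language = dup_string(lang);
    f->description = dup_string(desc);
    f->next = next;
    if (f->language == NULL || f->description == NULL) {
        free(f->language);
        free(f->description);
        free(f);
        return NULL;
    }
    return f;
}

/* Nonzero if this language must stay in memory: C/English or the current one. */
static int keep_language(const char *lang, const char *current)
{
    return lang[0] == '\0' || strcmp(lang, "C") == 0 ||
           strcmp(lang, "en") == 0 || strcmp(lang, current) == 0;
}

/* Free every entry for an inactive language; returns how many were dropped. */
static size_t prune_translations(struct l10n_field **head, const char *current)
{
    size_t dropped = 0;
    assert(head != NULL && current != NULL);
    while (*head != NULL) {
        struct l10n_field *f = *head;
        if (keep_language(f->language, current)) {
            head = &f->next;           /* keep it, move on */
        } else {
            *head = f->next;           /* unlink, then free all owned memory */
            free(f->language);
            free(f->description);
            free(f);
            dropped++;
        }
    }
    return dropped;
}
```

Once the pruned list is what gets saved, the dropped strings are gone for good, which is why the cut-off point is tied to anna's first save.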

The result is a memory saving of a good part of the prior final size of the templates file times two (once for the copies no longer held in memory, and once for the reduction in the final size of /var/lib/cdebconf/templates.dat since that's on a tmpfs). This comes to around 18MB in my tests. Again, I've tested this as best I can, but there may be corner cases in terms of changing the language or whatever that I missed. Please let me know if translations inexplicably go missing.

Somebody (architecture maintainers?) should update lowmem for all of this.

Could we do better than this? Yes, we could. The idea of having cdebconf mmap its templates database has been around for a while, and I discussed this a year or two back in #329743. However, on reflection I think it's going to be hard to do in rfc822db; the assumption of null-terminated strings is just too deeply embedded and it's probably harmful to code maintainability to try to extract it.

However, it might be possible to design a new binary database format (let's call it mmapdb) that had its strings null-terminated right there in the file format for ease of mmapping. If properly designed, such a format could be smaller and quicker to load and save as well, which is becoming a concern for the templates database (it can easily take upwards of a second to save, and we already have several measures in place to avoid unnecessary saves).
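
As a rough illustration of why null-terminated strings in the file make mmapping easy, here is a minimal sketch in C using POSIX mmap. The helper names map_file and index_strings are hypothetical, and the "file of packed strings" layout is an assumption made only for this example, not an existing cdebconf format. The point is that the only per-string memory cost is one pointer into the mapping; nothing is copied.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL on failure or for an empty file;
 * on success stores the mapping length in *len. (Hypothetical helper.) */
static const char *map_file(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                 /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return p;
}

/* Walk a buffer of packed, null-terminated strings and record a pointer to
 * the start of each one. The strings are used in place, never copied.
 * An unterminated tail is ignored so callers never read past the buffer.
 * Returns the number of pointers stored (at most max). */
static size_t index_strings(const char *base, size_t len,
                            const char **out, size_t max)
{
    size_t n = 0, pos = 0;

    assert(base != NULL && out != NULL);
    while (pos < len && n < max) {
        const char *end = memchr(base + pos, '\0', len - pos);
        if (end == NULL)
            break;
        out[n++] = base + pos;
        pos = (size_t)(end - base) + 1;
    }
    return n;
}
```

A caller would pass the result of map_file straight to index_strings and then treat each stored pointer as an ordinary C string, which is exactly the property rfc822db's text format lacks.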

Here's a strawman pseudocode proposal for the format:

    enum field_id {
        name = 1,
        other = 2,

        /* question fields */
        value = 10,
        flags = 11,
        owners = 12,
        variables = 13,
        template = 14,

        /* template fields */
        type = 20,
        default = 21,
        choices = 22,
        indices = 23,
        description = 24,
        extended_description = 25,
    };

    struct field {
        enum field_id id;
        unsigned int language;  /* reference into database.languages */
        char value[];           /* null-terminated */
    };

    struct item {
        unsigned int n_fields;
        struct field fields[];
    };

    struct database {
        unsigned int n_languages;
        char languages[][];     /* null-terminated, packed sequentially */
                                /* "" indicates the null language, e.g.
                                   Description: */
        unsigned int n_items;
        struct item items[];
    };

The memory cost here would be one pointer per language, field, and item, which is around 256KB for a current typical templates database after anna runs. We could decrease that to more like 10KB using the same trick of forgetting the pointers to fields in unused languages, but I'm not sure it's worth the bother.

(This proposal should be sufficient for the questions database as well as the templates database, but there's no reason why the same database format needs to be used for both, and I'd be inclined to suggest sticking with rfc822db for the questions database since it's easier to read.)

One concern with doing this is that the templates database would no longer be readable by 'debconf-get-selections --installer' after installation. To avoid this problem, I would suggest using debconf-copydb to copy the templates database to /target so that it can be converted to the rfc822db format at the same time.

Whether any of this is worth the effort is debatable. At this point, the templates database after anna runs is about 300KB, and cdebconf is only going to be using about that much memory for it.
It would only be worth it to save memory on the installed system too, where we can't make the same assumptions about dropping translations, or if the inability to change languages after anna becomes a problem. As such, I'm going to close bug #329743 with this change.

Thoughts?

Cheers,

--
Colin Watson                                       [EMAIL PROTECTED]