I've been reading this discussion, and we seem to have jumped past a couple of important structural questions (I think we already know what problems we're trying to solve).
The questions I can think of are:

I. How is data from the various sources amalgamated? Is it done at the time the package is installed (i.e., when data is requested from the configuration manager), or earlier, when the sources are specified? I can see at least the following two models:

(a) There is a single database in /var or /etc with all the configuration data. This acts as a cache for already-entered data, a temporary store for data which was prompted for before installation started, and a long-term store for answers to questions which shouldn't be asked again on upgrade. If it is desired to load configuration from another system or a pre-prepared file, it is preloaded into the database, like a cache.

(b) There is a configuration file listing several databases, many of which will be read-only, and zero or one of which is read/write or write-only. At the time a package is installed the different data sources are consulted and the right data is chosen.

I don't like (b), for at least these reasons:

(1) (b) is much more complicated to implement. We have to (for example) invent a configuration file syntax, and write code to select explicitly which data to use. If we do (a) we only need a merging program which can be told whether to prefer old or new data, and the config manager core need not know about data sources.

(2) (b) makes it hard to overwrite a whole category of data. For example, suppose I want to say `discard all local config data for the MTA, and fetch it from <machine> instead'. With (b) I have to be able to express that in the config file syntax, and then delete it from the config file again later. With (a) that becomes an explicit operation.

(3) Writeback. The config manager will have to write back the answers it gets somewhere, so that the next installation won't ask the same questions again. This introduces a fundamental asymmetry into the design, because that data `source' can't be handled like the others.
(4) Fundamentally, I just like caches in this context: they have the right properties, particularly the property that you have to be aware that the data might not be available and be prepared to acquire it or fail.

II. Where are the questions `defined'?

Many of the proposals that I've seen so far have a fixed list of questions in a file. I don't think this is at all sufficient. For example, a sophisticated mail system configuration might have configs for different virtual domains. The configuration items for each virtual domain should have names which include the domain. Therefore, in principle, the names of config items might not be known until bits of script belonging to the package have been run.

Perhaps we want to separate out these `parameters' to config item names? Then you could have a file which contained query strings something like this:

  mail-transfer-agent.virtualdomain.aliasfile.%domain
    "The file which lists aliases for the virtual domain %domain.
    Each line `<alias>: <newaddress>' in it will arrange for mail
    to <alias>@%domain to be forwarded to <newaddress>."

In any case, to remain backward compatible with old versions of dpkg, the postinst script will have to be able to invoke the config manager programs to do actual queries, in case dpkg didn't know how to set up the data wherever it is.

III. What form does the specification of which questions to ask take?

There will clearly have to be a file or files that tell dpkg or the config manager what questions to ask. These file(s) must be usable when separated out of the .deb, so that we can do prompt-before-install. It is my belief that at least _which_ questions are asked can only be determined programmatically, even if there is in some sense a set of possible questions which could be listed. The obvious thing to do is to have a script, invoked by dpkg, which just asks the questions and stores the answers.
Such a script would use some kind of tool which asks the question and stores the answer in one go. However, this approach has the problem that the script has to implement the `back' button, etc. So, I propose the following: the script asks all the questions for the package in order, but can exit with a special exit status meaning `please reinvoke me because the user pressed the back button'. When the user presses back, the query tool sets a flag on the _previous_ question asked (so the questions asked have to be logged) and returns this exit status, so that the script (which is using set -e) does too. Then the script goes through the questions again, finding the previous answers in the cache, until it gets to the flagged one.

For simple packages we should provide a way to put all the questions in a file and have the query tool iterate over them itself.

For compatibility with older versions of dpkg which don't know to invoke the new config maintainer script, or with installation methods which don't have the scripts available all at once at the start, we should have the postinst call the config manager to run the config script just as dpkg does. That way the postinst doesn't need to care about information not being available, and can just retrieve it.

IV. What `types' of configuration data are there? Is the type stored in the database of answers?

It's entirely unclear to me that anything except the user-prompting part of the configuration management system (which has to be invoked with the names of the data entries etc. available) needs to know the types of these entries. Carrying the type around with the data also causes problems when different parts of the system (data from different sources, different packages using the same data item, old and new versions of packages and config manager) disagree about the nominal type, and it means we need to invent a type-naming system with all the usual backward-compatibility features.
Instead, I think that data should be transported as opaque strings and validated at the point of use. The facility that retrieves a data item should be told the expected `type' and check that the string matches it (by regexp, or whatever).

V. Where should the configuration prompting go in the .deb?

I think it is important that new packages install on old versions of dpkg. Remember: no non-backward-compatible changes without a phase-in period, during which new+new code works in the new way but new+old or old+new works in the old way. We also want to avoid increasing the size of the .deb more than necessary, preserve its current reasonably clean structure, and be as backward compatible as possible.

Current suggestions seem to be to put the information in additional members of the enclosing ar archive. I disagree with this; I think we should put it in the control.tar.gz subarchive. Unpacking this part is necessary even to find out the name of the package, for crying out loud, so there's no harm in requiring it to be unpacked for pre-configuration. This also has the benefit that current dpkg versions will ignore it.

VI. How does the config manager talk to the package?

I think there are pretty much only two sensible answers to this question: (a) a program call or (b) a C library call. These calls will allow the package to ask for information and give details about defaults, and should automatically use whatever UI is currently in place. If there is to be a protocol between different parts of the system, it should be internal.

Ian.
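The back-button protocol proposed under III can be simulated in a few lines, together with the single answer cache of model (a), the `%domain' parameters of II and the point-of-use validation of IV. This is only a minimal sketch under stated assumptions, not a design: the query tool, the exception standing in for the query tool failing under `set -e', the exit status 30, the item name `mail-transfer-agent.virtualdomains' and the regexps are all hypothetical, and a real config script would presumably be a shell script calling an external program rather than Python.

```python
import re

# Hypothetical `please reinvoke me' exit status; 30 is an arbitrary choice.
EXIT_BACK = 30


class BackRequested(Exception):
    """Stands in for the query tool failing, so that a `set -e' script exits."""


class QueryTool:
    """Hypothetical query tool: asks a question or reuses a cached answer."""

    def __init__(self, cache, ui):
        self.cache = cache    # the single answer database of model (a)
        self.ui = ui          # ui(name, prompt) -> str, or None for `back'
        self.flagged = set()  # questions to re-ask; persists across runs
        self.asked = []       # log of questions reached in the current run

    def ask(self, name, prompt, pattern):
        value = self.cache.get(name)
        # Reuse the cached opaque string only if it is not flagged and matches
        # the type the caller expects: validation happens at the point of use.
        while name in self.flagged or value is None or not re.fullmatch(pattern, value):
            value = self.ui(name, prompt)
            if value is None:
                # `back': flag the previous question (if any) and make the
                # script exit; at the first question nothing gets flagged.
                if self.asked:
                    self.flagged.add(self.asked[-1])
                raise BackRequested()
            self.flagged.discard(name)
        self.cache[name] = value
        self.asked.append(name)
        return value


def invoke(script, tool):
    """Run the config script once and return its `exit status'."""
    tool.asked = []
    try:
        script(tool)
    except BackRequested:
        return EXIT_BACK
    return 0


def run_config(script, tool, max_runs=100):
    """Reinvoke the script for as long as it asks to be reinvoked."""
    for _ in range(max_runs):
        if invoke(script, tool) != EXIT_BACK:
            return
    raise RuntimeError("too many reinvocations")


# Example config script; per-domain item names are built from %domain.
ALIASFILE = "mail-transfer-agent.virtualdomain.aliasfile.%domain"


def mta_config(tool):
    domains = tool.ask("mail-transfer-agent.virtualdomains",
                       "Virtual domains, separated by spaces", r"[a-z0-9. -]*")
    for domain in domains.split():
        tool.ask(ALIASFILE.replace("%domain", domain),
                 "The file which lists aliases for the virtual domain " + domain,
                 r"/\S+")


def scripted_ui(answers):
    """Stand-in UI that replays canned answers and logs what it was asked."""
    replies = iter(answers)
    log = []

    def ui(name, prompt):
        log.append(name)
        return next(replies)

    return ui, log
```

Replaying the answers `a.example b.example', `/etc/a', back, `/etc/aliases-a', `/etc/b' shows the intended flow: pressing back at the b.example question flags the a.example question, and the second run takes the domain list from the cache and prompts again only from the flagged question onwards. Running the script again over the filled cache prompts for nothing, and a cached string that no longer matches the expected pattern is simply asked for again.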