Configuration management

Ian Jackson 18 Aug 1998 22:06:03 -0000

I've been reading this discussion, and we seem to have jumped past a
couple of important structural questions (I think we already know what
problems we're trying to solve).


The questions I can think of are:

I. How is amalgamation of data from various sources done ?

Is it done at the time the package is installed (ie, data is requested
from the configuration manager), or is it done earlier, when the
sources are specified ?  I can see at least the following two models:

(a) There is a single database in /var or /etc with all the
configuration data.  This acts as a cache for already-entered data, a
temporary store for data which was prompted for before installation
started, and a long-term store for questions which shouldn't be asked
again on upgrade.  If it is desired to load configuration from another
system or a preprepared file it is preloaded, like a cache.

(b) There is a configuration file giving several databases, many of
which will be read-only, and zero or one of which are read/write or
write-only.  At the time a package is installed the different data
sources are consulted and the right data is chosen.

I don't like (b), and can come up with at least these reasons:

(1): (b) is much more complicated to implement.  We have to (for
example) define a configuration file syntax, and write code to
explicitly select which data to use.  If we do (a) we only
have to have a merging program which can be told whether to prefer old
or new data, and the config manager core need not know about data
sources.

(2): (b) makes it hard to overwrite a whole category of data.  For
example, suppose I want to say `discard all local config data for
the MTA, and fetch it from <machine> instead'.  With (b) I have to be
able to express that in the config file syntax, and then delete it
from the config file again later.  With (a) that becomes an explicit
operation.

(3): Writeback.  The config manager will have to write back the
answers it gets somewhere so that the next installation won't ask the
same questions again.  This introduces a fundamental asymmetry in the
design, because that data `source' can't be handled like the others.

(4): Fundamentally, I just like caches in this context - they have the
right properties, particularly the property that you have to be aware
that the data might not be available and be prepared to acquire it or
fail.
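
To make model (a) concrete, here is a minimal Python sketch of a single
cache with a merge operation, an explicit `discard a category' operation,
and a lookup that either acquires missing data or fails.  Everything in it
(the ConfigCache class, its method names, the item names and the
smarthost.example host) is a hypothetical illustration, not part of any
existing tool.

```python
class ConfigCache:
    """Single store of config items, in the style of model (a)."""

    def __init__(self, items=None):
        self.items = dict(items or {})

    def merge(self, incoming, prefer="old"):
        """Preload data from another system or a prepared file.

        prefer="old" keeps existing values on conflict;
        prefer="new" lets the incoming values overwrite them.
        """
        if prefer not in ("old", "new"):
            raise ValueError("prefer must be 'old' or 'new'")
        for name, value in incoming.items():
            if prefer == "new" or name not in self.items:
                self.items[name] = value

    def discard_category(self, prefix):
        """Explicitly drop a whole category, e.g. everything for the MTA."""
        for name in [n for n in self.items if n.startswith(prefix)]:
            del self.items[name]

    def get(self, name, acquire=None):
        """Return a cached value, or acquire it, or fail with KeyError."""
        if name in self.items:
            return self.items[name]
        if acquire is None:
            raise KeyError(name)
        value = acquire(name)
        self.items[name] = value  # write back so it is not asked again
        return value


cache = ConfigCache({"mta.relay": "none", "news.server": "localhost"})

# Preferring old data leaves the locally entered value alone.
cache.merge({"mta.relay": "smarthost.example"}, prefer="old")
relay_after_prefer_old = cache.get("mta.relay")

# `Discard all local config data for the MTA, and fetch it from elsewhere.'
cache.discard_category("mta.")
cache.merge({"mta.relay": "smarthost.example"}, prefer="old")
relay_after_discard = cache.get("mta.relay")

# A missing item is acquired once, then served from the cache.
asked = []

def ask_user(name):
    asked.append(name)
    return "postmaster"

first = cache.get("mta.postmaster", acquire=ask_user)
second = cache.get("mta.postmaster", acquire=ask_user)
```

Note that the core never needs to know where merged data came from; the
only policy it is given is whether old or new data wins.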


II. Where are the questions `defined' ?

Many of the proposals that I've seen so far have a fixed list of
questions in a file.  I don't think this is at all sufficient.

For example, a sophisticated mail system configuration might have
configs for different virtual domains.  The configuration items for
each virtual domain should have names which include the domain.

Therefore, in principle, the names of config items might not be known
until bits of script belonging to the package have been run.

Perhaps we want to separate out these `parameters' to config item
names ?  Then you could have a file which contained query strings
something like this:

  mail-transfer-agent.virtualdomain.aliasfile.%domain
  "The file which lists aliases for the virtual domain %domain.
   Each line `<alias>: <newaddress>' in it will arrange for mail
   to <alias>@%domain to be forwarded to <newaddress>."
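
As a rough illustration of how such parameters might be expanded, here is
a small Python sketch.  The %name substitution rule and the expand()
helper are assumptions made for the example; only the item name and the
first sentence of the prompt come from the text above.

```python
import re


def expand(template, params):
    """Replace each %name in template with params[name]."""
    def substitute(match):
        # A missing parameter raises KeyError rather than passing silently.
        return params[match.group(1)]
    return re.sub(r"%([a-z]+)", substitute, template)


NAME = "mail-transfer-agent.virtualdomain.aliasfile.%domain"
PROMPT = "The file which lists aliases for the virtual domain %domain."

# The list of domains is only known once the package's script has run.
domains = ["example.org", "example.net"]

items = {
    expand(NAME, {"domain": d}): expand(PROMPT, {"domain": d})
    for d in domains
}
```

Each virtual domain ends up with its own config item name and its own
prompt text, generated from one template.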

In any case, in order to get backward compatibility with old versions
of dpkg, the postinst script will have to be able to invoke the config
manager programs to do actual queries in case dpkg didn't know how to
set up the data wherever it is.


III. What form does the specification of which questions to ask take ?

There will clearly have to be a file or files that tell dpkg or the
config manager what questions to ask.  These file(s) must be useable
when separated out of the .deb, so that we can do
prompt-before-install.

It is my belief that at least _which_ questions are asked can only be
determined programmatically, even if there is in some sense a set of
possible questions which could be listed.

The obvious thing to do is to have a script which dpkg invokes which
just asks the questions and stores the answers.  It would use some
kind of tool which would ask the question and store the answer all in
one go.

However, this suffers from the problem that this script has to
implement the `back' button, etc.  So, I propose the following: the
script asks all the questions for the package in order, but can exit
with a special exit status meaning `please reinvoke me because the
user pressed the back button'.  When the user presses back, the query
tool sets a flag in the _previous_ question asked (which has to be
logged) and returns this exit status so that the script (which is
using set -e) does too.  Then the script goes through the questions
again finding the previous answers in the cache until it gets to the
flagged one.
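
The reinvocation scheme can be simulated in a few lines of Python.  This
is only a sketch under stated assumptions: an exception (Reinvoke) stands
in for the special exit status, a loop stands in for dpkg rerunning the
script, and the QueryTool class, the BACK token and the item names are all
hypothetical.

```python
BACK = "<back>"  # hypothetical token meaning the user pressed `back'


class Reinvoke(Exception):
    """Stands in for the special `please reinvoke me' exit status."""


class QueryTool:
    def __init__(self, user_inputs):
        self.inputs = iter(user_inputs)  # scripted user input for the demo
        self.answers = {}                # the cache of stored answers
        self.flagged = None              # question to re-ask on the next run
        self.log = []                    # questions asked in the current run

    def start_run(self):
        self.log = []

    def ask(self, name):
        self.log.append(name)
        if name in self.answers and name != self.flagged:
            return self.answers[name]    # replay the previous answer
        reply = next(self.inputs)
        if reply == BACK:
            # Flag the previous question (or this one if it is the first).
            self.flagged = self.log[-2] if len(self.log) > 1 else name
            raise Reinvoke()
        self.answers[name] = reply
        if name == self.flagged:
            self.flagged = None
        return reply


def config_script(tool):
    """The package's script: asks every question, in order, every time."""
    host = tool.ask("mta.hostname")
    relay = tool.ask("mta.relay")
    return host, relay


def run(tool, script):
    """The caller: rerun the script for as long as it asks to be rerun."""
    while True:
        tool.start_run()
        try:
            return script(tool)
        except Reinvoke:
            continue


# The user answers the first question, presses back at the second,
# corrects the first answer, then answers the second.
tool = QueryTool(["mail.old", BACK, "mail.new", "smarthost"])
result = run(tool, config_script)
```

The script itself contains no `back' logic at all; it just asks its
questions in order and lets the query tool decide which ones actually
reach the user.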

For simple packages we should provide a way to put all the questions
in a file and have the query tool iterate itself.

For compatibility with older versions of dpkg which don't know to
invoke the new config maintainer script, or installation methods which
don't have the scripts available all at once at the start, we should
have the postinst call the config manager to run the config script
just like dpkg does, so that it doesn't need to care about info not
being available and can just retrieve it.


IV. What `types' of the configuration data are there ?  Is its type
stored in the database of answers ?

It's entirely unclear to me that anything except the user prompting
part of the configuration management system (which has to be invoked
with the names of the data entries et al available) needs to know the
types of these entries.

Carrying type around with the data also produces problems when
different parts of the system (data from different sources, different
packages using the same data item, old and new versions of packages
and config manager) disagree about the nominal type, and means we need
to invent a type naming system with all the usual backward compatibility
features.

Instead, I think that data should be transported as opaque strings and
validated at the point of use.  The facility that retrieves a data
item should be told the expected `type' and check that the string
matches it (by regexp, or whatever).
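
A minimal Python sketch of validation at the point of use follows.  The
type names, their regexps and the retrieve() helper are assumptions made
for illustration; the point is only that the store holds plain strings
and the caller states what it expects.

```python
import re

# Hypothetical type names mapped to regexps; not a proposed standard.
TYPES = {
    "boolean": r"(true|false)",
    "integer": r"-?[0-9]+",
    "hostname": r"[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?",
}


def retrieve(store, name, expected_type):
    """Fetch an opaque string and check it against the caller's type."""
    value = store[name]
    if re.fullmatch(TYPES[expected_type], value) is None:
        raise ValueError(f"{name}: {value!r} is not a valid {expected_type}")
    return value


# The store itself carries no type information at all.
store = {
    "mta.port": "25",
    "mta.relay": "smarthost.example",
    "mta.tls": "yes",
}

port = int(retrieve(store, "mta.port", "integer"))
relay = retrieve(store, "mta.relay", "hostname")
```

Here a request for "mta.tls" as a "boolean" would raise ValueError,
because the stored string "yes" does not match what that caller expects;
a different caller with a looser notion of boolean could still accept it.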


V. Where to put the configuration prompting in the .deb ?

I think it is important to make new packages install on old versions
of dpkg.  Remember, no non-backward-compatible changes without a
phase-in period, during which new+new code works in the new way but
new+old or old+new works in the old.

We also don't want to increase the size of the .deb more than we can
help, and we want to preserve its current reasonably clean structure
and be as backward compatible as possible.

Current suggestions seem to be to put the information in additional
members of the enclosing ar archive.  I disagree with this; I think we
should put it in the control.tar.gz subarchive.  Unpacking this part
is necessary even to find out the name of the package, for crying out
loud.  There's no harm in requiring its unpacking for
pre-configuration.

This will also have the benefit that current dpkg versions will ignore
it.


VI. How does the config manager talk to the package ?

I think there are pretty much only two sensible answers to this
question: (a) program call or (b) C library call.  These calls will
allow the package to ask for information and give details about
defaults, and should automatically use whatever UI is currently in
place.

If there is to be a protocol between different parts of the system it
should be internal.


Ian.
