Heyho, as we have many tasks on our plate but not nearly enough time or people to do them all, let me try something and ask you, the project members, to help out. No, I don't want money, though I wouldn't say no to it, of course. Better yet: I have work to give away... And this task has the nicety of not requiring any special privileges to be carried out, so whoever out there wants to help, no matter whether DD, DM or interested user, as long as you have time and know Python well, speak up. :)
One of the tasks I outlined in my meeting minutes after the ftp-master meeting at [1] is number 13, "dinstall replacement". Even though it sits at place 13, it is actually important to get done soon, and I know I myself won't get to it this month, so here goes. Anyone out there with enough time and Python knowledge willing to help? Read on...

[1] http://lists.debian.org/debian-project/2010/09/msg00139.html

(I'm not entirely set on the exact way this works, I am just describing my thoughts on it. I'm happy to hear constructive ways of making it better.)

(Background: dinstall is basically a set of jobs run in a defined order, sometimes in parallel. What the jobs do is unimportant for the case here (everything necessary to update ftp.debian.org and all its mirrors with the new uploads), but having them done in the right order at the right time etc. is pretty important.)

What we want is something that can do just about everything and also figure it out on its own. :) I imagine a Python script that in itself is small, with just the basic logic to pull it all together. It should read in a set of values from our database, like the archive name, various directory settings, the basic information it needs. Additionally there is a directory with "code dumps", basically a set of Python code, each file having a defined structure. The script would read in all of them and figure out what they do / when they expect to run / what they provide (a sketch of how such a file could look follows below). That is, there would be scripts with the following attributes (only some are shown; we currently have around 60 different functions called in a run):

(this table should look ok in a monospace font. At least it does here :) )

+--------+-----------------+--------------+--------+---------+
|script  |provides         |depends       |priority|archive  |
+--------+-----------------+--------------+--------+---------+
|override|overrides        |              |10      |         |
+--------+-----------------+--------------+--------+---------+
|filelist|filelist         |              |11      |         |
+--------+-----------------+--------------+--------+---------+
|packages|packagesfiles    |overrides,    |10      |         |
|        |                 |filelist      |        |         |
+--------+-----------------+--------------+--------+---------+
|pdiff   |pdiff            |packages      |15      |ftpmaster|
+--------+-----------------+--------------+--------+---------+
|mirror  |mirror           |pdiff|packages|20      |         |
+--------+-----------------+--------------+--------+---------+

The new script would figure out that it has to run overrides first, then filelist, followed by packages. Then, if the archive is named ftpmaster, it would run pdiff followed by mirror. (In this definition an empty archive entry means all archives; if something is set, it is the list of archives to run on.) So if the archive is not ftpmaster (say backports or security), it would skip pdiff and go to mirror directly. All scripts need to be run unless they are not relevant for the current archive. Priorities can be used to select which task to run first when tasks are executed in parallel and no dependency imposes an order on them. (Same priority -> random, or alphabetic, or whatever order of execution.)

Tasks that do not depend on each other should run in parallel, up to a configurable limit of processes. There should also be a way to have "sync points" in this process, i.e. at such a point all tasks defined prior to the "sync point", however many ran in parallel, need to be finished before it goes on to the next waiting task. (Yeah, much like an init system.)

An easy first step can be a tool that:

 1. reads in the scripts,
 2. computes the optimal scheduling,
 3. outputs a list of processing steps, each step containing a list of tasks that can be run in parallel.
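To make the "code dump" idea a bit more concrete, here is a minimal sketch of what one of those module files could look like. All names in it (PROVIDES, DEPENDS, PRIORITY, ARCHIVES, run(), the config argument) are made up for this mail, they are not dak's actual interface:

# pdiff.py -- what one of those "code dump" files could look like.
# Every attribute name here is invented for illustration only.

PROVIDES = ["pdiff"]          # what this task provides to others
DEPENDS  = ["packages"]       # what must be finished before it may run
PRIORITY = 15                 # tie-breaker where no dependency forces an order
ARCHIVES = ["ftpmaster"]      # empty / not set = run on every archive

def run(config):
    # config would carry the values read from the database
    # (archive name, directory settings, ...); the real work happens here.
    print("would generate pdiffs for %s now" % config["archive"])

The main script would only ever look at those module-level attributes to build its plan; whatever run() does internally stays the module's business.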
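And a rough sketch of what the step-1 tool could then do with those attributes: read in the module files, drop the ones not relevant for the current archive, and repeatedly emit every task whose dependencies are already met, which gives exactly that list of parallel steps. Again this assumes the made-up attributes from above, plus that a lower priority number means "earlier"; treat it as a starting point, not the design:

from pathlib import Path

def load_tasks(directory, archive):
    """Read every "code dump" in the task directory and keep the ones
    relevant for the given archive (no ARCHIVES set = all archives)."""
    tasks = {}
    for path in sorted(Path(directory).glob("*.py")):
        ns = {}
        exec(compile(path.read_text(), str(path), "exec"), ns)
        archives = ns.get("ARCHIVES", [])
        if not archives or archive in archives:
            tasks[path.stem] = ns
    return tasks

def schedule(tasks):
    """Return a list of steps; each step is a list of task names that
    can run in parallel.  PRIORITY only breaks ties inside a step."""
    # A dependency only counts if some relevant task can satisfy it, so
    # e.g. a dependency on pdiff is silently dropped on archives where
    # pdiff never runs.
    available = set(tasks)
    for ns in tasks.values():
        available.update(ns.get("PROVIDES", []))

    done = set()               # names/provides of already scheduled tasks
    remaining = dict(tasks)
    steps = []
    while remaining:
        ready = [name for name, ns in remaining.items()
                 if all(dep in done
                        for dep in ns.get("DEPENDS", [])
                        if dep in available)]
        if not ready:
            raise RuntimeError("dependency cycle among: "
                               + ", ".join(sorted(remaining)))
        ready.sort(key=lambda name: (remaining[name].get("PRIORITY", 0), name))
        for name in ready:
            done.add(name)
            done.update(remaining[name].get("PROVIDES", []))
            del remaining[name]
        steps.append(ready)
    return steps

if __name__ == "__main__":
    tasks = load_tasks("tasks.d", "ftpmaster")   # directory name made up
    for number, step in enumerate(schedule(tasks), 1):
        print("step %d: %s" % (number, ", ".join(step)))

With the table above (reading mirror's depends as pdiff and packages), a run for ftpmaster would print override and filelist as step 1, packages as step 2, pdiff as step 3 and mirror as step 4; for backports or security, pdiff would simply never be loaded and mirror would follow packages directly.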
Of course the system needs to keep the existing features of dinstall, that is, a state machine that keeps track of how far the process has advanced. We need that because we need to be able to break at any point and cleanly restart from there. (Think of a sudden reboot.)

A second step can then be the extension of the script to take this task list and actually run the tasks, keeping track of progress and handling restarts in case the process was interrupted. If it can also store the result of step 1, so it can reuse it in later runs provided the input values didn't change, it sounds perfect. (Yes, the scheduler CAN be pretty costly, that's fine.)

It's not all too hard to do, but it needs time. Time is scarce, so is there anyone out there who has enough time to spare for this task? :) (You don't actually need to write all the modules, the code for them exists already; the main script is what counts.)

We do have a test system, though that one is limited to DD/DM access. Setting up your own dak instance would be possible, but dak is very ungrateful if it doesn't know you, so I think we'll find another way there. The majority of this luckily doesn't need one (which is why I ask the whole world to help). :) Quite a bit of the software is not THAT ftp-master specific, so a lot of its development can be supported by a test suite that runs anywhere.

If you are up to it, I am reachable on irc.debian.org as usual (try #debian-dak), or you can show up at debian-...@lists.debian.org, both are fine.

--
bye, Joerg

Lisa, you're a Buddhist, so you believe in reincarnation. Eventually,
Snowball will be reborn as a higher life form... like a snowman.