Re: portupgrade O(n^m)?

youshi10 Thu, 15 Feb 2007 11:21:00 -0800

On Thu, 15 Feb 2007, Michel Talon wrote:

Give me a few weeks: if I can band together with a few people, I want to try
to port sections of portupgrade and its related tools to C++ (and maybe make
some code tweaks along the way). Most of the Ruby files are over 400 lines
long and sparsely commented, and I don't know Ruby well enough to port them
right now, but I've been making some headway lately, so I'll try porting some
stuff soon.


I think that porting portupgrade to C++ would be time spent in vain. In
my opinion, some of the basic ideas of portupgrade are deeply flawed,
and however much one polishes the algorithms, it will not gain much. The
idea of keeping state in databases is deeply flawed: the databases are
constantly broken, and they don't help speed at all. Getting rid of database
dependencies was one of the motivations for portmaster. Upgrading
progressively, that is, port by port, is also deeply flawed in my opinion.
There is a 90% chance that something will go wrong in the middle
and you will be stuck with a half-upgraded system.

So in my opinion, what is needed is to think about the problem in a
radically new way, write a prototype in a scripting language to experiment
with the solutions, and then code it in C++. Personally, I have done that: I
have written a Python script, which can be found here:
http://www.lpthe.jussieu.fr/~talon/pkgupgrade
(it needs the companion
http://www.lpthe.jussieu.fr/~talon/save_pkg.py).
For the time being I still have bugs that I am working on, but at
least these bugs show that the problem is vastly more complicated than
one would imagine at first.

Why Python? Because it is much more readable than Perl or Ruby, and much
faster than Ruby. In my opinion Ruby is vastly overhyped; it
is much closer to rubbish than anything else.
What ideas? First, don't use any database or database connector: do
everything in memory and recompute the needed information on the fly. This
works very well; one can count on something on the order of 1 to 2 minutes
to perform the necessary analysis for 700 ports. Second, download as many
precompiled packages as possible, at full speed, that is, reusing the same
connection to the FTP server. This also works very well: with a good
Internet connection, you have your packages in 15 to 20 minutes.
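The in-memory analysis described above can be sketched in a few lines. This is a hypothetical illustration, not code from pkgupgrade: it assumes the installed versions, the available versions and the dependency lists have already been read into plain dicts (the names `plan_upgrades`, `installed`, `available` and `deps` are made up here), and it uses `graphlib` from the Python 3.9+ standard library to order the stale ports so that dependencies are upgraded first.

```python
from graphlib import TopologicalSorter


def plan_upgrades(installed, available, deps):
    """Return the ports that need upgrading, dependencies first.

    installed, available: dict mapping port name -> version string.
    deps: dict mapping port name -> set of port names it depends on.
    All three are hypothetical in-memory structures rebuilt on every run.
    """
    # Simplification: any version mismatch counts as "stale";
    # real version ordering rules are not handled here.
    stale = {name for name, ver in installed.items()
             if name in available and available[name] != ver}
    # Every installed port is a node, even if it has no dependencies.
    graph = {name: set() for name in installed}
    graph.update(deps)
    # Sort the whole graph so ordering through non-stale ports is respected,
    # then keep only the stale ones. Raises graphlib.CycleError on a cycle.
    order = TopologicalSorter(graph).static_order()
    return [name for name in order if name in stale]
```

Because everything is recomputed from the current state on each run, there is no database that can drift out of sync with the machine.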

Why packages?
Because packages don't break when compiling. Compiling from source is
asking for problems: if you minimise the number of compilations, you
minimise the risk of breakage. Moreover, one can back up the old packages
while downloading, and so gain time. By contrast, for every
package, portupgrade first does a dependency analysis that could be done
once, then makes a backup, then fetches the binary package or compiles the
port, then installs it, then discards the backup. All this is a terrible
loss of time.
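Overlapping the backups with the downloads, as described above, can be sketched with two worker threads. This is a hypothetical outline rather than the pkgupgrade implementation: `fetch_and_backup` is a made-up name, and `fetch` and `backup` are caller-supplied functions, for example one that retrieves package files over a single FTP connection and one that runs `pkg_create -b` on an installed package.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_and_backup(to_fetch, to_backup, fetch, backup):
    """Download new packages and back up old ones at the same time.

    fetch(name) and backup(name) are caller-supplied (hypothetical) callables.
    Downloads run one after another in a single thread, so they can share one
    FTP connection; backups run one after another in the other thread.
    Returns (list of fetch results, list of backup results).
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        fetched = pool.submit(lambda: [fetch(name) for name in to_fetch])
        saved = pool.submit(lambda: [backup(name) for name in to_backup])
        # result() waits for each thread and re-raises any exception it hit.
        return fetched.result(), saved.result()
```

The total time is then roughly the longer of the two phases instead of their sum, which is the gain the message points to.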

Finally, my script produces a shell script that performs the upgrade, so you
can see in written form *exactly* what will be removed, what will be
installed from binary packages, and what will be compiled. All the packages
needed for installation are already present on the machine. There is
absolutely no element of surprise; you can evaluate the risk soundly.
These are the ideas I have explored.
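The idea of emitting the whole upgrade as a reviewable shell script can be sketched as follows. The step tuple layout, the function name `write_upgrade_script`, the `pkg_dir` default and the `.tbz` package suffix are assumptions made for illustration; the commands themselves (`pkg_delete -f`, `pkg_add`, and `make -C <dir> install clean`) are the standard FreeBSD tools of that era.

```python
import shlex


def write_upgrade_script(steps, pkg_dir="/var/tmp/packages"):
    """Render an upgrade plan as a /bin/sh script that can be read before running.

    steps: list of (old_pkg, new_pkg, origin, use_binary) tuples (hypothetical layout).
    pkg_dir: directory holding the already downloaded packages (assumed path).
    """
    lines = ["#!/bin/sh", "set -e  # stop at the first failing command"]
    for old_pkg, new_pkg, origin, use_binary in steps:
        lines.append(f"# {old_pkg} -> {new_pkg} ({origin})")
        lines.append(f"pkg_delete -f {shlex.quote(old_pkg)}")
        if use_binary:
            # Install the precompiled package fetched earlier.
            package_file = f"{pkg_dir}/{new_pkg}.tbz"
            lines.append(f"pkg_add {shlex.quote(package_file)}")
        else:
            # No binary package available: build from the ports tree.
            port_dir = "/usr/ports/" + origin
            lines.append(f"make -C {shlex.quote(port_dir)} install clean")
    return "\n".join(lines) + "\n"
```

Since the result is plain text, it can be read, edited or diffed before anything on the machine is touched.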

Now, performance-wise, running the shell script takes around 2
hours. This time is spent entirely in pkg_delete (roughly 15 min) and
pkg_add (roughly 1 h 45 min) for around 500 replaced ports. This is very
long, sure, but it can only be optimized by working on pkg_delete and
pkg_add. No amount of work on portupgrade or a replacement will help in
any way.

As for the remaining bugs I have, they are entirely due to the crappy
complexity that FreeBSD port developers introduce by constantly
changing the origins of the ports. So for a given program I can have 3
different origins: one from when the port was previously installed on the
machine, another from when the last RELEASE was produced, and the last
one if I compile the port on the machine now with the present state of
the ports tree. These 3 origins may all differ; I have examples.
These morons are *constantly* changing the names, as an exercise in
bikeshed painting. For example pan -> pan2 -> pan, etc. Cycles don't
worry them at all!
Of course, for a given piece of software you may have every combination,
such as nonexistent or existent at the time the machine was installed, at
the time of the release, or at present.

Compare that to the situation for Debian apt-get. The names are
preserved: they have strict rules about package naming, they stick to
them, and they don't change them arbitrarily. All packages exist in compiled
form, so you don't have to worry about "prepackaged" versus "to be compiled,
so has a 50% chance to break". You have only 2 states to consider instead of
3: the state on the machine and the state in the repository. Things are
vastly simpler. No wonder apt-get works and portupgrade doesn't.
This has nothing to do with the fact that apt-get is written in C++.


(Sorry to cross-post, but this thread is just as relevant to @ports as it is to
@hackers.)

Well, since you brought up Debian's apt-get system, I thought it'd be a good
idea to take a look at the Gentoo Linux emerge / portage system (patterned
after FreeBSD's):

=====
Pros:
=====
-It's written in python (portable).
-It's a system which focuses on ports compilation from source, not binary 
package installation.
-Stores information in a db format (not Berkeley DB, but something
different) for the entire system in a common file; stores installed leaf package
information in another simple text file.
-Has flags for stability reasons, since some packages are alpha or beta and 
don't compile under certain architectures.
-Portage files are fetched via rsync.
-Has separate portage files which are phased out over time, in case the portage 
maintainers move the files in one release. The maintainers then create an 
informative message which describes what's going on while emerging the package 
or going through the portage database. If possible the outdated package is 
pruned and the newer, more recent dependency is merged.

=====
Cons:
=====
-It's written in python (not fast).
-Uses rsync.

======
Point:
======
Apart from what's listed in the above paragraph, Gentoo's portage may have 
several things that are better than FreeBSD's port system:

-Limited life cycle for versioning, which doesn't force server / desktop owners 
to fix a number of machines all at once, but instead gives them a heads up 
before a big change occurs and automatically unmerges old dependencies and 
emerges new items, if possible.
-One common interface for package / portage management--not 10 little tools 
which do basically the same thing, or are specialized for specific tasks.
-One common file for all installed packages / ports, not a series of 
directories and files.
-Separate files for each version, which doesn't break things nearly as much as
having one common ports Makefile for each port.
-A means to search for portage items and their descriptions, without having to 
deal with a tool that doesn't really work reliably.

It's not so much that I'm trying to bash FreeBSD, but there's definitely a
revision that needs to be made to the way ports / packages are done,
because it seems that the committee in charge of ports planning and the overall
roadmap has let things get a bit off track, just because of the sheer
number of ports available. Something can be fixed and should be. I can
only do a portion of the load myself in so much time, since I'm busy with work
and school right now.

=======
In light of previous statement:
=======

I wasn't trying to port the pkg_* and port* utils to C++ thinking that I would
magically get more optimized code. Sure, C++ can be optimized much better than
Ruby if done correctly, but C++ is also easier to screw up than Ruby, Perl or
Python, because it gives you more power to shoot yourself in the foot
(not as much as C or ASM, but close).

The point was that with C++ we could finally get a set of standardized tools 
and a common interface for FreeBSD for managing ports / packages which could be 
included in the base system, not a bunch of little specialized tools and 
packages.

I'll have to approach this problem from a black-box perspective and be
careful in planning it out, but my goal is to be as backward compatible
as possible, or at least to provide migration tools to ease the move from
the old system to the new one.

Again, if anyone is interested in helping me out, it would be more than
welcome. That way we could ensure that the project gets done in a timely manner,
reduce bugs, and think of better solutions (the larger the group, the more
people can help think outside the box).

Thanks,
-Garrett

PS Please reply on the @hackers list, if possible.

_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers