Hi,

On Fri, 4 Feb 2005, Larry McVoy wrote:

> > > > >               CVS             BitKeeper [*]
> > > > >       Deltas  235,956         280,212
> 
> You need to rethink your math, you are way off.  I'll explain it so that
> the rest of the people can see this is just pure FUD.  
> 
> To make sure this was apples to apples I went to the BK and CVS trees
> which are in lock step and reran the numbers:
> 
>                       CVS             BitKeeper       % in CVS
>       file deltas     210,609         218,742         96%
>       changsets       26,603          59,220          44%
> 
> You are not factoring out the ChangeSet deltas, which are BK metadata, 
> they aren't part of the source tree.  If you remove those then the
> difference between CVS and BK revision history is 209K deltas vs
> 221K deltas.  
> 
> In other words, the CVS tree is missing no more than 4% of the deltas
> to the source files.
> 
> READ THAT AGAIN, PLEASE.
> 
> The CVS tree has 96% of all the deltas to all your source files.  96%.  

Before you start shouting, how about you get your numbers right? Your 
first numbers were wrong and the second numbers are still wrong:

$ find -name \*,v -a ! -path ./BitKeeper\* -a ! -name ChangeSet,v | xargs rlog 
| egrep -e '^[0-9]{4}(/[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2}' -e '\(Logical 
change 1.[0-9]+\)' | wc -l
219242
$ find -name \*,v -a ! -path ./BitKeeper\* -a ! -name ChangeSet,v | xargs rlog 
| egrep '\(Logical change 1.[0-9]+\)' | wc -l
187576

That would bring us back to 85% and I leave it to you to find the bug.

> My good friend Stelian would have you believe that you are missing 50%
> of your data when in fact you are missing NONE of your data, you have 
> ALL of your data in an almost the identical form.

The 44% number deserves an explanation, which our good friend Larry would 
have you believe is completely unimportant and thinking otherwise is just 
"FUD". Actually on the contrary it is the more important number.
The 85% number means that the history of a _single_ file can be restored 
with a precision of 85% via bkcvs. If one is only interested in the 
history of a single file, that 85% might be sufficient. OTOH a change to 
the kernel is rarely just a change to a single file. A patch can consist 
of changes to multiple files and this relationship between multiple single 
file changes can only be restored with a precision of 44% via bkcvs.
IOW if one wants to not just look at some random file history, but 
actually restore and use some of it, one can only get 44% of the 59220 
snapshots of kernel history.

> What Stelian is complaining about is the patch set which is easily 
> extractable from CVS is at a coarser granularity than the patch set
> extractable from BK.  That's true but so what?  Before BK you only
> a handful of patches between releases, now you have thousands.

Hmm, that argument is not exactly original anymore. If Linus would say, 
that kernel history has to be exclusively recorded via bk and that the 
history available via bkcvs is the maximum that must be made available to 
other users, you would have a point.
So it's quite legitimate to at least ask occasionally how the 44% 
granularity can be improved. CVS is obviously not really capable of it, 
but RCS on which CVS is based on is. I attached an example script which 
can convert the history of a bk file into RCS. The only thing that is 
missing is to 1) generate a list revisions with their parents, 2) checkout 
a given revision, 3) generate the log for a given revision. (*) I'm quite 
sure that's trivial to do for you and this only took me a few hours 
including testing and cleanup, it only takes a little more work to make 
CVS happy and allow incremental updates, so your "3 month" number is 
absolutely ridiculous.
Any bk user can now easily prove or disprove this by completing my example 
script, I think the "bk prs" man page should be very helpful here. If you 
want to complete it into your very own bk2cvs script, I'd be happy to 
help.

bye, Roman

(*) In case anyone thinks this is simple without bk, try to find the 
parents for this changeset 
http://linux.bkbits.net:8080/linux-2.6/[EMAIL PROTECTED]
#!/usr/bin/ruby

class RcsFile
    attr_reader :path

    def initialize(path)
        @path = path
    end
    def checkout(rev)
        system("co -q -ko -r#{rev} [EMAIL PROTECTED]")
    end
    def checkin(rev, log)
        if rev == "1.1"
            open("|rcs -q -U -i -kb [EMAIL PROTECTED]", "w") { |p| }
            system("ci -q -f1.1 #{path}")
            open("|-") do |p|
                exec("rcs", "-q", "-m1.1:#{log}", @path) if !p
                print p.read
            end
        else
            open("|ci -q -f#{rev} [EMAIL PROTECTED]", "w") do |p|
                p.write log
            end
        end
    end
end

class RcsSrc < RcsFile
    def initialize(path)
        super(path)
    end

    def rev_list
        open("[EMAIL PROTECTED],m") do |f|
            @rev_list = f.readlines
            @rev_list.collect! do |l|
                l.split
            end
        end
    end

    def checkin(rev, *par)
        open(@path, "w") do |f|
            f.write "data #{rev}\n"
        end
        super(rev, log(rev))
        open("[EMAIL PROTECTED],m", "a") do |f|
            f.write "#{rev} #{par.join(" ")}\n"
        end
    end

    def log(rev)
        "checkin #{rev}"
    end
end

class RcsConv < RcsFile
    attr_reader :src, :rev_map, :merge_map

    def initialize(path, src)
        super(path)
        @src = src
        @rev_map = {}
        @merge_map = {}
        if test(?f, "#{path},m")
            open("#{path},m") do |f|
                f.each do |l|
                    a = l.split
                    @rev_map[a[0]] = a[1]
                    @merge_map[a[1]] = a[2] if a[2]
                end
            end
        end
    end
    def finish
        open("#{path},m", "w") do |f|
            @rev_map.each do |k,v|
                f.print("#{k} #{v} [EMAIL PROTECTED]")
            end
        end
    end
    def checkout(rev)
        src.checkout(rev)
        File.rename(@src.path, @path)
    end
    def checkin(par, key)
        if !par
            rev = "1.1"
        else
            rev = par.sub(/\d*$/) { |v| v.succ }
            if @rev_map.has_value?(rev)
                i = "0"
                begin
                    rev = "#{par}.#{i.succ!}.1"
                end until [EMAIL PROTECTED](rev)
            end
        end
        super(rev, src.log(key))
        print "#{key} -> #{rev}\n"
        @rev_map[key] = rev
    end

    def import(key, *par)
        return true if @rev_map.has_key?(key)
        if !par[0]
            checkout(key)
            checkin(nil, key)
            return true
        end
        return false if [EMAIL PROTECTED](par[0])
        if !par[1]
            checkout(key)
            checkin(@rev_map[par[0]], key)
            return true
        end
        return false if [EMAIL PROTECTED](par[1])
        checkout(key)
        par1 = @rev_map[par[0]]
        par2 = @rev_map[par[1]]
        par1, par2 = par2, par1 if par1.count(".") > par2.count(".")
        @merge_map[checkin(par1, key)] = par2
        return true
    end
end

system("rm -f test,? test2,?")
src = RcsSrc.new("test")
src.checkin "1.1"
src.checkin "1.2",              "1.1"
src.checkin "1.3",              "1.2"
src.checkin "1.3.1.1",          "1.3"
src.checkin "1.3.1.2",          "1.3.1.1"
src.checkin "1.3.2.1",          "1.3"
src.checkin "1.3.2.2",          "1.3.2.1"
src.checkin "1.4",              "1.3"
src.checkin "1.5",              "1.4",          "1.3.1.2"
src.checkin "1.5.1.1",          "1.5"
src.checkin "1.5.1.2",          "1.5.1.1"
src.checkin "1.5.2.1",          "1.5"
src.checkin "1.5.2.2",          "1.5.2.1"
src.checkin "1.6",              "1.5",          "1.3.2.2"
src.checkin "1.7",              "1.6"
src.checkin "1.8",              "1.7",          "1.5.1.2"
src.checkin "1.8.1.1",          "1.8",          "1.5.2.2"
src.checkin "1.8.1.2",          "1.8.1.1"
src.checkin "1.9",              "1.8"
src.checkin "1.10",             "1.9",          "1.8.1.2"

rcs = RcsConv.new("test2", src)
a = src.rev_list.dup
# just for fun we do it in random order
while !a.empty?
    i = rand(a.size)
    a.delete_at(i) if rcs.import(*a[i])
end
rcs.finish

src = RcsSrc.new("test")
src.checkin "1.8.2.1",          "1.8"
src.checkin "1.8.2.2",          "1.8.2.1"
src.checkin "1.11",             "1.10"
src.checkin "1.12",             "1.11",         "1.8.2.2"
src.checkin "1.13",             "1.12"
src.checkin "1.8.2.1.1.1",      "1.8.2.1"
src.checkin "1.8.2.1.1.2",      "1.8.2.1.1.1"
src.checkin "1.11.1.1",         "1.11",         "1.8.2.1.1.2"
src.checkin "1.14",             "1.13",         "1.11.1.1"

rcs = RcsConv.new("test2", src)
a = src.rev_list.dup
while !a.empty?
    i = rand(a.size)
    a.delete_at(i) if rcs.import(*a[i])
end
rcs.finish

Reply via email to