Hi,

I got distracted by other tasks, but now I have time to work on
reprepro again.

On Tuesday, 04.02.2014, at 23:23 +0100, Bernhard R. Link wrote:
> * Benjamin Drung <benjamin.dr...@profitbricks.com> [140203 13:15]:
> > Okay. Attached is the patch for my prototype. Be aware: it's just a
> > prototype that can only run the commands that I wanted to test and
> > isn't nearly ready for mainlining. The prototype implements case
> > 2 only because that was my initial idea, but now I tend to think that
> > case 1 might be easier/cleaner.
> 
> Thanks. I'll take a look this weekend.

Any feedback so far?

> > > It sounds quite slow either way. Perhaps the way to go is instead
> > > changing the data format, like having the version first (perhaps even in
> > > preparsed format to speed things up).
> >
> > Good idea, but is this function really time-critical? It should only be
> > called when comparing duplicate keys (which shouldn't happen that often,
> > should it?).
> 
> It might also happen when updating some value otherwise. (And if the
> version is stored first, as some kind of metadata, one also does not have
> to differentiate between binaries and sources that much.) One could also
> take the opportunity of a format change to allow for other possible
> future metadata (like the timestamp when a package was first added).

How flexible should the new data structure be? What metadata besides
the timestamp could be relevant?

> > How do you want to preparse the version?
> 
> If versions are compared, they are split into epoch, version and revision,
> and version and revision are again split into sequences of numbers and
> non-numbers. Dpkg, for example, first parses the versions and later
> only compares the already split parts. If easily possible, it could make
> sense to store them in a format like that (but then parsing an on-disk
> format of the split data might be just as time-consuming as just looking
> at the real data).

The version and revision can have a nearly unlimited number of
concatenated numbers and non-numbers. You could store the parts as a list
with type information, but I doubt that a different on-disk format would
increase the speed. We could split the full version into epoch, version,
and revision and store them separately, but parsing these stored parts
would be more time-consuming. My feeling is that we should stick with the
full version as a string.
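To illustrate what "sticking with the full version as a string" means for comparisons, here is a minimal C sketch that compares two version strings directly: it locates epoch, upstream version and revision on the fly and then compares alternating non-digit and digit runs. It follows the ordering dpkg documents ("~" sorts before everything, even the end of the string; letters sort before other characters), but the function and struct names are hypothetical and this is not reprepro's actual code.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Sort weight of a character inside a non-digit run. */
static int order(int c)
{
	if (isdigit(c))
		return 0;
	if (isalpha(c))
		return c;
	if (c == '~')
		return -1;
	if (c != 0)
		return c + 256;
	return 0;
}

/* Current character of the range [p, end), or 0 once it is exhausted. */
static int cur(const char *p, const char *end)
{
	return p < end ? (unsigned char)*p : 0;
}

/* Compare two upstream versions (or two revisions) given as ranges. */
static int verrevcmp(const char *a, const char *a_end,
		     const char *b, const char *b_end)
{
	while (a < a_end || b < b_end) {
		int first_diff = 0;

		/* Non-digit run: compare character weights one by one. */
		while ((a < a_end && !isdigit(cur(a, a_end))) ||
		       (b < b_end && !isdigit(cur(b, b_end)))) {
			int ac = order(cur(a, a_end));
			int bc = order(cur(b, b_end));
			if (ac != bc)
				return ac - bc;
			a++;
			b++;
		}
		/* Numeric run: ignore leading zeros, the longer number wins,
		 * otherwise the first differing digit decides. */
		while (cur(a, a_end) == '0')
			a++;
		while (cur(b, b_end) == '0')
			b++;
		while (isdigit(cur(a, a_end)) && isdigit(cur(b, b_end))) {
			if (first_diff == 0)
				first_diff = *a - *b;
			a++;
			b++;
		}
		if (isdigit(cur(a, a_end)))
			return 1;
		if (isdigit(cur(b, b_end)))
			return -1;
		if (first_diff != 0)
			return first_diff;
	}
	return 0;
}

/* Ranges of "[epoch:]upstream[-revision]" inside the original string. */
struct version_parts {
	unsigned long epoch;
	const char *up, *up_end;   /* upstream version */
	const char *rev, *rev_end; /* revision, empty if there is none */
};

static void split_version(const char *v, struct version_parts *p)
{
	const char *end = v + strlen(v);
	const char *colon = strchr(v, ':');
	const char *hyphen;

	p->epoch = 0;
	if (colon != NULL) {
		p->epoch = strtoul(v, NULL, 10);
		v = colon + 1;
	}
	hyphen = strrchr(v, '-'); /* revision follows the last hyphen */
	p->up = v;
	p->up_end = hyphen != NULL ? hyphen : end;
	p->rev = hyphen != NULL ? hyphen + 1 : end;
	p->rev_end = end;
}

/* Hypothetical helper: returns <0, 0 or >0 like strcmp(). */
int version_compare(const char *a, const char *b)
{
	struct version_parts pa, pb;
	int r;

	split_version(a, &pa);
	split_version(b, &pb);
	if (pa.epoch != pb.epoch)
		return pa.epoch < pb.epoch ? -1 : 1;
	r = verrevcmp(pa.up, pa.up_end, pb.up, pb.up_end);
	if (r != 0)
		return r;
	return verrevcmp(pa.rev, pa.rev_end, pb.rev, pb.rev_end);
}
```

Nothing is copied or allocated here; the split parts are just pointer ranges into the stored string, so a preparsed on-disk form would mainly save the single pass over the characters.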

> > How would the data format change? Currently the database value contains
> > just the control chunk. We could put the pair (version, control) as value
> > into the database. How should the pair be separated? Maybe with a null
> > character?
> 
> something like that.
> 
> > Then we could just use the pointer to the value as version
> > string (the null character from the pair separation would also be used
> > to terminate the string).
> 
> Yes. That would be the "store verbatim" and non-preparsed variant.
> Alternatively, one could first store the length of the string, so one can
> jump to the control part even faster.
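A minimal C sketch of the two value layouts mentioned here, assuming the value is a plain byte buffer of known size as handed out by the database. The layouts and function names are hypothetical illustrations, not reprepro's format. Variant A stores "version\0control"; variant B prefixes a 4-byte big-endian version length (the width and byte order are arbitrary choices for this sketch) so the control part is found by arithmetic instead of scanning.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Variant A: "<version>\0<control>". The version pointer is simply the
 * start of the value; the separating NUL also terminates the version. */
const char *control_after_nul(const char *value, size_t size,
			      const char **version)
{
	const char *nul = memchr(value, '\0', size);

	if (nul == NULL)
		return NULL; /* malformed: no separator found */
	*version = value;
	return nul + 1;
}

/* Variant B: 4-byte big-endian length of the version, the version and
 * its NUL, then the control chunk. No scan of the version is needed. */
const char *control_after_length(const char *value, size_t size,
				 const char **version)
{
	const unsigned char *u = (const unsigned char *)value;
	uint32_t len;

	if (size < 5)
		return NULL;
	len = ((uint32_t)u[0] << 24) | ((uint32_t)u[1] << 16) |
	      ((uint32_t)u[2] << 8) | (uint32_t)u[3];
	if (len > size - 5 || value[4 + len] != '\0')
		return NULL; /* length does not fit or NUL is missing */
	*version = value + 4;
	return value + 4 + len + 1;
}
```

Variant B avoids one scan over the version at the cost of four extra bytes per value and a start that is no longer a plain C string.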

What do you prefer? My current implementation just concatenates the
version string (including its null character) and the control chunk. I
could expand the tuple to a triple and add the timestamp (in which
format?) as third element.
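One way such a triple could look, as a sketch only: this assumes the timestamp is stored as decimal seconds since the Unix epoch, so all three fields stay NUL-terminated strings. The layout, the timestamp format and the names pack_triple/unpack_triple are assumptions made for illustration, not a decision on the format question.

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Build "<version>\0<timestamp>\0<control>\0" in a malloc()ed buffer.
 * Assumption: timestamp = decimal seconds since the Unix epoch. */
char *pack_triple(const char *version, time_t added, const char *control,
		  size_t *size_out)
{
	char stamp[32];
	int n = snprintf(stamp, sizeof(stamp), "%" PRIdMAX, (intmax_t)added);
	size_t vlen, slen, clen;
	char *buf;

	if (n < 0 || (size_t)n >= sizeof(stamp))
		return NULL;
	vlen = strlen(version) + 1; /* lengths include the terminating NUL */
	slen = (size_t)n + 1;
	clen = strlen(control) + 1;
	buf = malloc(vlen + slen + clen);
	if (buf == NULL)
		return NULL;
	memcpy(buf, version, vlen);
	memcpy(buf + vlen, stamp, slen);
	memcpy(buf + vlen + slen, control, clen);
	*size_out = vlen + slen + clen;
	return buf;
}

/* Split such a value again; returns 0 on success, -1 if malformed. */
int unpack_triple(const char *value, size_t size, const char **version,
		  time_t *added, const char **control)
{
	const char *end = value + size;
	const char *stamp, *ctl;
	const char *nul = memchr(value, '\0', size);

	if (nul == NULL)
		return -1;
	stamp = nul + 1;
	nul = memchr(stamp, '\0', (size_t)(end - stamp));
	if (nul == NULL)
		return -1;
	ctl = nul + 1;
	if (memchr(ctl, '\0', (size_t)(end - ctl)) == NULL)
		return -1;
	*version = value;
	*added = (time_t)strtoimax(stamp, NULL, 10);
	*control = ctl;
	return 0;
}
```

A decimal string keeps the value readable and independent of byte order; a fixed-width binary integer would be shorter and could be skipped without scanning, which is the same trade-off as the length prefix.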

While working on reprepro, I found a typo. A patch for that is attached.

-- 
Benjamin Drung
System Developer

ProfitBricks GmbH - The IaaS-Company
Greifswalder Str. 207
D - 10405 Berlin

Mail: benjamin.dr...@profitbricks.com
Fax:  +49 30 577 008 598
URL:  http://www.profitbricks.com

Registered office: Berlin.
Register court: Amtsgericht Charlottenburg, HRB 125506 B.
Managing directors: Andreas Gauger, Achim Weiss.
From 02a9440ac87532adfdf63a4e510e783f310708a9 Mon Sep 17 00:00:00 2001
From: Benjamin Drung <benjamin.dr...@profitbricks.com>
Date: Tue, 20 May 2014 15:10:12 +0200
Subject: [PATCH 1/1] Fix typo connot -> cannot.

---
 database.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/database.c b/database.c
index 83eb652..377ede3 100644
--- a/database.c
+++ b/database.c
@@ -942,7 +942,7 @@ static const char databaseerror[] = "Internal error of the underlying BerkeleyDB
 /****************************************************************************
  * Stuff to handle data in tables                                           *
  ****************************************************************************
- There is nothing that connot be solved by another layer of indirection, except
+ There is nothing that cannot be solved by another layer of indirection, except
  too many levels of indirection. (Source forgotten) */
 
 struct table {
-- 
1.9.1
