On 2014-09-14, at 21:30:36
Tim Harder <radher...@gentoo.org> wrote:

> On 2014-09-14 10:46, Michał Górny wrote:
> > On 2014-09-14, at 15:40:06
> > Davide Pesavento <p...@gentoo.org> wrote:
> > > How long does the md5-cache regeneration process take? Are you sure it
> > > will be able to keep up with the rate of pushes to the repo during
> > > "peak hours"? If not, maybe we could use a time-based thing similar to
> > > the current cvs->rsync synchronization.
> > 
> > This strongly depends on how much data is there to update. A few
> > ebuilds are quite fast, eclass change isn't ;). I was thinking of
> > something along the lines of, in pseudo-code speaking:
> > 
> >   systemctl restart cache-regen
> > 
> > That is, we start the regen on every update. If it finishes in time, it
> > commits the new metadata. If another update occurs during regen, we
> > just restart it to let it catch the new data.
> > 
> > Of course, if we can't spare the resources to do intermediate updates,
> > we may as well switch to cron-based update method.
> 
> I don't see per push metadata regen working entirely well in this case
> if this is the only way we're generating the metadata cache for users to
> sync. It's easy to imagine a plausible situation where a widely used
> eclass change is made followed by commits less than a minute apart (or
> shorter than however long it would take for metadata regen to occur) for
> at least 30 minutes (rsync refresh period for most user-facing mirrors)
> during a time of high activity.

For a metadata recheck (that is, egencache run with no changes):

a. cold cache ext4:

  real    3m54.321s
  user    0m44.413s
  sys     0m13.497s

b. warm cache ext4:

  real    0m40.672s
  user    0m35.087s
  sys     0m4.687s

I will try to re-run that on btrfs or reiserfs to get more meaningful
numbers.

Now, those results back up your claims. However, if we can get that down
to <10s, I doubt we would have a major issue. My idea works like this:

1. first update is pushed,
1a. egencache starts rechecking and updating cache,
2. second update is pushed,
2a. previous egencache is terminated,
2b. egencache starts rechecking and updating cache,
2c. egencache finishes in time and commits.

The point is, nothing gets committed to the user-reachable location
before egencache finishes. And it goes quasi-incrementally, so if
another update happens before egencache finishes, the restarted run
only does the 'slow' regen on the changed metadata.
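
To make the sequence above concrete, here is a minimal sketch of the
restart logic in Python rather than as a systemd unit. RegenRunner,
regen_cmd and publish are hypothetical names made up for illustration;
in a real setup regen_cmd would be whatever egencache invocation we
settle on, and publish would copy the finished cache to the
user-reachable location.

```python
import subprocess
import threading


class RegenRunner:
    """Restart a regen command on every push; publish only if it finishes."""

    def __init__(self, regen_cmd, publish):
        # regen_cmd: hypothetical argv list for the cache regen command.
        # publish: hypothetical callback that commits the finished cache.
        self._regen_cmd = regen_cmd
        self._publish = publish
        self._lock = threading.Lock()
        self._proc = None

    def on_push(self):
        """Steps 1a and 2a/2b: stop any running regen, then start a new one."""
        with self._lock:
            old = self._proc
            if old is not None and old.poll() is None:
                old.terminate()
                old.wait()
            self._proc = subprocess.Popen(self._regen_cmd)
            return self._proc

    def wait_and_publish(self, proc):
        """Step 2c: publish only if this run exited cleanly and is still current."""
        returncode = proc.wait()
        with self._lock:
            if proc is self._proc and returncode == 0:
                self._publish()
                return True
        return False
```

A push hook would call on_push() and hand the returned process to
wait_and_publish() in a worker thread. A run that was terminated by a
later push never reaches the publish step, so users only ever see a
cache from a regen that ran to completion.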

I will come back with more results soon.

-- 
Best regards,
Michał Górny
