Re: Does Whimsy need to have a copy of Bills?

sebb Sat, 30 Nov 2019 11:03:43 -0800

Top posting:

FTR, the following are no longer needed as checkouts:


Bills
cclas
iclas
member_apps

These were the largest checkouts; the total is now 209M according to du -h.

I don't know if it's possible (necessary?) to prune any further.
It might be possible to split the remaining checkouts into essential and
optional groups.


On Tue, 26 Nov 2019 at 17:18, Sam Ruby <ru...@intertwingly.net> wrote:

> On Tue, Nov 26, 2019 at 12:06 PM sebb <seb...@gmail.com> wrote:
> >
> > My takeaway is that using 'svn ls' without recursion is OK.
>
> +1
>
> > I've just done a check on member_apps and I get:
> >
> > $ time svn up member_apps/
> > Updating 'member_apps':
> > At revision 93890.
> >
> > real 0m1.573s
> > user 0m0.055s
> > sys 0m0.027s
> >
> > $ time svn ls https://svn.apache.org/repos/private/documents/member_apps | wc -l
> >      882
> >
> > real 0m0.943s
> > user 0m0.053s
> > sys 0m0.026s
> >
> > i.e. svn update takes roughly two-thirds longer than svn ls
>
> Something non-linear happens somewhere.  svn update is actually
> (marginally) faster than svn ls on iclas:
>
> % time svn up iclas/
> Updating 'iclas':
> At revision 93890.
> svn up iclas/  0.21s user 0.15s system 20% cpu 1.742 total
>
> % time svn ls https://svn.apache.org/repos/private/documents/iclas | wc -l
>    11038
> svn ls https://svn.apache.org/repos/private/documents/iclas  0.08s user 0.02s system 4% cpu 2.088 total
> wc -l  0.00s user 0.01s system 0% cpu 2.088 total
>
> > However, when using recursion, svn ls is quite a bit slower:
> >
> > $ time svn ls https://svn.apache.org/repos/private/documents/member_apps -R | wc -l
> >      931
> >
> > real 0m3.161s
> > user 0m0.062s
> > sys 0m0.031s
>
> Again, the size of the directory makes a difference.  Here's what I
> see for iclas:
>
> % time svn ls https://svn.apache.org/repos/private/documents/iclas -R | wc -l
>    12816
> svn ls https://svn.apache.org/repos/private/documents/iclas -R  0.22s user 0.07s system 0% cpu 1:04.50 total
> wc -l  0.00s user 0.01s system 0% cpu 1:04.50 total
>
> - Sam Ruby
>
> P.S.  Just now discovering that time is built into zsh.  Cool.
>
> > On Tue, 26 Nov 2019 at 00:36, Greg Stein <gst...@gmail.com> wrote:
> >
> > > One quick answer from earlier in the thread:
> > >
> > > > > > > What's most concerning is not just the elapsed time, but that this
> > > > > > > likely means that the call is expensive on the server, which may
> > > > > > > impact other users.
> > > > > > >
> > > > > > >
> > > > > > I see this as premature optimisation; we don't know whether svn list is
> > > > > > more expensive than svn update overall.
> > > > > > There may be other reasons why list is slower. Nor do we know if the
> > > > > > request will impact other users.
> > >
> > > "svn list" must generate the listing of 12k+ files, recursively. That takes
> > > some time to process and deliver over the network. I believe it is likely a
> > > PROPFIND, which introduces some overhead on both ends (XML construction on
> > > the server, parsing on the client, and sheer network size, too).
> > >
> > > "svn up" generates a diff report ("get these 3 files"), and that's easily
> > > extracted from the difference between revision-working-copy and
> > > revision-server (plus some other concerns).
> > >
> > > So yes: an update is *way* faster, all around.
> > >
> > >
> > > On Mon, Nov 25, 2019 at 5:15 PM Sam Ruby <ru...@intertwingly.net> wrote:
> > >
> > > > Actually copy Greg this time.
> > > >
> > > > ---------- Forwarded message ---------
> > > > From: Sam Ruby <ru...@intertwingly.net>
> > > > Date: Mon, Nov 25, 2019, 4:15 PM
> > > > Subject: Re: Does Whimsy need to have a copy of Bills?
> > > > To: Whimsy dev <dev@whimsical.apache.org>
> > > >
> > > >
> > > > adding Greg to email.
> > > >
> > > > Recap: a change is being proposed whereby whimsy will do the
> > > > equivalent of the following command after every icla is processed:
> > > >
> > > > svn ls https://svn.apache.org/repos/private/documents/iclas --depth infinity
> > > >
> > >
> > > Do you really need to use depth=infinity? The directory name is likely
> > > sufficient information.
> > >
> > > depth=immediates (the default for svn ls) is going to be just a few
> > > seconds.
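Greg's depth=immediates suggestion can be sketched in Python, the language of the `python -c` one-liner used later in this thread. This is a minimal, hypothetical helper, not Whimsy code: it runs a plain (non-recursive) `svn ls` and splits the result into directory and file names, relying on `svn ls` printing one entry per line with a trailing slash on directories. The function names are made up for illustration.

```python
from __future__ import annotations

import subprocess


def list_immediates(url: str) -> str:
    """Hypothetical helper: non-recursive `svn ls` (immediates is the default depth)."""
    result = subprocess.run(
        ["svn", "ls", url], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_listing(text: str) -> tuple[list[str], list[str]]:
    """Split `svn ls` output into (directories, files).

    `svn ls` marks directories with a trailing slash, which is stripped here.
    """
    dirs: list[str] = []
    files: list[str] = []
    for line in text.splitlines():
        if not line:
            continue  # ignore blank lines
        if line.endswith("/"):
            dirs.append(line[:-1])
        else:
            files.append(line)
    return dirs, files
```

For example, `parse_listing(list_immediates(url))` with the iclas URL would return only the top-level names, without fetching any file contents.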
> > >
> > > > Currently this appears to take around forty to sixty elapsed seconds to
> > > > process.
> > > >
> > > > Questions for Greg:
> > > > 1) Does this proposed workload present an unreasonable load on the
> > > > svn server?
> > > >
> > >
> > > This should be fine as long as you don't put a "-v" switch in there.
> > > That'll take 10-15 minutes as it reconstructs all the files on the server
> > > and measures their size.
> > >
> > > The listing is just a single thread on the server, a fetch of the directory
> > > names, and then assembly/delivery of that result. There really shouldn't be
> > > any contention with other users, or heavy use of the CPU.
> > >
> > >
> > > > 2) Are there any faster alternatives which get us a list of names but no
> > > > data?
> > > >
> > >
> > > So I experimented with a hack. I did a "full checkout" of the iclas
> > > directory, but stopped it after a single file was checked out. This left a
> > > partial checkout. Subversion will tell the server "I have $these. What am I
> > > missing?" when you run "svn status -u". You'll get a listing of the 12k
> > > missing files. It takes about 7 seconds.
> > >
> > > Specifically:
> > > $ # kill after reading/printing one line (the first file checked out)
> > > $ svn co https://svn.apache.org/repos/private/documents/iclas | python -c 'import signal,os,sys ; print(sys.stdin.readline()) ; os.killpg(os.getpgrp(), signal.SIGHUP)'
> > > A    iclas/jiwei-guo
> > >
> > > svn: E200015: Caught signal
> > > svn: E200042: Additional errors:
> > > svn: E200015: Caught signal
> > > Hangup
> > > $ # get a recursive listing via status
> > > $ time svn st -u iclas | wc -l
> > > 12818
> > >
> > > real    0m6.782s
> > > user    0m3.461s
> > > sys     0m2.358s
> > >
> > > The status output should be easy to parse (it is designed as a fixed-width
> > > set of codes, then filename).
> > >
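The text output is fixed-width, but `svn status` also accepts `--xml`, which may be easier to consume from a script. A minimal sketch, assuming the XML layout produced by current Subversion clients: one `<entry path="...">` per item, a `<repos-status item="added">` child for items that exist only in the repository, and an `<against revision="...">` element. The helper names are hypothetical; check the assumed layout against real output before relying on it.

```python
from __future__ import annotations

import subprocess
import xml.etree.ElementTree as ET


def status_xml(wc: str) -> str:
    """Hypothetical helper: run `svn status -u --xml` on working copy `wc`."""
    result = subprocess.run(
        ["svn", "status", "--show-updates", "--xml", wc],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def missing_paths(xml_text: str) -> list[str]:
    """Paths the repository has but the working copy does not.

    Assumes such items appear as <entry path="..."> elements with a
    <repos-status item="added"> child.
    """
    root = ET.fromstring(xml_text)
    paths = []
    for entry in root.iter("entry"):
        repos = entry.find("repos-status")
        if repos is not None and repos.get("item") == "added":
            paths.append(entry.get("path"))
    return paths


def against_revision(xml_text: str) -> int | None:
    """Repository revision the status was compared against, if reported."""
    against = ET.fromstring(xml_text).find(".//against")
    if against is None:
        return None
    return int(against.get("revision"))
```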
> > > Even if you do a full/normal checkout, note that "svn status -u" may be
> > > useful. Depending on whether you need the content, or just the names, you
> > > may want to migrate to the status-based approach.
> > >
> > > Oh! Just realized a better way to avoid the hack/partial checkout. Even
> > > better, just check out the "iclas" directory at the revision in which it was
> > > created. It is an empty directory in that revision (a sibling directory
> > > received a bunch of Member applications, but those won't be in this
> > > checkout).
> > >
> > > $ svn -r 9696 co https://svn.apache.org/repos/private/documents/iclas
> > > Checked out revision 9696.
> > >
> > > "svn status" works the same against the above (empty) working copy,
> > > also taking about 8 seconds.
> > >
> > > So, in summary: use "svn status" against a HEAD checkout, or against r9696
> > > for those who don't want the gigabytes of ICLA forms.
> > >
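The empty-checkout approach could be scripted roughly as follows. This is a sketch, not Whimsy code: the function names are hypothetical, the caller is assumed to already know the creation revision (9696 for iclas, per the `svn log` output in this message), and the "is this already a working copy" test assumes a Subversion 1.7+ layout with a single `.svn` directory at the root.

```python
from __future__ import annotations

import os
import subprocess


def listing_commands(url: str, created_rev: int, wc: str) -> list[list[str]]:
    """Hypothetical helper: commands for the empty-checkout approach.

    1. If `wc` is not yet a working copy, check out `url` at the revision in
       which the directory was created (empty for iclas at r9696).
    2. Ask the server what is missing with `svn status -u`.
    Never run `svn update` on this working copy: that would fetch every file.
    """
    commands = []
    # Assumes svn 1.7+, where a working copy has one .svn directory at its root.
    if not os.path.isdir(os.path.join(wc, ".svn")):
        commands.append(["svn", "checkout", "-r", str(created_rev), url, wc])
    commands.append(["svn", "status", "--show-updates", wc])
    return commands


def run_listing(url: str, created_rev: int, wc: str) -> str:
    """Run the commands from listing_commands and return the status output."""
    output = ""
    for cmd in listing_commands(url, created_rev, wc):
        output = subprocess.run(
            cmd, capture_output=True, text=True, check=True
        ).stdout
    return output
```

Building the command list separately from running it keeps the decision logic testable without a network connection.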
> > > A similar technique can be used for any of the other Whimsy data
> > > directories, of course. To find when a particular directory was created:
> > >
> > > $ # run an "svn log" in reverse order, and limit/stop at the first log entry.
> > > $ svn log --stop-on-copy --limit 1 -r0:HEAD https://svn.apache.org/repos/private/documents/iclas
> > > ------------------------------------------------------------------------
> > > r9696 | jim | 2006-11-17 10:35:28 -0600 (Fri, 17 Nov 2006) | 3 lines
> > >
> > > Start loading of scanned docs. Start with creating
> > > the dirs and upload the member apps
> > >
> > > ------------------------------------------------------------------------
> > >
> > > Note that sometimes a directory is created with content, in that revision.
> > > The "iclas" directory just happened to be created empty. But I imagine most
> > > directories will be much smaller at their creation than they are today.
> > > (iow, don't expect them to always be empty at creation)
> > >
> > > Hope that helps,
> > > -g
> > >
> > >
> > >
> > > > - Sam Ruby
> > > >
> > > > On Mon, Nov 25, 2019 at 3:44 PM sebb <seb...@gmail.com> wrote:
> > > > >
> > > > > On Mon, 25 Nov 2019 at 19:48, Sam Ruby <ru...@intertwingly.net> wrote:
> > > > >
> > > > > > On Mon, Nov 25, 2019 at 2:29 PM sebb <seb...@gmail.com> wrote:
> > > > > > >
> > > > > > > On Mon, 25 Nov 2019 at 17:59, Sam Ruby <ru...@intertwingly.net> wrote:
> > > > > > >
> > > > > > > > On Sun, Nov 24, 2019 at 1:17 PM sebb <seb...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > I was thinking of using svn ls to create a listing file which would be
> > > > > > > > > cached locally.
> > > > > > > >
> > > > > > > > Unfortunately, some observations (numbers below are approximate):
> > > > > > > >
> > > > > > > > svn up on a populated iclas directory: one second
> > > > > > > >
> > > > > > > > svn ls on iclas: two seconds, but only returns depth one
> > > > > > > >
> > > > > > > > svn ls on iclas --depth infinity: one minute
> > > > > > > >
> > > > > > > > What's most concerning is not just the elapsed time, but that this
> > > > > > > > likely means that the call is expensive on the server, which may
> > > > > > > > impact other users.
> > > > > > > >
> > > > > > > >
> > > > > > > I see this as premature optimisation; we don't know whether svn list is
> > > > > > > more expensive than svn update overall.
> > > > > > > There may be other reasons why list is slower. Nor do we know if the
> > > > > > > request will impact other users.
> > > > > > >
> > > > > > > Besides, if the code checks SVN info first, it will only need to fetch the
> > > > > > > updated listing when there has been a change.
> > > > > > > Those directories are not busy.
> > > > > > >
> > > > > > > Furthermore, every time a new test installation is set up, there is
> > > > > > > definitely a large load on the server and network.
> > > > > > > This network load in particular must be orders of magnitude greater than
> > > > > > > for a listing.
> > > > > >
> > > > > > Sorry for not being clear.  I would be very concerned if whimsy-vm4
> > > > > > were invoking svn ls --depth infinity every 10 minutes, as the current
> > > > > > cron job does.
> > > > >
> > > > >
> > > > > That would not be the case.
> > > > >
> > > > > The job would use 'svn info' on the remote repo and only fetch the listing
> > > > > if necessary.
> > > > >
> > > > > For the repos in question, changes are rare.
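The "check svn info first, fetch the listing only when it changed" plan could look roughly like this. A sketch under stated assumptions, not Whimsy code: the cache is a hypothetical JSON file holding a revision and a list of names, the revision check uses `svn info --show-item last-changed-revision` (Subversion 1.9 or later), and the expensive fetch is passed in as a function so it can wrap `svn ls`, `svn status -u`, or anything else.

```python
from __future__ import annotations

import json
import os
import subprocess
from typing import Callable


def remote_last_changed(url: str) -> int:
    """Last Changed Rev of `url` on the server (needs svn 1.9+ for --show-item)."""
    result = subprocess.run(
        ["svn", "info", "--show-item", "last-changed-revision", url],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def refresh_listing(
    url: str,
    cache_path: str,
    get_revision: Callable[[str], int],
    fetch_listing: Callable[[str], list[str]],
) -> list[str]:
    """Return cached names, re-fetching only if `url` changed since caching.

    Hypothetical cache format: {"revision": int, "names": [str, ...]}.
    """
    cached = None
    if os.path.exists(cache_path):
        with open(cache_path) as fh:
            cached = json.load(fh)
    revision = get_revision(url)  # cheap: a single svn info call
    if cached is not None and cached["revision"] == revision:
        return cached["names"]
    names = fetch_listing(url)  # the expensive part, only when needed
    with open(cache_path, "w") as fh:
        json.dump({"revision": revision, "names": names}, fh)
    return names
```

In production `get_revision` would be `remote_last_changed`; injecting it keeps the caching logic testable offline.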
> > > > >
> > > > >
> > > > > > Before any such change is deployed, it would be wise for us to check
> > > > > > with both the infrastructure team and the subversion team (Greg likely
> > > > > > can help with both).
> > > > > >
> > > > > > I'm less concerned about the overhead on development machines, and
> > > > > > there I suspect that most users would be happy with an svn checkout
> > > > > > --depth empty.
> > > > > >
> > > > > >
> > > > > This would not allow testing of the functions that need to know the list of
> > > > > file names.
> > > > >
> > > > >
> > > > > > - Sam Ruby
> > > > > >
> > > >
> > >
>
