Top posting: FTR, the following are no longer needed as checkouts:
Bills
cclas
iclas
member_apps

These were the largest checkouts; the total is now 209M according to du -h.
I don't know if it's possible (necessary?) to prune any further.
It might be possible to split the remaining checkouts into essential and optional groups.

On Tue, 26 Nov 2019 at 17:18, Sam Ruby <ru...@intertwingly.net> wrote:

> On Tue, Nov 26, 2019 at 12:06 PM sebb <seb...@gmail.com> wrote:
> >
> > My takeaway is that using 'svn ls' without recursion is OK.
>
> +1
>
> > I've just done a check on member_apps and I get:
> >
> > $ time svn up member_apps/
> > Updating 'member_apps':
> > At revision 93890.
> >
> > real 0m1.573s
> > user 0m0.055s
> > sys 0m0.027s
> >
> > $ time svn ls https://svn.apache.org/repos/private/documents/member_apps | wc -l
> > 882
> >
> > real 0m0.943s
> > user 0m0.053s
> > sys 0m0.026s
> >
> > i.e. svn update is ~ 50% slower than svn ls
>
> Something non-linear happens somewhere. svn update is actually
> (marginally) faster than svn ls on iclas:
>
> % time svn up iclas/
> Updating 'iclas':
> At revision 93890.
> svn up iclas/ 0.21s user 0.15s system 20% cpu 1.742 total
>
> % time svn ls https://svn.apache.org/repos/private/documents/iclas | wc -l
> 11038
> svn ls https://svn.apache.org/repos/private/documents/iclas 0.08s user 0.02s system 4% cpu 2.088 total
> wc -l 0.00s user 0.01s system 0% cpu 2.088 total
>
> > However, when using recursion, svn ls is quite a bit slower:
> >
> > $ time svn ls https://svn.apache.org/repos/private/documents/member_apps -R | wc -l
> > 931
> >
> > real 0m3.161s
> > user 0m0.062s
> > sys 0m0.031s
>
> Again, the size of the directory makes a difference. Here's what I
> see for iclas:
>
> % time svn ls https://svn.apache.org/repos/private/documents/iclas -R | wc -l
> 12816
> svn ls https://svn.apache.org/repos/private/documents/iclas -R 0.22s user 0.07s system 0% cpu 1:04.50 total
> wc -l 0.00s user 0.01s system 0% cpu 1:04.50 total
>
> - Sam Ruby
>
> P.S. Just now discovering that time is built into zsh. Cool.
>
> > On Tue, 26 Nov 2019 at 00:36, Greg Stein <gst...@gmail.com> wrote:
> >
> > > One quick answer from earlier in the thread:
> > >
> > > > > > > What's most concerning is not just the elapsed time, but that this
> > > > > > > likely means that the call is expensive on the server, which may
> > > > > > > impact other users.
> > > > > >
> > > > > > I see this as premature optimisation; we don't know whether svn list is
> > > > > > more expensive than svn update overall.
> > > > > > There may be other reasons why list is slower. Nor do we know if the
> > > > > > request will impact other users.
> > >
> > > "svn list" must generate the listing of 12k+ files, recursively. That takes
> > > some time to process and deliver over the network. I believe it is likely a
> > > PROPFIND which introduces some overheads on both ends (XML construction on
> > > server, parsing on client; sheer network size, too).
> > >
> > > "svn up" generates a diff report. "get these 3 files", and that's easily
> > > extracted from the difference between revision-working-copy and
> > > revision-server (plus some other concerns).
> > >
> > > So yes: an update is *way* faster, all around.
> > >
> > >
> > > On Mon, Nov 25, 2019 at 5:15 PM Sam Ruby <ru...@intertwingly.net> wrote:
> > >
> > > > Actually copy Greg this time.
> > > >
> > > > ---------- Forwarded message ---------
> > > > From: Sam Ruby <ru...@intertwingly.net>
> > > > Date: Mon, Nov 25, 2019, 4:15 PM
> > > > Subject: Re: Does Whimsy need to have a copy of Bills?
> > > > To: Whimsy dev <dev@whimsical.apache.org>
> > > >
> > > >
> > > > adding Greg to email.
> > > >
> > > > Recap: a change is being proposed whereby whimsy will do the
> > > > equivalent of the following command after every icla is processed:
> > > >
> > > > svn ls https://svn.apache.org/repos/private/documents/iclas --depth infinity
> > > >
> > >
> > > Do you really need to use depth=infinity? The directory name is likely
> > > sufficient information.
> > >
> > > depth=immediates (the default for svn ls) is going to be just a few
> > > seconds.
> > >
> > > > Currently this appears to take around forty to sixty elapsed seconds to
> > > > process.
> > > >
> > > > Questions for Greg:
> > > > 1) Does this proposed workload present an unreasonable load on the svn
> > > > server?
> > > >
> > >
> > > This should be fine as long as you don't put a "-v" switch in there.
> > > That'll take 10-15 minutes as it reconstructs all the files on the server
> > > and measures their size.
> > >
> > > The listing is just a single thread on the server, a fetch of the directory
> > > names, and then assembly/delivery of that result. There really shouldn't be
> > > any contention with other users, or heavy use of the CPU.
> > >
> > > > 2) Are there any faster alternatives which get us a list of names but no
> > > > data?
> > > >
> > >
> > > So I experimented with a hack. I did a "full checkout" of the iclas
> > > directory, but stopped it after a single file was checked out. This left a
> > > partial checkout. Subversion will tell the server "I have $these. what am I
> > > missing?" when you run "svn status -u". You'll get a listing of the 12k
> > > missing files. Takes about 7 seconds or so.
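Greg's observation that `svn status -u` reports every file the working copy lacks can be turned into a plain list of names with a small parser. What follows is a minimal Python sketch, not Whimsy's code; `parse_status`, `remote_names` and `StatusEntry` are hypothetical names. It assumes the usual text layout of each status line: seven one-character status columns, a space, the out-of-date column (`*` or blank), an optional working revision, then the path. The regex is deliberately lenient about column widths rather than relying on exact offsets.

```python
import os
import re
import subprocess
from typing import List, NamedTuple, Optional

# Assumed layout of one `svn status -u` line: seven status columns, a space,
# the out-of-date column ('*' or ' '), an optional working revision
# (digits or '-'), then the path.
STATUS_LINE = re.compile(
    r"^(?P<codes>.{7}) (?P<ood>[ *]) +(?:(?P<rev>\d+|-) +)?(?P<path>\S.*)$"
)


class StatusEntry(NamedTuple):
    codes: str               # the seven status columns, e.g. " M     "
    out_of_date: bool        # True when the server has something newer or missing locally
    revision: Optional[str]  # working revision; None for items not in the working copy
    path: str


def parse_status(text: str) -> List[StatusEntry]:
    """Parse the text output of `svn status -u` into entries."""
    entries = []
    for line in text.splitlines():
        if line.startswith("Status against revision:"):
            continue  # trailer line, not an entry
        m = STATUS_LINE.match(line)
        if m:
            entries.append(
                StatusEntry(m["codes"], m["ood"] == "*", m["rev"], m["path"])
            )
    return entries


def remote_names(working_copy: str) -> List[str]:
    """Hypothetical helper: out-of-date paths below `working_copy`."""
    result = subprocess.run(
        ["svn", "status", "-u", working_copy],
        check=True, capture_output=True, text=True,
    )
    root = os.path.normpath(working_copy)
    return [
        e.path
        for e in parse_status(result.stdout)
        # skip the working-copy root itself, which is usually also out of date
        if e.out_of_date and os.path.normpath(e.path) != root
    ]
```

Run against the empty r9696 checkout described further down the thread, `remote_names("iclas")` would be expected to return the full set of ICLA paths. One caveat of the lenient regex: when the revision column is empty, a path that starts with digits followed by a space would be misread. `svn status` also accepts `--xml`, which sidesteps text parsing entirely.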
> > >
> > > Specifically:
> > > $ # kill after reading/printing one line (the first file checked out)
> > > $ svn co https://svn.apache.org/repos/private/documents/iclas | python -c 'import signal,os,sys ; print(sys.stdin.readline()) ; os.killpg(os.getpgrp(), signal.SIGHUP)'
> > > A iclas/jiwei-guo
> > >
> > > svn: E200015: Caught signal
> > > svn: E200042: Additional errors:
> > > svn: E200015: Caught signal
> > > Hangup
> > > $ # get a recursive listing via status
> > > $ time svn st -u iclas | wc -l
> > > 12818
> > >
> > > real 0m6.782s
> > > user 0m3.461s
> > > sys 0m2.358s
> > >
> > > The status output should be easy to parse (it is designed as a fixed-width
> > > set of codes, then filename).
> > >
> > > Even if you do a full/normal checkout, note that "svn status -u" may be
> > > useful. Depending on whether you need the content, or just the names, you
> > > may want to migrate to the status-based approach.
> > >
> > > Oh! Just realized a better way, to avoid the hack/partial checkout. Even
> > > better, just check out the "iclas" directory for the revision it was
> > > created. It is an empty directory in that revision (a sibling directory
> > > received a bunch of Member applications, but those won't be in this
> > > checkout).
> > >
> > > $ svn -r 9696 co https://svn.apache.org/repos/private/documents/iclas
> > > Checked out revision 9696.
> > >
> > > The "svn status" works the same against the above (empty) working copy.
> > > Also at about 8 seconds.
> > >
> > > So. In summary, use "svn status" against a HEAD checkout, or against r9696
> > > for those who don't want the gigabytes of ICLA forms.
> > >
> > > A similar technique can be used for any of the other Whimsy data
> > > directories, of course. To find when a particular directory was created:
> > >
> > > $ # run an "svn log" in reverse order, and limit/stop at the first log entry.
> > > $ svn log --stop-on-copy --limit 1 -r0:HEAD https://svn.apache.org/repos/private/documents/iclas
> > > ------------------------------------------------------------------------
> > > r9696 | jim | 2006-11-17 10:35:28 -0600 (Fri, 17 Nov 2006) | 3 lines
> > >
> > > Start loading of scanned docs. Start with creating
> > > the dirs and upload the member apps
> > >
> > > ------------------------------------------------------------------------
> > >
> > > Note that sometimes a directory is created with content, in that revision.
> > > The "iclas" directory just happened to be created empty. But I imagine most
> > > directories will be much smaller at their creation, than they are today.
> > > (iow, don't expect them to always be empty at creation)
> > >
> > > Hope that helps,
> > > -g
> > >
> > >
> > > > - Sam Ruby
> > > >
> > > > On Mon, Nov 25, 2019 at 3:44 PM sebb <seb...@gmail.com> wrote:
> > > > >
> > > > > On Mon, 25 Nov 2019 at 19:48, Sam Ruby <ru...@intertwingly.net> wrote:
> > > > >
> > > > > > On Mon, Nov 25, 2019 at 2:29 PM sebb <seb...@gmail.com> wrote:
> > > > > > >
> > > > > > > On Mon, 25 Nov 2019 at 17:59, Sam Ruby <ru...@intertwingly.net> wrote:
> > > > > > >
> > > > > > > > On Sun, Nov 24, 2019 at 1:17 PM sebb <seb...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > I was thinking of using svn ls to create a listing file which would be
> > > > > > > > > cached locally.
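Greg's two-step recipe (find the revision in which a directory was created with `svn log --stop-on-copy --limit 1 -r0:HEAD`, then check the directory out at that revision) can be scripted for each Whimsy data directory. A minimal Python sketch follows; the function names are hypothetical, and it simply reuses the `svn log` invocation from the thread and reads the `rNNNN | author | ...` header of the first entry. As Greg warns, the resulting checkout is only guaranteed small, not empty, because some directories were created with content.

```python
import re
import subprocess
from typing import Optional

# Header line of an `svn log` entry, e.g.
# "r9696 | jim | 2006-11-17 10:35:28 -0600 (Fri, 17 Nov 2006) | 3 lines"
LOG_HEADER = re.compile(r"^r(\d+) \| ")


def first_logged_revision(log_text: str) -> Optional[int]:
    """Return the revision number of the first entry in `svn log` text output."""
    for line in log_text.splitlines():
        match = LOG_HEADER.match(line)
        if match:
            return int(match.group(1))
    return None


def creation_revision(url: str) -> int:
    """Oldest logged revision of `url`, using the command from the thread."""
    result = subprocess.run(
        ["svn", "log", "--stop-on-copy", "--limit", "1", "-r0:HEAD", url],
        check=True, capture_output=True, text=True,
    )
    revision = first_logged_revision(result.stdout)
    if revision is None:
        raise RuntimeError(f"no log entry found for {url}")
    return revision


def checkout_at_creation(url: str, target: str) -> int:
    """Check out `url` into `target` as it was when created; return the revision."""
    revision = creation_revision(url)
    subprocess.run(
        ["svn", "checkout", "-r", str(revision), url, target], check=True
    )
    return revision
```

For iclas this should reproduce Greg's `svn -r 9696 co ...` step, after which `svn status -u` against the working copy lists the current names without downloading the ICLA files.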
> > > > > > > >
> > > > > > > > Unfortunately, some observations (numbers below are approximate):
> > > > > > > >
> > > > > > > > svn up on a populated iclas directory: one second
> > > > > > > >
> > > > > > > > svn ls on iclas: two seconds, but only returns depth one
> > > > > > > >
> > > > > > > > svn ls on iclas --depth infinity: one minute
> > > > > > > >
> > > > > > > > What's most concerning is not just the elapsed time, but that this
> > > > > > > > likely means that the call is expensive on the server, which may
> > > > > > > > impact other users.
> > > > > > >
> > > > > > > I see this as premature optimisation; we don't know whether svn list is
> > > > > > > more expensive than svn update overall.
> > > > > > > There may be other reasons why list is slower. Nor do we know if the
> > > > > > > request will impact other users.
> > > > > > >
> > > > > > > Besides, if the code checks SVN info first, it will only need to fetch the
> > > > > > > updated listing when there has been a change.
> > > > > > > Those directories are not busy.
> > > > > > >
> > > > > > > Furthermore, every time a new test installation is set up, there is
> > > > > > > definitely a large load on the server and network.
> > > > > > > This network load in particular must be orders of magnitude greater than
> > > > > > > for a listing.
> > > > > >
> > > > > > Sorry for not being clear. I would be very concerned if whimsy-vm4
> > > > > > were invoking svn ls --depth infinity every 10 minutes as the current
> > > > > > cron job does.
> > > > >
> > > > > That would not be the case.
> > > > >
> > > > > The job would use 'svn info' on the remote repo and only fetch the listing
> > > > > if necessary.
> > > > >
> > > > > For the repos in question, changes are rare.
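The proposal in this part of the thread (check `svn info` on the remote directory, and re-run a non-recursive `svn ls` only when something has changed) amounts to a cache keyed on the directory's last-changed revision. A minimal Python sketch, not Whimsy's implementation: it assumes Subversion 1.9 or later for `svn info --show-item`, and `cached_listing`, the JSON cache layout and the injectable `run` parameter are hypothetical choices made for illustration and testing.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

# A runner takes svn arguments and returns stdout; swappable for testing.
Runner = Callable[[Sequence[str]], str]


def run_svn(args: Sequence[str]) -> str:
    result = subprocess.run(
        ["svn", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def last_changed_revision(url: str, run: Runner = run_svn) -> int:
    # `svn info --show-item` requires Subversion 1.9 or later.
    return int(run(["info", "--show-item", "last-changed-revision", url]).strip())


def list_names(url: str, run: Runner = run_svn) -> List[str]:
    # No -R: `svn ls` defaults to depth=immediates (one level only).
    # Subdirectories are listed with a trailing '/'.
    return [line for line in run(["ls", url]).splitlines() if line]


def cached_listing(url: str, cache_file: Path, run: Runner = run_svn) -> List[str]:
    """Return the listing of `url`, re-running `svn ls` only when the
    directory's last-changed revision differs from the cached one."""
    revision = last_changed_revision(url, run)
    if cache_file.exists():
        cached = json.loads(cache_file.read_text())
        if cached.get("revision") == revision:
            return cached["names"]  # nothing changed: no listing request
    names = list_names(url, run)
    cache_file.write_text(json.dumps({"revision": revision, "names": names}))
    return names
```

With this shape, a cron job running every 10 minutes would normally issue only the cheap `svn info` call, and the one-level `svn ls` that the thread agreed was acceptable would run only after a commit touches the directory.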
> > > > > >
> > > > > > Before any such change is deployed, it would be wise
> > > > > > for us to check both with the infrastructure team and the subversion
> > > > > > team (Greg likely can help with both).
> > > > > >
> > > > > > I'm less concerned about the overhead on development machines, and
> > > > > > there I suspect that most users would be happy with a svn checkout
> > > > > > --depth empty.
> > > > > >
> > > > >
> > > > > This would not allow testing of the functions that need to know the list of
> > > > > file names.
> > > > >
> > > > > > - Sam Ruby