On Sat, 2011-02-19 at 04:53 +0100, Stefan Sperling wrote:
> On Fri, Feb 18, 2011 at 09:19:56PM -0500, Greg Stein wrote:
> > Can somebody provide a pointer to some of the latest speed analysis?
>
> Neels is on vacation this week. When he returns, I'll prod him
> about running his performance tests again and sharing the results.
* neels prodded

If my tests are going to be "official", I feel they need some verification /
opinions. Possibly also extension, so that they test more than ra_local.

- I run a pseudo-randomized checkout-switch-modify-merge-resolve series in
  ra_local only. This emphasizes the timings of libsvn_wc, so that additional
  working copy overhead causes a bad time factor. Example: the test may spit
  out a time factor of 2 (twice as slow) even though network communication is
  commonly orders of magnitude slower, and 'real' ra_* access would never
  notice such a bad factor (say, 1 vs. 2 seconds of working copy time is a
  factor of 2, but with 30 seconds of network time on top it becomes 31 vs.
  32 seconds).

- On the other hand, if trunk for some reason needed more ra_* connections
  than 1.6.x, we won't see that, since ra_local access timing is negligible.

(Maybe it would be better to talk about added seconds of run time instead of
factors; a sketch of that idea follows below.)

Anyone else keen on forming an opinion on my humble tests?

Let's break it down. I've got one py script that runs N tests for a single
svn build in a specific dir-depth / dir-spread configuration and writes its
results into a python pickle file. The results add up the times that each
subcommand takes to complete, by name; e.g. all 'svn update' runs are added
up. Later invocations can combine and compare pickle files and print stats.

A bash script calls a series of such svn-version/dir-depth/dir-spread runs
and finally compares the pickle files to print overall stats.

The svn commands run are, roughly, these; they are python functions that
call svn in the way their names suggest:

[[[
run_cmd(['svnadmin', 'create', repos])
svn('checkout', file_url, wc)

trunk = j(wc, 'trunk')
create_tree(trunk, levels, spread)
add(trunk)
st(wc)
ci(wc)
up(wc)
propadd_tree(trunk, 0.5)
ci(wc)
up(wc)
st(wc)

trunk_url = file_url + '/trunk'
branch_url = file_url + '/branch'
svn('copy', '-mm', trunk_url, branch_url)
st(wc)
up(wc)
st(wc)

svn('checkout', trunk_url, wc2)
st(wc2)
modify_tree(wc2, 0.5)
st(wc2)
ci(wc2)
up(wc2)
up(wc)

svn('switch', branch_url, wc2)
modify_tree(wc2, 0.5)
st(wc2)
ci(wc2)
up(wc2)
up(wc)
modify_tree(trunk, 0.5)
st(wc)
ci(wc)
up(wc2)
up(wc)

svn('merge', '--accept=postpone', trunk_url, wc2)
st(wc2)
svn('resolve', '--accept=mine-conflict', wc2)
st(wc2)
svn('resolved', '-R', wc2)
st(wc2)
ci(wc2)
up(wc2)
up(wc)

svn('merge', '--accept=postpone', '--reintegrate', branch_url, trunk)
st(wc)
svn('resolve', '--accept=mine-conflict', wc)
st(wc)
svn('resolved', '-R', wc)
st(wc)
ci(wc)
up(wc2)
up(wc)

svn('delete', j(wc, 'branch'))
ci(wc)
up(wc2)
up(wc)
]]]

Excerpts from the "outer layer" shell script:

[[[
batch(){
  levels="$1"
  spread="$2"
  N="$3"
  pre="${levels}x${spread}_"

  eval "$(pat bashrc)"

  pat use 1.6
  ./benchmark.py run ${pre}1.6_1.runs $levels $spread $N
  ./benchmark.py run ${pre}1.6_2.runs $levels $spread $N

  pat use 1.7
  ./benchmark.py run ${pre}1.7_1.runs $levels $spread $N
  ./benchmark.py run ${pre}1.7_2.runs $levels $spread $N

  <combine stats>
  <print stats>
}
]]]

This is a bash function that switches to svn 1.6 (using my humble helper
'pat' [1] to modify the PATH environment), runs the whole test N*2 times,
then switches to svn 1.7 and again runs the thing 2N times. It runs each
build twice so that it can also compare two identical runs, which lets us
verify whether those timing factors are sufficiently near 1.0.

Then that whole thing is run in three configurations (a: 4x4, b: 100x1,
c: 1x100), where "levels" is how deep the deepest dir tree is and "spread"
is how many child dirs each dir has, and each configuration is run N times.
We can very easily modify these few numbers to choose a test run size from
tiny to "infinite":
[[[
N=3

# run a: levels 4, spread 4 (4x4)
al=4
as=4

# run b: levels 100, spread 1 (100x1)
bl=100
bs=1

# run c...
cl=1
cs=100

batch $al $as $N
batch $bl $bs $N
batch $cl $cs $N

<combine stats>
<print overall stats>
]]]

I'd be delighted if anyone else wants to hack this stuff -- with or w/o me.

~Neels

[1] I wrote pat for myself, to take care of repetitive svn devel tasks. I
also use it to maintain several different svn builds alongside each other,
so it's rather large and unreviewed. In this test, pat is only used to
modify the PATH variable towards the 1.6 or the 1.7 build, respectively.
http://hofmeyr.de/code/pat/
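As for reporting added seconds of run time instead of factors: a minimal
sketch of what that could look like, as a hypothetical compare_to_absolute()
method placed next to compare_to() inside class Timings in the attached
benchmark.py (not part of the script as attached):

[[[
    def compare_to_absolute(self, other):
        # Hypothetical variant of compare_to(): report the difference in
        # seconds per operation instead of the factor between the runs.
        s = ['    min     max     avg  operation  (unit is added seconds)']
        for name, timings in self.timings.items():
            other_timings = other.timings.get(name)
            if not other_timings:
                continue
            avg = sum(timings) / len(timings)
            other_avg = sum(other_timings) / len(other_timings)
            s.append('%7.3f %7.3f %7.3f  %s' % (
                       min(timings) - min(other_timings),
                       max(timings) - max(other_timings),
                       avg - other_avg,
                       name))
        return '\n'.join(s)
]]]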
#!/usr/bin/env python
"""usage: benchmark.py run <run_file> <levels> <spread> [N]
       benchmark.py show <run_file>
       benchmark.py compare <run_file1> <run_file2>
       benchmark.py combine <run_file1> <run_file2> <dest_file>

Test data is written to run_file.
If a run_file exists, data is added to it.

<levels> is the number of directory levels to create.
<spread> is the number of child trees spreading off each dir level.
If <N> is provided, the run is repeated N times.
"""

import os, sys, time
import tempfile
from datetime import datetime, timedelta
from subprocess import Popen, PIPE, call
import random
import shutil
import cPickle

VERBOSE = False

DEFAULT_TIMINGS_PATH = './benchmark_py_last_run.py-pickle'

timings = None


def run_cmd(cmd, stdin=None, shell=False):
    if shell:
        printable_cmd = 'CMD: ' + cmd
    else:
        printable_cmd = 'CMD: ' + ' '.join(cmd)
    if VERBOSE:
        print printable_cmd

    if stdin:
        stdin_arg = PIPE
    else:
        stdin_arg = None

    p = Popen(cmd, stdin=stdin_arg, stdout=PIPE, stderr=PIPE, shell=shell)
    stdout, stderr = p.communicate(input=stdin)

    if VERBOSE:
        if (stdout):
            print "STDOUT: [[[\n%s]]]" % ''.join(stdout)
        if (stderr):
            print "STDERR: [[[\n%s]]]" % ''.join(stderr)

    return stdout, stderr


def timedelta_to_seconds(td):
    return (float(td.seconds)
            + float(td.microseconds) / (10**6)
            + td.days * 24 * 60 * 60)


class Timings:

    def __init__(self):
        self.timings = {}
        self.current_name = None
        self.tic_at = None

    def tic(self, name):
        self.toc()
        self.current_name = name
        self.tic_at = datetime.now()

    def toc(self):
        if self.current_name and self.tic_at:
            toc_at = datetime.now()
            self.submit_timing(self.current_name,
                               timedelta_to_seconds(toc_at - self.tic_at))
        self.current_name = None
        self.tic_at = None

    def submit_timing(self, name, seconds):
        times = self.timings.get(name)
        if not times:
            times = []
            self.timings[name] = times
        times.append(seconds)

    def summary(self):
        s = ['count   min     max     avg  operation  (unit is seconds)']
        for name, timings in self.timings.items():
            if not name or not timings:
                continue
            s.append('%5d %7.3f %7.3f %7.3f  %s' % (
                       len(timings),
                       min(timings),
                       max(timings),
                       reduce(lambda x, y: x + y, timings) / len(timings),
                       name))
        return '\n'.join(s)

    def compare_to(self, other):
        s = ['    min     max     avg  operation  (unit is factor between runs)']

        def do_div(a, b):
            if b:
                return float(a) / float(b)
            else:
                return 0.0

        for name, timings in self.timings.items():
            other_timings = other.timings.get(name)
            if not other_timings:
                continue

            s.append('%7.3f %7.3f %7.3f  %s' % (
                       do_div(min(timings), min(other_timings)),
                       do_div(max(timings), max(other_timings)),
                       do_div(reduce(lambda x, y: x + y, timings)
                                / len(timings),
                              reduce(lambda x, y: x + y, other_timings)
                                / len(other_timings)),
                       name))

        return '\n'.join(s)

    def add(self, other):
        for name, other_times in other.timings.items():
            my_times = self.timings.get(name)
            if not my_times:
                my_times = []
                self.timings[name] = my_times
            my_times.extend(other_times)


j = os.path.join

_create_count = 0


def next_name(prefix):
    global _create_count
    _create_count += 1
    return '_'.join((prefix, str(_create_count)))


def create_tree(in_dir, levels, spread=5):
    try:
        os.mkdir(in_dir)
    except:
        pass

    for i in range(spread):
        # files
        fn = j(in_dir, next_name('file'))
        f = open(fn, 'w')
        f.write('This is %s\n' % fn)
        f.close()

        # dirs
        if (levels > 1):
            dn = j(in_dir, next_name('dir'))
            create_tree(dn, levels - 1, spread)


def svn(*args):
    global timings
    name = args[0]

    cmd = ['svn']
    cmd.extend(args)
    if VERBOSE:
        print 'svn cmd: ' + ' '.join(cmd)

    stdin = None
    if stdin:
        stdin_arg = PIPE
    else:
        stdin_arg = None

    timings.tic(name)
    try:
        p = Popen(cmd, stdin=stdin_arg, stdout=PIPE, stderr=PIPE, shell=False)
        stdout, stderr = p.communicate(input=stdin)
    finally:
        timings.toc()

    if VERBOSE:
        if (stdout):
            print "STDOUT: [[[\n%s]]]" % ''.join(stdout)
        if (stderr):
            print "STDERR: [[[\n%s]]]" % ''.join(stderr)

    return stdout, stderr


def add(*args):
    return svn('add', *args)


def ci(*args):
    return svn('commit', '-mm', *args)


def up(*args):
    return svn('update', *args)


def st(*args):
    return svn('status', *args)


_chars = [chr(x) for x in range(ord('a'), ord('z') + 1)]


def randstr(len=8):
    return ''.join([random.choice(_chars) for i in range(len)])


def _copy(path):
    dest = next_name(path + '_copied')
    svn('copy', path, dest)


def _move(path):
    dest = path + '_moved'
    svn('move', path, dest)


def _propmod(path):
    so, se = svn('proplist', path)
    propnames = [line.strip() for line in so.strip().split('\n')[1:]]

    # modify?
    if len(propnames):
        svn('ps', propnames[len(propnames) / 2], randstr(), path)

    # del?
    if len(propnames) > 1:
        svn('propdel', propnames[len(propnames) / 2], path)


def _propadd(path):
    # set a new one.
    svn('propset', randstr(), randstr(), path)


def _mod(path):
    if os.path.isdir(path):
        return _propmod(path)

    f = open(path, 'a')
    f.write('\n%s\n' % randstr())
    f.close()


def _add(path):
    if os.path.isfile(path):
        return _mod(path)

    if random.choice((True, False)):
        # create a dir
        svn('mkdir', j(path, next_name('new_dir')))
    else:
        # create a file
        new_path = j(path, next_name('new_file'))
        f = open(new_path, 'w')
        f.write(randstr())
        f.close()
        svn('add', new_path)


def _del(path):
    svn('delete', path)


_mod_funcs = (_mod, _add, _propmod, _propadd, )#_copy,) # _move, _del)


def modify_tree(in_dir, fraction):
    child_names = os.listdir(in_dir)
    for child_name in child_names:
        if child_name[0] == '.':
            continue
        if random.random() < fraction:
            path = j(in_dir, child_name)
            random.choice(_mod_funcs)(path)

    for child_name in child_names:
        if child_name[0] == '.':
            continue
        path = j(in_dir, child_name)
        if os.path.isdir(path):
            modify_tree(path, fraction)


def propadd_tree(in_dir, fraction):
    for child_name in os.listdir(in_dir):
        if child_name[0] == '.':
            continue
        path = j(in_dir, child_name)
        if random.random() < fraction:
            _propadd(path)
        if os.path.isdir(path):
            propadd_tree(path, fraction)


def run(levels, spread):
    global timings

    # ensure identical modifications for every run of this script
    random.seed(0)

    base = tempfile.mkdtemp()

    try:
        repos = j(base, 'repos')
        wc = j(base, 'wc')
        wc2 = j(base, 'wc2')

        file_url = 'file://%s' % repos

        so, se = run_cmd(['which', 'svn'])
        if not so:
            print "Can't find svn."
            exit(1)

        print '\nRunning svn benchmark in', base
        print 'dir levels: %s; new files and dirs per leaf: %s' % (levels,
                                                                   spread)

        so, se = svn('--version')
        print ', '.join(so.split('\n')[:2])

        started = datetime.now()

        try:
            run_cmd(['svnadmin', 'create', repos])
            svn('checkout', file_url, wc)

            trunk = j(wc, 'trunk')
            create_tree(trunk, levels, spread)
            add(trunk)
            st(wc)
            ci(wc)
            up(wc)
            propadd_tree(trunk, 0.5)
            ci(wc)
            up(wc)
            st(wc)

            trunk_url = file_url + '/trunk'
            branch_url = file_url + '/branch'
            svn('copy', '-mm', trunk_url, branch_url)
            st(wc)
            up(wc)
            st(wc)

            svn('checkout', trunk_url, wc2)
            st(wc2)
            modify_tree(wc2, 0.5)
            st(wc2)
            ci(wc2)
            up(wc2)
            up(wc)

            svn('switch', branch_url, wc2)
            modify_tree(wc2, 0.5)
            st(wc2)
            ci(wc2)
            up(wc2)
            up(wc)
            modify_tree(trunk, 0.5)
            st(wc)
            ci(wc)
            up(wc2)
            up(wc)

            svn('merge', '--accept=postpone', trunk_url, wc2)
            st(wc2)
            svn('resolve', '--accept=mine-conflict', wc2)
            st(wc2)
            svn('resolved', '-R', wc2)
            st(wc2)
            ci(wc2)
            up(wc2)
            up(wc)

            svn('merge', '--accept=postpone', '--reintegrate', branch_url,
                trunk)
            st(wc)
            svn('resolve', '--accept=mine-conflict', wc)
            st(wc)
            svn('resolved', '-R', wc)
            st(wc)
            ci(wc)
            up(wc2)
            up(wc)

            svn('delete', j(wc, 'branch'))
            ci(wc)
            up(wc2)
            up(wc)
        finally:
            stopped = datetime.now()
            print '\nDone with svn benchmark in', (stopped - started)
            timings.submit_timing('TOTAL RUN',
                                  timedelta_to_seconds(stopped - started))

            # rename ps to prop mod
            if timings.timings.get('ps'):
                has = timings.timings.get('prop mod')
                if not has:
                    has = []
                    timings.timings['prop mod'] = has
                has.extend(timings.timings['ps'])
                del timings.timings['ps']

            print timings.summary()
    finally:
        shutil.rmtree(base)


def read_from_file(file_path):
    f = open(file_path, 'rb')
    try:
        instance = cPickle.load(f)
    finally:
        f.close()
    return instance


def write_to_file(file_path, instance):
    f = open(file_path, 'wb')
    cPickle.dump(instance, f)
    f.close()


def usage():
    print __doc__


if __name__ == '__main__':
    if len(sys.argv) > 1 and 'compare'.startswith(sys.argv[1]):
        if len(sys.argv) < 4:
            usage()
            exit(1)

        p1, p2 = sys.argv[2:4]
        t1 = read_from_file(p1)
        t2 = read_from_file(p2)

        print p1
        print t1.summary()
        print '---'
        print p2
        print t2.summary()
        print '---'
        print p2, '/', p1
        print t2.compare_to(t1)

    elif len(sys.argv) > 1 and 'combine'.startswith(sys.argv[1]):
        if len(sys.argv) < 5:
            usage()
            exit(1)

        p1, p2, dest = sys.argv[2:5]

        t1 = read_from_file(p1)
        t2 = read_from_file(p2)

        t1.add(t2)
        print t1.summary()
        write_to_file(dest, t1)

    elif len(sys.argv) > 1 and 'run'.startswith(sys.argv[1]):
        try:
            timings_path = sys.argv[2]
            levels = int(sys.argv[3])
            spread = int(sys.argv[4])
            if len(sys.argv) > 5:
                N = int(sys.argv[5])
            else:
                N = 1
        except:
            usage()
            raise

        print '\nHi, going to run a Subversion benchmark (series)...'

        if os.path.isfile(timings_path):
            print 'Going to add results to existing file', timings_path
            timings = read_from_file(timings_path)
        else:
            print 'Going to write results to new file', timings_path
            timings = Timings()

        for i in range(N):
            run(levels, spread)

        write_to_file(timings_path, timings)

    elif len(sys.argv) > 1 and 'show'.startswith(sys.argv[1]):
        if len(sys.argv) < 3:
            usage()
            exit(1)

        for timings_path in sys.argv[2:]:
            timings = read_from_file(timings_path)
            print '---\n%s' % timings_path
            print timings.summary()

    else:
        usage()
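For reference, a stand-alone sequence roughly equivalent to one leg of the
batch() function above might look like this; the run-file names are only
examples following the ${levels}x${spread}_ naming from the shell excerpt,
and the appropriate svn build has to be on PATH before each 'run':

[[[
./benchmark.py run 4x4_1.6_1.runs 4 4 3
./benchmark.py run 4x4_1.7_1.runs 4 4 3
./benchmark.py show 4x4_1.7_1.runs
./benchmark.py compare 4x4_1.6_1.runs 4x4_1.7_1.runs
]]]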
[attachment: run (application/shellscript)]