Johan Corveleyn <jcor...@gmail.com> writes:

> Another question: a shell script might not be good, because not
> portable (and not fast)? Should I use python for this? Maybe the
> "write line by line with a line number in a for loop" would be a lot
> faster in Python? I don't know a lot of python, but it might be a good
> opportunity to learn some ...

A shell script is probably fine. What I want is some data that I can
use on my machine to test your patches.

Here's a crude python script. With the default values it generates two
4.3MB files in less than 2 seconds on my machine. Subversion diff takes
over 10 seconds to compare the files, GNU diff less than one second.

Using --num-prefix=2 makes the script slightly slower, since it
generates more random numbers, and the time to run Subversion diff on
the output goes up to 2min. GNU diff still takes a fraction of a
second, and with --minimal the time is 35s. So for big improvements you
probably want to concentrate on shortcut heuristics, rather than
low-level optimisation.

#!/usr/bin/python

import random, sys
from optparse import OptionParser

random.seed('abc') # repeatable

def write_file_contents(f, num_lines, num_prefix, num_suffix,
                        percent_middle, unique):
    for i in range(num_lines):
        if num_prefix > 1:
            prefix = random.randint(1, num_prefix)
        else:
            prefix = 1
        line = str(prefix) + "-common-prefix-" + str(prefix)

        middle = random.randint(1, 100)
        if middle <= percent_middle:
            line += " " + str(12345678 + i) + " "
        else:
            line += " " + str(9999999999 + i) + unique + " "

        if num_suffix > 1:
            suffix = random.randint(1, num_suffix)
        else:
            suffix = 1
        line += str(suffix) + "-common-suffix-" + str(suffix)
        f.write(line + '\n')

parser = OptionParser('Generate files for diff')
parser.add_option('--num-lines', type=int, default=100000,
                  dest='num_lines',
                  help='number of lines, default 100000')
parser.add_option('--num-prefix', type=int, default=1,
                  dest='num_prefix',
                  help='number of distinct prefixes, default 1')
parser.add_option('--num-suffix', type=int, default=1,
                  dest='num_suffix',
                  help='number of distinct suffixes, default 1')
parser.add_option('--percent-middle', type=int, default=99,
                  dest='percent_middle',
                  help='percentage matching middles, default 99')
(options, args) = parser.parse_args(sys.argv)

f1 = open('file1.txt', 'w')
write_file_contents(f1, options.num_lines, options.num_prefix,
                    options.num_suffix, options.percent_middle, 'a')

f2 = open('file2.txt', 'w')
write_file_contents(f2, options.num_lines, options.num_prefix,
                    options.num_suffix, options.percent_middle, 'b')

-- 
Philip
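
As a sanity check on the generated data, it may help to know how many
lines actually differ between file1.txt and file2.txt before timing
any diff implementation. The sketch below is not part of the original
script; count_differing_lines is a hypothetical helper name, and it
only compares lines by position, which is enough here because both
files have the same number of lines. Reading the generator, one would
expect roughly 2% of lines to differ with the defaults (each file
independently gets about 1% unique middles), and about half of the
lines to differ with --num-prefix=2, since the prefixes are then drawn
independently for each file. Those figures are derived from the code,
not measured.

```python
import itertools

def count_differing_lines(path_a, path_b):
    """Compare two text files line by line, by position.

    Returns (total, differing), where total is the length of the
    longer file and differing counts positions whose lines are not
    identical. A line missing from the shorter file counts as
    differing. This is a positional comparison, not a real diff.
    """
    total = 0
    differing = 0
    with open(path_a) as fa, open(path_b) as fb:
        # zip_longest pads the shorter file with None, which never
        # equals a real line, so trailing extra lines are counted.
        for line_a, line_b in itertools.zip_longest(fa, fb):
            total += 1
            if line_a != line_b:
                differing += 1
    return total, differing
```

After running the generator, calling
count_differing_lines('file1.txt', 'file2.txt') gives the two counts;
a large jump in the second number between runs with different options
is one plausible reason for a large jump in diff time.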