> You mean enhancing the script to split across arbitrarily long prefixes?
> That would be great.
I now have a script that does something like that:

~/test$ find /data/vjoost/gnu/gcc_trunk/gcc/gcc/testsuite/gfortran.dg/ -maxdepth 1 -type f -printf "%f\n" | ./generate_patterns.py 500 foo
All 3947 files matched the pattern ^[0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+ without exception
Final 12 patterns and match count:
(^[j-z_#+-][p-z_#+-][0-9A-Za-i][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[j-z_#+-][0-9A-Za-o][0-9A-Za-m]([.][0-9A-Za-z_#+-]+)+) matching 469 files
(^[0-9A-Za-i][0-9A-Za-n][0-9A-Za-n][0-9A-Za-o][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^([.][0-9A-Za-z_#+-]+)+) matching 433 files
(^[j-z_#+-][0-9A-Za-o][n-z_#+-][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[0-9A-Za-i][0-9A-Za-n][o-z_#+-]([.][0-9A-Za-z_#+-]+)+) matching 400 files
(^[j-z_#+-][p-z_#+-][j-z_#+-][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[0-9A-Za-i]([.][0-9A-Za-z_#+-]+)+) matching 371 files
(^[0-9A-Za-i][o-z_#+-][s-z_#+-][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[0-9A-Za-i][0-9A-Za-n][0-9A-Za-n]([.][0-9A-Za-z_#+-]+)+) matching 323 files
(^[0-9A-Za-i][o-z_#+-][0-9A-Za-r][o-z_#+-][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[j-z_#+-][p-z_#+-]([.][0-9A-Za-z_#+-]+)+) matching 314 files
(^[0-9A-Za-i][o-z_#+-][0-9A-Za-r][0-9A-Za-n][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[j-z_#+-][0-9A-Za-o]([.][0-9A-Za-z_#+-]+)+) matching 314 files
(^[j-z_#+-][0-9A-Za-o][0-9A-Za-m][0-9A-Za-i][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[j-z_#+-]([.][0-9A-Za-z_#+-]+)+) matching 272 files
(^[0-9A-Za-i][0-9A-Za-n][0-9A-Za-n][p-z_#+-][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[0-9A-Za-i][o-z_#+-]([.][0-9A-Za-z_#+-]+)+) matching 270 files
(^[0-9A-Za-i][0-9A-Za-n][o-z_#+-][0-9A-Za-l][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[0-9A-Za-i][0-9A-Za-n]([.][0-9A-Za-z_#+-]+)+) matching 265 files
(^[0-9A-Za-i][0-9A-Za-n][o-z_#+-][m-z_#+-][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+|^[0-9A-Za-i][o-z_#+-][0-9A-Za-r]([.][0-9A-Za-z_#+-]+)+) matching 260 files
^[j-z_#+-][0-9A-Za-o][0-9A-Za-m][j-z_#+-][0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+ matching 256 files

The result is a set of patterns that together match any file name of the form '^[0-9A-Za-z_#+-]*([.][0-9A-Za-z_#+-]+)+', chosen so that they split a given list of input files into roughly equal chunks (between 500/2 and 500 files each in this example), even when the file names share long common prefixes.

However, I'm unsure if and how this can be integrated, i.e. what precisely is allowed in testsuite file names, and whether this regexp format can be used in the gcc Makefiles or in the Tcl/expect harness. Suggestions and help appreciated.
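For readers who want to experiment with the idea, here is a minimal Python sketch of one way to build such patterns. It is not the generate_patterns.py used above; the function names (`generate_patterns`, `split`, `pack`, `char_class`) are hypothetical, and the character ordering (digits, upper case, lower case, then `_#+-`) is only inferred from the ranges in the output. It walks the stem (the part before the first '.') one position at a time, groups consecutive characters into ranges while a chunk stays under the limit, and goes one position deeper whenever a single character carries too many files. Ranges with zero files are kept, so a newly added test name still matches exactly one pattern. Unlike the real output, it does not enforce the max/2 lower bound; it only ORs small chunks together with a first-fit-decreasing packing.

```python
import re
import string

# Assumption: stems use this alphabet, ordered as the ranges above suggest.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase + "_#+-"
STEM = "[0-9A-Za-z_#+-]*"
SUFFIX = "([.][0-9A-Za-z_#+-]+)+"
VALID = re.compile("^" + STEM + SUFFIX)


def char_class(chars):
    """Render a contiguous run of ALPHABET as a compact [...] class."""
    parts = []
    for group in (string.digits, string.ascii_uppercase, string.ascii_lowercase):
        sel = [c for c in chars if c in group]
        if sel:
            parts.append(sel[0] if len(sel) == 1 else sel[0] + "-" + sel[-1])
    # '-' is last in ALPHABET, so it ends up last in the class and stays literal.
    parts.extend(c for c in chars if c in "_#+-")
    return "[" + "".join(parts) + "]"


def split(files, depth, prefix, max_size, out):
    """Append (pattern, count) pairs that together cover every stem
    starting with `prefix`, which spans `depth` characters."""
    stems = [f.split(".", 1)[0] for f in files]
    # Stems that end exactly here (next character is the first '.').
    out.append(("^" + prefix + SUFFIX, sum(len(s) == depth for s in stems)))
    by_char = {}
    for f, s in zip(files, stems):
        if len(s) > depth:
            by_char.setdefault(s[depth], []).append(f)
    run, size = "", 0
    for c in ALPHABET:  # zero-count characters are included for full coverage
        n = len(by_char.get(c, []))
        if n > max_size:
            # Too many files share this character: flush the pending run,
            # then split one position deeper (handles long shared prefixes).
            if run:
                out.append(("^" + prefix + char_class(run) + STEM + SUFFIX, size))
                run, size = "", 0
            split(by_char[c], depth + 1, prefix + char_class(c), max_size, out)
        elif size + n > max_size:
            out.append(("^" + prefix + char_class(run) + STEM + SUFFIX, size))
            run, size = c, n
        else:
            run, size = run + c, size + n
    if run:
        out.append(("^" + prefix + char_class(run) + STEM + SUFFIX, size))


def pack(chunks, max_size):
    """First-fit decreasing: OR chunks together, at most max_size files per bin."""
    bins = []
    for pat, n in sorted(chunks, key=lambda pc: -pc[1]):
        for b in bins:
            if b[1] + n <= max_size:
                b[0].append(pat)
                b[1] += n
                break
        else:
            bins.append([[pat], n])
    return [("(" + "|".join(p) + ")" if len(p) > 1 else p[0], n) for p, n in bins]


def generate_patterns(files, max_size):
    """Return (pattern, count) pairs partitioning all names of the assumed form."""
    bad = [f for f in files if not VALID.match(f)]
    if bad:
        raise ValueError(f"names outside the assumed pattern: {bad[:5]}")
    chunks = []
    split(files, 0, "", max_size, chunks)
    return pack(chunks, max_size)
```

Usage would be something like `for pat, n in generate_patterns(names, 500): print(pat, "matching", n, "files")`, giving output in the same shape as the transcript, although the patterns themselves will differ. One drawback of the simple packing is that all zero-count patterns land in the first bin, so that alternation can get long. Also, if more than `max_size` files share an identical stem, that chunk cannot be split by stem alone and simply exceeds the limit.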