Digest of replies to the filenames question, including Python code to answer it (sorry)

Cheerio,
Graeme

-----------------------

The majority are along the lines of some_prefix_numbers.suffix, e.g. foo_bar_1_0001.cbf. However, there was a suggestion that in some cases the numbers run 1, 2, 3 rather than 001, 002, 003 etc., so the file names do not have a consistent length. Since neither Mosflm nor XDS can cope with these, I will not worry about handling them in the regexp :o)

Sometimes there's a shortage of underscores (perhaps with lab sources), e.g. image0001.img. I'm also aware of prefix.numbers, e.g. prefix.0001 etc.

What had clobbered my existing expressions was prefix_1.8A_001.img etc., i.e. having additional number.something expressions in there.

Solution:

Assume that the frame number is the *last* numerical value in the file name, allowing for cases where the file name extension includes numbers. Turns out working on the reversed file name makes things much easier:

import re

# N.B. these are reversed patterns...

patterns = [r'([0-9]{2,12})\.(.*)',
            r'(.*)\.([0-9]{2,12})_(.*)',
            r'(.*)\.([0-9]{2,12})(.*)']

# separator between prefix and digits, one per pattern above
joiners = ['.', '_', '']

# Python code follows

compiled_patterns = [re.compile(pattern) for pattern in patterns]

def template_regex(filename):
    '''Try a bunch of templates to work out the most sensible. N.B. assumes
    that the image index will be the last digits found in the file name.'''

    rfilename = filename[::-1]

    for j, cp in enumerate(compiled_patterns):
        match = cp.match(rfilename)
        if not match:
            continue
        groups = match.groups()

        if len(groups) == 3:
            # reversed name was extension.digits[joiner]prefix
            exten = '.' + groups[0][::-1]
            digits = groups[1][::-1]
            prefix = groups[2][::-1] + joiners[j]
        else:
            # reversed name was digits.prefix, i.e. no extension
            exten = ''
            digits = groups[0][::-1]
            prefix = groups[1][::-1] + joiners[j]

        template = prefix + '#' * len(digits) + exten
        return template, int(digits)

    # none of the patterns matched, e.g. unpadded numbers like foo_1.img
    raise ValueError('no frame number found in %s' % filename)

def work_template_regex():
    questions_answers = {
        'foo_bar_001.img':'foo_bar_###.img',
        'foo_bar001.img':'foo_bar###.img',
        'foo_bar_1.8A_001.img':'foo_bar_1.8A_###.img',
        'foo_bar.001':'foo_bar.###',
        'foo_bar_001.img1000':'foo_bar_###.img1000',
        'foo_bar_00001.img':'foo_bar_#####.img'
        }

    for filename in questions_answers:
        answer = template_regex(filename)
        assert answer[0] == questions_answers[filename]

On 30 April 2012 09:19, Graeme Winter <graeme.win...@gmail.com> wrote:
> Hi Folks,
>
> Following some bug reports I spent a few minutes over the weekend wrangling
> with regular expressions to digest image file names - the dismantling of e.g.
> foo_bar_001.img to foo_bar_###.img, 1 etc. I think now that the scheme I have
> should work for everything, however what I could really do with is a proper
> list of test cases.
>
> So (foolishly he asks) please could people email me *off list* with example
> image names if they don't fall into the following structure:
>
> prefix_numbers.extension i.e. foo_bar_001.img
> prefix.numbers i.e. foo_bar.001
>
> I'll send back a digest of any responses I get. Ideally if you could indicate
> where the images come from when you do this I'd be obliged.
>
> Best wishes,
>
> Graeme
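
P.S. For completeness, here is a minimal sketch of the reverse step, i.e. going from a template plus frame number back to a file name (foo_bar_###.img, 1 to foo_bar_001.img). This was not part of the replies; the function name template_to_filename is made up for illustration, and it assumes the template contains exactly one contiguous block of # characters.

```python
def template_to_filename(template, index):
    '''Hypothetical helper: replace the block of # characters in template
    with index, zero-padded to the width of that block.'''

    first = template.find('#')
    last = template.rfind('#')

    if first < 0:
        raise ValueError('no # characters in template %s' % template)

    width = last - first + 1

    # assumption: a single contiguous run of # marks the frame number
    if template[first:last + 1] != '#' * width:
        raise ValueError('# characters not contiguous in %s' % template)

    if index < 0:
        raise ValueError('negative frame number %d' % index)

    digits = '%0*d' % (width, index)

    # a number too big for the field would change the file name length
    if len(digits) > width:
        raise ValueError('%d does not fit in %d digits' % (index, width))

    return template[:first] + digits + template[last + 1:]
```

So template_to_filename('foo_bar_1.8A_###.img', 12) should give foo_bar_1.8A_012.img. A frame number too wide for the field is rejected rather than silently lengthening the name, in keeping with the fixed-length file names discussed above.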