Digest to filenames question, including Python code to answer (sorry)
Cheerio,
Graeme
-----------------------
The majority are along the lines of
some_prefix_numbers.suffix
e.g. foo_bar_1_0001.cbf. However, it was suggested that there are cases
where the numbers start from 1, 2, 3 rather than 001, 002, 003 etc., so
the file names do not have a consistent length. Since neither Mosflm nor
XDS can cope with these, I will not worry about them in the regexp :o)
Sometimes there's a shortage of underscores (perhaps with lab sources),
e.g. image0001.img. I'm also aware of prefix.numbers, e.g. prefix.0001
etc. What had clobbered my existing expressions was
prefix_1.8A_001.img etc., i.e. having additional number.something
expressions in there.
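A hypothetical sketch of this failure mode follows. The forward pattern and the name naive_split are invented for illustration and are not the original expressions; the point is only that a pattern which takes the first underscore-digits-dot run it finds will pick up the 1 from 1.8A as the frame number.

```python
import re

# Hypothetical forward pattern: prefix, underscore, digits, dot, extension.
# The lazy (.*?) makes it stop at the first place the rest can match.
naive = re.compile(r'(.*?)_([0-9]+)\.(.*)')

def naive_split(filename):
    """Return (prefix, digits, extension), or None if there is no match."""
    match = naive.match(filename)
    return match.groups() if match else None

print(naive_split('foo_bar_001.img'))      # ('foo_bar', '001', 'img')
print(naive_split('prefix_1.8A_001.img'))  # ('prefix', '1', '8A_001.img')
```

The second result treats 1 as the frame number and 8A_001.img as the extension, which is clearly not what is wanted.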
Solution:
Assume that the frame number is the *last* numerical value in the file
name, allowing for cases where the file name extension includes
numbers. It turns out that working on the reversed file name makes
things much easier, since the frame number then sits at or near the
start of the string:
import re

# N.B. these are reversed patterns, matched against the reversed file name:
# digits.prefix, extension.digits_prefix and extension.digitsprefix
patterns = [r'([0-9]{2,12})\.(.*)',
            r'(.*)\.([0-9]{2,12})_(.*)',
            r'(.*)\.([0-9]{2,12})(.*)']
# separator to put back between prefix and digits, one per pattern
joiners = ['.', '_', '']

# Python code follows
compiled_patterns = [re.compile(pattern) for pattern in patterns]

def template_regex(filename):
    '''Try a bunch of templates to work out the most sensible. N.B. assumes that
    the image index will be the last digits found in the file name.'''
    rfilename = filename[::-1]
    for j, cp in enumerate(compiled_patterns):
        match = cp.match(rfilename)
        if not match:
            continue
        groups = match.groups()
        if len(groups) == 3:
            # reversed extension, digits, prefix
            exten = '.' + groups[0][::-1]
            digits = groups[1][::-1]
            prefix = groups[2][::-1] + joiners[j]
        else:
            # reversed digits, prefix: no separate extension
            exten = ''
            digits = groups[0][::-1]
            prefix = groups[1][::-1] + joiners[j]
        template = prefix + '#' * len(digits) + exten
        return template, int(digits)
    # nothing matched, e.g. foo_1.img where the frame number has one digit
    raise ValueError('cannot determine template for %s' % filename)
def work_template_regex():
    questions_answers = {
        'foo_bar_001.img':'foo_bar_###.img',
        'foo_bar001.img':'foo_bar###.img',
        'foo_bar_1.8A_001.img':'foo_bar_1.8A_###.img',
        'foo_bar.001':'foo_bar.###',
        'foo_bar_001.img1000':'foo_bar_###.img1000',
        'foo_bar_00001.img':'foo_bar_#####.img'
        }
    for filename in questions_answers:
        answer = template_regex(filename)
        assert answer[0] == questions_answers[filename]
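For going the other way, from a template and frame number back to a file name, something like the following minimal sketch might do. The name template_to_filename is hypothetical and not part of the code above, and the sketch assumes the template contains exactly one contiguous run of '#' characters.

```python
def template_to_filename(template, frame):
    """Replace the run of '#' in template with the zero-padded frame number.

    Hypothetical helper: assumes a single contiguous run of '#'.
    """
    first = template.index('#')   # raises ValueError if there is no '#'
    last = template.rindex('#')
    width = last - first + 1
    if template[first:last + 1] != '#' * width:
        raise ValueError('more than one run of # in %s' % template)
    number = str(frame).zfill(width)
    if len(number) > width:
        raise ValueError('frame %d does not fit in %s' % (frame, template))
    return template[:first] + number + template[last + 1:]

print(template_to_filename('foo_bar_###.img', 1))        # foo_bar_001.img
print(template_to_filename('foo_bar_1.8A_###.img', 42))  # foo_bar_1.8A_042.img
```

Refusing frame numbers that are too wide for the template keeps the file names at a consistent length.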
On 30 April 2012 09:19, Graeme Winter <[email protected]> wrote:
> Hi Folks,
>
> Following some bug reports I spent a few minutes over the weekend wrangling
> with regular expressions to digest image file names - the dismantling of e.g.
> foo_bar_001.img to foo_bar_###.img, 1 etc. I think now that the scheme I have
> should work for everything, however what I could really do with is a proper
> list of test cases.
>
> So (foolishly he asks) please could people email me *off list* with example
> image names if they don't fall into the following structure:
>
> prefix_numbers.extension i.e. foo_bar_001.img
> prefix.numbers i.e. foo_bar.001
>
> I'll send back a digest of any responses I get. Ideally if you could indicate
> where the images come from when you do this I'd be obliged.
>
> Best wishes,
>
> Graeme