> I have a file with a lot of the following occurrences:
>
> denmark.handa.1-10
> denmark.handa.1-12344
> denmark.handa.1-4
> denmark.handa.1-56

Each on its own line?  Scattered throughout the text?  With other
content that needs to be left unchanged?  With other stuff on the same
line?

> denmark.handa.1-10_1
> denmark.handa.1-12344_1
> denmark.handa.1-4_1
> denmark.handa.1-56_1
>
> so basically I add "_1" at the end of each occurrence.
>
> I thought about using sed, but as each "root" is different I have no
> clue how to go through this.

How are the roots different?  Do they all begin with "denmark.handa."?
Or can they be found by a pattern of "stuff period stuff period number
dash number"?

A couple of sed solutions, since you considered them first.  The first
appends "_1" to the end of every line that mentions denmark.handa; the
other two suffix each match in place:

  sed '/denmark\.handa/s/$/_1/'
  sed -E 's/denmark\.handa\.[0-9]+-[0-9]+/&_1/g'
  sed -E 's/[a-z]+\.[a-z]+\.[0-9]+-[0-9]+/&_1/g'

Or are you just looking for "number dash number" and want to suffix
the "_1"?

  sed -E 's/[0-9]+-[0-9]+/&_1/g'

(sed doesn't understand \d, so digits are spelled [0-9], and -E turns
on extended regexps so that + means "one or more".)

Most of the sed versions translate pretty readily into Python regexps
in the .sub() call.

  import re
  r = re.compile(r'[a-z]+\.[a-z]+\.\d+-\d+')
  with open('in.txt') as src, open('out.txt', 'w') as out:
      for line in src:
          # \g<0> is the whole match; "_1" gets tacked onto it
          out.write(r.sub(r'\g<0>_1', line))

Tweak the regexps accordingly.

-tkc

--
http://mail.python.org/mailman/listinfo/python-list
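
P.S. If you'd rather try a pattern on a few sample strings before
rewriting the whole file, something along these lines might do as a
rough sketch.  The names PATTERN and add_suffix are made up for
illustration, and the pattern assumes lowercase
"word.word.number-number" roots, which may not hold for your data:

```python
import re

# Hypothetical helper for illustration: matches "word.word.number-number".
# Swap in whichever pattern fits the real "roots".
PATTERN = re.compile(r'[a-z]+\.[a-z]+\.\d+-\d+')

def add_suffix(text, suffix='_1'):
    # A function replacement sidesteps backslash escaping in the suffix
    return PATTERN.sub(lambda m: m.group(0) + suffix, text)

sample = "see denmark.handa.1-10 and denmark.handa.1-12344, not 1-4 alone\n"
print(add_suffix(sample), end='')
```

Running it over text that already carries the suffix would append
"_1" a second time, so apply it once to the original file.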