On Jun 11, 9:22 am, Brian D <brianden...@gmail.com> wrote:
> On Jun 11, 2:01 am, Lie Ryan <lie.1...@gmail.com> wrote:
> > 504cr...@gmail.com wrote:
> > > I've encountered a problem with my RegEx learning curve -- how to
> > > escape hash characters # in strings being matched, e.g.:
> > >
> > > >>> string = re.escape('123#abc456')
> > > >>> match = re.match('\d+', string)
> > > >>> print match
> > > <_sre.SRE_Match object at 0x00A6A800>
> > > >>> print match.group()
> > > 123
> > >
> > > The correct result should be:
> > >
> > > 123456
> > >
> > > I've tried to escape the hash symbol in the match string without
> > > result.
> > >
> > > Any ideas? Is the answer something I overlooked in my lurching Python
> > > schooling?
> >
> > As you're not being clear on what you wanted, I'm just guessing this is
> > what you wanted:
> >
> > >>> s = '123#abc456'
> > >>> re.match('\d+', re.sub('#\D+', '', s)).group()
> > '123456'
> > >>> s = '123#this is a comment and is ignored456'
> > >>> re.match('\d+', re.sub('#\D+', '', s)).group()
> > '123456'
>
> Sorry I wasn't more clear. I positively appreciate your reply. It
> provides half of what I'm hoping to learn. The hash character is
> actually a desirable hook to identify a data entity in a scraping
> routine I'm developing, but not a character I want in the scrubbed
> data.
>
> In my application, the hash makes a string of alphanumeric characters
> unique from other alphanumeric strings. The strings I'm looking for
> are actually manually-entered identifiers, but a real machine-created
> identifier shouldn't contain that hash character. The correct pattern
> should be 'A1234509', but is instead often merely entered as '#12345'
> when the first character, representing an alphabet sequence for the
> month, and the last two characters, representing a two-digit year, can
> be assumed. Identifying the hash character in a RegEx match is a way
> of trapping the string and transforming it into its correct
> machine-generated form.
>
> I'm surprised it's been so difficult to find an example of the hash
> character in a RegEx string -- for exactly this type of situation,
> since it's so common in the real world that people want to put a pound
> symbol in front of a number.
>
> Thanks!

By the way, the manually entered strings can also take these forms:

A#12345
#1234509

Garbage in, garbage out -- I know. I wish I could tell the people
entering the data how challenging it is to work with what they provide,
but it is, after all, a screen-scraping routine.
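
For what it's worth, here is a minimal sketch of how the hash could be matched and the identifier rebuilt. The '#' character has no special meaning in a Python regular expression unless the pattern is compiled with re.VERBOSE, so it can appear in the pattern unescaped. The function name normalize_id, the five-digit serial width (inferred from the examples in this thread), and the default month letter and year are all assumptions for illustration; the real defaults would have to come from wherever the scraped record says which month and year apply.

```python
import re

# Optional month letter, optional '#', five-digit serial, optional
# two-digit year. The five-digit width is an assumption taken from the
# examples in this thread ('#12345', 'A#12345', '#1234509', 'A1234509').
ID_PATTERN = re.compile(r'([A-Za-z])?#?(\d{5})(\d{2})?')


def normalize_id(raw, default_month='A', default_year='09'):
    """Return the machine-style identifier, or None if raw does not fit.

    default_month and default_year are hypothetical placeholders for the
    parts that "can be assumed" when the person entering the data
    left them out.
    """
    m = ID_PATTERN.fullmatch(raw.strip())
    if m is None:
        return None
    month, serial, year = m.groups()
    # Fill in whichever parts were omitted, and drop the '#'.
    return (month or default_month).upper() + serial + (year or default_year)


if __name__ == '__main__':
    for raw in ['#12345', 'A#12345', '#1234509', 'A1234509', 'no id here']:
        print(repr(raw), '->', normalize_id(raw))
```

To pull candidates out of a larger block of scraped text rather than validating one field at a time, the same pattern could be used with ID_PATTERN.finditer(), though it would probably need word boundaries added to avoid matching the middle of longer numbers.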