Erik wrote:
> I need to recognize latin1 letters in a regexp. How is it done? The
> reason is that I want to fix a program called code2html, which is
> written in Perl. I have the following regular expression for Ada
> identifiers:
>
>   \\b[a-zA-Z](_?[a-zA-Z0-9])*\\b
>
> but this is wrong because Ada identifiers include any latin1 letters,
> not just a-z and A-Z. Anyone knows how?

For [a-zA-Z] you can use [[:alpha:]]; see `perldoc perlre`. Why are the
double backslashes there? I assume the \b's are meant as word boundaries.

  /\b[[:alpha:]](?:_?[[:alpha:]0-9])*\b/

The (?:...) is a non-capturing group. If you don't want to allow "a_",
you can make it

  /\b[[:alpha:]]+(?:_?[[:alpha:]0-9]+)*\b/

It is a good idea to put a

  use encoding 'latin1';

at the top of your source, because the default is utf8 (which is the
Perl variant of UTF-8). That "use" further limits what [:alpha:] and
[:digit:] or \d will match, so you can even change your regexp to

  /\b[[:alpha:]]+(?:_?[[:alpha:]\d]+)*\b/

and then to

  /\b[[:alpha:]]+(?:_?[^\W_]+)*\b/

If an ending "_" is no problem, you can make it

  /\b[[:alpha:]]\w*/

-- 
Affijn, Ruud

"Gewoon is een tijger."
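
For anyone who wants to try the second pattern on real input, here is a minimal sketch. It takes a different route from the `use encoding` line above: it assumes the input arrives as raw Latin-1 bytes, decodes them with the core Encode module, and adds the /u modifier (Perl 5.14 or later) so that [[:alpha:]] and \b follow Unicode rules. The names $ADA_IDENT and ada_identifiers are made up for this example and are not part of code2html. Note that [[:alpha:]] may accept a few Latin-1 characters that Ada does not count as letters, so treat it as an approximation.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical pattern name. /u forces Unicode rules, so [[:alpha:]] and \b
# treat characters such as e-acute (0xE9) as letters.
my $ADA_IDENT = qr/\b[[:alpha:]]+(?:_?[[:alpha:]0-9]+)*\b/u;

# Hypothetical helper: takes raw Latin-1 bytes, returns the identifiers found.
sub ada_identifiers {
    my ($latin1_bytes) = @_;
    my $text = decode('ISO-8859-1', $latin1_bytes);   # bytes -> characters
    return $text =~ /($ADA_IDENT)/g;                  # all matches, list context
}

# "\xE9" is e-acute and "\xC4" is A-diaeresis in Latin-1.
my @ids = ada_identifiers("D\xE9but_1 := \xC4nderung + x;");

binmode STDOUT, ':encoding(UTF-8)';   # print the characters as UTF-8
print "$_\n" for @ids;
```

With the sample line, this finds three identifiers: the two accented names and the plain "x". Using 0-9 instead of \d is deliberate here, since \d under /u would also accept non-ASCII digits.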