Erik wrote:

> I need to recognize latin1 letters in a regexp. How is it done? The
> reason is that I want to fix a program called code2html, which is
> written in Perl. I have the following regular expression for Ada
> identifiers:
> \\b[a-zA-Z](_?[a-zA-Z0-9])*\\b
>
> but this is wrong because Ada identifiers include any latin1 letters,
> not just a-z and A-Z. Anyone knows how?

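For [a-zA-Z] you can use [[:alpha:]], see `perldoc perlre`. Note that
[[:alpha:]] only matches the Latin-1 letters above 0x7F when the match
is done with Unicode rules, for example with the /u modifier in
Perl 5.14 and later.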

Why are the double backslashes there? I assume the \b's are meant as
word boundaries.

  /\b[[:alpha:]](?:_?[[:alpha:]0-9])*\b/

The (?:...) is a non-capturing group.
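
To see what the pattern accepts, here is a minimal sketch. It is my own
test harness, not part of code2html: I swapped the \b's for \A and \z so
that the whole string must be an identifier, added /u (Perl 5.14 or
later) to get Unicode rules for the Latin-1 letters, and made up the
sample names.

```perl
use strict;
use warnings;

# Pattern from above, anchored with \A and \z instead of \b (assumption:
# we validate one candidate name at a time). /u requests Unicode rules so
# that [[:alpha:]] also covers letters above 0x7F.
my $ada_id = qr/\A[[:alpha:]](?:_?[[:alpha:]0-9])*\z/u;

# Hypothetical sample names; \x{E9} is e-acute in Latin-1.
my @samples = ("Caf\x{E9}_Count", "x1", "a_", "a__b", "9lives");

# Map each name to 1 (accepted) or 0 (rejected).
my %is_id = map { $_ => ($_ =~ $ada_id ? 1 : 0) } @samples;

for my $name (@samples) {
    printf "%-12s %s\n", $name, $is_id{$name} ? "identifier" : "rejected";
}
```

Only the first two names should come out as identifiers; a trailing
underscore, a doubled underscore and a leading digit are all rejected.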


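Because of the final \b, a trailing underscore as in "a_" is already
rejected. You can also write the pattern with + so that it matches runs
of characters at once: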

  /\b[[:alpha:]]+(?:_?[[:alpha:]0-9]+)*\b/


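If the source of your script itself contains Latin-1 literals, you can
put a

  use encoding 'latin1';

at the top of your source. Without `use utf8`, Perl reads source code as
single bytes, not as UTF-8, and the encoding pragma is deprecated since
Perl 5.18, so on a newer Perl it is better to decode the data you read
(with Encode, or with an :encoding(latin1) layer on the filehandle).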
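
As a sketch of the decoding route, assuming the input arrives as raw
Latin-1 bytes: the sample line and the variable names below are made up
for illustration, and the /u modifier (Perl 5.14 or later) is there so
the result does not depend on how Perl stores the string internally.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw input: bytes as they would be read from a Latin-1 file.
# \xEF is i-diaeresis in Latin-1.
my $bytes = "na\xEFve_1 := 0;";

# decode() turns the octets into a Perl character string.
my $text = decode('ISO-8859-1', $bytes);

# Same identifier pattern as above, with a capture around it.
my ($first_id) = $text =~ /\b([[:alpha:]](?:_?[[:alpha:]0-9])*)\b/u;

print "first identifier has ", length($first_id), " characters\n";
```

Latin-1 is a single-byte encoding, so the decoded string has as many
characters as the input had bytes.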


With Unicode rules in effect, \d matches the same digits as 0-9 in
Latin-1 text, so you can even change your regexp to:

  /\b[[:alpha:]]+(?:_?[[:alpha:]\d]+)*\b/

and then to:

  /\b[[:alpha:]]+(?:_?[^\W_]+)*\b/


If a trailing "_" (or a doubled "__") is no problem, you can make it

  /\b[[:alpha:]]\w*/
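
A small comparison of the strict and the loose pattern, as a sketch: the
test names are hypothetical, and /u (Perl 5.14 or later) is my addition
to get Unicode rules for [[:alpha:]] and \w.

```perl
use strict;
use warnings;

# Strict: underscores only singly and only between alphanumerics.
my $strict = qr/\b[[:alpha:]]+(?:_?[^\W_]+)*\b/u;
# Loose: a letter followed by any word characters.
my $loose  = qr/\b[[:alpha:]]\w*/u;

# Hypothetical test names.
my @names = ("Sum_1", "Sum_", "Sum__1");

# For each name, remember what each pattern matched (undef if nothing).
my %found;
for my $name (@names) {
    my ($s) = $name =~ /($strict)/;
    my ($l) = $name =~ /($loose)/;
    $found{$name} = [$s, $l];
    printf "%-7s strict=%-10s loose=%s\n",
        $name, $s // '(no match)', $l // '(no match)';
}
```

Note that the strict pattern does not match a shorter prefix of "Sum_"
or "Sum__1": the closing \b cannot sit between a letter and "_", so it
finds nothing at all in those names.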

-- 
Affijn, Ruud

"Gewoon is een tijger."


