compound regex

2009-02-09 Thread spir

Hello,

(new here)

Below is an extension to the standard module re. The point is to allow writing 
and testing sub-expressions individually, then nesting them into a 
super-expression. It is more or less like using a parser generator, but it 
keeps regex grammar and power.
I used the format {sub_expr_name}: since in standard regexes {} is only used to 
express a repetition count, a pair of curly braces enclosing an identifier 
should not conflict.
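
A quick check of that claim, as a minimal sketch: the inclusion pattern is the one used in the code below, with the braces escaped for safety, and the sample string is made up for illustration. Repetition counts such as {5} or {2,3} start with a digit, so they are left alone.

```python
import re

# {identifier}: an opening brace, a Python-style name, a closing brace
inclusion = re.compile(r"\{[a-zA-Z_][a-zA-Z_0-9]*\}")

# Made-up sample mixing repetition counts and named inclusions
sample = r"\S{5} {code} \d{2,3} {desc}"

# Only the named inclusions are found; the quantifiers are ignored
found = inclusion.findall(sample)
print(found)
```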

The extension is new and barely tested. I would welcome comments, criticism, 
etc. I would also like to know whether you find such a feature useful. You will 
probably find the code simple enough ;-)

Denis
--
la vida e estranya

===
# coding: utf-8

''' super_regex

Define & check sub-patterns individually,
then include them in global super-pattern.

uses format {name} for inclusion:
sub1 = Regex(...)
sub2 = Regex(...)
super_format = "...{sub1}...{sub2}..."
# final regex object:
super_regex = superRegex(super_format)
'''

from re import compile as Regex

# sub-pattern inclusion format
sub_pattern = Regex(r"{[a-zA-Z_][a-zA-Z_0-9]*}")

# sub-pattern expander
def sub_pattern_expansion(inclusion, dic=None):
    name = inclusion.group()[1:-1]
    ### namespace dict may be specified -- else globals()
    if dic is None:
        dic = globals()
    if name not in dic:
        raise NameError("Cannot find sub-pattern '%s'." % name)
    return dic[name].pattern

# super-pattern generator
def superRegex(format):
    expanded_format = sub_pattern.sub(sub_pattern_expansion, format)
    return Regex(expanded_format)

if __name__ == "__main__": # purely artificial example use
    # pattern
    time = Regex(r"\d\d:\d\d:\d\d") # hh:mm:ss
    code = Regex(r"\S{5}")  # non-whitespace x 5
    desc = Regex(r"[\w\s]+$")   # alphanum|space --> EOL
    ref_format = "^ref: {time} #{code} --- {desc}"
    ref_regex = superRegex(ref_format)
    # output
    print('super pattern:\n"%s" ==>\n"%s"\n' % (ref_format, ref_regex.pattern))
    text = "ref: 12:04:59 #%+.?% --- foo 987 bar"
    result = ref_regex.match(text)
    print('text: "%s" ==>\n"%s"' % (text, result.group()))
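
As posted, superRegex never passes a namespace to the expander, so sub-patterns are always looked up in globals(). Below is a hedged sketch of one way to thread an explicit dict through. The names super_regex, namespace and subs are hypothetical, not part of the code above. Wrapping each sub-pattern in a non-capturing group is an added design choice, so that an alternation inside a sub-pattern does not leak into the surrounding expression.

```python
import re

# Same inclusion syntax as in the post: {identifier}
SUB_PATTERN = re.compile(r"\{([a-zA-Z_][a-zA-Z_0-9]*)\}")

def super_regex(fmt, namespace):
    """Expand {name} references using an explicit namespace dict.

    `namespace` is a hypothetical argument, not part of the posted code.
    """
    def expand(match):
        name = match.group(1)
        if name not in namespace:
            raise NameError("Cannot find sub-pattern '%s'." % name)
        # A non-capturing group keeps the sub-pattern's alternation contained.
        return "(?:%s)" % namespace[name].pattern
    return re.compile(SUB_PATTERN.sub(expand, fmt))

# Made-up sub-patterns for illustration
subs = {
    "time": re.compile(r"\d\d:\d\d:\d\d"),
    "level": re.compile(r"INFO|WARN"),
}
log_regex = super_regex(r"^{time} {level}: ", subs)
print(log_regex.pattern)
print(bool(log_regex.match("12:04:59 WARN: disk full")))
```

Without the group, "{level}: " would expand to "INFO|WARN: ", and the alternation would split the whole expression in two.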
--
http://mail.python.org/mailman/listinfo/python-list

Re: [Tutor] loop performance in global namespace (python-2.6.1)

2009-03-12 Thread spir

On Thu, 12 Mar 2009 11:13:33 -0400,
Kent Johnson wrote:

> Because local name lookup is faster than global name lookup. Local
> variables are stored in an array in the stack frame and accessed by
> index. Global names are stored in a dict and accessed with dict access
> (dict.__getitem__()).

Really? I thought this was mainly because a name first has to be looked up 
(unsuccessfully) locally before a global lookup is attempted.
Also, are locals really stored in an array? How does the lookup then proceed? 
Is it a kind of (name, ref) sequence?
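
One way to look into this, as a minimal sketch assuming CPython: the compiler decides at compile time whether a name is local. Local names are listed in the code object's co_varnames and accessed by position, so there is no failed local search at run time. Global names are listed in co_names and looked up by name in the module dict, then in builtins. The function and variable names below are made up for illustration, and the exact opcode names printed by dis vary between CPython versions.

```python
import dis

counter = 0  # module-level (global) name

def use_local():
    total = 0            # local: compiled to an indexed slot
    for i in range(3):
        total += i
    return total

def use_global():
    global counter       # global: looked up by name in the module dict
    for i in range(3):
        counter += i
    return counter

# Names the compiler treats as locals (accessed by index)
print(use_local.__code__.co_varnames)
# Names looked up by name at run time (globals and builtins)
print(use_global.__code__.co_names)

# The bytecode shows LOAD_FAST-style opcodes for locals
# and LOAD_GLOBAL for globals.
dis.dis(use_local)
dis.dis(use_global)
```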

Denis
--
la vita e estrany
