Hello all,

I'm writing a module that takes user input as strings and (effectively) translates them to function calls with arguments and keyword arguments. To pass a list I use a sort of 'list constructor' - so the syntax looks a bit like :

    checkname(arg1, "arg 2", 'arg 3', keywarg="value", keywarg2='value2', default=list("val1", 'val2'))

Worst case anyway :-)

I can handle this with regular expressions but they are becoming truly horrible. I wonder if anyone has any suggestions on optimising them.

I could hand write a parser - which would be more code, probably slower - but less error prone. (Regular expressions are subject to obscure errors - especially the ones I create).

The trouble is that I have to pull out the separate arguments, then pull apart the keyword arguments and the list keyword arguments. This makes it a 'multi-pass' task - and I wondered if there was a better way to do it.

Because I use ``findall`` to pull out all the arguments, I also have to use a *very similar* regex to first check that there are no errors (as findall will just skip over badly formed parts of the input).

My current approach is : pull out the checkname and *all* the arguments using :

    '(.+?)\((.*)\)'

I then have :

    _paramstring = r'''
    (?:
        (
            (?:
                [a-zA-Z_][a-zA-Z0-9_]*\s*=\s*list\(
                    (?:
                        \s*
                        (?:
                            (?:".*?")|              # double quotes
                            (?:'.*?')|              # single quotes
                            (?:[^'",\s\)][^,\)]*?)  # unquoted
                        )
                        \s*,\s*
                    )*
                    (?:
                        (?:".*?")|              # double quotes
                        (?:'.*?')|              # single quotes
                        (?:[^'",\s\)][^,\)]*?)  # unquoted
                    )?                          # last one
                \)
            )|
            (?:
                (?:".*?")|              # double quotes
                (?:'.*?')|              # single quotes
                (?:[^'",\s=][^,=]*?)|   # unquoted
                (?:                     # keyword argument
                    [a-zA-Z_][a-zA-Z0-9_]*\s*=\s*
                    (?:
                        (?:".*?")|              # double quotes
                        (?:'.*?')|              # single quotes
                        (?:[^'",\s=][^,=]*?)    # unquoted
                    )
                )
            )
        )
        (?:
            (?:\s*,\s*)|(?:\s*$)  # comma
        )
    )
    '''

I can use ``_paramstring`` with findall to pull out all the arguments. However - as I said, I first need to check that the entire input is well formed. So I do a match against :

    _matchstring = '^%s*' % _paramstring

Having done a match I can use findall and ``_paramstring`` to pull out *all* the parameters as a list - and go through each one checking if it is a single argument, keyword argument or list constructor.
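
To make the findall problem concrete, here is a cut-down sketch of the same two-step idea. It is not my real code: ``_simple_param`` is a hypothetical stand-in for ``_paramstring`` that only handles positional values, ``split_call`` and ``parse_args`` are names made up for the example, and the trailing ``$`` on the validating pattern is an assumption so that the whole string has to be consumed.

```python
import re

# Step 1: split 'checkname(args)' into the name and the raw argument text.
_call = re.compile(r'(.+?)\((.*)\)')

# Hypothetical, cut-down stand-in for _paramstring: positional values only.
_simple_param = r'''
    (
        (?:".*?")|              # double quotes
        (?:'.*?')|              # single quotes
        (?:[^'",\s=][^,=]*?)    # unquoted
    )
    (?:\s*,\s*|\s*$)            # separator, comma or end of input
'''
_param = re.compile(_simple_param, re.VERBOSE)

# Assumption: anchor with a trailing '$' so every character must be consumed.
_validate = re.compile(r'^(?:%s)*$' % _simple_param, re.VERBOSE)


def split_call(text):
    """Return (checkname, raw_args), or raise ValueError if text is not a call."""
    match = _call.match(text)
    if match is None:
        raise ValueError('not a call: %r' % text)
    return match.group(1), match.group(2)


def parse_args(raw_args):
    """Validate the whole argument string first, then extract with findall."""
    if _validate.match(raw_args) is None:
        raise ValueError('badly formed arguments: %r' % raw_args)
    return _param.findall(raw_args)


name, raw = split_call('check(a, "b c", \'d\')')
good = parse_args(raw)                      # ['a', '"b c"', "'d'"]

# An unterminated double quote: findall alone silently drops the stray quote.
bad = 'a, "b c, \'d\''
skipped = _param.findall(bad)               # ['a', 'b c', "'d'"]
```

Calling ``parse_args(bad)`` raises ``ValueError`` instead, which is the reason for the extra validating match before findall.
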
For keyword arguments and list constructors I use another regular expression (basically the appropriate part of ``_paramstring``) to pull out the values.

Now this approach works - but it's hardly "optimal" (for some value of optimal). I wondered if anyone could suggest a better approach.

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

--
http://mail.python.org/mailman/listinfo/python-list