Re: a simple unicode question

rurpy Thu, 22 Oct 2009 13:12:37 -0700

On 10/22/2009 03:23 AM, Gabriel Genellina wrote:
> On Wed, 21 Oct 2009 15:14:32 -0300, <[email protected]> wrote:
>
>> On Oct 21, 4:59 am, Bruno Desthuilliers <bruno.[email protected]> wrote:
>>> beSTEfar wrote:
>>> (snip)
>>>  > When parsing strings, use Regular Expressions.
>>>
>>> And now you have _two_ problems <g>
>>>
>>> For some simple parsing problems, Python's string methods are powerful
>>> enough to make REs overkill. And for any complex enough parsing (any
>>> recursive construct for example - think XML, HTML, any programming
>>> language etc), REs are just NOT enough by themselves - you need a full
>>> blown parser.
>>
>> But keep in mind that many XML, HTML, etc parsing problems
>> are restricted to a subset where you know the nesting depth
>> is limited (often to 0 or 1), and for that large set of
>> problems, RE's *are* enough.
>
> I don't think so. Nesting isn't the only problem. RE's cannot handle
> comments, for example. And you must support unquoted attributes, single and
> double quotes, any attribute ordering, empty tags, arbitrary whitespace...
> If you don't, you are not reading XML (or HTML), only a specific file
> format that resembles XML but actually isn't.


OK, then let me rephrase my point as: in the real world it is often
not necessary to parse XML in its full generality; parsing, as you
put it, "a specific file format that resembles XML" is all that is
really needed.
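
To make that concrete, here is a minimal sketch of the kind of case I
have in mind. The format, the names (SAMPLE, ITEM_RE, parse_items) and
the assumptions are all made up for illustration: the input is produced
by one known program, every record is a single non-nested <item> tag,
attributes are always double-quoted and always in the same order, and
there are no comments.

```python
import re

# Hypothetical machine-generated input: one self-closing <item> per record,
# double-quoted attributes in a fixed order, no nesting, no comments.
SAMPLE = '''\
<item id="1" name="spam"/>
<item id="2" name="eggs"/>
'''

# Matches only the restricted format described above, not XML in general.
ITEM_RE = re.compile(r'<item id="(\d+)" name="([^"]*)"/>')

def parse_items(text):
    """Return a list of (id, name) pairs found in the restricted format."""
    return [(int(num), name) for num, name in ITEM_RE.findall(text)]

print(parse_items(SAMPLE))
```

If the producer ever swaps the attribute order or starts using single
quotes, this silently finds nothing, which is exactly the limitation
you describe. It is only reasonable when the input format is under
your control or otherwise known to be fixed.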
-- 
http://mail.python.org/mailman/listinfo/python-list
