On Jul 15, 2019, at 18:44, Nam Nguyen <[email protected]> wrote:
>
> I have implemented a tiny (~200 SLOCs) package at
> https://gitlab.com/nam-nguyen/parser_compynator that demonstrates something
> like this is possible. There are several examples to give you a feel for it,
> as well as some early benchmark numbers to consider. This is far smaller
> than any of the Python parsing libraries I have looked at, yet more universal
> than many of them. I hope it will convert the skeptics ;).
For at least some of your use cases, I don’t think it’s a problem that it’s 70x
slower than the custom parsers you’d be replacing. How often do you need to
parse a million URLs in your inner loop? Also, if the function composition is
really the performance hurdle, can you optimize that away relatively simply,
just by building an explicit tree (expression-template style) and walking the
tree in a __call__ method, rather than building an implicit tree of nested
calls? (And that could be optimized further if needed, e.g. by turning the tree
walk into a simple virtual machine where all of the fundamental operations are
inlined into the loop, and maybe even accelerating that with C code.)
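To make the expression-template idea concrete, here is a minimal sketch (my own illustration, not the parser_compynator API — `Lit`, `Seq`, and `Alt` are hypothetical names): each combinator builds an explicit node object, and a single `__call__` per node type walks the tree, so there is no deep pile of nested closures, and a later pass could compile the same tree down to a flat instruction list.

```python
# Sketch of an explicit parse tree, expression-template style.
# Convention (assumed): a parser called as parser(s, pos) returns a set
# of (value, end_position) pairs; an empty set means failure/backtrack.

class Parser:
    def then(self, other):
        # Sequence: run self, then other from where self left off.
        return Seq(self, other)

    def __or__(self, other):
        # Alternation: union of both branches' results.
        return Alt(self, other)

class Lit(Parser):
    """Match a literal string at the current position."""
    def __init__(self, text):
        self.text = text

    def __call__(self, s, pos=0):
        if s.startswith(self.text, pos):
            return {(self.text, pos + len(self.text))}
        return set()

class Seq(Parser):
    """Explicit node for sequencing, instead of a nested closure."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def __call__(self, s, pos=0):
        results = set()
        for lval, mid in self.left(s, pos):
            for rval, end in self.right(s, mid):
                results.add(((lval, rval), end))
        return results

class Alt(Parser):
    """Explicit node for alternation: try both branches."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def __call__(self, s, pos=0):
        return self.left(s, pos) | self.right(s, pos)
```

Because the tree is now a plain data structure rather than opaque closures, an optimizer could flatten it into the simple virtual machine mentioned above without changing user-facing code.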
But I do think it’s a problem that there seems to be no way to usefully
indicate failure to the caller, and I’m not sure that could be fixed as easily.
Invalid inputs in your readme examples don’t fail; they successfully return an
empty set. There also doesn’t seem to be any way to trigger a hard fail rather
than a backtrack. So I’m not sure how a real urlparse replacement could do the
things the current one does, like raising a ValueError on
https://abc.d[ef.ghi/ complaining that the netloc looks like an invalid IPv6
address. (Maybe you could def a function that raises a ValueError and attach it
as a where somewhere in the parser tree? But even if that works, wouldn’t you
get a meaningless exception that doesn’t have any information about where in
the source text or where in the parse tree it came from or why it was raised,
and, as your readme says, a stack trace full of garbage?) Can you add failure
handling without breaking the “~200 LOC and easy to read” feature of the
library, and without breaking the “easy to read once you grok parser
combinators” feature of the parsers built with it?
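For what it’s worth, here is roughly what I imagine such a hard-fail hook would have to look like — all of this (`cut`, `ParseError`, the set-of-(value, end) convention, and the `lit` helper) is my own sketch, not anything in the library. The point is that the wrapper can at least capture the source position at the moment the result set comes back empty, which is more than a bare ValueError raised from inside a `where` would give you:

```python
# Sketch of a hard-fail combinator: turn an empty result set into an
# exception carrying the position, instead of a silent backtrack.

class ParseError(ValueError):
    """A ValueError that records where in the source text parsing failed."""
    def __init__(self, message, pos):
        super().__init__(f"{message} at position {pos}")
        self.pos = pos

def cut(parser, message):
    """Wrap a parser so failure raises instead of returning an empty set.

    Assumed convention: parser(s, pos) returns a set of (value, end)
    pairs, with the empty set meaning "no parse".
    """
    def run(s, pos=0):
        results = parser(s, pos)
        if not results:
            raise ParseError(message, pos)
        return results
    return run

def lit(text):
    """Tiny literal-matching parser, for demonstration only."""
    def run(s, pos=0):
        if s.startswith(text, pos):
            return {(text, pos + len(text))}
        return set()
    return run
```

Even with something like this, the open question from above remains: the exception knows the text position, but not which node of the parse tree raised it or why that branch was being tried, so the diagnostics would still fall short of what urlparse gives you today.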
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/[email protected]/message/VCJEYLMDVEGNED4ODYXV236JEKXWWNLM/
Code of Conduct: http://python.org/psf/codeofconduct/