Re: Refactoring in a large code base

2016-01-22 Thread Marko Rauhamaa
Ben Finney :

> The author points out there are times when a code base is large and
> complex enough that refactoring puts the programmer in a state of not
> knowing whether they're making progress, because until the whole
> refactoring is complete the errors just cascade and it's hard to tell
> which ones are relevant.

I've been there. I think the root problem is to have a code base that's
so large and complex.

It *could* be avoided if the engineering director only cared.


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Refactoring in a large code base

2016-01-22 Thread Rustom Mody
On Friday, January 22, 2016 at 1:59:15 PM UTC+5:30, Marko Rauhamaa wrote:
> Ben Finney :
> 
> > The author points out there are times when a code base is large and
> > complex enough that refactoring puts the programmer in a state of not
> > knowing whether they're making progress, because until the whole
> > refactoring is complete the errors just cascade and it's hard to tell
> > which ones are relevant.
> 
> I've been there. I think the root problem is to have a code base that's
> so large and complex.

Bizarre comment... Are you saying large and complex code-bases should non-exist?

> 
> It *could* be avoided if the engineering director only cared.

Some problems are trivially solvable... for those who have the knowhow
Some problems are inherently hard but easily detectable as such... once again 
for those who have the knowhow
And some are literally (and ironically trivially) unsolvable

The CS-trinity: 'normal' problems, problems in NP, undecidable problems is a 
classic example of this.
However, applying that in real-world practice can be highly non-trivial,
requiring anything from specialized knowledge to intelligence to genius.

IOW "engineering director does not care" is likely true but also a gross 
oversimplification


Re: Refactoring in a large code base

2016-01-22 Thread Marko Rauhamaa
Rustom Mody :

> On Friday, January 22, 2016 at 1:59:15 PM UTC+5:30, Marko Rauhamaa wrote:
>> I've been there. I think the root problem is to have a code base
>> that's so large and complex.
>
> Bizarre comment... Are you saying large and complex code-bases should
> non-exist?

Why, yes, I am.

>> It *could* be avoided if the engineering director only cared.
>
> Some problems are trivially solvable... for those who have the knowhow
> Some problems are inherently hard but easily detectable as such...
> once again for those who have the knowhow And some are literally (and
> ironically trivially) unsolvable

The knowhow, vision and skill is apparently very rare. On the product
management side, we have the famous case of Steve Jobs, who simply told
the engineers to go back to the drawing boards when he didn't like the
user experience. Most others would have simply surrendered to the
mediocre designs and shipped the product.

We need similar code sanity management. Developers are given much too
much power to mess up the source code. That's why "legacy" is considered
a four-letter word among developers.


Marko


Deprecation warnings for the future async and await keywords

2016-01-22 Thread Marco Buttu
I enabled the deprecation warnings in Python 3.5.1 and Python 3.6 dev, 
and I noticed that assigning to async or await does not issue any 
deprecation warning:


$ python -Wd -c "import sys; print(sys.version); async = 33"
3.5.1 (default, Jan 21 2016, 19:59:28)
[GCC 4.8.4]
$ python -Wd -c "import sys; print(sys.version); async = 33"
3.6.0a0 (default:4b434a4770a9, Jan 12 2016, 13:01:29)
[GCC 4.8.4]


Is it normal?
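
For anyone who wants to repeat the check on other interpreters, here is a
minimal sketch that compiles the assignment and reports what happens. The
helper name check_identifier is made up for illustration; it relies only on
the standard warnings module and the built-in compile(). On interpreters
where async and await are reserved keywords (3.7 and later, as far as I
know), the compile step fails with a SyntaxError instead of warning.

```python
import warnings


def check_identifier(name):
    """Report how this interpreter treats `name` used as a variable name.

    Hypothetical helper, not part of any library.
    """
    source = "{} = 33".format(name)
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that are ignored by default.
        warnings.simplefilter("always")
        try:
            compile(source, "<check>", "exec")
        except SyntaxError:
            return "syntax error"
    if any(issubclass(w.category, DeprecationWarning) for w in caught):
        return "deprecation warning"
    return "accepted silently"


for name in ("async", "await", "spam"):
    print(name, "->", check_identifier(name))
```

The result for "async" and "await" depends on the interpreter version, so
no expected output is shown here.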
--
Marco Buttu


Re: Refactoring in a large code base

2016-01-22 Thread Chris Angelico
On Fri, Jan 22, 2016 at 9:19 PM, Marko Rauhamaa  wrote:
> The knowhow, vision and skill is apparently very rare. On the product
> management side, we have the famous case of Steve Jobs, who simply told
> the engineers to go back to the drawing boards when he didn't like the
> user experience. Most others would have simply surrendered to the
> mediocre designs and shipped the product.
>
> We need similar code sanity management. Developers are given much too
> much power to mess up the source code. That's why "legacy" is considered
> a four-letter word among developers.

So what do you do with a huge program? Do you send it back to the
developers and say "Do this in fewer lines of code"?

CPython is a large and complex program. How do you propose doing it "right"?

ChrisA


PyDev 4.5.3 Released

2016-01-22 Thread Fabio Zadrozny
Release Highlights:
---

* Debugger

    * Fixed issue in set next statement (#PyDev 651).

    * pydevd.settrace was stopping inside the debugger and not in user code
      (#PyDev 648).

    * subprocess.Popen could crash when running non python executable
      (#PyDev 650).

* PyUnit view

    * The last pinned test suite appears as the first entry in the history.

    * More information is shown on the test run history.

    * A string representation of the test suite can be saved in the clipboard
      (last item in the test run history).

* Indexing: fixed issue where the indexing and code-analysis could race
  with each other and one could become corrupt.


What is PyDev?
---

PyDev is an open-source Python IDE on top of Eclipse for Python, Jython and
IronPython development.

It comes with goodies such as code completion, syntax highlighting, syntax
analysis, code analysis, refactor, debug, interactive console, etc.

Details on PyDev: http://pydev.org
Details on its development: http://pydev.blogspot.com


What is LiClipse?
---

LiClipse is a PyDev standalone with goodies such as support for Multiple
cursors, theming, TextMate bundles and a number of other languages such as
Django Templates, Jinja2, Kivy Language, Mako Templates, Html, Javascript,
etc.

It's also a commercial counterpart which helps support the development
of PyDev.

Details on LiClipse: http://www.liclipse.com/



Cheers,

--
Fabio Zadrozny
--
Software Developer

LiClipse
http://www.liclipse.com

PyDev - Python Development Environment for Eclipse
http://pydev.org
http://pydev.blogspot.com

PyVmMonitor - Python Profiler
http://www.pyvmmonitor.com/


Re: Refactoring in a large code base

2016-01-22 Thread Charles T. Smith
On Fri, 22 Jan 2016 12:19:50 +0200, Marko Rauhamaa wrote:

> We need similar code sanity management. Developers are given much too
> much power to mess up the source code. That's why "legacy" is considered
> a four-letter word among developers.

When I started in this business, in the mid-70s, there was the prospect
of my working under a "programmer-analyst" - there was, then, a whole
hierarchy of programmers.  I resisted that bitterly and was "lucky"
enough to be at the forefront of changes - I would be able to avoid that
until the concept was dead.

Now, 40 years later ... it seems like a good idea to me ... but more
dead than it's ever been and getting deader all the time.

Part of the problem is that the whole profession is dead - now the
people doing the programming are the application experts, and those who
are just programmers are regarded the way we used to regard
"operators"   :)


Re: Refactoring in a large code base

2016-01-22 Thread Marko Rauhamaa
Chris Angelico :

> On Fri, Jan 22, 2016 at 9:19 PM, Marko Rauhamaa  wrote:
> So what do you do with a huge program?

Modularize. Treat each module as a separate product with its own release
cycle, documentation, apis, ownership etc.

What is a reasonable size of a module? It is something you would
consider replacing with a new implementation with a moderate effort
(say, in a single quarter).

> CPython is a large and complex program. How do you propose doing it
> "right"?

I don't know CPython specifically to give solid recommendations, but I
would imagine the core language engine should be in a repository
separate from the standard library, and most standard library modules
should be in their respective repositories and have their individual
internal release cycles.

A CPython release would then weave the package together from the
components that were previously (internally) released.


Marko


Re: Refactoring in a large code base

2016-01-22 Thread Rustom Mody
On Friday, January 22, 2016 at 4:49:19 PM UTC+5:30, Chris Angelico wrote:
> On Fri, Jan 22, 2016 at 9:19 PM, Marko Rauhamaa  wrote:
> > The knowhow, vision and skill is apparently very rare. On the product
> > management side, we have the famous case of Steve Jobs, who simply told
> > the engineers to go back to the drawing boards when he didn't like the
> > user experience. Most others would have simply surrendered to the
> > mediocre designs and shipped the product.
> >
> > We need similar code sanity management. Developers are given much too
> > much power to mess up the source code. That's why "legacy" is considered
> > a four-letter word among developers.
> 
> So what do you do with a huge program? Do you send it back to the
> developers and say "Do this in fewer lines of code"?
> 
> CPython is a large and complex program. How do you propose doing it "right"?

Put thus 'generistically' this is a rhetorical question and makes Marko look
like he's making a really foolish point

Specifically, what little I've seen under the CPython hood looked distinctly
improvable. egs.

1. My suggestion to have the docs re. generator-function vs generator-objects
cleaned up had no takers
2. My students trying to work inside the lexer made a mess because the extant 
lexer is a mess.
I.e. while python(3) *claims* to accept Unicode input, the actual lexer is
an ASCII lexer special-cased for unicode rather than pre-lexing utf8 to unicode

These are just specific examples that I am familiar with
Chris' general point still stands, viz take the large and complex program that
is cpython and clean up these messinesses: You will still have a large and
complex program


Re: Refactoring in a large code base

2016-01-22 Thread Marko Rauhamaa
Rustom Mody :

> These are just specific examples that I am familiar with Chris'
> general point still stands, viz take the large and complex program
> that is cpython and clean up these messinesses: You will still have a
> large and complex program

No, as long as the ugly parts are compartmentalized, you have a better
chance at refactoring them -- or replacing them altogether.

Modularization is an obvious, but under-practiced, method of managing
complexity.


Marko


Re: Refactoring in a large code base

2016-01-22 Thread Chris Angelico
On Fri, Jan 22, 2016 at 10:54 PM, Marko Rauhamaa  wrote:
> Chris Angelico :
>
>> On Fri, Jan 22, 2016 at 9:19 PM, Marko Rauhamaa  wrote:
>> So what do you do with a huge program?
>
> Modularize. Treat each module as a separate product with its own release
> cycle, documentation, apis, ownership etc.
>
> What is a reasonable size of a module? It is something you would
> consider replacing with a new implementation with a moderate effort
> (say, in a single quarter).
>
>> CPython is a large and complex program. How do you propose doing it
>> "right"?
>
> I don't know CPython specifically to give solid recommendations, but I
> would imagine the core language engine should be in a repository
> separate from the standard library, and most standard library modules
> should be in their respective repositories and have their individual
> internal release cycles.
>
> A CPython release would then weave the package together from the
> components that were previously (internally) released.

Okay. So let's suppose we strip out huge slabs of the standard library
and make an absolutely minimal "base library", with an "extended
library" that can run on its own separate release cycle. (This has
already been discussed; the biggest problems with the idea aren't
technical, but logistical - not just for the Python devs but for
everyone who has to get approval for software upgrades.) Let's suppose
the base library consists of just the modules necessary for a basic
invocation:

rosuav@sikorsky:~$ python3 -c 'import sys; print(sorted(sys.modules.keys()))'
['__main__', '_codecs', '_collections_abc', '_frozen_importlib',
'_frozen_importlib_external', '_imp', '_io', '_signal',
'_sitebuiltins', '_stat', '_sysconfigdata', '_thread', '_warnings',
'_weakref', '_weakrefset', 'abc', 'builtins', 'codecs', 'encodings',
'encodings.aliases', 'encodings.latin_1', 'encodings.utf_8', 'errno',
'genericpath', 'io', 'marshal', 'os', 'os.path', 'posix', 'posixpath',
'site', 'stat', 'sys', 'sysconfig', 'zipimport']

Alright. Can you rewrite all of those modules in three months? Not
even digging into the language itself, just the base library. This is
the bare minimum to get a viable Python execution environment going
(you might be able to cut it down a bit, but not much), so it can't be
modularized into separate projects.

And then there's the language itself. The cpython/Python directory has
58 .c files, many of which are closely tied to each other. The
cpython/Objects directory has another 39, representing specific object
types (bytes, tuple, range, method) that are implemented in C. And
cpython/Parser has 17 more just to handle the language parser. Edits
often affect multiple files and must be kept in sync. How would you
modularize that out? Which part would you spin off as a separate
project with its own release cycle? The garbage collector? The string
object? The peephole optimizer? The import machinery? Each of these is
already too big to rewrite in three months, plus they're fairly
tightly linked to all the other modules. All that code represents the
accumulation of hundreds of thousands of fixes to prevent tens of
millions of bugs (some of which will be visible on bugs.python.org,
but most would have been found and prevented during early testing);
throwing the code away means throwing all that away.

http://www.joelonsoftware.com/articles/fog69.html

I don't agree with everything Joel says, but seriously, do not waste
your time with a full rewrite - even in theory. And I can say this
from hard experience on both sides. I have an active project for a MUD
server, which was originally deployed as a byte-oriented service (it
took ASCII-compatible bytes from clients and sent those same octets
out to other clients). When I decided that the server should work with
Unicode text internally (expecting and transmitting UTF-8), I kept on
coming across stupid problems where the code had been written with
faulty assumptions, and I had to keep on fixing those. Would it have
been better to throw the code away and start over? Well, let me tell
you, it would certainly have made the Unicode handling a lot easier,
so if you're looking at starting your own project, make sure you learn
from my hassles and bake in Unicode support from the start! But that
would have meant throwing away all the bugfixes for all the bugs that
I'd noticed across the years, such as:

1) On login, typing "quit" when prompted for a user name or password
would log you out. The "passwd" (change password) command had to also
prevent you from *setting* your password to "quit", because that would
effectively lock your account against login.

2) Some clients send backspace as 08; others send FF; some send 08 20
08. Cope with them all.

3) If a bug prevents the admin account from working, there needs to be
a way to diagnose and fix that code using shell access to the back-end
server, without needing the actual admin account.

Etcetera, etcetera, etcetera. There's no way to "rewrite but k

Re: Refactoring in a large code base

2016-01-22 Thread Thomas Mellman
On Fri, 22 Jan 2016 04:04:44 -0800, Rustom Mody wrote:

> These are just specific examples that I am familiar with Chris' general
> point still stands, viz take the large and complex program that is
> cpython and clean up these messinesses: You will still have a large and
> complex program

I'm not really sure what the point is we're working on...  let me
propose these:

- unix principle is good: keep things simple, limited in scope.
  Then leverage that.

- there will always be complexity, but if the complexity is
  modularized, it's controlled.

  In particular, the complexity of a program should represent the
  complexity of the problem.  I call that "structural complexity".
  To be avoided, corrected, is "superficial complexity",
  where the complexity of a system is squished into a single (or
  reduced number of) planes.  Like vomiting a program onto a desk.

- "Advice" that the program needs to be refactored is generally not helpful.



Re: Refactoring in a large code base

2016-01-22 Thread Chris Angelico
On Fri, Jan 22, 2016 at 11:04 PM, Rustom Mody  wrote:
> On Friday, January 22, 2016 at 4:49:19 PM UTC+5:30, Chris Angelico wrote:
>> On Fri, Jan 22, 2016 at 9:19 PM, Marko Rauhamaa  wrote:
>> > The knowhow, vision and skill is apparently very rare. On the product
>> > management side, we have the famous case of Steve Jobs, who simply told
>> > the engineers to go back to the drawing boards when he didn't like the
>> > user experience. Most others would have simply surrendered to the
>> > mediocre designs and shipped the product.
>> >
>> > We need similar code sanity management. Developers are given much too
>> > much power to mess up the source code. That's why "legacy" is considered
>> > a four-letter word among developers.
>>
>> So what do you do with a huge program? Do you send it back to the
>> developers and say "Do this in fewer lines of code"?
>>
>> CPython is a large and complex program. How do you propose doing it "right"?
>
> Put thus 'generistically' this is a rhetorical question and makes Marko look
> like he's making a really foolish point
>
> Specifically, what little I've seen under the CPython hood looked distinctly
> improvable. egs.
>
> 1. My suggestion to have the docs re. generator-function vs generator-objects
> cleaned up had no takers
> 2. My students trying to work inside the lexer made a mess because the extant 
> lexer is a mess.
> I.e. while python(3) *claims* to accept Unicode input, the actual lexer is
> an ASCII lexer special-cased for unicode rather than pre-lexing utf8 to 
> unicode
>
> These are just specific examples that I am familiar with

Yes, there are some parts of CPython that can be improved. That's true
of every large project (it's said that every program has at least one
bug and could be shortened by at least one instruction, from which it
can be deduced that every program can be reduced to a single
instruction that doesn't work).

Regarding lexers specifically, I have never seen any full-size
language parser that I've wanted to tinker with. They're always highly
optimized pieces of code, dealing with innumerable edge and corner
cases, and exploring them is always like dipping my toe into something
that's either ice-cold water or highly caustic acid, and I can't tell
which.

> Chris' general point still stands, viz take the large and complex program
> that is cpython and clean up these messinesses: You will still have a large
> and complex program

Right. You could definitely spin off *some* of CPython into a separate
project (flip through the standard library - quite a few of those
modules, if proposed for stdlib inclusion today, would be denied
"better on PyPI"), but my point isn't that it can't be improved, but
that there's an irreducible complexity to it that exceeds the "rewrite
in a quarter" mark by a huge margin.

ChrisA


Re: Refactoring in a large code base

2016-01-22 Thread Marko Rauhamaa
Chris Angelico :

> Alright. Can you rewrite all of those modules in three months?

The point is not to rewrite modules except as a fallback for a
hopelessly badly written module.

> And then there's the language itself. The cpython/Python directory has
> 58 .c files, many of which are closely tied to each other.

Putting modularization in place after the spaghetti has been created is
very hard. It can be done, though, and good things come out of the
effort.

> I don't agree with everything Joel says, but seriously, do not waste
> your time with a full rewrite - even in theory. And I can say this
> from hard experience on both sides.

As long as you are happy with your code base, you don't have to change
everything just for an abstract principle.

However, as a matter of rule, older code bases have been bloated till
they can barely be maintained. That's when the management starts to
listen to new ideas. Better late than never.

> Would it have been better to throw the code away and start over?

Again, modularization doesn't entail rewriting -- it simply makes
localized rewriting a practical option for desperate situations.

> So how can you rewrite *any* large project in three months?

Let go of the rewriting already.


Marko


Re: Refactoring in a large code base

2016-01-22 Thread Chris Angelico
On Sat, Jan 23, 2016 at 12:00 AM, Marko Rauhamaa  wrote:
> However, as a matter of rule, older code bases have been bloated till
> they can barely be maintained. That's when the management starts to
> listen to new ideas. Better late than never.

Okay. Start persuading "management" (presumably the PSU) that CPython
needs to be more modular, with different release cycles for different
components. Your first step is to figure out the boundaries between
those components. Get started.

ChrisA


Re: Refactoring in a large code base

2016-01-22 Thread Rustom Mody
On Friday, January 22, 2016 at 6:05:02 PM UTC+5:30, Chris Angelico wrote:
> On Fri, Jan 22, 2016 at 11:04 PM, Rustom Mody  wrote:
> > 2. My students trying to work inside the lexer made a mess because the 
> > extant lexer is a mess.
> > I.e. while python(3) *claims* to accept Unicode input, the actual lexer is
> > an ASCII lexer special-cased for unicode rather than pre-lexing utf8 to 
> > unicode
> >
> > These are just specific examples that I am familiar with
> 
> 
> Regarding lexers specifically, I have never seen any full-size
> language parser that I've wanted to tinker with. They're always highly
> optimized pieces of code, dealing with innumerable edge and corner
> cases, and exploring them is always like dipping my toe into something
> that's either ice-cold water or highly caustic acid, and I can't tell
> which.
> 

You just gave a graphic vivid description...
of the same thing Marko is describing: ;-) viz.
A full-size language parser is something that you - an experienced developer -
make a point of avoiding.
So then the question comes down to this: Is this the order of nature?
Or is it man-made disorder?
Jury's out on that one for lexers/parsers specifically.
For arbitrary code in general, the problem that it may be arbitrarily and
unboundedly complex/complicated is the oldest problem in computer science:
the halting problem.

IOW anyone who thinks that *arbitrary* complexity can *always* be tamed either
has a visa to utopia or needs to re-evaluate (or get) a CS degree


Re: Refactoring in a large code base

2016-01-22 Thread Marko Rauhamaa
Rustom Mody :

> IOW anyone who thinks that *arbitrary* complexity can *always* be
> tamed either has a visa to utopia or needs to re-evaluate (or get) a
> CS degree

Not all complexity can be tamed, but what you can't tame you shouldn't
release, either.


Marko


Re: Refactoring in a large code base

2016-01-22 Thread Marko Rauhamaa
Chris Angelico :

> Okay. Start persuading "management" (presumably the PSU) that CPython
> needs to be more modular, with different release cycles for different
> components. Your first step is to figure out the boundaries between
> those components. Get started.

Gladly, I don't need to do anything about CPython. This particular
strawman was erected by you:

> CPython is a large and complex program. How do you propose doing it
> "right"?

It's up to you to take my proposal or ignore it.

However, I have had my share of windmills to battle; I'm happy to say
I've managed to bring down a few of them!


Marko


Re: Refactoring in a large code base

2016-01-22 Thread Rustom Mody
On Friday, January 22, 2016 at 7:13:49 PM UTC+5:30, Marko Rauhamaa wrote:
> Rustom Mody :
> 
> > IOW anyone who thinks that *arbitrary* complexity can *always* be
> > tamed either has a visa to utopia or needs to re-evaluate (or get) a
> > CS degree
> 
> Not all complexity can be tamed, but what you can't tame you shouldn't
> release, either.

And how do you propose to legislate that?
If you leave it to the wetware (untamed!) boxes atop our shoulders will not
two developers have wildly different complexity thresholds?

And as soon as you suggest an objective (∴ algorithmic) solution to detecting
complexity you have landed with the halting problem (or more precisely Rice's
theorem)

tl;dr The HP is amazingly deceptive and you just got tripped by it



Question about how to do something in BeautifulSoup?

2016-01-22 Thread inhahe
I hope this is an appropriate mailing list for BeautifulSoup questions,
it's been a long time since I've used python-list and I don't remember if
third-party modules are on topic. I did try posting to the BeautifulSoup
mailing list on Google groups, but I've waited a day or two and my message
hasn't been approved yet.

Say I have the following HTML (I hope this shows up as plain text here
rather than formatting):

"Is
today the day?"

And I want to extract the "Is today the day?" part. There are other places
in the document with  and , but this is the only place that
uses color #00, so I want to extract anything that's within a color
#00 style, even if it's nested multiple levels deep within that.

- Sometimes the color is defined as RGB(0, 0, 0) and sometimes it's defined
as #00
- Sometimes the  is within the  and sometimes the  is
within the .
- There may be other discrepancies I haven't noticed yet

How can I do this in BeautifulSoup (or is this better done in lxml.html)?
Thanks


Re: Refactoring in a large code base

2016-01-22 Thread Chris Angelico
On Sat, Jan 23, 2016 at 12:30 AM, Rustom Mody  wrote:
> You just gave a graphic vivid description...
> of the same thing Marko is describing: ;-) viz.
> A full-size language parser is something that you - an experienced developer -
> make a point of avoiding.

It's worth noting that "experienced developer" covers a huge range of
skills. There are quite a few other areas that I do not tinker with
(crypto, CPU-level optimizations, and such), not because they're
impossible to understand, but because *I* have not the skill to
understand and improve them. This does mean they're complicated
(they're beyond the "one weekend of tinkering" barrier that any
serious geek should be able to invest), but I'm sure there are
language nerds out there who are so familiar with the grammar of
 that they'll pick up CPython's grammar and make
a change with confidence that it'll do what they expect.

> So then the question comes down to this: Is this the order of nature?
> Or is it man-made disorder?
> Jury's out on that one for lexers/parsers specifically.

Lexers/parsers are as complicated as the grammars they parse. A lexer
for a simple structured text file can be pretty easy to implement; for
instance, JSON is pretty straightforward, with only a handful of
cases (insignificant whitespace, three keywords, two recursive
structures that start with specific characters ('{' and '['), strings
(which start with '"'), and numbers (which start with a digit or a
hyphen)), so a parser need only look for those few possibilities and
it knows exactly what else to fetch up. I could probably write a JSON
parser in a fairly short space of time, and wouldn't be scared of
digging into the internals of someone else's. It's when the grammar
adds complexities to deal with the real-world issues of full size
programming languages that it becomes hairier. The CPython grammar is
only ~150 lines of fairly readable directives, but the parser that
implements it is ~3500 lines of C code. Pike merges the two into a
YACC file of nearly 5000 lines of highly optimized code (it has
different grammar paths for things a human would consider the same, in
order to produce distinct code). That's where I'm ubercautious.
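
To make the JSON point concrete, here is a minimal sketch of the
first-character dispatch described above. It is not a full parser: the
function name next_value_kind and the labels it returns are made up for
illustration, and it only decides which kind of value comes next.

```python
import string

# The three JSON keywords and the whitespace characters JSON ignores.
KEYWORDS = ("true", "false", "null")
WHITESPACE = " \t\n\r"


def next_value_kind(text, pos=0):
    """Return (kind, position) for the JSON value at or after `pos`.

    Hypothetical helper: it classifies the next value by its first
    character and does not consume or validate the value itself.
    """
    # Skip insignificant whitespace.
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    if pos == len(text):
        raise ValueError("unexpected end of input")
    ch = text[pos]
    if ch == "{":
        return "object", pos
    if ch == "[":
        return "array", pos
    if ch == '"':
        return "string", pos
    if ch == "-" or ch in string.digits:
        return "number", pos
    for word in KEYWORDS:
        if text.startswith(word, pos):
            return "keyword", pos
    raise ValueError("unexpected character {!r} at {}".format(ch, pos))


print(next_value_kind('  {"a": 1}'))
print(next_value_kind("-12"))
```

A real parser would go on to consume the value and recurse for objects and
arrays, but the decision about what to fetch next is no bigger than this.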

> For arbitrary code in general, the problem that it may be arbitrarily and
> unboundedly complex/complicated is the oldest problem in computer science:
> the halting problem.
>
> IOW anyone who thinks that *arbitrary* complexity can *always* be tamed either
> has a visa to utopia or needs to re-evaluate (or get) a CS degree

Exactly.

ChrisA


Re: Refactoring in a large code base

2016-01-22 Thread Chris Angelico
On Sat, Jan 23, 2016 at 12:48 AM, Marko Rauhamaa  wrote:
> Chris Angelico :
>
>> Okay. Start persuading "management" (presumably the PSU) that CPython
>> needs to be more modular, with different release cycles for different
>> components. Your first step is to figure out the boundaries between
>> those components. Get started.
>
> Gladly, I don't need to do anything about CPython. This particular
> strawman was erected by you:
>
>> CPython is a large and complex program. How do you propose doing it
>> "right"?
>
> It's up to you to take my proposal or ignore it.
>
> However, I have had my share of windmills to battle; I'm happy to say
> I've managed to bring down a few of them!

Okay. You have fun releasing nothing that you aren't confident can fit
within your definitions (and by the way, you were the one who brought
up the three-month rewrite, not me); I'm going to keep on following
the principle that "practicality beats purity", and release actual
working code.

ChrisA


Re: How clean/elegant is Python's syntax?

2016-01-22 Thread Rustom Mody
On Friday, May 31, 2013 at 1:06:29 AM UTC+5:30, Steven D'Aprano wrote:
> On Thu, 30 May 2013 10:12:22 -0700, rusi wrote:
> 
> > On Thu, May 30, 2013 at 9:34 AM, Ma Xiaojun wrote:
> 
> >> Wait a minute! Isn't the most nature way of doing/thinking "generating
> >> 9x9 multiplication table" two nested loop?
> > 
> > Thats like saying that the most natur(al) way of using a car is to
> > attach a horse to it.
> >[...]
> > Likewise in the world of programming, 90% of programmers think
> > imperative/OO programming is natural while functional programming is
> > strange.  Just wait 10 years and see if things are not drastically
> > different!
> 
> It won't be. Functional programming goes back to Lisp, which is nearly as 
> old as Fortran and older than Cobol. There have been many decades for 
> functional languages to become mainstream, but they've never quite done 
> it. There's no reason to think that the next decade will see a change to 
> this.

Interesting point...
With interesting (counter)examples: 
http://blog.languager.org/2016/01/how-long.html

[With apologies for necroposting]


Re: Question about how to do something in BeautifulSoup?

2016-01-22 Thread Peter Otten
inhahe wrote:

> I hope this is an appropriate mailing list for BeautifulSoup questions,
> it's been a long time since I've used python-list and I don't remember if
> third-party modules are on topic. I did try posting to the BeautifulSoup
> mailing list on Google groups, but I've waited a day or two and my message
> hasn't been approved yet.
> 
> Say I have the following HTML (I hope this shows up as plain text here
> rather than formatting):
> 
> "Is today the day?"
> 
> And I want to extract the "Is today the day?" part. There are other places
> in the document with  and , but this is the only place that
> uses color #00, so I want to extract anything that's within a color
> #00 style, even if it's nested multiple levels deep within that.
> 
> - Sometimes the color is defined as RGB(0, 0, 0) and sometimes it's
> defined as #00
> - Sometimes the  is within the  and sometimes the  is
> within the .
> - There may be other discrepancies I haven't noticed yet
> 
> How can I do this in BeautifulSoup (or is this better done in lxml.html)?
> Thanks

I don't see how to do this without a lot of glue code, but the following
may get you started:

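import re

import bs4

def recursive_attr(elem, path):
    # Follow a "/"-separated chain of child tag names, e.g. "strong/em".
    # Stops and returns None as soon as one step of the chain is missing.
    path = path.split("/")
    for name in path:
        if elem is None:
            break
        elem = getattr(elem, name)
    return elem

def find(soup):
    # Spans whose style gives the color as RGB(0, 0, 0) or as #00.
    for outer in soup.find_all(
            "span",
            style=re.compile(r"color:\s*(RGB\(0,\s*0,\s*0\)|#00)")):
        # Try both nesting orders of <strong> and <em>.
        for inner in [
                recursive_attr(outer, "strong/em"),
                recursive_attr(outer, "em/strong")]:
            if inner is not None:
                yield inner.string

def normalize_ws(s):
    # Collapse runs of whitespace (including newlines) into single spaces.
    return " ".join(s.split())

html = ...
soup = bs4.BeautifulSoup(html, "html.parser")
for match in find(soup):
    print(normalize_ws(match))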
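
For the "nested multiple levels deep" part of the question, here is a rough sketch that swaps in the standard library's html.parser instead of BeautifulSoup, so it is not the approach shown above. The class name ColoredText, the sample markup, and the color spelling #000000 are all assumptions made for illustration (the color value in the original post is truncated), so adjust the pattern to the real document.

```python
import re
from html.parser import HTMLParser

# Assumed spellings of the target color; not taken from the original post.
COLOR_RE = re.compile(r"(?<![\w-])color:\s*(rgb\(0,\s*0,\s*0\)|#000000)",
                      re.IGNORECASE)

# Elements that never get a closing tag and must not change the depth.
VOID = {"br", "hr", "img", "input", "link", "meta"}

class ColoredText(HTMLParser):
    """Collect text found at any depth inside an element whose style matches."""

    def __init__(self, color_re=COLOR_RE):
        super().__init__()
        self.color_re = color_re
        self.depth = 0      # > 0 while inside a matching element
        self.chunks = []    # text pieces of the element being collected
        self.results = []   # one whitespace-normalized string per element

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:
            self.depth += 1
        elif self.color_re.search(dict(attrs).get("style") or ""):
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if tag in VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append(" ".join("".join(self.chunks).split()))

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

# Hypothetical sample markup covering both nesting orders.
sample = ('<p><em>skip me</em> '
          '<span style="color: #000000"><strong><em>Is today\n the day?'
          '</em></strong></span> '
          '<span style="color:RGB(0, 0, 0)"><em><strong>Yes</strong></em></span></p>')

parser = ColoredText()
parser.feed(sample)
parser.close()
print(parser.results)
```

The parser only counts open and close tags, so it assumes reasonably well-formed markup; for messy real-world HTML a tolerant parser such as BeautifulSoup or lxml is probably the safer choice.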


-- 
https://mail.python.org/mailman/listinfo/python-list


one more question on regex

2016-01-22 Thread mg
python 3.4.3 

import re
re.search('(ab){2}','abzzabab')
<_sre.SRE_Match object; span=(4, 8), match='abab'>

>>> re.findall('(ab){2}','abzzabab')
['ab']

Why for search() the match is 'abab' and for findall the match is 'ab'? 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: one more question on regex

2016-01-22 Thread Peter Otten
mg wrote:

> python 3.4.3
> 
> import re
> re.search('(ab){2}','abzzabab')
> <_sre.SRE_Match object; span=(4, 8), match='abab'>
> 
> >>> re.findall('(ab){2}','abzzabab')
> ['ab']
> 
> Why for search() the match is 'abab' and for findall the match is 'ab'?

I suppose someone thought it was convenient for findall to return the 
explicit groups if there are any. If you want the whole match aka group(0) 
you can get that with

>>> re.findall('(?:ab){2}','abzzabab')
['abab']
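
A small sketch of the difference, using only the pattern and string from this thread; the variable names are just for illustration. It compares a capturing group, a non-capturing group, and one outer group wrapped around the whole pattern.

```python
import re

text = "abzzabab"

# With a capturing group, findall returns the group's content,
# not the text of the whole match.
captured = re.findall(r"(ab){2}", text)

# A non-capturing group (?:...) leaves the pattern without groups,
# so findall falls back to returning whole matches.
whole = re.findall(r"(?:ab){2}", text)

# Wrapping everything in a single outer group has the same effect.
wrapped = re.findall(r"((?:ab){2})", text)

print(captured, whole, wrapped)
```

The outer-group form is handy when the inner groups cannot simply be made non-capturing, though findall then returns tuples as soon as a second capturing group is present.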


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: one more question on regex

2016-01-22 Thread mg
On Fri, 22 Jan 2016 15:32:57 +, mg wrote:

> python 3.4.3
> 
> import re
> re.search('(ab){2}','abzzabab')
> <_sre.SRE_Match object; span=(4, 8), match='abab'>
> 
> >>> re.findall('(ab){2}','abzzabab')
> ['ab']
> 
> Why for search() the match is 'abab' and for findall the match is 'ab'?

finditer seems to be consistent with search:
regex = re.compile('(ab){2}')

for match in regex.finditer('abzzababab'): 
  print ("%s: %s" % (match.start(), match.span() ))
... 
4: (4, 8)

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Deprecation warnings for the future async and await keywords

2016-01-22 Thread Ian Kelly
On Fri, Jan 22, 2016 at 4:12 AM, Marco Buttu  wrote:
> I enabled the deprecation warnings in Python 3.5.1 and Python 3.6 dev, and I
> noticed that assigning to async or await does not issue any deprecation
> warning:
>
> $ python -Wd -c "import sys; print(sys.version); async = 33"
> 3.5.1 (default, Jan 21 2016, 19:59:28)
> [GCC 4.8.4]
> $ python -Wd -c "import sys; print(sys.version); async = 33"
> 3.6.0a0 (default:4b434a4770a9, Jan 12 2016, 13:01:29)
> [GCC 4.8.4]
>
>
> Is it normal?

They're not reserved words, see
https://www.python.org/dev/peps/pep-0492/#transition-plan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Deprecation warnings for the future async and await keywords

2016-01-22 Thread Marco Buttu

On 22/01/2016 16:59, Ian Kelly wrote:
> On Fri, Jan 22, 2016 at 4:12 AM, Marco Buttu wrote:
>> I enabled the deprecation warnings in Python 3.5.1 and Python 3.6 dev, and I
>> noticed that assigning to async or await does not issue any deprecation
>> warning:
>>
>> $ python -Wd -c "import sys; print(sys.version); async = 33"
>> 3.5.1 (default, Jan 21 2016, 19:59:28)
>> [GCC 4.8.4]
>
> They're not reserved words, see
> https://www.python.org/dev/peps/pep-0492/#transition-plan

Of course not, in fact I wrote "future keywords" in the subject, because 
they will be keywords in Python 3.7:


https://www.python.org/dev/peps/pep-0492/#deprecation-plans
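
A quick way to check what the running interpreter thinks, sketched with the standard keyword module. The dictionary name status is just for illustration; the expected result depends on the Python version, with both names reported as keywords from 3.7 on.

```python
import keyword
import sys

# Ask the interpreter itself whether each name is currently reserved.
status = {name: keyword.iskeyword(name) for name in ("async", "await")}

print(sys.version_info[:2], status)
```

On 3.5 and 3.6 both values come out False, which matches the assignment `async = 33` being accepted there.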
--
Marco Buttu
--
https://mail.python.org/mailman/listinfo/python-list


Re: one more question on regex

2016-01-22 Thread Vlastimil Brom
2016-01-22 16:50 GMT+01:00 mg :
> On Fri, 22 Jan 2016 15:32:57 +, mg wrote:
>
>> python 3.4.3
>>
>> import re
>> re.search('(ab){2}','abzzabab')
>> <_sre.SRE_Match object; span=(4, 8), match='abab'>
>>
>> >>> re.findall('(ab){2}','abzzabab')
>> ['ab']
>>
>> Why for search() the match is 'abab' and for findall the match is 'ab'?
>
> finditer seems to be consistent with search:
> regex = re.compile('(ab){2}')
>
> for match in regex.finditer('abzzababab'):
>   print ("%s: %s" % (match.start(), match.span() ))
> ...
> 4: (4, 8)
>
> --
> https://mail.python.org/mailman/listinfo/python-list

Hi,
as was already pointed out, findall "collects" the content of the
capturing groups (if present), rather than the whole matching text;

for a repeated capturing group, only the content of the last repetition
is kept and the previous ones are discarded; cf.:

>>> re.findall('(?i)(a)x(b)+','axbB')
[('a', 'B')]
>>>
(for multiple capturing groups in the pattern, a tuple of the captured
parts is collected)

or with your example, with the parts of the string differentiated by
upper/lower case:
>>> re.findall('(?i)(ab){2}','aBzzAbAB')
['AB']
>>>
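
The same "last repetition wins" behaviour can be seen on a match object directly. This is only a sketch on the mixed-case string above; the second step, re-scanning the whole match to recover every repetition, is one possible workaround rather than anything the re module does for you.

```python
import re

m = re.search(r"(?i)(ab){2}", "aBzzAbAB")

whole = m.group(0)       # text of the entire match
last = m.group(1)        # only the final repetition of the group survives
last_span = m.span(1)    # position of that final repetition

# To recover every repetition, scan the whole match again with the
# inner pattern on its own.
repetitions = re.findall(r"(?i)ab", whole)

print(whole, last, last_span, repetitions)
```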

hth,
   vbr
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: one more question on regex

2016-01-22 Thread mg
On Fri, 22 Jan 2016 21:10:44 +0100, Vlastimil Brom wrote:

> 2016-01-22 16:50 GMT+01:00 mg :
>> On Fri, 22 Jan 2016 15:32:57 +, mg wrote:
>>
>>> python 3.4.3
>>>
>>> import re
>>> re.search('(ab){2}','abzzabab')
>>> <_sre.SRE_Match object; span=(4, 8), match='abab'>
>>>
>>> >>> re.findall('(ab){2}','abzzabab')
>>> ['ab']
>>>
>>> Why for search() the match is 'abab' and for findall the match is
>>> 'ab'?
>>
>> finditer seems to be consistent with search:
>> regex = re.compile('(ab){2}')
>>
>> for match in regex.finditer('abzzababab'):
>>   print ("%s: %s" % (match.start(), match.span() ))
>> ...
>> 4: (4, 8)
>>
>> -- https://mail.python.org/mailman/listinfo/python-list
> 
> Hi,
> as was already pointed out, findall "collects" the content of the
> capturing groups (if present), rather than the whole matching text;
> 
> for a repeated capturing group, only the content of the last repetition
> is kept and the previous ones are discarded; cf.:
> 
> >>> re.findall('(?i)(a)x(b)+','axbB')
> [('a', 'B')]

> (for multiple capturing groups in the pattern, a tuple of the captured parts
> is collected)
> 
> or with your example, with the parts of the string differentiated by
> upper/lower case:
> >>> re.findall('(?i)(ab){2}','aBzzAbAB')
> ['AB']


> hth,
>vbr

Your explanation of the re.findall() results is correct. My point is that
the documentation states:

re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of 
strings

and this is not what re.findall does when the pattern contains groups.
IMHO it would be more reasonable to get back the whole matches, since
that seems to me the most useful information for the user. In any case
I'll go with finditer, which returns match objects carrying all the
information anyone could look for.
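
For what it's worth, a minimal sketch of that finditer approach. The helper names whole_matches and matches_with_spans are made up for this example and are not part of the re module.

```python
import re

def whole_matches(pattern, string, flags=0):
    """Return the full text of every non-overlapping match (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(pattern, string, flags)]

def matches_with_spans(pattern, string, flags=0):
    """Return (start, end, text) for every match (hypothetical helper)."""
    return [(m.start(), m.end(), m.group(0))
            for m in re.finditer(pattern, string, flags)]

texts = whole_matches(r"(ab){2}", "abzzababab")
spans = matches_with_spans(r"(ab){2}", "abzzababab")

print(texts)
print(spans)
```

Because each item comes from a match object, the groups stay available as well, so nothing is lost compared with findall.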
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Question about how to do something in BeautifulSoup?

2016-01-22 Thread Mario R. Osorio

I think you'd do better using the pyparsing library


On Friday, January 22, 2016 at 9:02:00 AM UTC-5, inhahe wrote:
> I hope this is an appropriate mailing list for BeautifulSoup questions,
> it's been a long time since I've used python-list and I don't remember if
> third-party modules are on topic. I did try posting to the BeautifulSoup
> mailing list on Google groups, but I've waited a day or two and my message
> hasn't been approved yet.
> 
> Say I have the following HTML (I hope this shows up as plain text here
> rather than formatting):
> 
> "Is
> today the day?"
> 
> And I want to extract the "Is today the day?" part. There are other places
> in the document with  and , but this is the only place that
> uses color #00, so I want to extract anything that's within a color
> #00 style, even if it's nested multiple levels deep within that.
> 
> - Sometimes the color is defined as RGB(0, 0, 0) and sometimes it's defined
> as #00
> - Sometimes the  is within the  and sometimes the  is
> within the .
> - There may be other discrepancies I haven't noticed yet
> 
> How can I do this in BeautifulSoup (or is this better done in lxml.html)?
> Thanks

-- 
https://mail.python.org/mailman/listinfo/python-list